Sample records for comparative genomics based

  1. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    PubMed

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those species generally considered non-pathogenic, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include the JavaScript-based genome browser (JBrowse) and the Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic trees of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the preliminary results showed differences in virulence genes found in Yersinia pestis and Yersinia pseudotuberculosis compared to other Yersinia species, and differences between Yersinia enterocolitica subsp. enterocolitica and Yersinia enterocolitica subsp. palearctica. YersiniaBase offers free access to a wide range of genomic data and analysis tools for the analysis of Yersinia. YersiniaBase can be accessed at http://yersinia.um.edu.my.

  2. The Diagnostic Yield of Array Comparative Genomic Hybridization Is High Regardless of Severity of Intellectual Disability/Developmental Delay in Children.

    PubMed

    D'Arrigo, Stefano; Gavazzi, Francesco; Alfei, Enrico; Zuffardi, Orsetta; Montomoli, Cristina; Corso, Barbara; Buzzi, Erika; Sciacca, Francesca L; Bulgheroni, Sara; Riva, Daria; Pantaleoni, Chiara

    2016-05-01

    Microarray-based comparative genomic hybridization is a method of molecular analysis that identifies chromosomal anomalies (or copy number variants) that correlate with clinical phenotypes. The aim of the present study was to apply a clinical score previously designated by de Vries to 329 patients with intellectual disability/developmental delay referred to our tertiary center and to see whether the clinical factors are associated with a positive outcome of microarray-based comparative genomic hybridization analyses. Another goal was to test the association between a positive microarray-based comparative genomic hybridization result and the severity of intellectual disability/developmental delay. Microarray-based comparative genomic hybridization identified structural chromosomal alterations responsible for the intellectual disability/developmental delay phenotype in 16% of our sample. Our study showed that causative copy number variants are frequently found even in cases of mild intellectual disability (30.77%). We want to emphasize the need to conduct microarray-based comparative genomic hybridization on all individuals with intellectual disability/developmental delay, regardless of the severity, because the degree of intellectual disability/developmental delay does not predict the diagnostic yield of microarray-based comparative genomic hybridization. © The Author(s) 2015.

  3. mySyntenyPortal: an application package to construct websites for synteny block analysis.

    PubMed

    Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum

    2018-06-05

    Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) is essential in comparative genomics. However, these tasks require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.

  4. KGCAK: a K-mer based database for genome-wide phylogeny and complexity evaluation.

    PubMed

    Wang, Dapeng; Xu, Jiayue; Yu, Jun

    2015-09-16

    The K-mer approach, which treats genomic sequences as simple character strings and counts the relative abundance of each string of a fixed length K, has been extensively applied to phylogeny inference for genome assembly, annotation, and comparison. To meet increasing demands for comparing large genome sequences and to promote the use of the K-mer approach, we develop a versatile database, KGCAK ( http://kgcak.big.ac.cn/KGCAK/ ), containing ~8,000 genomes that include genome sequences of diverse life forms (viruses, prokaryotes, protists, animals, and plants) and cellular organelles of eukaryotic lineages. It builds phylogeny based on genomic elements in an alignment-free fashion and provides in-depth data processing enabling users to compare the complexity of genome sequences based on K-mer distribution. We hope that KGCAK becomes a powerful tool for exploring relationships within and among groups of species in a tree of life based on genomic data.
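    The K-mer idea described in this record can be illustrated with a minimal Python sketch. It is not KGCAK's implementation: the function names are hypothetical, and the use of Euclidean distance between frequency profiles is an assumption chosen only to show one alignment-free way of comparing two sequences.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative abundance of every k-mer found in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        # Sequence shorter than k: no k-mers to count.
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_distance(freq_a, freq_b):
    """Euclidean distance between two k-mer frequency profiles.

    Assumption: Euclidean distance is used here for illustration only;
    the record does not say which distance KGCAK applies.
    """
    keys = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(x, 0.0) - freq_b.get(x, 0.0)) ** 2
                    for x in keys))
```

    A pairwise matrix of such distances could then be passed to any standard tree-building method, which is one common way alignment-free phylogenies are produced.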

  5. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  7. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  8. A whole-genome, radiation hybrid map of wheat

    USDA-ARS's Scientific Manuscript database

    Generating a reference sequence of bread wheat (Triticum aestivum L.) is a challenging task because of its large, highly repetitive and allopolyploid genome. Ordering of BAC- and NGS-based contigs in ongoing wheat genome-sequencing projects primarily uses recombination and comparative genomics-base...

  9. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    PubMed

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
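    The distinction between unique, shared and core coding sequences mentioned in this record can be sketched in a few lines of Python. This is an illustration under assumptions, not the CloVR-Comparative pipeline: it presumes that CDSs have already been grouped into clusters (for example from a whole-genome alignment), and the function name and input layout are hypothetical.

```python
from collections import Counter


def classify_clusters(genomes):
    """Split CDS clusters into core, shared and unique sets.

    genomes: dict mapping a genome name to the set of cluster IDs
    present in that genome (hypothetical input format).
    """
    n = len(genomes)
    # Count in how many genomes each cluster occurs.
    presence = Counter(c for clusters in genomes.values() for c in clusters)
    core = {c for c, k in presence.items() if k == n}        # in every genome
    unique = {c for c, k in presence.items() if k == 1}      # in exactly one
    shared = {c for c, k in presence.items() if 1 < k < n}   # in some, not all
    return core, shared, unique
```

    With a single input genome every cluster is both core and unique, so the split is only informative for two or more genomes.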

  10. [Advance on genome research of Yersinia pestis bacteriophage].

    PubMed

    Tan, H L; Wang, P; Li, W

    2017-04-10

    Completion of the genome sequences of Yersinia pestis bacteriophages offers an unprecedented opportunity for researchers to carry out related genomic studies. This review draws on those genomic sequences to describe, from a genomic perspective, the essential features of Yersinia pestis bacteriophage genomes. Genetic and evolutionary relationships are discussed on the basis of comparative genomics. Descriptions of function derived from gene prediction and protein annotation provide evidence for further related studies.

  11. Building the Evidence Base for Decision-making in Cancer Genomic Medicine Using Comparative Effectiveness Research

    PubMed Central

    Goddard, Katrina A.B.; Knaus, William A.; Whitlock, Evelyn; Lyman, Gary H.; Feigelson, Heather Spencer; Schully, Sheri D.; Ramsey, Scott; Tunis, Sean; Freedman, Andrew N.; Khoury, Muin J.; Veenstra, David L.

    2013-01-01

    Background The clinical utility is uncertain for many cancer genomic applications. Comparative effectiveness research (CER) can provide evidence to clarify this uncertainty. Objectives To identify approaches to help stakeholders make evidence-based decisions, and to describe potential challenges and opportunities using CER to produce evidence-based guidance. Methods We identified general CER approaches for genomic applications through literature review, the authors’ experiences, and lessons learned from a recent, seven-site CER initiative in cancer genomic medicine. Case studies illustrate the use of CER approaches. Results Evidence generation and synthesis approaches include comparative observational and randomized trials, patient reported outcomes, decision modeling, and economic analysis. We identified significant challenges to conducting CER in cancer genomics: the rapid pace of innovation, the lack of regulation, the limited evidence for clinical utility, and the beliefs that genomic tests could have personal utility without having clinical utility. Opportunities to capitalize on CER methods in cancer genomics include improvements in the conduct of evidence synthesis, stakeholder engagement, increasing the number of comparative studies, and developing approaches to inform clinical guidelines and research prioritization. Conclusions CER offers a variety of methodological approaches to address stakeholders’ needs. Innovative approaches are needed to ensure an effective translation of genomic discoveries. PMID:22516979

  12. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  13. CoryneBase: Corynebacterium Genomic Resources and Analysis Tools at Your Fingertips

    PubMed Central

    Tan, Mui Fern; Jakubovics, Nick S.; Wee, Wei Yee; Mutha, Naresh V. R.; Wong, Guat Jah; Ang, Mia Yang; Yazdi, Amir Hessam; Choo, Siew Woh

    2014-01-01

    Corynebacteria are used for a wide variety of industrial purposes but some species are associated with human diseases. With an increasing number of corynebacterial genomes having been sequenced, comparative analysis of these strains may provide better understanding of their biology, phylogeny, virulence and taxonomy that may lead to the discovery of beneficial industrial strains or contribute to better management of diseases. To facilitate the ongoing research of corynebacteria, a specialized central repository and analysis platform for the corynebacterial research community is needed to host the fast-growing amount of genomic data and facilitate the analysis of these data. Here we present CoryneBase, a genomic database for Corynebacterium with diverse functionality for the analysis of genomes aimed to provide: (1) annotated genome sequences of Corynebacterium where 165,918 coding sequences and 4,180 RNAs can be found in 27 species; (2) access to comprehensive Corynebacterium data through the use of advanced web technologies for interactive web interfaces; and (3) advanced bioinformatic analysis tools consisting of standard BLAST for homology search, VFDB BLAST for sequence homology search against the Virulence Factor Database (VFDB), Pairwise Genome Comparison (PGC) tool for comparative genomic analysis, and a newly designed Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomic analysis. CoryneBase offers access to a range of Corynebacterium genomic resources as well as analysis tools for comparative genomics and pathogenomics. It is publicly available at http://corynebacterium.um.edu.my/. PMID:24466021

  14. Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

    PubMed

    Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

    2004-07-14

    With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.

  15. Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of "Escherichia coli" O157:H7 and "Yersinia pestis"

    ERIC Educational Resources Information Center

    Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.

    2012-01-01

    Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples…

  16. In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics.

    PubMed

    Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

    2017-01-01

    Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/.

  17. In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics

    PubMed Central

    Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

    2017-01-01

    Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563

  18. Investigation of the Evolutionary Development of the Genus Bifidobacterium by Comparative Genomics

    PubMed Central

    Lugli, Gabriele Andrea; Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Ferrario, Chiara; Viappiani, Alice; Mancabelli, Leonardo; Mangifesta, Marta; Taminiau, Bernard; Delcenserie, Véronique; van Sinderen, Douwe

    2014-01-01

    The Bifidobacterium genus currently encompasses 48 recognized taxa, which have been isolated from different ecosystems. However, the current phylogeny of bifidobacteria is hampered by the relative paucity of genotypic data. Here, we reassessed the taxonomy of this bacterial genus using genome-based approaches, which demonstrated that the previous taxonomic view of bifidobacteria contained several inconsistencies. In particular, high levels of genetic relatedness were shown to exist between particular Bifidobacterium taxa which would not justify their status as separate species. The results presented here are based on average nucleotide identity analysis involving the genome sequences for each type strain of the 48 bifidobacterial taxa, as well as phylogenetic comparative analysis of the predicted core genome of the Bifidobacterium genus. The results of this study demonstrate that the availability of complete genome sequences allows the reconstruction of a more robust bifidobacterial phylogeny than that obtained from a single gene-based sequence comparison, thus discouraging the assignment of a new or separate bifidobacterial taxon without such a genome-based validation. PMID:25107967

  19. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
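    The quantitative step in this record, a geometric center per fingerprint followed by a Euclidean distance between two genomes, can be sketched as follows. The record does not say how the three-dimensional coordinates are derived from a sequence, so the sketch simply takes lists of (x, y, z) points as given; the function names are hypothetical, and the differentiate rate and weighted differentiate rate are not reproduced because their definitions are not stated here.

```python
from math import dist


def geometric_center(points):
    """Arithmetic mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center_distance(points_a, points_b):
    """Euclidean distance between the geometric centers of two point sets.

    Assumption: each point set stands in for one genome fingerprint map.
    """
    return dist(geometric_center(points_a), geometric_center(points_b))
```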

  20. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus

    PubMed Central

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859

  1. Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.

    PubMed

    Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P

    2010-12-22

    Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource, Roundup, using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.
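
    The idea of ordering jobs by estimated runtime before submission can be sketched with a standard greedy heuristic, longest-processing-time-first scheduling. This is not the authors' model: the runtime estimates, job names, and node count below are invented for illustration, and the heuristic is swapped in only to show why submission order affects how evenly a cluster is used.

```python
import heapq

def schedule_longest_first(estimated_hours, n_nodes):
    """Assign jobs to nodes, longest estimated job first, always to the least-loaded node.

    estimated_hours: dict mapping a job name to its estimated runtime in hours.
    Returns (assignment, makespan), where assignment[i] lists the jobs on node i.
    """
    heap = [(0.0, node) for node in range(n_nodes)]  # (current load, node index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_nodes)]
    ordered = sorted(estimated_hours.items(), key=lambda kv: kv[1], reverse=True)
    for job, hours in ordered:
        load, node = heapq.heappop(heap)  # least-loaded node so far
        assignment[node].append(job)
        heapq.heappush(heap, (load + hours, node))
    makespan = max(load for load, _ in heap)
    return assignment, makespan

# Hypothetical genome-to-genome comparisons with made-up runtime estimates.
jobs = {"A_vs_B": 5.0, "A_vs_C": 4.0, "B_vs_C": 3.0,
        "A_vs_D": 3.0, "B_vs_D": 2.0, "C_vs_D": 1.0}
assignment, makespan = schedule_longest_first(jobs, n_nodes=2)
print(assignment, makespan)
```

    With these toy numbers both nodes finish at the same time, so no paid node sits idle while another is still working; a random submission order gives no such guarantee.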

  2. A new strategy for genome assembly using short sequence reads and reduced representation libraries.

    PubMed

    Young, Andrew L; Abaan, Hatice Ozel; Zerbino, Daniel; Mullikin, James C; Birney, Ewan; Margulies, Elliott H

    2010-02-01

    We have developed a novel approach for using massively parallel short-read sequencing to generate fast and inexpensive de novo genomic assemblies comparable to those generated by capillary-based methods. The ultrashort (<100 base) sequences generated by this technology pose specific biological and computational challenges for de novo assembly of large genomes. To account for this, we devised a method for experimentally partitioning the genome using reduced representation (RR) libraries prior to assembly. We use two restriction enzymes independently to create a series of overlapping fragment libraries, each containing a tractable subset of the genome. Together, these libraries allow us to reassemble the entire genome without the need of a reference sequence. As proof of concept, we applied this approach to sequence and assembled the majority of the 125-Mb Drosophila melanogaster genome. We subsequently demonstrate the accuracy of our assembly method with meaningful comparisons against the current available D. melanogaster reference genome (dm3). The ease of assembly and accuracy for comparative genomics suggest that our approach will scale to future mammalian genome-sequencing efforts, saving both time and money without sacrificing quality.
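
    The partitioning step described above, cutting the genome with a restriction enzyme and keeping a tractable subset of fragments, can be mimicked in silico. The sketch below is an illustration only: the abstract does not name the enzymes or size window used, so EcoRI (recognition site GAATTC, cutting after the first G) and the length bounds are assumptions, and the function names are hypothetical. It handles a single linear strand and non-overlapping sites.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of a recognition site.

    cut_offset is the position inside the site where the cut falls
    (1 for EcoRI, which cuts G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return [f for f in fragments if f]  # drop empty fragments at the ends

def size_select(fragments, min_len, max_len):
    """Keep fragments whose length lies in [min_len, max_len], as in a gel size selection."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

# Toy genome containing two EcoRI sites.
genome = "AAGAATTCAAAAAAGAATTCAA"
fragments = digest(genome, "GAATTC", cut_offset=1)
selected = size_select(fragments, min_len=5, max_len=10)
print([len(f) for f in fragments], selected)
```

    Running the same digest with a second enzyme would yield a different, overlapping set of fragments, which is the property the abstract relies on to stitch the libraries back together.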

  3. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus.

    PubMed

    Li, Fagen; Zhou, Changpin; Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10-56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa.

  4. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus

    PubMed Central

    Li, Fagen; Zhou, Changpin; Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10–56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa. PMID:26695430

  5. Genome-Wide Mapping of Copy Number Variation in Humans: Comparative Analysis of High Resolution Array Platforms

    PubMed Central

    Haraksingh, Rajini R.; Abyzov, Alexej; Gerstein, Mark; Urban, Alexander E.; Snyder, Michael

    2011-01-01

    Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association type studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing based technologies. Numerous, array-based platforms for CNV detection exist utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost effective CNV detection and validation for both basic and clinical applications. PMID:22140474

  6. Mitochondrial genomes of Meloidogyne chitwoodi and M. incognita (Nematoda: Tylenchina): comparative analysis, gene order and phylogenetic relationships with other nematodes.

    PubMed

    Humphreys-Pereira, Danny A; Elling, Axel A

    2014-01-01

    Root-knot nematodes (Meloidogyne spp.) are among the most important plant pathogens. In this study, the mitochondrial (mt) genomes of the root-knot nematodes, M. chitwoodi and M. incognita were sequenced. PCR analyses suggest that both mt genomes are circular, with an estimated size of 19.7 and 18.6-19.1kb, respectively. The mt genomes each contain a large non-coding region with tandem repeats and the control region. The mt gene arrangement of M. chitwoodi and M. incognita is unlike that of other nematodes. Sequence alignments of the two Meloidogyne mt genomes showed three translocations; two in transfer RNAs and one in cox2. Compared with other nematode mt genomes, the gene arrangement of M. chitwoodi and M. incognita was most similar to Pratylenchus vulnus. Phylogenetic analyses (Maximum Likelihood and Bayesian inference) were conducted using 78 complete mt genomes of diverse nematode species. Analyses based on nucleotides and amino acids of the 12 protein-coding mt genes showed strong support for the monophyly of class Chromadorea, but only amino acid-based analyses supported the monophyly of class Enoplea. The suborder Spirurina was not monophyletic in any of the phylogenetic analyses, contradicting the Clade III model, which groups Ascaridomorpha, Spiruromorpha and Oxyuridomorpha based on the small subunit ribosomal RNA gene. Importantly, comparisons of mt gene arrangement and tree-based methods placed Meloidogyne as sister taxa of Pratylenchus, a migratory plant endoparasitic nematode, and not with the sedentary endoparasitic Heterodera. Thus, comparative analyses of mt genomes suggest that sedentary endoparasitism in Meloidogyne and Heterodera is based on convergent evolution. Copyright © 2014 Elsevier B.V. All rights reserved.

  7. Phylogenomic Insights into Mouse Evolution Using a Pseudoreference Approach

    PubMed Central

    Sarver, Brice A.J.; Keeble, Sara; Cosart, Ted; Tucker, Priscilla K.; Dean, Matthew D.

    2017-01-01

    Comparative genomic studies are now possible across a broad range of evolutionary timescales, but the generation and analysis of genomic data across many different species still present a number of challenges. The most sophisticated genotyping and down-stream analytical frameworks are still predominantly based on comparisons to high-quality reference genomes. However, established genomic resources are often limited within a given group of species, necessitating comparisons to divergent reference genomes that could restrict or bias comparisons across a phylogenetic sample. Here, we develop a scalable pseudoreference approach to iteratively incorporate sample-specific variation into a genome reference and reduce the effects of systematic mapping bias in downstream analyses. To characterize this framework, we used targeted capture to sequence whole exomes (∼54 Mbp) in 12 lineages (ten species) of mice spanning the Mus radiation. We generated whole exome pseudoreferences for all species and show that this iterative reference-based approach improved basic genomic analyses that depend on mapping accuracy while preserving the associated annotations of the mouse reference genome. We then use these pseudoreferences to resolve evolutionary relationships among these lineages while accounting for phylogenetic discordance across the genome, contributing an important resource for comparative studies in the mouse system. We also describe patterns of genomic introgression among lineages and compare our results to previous studies. Our general approach can be applied to whole or partitioned genomic data and is easily portable to any system with sufficient genomic resources, providing a useful framework for phylogenomic studies in mice and other taxa. PMID:28338821

  8. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    PubMed Central

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813

  9. Assessing the Robustness of Complete Bacterial Genome Segmentations

    NASA Astrophysics Data System (ADS)

    Devillers, Hugo; Chiapello, Hélène; Schbath, Sophie; El Karoui, Meriem

    Comparison of closely related bacterial genomes has revealed the presence of highly conserved sequences forming a "backbone" that is interrupted by numerous, less conserved, DNA fragments. Segmentation of bacterial genomes into backbone and variable regions is particularly useful to investigate bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these software tools and to compare the results obtained with them. To fill this gap, we have developed two local scores to measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The scores presented in this paper are simple to implement, and our results show that they make it easy to discriminate between robust and non-robust bacterial genome segmentations when using aligners such as MAUVE and MGA.

  10. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  11. Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

    PubMed

    Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

    2014-01-01

    Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence across the whole genome, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one genomic region at the 3'-terminal of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed the identical performances between these regions and the full-length genome in genotyping, in which the HEV isolates involved could be divided into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.

  12. StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

    PubMed

    Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh

    2016-01-01

    The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.

  13. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  14. Elucidating the triplicated ancestral genome structure of radish based on chromosome-level comparison with the Brassica genomes.

    PubMed

    Jeong, Young-Min; Kim, Namshin; Ahn, Byung Ohg; Oh, Mijin; Chung, Won-Hyong; Chung, Hee; Jeong, Seongmun; Lim, Ki-Byung; Hwang, Yoon-Jung; Kim, Goon-Bo; Baek, Seunghoon; Choi, Sang-Bong; Hyung, Dae-Jin; Lee, Seung-Won; Sohn, Seong-Han; Kwon, Soo-Jin; Jin, Mina; Seol, Young-Joo; Chae, Won Byoung; Choi, Keun Jin; Park, Beom-Seok; Yu, Hee-Ju; Mun, Jeong-Hwan

    2016-07-01

    This study presents a chromosome-scale draft genome sequence of radish that is assembled into nine chromosomal pseudomolecules. A comprehensive comparative genome analysis with the Brassica genomes provides genomic evidence on the evolution of the mesohexaploid radish genome. Radish (Raphanus sativus L.) is an agronomically important root vegetable crop and its origin and phylogenetic position in the tribe Brassiceae is controversial. Here we present a comprehensive analysis of the radish genome based on the chromosome sequences of R. sativus cv. WK10039. The radish genome was sequenced and assembled into 426.2 Mb spanning >98 % of the gene space, of which 344.0 Mb were integrated into nine chromosome pseudomolecules. Approximately 36 % of the genome was repetitive sequences and 46,514 protein-coding genes were predicted and annotated. Comparative mapping of the tPCK-like ancestral genome revealed that the radish genome has intermediate characteristics between the Brassica A/C and B genomes in the triplicated segments, suggesting an internal origin from the genus Brassica. The evolutionary characteristics shared between radish and other Brassica species provided genomic evidence that the current form of nine chromosomes in radish was rearranged from the chromosomes of hexaploid progenitor. Overall, this study provides a chromosome-scale draft genome sequence of radish as well as novel insight into evolution of the mesohexaploid genomes in the tribe Brassiceae.

  15. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  16. Improving Microbial Genome Annotations in an Integrated Database Context

    PubMed Central

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Anderson, Iain; Mavromatis, Konstantinos; Kyrpides, Nikos C.; Ivanova, Natalia N.

    2013-01-01

    Effective comparative analysis of microbial genomes requires a consistent and complete view of biological data. Consistency regards the biological coherence of annotations, while completeness regards the extent and coverage of functional characterization for genomes. We have developed tools that allow scientists to assess and improve the consistency and completeness of microbial genome annotations in the context of the Integrated Microbial Genomes (IMG) family of systems. All publicly available microbial genomes are characterized in IMG using different functional annotation and pathway resources, thus providing a comprehensive framework for identifying and resolving annotation discrepancies. A rule based system for predicting phenotypes in IMG provides a powerful mechanism for validating functional annotations, whereby the phenotypic traits of an organism are inferred based on the presence of certain metabolic reactions and pathways and compared to experimentally observed phenotypes. The IMG family of systems are available at http://img.jgi.doe.gov/. PMID:23424620

  17. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications.

    PubMed

    Sablok, Gaurav; Chen, Ting-Wen; Lee, Chi-Ching; Yang, Chi; Gan, Ruei-Chi; Wegrzyn, Jill L; Porta, Nicola L; Nayak, Kinshuk C; Huang, Po-Jung; Varotto, Claudio; Tang, Petrus

    2017-06-01

    Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
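
    As a concrete picture of what a "codon usage pattern" is, the following minimal sketch counts codons in one coding sequence and converts the counts to relative frequencies. It is a generic illustration, not ChloroMitoCU's implementation; the function names and the toy CDS are hypothetical, and the sketch assumes the sequence is already in frame.

```python
from collections import Counter

def codon_counts(cds):
    """Count in-frame codons of a coding sequence, ignoring any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))

def codon_frequencies(counts):
    """Convert raw codon counts to fractions of all codons in the sequence."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}

# Toy CDS: Met, Ala, Ala, Ala, stop, with alanine encoded by GCT twice and GCC once.
cds = "ATGGCTGCTGCCTAA"
counts = codon_counts(cds)
freqs = codon_frequencies(counts)
print(counts, freqs)
```

    Comparing such frequency tables between genomes, for example GCT versus GCC for alanine, is the kind of cross-species comparison the abstract describes.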

  18. Gobe: an interactive, web-based tool for comparative genomic visualization.

    PubMed

    Pedersen, Brent S; Tang, Haibao; Freeling, Michael

    2011-04-01

    Gobe is a web-based tool for viewing comparative genomic data. It supports viewing multiple genomic regions simultaneously. Its simple text format and Flash-based rendering make it an interactive, exploratory research tool. Gobe can be used without installation through our web service, or downloaded and customized with stylesheets and JavaScript callback functions. Gobe is a Flash application that runs in all modern web browsers. The full source code, including that for the online web application, is available under the MIT license at: http://github.com/brentp/gobe. Sample applications are hosted at http://try-gobe.appspot.com/ and http://synteny.cnr.berkeley.edu/gobe-app/.

  19. Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup

    PubMed Central

    Kudtarkar, Parul; DeLuca, Todd F.; Fusaro, Vincent A.; Tonellato, Peter J.; Wall, Dennis P.

    2010-01-01

    Background Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource—Roundup—using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon’s Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. 
Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure. PMID:21258651
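
The idea of ordering jobs by estimated runtime before submission can be sketched as follows. The abstract does not give the runtime model or the ordering rule, so both are assumptions here: `estimate_runtime` is a hypothetical stand-in (product of genome sizes), and the ordering is the standard longest-job-first greedy heuristic rather than the authors' strategy.

```python
import heapq
from itertools import combinations


def estimate_runtime(size_a, size_b):
    """Hypothetical runtime model: proportional to the product of genome sizes."""
    return size_a * size_b


def schedule_longest_first(jobs, n_nodes):
    """Assign (name, runtime) jobs to nodes, longest job first.

    Each job goes to the node with the smallest load so far.
    Returns the makespan (largest node load) and the per-node assignment.
    """
    heap = [(0.0, node) for node in range(n_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(n_nodes)}
    for name, runtime in sorted(jobs, key=lambda job: job[1], reverse=True):
        load, node = heapq.heappop(heap)
        assignment[node].append(name)
        heapq.heappush(heap, (load + runtime, node))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


# Toy genome sizes in arbitrary units, invented for this example.
genome_sizes = {"A": 4, "B": 3, "C": 2, "D": 1}
jobs = [
    (a + "-" + b, estimate_runtime(genome_sizes[a], genome_sizes[b]))
    for a, b in combinations(sorted(genome_sizes), 2)
]
makespan, assignment = schedule_longest_first(jobs, n_nodes=2)
print(makespan, assignment)
```

Submitting long comparisons first keeps nodes from idling at the end of the run while a single large job finishes, which is one plausible reason an informed order can cost less than a random one.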

  20. Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs.

    PubMed

    Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus

    2010-01-28

    DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Basing the species concept for prokaryotes ultimately on DDH was chosen by microbiologists as a pragmatic approach for deciding on the recognition of novel species, and it also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes is dependent on multiple choices for software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).
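
To make concrete why such choices need to be documented, here is a minimal sketch of one possible HSP-based distance. The formula (one minus the fraction of both genomes covered by HSPs) and all names and coordinates are assumptions for illustration. The abstract does not state the formulas used by the GGDC server, and this is not presented as one of them.

```python
def covered_length(intervals):
    """Positions covered by (start, end) intervals, end exclusive, overlaps counted once."""
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def coverage_distance(hsps_on_x, hsps_on_y, len_x, len_y):
    """One minus the fraction of both genomes covered by HSPs."""
    covered = covered_length(hsps_on_x) + covered_length(hsps_on_y)
    return 1.0 - covered / (len_x + len_y)


# Toy HSP coordinates on two genomes of 1000 bp each, invented for this example.
hsps_on_x = [(0, 400), (300, 700)]
hsps_on_y = [(100, 500), (600, 900)]
distance = coverage_distance(hsps_on_x, hsps_on_y, 1000, 1000)
print(round(distance, 3))
```

With overlapping HSPs merged, 1400 of 2000 positions are covered. Summing the raw HSP lengths instead would count 1500 positions and give a smaller distance, so even this one setting changes the result.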

  1. Ecology and genomics of Bacillus subtilis.

    PubMed

    Earl, Ashlee M; Losick, Richard; Kolter, Roberto

    2008-06-01

    Bacillus subtilis is a remarkably diverse bacterial species that is capable of growth within many environments. Recent microarray-based comparative genomic analyses have revealed that members of this species also exhibit considerable genomic diversity. The identification of strain-specific genes might explain how B. subtilis has become so broadly adapted. The goal of identifying ecologically adaptive genes could soon be realized with the imminent release of several new B. subtilis genome sequences. As we embark upon this exciting new era of B. subtilis comparative genomics we review what is currently known about the ecology and evolution of this species.

  2. Aligning a New Reference Genetic Map of Lupinus angustifolius with the Genome Sequence of the Model Legume, Lotus japonicus

    PubMed Central

    Nelson, Matthew N.; Moolhuijzen, Paula M.; Boersma, Jeffrey G.; Chudy, Magdalena; Lesniewska, Karolina; Bellgard, Matthew; Oliver, Richard P.; Święcicki, Wojciech; Wolko, Bogdan; Cowling, Wallace A.; Ellwood, Simon R.

    2010-01-01

    We have developed a dense reference genetic map of Lupinus angustifolius (2n = 40) based on a set of 106 publicly available recombinant inbred lines derived from a cross between domesticated and wild parental lines. The map comprised 1090 loci in 20 linkage groups and three small clusters, drawing together data from several previous mapping publications plus almost 200 new markers, of which 63 were gene-based markers. A total of 171 mainly gene-based, sequence-tagged site loci served as bridging points for comparing the Lu. angustifolius genome with the genome sequence of the model legume, Lotus japonicus via BLASTn homology searching. Comparative analysis indicated that the genomes of Lu. angustifolius and Lo. japonicus are highly diverged structurally but with significant regions of conserved synteny including the region of the Lu. angustifolius genome containing the pod-shatter resistance gene, lentus. We discuss the potential of synteny analysis for identifying candidate genes for domestication traits in Lu. angustifolius and in improving our understanding of Fabaceae genome evolution. PMID:20133394

  3. A comparative physical map reveals the pattern of chromosomal evolution between the turkey (Meleagris gallopavo) and chicken (Gallus gallus) genomes

    PubMed Central

    2011-01-01

    Background A robust bacterial artificial chromosome (BAC)-based physical map is essential for many aspects of genomics research, including an understanding of chromosome evolution, high-resolution genome mapping, marker-assisted breeding, positional cloning of genes, and quantitative trait analysis. To facilitate turkey genetics research and better understand avian genome evolution, a BAC-based integrated physical, genetic, and comparative map was developed for this important agricultural species. Results The turkey genome physical map was constructed based on 74,013 BAC fingerprints (11.9 × coverage) from two independent libraries, and it was integrated with the turkey genetic map and chicken genome sequence using over 41,400 BAC assignments identified by 3,499 overgo hybridization probes along with > 43,000 BAC end sequences. The physical-comparative map consists of 74 BAC contigs, with an average contig size of 13.6 Mb. All but four of the turkey chromosomes were spanned on this map by three or fewer contigs, with 14 chromosomes spanned by a single contig and nine chromosomes spanned by two contigs. This map predicts 20 to 27 major rearrangements distinguishing turkey and chicken chromosomes, despite up to 40 million years of separate evolution between the two species. These data elucidate the chromosomal evolutionary pattern within the Phasianidae that led to the modern turkey and chicken karyotypes. The predominant rearrangement mode involves intra-chromosomal inversions, and there is a clear bias for these to result in centromere locations at or near telomeres in turkey chromosomes, in comparison to interstitial centromeres in the orthologous chicken chromosomes. Conclusion The BAC-based turkey-chicken comparative map provides novel insights into the evolution of avian genomes, a framework for assembly of turkey whole genome shotgun sequencing data, and tools for enhanced genetic improvement of these important agricultural and model species. PMID:21906286

  4. Sequencing and comparing whole mitochondrial genomes of animals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  5. Position-based scanning for comparative genomics and identification of genetic islands in Haemophilus influenzae type b.

    PubMed

    Bergman, Nicholas H; Akerley, Brian J

    2003-03-01

    Bacteria exhibit extensive genetic heterogeneity within species. In many cases, these differences account for virulence properties unique to specific strains. Several such loci have been discovered in the genome of the type b serotype of Haemophilus influenzae, a human pathogen able to cause meningitis, pneumonia, and septicemia. Here we report application of a PCR-based scanning procedure to compare the genome of a virulent type b (Hib) strain with that of the laboratory-passaged Rd KW20 strain for which a complete genome sequence is available. We have identified seven DNA segments or H. influenzae genetic islands (HiGIs) present in the type b genome and absent from the Rd genome. These segments vary in size and content and show signs of horizontal gene transfer in that their percent G+C content differs from that of the rest of the H. influenzae genome, they contain genes similar to those found on phages or other mobile elements, or they are flanked by DNA repeats. Several of these loci represent potential pathogenicity islands, because they contain genes likely to mediate interactions with the host. These newly identified genetic islands provide areas of investigation into both the evolution and pathogenesis of H. influenzae. In addition, the genome scanning approach developed to identify these islands provides a rapid means to compare the genomes of phenotypically diverse bacterial strains once the genome sequence of one representative strain has been determined.
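
One of the horizontal-transfer signals mentioned in this record, a G+C content that differs from the rest of the genome, can be sketched computationally. The study itself used a PCR-based scanning procedure. The window size, step, threshold, and toy sequence below are illustrative assumptions and are not taken from the paper.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_windows(genome, window, step, threshold):
    """Return (start, gc) for windows whose GC fraction deviates from the
    genome-wide GC fraction by more than `threshold`."""
    background = gc_fraction(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_fraction(genome[start:start + window])
        if abs(gc - background) > threshold:
            hits.append((start, gc))
    return hits


# Toy genome: an AT-rich background with a 20 bp GC-rich insert at position 100.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
hits = atypical_windows(toy_genome, window=20, step=20, threshold=0.5)
print(hits)
```

Only the window starting at position 100 is flagged in the toy genome. On real data a compositional outlier is a hint rather than proof of an island, which is why the abstract also cites phage-like genes and flanking repeats as evidence.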

  6. MicroScope: a platform for microbial genome annotation and comparative genomics

    PubMed Central

    Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493

  7. Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

    PubMed Central

    Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

    2014-01-01

    Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific to the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries-active or caries-free dentitions cannot be differentiated based on the presence or absence of specific genetic elements. 
Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly Java graphical interface. PMID:24291226
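
Similarity by k-mer sharing, as mentioned in this record, can be illustrated with a small sketch. The abstract does not describe how Simrank computes its score, so the code below swaps in a plain Jaccard index over k-mer sets. The function names, the value of k, and the toy sequences are assumptions.

```python
def kmer_set(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(seq_a, seq_b, k):
    """Jaccard index of the two k-mer sets: shared k-mers / all distinct k-mers."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy sequences invented for this example; they differ only at the 3' end.
strain_1 = "ACGTACGTAA"
strain_2 = "ACGTACGTCC"
similarity = kmer_jaccard(strain_1, strain_2, k=4)
print(round(similarity, 3))
```

The sketch ignores reverse complements and uses a very small k for readability. Whole-genome comparisons would typically handle both strands and use longer k-mers.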

  8. MicroScope: a platform for microbial genome annotation and comparative genomics.

    PubMed

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc.

  9. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    PubMed Central

    2011-01-01

    Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
Conclusions The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy. PMID:21767393
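
The "approximately 10 times" coverage figure can be checked from the numbers reported in this record. The sketch below is a back-of-the-envelope calculation. Whether the authors discounted the 5% of clones without inserts is an assumption, and the function name is hypothetical.

```python
def library_coverage(n_clones, mean_insert_bp, genome_bp, insert_fraction=1.0):
    """Genome equivalents in a clone library:
    clones with inserts x mean insert size / genome size."""
    return n_clones * insert_fraction * mean_insert_bp / genome_bp


# Figures quoted in the abstract.
n_clones = 147_456
mean_insert_bp = 120_000         # average insert of 120 kb
effective_genome_bp = 1.6e9      # ~1.6 Gb effective genome size

# Assumption: only the 95% of clones found to contain inserts contribute.
coverage = library_coverage(
    n_clones, mean_insert_bp, effective_genome_bp, insert_fraction=0.95
)
print(round(coverage, 1))
```

Counting every clone instead (insert_fraction=1.0) gives about 11 genome equivalents, so either reading is consistent with the reported coverage of roughly 10 times.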

  10. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

    PubMed

    Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

    2011-07-18

    Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. 
The construction of the first switchgrass BAC library and comparative analysis of homoeologous regions harboring OsBRI1 orthologs present a glimpse into the switchgrass genome structure and complexity. Data obtained demonstrate the feasibility of using HICF fingerprinting to resolve the homoeologous chromosomes of the two distinct genomes in switchgrass, providing a robust and accurate BAC-based physical platform for this species. The genomic resources and sequence data generated will lay the foundation for deciphering the switchgrass genome and lead the way for an accurate genome sequencing strategy.

  11. EDGAR: A software framework for the comparative analysis of prokaryotic genomes

    PubMed Central

    Blom, Jochen; Albaum, Stefan P; Doppmeier, Daniel; Pühler, Alfred; Vorhölter, Frank-Jörg; Zakrzewski, Martha; Goesmann, Alexander

    2009-01-01

    Background The introduction of next generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons. Results To support these studies EDGAR – "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" – was developed. EDGAR is designed to automatically perform genome comparisons in a high throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies were awkward due to divergent taxonomic systems. The resultant phylogeny EDGAR provided was consistent with outcomes from traditional approaches performed recently and moreover, it was possible to root each strain with unprecedented accuracy. Conclusion EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, like synteny plots or Venn diagrams, are offered to the scientific community through a web-based and therefore platform independent user interface, where the precomputed data sets can be browsed. PMID:19457249
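
The core-gene and singleton classification based on BLAST score ratios can be sketched as below. A score ratio is commonly computed as the bit score of a hit divided by the bit score of the query against itself. Everything else here is an assumption for illustration: the fixed cutoff, the one-directional best hits, the data layout, and the function names are not taken from EDGAR.

```python
def blast_score_ratio(hit_score, self_score):
    """Bit score of a hit normalised by the query's self-hit bit score."""
    if self_score <= 0:
        raise ValueError("self-hit score must be positive")
    return hit_score / self_score


def classify_genes(best_hits, self_scores, other_genomes, cutoff):
    """Split reference genes into core genes (score ratio >= cutoff in every
    other genome) and singletons (score ratio below cutoff in all of them)."""
    core, singletons = [], []
    for gene, self_score in self_scores.items():
        ratios = [
            blast_score_ratio(best_hits.get(gene, {}).get(genome, 0.0), self_score)
            for genome in other_genomes
        ]
        if all(r >= cutoff for r in ratios):
            core.append(gene)
        elif all(r < cutoff for r in ratios):
            singletons.append(gene)
    return core, singletons


# Toy bit scores invented for this example.
self_scores = {"geneA": 500, "geneB": 300, "geneC": 400}
best_hits = {
    "geneA": {"genome2": 480, "genome3": 450},
    "geneB": {"genome2": 60, "genome3": 45},
    "geneC": {"genome2": 390, "genome3": 80},
}
core, singletons = classify_genes(
    best_hits, self_scores, ["genome2", "genome3"], cutoff=0.3  # hypothetical cutoff
)
print(core, singletons)
```

In the toy data geneA is conserved in both other genomes, geneB in neither, and geneC in only one, so geneC falls into neither class. Normalising by the self-hit makes scores comparable across genes of different lengths.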

  12. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

    PubMed

    Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

    2007-03-16

    Despite a remarkable success in the computational prediction of genes in Bacteria and Archaea, a lack of comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to the translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  13. A dictionary based informational genome analysis

    PubMed Central

    2012-01-01

    Background In the post-genomic era, several methods of computational genomics are emerging to understand how information as a whole is structured within genomes. The literature of the last five years accounts for several alignment-free methods, which have arisen as alternative metrics for dissimilarity of biological sequences. Among others, recent approaches are based on empirical frequencies of DNA k-mers in whole genomes. Results Any set of words (factors) occurring in a genome provides a genomic dictionary. About sixty genomes were analyzed by means of informational indexes based on genomic dictionaries, where a systemic view replaces a local sequence analysis. A software prototype applying the methodology outlined here carried out some computations on genomic data. We computed informational indexes and built genomic dictionaries of different sizes, along with frequency distributions. The software performed three main tasks: computation of informational indexes, storage of these in a database, and index analysis and visualization. The validation was done by investigating genomes of various organisms. A systematic analysis of genomic repeats of several lengths, which is of keen interest in biology (for example to compute excessively represented functional sequences, such as promoters), was discussed, and suggested a method to define synthetic genetic networks. Conclusions We introduced a methodology based on dictionaries, and an efficient motif-finding software application for comparative genomics. This approach could be extended along many lines of investigation, namely exported to other contexts of computational genomics, as a basis for discrimination of genomic pathologies. PMID:22985068
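
A genomic dictionary and one informational index over it can be sketched in a few lines. The abstract does not name the specific indexes computed, so the Shannon entropy of the k-mer frequency distribution is used here as an assumed example. The function names and the toy sequence are likewise illustrative.

```python
import math
from collections import Counter


def genomic_dictionary(genome, k):
    """Map every k-mer (word of length k) occurring in the genome to its count."""
    genome = genome.upper()
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def empirical_entropy(dictionary):
    """Shannon entropy (bits) of the empirical k-mer frequency distribution."""
    total = sum(dictionary.values())
    return -sum(
        (count / total) * math.log2(count / total)
        for count in dictionary.values()
    )


# Toy sequence invented for this example.
toy_genome = "ACGTACGT"
dictionary = genomic_dictionary(toy_genome, k=2)
entropy = empirical_entropy(dictionary)
print(dict(dictionary), round(entropy, 3))
```

The toy sequence contains four distinct 2-mers with unequal counts, so its entropy falls just below the 2 bits that four equally frequent words would give. Repeating the computation for several values of k yields dictionaries of different sizes, as described in the abstract.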

  14. High-density genetic map construction and comparative genome analysis in asparagus bean.

    PubMed

    Huang, Haitao; Tan, Huaqiang; Xu, Dongmei; Tang, Yi; Niu, Yisong; Lai, Yunsong; Tie, Manman; Li, Huanxiu

    2018-03-19

    Genetic maps are a prerequisite for quantitative trait locus (QTL) analysis, marker-assisted selection (MAS), fine gene mapping, and assembly of genome sequences. So far, several asparagus bean linkage maps have been established using various kinds of molecular markers. However, these maps were all constructed with gel- or array-based markers. No maps based on sequencing methods have been reported. In this study, an NGS-based strategy, SLAF-seq, was applied to create a high-density genetic map for asparagus bean. Through SLAF library construction and Illumina sequencing of two parents and 100 F2 individuals, a total of 55,437 polymorphic SLAF markers were developed and mined for SNP markers. The map consisted of 5,225 SNP markers in 11 LGs, spanning a total distance of 1,850.81 cM, with an average distance between markers of 0.35 cM. Comparative genome analysis with four other legume species (soybean, common bean, mung bean and adzuki bean) showed that asparagus bean is genetically more related to adzuki bean. The results will provide a foundation for future genomic research, such as QTL fine mapping and comparative mapping in pulses, and offer support for assembling the asparagus bean genome sequence.

  15. solGS: a web-based tool for genomic selection

    USDA-ARS's Scientific Manuscript database

    Genomic selection (GS) promises to improve accuracy in estimating breeding values and genetic gain for quantitative traits compared to traditional breeding methods. Its reliance on high-throughput genome-wide markers and statistical complexity, however, is a serious challenge in data management, ana...

  16. Whole-genome sequence-based genomic prediction in laying chickens with different genomic relationship matrices to account for genetic architecture.

    PubMed

    Ni, Guiyan; Cavero, David; Fangmann, Anna; Erbe, Malena; Simianer, Henner

    2017-01-16

    With the availability of next-generation sequencing technologies, genomic prediction based on whole-genome sequencing (WGS) data is now feasible in animal breeding schemes and was expected to lead to higher predictive ability, since such data may contain all genomic variants including causal mutations. Our objective was to compare prediction ability with high-density (HD) array data and WGS data in a commercial brown layer line with genomic best linear unbiased prediction (GBLUP) models using various approaches to weight single nucleotide polymorphisms (SNPs). A total of 892 chickens from a commercial brown layer line were genotyped with 336 K segregating SNPs (array data) that included 157 K genic SNPs (i.e. SNPs in or around a gene). For these individuals, genome-wide sequence information was imputed based on data from re-sequencing runs of 25 individuals, leading to 5.2 million (M) imputed SNPs (WGS data), including 2.6 M genic SNPs. De-regressed proofs (DRP) for eggshell strength, feed intake and laying rate were used as quasi-phenotypic data in genomic prediction analyses. Four weighting factors for building a trait-specific genomic relationship matrix were investigated: identical weights, -(log10 P) from genome-wide association study results, squares of SNP effects from random regression BLUP, and variable selection based weights (known as BLUP|GA). Predictive ability was measured as the correlation between DRP and direct genomic breeding values in five replications of a fivefold cross-validation. Averaged over the three traits, the highest predictive ability (0.366 ± 0.075) was obtained when only genic SNPs from WGS data were used. Predictive abilities with genic SNPs and all SNPs from HD array data were 0.361 ± 0.072 and 0.353 ± 0.074, respectively. Prediction with -(log10 P) or squares of SNP effects as weighting factors for building a genomic relationship matrix or BLUP|GA did not increase accuracy, compared to that with identical weights, regardless of the SNP set used. Our results show that little or no benefit was gained when using all imputed WGS data to perform genomic prediction compared to using HD array data regardless of the weighting factors tested. However, using only genic SNPs from WGS data had a positive effect on prediction ability.
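
A trait-specific genomic relationship matrix of the kind described above can be sketched as a weighted version of the common centred-genotype construction, G = Z D Z' / sum_j(w_j * 2 p_j (1 - p_j)), where Z holds allele counts centred by twice the allele frequency and D is a diagonal matrix of SNP weights. This is a minimal sketch under that assumed formula, not the authors' implementation; the function name, the toy genotypes and the p-values are hypothetical.

```python
from math import log10


def weighted_grm(genotypes, weights=None):
    """Weighted genomic relationship matrix from 0/1/2 allele counts.

    genotypes: one list of allele counts per individual.
    weights:   one non-negative weight per SNP (identical weights if None).
    """
    n, m = len(genotypes), len(genotypes[0])
    if weights is None:
        weights = [1.0] * m
    # Allele frequency of each SNP in the genotyped sample.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Centre each genotype by its expected value 2p.
    z = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    scale = sum(w * 2 * p * (1 - p) for w, p in zip(weights, freqs))
    if scale == 0:
        raise ValueError("no weighted SNP is polymorphic")
    return [[sum(w * a * b for w, a, b in zip(weights, z[i], z[k])) / scale
             for k in range(n)] for i in range(n)]


toy = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]   # hypothetical genotypes
g_identical = weighted_grm(toy)            # identical weights
p_values = [0.001, 0.5, 0.05]              # hypothetical GWAS p-values
g_gwas = weighted_grm(toy, [-log10(p) for p in p_values])
```

With identical weights the expression reduces to the unweighted centred-genotype matrix, which is the baseline the other weighting schemes are compared against in the abstract.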

  17. Genome sequencing of ovine isolates of Mycobacterium avium subspecies paratuberculosis offers insights into host association

    PubMed Central

    2012-01-01

    Background The genome of Mycobacterium avium subspecies paratuberculosis (MAP) is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. Results Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs). Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~ 2.36 Mb). Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565) further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. Conclusions Comparative sequence analysis employed here provided a better understanding of the host association, evolution of members of the M. avium complex and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences. PMID:22409516

  18. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared on the basis of genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). The BLAST matrix of these genomes has been plotted to summarize the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculous and non-tuberculous mycobacteria to understand the evolutionary events of these species.
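
Two of the summary statistics used in this comparison, genome length and GC content, are simple to compute from FASTA-formatted sequence. The sketch below is illustrative only: the parser is deliberately minimal, the record names and sequences are made up, and ambiguous bases such as N are excluded from the GC denominator by assumption.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record name: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


# Hypothetical toy records, not real mycobacterial sequence.
fasta = """>genome_a toy record
GGCCGGCC
AATT
>genome_b
ACGTNNNN
"""
for name, seq in read_fasta(fasta).items():
    print(name, len(seq), round(gc_content(seq), 3))
```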

  19. [CRISPR/Cas system for genome editing in pluripotent stem cells].

    PubMed

    Vasil'eva, E A; Melino, D; Barlev, N A

    2015-01-01

    Genome editing systems based on site-specific nucleases have become very popular in modern bioengineering. Human pluripotent stem cells provide a unique platform for studying gene function, disease modeling, and drug testing. Consequently, technology for fast, accurate and well-controlled genome manipulation is required. The CRISPR/Cas (clustered regularly interspaced short palindromic repeat/CRISPR-associated) system could be employed for these purposes. This system is based on the site-specific programmable nuclease Cas9. Numerous advantages of the CRISPR/Cas system and its successful application to human stem cells provide wide opportunities for genome therapy and regenerative medicine. In this publication, we describe and compare the main genome editing systems based on site-specific programmable nucleases and discuss opportunities and perspectives of the CRISPR/Cas system for application to pluripotent stem cells.

  20. Evaluation of the efficacy of constitutional array-based comparative genomic hybridization in the diagnosis of aneuploidy using genomic and amplified DNA.

    PubMed

    Tan, Niap H; Palmer, Rodger; Wang, Rubin

    2010-02-01

    Array-based comparative genomic hybridization (array CGH) is a new molecular technique that has the potential to revolutionize cytogenetics. However, use of high-resolution array CGH in the clinical setting is plagued by the problem of widespread copy number variations (CNV) in the human genome. A constitutional microarray, containing only clones that interrogate regions of known constitutional syndromes, may circumvent the dilemma of detecting CNV of unknown clinical significance. The present study investigated the efficacy of constitutional microarray in the diagnosis of trisomy. Test samples included genomic DNA from trisomic cell lines, amplification products of 50 ng of genomic DNA and whole genome amplification products of single cells. DNA amplification was achieved by means of multiple displacement amplification (MDA) over 16 h. The trisomic and sex chromosome copy number imbalances in the genomic DNA were correctly identified by the constitutional microarrays. However, there was a failure to detect the trisomy in the amplification products of 50 ng of genomic DNA and whole genome amplification products of single cells. Using carefully selected clones, the Spectral Genomics constitutional microarray was able to detect the chromosomal copy number imbalances in genomic DNA without the confounding effects of CNV. The diagnostic failure in amplified DNA samples could be attributed to the amplification process. The MDA duration of 16 h generated an excessive amount of bias, and shortening the duration might minimize the problem.

  1. GeNemo: a search engine for web-based functional genomic data.

    PubMed

    Zhang, Yongqing; Cao, Xiaoyi; Zhong, Sheng

    2016-07-08

    A set of new data types emerged from functional genomic assays, including ChIP-seq, DNase-seq, FAIRE-seq and others. The results are typically stored as genome-wide intensities (WIG/bigWig files) or functional genomic regions (peak/BED files). These data types present new challenges to big data science. Here, we present GeNemo, a web-based search engine for functional genomic data. GeNemo searches user-input data against online functional genomic datasets, including the entire collection of ENCODE and mouse ENCODE datasets. Unlike text-based search engines, GeNemo's searches are based on pattern matching of functional genomic regions. This distinguishes GeNemo from text or DNA sequence searches. The user can input any complete or partial functional genomic dataset, for example, a binding intensity file (bigWig) or a peak file. GeNemo reports any genomic regions, ranging from hundred bases to hundred thousand bases, from any of the online ENCODE datasets that share similar functional (binding, modification, accessibility) patterns. This is enabled by a Markov Chain Monte Carlo-based maximization process, executed on up to 24 parallel computing threads. By clicking on a search result, the user can visually compare her/his data with the found datasets and navigate the identified genomic regions. GeNemo is available at www.genemo.org. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Genomic Repeat Abundances Contain Phylogenetic Signal

    PubMed Central

    Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464

  3. A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.

    PubMed

    Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas

    2011-03-15

    Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes. The algorithm is implemented as a part of the i-ADHoRe 3.0 package, available at http://bioinformatics.psb.ugent.be/software.
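
To make the notion of aligning gene lists concrete, the sketch below aligns two ordered lists of gene family identifiers with a longest-common-subsequence dynamic program, so that each "letter" is a whole gene. This is the kind of pairwise step a progressive strategy would build on; it is not the greedy, graph-based algorithm of the paper, and the function name and the family labels are hypothetical.

```python
def align_gene_lists(a, b, gap="-"):
    """Pairwise alignment of two ordered gene lists (LCS-based).

    Genes are compared by identifier (e.g. gene family); unmatched
    genes are paired with a gap symbol. Returns a list of columns.
    """
    n, m = len(a), len(b)
    # score[i][j] = length of the longest common subsequence of a[i:], b[j:]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                score[i][j] = score[i + 1][j + 1] + 1
            else:
                score[i][j] = max(score[i + 1][j], score[i][j + 1])
    # Trace forward through the table to emit aligned columns.
    columns, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            columns.append((a[i], b[j]))
            i, j = i + 1, j + 1
        elif score[i + 1][j] >= score[i][j + 1]:
            columns.append((a[i], gap))
            i += 1
        else:
            columns.append((gap, b[j]))
            j += 1
    columns.extend((gene, gap) for gene in a[i:])
    columns.extend((gap, gene) for gene in b[j:])
    return columns


segment_1 = ["f1", "f2", "f3", "f4"]   # hypothetical gene families
segment_2 = ["f1", "f3", "f5", "f4"]
for column in align_gene_lists(segment_1, segment_2):
    print(column)
```

A plain LCS cannot represent inversions or tandem duplicates, which is one reason conflict-resolving heuristics such as those described in the abstract are needed for diverged segments.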

  4. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  5. Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains.

    PubMed

    Bhattacharyya, Anamitra; Stilwagen, Stephanie; Ivanova, Natalia; D'Souza, Mark; Bernal, Axel; Lykidis, Athanasios; Kapatral, Vinayak; Anderson, Iain; Larsen, Niels; Los, Tamara; Reznik, Gary; Selkov, Eugene; Walunas, Theresa L; Feil, Helene; Feil, William S; Purcell, Alexander; Lassez, Jean-Louis; Hawkins, Trevor L; Haselkorn, Robert; Overbeek, Ross; Predki, Paul F; Kyrpides, Nikos C

    2002-09-17

    Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.

  6. Exploration of the Drosophila buzzatii transposable element content suggests underestimation of repeats in Drosophila genomes.

    PubMed

    Rius, Nuria; Guillén, Yolanda; Delprat, Alejandra; Kapusta, Aurélie; Feschotte, Cédric; Ruiz, Alfredo

    2016-05-10

    Many new Drosophila genomes have been sequenced in recent years using new-generation sequencing platforms and assembly methods. Transposable elements (TEs), being repetitive sequences, are often misassembled, especially in the genomes sequenced with short reads. Consequently, the mobile fraction of many of the new genomes has not been analyzed in detail or compared with that of other genomes sequenced with different methods, which could shed light on genome and TE evolution. Here we compare the TE content of three genomes: D. buzzatii st-1, j-19, and D. mojavensis. We have sequenced a new D. buzzatii genome (j-19) that complements the D. buzzatii reference genome (st-1) already published, and compared their TE contents with that of D. mojavensis. We found an underestimation of TE sequences in Drosophila genus NGS-genomes when compared to Sanger-genomes. To be able to compare genomes sequenced with different technologies, we developed a coverage-based method and applied it to the D. buzzatii st-1 and j-19 genomes. Between 10.85 and 11.16 % of the D. buzzatii st-1 genome is made up of TEs, between 7 and 7.5 % of the D. buzzatii j-19 genome, while TEs represent 15.35 % of the D. mojavensis genome. Helitrons are the most abundant order in the three genomes. TEs in D. buzzatii are less abundant than in D. mojavensis, as expected according to the genome size and TE content positive correlation. However, TEs alone do not explain the genome size difference. TEs accumulate in the dot chromosomes and proximal regions of D. buzzatii and D. mojavensis chromosomes. We also report a significantly higher TE density in D. buzzatii and D. mojavensis X chromosomes, which is not expected under the current models. Our easy-to-use correction method allowed us to identify recently active families in D. buzzatii st-1 belonging to the LTR-retrotransposon superfamily Gypsy.

  7. CGI: Java Software for Mapping and Visualizing Data from Array-based Comparative Genomic Hybridization and Expression Profiling

    PubMed Central

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H.; Lau, Ching C.; Behl, Sanjiv; Man, Tsz-Kwong

    2007-01-01

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License. PMID:19936083

  8. CGI: Java software for mapping and visualizing data from array-based comparative genomic hybridization and expression profiling.

    PubMed

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H; Lau, Ching C; Behl, Sanjiv; Man, Tsz-Kwong

    2007-10-06

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  9. A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication

    PubMed Central

    2014-01-01

    Background Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. Results We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. Conclusions The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel. PMID:24669946

  10. A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication.

    PubMed

    Kai, Wataru; Nomura, Kazuharu; Fujiwara, Atushi; Nakamura, Yoji; Yasuike, Motoshige; Ojima, Nobuhiko; Masaoka, Tetsuji; Ozaki, Akiyuki; Kazeto, Yukinori; Gen, Koichiro; Nagao, Jiro; Tanaka, Hideki; Kobayashi, Takanori; Ototake, Mitsuru

    2014-03-26

    Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel.

  11. Whole genome comparative studies between chicken and turkey and their implications for avian genome evolution

    PubMed Central

    Griffin, Darren K; Robertson, Lindsay B; Tempest, Helen G; Vignal, Alain; Fillon, Valérie; Crooijmans, Richard PMA; Groenen, Martien AM; Deryusheva, Svetlana; Gaginskaya, Elena; Carré, Wilfrid; Waddington, David; Talbot, Richard; Völker, Martin; Masabanda, Julio S; Burt, Dave W

    2008-01-01

    Background Comparative genomics is a powerful means of establishing inter-specific relationships between gene function/location and allows insight into genomic rearrangements, conservation and evolutionary phylogeny. The availability of the complete sequence of the chicken genome has initiated the development of detailed genomic information in other birds including turkey, an agriculturally important species where mapping has hitherto focused on linkage with limited physical information. No molecular study has yet examined conservation of avian microchromosomes, nor differences in copy number variants (CNVs) between birds. Results We present a detailed comparative cytogenetic map between chicken and turkey based on reciprocal chromosome painting and mapping of 338 chicken BACs to turkey metaphases. Two inter-chromosomal changes (both involving centromeres) and three pericentric inversions have been identified between chicken and turkey; and array CGH identified 16 inter-specific CNVs. Conclusion This is the first study to combine the modalities of zoo-FISH and array CGH between different avian species. The first insight into the conservation of microchromosomes, the first comparative cytogenetic map of any bird and the first appraisal of CNVs between birds is provided. Results suggest that avian genomes have remained relatively stable during evolution compared to mammalian equivalents. PMID:18410676

  12. A privacy-preserving solution for compressed storage and selective retrieval of genomic data.

    PubMed

    Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S; Molyneaux, Adam; Xu, Zhenyu; Fellay, Jacques; Steinmetz, Lars M; Hubaux, Jean-Pierre

    2016-12-01

    In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. © 2016 Huang et al.; Published by Cold Spring Harbor Laboratory Press.
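
The storage figures quoted above are all relative to BAM, so the three formats can be compared directly by simple arithmetic. The sketch below applies the stated savings (34% for CRAM, 18% for SECRAM) to a hypothetical 100 GB BAM file; the function name and the file size are illustrative assumptions, and real savings depend on coverage and data content.

```python
def relative_sizes(bam_gb: float, cram_saving: float = 0.34,
                   secram_saving: float = 0.18) -> dict:
    """Expected file sizes given fractional savings relative to BAM."""
    return {
        "BAM": bam_gb,
        "CRAM": bam_gb * (1 - cram_saving),
        "SECRAM": bam_gb * (1 - secram_saving),
    }


sizes = relative_sizes(100.0)   # hypothetical 100 GB BAM file
for fmt, gb in sizes.items():
    print(f"{fmt}: {gb:.1f} GB")
# Encryption and selective retrieval therefore cost some of CRAM's
# compression advantage, while the result remains smaller than BAM.
print(f"SECRAM / CRAM size ratio: {sizes['SECRAM'] / sizes['CRAM']:.2f}")
```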

  13. A privacy-preserving solution for compressed storage and selective retrieval of genomic data

    PubMed Central

    Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S.; Molyneaux, Adam; Xu, Zhenyu; Hubaux, Jean-Pierre

    2016-01-01

    In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients’ complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. PMID:27789525

  14. Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries.

    PubMed

    Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W

    2018-05-01

    The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.
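
    The N50 statistic cited above can be illustrated with a minimal sketch. N50 is taken here as the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The scaffold lengths below are hypothetical and are not taken from the wheat assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is N50.
        if running >= half_total:
            return length


# Hypothetical lengths: scaffolding joins the two longest pieces (100 + 80).
before_scaffolding = [100, 80, 60, 40, 20]
after_scaffolding = [180, 60, 40, 20]

print(n50(before_scaffolding), n50(after_scaffolding))
```

    Joining sequences leaves the total length unchanged but moves the half-way point into a longer piece, which is why scaffolding raises N50.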

  15. The effect of using genealogy-based haplotypes for genomic prediction

    PubMed Central

    2013-01-01

    Background. Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. Methods. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. Results. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Conclusions. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy. PMID:23496971

  16. The effect of using genealogy-based haplotypes for genomic prediction.

    PubMed

    Edriss, Vahid; Fernando, Rohan L; Su, Guosheng; Lund, Mogens S; Guldbrandtsen, Bernt

    2013-03-06

    Genomic prediction uses two sources of information: linkage disequilibrium between markers and quantitative trait loci, and additive genetic relationships between individuals. One way to increase the accuracy of genomic prediction is to capture more linkage disequilibrium by regression on haplotypes instead of regression on individual markers. The aim of this study was to investigate the accuracy of genomic prediction using haplotypes based on local genealogy information. A total of 4429 Danish Holstein bulls were genotyped with the 50K SNP chip. Haplotypes were constructed using local genealogical trees. Effects of haplotype covariates were estimated with two types of prediction models: (1) assuming that effects had the same distribution for all haplotype covariates, i.e. the GBLUP method and (2) assuming that a large proportion (π) of the haplotype covariates had zero effect, i.e. a Bayesian mixture method. About 7.5 times more covariate effects were estimated when fitting haplotypes based on local genealogical trees compared to fitting individual markers. Genealogy-based haplotype clustering slightly increased the accuracy of genomic prediction and, in some cases, decreased the bias of prediction. With the Bayesian method, accuracy of prediction was less sensitive to parameter π when fitting haplotypes compared to fitting markers. Use of haplotypes based on genealogy can slightly increase the accuracy of genomic prediction. Improved methods to cluster the haplotypes constructed from local genealogy could lead to additional gains in accuracy.

  17. Variation block-based genomics method for crop plants.

    PubMed

    Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

    2014-06-15

    In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which occurred by crossing and selection processes. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
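
    The splitting step described above (pooling the assumed recombination sites of all cultivars and cutting the genome at each of them) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input layout (a mapping from cultivar name to boundary positions on one chromosome) and the coordinates are hypothetical, and the detection of the boundaries themselves is not shown.

```python
def variation_blocks(chrom_length, boundaries_per_cultivar):
    """Split one chromosome into blocks at the pooled boundary positions.

    boundaries_per_cultivar maps a cultivar name to the positions where its
    variation pattern changes (assumed to be computed elsewhere).
    """
    # Pool the sites of every cultivar, dropping duplicates and chromosome ends.
    sites = sorted(
        {
            pos
            for positions in boundaries_per_cultivar.values()
            for pos in positions
            if 0 < pos < chrom_length
        }
    )
    edges = [0] + sites + [chrom_length]
    # Consecutive edges delimit half-open blocks [start, end).
    return list(zip(edges[:-1], edges[1:]))


# Hypothetical boundaries for two cultivars on a 1000-bp chromosome.
example_boundaries = {"cultivar_A": [200, 600], "cultivar_B": [600, 850]}
blocks = variation_blocks(1000, example_boundaries)
print(blocks)
```

    Because every cultivar is cut at the same pooled sites, the resulting blocks share coordinates across genomes and can be compared block by block.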

  18. CAR: contig assembly of prokaryotic draft genomes using rearrangements.

    PubMed

    Lu, Chin Lung; Chen, Kun-Tze; Huang, Shih-Yuan; Chiu, Hsien-Tai

    2014-11-28

    Next generation sequencing technology has allowed efficient production of draft genomes for many organisms of interest. However, most draft genomes are just collections of independent contigs, whose relative positions and orientations along the genome being sequenced are unknown. Although several tools have been developed to order and orient the contigs of draft genomes, more accurate tools are still needed. In this study, we present a novel reference-based contig assembly (or scaffolding) tool, named CAR, that can efficiently and more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome of a related organism. Given a set of contigs in multi-FASTA format and a reference genome in FASTA format, CAR can output a list of scaffolds, each of which is a set of ordered and oriented contigs. For validation, we have tested CAR on a real dataset composed of several prokaryotic genomes and also compared its performance with several other reference-based contig assembly tools. Our experimental results have shown that CAR indeed performs better than all these other reference-based contig assembly tools in terms of sensitivity, precision and genome coverage. CAR serves as an efficient tool that can more accurately order and orient the contigs of a prokaryotic draft genome based on a reference genome. The web server of CAR is freely available at http://genome.cs.nthu.edu.tw/CAR/ and its stand-alone program can also be downloaded from the same website.
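
    The general idea of reference-based ordering and orienting can be sketched as below. This is a deliberately simplified illustration and not CAR's rearrangement-based algorithm: it assumes each contig already has a single best hit on the reference (the Hit record, its fields and the example data are hypothetical), sorts contigs by hit position, and reverse-complements those that align to the minus strand.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical best alignment of one contig against the reference."""
    contig: str
    ref_start: int
    strand: str  # "+" or "-"


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_and_orient(contigs, hits):
    """Order contigs by reference position and flip minus-strand contigs."""
    scaffold = []
    for hit in sorted(hits, key=lambda h: h.ref_start):
        seq = contigs[hit.contig]
        if hit.strand == "-":
            seq = reverse_complement(seq)
        scaffold.append((hit.contig, hit.strand, seq))
    return scaffold


# Hypothetical contigs and hits.
example_contigs = {"c1": "AACC", "c2": "GGTA"}
example_hits = [Hit("c1", 500, "+"), Hit("c2", 10, "-")]
scaffold = order_and_orient(example_contigs, example_hits)
print(scaffold)
```

    A real scaffolder also has to handle contigs with several or no hits and rearrangements relative to the reference, which this sketch ignores.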

  19. Gramene 2016: comparative plant genomics and pathway resources.

    PubMed

    Tello-Ruiz, Marcela K; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A; Huerta, Laura; Keays, Maria; Tang, Y Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J; Jaiswal, Pankaj; Ware, Doreen

    2016-01-04

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  20. Comparative genomic data of the Avian Phylogenomics Project.

    PubMed

    Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

    2014-01-01

    The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000-17,000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for 7 of the remaining 10 species, and provide a guideline of genomic data that has been generated and used in our Avian Phylogenomics Project. To the best of our knowledge, the Avian Phylogenomics Project is the biggest vertebrate comparative genomics project to date. The genomic data presented here is expected to accelerate further analyses in many fields, including phylogenetics, comparative genomics, evolution, neurobiology, developmental biology, and other related areas.

  1. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection

    PubMed Central

    Stanley, Craig E.; Kulathinal, Rob J.

    2016-01-01

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. PMID:27226167
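
    Two of the divergence statistics listed above (% gaps and dN/dS) can be illustrated with a minimal sketch. It assumes that dN and dS have already been estimated by a codon-based method, which is not shown, and it is not flyDIVaS code; the function names and example values are hypothetical.

```python
def percent_gaps(aligned_a, aligned_b):
    """Percentage of alignment columns with a gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    gapped = sum(1 for a, b in zip(aligned_a, aligned_b) if a == "-" or b == "-")
    return 100.0 * gapped / len(aligned_a)


def dn_ds_ratio(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    if ds == 0:
        return None
    return dn / ds


# Hypothetical pairwise alignment and pre-computed substitution rates.
gap_pct = percent_gaps("ATG--CCAAT", "ATGAACCAAT")
omega = dn_ds_ratio(0.5, 0.25)
print(gap_pct, omega)
```

    A ratio above 1 is commonly read as suggestive of positive selection and a ratio well below 1 as purifying selection, although formal codon-based tests such as those mentioned in the abstract are needed to support either reading.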

  2. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection.

    PubMed

    Stanley, Craig E; Kulathinal, Rob J

    2016-08-09

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster's breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1-1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of biologists to creatively use such genomic big data resources in an integrative manner. Copyright © 2016 Stanley and Kulathinal.

  3. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform.

    PubMed

    Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh

    2016-01-01

    Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI), annotated using the RAST server, and then stored in the MySQL database. The protein-coding genes were further analyzed to obtain information such as GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house developed bioinformatics tools implemented in NeisseriaBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my.
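
    The GC content (%) reported for each protein-coding gene is a simple per-sequence statistic, sketched below. The abstract states that the authors used in-house Perl scripts; this Python function is only an assumed illustration of the calculation, and the choice to ignore ambiguous bases such as N is an assumption, not a documented behaviour of NeisseriaBase.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases of a DNA sequence."""
    # Assumption: ambiguous symbols (for example N) are excluded from the count.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


print(gc_content("ATGC"), gc_content("ggccNN"))
```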

  4. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    PubMed Central

    Zheng, Wenning; Mutha, Naresh V.R.; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S.; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah

    2016-01-01

    Background. The gram-negative Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for the Neisseria genomes, they are mostly focused on the pathogenic species. In this present study we present the freely available NeisseriaBase, a database dedicated to the genus Neisseria encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from National Center for Biotechnology Information (NCBI), annotated using the RAST server, and then stored in the MySQL database. The protein-coding genes were further analyzed to obtain information such as GC content (%), predicted hydrophobicity and molecular weight (Da) using in-house Perl scripts. The web application was developed following the secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house developed bioinformatics tools implemented in NeisseriaBase were developed using Python, Perl, BioPerl and R languages. Results. Currently, NeisseriaBase houses 603,500 Coding Sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser in the database enables fast and smooth browsing of Neisseria genomes. NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence Factor Database (VFDB) specific homology searches, the VFDB BLAST is also incorporated into the database. In addition, NeisseriaBase is equipped with in-house designed tools such as the Pairwise Genome Comparison tool (PGC) for comparative genomic analysis and the Pathogenomics Profiling Tool (PathoProT) for the comparative pathogenomics analysis of Neisseria strains. Discussion. This user-friendly database not only provides access to a host of genomic resources on Neisseria but also enables high-quality comparative genome analysis, which is crucial for the expanding scientific community interested in Neisseria research. This database is freely available at http://neisseria.um.edu.my. PMID:27017950

  5. Ancient bacterial endosymbionts of insects: Genomes as sources of insight and springboards for inquiry.

    PubMed

    Wernegreen, Jennifer J

    2017-09-15

    Ancient associations between insects and bacteria provide models to study intimate host-microbe interactions. Currently, a wealth of genome sequence data for long-term, obligately intracellular (primary) endosymbionts of insects reveals profound genomic consequences of this specialized bacterial lifestyle. Those consequences include severe genome reduction and extreme base compositions. This minireview highlights the utility of genome sequence data to understand how, and why, endosymbionts have been pushed to such extremes, and to illuminate the functional consequences of such extensive genome change. While the static snapshots provided by individual endosymbiont genomes are valuable, comparative analyses of multiple genomes have shed light on evolutionary mechanisms. Namely, genome comparisons have told us that selection is important in fine-tuning gene content, but at the same time, mutational pressure and genetic drift contribute to genome degradation. Examples from Blochmannia, the primary endosymbiont of the ant tribe Camponotini, illustrate the value and constraints of genome sequence data, and exemplify how genomes can serve as a springboard for further comparative and experimental inquiry. Copyright © 2017. Published by Elsevier Inc.

  6. CMG-biotools, a free workbench for basic comparative microbial genomics.

    PubMed

    Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

    2013-01-01

    Today, there are more than a hundred times as many sequenced prokaryotic genomes as were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a wide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly show that users with limited computational experience can perform complicated analysis without much training.

  7. GI-SVM: A sensitive method for predicting genomic islands based on unannotated sequence of a single genome.

    PubMed

    Lu, Bingxin; Leong, Hon Wai

    2016-02-01

    Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaption and enable antibiotic resistance. Many methods have been proposed to predict GI. But most of them rely on either annotations or comparisons with other closely related genomes. Hence these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods to detect GI based solely on sequences of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on one-class support vector machine (SVM), utilizing composition bias in terms of k-mer content. From our evaluations on three real genomes, GI-SVM can achieve higher recall compared with current methods, without much loss of precision. Besides, GI-SVM allows flexible parameter tuning to get optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GI in newly sequenced genomes.
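
    The k-mer composition features mentioned above can be sketched as follows. This only illustrates how k-mer frequency profiles might be computed over sliding windows of an unannotated sequence; it does not reproduce GI-SVM, and the one-class SVM step is omitted. The window size, step, k and the toy sequence are hypothetical choices.

```python
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequency of every k-mer observed in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def window_profiles(genome, window, step, k):
    """Yield (start, k-mer profile) for each full window along the genome."""
    for start in range(0, len(genome) - window + 1, step):
        yield start, kmer_frequencies(genome[start:start + window], k)


# Toy sequence whose two halves differ sharply in composition.
profiles = list(window_profiles("AAAACCCC", window=4, step=4, k=2))
print(profiles)
```

    Windows whose profile departs strongly from the rest of the genome are the kind of compositional outliers that a one-class classifier could then flag as candidate islands.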

  8. Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    PubMed

    Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

    2015-02-14

    Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/, and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.

  9. Genomic prediction using imputed whole-genome sequence data in Holstein Friesian cattle.

    PubMed

    van Binsbergen, Rianne; Calus, Mario P L; Bink, Marco C A M; van Eeuwijk, Fred A; Schrooten, Chris; Veerkamp, Roel F

    2015-09-17

    In contrast to currently used single nucleotide polymorphism (SNP) panels, the use of whole-genome sequence data is expected to enable the direct estimation of the effects of causal mutations on a given trait. This could lead to higher reliabilities of genomic predictions compared to those based on SNP genotypes. Also, at each generation of selection, recombination events between a SNP and a mutation can cause decay in reliability of genomic predictions based on markers rather than on the causal variants. Our objective was to investigate the use of imputed whole-genome sequence genotypes versus high-density SNP genotypes on (the persistency of) the reliability of genomic predictions using real cattle data. Highly accurate phenotypes based on daughter performance and Illumina BovineHD Beadchip genotypes were available for 5503 Holstein Friesian bulls. The BovineHD genotypes (631,428 SNPs) of each bull were used to impute whole-genome sequence genotypes (12,590,056 SNPs) using the Beagle software. Imputation was done using a multi-breed reference panel of 429 sequenced individuals. Genomic estimated breeding values for three traits were predicted using a Bayesian stochastic search variable selection (BSSVS) model and a genome-enabled best linear unbiased prediction model (GBLUP). Reliabilities of predictions were based on 2087 validation bulls, while the other 3416 bulls were used for training. Prediction reliabilities ranged from 0.37 to 0.52. BSSVS performed better than GBLUP in all cases. Reliabilities of genomic predictions were slightly lower with imputed sequence data than with BovineHD chip data. Also, the reliabilities tended to be lower for both sequence data and BovineHD chip data when relationships between training animals were low. No increase in persistency of prediction reliability using imputed sequence data was observed. Compared to BovineHD genotype data, using imputed sequence data for genomic prediction produced no advantage. 
To investigate the putative advantage of genomic prediction using (imputed) sequence data, a training set with a larger number of individuals that are distantly related to each other and genomic prediction models that incorporate biological information on the SNPs or that apply stricter SNP pre-selection should be considered.

  10. Amino acid usage is asymmetrically biased in AT- and GC-rich microbial genomes.

    PubMed

    Bohlin, Jon; Brynildsrud, Ola; Vesth, Tammi; Skjerve, Eystein; Ussery, David W

    2013-01-01

    Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study.
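
    The relative entropy mentioned in this abstract can be illustrated with a short calculation. The sketch below is only a generic example of the Kullback-Leibler definition applied to symbol frequencies; it is not the authors' procedure, and the function names `frequencies` and `relative_entropy` are hypothetical.

```python
import math
from collections import Counter


def frequencies(sequence, alphabet):
    """Relative frequency of each symbol of `alphabet` in `sequence`."""
    counts = Counter(sequence)
    total = sum(counts[s] for s in alphabet)
    if total == 0:
        raise ValueError("sequence contains no symbols from the alphabet")
    return {s: counts[s] / total for s in alphabet}


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    `p` and `q` map the same symbols to probabilities. Terms with
    p[s] == 0 contribute nothing; q[s] == 0 with p[s] > 0 is rejected
    because the divergence is then infinite.
    """
    total = 0.0
    for symbol, p_s in p.items():
        if p_s == 0:
            continue
        if q[symbol] == 0:
            raise ValueError(f"q is zero for symbol {symbol!r} where p is not")
        total += p_s * math.log2(p_s / q[symbol])
    return total
```

    A divergence of zero means the two frequency distributions are identical; larger values indicate that the observed usage departs further from the reference distribution.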

  11. Amino Acid Usage Is Asymmetrically Biased in AT- and GC-Rich Microbial Genomes

    PubMed Central

    Bohlin, Jon; Brynildsrud, Ola; Vesth, Tammi; Skjerve, Eystein; Ussery, David W.

    2013-01-01

    Introduction Genomic base composition ranges from less than 25% AT to more than 85% AT in prokaryotes. Since only a small fraction of prokaryotic genomes is not protein coding even a minor change in genomic base composition will induce profound protein changes. We examined how amino acid and codon frequencies were distributed in over 2000 microbial genomes and how these distributions were affected by base compositional changes. In addition, we wanted to know how genome-wide amino acid usage was biased in the different genomes and how changes to base composition and mutations affected this bias. To carry this out, we used a Generalized Additive Mixed-effects Model (GAMM) to explore non-linear associations and strong data dependences in closely related microbes; principal component analysis (PCA) was used to examine genomic amino acid- and codon frequencies, while the concept of relative entropy was used to analyze genomic mutation rates. Results We found that genomic amino acid frequencies carried a stronger phylogenetic signal than codon frequencies, but that this signal was weak compared to that of genomic %AT. Further, in contrast to codon usage bias (CUB), amino acid usage bias (AAUB) was differently distributed in AT- and GC-rich genomes in the sense that AT-rich genomes did not prefer specific amino acids over others to the same extent as GC-rich genomes. AAUB was also associated with relative entropy; genomes with low AAUB contained more random mutations as a consequence of relaxed purifying selection than genomes with higher AAUB. Conclusion Genomic base composition has a substantial effect on both amino acid- and codon frequencies in bacterial genomes. While phylogeny influenced amino acid usage more in GC-rich genomes, AT-content was driving amino acid usage in AT-rich genomes. We found the GAMM model to be an excellent tool to analyze the genomic data used in this study. PMID:23922837

  12. Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

    PubMed Central

    Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

    2013-01-01

    The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. 
This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689

  13. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms

    PubMed Central

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources. PMID:26151450
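
    The abstract attributes the difference in aligned reads to the larger N50 of the horse assembly. For readers unfamiliar with the statistic, the sketch below computes N50 under its usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` is illustrative and is not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
```

    A more contiguous assembly concentrates its total length in fewer, longer sequences and therefore has a higher N50.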

  14. Next Generation Semiconductor Based Sequencing of the Donkey (Equus asinus) Genome Provided Comparative Sequence Data against the Horse Genome and a Few Millions of Single Nucleotide Polymorphisms.

    PubMed

    Bertolini, Francesca; Scimone, Concetta; Geraci, Claudia; Schiavo, Giuseppina; Utzeri, Valerio Joe; Chiofalo, Vincenzo; Fontanesi, Luca

    2015-01-01

    Few studies have investigated the donkey (Equus asinus) at the whole genome level so far. Here, we sequenced the genome of two male donkeys using a next generation semiconductor based sequencing platform (the Ion Proton sequencer) and compared the obtained sequence information with the available donkey draft genome (and the Illumina reads from which it originated) and with the EquCab2.0 assembly of the horse genome. Moreover, the Ion Torrent Personal Genome Analyzer was used to sequence reduced representation libraries (RRL) obtained from a DNA pool including donkeys of different breeds (Grigio Siciliano, Ragusano and Martina Franca). The number of next generation sequencing reads aligned with the EquCab2.0 horse genome was larger than the number aligned with the draft donkey genome. This was due to the larger N50 for contigs and scaffolds of the horse genome. Nucleotide divergence between E. caballus and E. asinus was estimated to be ~0.52-0.57%. Regions with low nucleotide divergence were identified in several autosomal chromosomes and in the whole chromosome X. These regions might be evolutionarily important in equids. Comparing Y-chromosome regions, we identified variants that could be useful to track donkey paternal lineages. Moreover, about 4.8 million single nucleotide polymorphisms (SNPs) in the donkey genome were identified and annotated by combining sequencing data from Ion Proton (whole genome sequencing) and Ion Torrent (RRL) runs with Illumina reads. A higher density of SNPs was present in regions homologous to horse chromosome 12, in which several studies reported a high frequency of copy number variants. The SNPs we identified constitute a first resource useful to describe variability at the population genomic level in E. asinus and to establish monitoring systems for the conservation of donkey genetic resources.

  15. Genome Sequence, Assembly and Characterization of Two Metschnikowia fructicola Strains Used as Biocontrol Agents of Postharvest Diseases

    PubMed Central

    Piombo, Edoardo; Sela, Noa; Wisniewski, Michael; Hoffmann, Maria; Gullino, Maria L.; Allard, Marc W.; Levin, Elena; Spadaro, Davide; Droby, Samir

    2018-01-01

    The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercially formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using the PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term) and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue. PMID:29666611

  16. CFGP: a web-based, comparative fungal genomics platform

    PubMed Central

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331

  17. Development of Real Time PCR Using Novel Genomic Target for Detection of Multiple Salmonella Serovars from Milk and Chickens

    USDA-ARS's Scientific Manuscript database

    Background: A highly sensitive and specific novel genomic and plasmid target-based PCR platform was developed to detect multiple Salmonella serovars (S. Heidelberg, S. Dublin, S. Hadar, S. Kentucky and S. Enteritidis). Through extensive genome mining of protein databases of these serovars and compar...

  18. Integrated Approach to Reconstruction of Microbial Regulatory Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodionov, Dmitry A; Novichkov, Pavel S

    2013-11-04

    The goal of this project was to develop an integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done at Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) the RegPredict web platform for TRN inference and regulon reconstruction in microbial genomes, and (2) the RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were to: develop an integrated platform for genome-scale regulon reconstruction; infer regulatory annotations in several groups of bacteria and build reference collections of microbial regulons; and develop a KnowledgeBase on microbial transcriptional regulation.

  19. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-06-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. Copyright © 2016 Khedkar and Seshasayee.
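
    The geometric idea behind a symmetric interreplichore translocation can be made concrete on an idealized circular chromosome. The sketch below is a simplification under stated assumptions, not the authors' pipeline: positions are integer coordinates on a circle of known length, the two replichores are the arcs from oriC to ter in each direction, and "symmetric" is modelled as mirroring a position across oriC. All function names are hypothetical.

```python
def replichore(pos, ori, ter, length):
    """Return 'right' if pos lies on the arc running clockwise from ori to ter, else 'left'."""
    return "right" if (pos - ori) % length < (ter - ori) % length else "left"


def ori_distance(pos, ori, length):
    """Shortest distance along the circle between pos and the origin of replication."""
    clockwise = (pos - ori) % length
    return min(clockwise, length - clockwise)


def mirror_across_ori(pos, ori, length):
    """Position at the same distance from ori on the opposite side of it."""
    return (2 * ori - pos) % length
```

    In this model, a gene moved to its mirrored position changes replichore while keeping its distance from oriC, which is the property the abstract ascribes to symmetric interreplichore translocations.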

  20. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed Central

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-01-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194

  1. Functional analysis and transcriptional output of the Göttingen minipig genome.

    PubMed

    Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich

    2015-11-14

    In the past decade the Göttingen minipig has gained increasing recognition as animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models thereby improving drug attrition which is an important challenge in the process of drug development. Here we present a new chromosome level based version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten-times less pseudogenized genes compared to other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs. 
We underline this relationship for minipig and human liver where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparative age categories and further supports the minipig as model for pediatric drug safety studies. Genome based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as model for biomedical research or commercial breeding. Potential impact of our data for comparative genomics, translational research, and experimental medicine are discussed.

  2. Identification of Genetic Bases of Vibrio fluvialis Species-Specific Biochemical Pathways and Potential Virulence Factors by Comparative Genomic Analysis

    PubMed Central

    Lu, Xin; Liang, Weili; Wang, Yunduan; Xu, Jialiang

    2014-01-01

    Vibrio fluvialis is an important food-borne pathogen that causes diarrheal illness and sometimes extraintestinal infections in humans. In this study, we sequenced the genome of a clinical V. fluvialis strain and determined its phylogenetic relationships with other Vibrio species by comparative genomic analysis. We found that the closest relationship was between V. fluvialis and V. furnissii, followed by those with V. cholerae and V. mimicus. Moreover, based on genome comparisons and gene complementation experiments, we revealed genetic mechanisms of the biochemical tests that differentiate V. fluvialis from closely related species. Importantly, we identified a variety of genes encoding potential virulence factors, including multiple hemolysins, transcriptional regulators, and environmental survival and adaptation apparatuses, and the type VI secretion system, which is indicative of complex regulatory pathways modulating pathogenesis in this organism. The availability of V. fluvialis genome sequences may promote our understanding of pathogenic mechanisms for this emerging pathogen. PMID:24441165

  3. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE PAGES

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; ...

    2016-02-08

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  4. PATtyFams: Protein families for the microbial genomes in the PATRIC database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.

    The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). In conclusion, this new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

  5. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  6. Comparative Genomics of Erwinia amylovora and Related Erwinia Species—What do We Learn?

    PubMed Central

    Zhao, Youfu; Qi, Mingsheng

    2011-01-01

    Erwinia amylovora, the causal agent of fire blight disease of apples and pears, is one of the most important plant bacterial pathogens with worldwide economic significance. Recent reports on the complete or draft genome sequences of four species in the genus Erwinia, including E. amylovora, E. pyrifoliae, E. tasmaniensis, and E. billingiae, have provided us near complete genetic information about this pathogen and its closely-related species. This review describes in silico subtractive hybridization-based comparative genomic analyses of eight genomes currently available, and highlights what we have learned from these comparative analyses, as well as genetic and functional genomic studies. Sequence analyses reinforce the assumption that E. amylovora is a relatively homogeneous species and support the current classification scheme of E. amylovora and its related species. The potential evolutionary origin of these Erwinia species is also proposed. The current understanding of the pathogen, its virulence mechanism and host specificity from genome sequencing data is summarized. Future research directions are also suggested. PMID:24710213

  7. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
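    The phylogenetic-profile idea described in this record can be sketched in a few lines. The protein names, genome names, and the rule that only identical profiles count as linked are assumptions made for illustration; the patent abstract does not specify a similarity threshold.

```python
from itertools import combinations

# Hypothetical toy data: genomes in which a homolog of each protein is found.
presence = {
    "P1": {"genomeA", "genomeB", "genomeD"},
    "P2": {"genomeA", "genomeB", "genomeD"},
    "P3": {"genomeB", "genomeC"},
    "P4": {"genomeA", "genomeC", "genomeD"},
}
genomes = ["genomeA", "genomeB", "genomeC", "genomeD"]


def phylogenetic_profile(protein):
    """Return a tuple of 0/1 flags, one per genome, in a fixed order."""
    return tuple(int(g in presence[protein]) for g in genomes)


def hamming(p, q):
    """Count the genomes in which two profiles disagree."""
    return sum(a != b for a, b in zip(p, q))


profiles = {name: phylogenetic_profile(name) for name in presence}

# Proteins with identical profiles are treated as candidate functional partners.
linked = [
    (a, b)
    for a, b in combinations(sorted(profiles), 2)
    if hamming(profiles[a], profiles[b]) == 0
]
print(profiles)
print(linked)
```

    A looser criterion, such as allowing a small Hamming distance, would trade specificity for sensitivity; that choice is left open here.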

  8. A machine-learned computational functional genomics-based approach to drug classification.

    PubMed

    Lötsch, Jörn; Ultsch, Alfred

    2016-12-01

    The public accessibility of "big data" about the molecular targets of drugs and the biological functions of genes allows novel data science-based approaches to pharmacology that link drugs directly with their effects on pathophysiologic processes. This provides a phenotypic path to drug discovery and repurposing. This paper compares the performance of a functional genomics-based criterion to the traditional drug target-based classification. Knowledge discovery in the DrugBank and Gene Ontology databases allowed the construction of a "drug target versus biological process" matrix as a combination of "drug versus genes" and "genes versus biological processes" matrices. As a canonical example, such matrices were constructed for classical analgesic drugs. These matrices were projected onto a toroid grid of 50 × 82 artificial neurons using a self-organizing map (SOM). The distance and cluster structure of the high-dimensional feature space of the matrices were visualized on top of this SOM using a U-matrix. The cluster structure emerging on the U-matrix provided a correct classification of the analgesics into two main classes of opioid and non-opioid analgesics. The classification was flawless with both the functional genomics and the traditional target-based criterion. The functional genomics approach inherently included the drugs' modulatory effects on biological processes. The main pharmacological actions known from pharmacological science were captured, e.g., actions on lipid signaling for non-opioid analgesics that comprised many NSAIDs and actions on neuronal signal transmission for opioid analgesics. Using machine-learned techniques for computational drug classification in a comparative assessment, a functional genomics-based criterion was found to be similarly suitable for drug classification as the traditional target-based criterion. This supports the utility of functional genomics-based approaches to computational systems pharmacology for drug discovery and repurposing.
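    The combination of a "drug versus genes" matrix with a "genes versus biological processes" matrix can be pictured as a Boolean matrix product. The sketch below assumes binary matrices and an "at least one shared gene" rule; the abstract does not state how the authors combined the matrices, and the toy values are hypothetical.

```python
def compose(drug_gene, gene_process):
    """Boolean matrix product: a drug is linked to a process if at least
    one of its target genes is annotated to that process."""
    n_processes = len(gene_process[0])
    return [
        [
            int(any(row[g] and gene_process[g][p] for g in range(len(row))))
            for p in range(n_processes)
        ]
        for row in drug_gene
    ]


# Hypothetical toy matrices (rows x columns).
drug_gene = [
    [1, 0, 0],  # drug 0 targets gene 0
    [0, 1, 1],  # drug 1 targets genes 1 and 2
]
gene_process = [
    [1, 0],  # gene 0 is annotated to process 0
    [0, 1],  # gene 1 is annotated to process 1
    [0, 1],  # gene 2 is annotated to process 1
]

drug_process = compose(drug_gene, gene_process)
print(drug_process)
```

    Each row of the resulting matrix is a feature vector for one drug, which is the kind of input a self-organizing map would then project onto its grid.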

  9. 16q24.1 microdeletion in a premature newborn: usefulness of array-based comparative genomic hybridization in persistent pulmonary hypertension of the newborn.

    PubMed

    Zufferey, Flore; Martinet, Danielle; Osterheld, Maria-Chiara; Niel-Bütschi, Florence; Giannoni, Eric; Schmutz, Nathalie Besuchet; Xia, Zhilian; Beckmann, Jacques S; Shaw-Smith, Charles; Stankiewicz, Pawel; Langston, Claire; Fellmann, Florence

    2011-11-01

    Report of a 16q24.1 deletion in a premature newborn, demonstrating the usefulness of array-based comparative genomic hybridization in persistent pulmonary hypertension of the newborn and multiple congenital malformations. Descriptive case report. Genetic department and neonatal intensive care unit of a tertiary care children's hospital. None. We report the case of a preterm male infant, born at 26 wks of gestation. A cardiac malformation and bilateral hydronephrosis were diagnosed at 19 wks of gestation. Karyotype analysis was normal, and a 22q11.2 microdeletion was excluded by fluorescence in situ hybridization analysis. A cesarean section was performed due to fetal distress. The patient developed persistent pulmonary hypertension unresponsive to mechanical ventilation and nitric oxide treatment and expired at 16 hrs of life. An autopsy revealed partial atrioventricular canal malformation and showed bilateral dilation of the renal pelvocaliceal system with bilateral ureteral stenosis and annular pancreas. Array-based comparative genomic hybridization analysis (Agilent oligoNT 44K, Agilent Technologies, Santa Clara, CA) showed an interstitial microdeletion encompassing the forkhead box gene cluster in 16q24.1. Review of the pulmonary microscopic examination showed the characteristic features of alveolar capillary dysplasia with misalignment of pulmonary veins. Some features were less prominent due to the gestational age. Our review of the literature shows that alveolar capillary dysplasia with misalignment of pulmonary veins is rare but probably underreported. Prematurity is not a usual presentation, and histologic features are difficult to interpret. In our case, array-based comparative genomic hybridization revealed a 16q24.1 deletion, leading to the final diagnosis of alveolar capillary dysplasia with misalignment of pulmonary veins. 
    It emphasizes the usefulness of array-based comparative genomic hybridization analysis as a diagnostic tool with implications for both prognosis and management decisions in newborns with refractory persistent pulmonary hypertension and multiple congenital malformations.

  10. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data.

    PubMed

    Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O

    2015-08-25

    Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

  11. MiSNPDb: a web-based genomic resources of tropical ecology fruit mango (Mangifera indica L.) for phylogeography and varietal differentiation.

    PubMed

    Iquebal, M A; Jaiswal, Sarika; Mahato, Ajay Kumar; Jayaswal, Pawan K; Angadi, U B; Kumar, Neeraj; Sharma, Nimisha; Singh, Anand K; Srivastav, Manish; Prakash, Jai; Singh, S K; Khan, Kasim; Mishra, Rupesh K; Rajan, Shailendra; Bajpai, Anju; Sandhya, B S; Nischita, Puttaraju; Ravishankar, K V; Dinesh, M R; Rai, Anil; Kumar, Dinesh; Sharma, Tilak R; Singh, Nagendra K

    2017-11-02

    Mango is one of the most important fruits of the tropical ecological region of the world, well known for its nutritive value, aroma and taste. Its world production is >45MT, worth >200 billion US dollars. Genomic resources are required for improvement in productivity and management of mango germplasm. There are no web-based genomic resources available for mango. Hence, rapid and cost-effective high-throughput putative marker discovery is required to develop such resources. RAD-based marker discovery can cater to this urgent need until the whole genome sequence of mango becomes available. Using a panel of 84 mango varieties, a total of 28.6 Gb of data was generated by the ddRAD-Seq approach on the Illumina HiSeq 2000 platform. A total of 1.25 million SNPs were discovered. A phylogenetic tree using 749 common SNPs across these varieties revealed three major lineages, which were compared with geographical locations. The web genomic resource MiSNPDb, available at http://webtom.cabgrid.res.in/mangosnps/, is based on a 3-tier architecture and was developed using PHP, MySQL and JavaScript. This web genomic resource can be of immense use in the development of high-density linkage maps, QTL discovery, varietal differentiation, traceability, genome finishing and SNP chip development for future GWAS in genomic selection programs. We report here the world's first web-based genomic resource for genetic improvement and germplasm management of mango.

  12. Global mapping of transposon location.

    PubMed

    Gabriel, Abram; Dapprich, Johannes; Kunkel, Mark; Gresham, David; Pratt, Stephen C; Dunham, Maitreya J

    2006-12-15

    Transposable genetic elements are ubiquitous, yet their presence or absence at any given position within a genome can vary between individual cells, tissues, or strains. Transposable elements have profound impacts on host genomes by altering gene expression, assisting in genomic rearrangements, causing insertional mutations, and serving as sources of phenotypic variation. Characterizing a genome's full complement of transposons requires whole genome sequencing, precluding simple studies of the impact of transposition on interindividual variation. Here, we describe a global mapping approach for identifying transposon locations in any genome, using a combination of transposon-specific DNA extraction and microarray-based comparative hybridization analysis. We use this approach to map the repertoire of endogenous transposons in different laboratory strains of Saccharomyces cerevisiae and demonstrate that transposons are a source of extensive genomic variation. We also apply this method to mapping bacterial transposon insertion sites in a yeast genomic library. This unique whole genome view of transposon location will facilitate our exploration of transposon dynamics, as well as defining bases for individual differences and adaptive potential.

  13. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

  14. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes

    PubMed Central

    Doerr, Daniel; Chauve, Cedric

    2017-01-01

    Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseille outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprising five and six scaffolds, respectively, with 95% of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402

  15. A genomic view of food-related and probiotic Enterococcus strains

    PubMed Central

    Suárez, Nadia; Hormigo, Ricardo; Fadda, Silvina; Saavedra, Lucila

    2017-01-01

    The study of enterococcal genomes has grown considerably in recent years. While special attention is paid to comparative genomic analysis among clinically relevant isolates, in this study we performed an exhaustive comparative analysis of enterococcal genomes of food origin and/or with potential to be used as probiotics. Beyond common genetic features, we especially aimed to identify those that are specific to enterococcal strains isolated from a certain food-related source as well as features present in a species-specific manner. Thus, the genome sequences of 25 Enterococcus strains, from 7 different species, were examined and compared. Their phylogenetic relationship was reconstructed based on orthologous proteins and whole genomes. Likewise, markers associated with successful colonization (bacteriocin genes and genomic islands) and genome plasticity (phages and clustered regularly interspaced short palindromic repeats) were investigated for lifestyle-specific genetic features. At the same time, a search for antibiotic resistance genes was carried out, since they are of major concern in the food industry. Finally, it was possible to locate 1617 FIGfam families as a core proteome universally present across the genus and to determine that most of the accessory genes code for hypothetical proteins, providing reasonable hints to support their functional characterization. PMID:27773878

  16. Draft genome sequence of the New Jersey aster yellows strain of ‘Candidatus Phytoplasma asteris’

    USDA-ARS's Scientific Manuscript database

    The NJAY (New Jersey aster yellows) strain of ‘Candidatus Phytoplasma asteris’ is a significant plant pathogen responsible for causing severe lettuce yellows in the U.S. state of New Jersey. A draft genome sequence was prepared for this organism and used for genome- and gene-based comparative phylog...

  17. Mobilomics in Saccharomyces cerevisiae strains

    PubMed Central

    2013-01-01

    Background Mobile Genetic Elements (MGEs) are selfish DNA integrated in the genomes. Their detection is mainly based on consensus-like searches by scanning the investigated genome against the sequence of an already identified MGE. Mobilomics aims at discovering all the MGEs in a genome and understanding their dynamic behavior. The data for this kind of investigation can be provided by comparative genomics of closely related organisms. The amount of data thus involved requires a strong computational effort, which should be alleviated. Results Our approach proposes to exploit the high similarity among homologous chromosomes of different strains of the same species, following a progressive comparative genomics philosophy. We introduce a software tool based on our new fast algorithm, called regender, which is able to identify the conserved regions between chromosomes. Our case study is represented by a unique recently available dataset of 39 different strains of S. cerevisiae, which regender is able to compare in a few minutes. By exploring the non-conserved regions, where MGEs are mainly retrotransposons called Tys, and marking the candidate Tys based on their length, we are able to locate a priori and automatically all the already known Tys and map all the putative Tys in all the strains. The remaining putative mobile elements (PMEs) emerging from this intra-specific comparison are sharp markers of inter-specific evolution: indeed, many events of non-conservation among different yeast strains correspond to PMEs. A clustering based on the presence/absence of the candidate Tys in the strains suggests an evolutionary interconnection that is very similar to classic phylogenetic trees based on SNP analysis, even though it is computed without using phylogenetic information. Conclusions The case study indicates that the proposed methodology brings two major advantages: (a) it does not require any template sequence for the wanted MGEs and (b) it can be applied to infer MGEs also for low coverage genomes with unresolved bases, where traditional approaches are largely ineffective. PMID:23514613

  18. Caryoscope: An Open Source Java application for viewing microarray data in a genomic context

    PubMed Central

    Awad, Ihab AB; Rees, Christian A; Hernandez-Boussard, Tina; Ball, Catherine A; Sherlock, Gavin

    2004-01-01

    Background Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context. Results We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript. Conclusion Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context. PMID:15488149

  19. GenColors: annotation and comparative genomics of prokaryotes made easy.

    PubMed

    Romualdi, Alessandro; Felder, Marius; Rose, Dominic; Gausmann, Ulrike; Schilhabel, Markus; Glöckner, Gernot; Platzer, Matthias; Sühnel, Jürgen

    2007-01-01

    GenColors (gencolors.fli-leibniz.de) is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manages an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back as well as to standard GenBank file(s). The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow annotation and analysis in an effective manner. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation purposes in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types, as dedicated genome browsers and as the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser (sgb.fli-leibniz.de) with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV (jpgv.fli-leibniz.de) offers information on almost all finished bacterial genomes but, compared with the dedicated browsers, has reduced genome comparison functionality. As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293 species. The system provides versatile quick and advanced search options for all currently known prokaryotic genomes and generates circular and linear genome plots. Gene information sheets contain basic gene information, database search options, and links to external databases. GenColors is also available on request for local installation.
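    Of the comparison tools listed in this record, best bidirectional hits are the simplest to illustrate. The sketch below is a generic version and not GenColors code; the gene identifiers, the scores, and the dictionary input format are hypothetical. A pair is reported only when each gene is the other's top-scoring match.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_bidirectional_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {
    ("a1", "b1"): 200,
    ("a1", "b2"): 50,
    ("a2", "b2"): 180,
    ("a3", "b2"): 90,
}
b_vs_a = {
    ("b1", "a1"): 195,
    ("b2", "a2"): 175,
    ("b2", "a3"): 80,
}

pairs = best_bidirectional_hits(a_vs_b, b_vs_a)
print(pairs)
```

    Here a3 is dropped because its best match, b2, prefers a2; this asymmetry check is what separates best bidirectional hits from one-way best hits.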

  20. Genome alignment with graph data structures: a comparison

    PubMed Central

    2014-01-01

    Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884

  1. HpBase: A genome database of a sea urchin, Hemicentrotus pulcherrimus.

    PubMed

    Kinjo, Sonoko; Kiyomoto, Masato; Yamamoto, Takashi; Ikeo, Kazuho; Yaguchi, Shunsuke

    2018-04-01

    To understand the mystery of life, it is important to accumulate genomic information for various organisms because the whole genome encodes the commands for all the genes. Since the genome of Strongylocentrotus purpuratus was sequenced in 2006 as the first sequenced genome in echinoderms, the genomic resources of other North American sea urchins have gradually been accumulated, but no sea urchin genomes are available in other areas, where many scientists have used the local species and reported important results. In this manuscript, we report a draft genome of the sea urchin Hemicentrotus pulcherrimus because this species has a long history as the target of developmental and cell biology in East Asia. The genome of H. pulcherrimus was assembled into 16,251 scaffold sequences with an N50 length of 143 kbp, and approximately 25,000 genes were identified in the genome. The size of the genome and the sequencing coverage were estimated to be approximately 800 Mbp and 100×, respectively. To provide these data and information of annotation, we constructed a database, HpBase (http://cell-innovation.nig.ac.jp/Hpul/). In HpBase, gene searches, genome browsing, and blast searches are available. In addition, HpBase includes the "recipes" for experiments from each lab using H. pulcherrimus. These recipes will continue to be updated according to the circumstances of individual scientists and can be powerful tools for experimental biologists and for the community. HpBase is a suitable dataset for evolutionary, developmental, and cell biologists to compare H. pulcherrimus genomic information with that of other species and to isolate gene information.
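    The N50 statistic quoted in this abstract can be computed as follows. This is a minimal sketch using the common definition (the length of the scaffold at which the running total of lengths, sorted from longest to shortest, first reaches half of the assembly size); the function name and the example lengths are hypothetical and are not taken from the HpBase assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the total to half or more.
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Hypothetical scaffold lengths in kbp.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

    For the example, the total is 290 kbp; the two longest scaffolds sum to 150 kbp, which passes the halfway mark of 145 kbp, so the N50 is the length of the second scaffold.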

  2. CoGI: Towards Compressing Genomes as an Image.

    PubMed

    Xie, Xiaojing; Zhou, Shuigeng; Guan, Jihong

    2015-01-01

    Genomic science is now facing an explosive increase of data thanks to the fast development of sequencing technology. This situation poses serious challenges to genomic data storage and transfer. It is desirable to compress data to reduce storage and transfer costs, and thus to boost data distribution and utilization efficiency. Up to now, a number of algorithms and tools have been developed for compressing genomic sequences. Unlike the existing algorithms, most of which treat genomes as one-dimensional text strings and compress them based on dictionaries or probability models, this paper proposes a novel approach called CoGI (the abbreviation of Compressing Genomes as an Image) for genome compression, which transforms the genomic sequences to a two-dimensional binary image (or bitmap), then applies a rectangular partition coding algorithm to compress the binary image. CoGI can be used as either a reference-based compressor or a reference-free compressor. For the former, we develop two entropy-based algorithms to select a proper reference genome. Performance evaluation is conducted on various genomes. Experimental results show that the reference-based CoGI significantly outperforms two state-of-the-art reference-based genome compressors, GReEn and RLZ-opt, in both compression ratio and compression efficiency. It also achieves a comparable compression ratio but two orders of magnitude higher compression efficiency in comparison with XM, a state-of-the-art reference-free genome compressor. Furthermore, our approach performs much better than Gzip, a general-purpose and widely used compressor, in both compression speed and compression ratio. So, CoGI can serve as an effective and practical genome compressor. The source code and other related documents of CoGI are available at: http://admis.fudan.edu.cn/projects/cogi.htm.

  3. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from P. fluorescens group in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  4. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants, influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreement with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. Nineteen of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profile analysis, while two isolates were more closely related to P. chlororaphis and P. putida. Genes specific to the Populus-associated subgroups were identified: genes specific to subgroup 1 include several sensory systems, such as proteins that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; genes specific to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into the high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. 
The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this needs to be experimentally characterized with ecologically relevant phenotype properties. This study justifies the need to sequence multiple isolates, especially from the P. fluorescens group, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains to promote growth and increase disease resistance in plants.

  5. In silico genomic analyses reveal three distinct lineages of Escherichia coli O157:H7, one of which is associated with hyper-virulence.

    PubMed

    Laing, Chad R; Buchanan, Cody; Taboada, Eduardo N; Zhang, Yongxiang; Karmali, Mohamed A; Thomas, James E; Gannon, Victor Pj

    2009-06-29

    Many approaches have been used to study the evolution, population structure and genetic diversity of Escherichia coli O157:H7; however, observations made with different genotyping systems are not easily relatable to each other. Three genetic lineages of E. coli O157:H7 designated I, II and I/II have been identified using octamer-based genome scanning and microarray comparative genomic hybridization (mCGH). Each lineage contains significant phenotypic differences, with lineage I strains being the most commonly associated with human infections. Similarly, a clade of hyper-virulent O157:H7 strains implicated in the 2006 spinach and lettuce outbreaks has been defined using single-nucleotide polymorphism (SNP) typing. In this study an in silico comparison of six different genotyping approaches was performed on 19 E. coli genome sequences from 17 O157:H7 strains and single O145:NM and K12 MG1655 strains to provide an overall picture of diversity of the E. coli O157:H7 population, and to compare genotyping methods for O157:H7 strains. In silico determination of lineage, Shiga-toxin bacteriophage integration site, comparative genomic fingerprint, mCGH profile, novel region distribution profile, SNP type and multi-locus variable number tandem repeat analysis type was performed and a supernetwork based on the combination of these methods was produced. This supernetwork showed three distinct clusters of strains that were O157:H7 lineage-specific, with the SNP-based hyper-virulent clade 8 synonymous with O157:H7 lineage I/II. Lineage I/II/clade 8 strains clustered closest on the supernetwork to E. coli K12 and E. coli O55:H7, O145:NM and sorbitol-fermenting O157 strains. The results of this study highlight the similarities in relationships derived from multi-locus genome sampling methods and suggest a "common genotyping language" may be devised for population genetics and epidemiological studies. 
Future genotyping methods should provide data that can be stored centrally and accessed locally in an easily transferable, informative and extensible format based on comparative genomic analyses.

  6. Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python)

    PubMed Central

    Rutllant, Josep

    2016-01-01

    Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value. PMID:27200191

  7. Leveraging Comparative Genomics to Identify and Functionally Characterize Genes Associated with Sperm Phenotypes in Python bivittatus (Burmese Python).

    PubMed

    Irizarry, Kristopher J L; Rutllant, Josep

    2016-01-01

    Comparative genomics approaches provide a means of leveraging functional genomics information from a highly annotated model organism's genome (such as the mouse genome) in order to make physiological inferences about the role of genes and proteins in a less characterized organism's genome (such as the Burmese python). We employed a comparative genomics approach to produce the functional annotation of Python bivittatus genes encoding proteins associated with sperm phenotypes. We identify 129 gene-phenotype relationships in the python which are implicated in 10 specific sperm phenotypes. Results obtained through our systematic analysis identified subsets of python genes exhibiting associations with gene ontology annotation terms. Functional annotation data was represented in a semantic scatter plot. Together, these newly annotated Python bivittatus genome resources provide a high resolution framework from which the biology relating to reptile spermatogenesis, fertility, and reproduction can be further investigated. Applications of our research include (1) production of genetic diagnostics for assessing fertility in domestic and wild reptiles; (2) enhanced assisted reproduction technology for endangered and captive reptiles; and (3) novel molecular targets for biotechnology-based approaches aimed at reducing fertility and reproduction of invasive reptiles. Additional enhancements to reptile genomic resources will further enhance their value.

  8. Deciphering the Diploid Ancestral Genome of the Mesohexaploid Brassica rapa

    PubMed Central

    Cheng, Feng; Mandáková, Terezie; Wu, Jian; Xie, Qi; Lysak, Martin A.; Wang, Xiaowu

    2013-01-01

    The genus Brassica includes several important agricultural and horticultural crops. Their current genome structures were shaped by whole-genome triplication followed by extensive diploidization. The availability of several crucifer genome sequences, especially that of Chinese cabbage (Brassica rapa), enables study of the evolution of the mesohexaploid Brassica genomes from their diploid progenitors. We reconstructed three ancestral subgenomes of B. rapa (n = 10) by comparing its whole-genome sequence to ancestral and extant Brassicaceae genomes. All three B. rapa paleogenomes apparently consisted of seven chromosomes, similar to the ancestral translocation Proto-Calepineae Karyotype (tPCK; n = 7), which is the evolutionarily younger variant of the Proto-Calepineae Karyotype (n = 7). Based on comparative analysis of genome sequences or linkage maps of Brassica oleracea, Brassica nigra, radish (Raphanus sativus), and other closely related species, we propose a two-step merging of three tPCK-like genomes to form the hexaploid ancestor of the tribe Brassiceae with 42 chromosomes. Subsequent diversification of the Brassiceae was marked by extensive genome reshuffling and chromosome number reduction mediated by translocation events and followed by loss and/or inactivation of centromeres. Furthermore, via interspecies genome comparison, we refined intervals for seven of the genomic blocks of the Ancestral Crucifer Karyotype (n = 8), thus revising the key reference genome for evolutionary genomics of crucifers. PMID:23653472

  9. Complete Genome Analysis of Thermus parvatiensis and Comparative Genomics of Thermus spp. Provide Insights into Genetic Variability and Evolution of Natural Competence as Strategic Survival Attributes

    PubMed Central

    Tripathi, Charu; Mishra, Harshita; Khurana, Himani; Dwivedi, Vatsala; Kamra, Komal; Negi, Ram K.; Lal, Rup

    2017-01-01

    Thermophilic environments represent an interesting niche. Among thermophiles, the genus Thermus is among the most studied genera. In this study, we have sequenced the genome of Thermus parvatiensis strain RL, a thermophile isolated from Himalayan hot water springs (temperature >96°C), using the PacBio RSII SMRT technique. The small genome (2.01 Mbp) comprises a chromosome (1.87 Mbp) and a plasmid (143 Kbp), designated in this study as pTP143. Annotation revealed a high number of repair genes and a compact genome, but a highly plastic plasmid containing transposases, integrases, mobile elements and hypothetical proteins (44%). We performed a comparative genomic study of the group Thermus with the aim of analysing the phylogenetic relatedness as well as niche-specific attributes prevalent among the group. We compared the reference genome RL with 16 Thermus genomes to assess their phylogenetic relationships based on 16S rRNA gene sequences, average nucleotide identity (ANI), conserved marker genes (31 and 400), pan genome and tetranucleotide frequency. The core genome of the analyzed genomes contained 1,177 core genes, and many singleton genes were detected in individual genomes, reflecting a conserved core but adaptive pan repertoire. We demonstrated the presence of metagenomic islands (chromosome:5, plasmid:5) by recruiting raw metagenomic data (from the same niche) against the genomic replicons of T. parvatiensis. We also dissected the CRISPR loci across all genomes and found widespread presence of this system across Thermus genomes. Additionally, we performed a comparative analysis of competence loci across Thermus genomes and found evidence for recent horizontal acquisition of the locus and continued dispersal among members, reflecting that natural competence is a beneficial survival trait among Thermus members and that its acquisition depicts unending evolution in order to accomplish optimal fitness. PMID:28798737
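
    The core, pan and singleton gene sets mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes gene families have already been clustered, and the function name, genome labels and family identifiers are hypothetical placeholders.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def summarize_pan_genome(
    genomes: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (pan, core, singletons) from per-genome gene-family sets.

    pan        = families present in at least one genome
    core       = families present in every genome
    singletons = families present in exactly one genome
    """
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in genomes.values() for fam in fams)
    n_genomes = len(genomes)
    pan = set(counts)
    core = {fam for fam, c in counts.items() if c == n_genomes}
    singletons = {fam for fam, c in counts.items() if c == 1}
    return pan, core, singletons


# Toy input (hypothetical labels, not data from the study).
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
pan, core, singletons = summarize_pan_genome(toy)
```

    In this toy input only fam1 is shared by all three genomes, so it alone forms the core, while fam3, fam4 and fam5 are singletons.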

  10. Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

    PubMed

    Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

    2015-10-05

    Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny analysis between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39%. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a micro-syntenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes, with 5.4 to 12.9% of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. 
Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome preservation, but rather suggest the additional influence of life-history traits on preservation of synteny. The regions of synteny that were detected provide an important tool for defining and cataloging genes in the QTL regions for advancing chestnut blight resistance research.

  11. Improving draft genome contiguity with reference-derived in silico mate-pair libraries.

    PubMed

    Grau, José Horacio; Hackl, Thomas; Koepfli, Klaus-Peter; Hofreiter, Michael

    2018-05-01

    Contiguous genome assemblies are a highly valued biological resource because of the higher number of completely annotated genes and genomic elements that are usable compared to fragmented draft genomes. Nonetheless, contiguity is difficult to obtain if only low coverage data and/or only distantly related reference genome assemblies are available. In order to improve genome contiguity, we have developed Cross-Species Scaffolding, a new pipeline that imports long-range distance information directly into the de novo assembly process by constructing mate-pair libraries in silico. We show how genome assembly metrics and gene prediction dramatically improve with our pipeline by assembling two primate genomes solely based on ∼30x coverage of shotgun sequencing data.

  12. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, the CGAL, QUAST and REAPR bioinformatics tools, and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons was ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. 
Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.
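
    As a brief illustration of the G + C content statistic quoted in the abstract above, the sketch below computes the percentage of G and C bases in a nucleotide string. It is a generic calculation, not code from the study; the function name is hypothetical, and the choice to ignore ambiguous bases such as N is an assumption made here.

```python
def gc_content(sequence: str) -> float:
    """Return G + C content as a percentage of unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    Returns 0.0 for a sequence with no unambiguous bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Toy sequence, not taken from any genome: 2 of 6 bases are G or C.
example = gc_content("AATTGC")
```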

  13. Genome-wide analysis of Dongxiang wild rice (Oryza rufipogon Griff.) to investigate lost/acquired genes during rice domestication.

    PubMed

    Zhang, Fantao; Xu, Tao; Mao, Linyong; Yan, Shuangyong; Chen, Xiwen; Wu, Zhenfeng; Chen, Rui; Luo, Xiangdong; Xie, Jiankun; Gao, Shan

    2016-04-26

    It is widely accepted that cultivated rice (Oryza sativa L.) was domesticated from common wild rice (Oryza rufipogon Griff.). Compared to other studies which concentrate on rice origin, this study aims to genetically elucidate the substantial phenotypic and physiological changes from wild rice to cultivated rice at the whole genome level. Instead of comparing two assembled genomes, this study directly compared the Dongxiang wild rice (DXWR) Illumina sequencing reads with the Nipponbare (O. sativa) complete genome without assembly of the DXWR genome. Based on the results from the comparative genomics analysis, structural variations (SVs) between DXWR and Nipponbare were determined to locate deleted genes which could have been acquired by Nipponbare during rice domestication. To overcome the limit of the SV detection, the DXWR transcriptome was also sequenced and compared with the Nipponbare transcriptome to discover the genes which could have been lost in DXWR during domestication. Both the 1591 Nipponbare-acquired genes and the 206 DXWR-lost transcripts were further analyzed using annotations from multiple sources. The NGS data are available in the NCBI SRA database with ID SRP070627. These results help to better understand the domestication from wild rice to cultivated rice at the whole genome level and provide a genomic data resource for rice genetic research or breeding. One finding confirmed that transposable elements contribute greatly to the genome evolution from wild rice to cultivated rice. Another finding suggested that the photophosphorylation and oxidative phosphorylation system in cultivated rice could have adapted to environmental changes simultaneously during domestication.

  14. Diversity and evolution of phycobilisomes in marine Synechococcus spp.: a comparative genomics study.

    PubMed

    Six, Christophe; Thomas, Jean-Claude; Garczarek, Laurence; Ostrowski, Martin; Dufresne, Alexis; Blot, Nicolas; Scanlan, David J; Partensky, Frédéric

    2007-01-01

    Marine Synechococcus owe their specific vivid color (ranging from blue-green to orange) to their large extrinsic antenna complexes called phycobilisomes, comprising a central allophycocyanin core and rods of variable phycobiliprotein composition. Three major pigment types can be defined depending on the major phycobiliprotein found in the rods (phycocyanin, phycoerythrin I or phycoerythrin II). Among strains containing both phycoerythrins I and II, four subtypes can be distinguished based on the ratio of the two chromophores bound to these phycobiliproteins. Genomes of eleven marine Synechococcus strains recently became available with one to four strains per pigment type or subtype, allowing an unprecedented comparative genomics study of genes involved in phycobilisome metabolism. By carefully comparing the Synechococcus genomes, we have retrieved candidate genes potentially required for the synthesis of phycobiliproteins in each pigment type. This includes linker polypeptides, phycobilin lyases and a number of novel genes of uncharacterized function. Interestingly, strains belonging to a given pigment type have similar phycobilisome gene complements and organization, independent of the core genome phylogeny (as assessed using concatenated ribosomal proteins). While phylogenetic trees based on concatenated allophycocyanin protein sequences are congruent with the latter, those based on phycocyanin and phycoerythrin notably differ and match the Synechococcus pigment types. We conclude that the phycobilisome core has likely evolved together with the core genome, while rods must have evolved independently, possibly by lateral transfer of phycobilisome rod genes or gene clusters between Synechococcus strains, either via viruses or by natural transformation, allowing rapid adaptation to a variety of light niches.

  15. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    PubMed Central

    Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

    2014-01-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

  16. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

    PubMed

    Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

    2014-07-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

  17. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are a number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package is accompanied by a script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should also be compatible with other Linux and Unix systems after the necessary external programs are installed. 
DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .
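
    The quadratic-versus-linear scaling contrast described in the abstract above can be made concrete by counting genome-level comparison rounds. The sketch below is not DeNoGAP code and does not model hidden Markov models; it assumes that an all-versus-all method compares every unordered pair of genomes, and that an iterative method seeds its models from the first genome and then scans each additional genome once against the current model set. Both function names are hypothetical.

```python
def pairwise_rounds(n_genomes: int) -> int:
    """All-vs-all: one comparison round per unordered pair of genomes."""
    return n_genomes * (n_genomes - 1) // 2


def iterative_rounds(n_genomes: int) -> int:
    """Iterative: the first genome seeds the models (assumed here),
    then each further genome is scanned once against the model set."""
    return max(n_genomes - 1, 0)


# Compare growth for a few collection sizes.
for n in (10, 100, 1000):
    print(n, pairwise_rounds(n), iterative_rounds(n))
```

    Under these assumptions, going from 100 to 1000 genomes multiplies the all-versus-all rounds by roughly a hundred but the iterative rounds by only about ten.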

  18. CMG-Biotools, a Free Workbench for Basic Comparative Microbial Genomics

    PubMed Central

    Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

    2013-01-01

    Background Today, there are more than a hundred times as many sequenced prokaryotic genomes as were available in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. Results The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. Conclusion This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a wide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly show that users with limited computational experience can perform complicated analysis without much training. PMID:23577086

  19. A Pan-Genomic Approach to Understand the Basis of Host Adaptation in Achromobacter

    PubMed Central

    Jeukens, Julie; Freschi, Luca; Vincent, Antony T.; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Charette, Steve J.

    2017-01-01

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the cystic fibrosis lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of Achromobacter xylosoxidans, Achromobacter insuavis, Achromobacter dolens, and Achromobacter ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared with other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus’s resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. PMID:28383665
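
    As a rough illustration of the bookkeeping behind a pan-genome analysis of this kind, the Python sketch below partitions gene families into core, accessory and strain-specific sets from a presence/absence table. The genome names and gene-family labels are invented for illustration, and the strict "present in every genome" definition of the core is an assumption; the abstract does not describe the study's actual pipeline or thresholds.

```python
from collections import Counter


def partition_pan_genome(genomes):
    """Split gene families into pan, core, accessory and unique sets.

    genomes: dict mapping a genome name to the set of gene-family
    identifiers found in that genome (hypothetical input format).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    n_genomes = len(genomes)
    # Count in how many genomes each gene family occurs.
    counts = Counter(gene for genes in genomes.values() for gene in genes)
    pan = set(counts)
    # Assumption: "core" means present in every genome (strict definition).
    core = {gene for gene, c in counts.items() if c == n_genomes}
    accessory = pan - core
    # Families seen in exactly one genome (a subset of the accessory set).
    unique = {gene for gene in accessory if counts[gene] == 1}
    return {"pan": pan, "core": core, "accessory": accessory, "unique": unique}


# Invented toy data, not taken from the study.
example = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
result = partition_pan_genome(example)
print(sorted(result["core"]))
print(sorted(result["unique"]))
```

    In practice the gene families would come from an orthology clustering step, and a relaxed core definition (for example, presence in most genomes) is sometimes preferred when draft assemblies are included.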

  20. Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of Escherichia coli O157:H7 and Yersinia pestis

    PubMed Central

    Baumler, David J.; Banta, Lois M.; Hung, Kai F.; Schwarz, Jodi A.; Cabot, Eric L.; Glasner, Jeremy D.; Perna, Nicole T.

    2012-01-01

    Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples related to bacterial pathogenesis. Students first examine alignments of genomes of Escherichia coli O157:H7 strains isolated from three food-poisoning outbreaks using the multiple-genome alignment tool Mauve. Students investigate conservation of virulence factors using the Mauve viewer and by browsing annotations available in the ASAP (A Systematic Annotation Package for Community Analysis of Genomes) database. In the second module, students use an alignment of five Yersinia pestis genomes to analyze single-nucleotide polymorphisms in three genes and classify strains into biovar groups. Students are then given sequences of bacterial DNA amplified from the teeth of corpses from the first and second pandemics of the bubonic plague and asked to classify these new samples. Learning-assessment results reveal student improvement in self-efficacy and content knowledge, as well as in students' ability to use BLAST to identify genomic islands and conduct analyses of virulence factors from E. coli O157:H7 or Y. pestis. Each of these educational modules offers educators new ready-to-implement resources for integrating comparative genomic topics into their curricula. PMID:22383620

  1. Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae

    PubMed Central

    Kim, Seong-Ryul; Kwak, Woori; Kim, Hyaekang; Kim, Kee-Young; Kim, Su-Bae; Choi, Kwang-Ho; Kim, Seong-Wan; Hwang, Jae-Sam; Kim, Minjee; Kim, Iksoo; Goo, Tae-Won

    2018-01-01

    Background: Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, differs in characteristics such as thickness, compressive elasticity, and chemical resistance from common silk produced by the domesticated silkworm, Bombyx mori. Its unique characteristics have led to its use in many research fields including biotechnology and medical science, and the scientific as well as economic importance of the wild silk moth continues to gradually increase. However, no genomic information for the wild silk moth, including A. yamamai, is currently available. Findings: To construct the A. yamamai genome, a total of 147 Gb of sequence was generated using Illumina and PacBio sequencing platforms, providing 210-fold coverage based on the 700-Mb estimated genome size of A. yamamai. The assembled genome of A. yamamai was 656 Mb (>2 kb) with 3675 scaffolds, and the N50 length of the assembly was 739 kb with a 34.07% GC ratio. Identified repeat elements covered 37.33% of the total genome, and the completeness of the constructed genome assembly was estimated to be 96.7% by Benchmarking Universal Single-Copy Orthologs v2 analysis. A total of 15 481 genes were identified using Evidence Modeler based on the gene prediction results obtained from 3 different methods (ab initio, RNA-seq-based, known-gene-based) and manual curation. Conclusions: Here we present the genome sequence of A. yamamai, the first genome sequence of the wild silk moth. These results provide valuable genomic information, which will help enrich our understanding of the molecular mechanisms relating to not only specific phenotypes such as wild silk itself but also the genomic evolution of Saturniidae. PMID:29186418
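
    The two summary statistics quoted in this abstract, fold coverage and N50, are simple to compute. The Python sketch below is only illustrative: the function names are hypothetical and the scaffold lengths are a toy list, not the real assembly. Only the 147 Gb and 700 Mb figures come from the abstract.

```python
def fold_coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths):
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Figures from the abstract: 147 Gb of reads, 700 Mb estimated genome size.
print(fold_coverage(147e9, 700e6))

# Toy scaffold lengths (invented), total 32, so half is 16.
toy_scaffolds = [8, 8, 4, 3, 3, 2, 2, 2]
print(n50(toy_scaffolds))
```

    With the abstract's numbers the first call reproduces the reported 210-fold coverage.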

  2. [Genome-editing: focus on the off-target effects].

    PubMed

    He, Xiubin; Gu, Feng

    2017-10-25

    Breakthroughs in genome editing in recent years have paved the way for new therapeutic strategies. These genome-editing tools mainly include zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas-based RNA-guided DNA endonucleases. However, off-target effects are still the major issue in genome editing and limit its application in gene therapy. Here, we summarize the causes of off-target effects and compare different methods for detecting them.

  3. Comparative Genomics of Completely Sequenced Lactobacillus helveticus Genomes Provides Insights into Strain-Specific Genes and Resolves Metagenomics Data Down to the Strain Level.

    PubMed

    Schmid, Michael; Muri, Jonathan; Melidis, Damianos; Varadarajan, Adithi R; Somerville, Vincent; Wicki, Adrian; Moser, Aline; Bourqui, Marc; Wenzel, Claudia; Eugster-Meier, Elisabeth; Frey, Juerg E; Irmler, Stefan; Ahrens, Christian H

    2018-01-01

    Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus, to our knowledge, identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories "cell wall/membrane biogenesis" and "defense mechanisms" were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and Lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes. Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level.

  4. Comparative Genomics of Completely Sequenced Lactobacillus helveticus Genomes Provides Insights into Strain-Specific Genes and Resolves Metagenomics Data Down to the Strain Level

    PubMed Central

    Schmid, Michael; Muri, Jonathan; Melidis, Damianos; Varadarajan, Adithi R.; Somerville, Vincent; Wicki, Adrian; Moser, Aline; Bourqui, Marc; Wenzel, Claudia; Eugster-Meier, Elisabeth; Frey, Juerg E.; Irmler, Stefan; Ahrens, Christian H.

    2018-01-01

    Although complete genome sequences hold particular value for an accurate description of core genomes, the identification of strain-specific genes, and as the optimal basis for functional genomics studies, they are still largely underrepresented in public repositories. Based on an assessment of the genome assembly complexity for all lactobacilli, we used Pacific Biosciences' long read technology to sequence and de novo assemble the genomes of three Lactobacillus helveticus starter strains, raising the number of completely sequenced strains to 12. The first comparative genomics study for L. helveticus, to our knowledge, identified a core genome of 988 genes and sets of unique, strain-specific genes ranging from about 30 to more than 200 genes. Importantly, the comparison of MiSeq- and PacBio-based assemblies uncovered that not only accessory but also core genes can be missed in incomplete genome assemblies based on short reads. Analysis of the three genomes revealed that a large number of pseudogenes were enriched for functional Gene Ontology categories such as amino acid transmembrane transport and carbohydrate metabolism, which is in line with a reductive genome evolution in the rich natural habitat of L. helveticus. Notably, the functional Clusters of Orthologous Groups of proteins categories “cell wall/membrane biogenesis” and “defense mechanisms” were found to be enriched among the strain-specific genes. A genome mining effort uncovered examples where an experimentally observed phenotype could be linked to the underlying genotype, such as for cell envelope proteinase PrtH3 of strain FAM8627. Another possible link identified for peptidoglycan hydrolases will require further experiments. Of note, strain FAM22155 did not harbor a CRISPR/Cas system; its loss was also observed in other L. helveticus strains and Lactobacillus species, thus questioning the value of the CRISPR/Cas system for diagnostic purposes. Importantly, the complete genome sequences proved to be very useful for the analysis of natural whey starter cultures with metagenomics, as a larger percentage of the sequenced reads of these complex mixtures could be unambiguously assigned down to the strain level. PMID:29441050

  5. Genome Dynamics and Molecular Infection Epidemiology of Multidrug-Resistant Helicobacter pullorum Isolates Obtained from Broiler and Free-Range Chickens in India.

    PubMed

    Qumar, Shamsul; Majid, Mohammad; Kumar, Narender; Tiwari, Sumeet K; Semmler, Torsten; Devi, Savita; Baddam, Ramani; Hussain, Arif; Shaik, Sabiha; Ahmed, Niyaz

    2017-01-01

    Some life-threatening, foodborne, and zoonotic infections are transmitted through poultry birds. Inappropriate and indiscriminate use of antimicrobials in the livestock industry has led to an increased prevalence of multidrug-resistant bacteria with epidemic potential. Here, we present a functional molecular epidemiological analysis entailing the phenotypic and whole-genome sequence-based characterization of 11 H. pullorum isolates from broiler and free-range chickens sampled from retail wet markets in Hyderabad City, India. Antimicrobial susceptibility tests revealed all of the isolates to be resistant to multiple antibiotic classes such as fluoroquinolones, cephalosporins, sulfonamides, and macrolides. The isolates were also found to be extended-spectrum β-lactamase producers and were even resistant to clavulanic acid. Whole-genome sequencing and comparative genomic analysis of these isolates revealed the presence of five or six well-characterized antimicrobial resistance genes, including those encoding a resistance-nodulation-division efflux pump(s). Phylogenetic analysis combined with pan-genome analysis revealed a remarkable degree of genetic diversity among the isolates from free-range chickens; in contrast, a high degree of genetic similarity was observed among broiler chicken isolates. Comparative genomic analysis of all publicly available H. pullorum genomes, including our isolates (n = 16), together with the genomes of 17 other Helicobacter species, revealed a high number (8,560) of H. pullorum-specific protein-encoding genes, with an average of 535 such genes per isolate. In silico virulence screening identified 182 important virulence genes and also revealed high strain-specific gene content in isolates from free-range chickens (average, 34) compared to broiler chicken isolates. A significant prevalence of prophages (ranging from 1 to 9) and a significant presence of genomic islands (0 to 4) were observed in free-range and broiler chicken isolates. Taken together, these observations provide significant baseline data for functional molecular infection epidemiology of nonpyloric Helicobacter species such as H. pullorum by unraveling their evolution in chickens and their possible zoonotic transmission to humans. Globally, the poultry industry is expanding with an ever-growing consumer base for chicken meat. Given this, food-associated transmission of multidrug-resistant bacteria represents an important health care issue. Our study involves a critical baseline approach directed at genome sequence-based epidemiology and transmission dynamics of H. pullorum, a poultry pathogen having established zoonotic potential. We believe our studies would facilitate the development of surveillance systems that ensure the safety of food for humans and guide public health policies related to the use of antibiotics in animal feed in countries such as India. We sequenced 11 new genomes of H. pullorum as a part of this study. These genomes would provide much value in addition to the ongoing comparative genomic studies of helicobacters. Copyright © 2016 American Society for Microbiology.

  6. Automated array-based genomic profiling in chronic lymphocytic leukemia: Development of a clinical tool and discovery of recurrent genomic alterations

    PubMed Central

    Schwaenen, Carsten; Nessling, Michelle; Wessendorf, Swen; Salvi, Tatjana; Wrobel, Gunnar; Radlwimmer, Bernhard; Kestler, Hans A.; Haslinger, Christian; Stilgenbauer, Stephan; Döhner, Hartmut; Bentz, Martin; Lichter, Peter

    2004-01-01

    B cell chronic lymphocytic leukemia (B-CLL) is characterized by a highly variable clinical course. Recurrent chromosomal imbalances provide significant prognostic markers. Risk-adapted therapy based on genomic alterations has become an option that is currently being tested in clinical trials. To supply a robust tool for such large scale studies, we developed a comprehensive DNA microarray dedicated to the automated analysis of recurrent genomic imbalances in B-CLL by array-based comparative genomic hybridization (matrix–CGH). Validation of this chip in a series of 106 B-CLL cases revealed a high specificity and sensitivity that fulfils the criteria for application in clinical oncology. This chip is immediately applicable within clinical B-CLL treatment trials that evaluate whether B-CLL cases with distinct chromosomal abnormalities should be treated with chemotherapy of different intensities and/or stem cell transplantation. Through the control set of DNA fragments equally distributed over the genome, recurrent genomic imbalances were discovered: trisomy of chromosome 19 and gain of the MYCN oncogene correlating with an elevation of MYCN mRNA expression. PMID:14730057

  7. Simultaneous gene finding in multiple genomes.

    PubMed

    König, Stefanie; Romoth, Lars W; Gerischer, Lizzy; Stanke, Mario

    2016-11-15

    As the tree of life is populated with sequenced genomes ever more densely, the new challenge is the accurate and consistent annotation of entire clades of genomes. We address this problem with a new approach to comparative gene finding that takes a multiple genome alignment of closely related species and simultaneously predicts the location and structure of protein-coding genes in all input genomes, thereby exploiting negative selection and sequence conservation. The model prefers potential gene structures in the different genomes that are in agreement with each other, or, if not, where the exon gains and losses are plausible given the species tree. We formulate the multi-species gene finding problem as a binary labeling problem on a graph. The resulting optimization problem is NP hard, but can be efficiently approximated using a subgradient-based dual decomposition approach. The proposed method was tested on whole-genome alignments of 12 vertebrate and 12 Drosophila species. The accuracy was evaluated for human, mouse and Drosophila melanogaster and compared to competing methods. Results suggest that our method is well-suited for annotation of (a large number of) genomes of closely related species within a clade, in particular, when RNA-Seq data are available for many of the genomes. The transfer of existing annotations from one genome to another via the genome alignment is more accurate than previous approaches that are based on protein-spliced alignments, when the genomes are at close to medium distances. The method is implemented in C++ as part of Augustus and is available as open source (http://bioinf.uni-greifswald.de/augustus/). Contact: stefaniekoenig@ymail.com or mario.stanke@uni-greifswald.de. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  8. A Proposal for a Genome Similarity-Based Taxonomy for Plant-Pathogenic Bacteria that Is Sufficiently Precise to Reflect Phylogeny, Host Range, and Outbreak Affiliation Applied to Pseudomonas syringae sensu lato as a Proof of Concept.

    PubMed

    Vinatzer, Boris A; Weisberg, Alexandra J; Monteil, Caroline L; Elmarakeby, Haitham A; Sheppard, Samuel K; Heath, Lenwood S

    2017-01-01

    Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species but current taxonomy does not provide names for bacteria below the subspecies level. The introduction of the host range-based pathovar system in the 1980s provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity for developing a new genome-based taxonomic framework. We already proposed to name individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose to use genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, LINs were compared with a P. syringae genome tree whereby the assigned LINs were found to be informative of a majority of phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.

  9. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

  10. Typing and comparative genome analysis of Brucella melitensis isolated from Lebanon.

    PubMed

    Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima

    2017-10-16

    Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  11. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

    PubMed

    Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-09-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.

  12. Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

    PubMed Central

    Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

    2011-01-01

    Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as a test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341

  13. SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

    PubMed

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models suited to alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
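
    To make the idea of an alignment-free comparison of tetranucleotide frequencies concrete, the Python sketch below counts the 256 possible tetranucleotides in two sequences and measures how far apart the resulting profiles are. It is a simplified stand-in, not SWPhylo's method: the use of plain relative frequencies on a single strand, the Euclidean distance, and the function names are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so profiles are comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Relative frequency of each tetranucleotide in one strand.

    Windows containing characters other than A, C, G or T are skipped.
    """
    counts = dict.fromkeys(KMERS, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotide")
    return [counts[k] / total for k in KMERS]


def profile_distance(seq_a, seq_b):
    """Euclidean distance between two tetranucleotide profiles
    (an assumed metric, chosen for simplicity)."""
    fa = tetranucleotide_frequencies(seq_a)
    fb = tetranucleotide_frequencies(seq_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


# Toy sequences (invented); real input would be whole genomes.
print(profile_distance("ACGTACGTACGT", "ACGTACGTACGT"))
print(profile_distance("AAAAA", "CCCCC"))
```

    A matrix of such pairwise distances could then be passed to any distance-based tree-building method, for example neighbor joining.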

  14. SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

    PubMed Central

    Yu, Xiaoyu; Reva, Oleg N

    2018-01-01

    Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, the unsuitability of standard phylogenetic tools for processing genome-wide data, and the lack of reliable substitution models suited to alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent with the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354

  15. Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.

    PubMed

    Ghanem, Mostafa; El-Gazzar, Mohamed

    2018-05-01

    Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus sequence typing (MLST) schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing has become close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based on whole genome sequences. A diverse set of 25 MS whole genome sequences was used to identify 302 core genome genes as cgMLST targets (35.5% of the MS genome), and 44 whole genome sequences of MS isolates from six countries on four continents were typed using this scheme. cgMLST-based phylogenetic trees displayed a high degree of agreement with core genome SNP-based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
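
    A cgMLST comparison comes down to comparing allele numbers locus by locus. The Python sketch below shows one common way to turn allele profiles into pairwise distances. The isolate names, locus names and allele numbers are invented, and the choice to ignore loci that are missing in either isolate is an assumption rather than a rule stated in the abstract.

```python
def allele_distance(profile_a, profile_b):
    """Number of loci at which two isolates carry different alleles.

    Each profile maps a locus name to an allele number, or to None when
    the locus could not be called. Loci missing in either profile are
    ignored (an assumed convention).
    """
    differences = 0
    for locus, allele_a in profile_a.items():
        allele_b = profile_b.get(locus)
        if allele_a is None or allele_b is None:
            continue
        if allele_a != allele_b:
            differences += 1
    return differences


def distance_matrix(profiles):
    """Pairwise allele distances for a dict of isolate name -> profile."""
    names = sorted(profiles)
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Invented toy profiles over three hypothetical loci.
profiles = {
    "isolate_1": {"locus_1": 1, "locus_2": 1, "locus_3": 2},
    "isolate_2": {"locus_1": 1, "locus_2": 1, "locus_3": 3},
    "isolate_3": {"locus_1": 2, "locus_2": None, "locus_3": 3},
}
matrix = distance_matrix(profiles)
print(matrix[("isolate_1", "isolate_2")])
print(matrix[("isolate_1", "isolate_3")])
```

    The resulting matrix can feed a minimum spanning tree or a clustering step to group isolates into putative outbreak clusters.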

  16. Imputation-Based Genomic Coverage Assessments of Current Human Genotyping Arrays

    PubMed Central

    Nelson, Sarah C.; Doheny, Kimberly F.; Pugh, Elizabeth W.; Romm, Jane M.; Ling, Hua; Laurie, Cecelia A.; Browning, Sharon R.; Weir, Bruce S.; Laurie, Cathy C.

    2013-01-01

    Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. “Genomic coverage” is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50–80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compare these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies. PMID:23979933
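
    The coverage calculation described in this record can be sketched in Python as follows: for each variant, compute the squared correlation between observed and imputed allelic dosages, then report the fraction of variants that reach a threshold. The threshold of 0.8, the function names, and the toy dosages are assumptions for illustration; the abstract does not give the cutoff used.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A constant vector carries no information about the other.
        return 0.0
    return sxy * sxy / (sxx * syy)


def genomic_coverage(observed, imputed, threshold=0.8):
    """Fraction of variants whose imputed dosages track the observed ones."""
    covered = sum(
        1
        for variant in observed
        if squared_correlation(observed[variant], imputed[variant]) >= threshold
    )
    return covered / len(observed)


# Toy dosages for four samples at two variants (hypothetical values).
observed = {"v1": [0, 1, 2, 1], "v2": [0, 0, 1, 2]}
imputed = {"v1": [0.1, 0.9, 1.9, 1.0], "v2": [1.0, 1.0, 1.0, 1.0]}
coverage = genomic_coverage(observed, imputed)
```

    In practice the calculation would be repeated per ancestry group and per minor allele frequency bin, as the record describes.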

  17. The nucleotide composition of microbial genomes indicates differential patterns of selection on core and accessory genomes.

    PubMed

    Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars

    2017-02-10

    The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
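
    One common way to express the difference between observed trinucleotide frequencies and those expected from mononucleotide frequencies is the Kullback-Leibler divergence, sketched below in Python. Treating relative entropy as this divergence, counting overlapping trinucleotides, and using base-2 logarithms are assumptions; the abstract does not spell out the exact formula.

```python
from collections import Counter
from math import log2

BASES = set("ACGT")


def relative_entropy(sequence):
    """Kullback-Leibler divergence (in bits) of observed trinucleotide
    frequencies from those expected under independent mononucleotides."""
    seq = sequence.upper()
    mono = Counter(base for base in seq if base in BASES)
    n_mono = sum(mono.values())
    tri = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Keep only trinucleotides made of unambiguous bases.
    tri = {word: count for word, count in tri.items() if set(word) <= BASES}
    n_tri = sum(tri.values())
    if n_tri == 0:
        raise ValueError("sequence has no unambiguous trinucleotides")
    total = 0.0
    for word, count in tri.items():
        observed = count / n_tri
        expected = 1.0
        for base in word:
            expected *= mono[base] / n_mono
        total += observed * log2(observed / expected)
    return total
```

    A value of zero means the trinucleotide composition is fully explained by base composition alone; larger values indicate more structure beyond base composition.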

  18. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
    A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802

  19. LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel

    Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well as three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.

  20. LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

    DOE PAGES

    Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel; ...

    2016-05-20

    Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well as three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.

  1. Comparative genomic analysis of Lactobacillus plantarum ZJ316 reveals its genetic adaptation and potential probiotic profiles

    PubMed Central

    Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-yu; Zhang, Xiao-mei; Song, Da-feng; Zhang, Chen

    2016-01-01

    Objective: In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. Methods: The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). Results: We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Conclusions: Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate. PMID:27487802

  2. Comparative genomic analysis of Lactobacillus plantarum ZJ316 reveals its genetic adaptation and potential probiotic profiles.

    PubMed

    Li, Ping; Li, Xuan; Gu, Qing; Lou, Xiu-Yu; Zhang, Xiao-Mei; Song, Da-Feng; Zhang, Chen

    2016-08-01

    In previous studies, Lactobacillus plantarum ZJ316 showed probiotic properties, such as antimicrobial activity against various pathogens and the capacity to significantly improve pig growth and pork quality. The purpose of this study was to reveal the genes potentially related to its genetic adaptation and probiotic profiles based on comparative genomic analysis. The genome sequence of L. plantarum ZJ316 was compared with those of eight L. plantarum strains deposited in GenBank. BLASTN, Mauve, and MUMmer programs were used for genome alignment and comparison. CRISPRFinder was applied for searching the clustered regularly interspaced short palindromic repeats (CRISPRs). We identified genes that encode proteins related to genetic adaptation and probiotic profiles, including carbohydrate transport and metabolism, proteolytic enzyme systems and amino acid biosynthesis, CRISPR adaptive immunity, stress responses, bile salt resistance, ability to adhere to the host intestinal wall, exopolysaccharide (EPS) biosynthesis, and bacteriocin biosynthesis. Comparative characterization of the L. plantarum ZJ316 genome provided the genetic basis for further elucidating the functional mechanisms of its probiotic properties. ZJ316 could be considered a potential probiotic candidate.

  3. Global Metabolic Reconstruction and Metabolic Gene Evolution in the Cattle Genome

    PubMed Central

    Kim, Woonsu; Park, Hyesun; Seo, Seongwon

    2016-01-01

    The sequence of the cattle genome provided a valuable opportunity to systematically link genetic and metabolic traits of cattle. The objectives of this study were 1) to reconstruct genome-scale cattle-specific metabolic pathways based on the most recent and updated cattle genome build and 2) to identify duplicated metabolic genes in the cattle genome for better understanding of metabolic adaptations in cattle. A bioinformatic pipeline for amalgamating an organism's genomic annotations from multiple sources was updated. Using this pipeline, an amalgamated cattle genome database based on UMD_3.1 was created. The amalgamated cattle genome database is composed of a total of 33,292 genes: 19,123 consensus genes between NCBI and Ensembl databases, 8,410 and 5,493 genes only found in NCBI or Ensembl, respectively, and 266 genes from NCBI scaffolds. A metabolic reconstruction of the cattle genome and cattle pathway genome database (PGDB) was also developed using Pathway Tools, followed by an intensive manual curation. The manual curation filled or revised 68 pathway holes, deleted 36 metabolic pathways, and added 23 metabolic pathways. Consequently, the curated cattle PGDB contains 304 metabolic pathways, 2,460 reactions including 2,371 enzymatic reactions, and 4,012 enzymes. Furthermore, this study identified eight duplicated genes in 12 metabolic pathways in the cattle genome compared to human and mouse. Some of these duplicated genes are related to specific hormone biosynthesis and detoxification. The updated genome-scale metabolic reconstruction is a useful tool for understanding biology and metabolic characteristics in cattle. There have been significant improvements in the quality of cattle genome annotations and the MetaCyc database. The duplicated metabolic genes in the cattle genome compared to human and mouse imply evolutionary changes in the cattle genome and provide useful information for further research on understanding metabolic adaptations of cattle. PMID:26992093

  4. A genomic view of food-related and probiotic Enterococcus strains.

    PubMed

    Bonacina, Julieta; Suárez, Nadia; Hormigo, Ricardo; Fadda, Silvina; Lechner, Marcus; Saavedra, Lucila

    2017-02-01

    The study of enterococcal genomes has grown considerably in recent years. While special attention is paid to comparative genomic analysis among clinically relevant isolates, in this study we performed an exhaustive comparative analysis of enterococcal genomes of food origin and/or with potential to be used as probiotics. Beyond common genetic features, we especially aimed to identify those that are specific to enterococcal strains isolated from a certain food-related source as well as features present in a species-specific manner. Thus, the genome sequences of 25 Enterococcus strains, from 7 different species, were examined and compared. Their phylogenetic relationship was reconstructed based on orthologous proteins and whole genomes. Likewise, markers associated with a successful colonization (bacteriocin genes and genomic islands) and genome plasticity (phages and clustered regularly interspaced short palindromic repeats) were investigated for lifestyle-specific genetic features. At the same time, a search for antibiotic resistance genes was carried out, since they are of great concern in the food industry. Finally, it was possible to locate 1617 FIGfam families as a core proteome universally present within the genus and to determine that most of the accessory genes code for hypothetical proteins, providing reasonable hints to support their functional characterization. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  5. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk

    PubMed Central

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B.; Huson, Daniel H.; Frick, Julia-Stefanie

    2016-01-01

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. PMID:27071651

  6. Genomic selection models double the accuracy of predicted breeding values for bacterial cold water disease resistance compared to a traditional pedigree-based model in rainbow trout aquaculture

    USDA-ARS's Scientific Manuscript database

    Previously we have shown that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative enabling exploitation...

  7. Genome-enabled selection doubles the accuracy of predicted breeding values for bacterial cold water disease resistance compared to traditional family-based selection in rainbow trout aquaculture

    USDA-ARS's Scientific Manuscript database

    We have shown previously that bacterial cold water disease (BCWD) resistance in rainbow trout can be improved using traditional family-based selection, but progress has been limited to exploiting only between-family genetic variation. Genomic selection (GS) is a new alternative enabling exploitation...

  8. The Comprehensive Phytopathogen Genomics Resource: a web-based resource for data-mining plant pathogen genomes.

    PubMed

    Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin

    2011-01-01

    The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database, which provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.

  9. Genomecmp: computer software to detect genomic rearrangements using markers

    NASA Astrophysics Data System (ADS)

    Kulawik, Maciej; Nowak, Robert M.

    2017-08-01

    Detecting genomic rearrangements is a difficult task because of the size of the data to be processed. As genome sequences may consist of hundreds of millions of symbols, comparing them by hand is practically impossible, and the comparison is also a complex problem for computer software. One way to significantly accelerate the process is to use a rearrangement detection algorithm based on unique short sequences called markers. The algorithm described in this paper derives markers from a base genome and finds the positions of those markers in another genome. The algorithm has been extended with support for ambiguity symbols. A web application with a graphical user interface has been created using a three-layer architecture, in which users can run tasks simultaneously. The accuracy and efficiency of the proposed solution have been studied using generated and real data.
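
    The marker idea in this record can be sketched in a few lines of Python. The sketch takes every k-mer that occurs exactly once in the base genome as a marker, looks up where each marker occurs in the other genome, and counts places where neighbouring markers appear in reversed order. This is an illustrative simplification and not the Genomecmp implementation: the function names, the requirement that a marker also be unique in the second genome, and the lack of ambiguity-symbol and reverse-strand handling are all assumptions.

```python
from collections import Counter


def unique_markers(genome, k):
    """Map each k-mer that occurs exactly once in the genome to its position."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return {kmer: i for i, kmer in enumerate(kmers) if counts[kmer] == 1}


def marker_positions(base, other, k):
    """Pairs (position in base, position in other), sorted by base position."""
    base_markers = unique_markers(base, k)
    other_markers = unique_markers(other, k)
    pairs = [
        (pos, other_markers[marker])
        for marker, pos in base_markers.items()
        if marker in other_markers
    ]
    return sorted(pairs)


def order_breaks(pairs):
    """Count neighbouring markers whose order is reversed in the other genome."""
    return sum(1 for (_, a), (_, b) in zip(pairs, pairs[1:]) if b < a)


# Toy example: the two halves of the base genome are swapped in the other one.
pairs = marker_positions("AACCGGTT", "GGTTAACC", 4)
breaks = order_breaks(pairs)
```

    Working with marker positions rather than full sequences is what keeps the comparison tractable for genomes of hundreds of millions of symbols.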

  10. Self-similarity analysis of eubacteria genome based on weighted graph.

    PubMed

    Qi, Zhao-Hui; Li, Ling; Zhang, Zhi-Meng; Qi, Xiao-Qin

    2011-07-07

    We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. The usual purpose of similarity comparison between genomes is to discover the evolutionary distance among different genomes. Few studies focus on the overall statistical characteristics of each gene compared with other genes in the same genome. In our model, each genome is represented by a weighted graph whose topology describes the similarity relationship among genes in the same genome. Based on the related weighted graph theory, we extract some quantified statistical variables from the topology, and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two different groups by their double logarithmic point-plots describing the similarity relationship among genes of the largest social structure in the genome. The results show that the proposed model may provide some new insights into the structures and evolution patterns determined from the complete genomes. Copyright © 2011 Elsevier Ltd. All rights reserved.

  11. Comparative Analysis of Transcription Factors Families across Fungal Tree of Life

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Salamov, Asaf; Grigoriev, Igor

    2015-03-19

    Transcription factors (TFs) are proteins that regulate the transcription of genes by binding to specific DNA sequences. Based on the literature (Shelest, 2008; Weirauch and Hughes, 2011), we collected and manually curated a list of DNA-binding domain (DBD) Pfam domains (62 DBD domains in total). We looked for the distribution of TFs in 395 fungal genomes and, additionally, in plant genomes (Phytozome), prokaryotes (IMG), and some animal/metazoan and protist genomes.

  12. Draft Genome Sequences of New Genomospecies "Candidatus Pectobacterium maceratum" Strains, Which Cause Soft Rot in Plants.

    PubMed

    Shirshikov, Fedor V; Korzhenkov, Aleksei A; Miroshnikov, Kirill K; Kabanova, Anastasia P; Barannik, Alla P; Ignatov, Alexander N; Miroshnikov, Konstantin A

    2018-04-12

    Investigation of collections of phytopathogenic bacteria has revealed some strains distinct from known Pectobacterium spp. We report here the draft genome sequences of five such strains, isolated during the period of 1947 to 2012. Based on comparative genomics, we propose a new candidate genomospecies of the genus Pectobacterium, "Candidatus Pectobacterium maceratum." Copyright © 2018 Shirshikov et al.

  13. Draft Genome Sequences of New Genomospecies “Candidatus Pectobacterium maceratum” Strains, Which Cause Soft Rot in Plants

    PubMed Central

    2018-01-01

    ABSTRACT Investigation of collections of phytopathogenic bacteria has revealed some strains distinct from known Pectobacterium spp. We report here the draft genome sequences of five such strains, isolated during the period of 1947 to 2012. Based on comparative genomics, we propose a new candidate genomospecies of the genus Pectobacterium, “Candidatus Pectobacterium maceratum.” PMID:29650577

  14. Genome-wide array-based comparative genomic hybridization (array-CGH) analysis in Aicardi Syndrome

    USDA-ARS's Scientific Manuscript database

    Aicardi syndrome is characterized by agenesis of the corpus callosum, chorioretinal lacunae, severe seizures (starting as infantile spasms), neuronal migration defects, mental retardation, costovertebral defects, and typical facial features. Because Aicardi syndrome is sporadic and affects only fem...

  15. Chloroplast Genome Evolution in Early Diverged Leptosporangiate Ferns

    PubMed Central

    Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong

    2014-01-01

    In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data is in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns. PMID:24823358

  16. Chloroplast genome evolution in early diverged leptosporangiate ferns.

    PubMed

    Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong

    2014-05-01

    In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarities between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data is in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns.

  17. A locally funded Puerto Rican parrot (Amazona vittata) genome sequencing project increases avian data and advances young researcher education

    PubMed Central

    2012-01-01

    Background Amazona vittata is a critically endangered Puerto Rican endemic bird, the only surviving native parrot species in the United States territory, and the first parrot in the large Neotropical genus Amazona, to be studied on a genomic scale. Findings In a unique community-based funded project, DNA from an A. vittata female was sequenced using a HiSeq Illumina platform, resulting in a total of ~42.5 billion nucleotide bases. This provided approximately 26.89x average coverage depth at the completion of this funding phase. Filtering followed by assembly resulted in 259,423 contigs (N50 = 6,983 bp, longest = 75,003 bp), which was further scaffolded into 148,255 fragments (N50 = 19,470, longest = 206,462 bp). This provided ~76% coverage of the genome based on an estimated size of 1.58 Gb. The assembled scaffolds allowed basic genomic annotation and comparative analyses with other available avian whole-genome sequences. Conclusions The current data represents the first genomic information from and work carried out with a unique source of funding. This analysis further provides a means for directed training of young researchers in genetic and bioinformatics analyses and will facilitate progress towards a full assembly and annotation of the Puerto Rican parrot genome. It also adds extensive genomic data to a new branch of the avian tree, making it useful for comparative analyses with other avian species. Ultimately, the knowledge acquired from these data will contribute to an improved understanding of the overall population health of this species and aid in ongoing and future conservation efforts. PMID:23587420
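
    The assembly statistics quoted in this record follow from two simple calculations, sketched here in Python. Average coverage depth is the number of sequenced bases divided by the estimated genome size, and N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the assembly. The function names and the contig lengths in the example are hypothetical; only the 42.5 billion bases and the 1.58 Gb genome size come from the abstract.

```python
def coverage_depth(total_bases, genome_size):
    """Average number of times each genome position was sequenced."""
    return total_bases / genome_size


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")


# Figures from the abstract: ~42.5 billion bases, estimated 1.58 Gb genome.
depth = coverage_depth(42.5e9, 1.58e9)

# Hypothetical contig lengths, used only to show how N50 is read off.
example_n50 = n50([10, 8, 6, 4, 2])
```

    With the rounded inputs above, the depth comes out at roughly 26.9x, in line with the approximately 26.89x reported.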

  18. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionarily related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  19. Comparison of phasing strategies for whole human genomes

    PubMed Central

    Kirkness, Ewen; Schork, Nicholas J.

    2018-01-01

    Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. 
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density. PMID:29621242
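
    One of the metrics named in this record, the switch error rate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation code used in the study: each haplotype is given as a string of 0/1 alleles at the same ordered heterozygous sites, and the function name `switch_error_rate` is hypothetical. A switch is counted wherever the predicted phase flips relative to the truth between two adjacent sites.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous-site pairs at which the predicted
    phase flips relative to the truth haplotype.

    Both arguments hold the alleles carried by one haplotype ('0' or '1')
    at the same ordered heterozygous sites.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        raise ValueError("need at least two sites to define a switch")
    # agree[i] is True when the prediction matches the truth at site i.
    agree = [t == p for t, p in zip(truth, predicted)]
    # A switch occurs whenever agreement changes between neighbouring sites.
    switches = sum(1 for prev, cur in zip(agree, agree[1:]) if prev != cur)
    return switches / (len(truth) - 1)


# The prediction follows the truth for four sites, then flips once.
print(switch_error_rate("01011010", "01010101"))  # 1 switch over 7 pairs

# A fully complementary prediction is the same phasing with the two
# haplotype labels swapped, so it contains no switch errors.
print(switch_error_rate("0101", "1010"))          # 0.0
```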

  20. Mutation Detection with Next-Generation Resequencing through a Mediator Genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wurtzel, Omri; Dori-Bachash, Mally; Pietrokovski, Shmuel

    2010-12-31

    The affordability of next generation sequencing (NGS) is transforming the field of mutation analysis in bacteria. The genetic basis for phenotype alteration can be identified directly by sequencing the entire genome of the mutant and comparing it to the wild-type (WT) genome, thus identifying acquired mutations. A major limitation for this approach is the need for an a-priori sequenced reference genome for the WT organism, as the short reads of most current NGS approaches usually prohibit de-novo genome assembly. To overcome this limitation we propose a general framework that utilizes the genome of related organisms as mediators for comparing WT and mutant bacteria. Under this framework, both mutant and WT genomes are sequenced with NGS, and the short sequencing reads are mapped to the mediator genome. Variations between the mutant and the mediator that recur in the WT are ignored, thus pinpointing the differences between the mutant and the WT. To validate this approach we sequenced the genome of Bdellovibrio bacteriovorus 109J, an obligatory bacterial predator, and its prey-independent mutant, and compared both to the mediator species Bdellovibrio bacteriovorus HD100. Although the mutant and the mediator sequences differed in more than 28,000 nucleotide positions, our approach enabled pinpointing the single causative mutation. Experimental validation in 53 additional mutants further established the implicated gene. Our approach extends the applicability of NGS-based mutant analyses beyond the domain of available reference genomes.
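
    The filtering step described in this record (ignore mutant-versus-mediator differences that recur in the WT) amounts to a set difference. The sketch below only illustrates that idea and is not the authors' pipeline: the function name `mutant_specific_variants`, the `(position, mediator_base, observed_base)` tuple layout, and the toy variant calls are all assumptions.

```python
def mutant_specific_variants(mutant_vs_mediator, wt_vs_mediator):
    """Keep variants seen in the mutant (relative to the mediator genome)
    that are NOT also seen in the wild type relative to the same mediator.

    Each variant is a hypothetical (position, mediator_base, observed_base)
    tuple; both call sets must use mediator coordinates.
    """
    return sorted(set(mutant_vs_mediator) - set(wt_vs_mediator))


# Toy calls: two differences are shared strain background, one is
# private to the mutant.
wt_calls = [(120, "A", "G"), (4510, "C", "T")]
mutant_calls = [(120, "A", "G"), (4510, "C", "T"), (9876, "G", "A")]

print(mutant_specific_variants(mutant_calls, wt_calls))
# [(9876, 'G', 'A')]
```

    In practice the two call sets would come from mapping each read set to the mediator genome, and positions with too little read support in either sample would need separate handling.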

  1. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method used is based on earlier studies reporting the assessment of conserved intron scanning primers (called CISP) within relatively conserved exons located near exon-intron boundaries from onion, banana, sorghum and pearl millet alignments with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.

  2. Towards the delineation of the ancestral eutherian genome organization: comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting.

    PubMed Central

    Frönicke, Lutz; Wienberg, Johannes; Stone, Gary; Adams, Lisa; Stanyon, Roscoe

    2003-01-01

    This study presents a whole-genome comparison of human and a representative of the Afrotherian clade, the African elephant, generated by reciprocal Zoo-FISH. An analysis of Afrotheria genomes is of special interest, because recent DNA sequence comparisons identify them as the oldest placental mammalian clade. Complete sets of whole-chromosome specific painting probes for the African elephant and human were constructed by degenerate oligonucleotide-primed PCR amplification of flow-sorted chromosomes. Comparative genome maps are presented based on their hybridization patterns. These maps show that the elephant has a moderately rearranged chromosome complement when compared to humans. The human paint probes identified 53 evolutionary conserved segments on the 27 autosomal elephant chromosomes and the X chromosome. Reciprocal experiments with elephant probes delineated 68 conserved segments in the human genome. The comparison with a recent aardvark and elephant Zoo-FISH study delineates new chromosomal traits which link the two Afrotherian species phylogenetically. In the absence of any morphological evidence the chromosome painting data offer the first non-DNA sequence support for an Afrotherian clade. The comparative human and elephant genome maps provide new insights into the karyotype organization of the proto-afrotherian, the ancestor of extant placental mammals, which most probably consisted of 2n=46 chromosomes. PMID:12965023

  3. Complete genome sequence of the fire blight pathogen Erwinia pyrifoliae DSM 12163T and comparative genomic insights into plant pathogenicity

    PubMed Central

    2010-01-01

    Background Erwinia pyrifoliae is a newly described necrotrophic pathogen, which causes fire blight on Asian (Nashi) pear and is geographically restricted to Eastern Asia. Relatively little is known about its genetics compared to the closely related main fire blight pathogen E. amylovora. Results The genome of the type strain of E. pyrifoliae strain DSM 12163T, was sequenced using both 454 and Solexa pyrosequencing and annotated. The genome contains a circular chromosome of 4.026 Mb and four small plasmids. Based on their respective role in virulence in E. amylovora or related organisms, we identified several putative virulence factors, including type III and type VI secretion systems and their effectors, flagellar genes, sorbitol metabolism, iron uptake determinants, and quorum-sensing components. A deletion in the rpoS gene covering the most conserved region of the protein was identified which may contribute to the difference in virulence/host-range compared to E. amylovora. Comparative genomics with the pome fruit epiphyte Erwinia tasmaniensis Et1/99 showed that both species are overall highly similar, although specific differences were identified, for example the presence of some phage gene-containing regions and a high number of putative genomic islands containing transposases in the E. pyrifoliae DSM 12163T genome. Conclusions The E. pyrifoliae genome is an important addition to the published genome of E. tasmaniensis and the unfinished genome of E. amylovora providing a foundation for re-sequencing additional strains that may shed light on the evolution of the host-range and virulence/pathogenicity of this important group of plant-associated bacteria. PMID:20047678

  4. Complete chloroplast genome sequence and comparative analysis of loblolly pine (Pinus taeda L.) with related species

    PubMed Central

    Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2018-01-01

    Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414

  5. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  6. Comparative Analysis of Four Buckwheat Species Based on Morphology and Complete Chloroplast Genome Sequences.

    PubMed

    Wang, Cheng-Long; Ding, Meng-Qi; Zou, Chen-Yan; Zhu, Xue-Mei; Tang, Yu; Zhou, Mei-Liang; Shao, Ji-Rong

    2017-07-26

    Buckwheat is a nutritious and economically important crop belonging to the genus Fagopyrum (Polygonaceae). To better understand the mutation patterns and evolutionary trends in the chloroplast (cp) genome of buckwheat, and to find a sufficient number of variable regions for exploring the phylogenetic relationships of this genus, two complete cp genomes, those of Fagopyrum dibotrys (F. dibotrys) and Fagopyrum luojishanense (F. luojishanense), were sequenced, and two other Fagopyrum cp genomes were used for comparative analysis. Morphological analysis showed that the main differences among these buckwheat species were height, leaf shape, seeds and flower type. F. luojishanense was easily distinguishable from the cultivated species. Although F. dibotrys and the two cultivated species have some similarities, they differ in habit and component contents. The cp genome of F. dibotrys was 159,320 bp, while that of F. luojishanense was 159,265 bp. We found 48 and 61 SSRs in F. dibotrys and F. luojishanense, respectively. In addition, 10 highly variable regions among these buckwheat species were located precisely. The phylogenetic relationships among the four Fagopyrum species were inferred from the complete cp genomes. The results suggested that F. dibotrys is more closely related to Fagopyrum tataricum. These data provide valuable genetic information for Fagopyrum species identification, taxonomy, phylogenetic study and molecular breeding.

  7. Detecting microsatellites within genomes: significant variation among algorithms.

    PubMed

    Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

    2007-04-18

    Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
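
    To make concrete how a user-defined parameter changes what counts as a microsatellite, here is a minimal regular-expression sketch for perfect repeats only. It is not the algorithm of TRF, Mreps, Sputnik, STAR or RepeatMasker; the function names and the defaults (`min_repeats=4`, motifs of 1 to 6 bp) are assumptions chosen for illustration.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit
    (e.g. 'AC' is primitive, 'ACAC' and 'AA' are not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_microsatellites(sequence, min_repeats=4, max_motif_len=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect
    tandem repeats of motifs 1..max_motif_len bp long."""
    sequence = sequence.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # One motif of length k followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


seq = "GGG" + "AC" * 6 + "TTT"
print(find_perfect_microsatellites(seq, min_repeats=4))
# [(3, 'AC', 6)]
print(find_perfect_microsatellites(seq, min_repeats=3))
# [(0, 'G', 3), (3, 'AC', 6), (15, 'T', 3)]
print(find_perfect_microsatellites(seq, min_repeats=7))
# []
```

    The same sequence yields one, three or zero loci depending only on the repeat threshold, which is the kind of parameter sensitivity the abstract reports; imperfect microsatellites would need an approach that tolerates mismatches and indels.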

  8. Detecting microsatellites within genomes: significant variation among algorithms

    PubMed Central

    Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

    2007-01-01

    Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102

  9. Draft sequencing and comparative genomics of Xylella fastidiosa strains reveal novel biological insights.

    PubMed

    Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F

    2002-10-01

    Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.

  10. SMART precision cancer medicine: a FHIR-based app to provide genomic information at the point of care.

    PubMed

    Warner, Jeremy L; Rioth, Matthew J; Mandl, Kenneth D; Mandel, Joshua C; Kreda, David A; Kohane, Isaac S; Carbone, Daniel; Oreto, Ross; Wang, Lucy; Zhu, Shilin; Yao, Heming; Alterovitz, Gil

    2016-07-01

    Precision cancer medicine (PCM) will require ready access to genomic data within the clinical workflow and tools to assist clinical interpretation and enable decisions. Since most electronic health record (EHR) systems do not yet provide such functionality, we developed an EHR-agnostic, clinico-genomic mobile app to demonstrate several features that will be needed for point-of-care conversations. Our prototype, called Substitutable Medical Applications and Reusable Technology (SMART)® PCM, visualizes genomic information in real time, comparing a patient's diagnosis-specific somatic gene mutations detected by PCR-based hotspot testing to a population-level set of comparable data. The initial prototype works for patient specimens with 0 or 1 detected mutation. Genomics extensions were created for the Health Level Seven® Fast Healthcare Interoperability Resources (FHIR)® standard; otherwise, the prototype is a normal SMART on FHIR app. The PCM prototype can rapidly present a visualization that compares a patient's somatic genomic alterations against a distribution built from more than 3000 patients, along with context-specific links to external knowledge bases. Initial evaluation by oncologists provided important feedback about the prototype's strengths and weaknesses. We added several requested enhancements and successfully demonstrated the app at the inaugural American Society of Clinical Oncology Interoperability Demonstration; we have also begun to expand visualization capabilities to include cancer specimens with multiple mutations. PCM is open-source software for clinicians to present the individual patient within the population-level spectrum of cancer somatic mutations. The app can be implemented on any SMART on FHIR-enabled EHRs, and future versions of PCM should be able to evolve in parallel with external knowledge bases. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. 

  11. A Trichosporonales genome tree based on 27 haploid and three evolutionarily conserved 'natural' hybrid genomes.

    PubMed

    Takashima, Masako; Sriswasdi, Sira; Manabe, Ri-Ichiroh; Ohkuma, Moriya; Sugita, Takashi; Iwasaki, Wataru

    2018-01-01

    To construct a backbone tree consisting of basidiomycetous yeasts, draft genome sequences from 25 species of Trichosporonales (Tremellomycetes, Basidiomycota) were generated. In addition to the hybrid genomes of Trichosporon coremiiforme and Trichosporon ovoides that we described previously, we identified an interspecies hybrid genome in Cutaneotrichosporon mucoides (formerly Trichosporon mucoides). This hybrid genome had a gene retention rate of ~55%, and its closest haploid relative was Cutaneotrichosporon dermatis. After constructing the C. mucoides subgenomes, we generated a phylogenetic tree using genome data from the 27 haploid species and the subgenome data from the three hybrid genome species. It was a high-quality tree with 100% bootstrap support for all of the branches. The genome-based tree provided superior resolution compared with previous multi-gene analyses. Although our backbone tree does not include all Trichosporonales genera (e.g. Cryptotrichosporon), it will be valuable for future analyses of genome data. Interest in interspecies hybrid fungal genomes has recently increased because they may provide a basis for new technologies. The three Trichosporonales hybrid genomes described in this study are different from well-characterized hybrid genomes (e.g. those of Saccharomyces pastorianus and Saccharomyces bayanus) because these hybridization events probably occurred in the distant evolutionary past. Hence, they will be useful for studying genome stability following hybridization and speciation events. Copyright © 2017 John Wiley & Sons, Ltd.

  12. An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

    PubMed Central

    2012-01-01

    Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a basis for genomics strategies aimed at understanding the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps are essential genomic screening tools that allow quantitative trait loci (QTL) to be localized and candidate genes to be identified through comparative mapping. This information is the backbone for developing marker-assisted selection (MAS) programs in aquaculture. Expressed sequence tag (EST) resources have greatly increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 of them EST-linked, clustered in 24 linkage groups. Linkage and comparative genomics evidence suggested additional linkage group fusions toward the consolidation of the turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with a low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although it varied widely among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, with significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and helped to assign unallocated markers to specific turbot linkage groups. 
Conclusions The new gene-enriched high-resolution turbot map represents a useful genomic tool for QTL identification, positional cloning strategies, and future genome assembling. This map showed large synteny conservation against model teleost genomes. Comparative genomics and data mining from landmarks will provide straightforward access to candidate genes, which will be the basis for genetic breeding programs and evolutionary studies in this species. PMID:22747677

  13. The complete mitochondrial genome of eastern lowland gorilla, Gorilla beringei graueri, and comparative mitochondrial genomics of Gorilla species.

    PubMed

    Hu, Xiao-di; Gao, Li-zhi

    2016-01-01

    In this study, we determined the complete mitochondrial (mt) genome of the eastern lowland gorilla, Gorilla beringei graueri, for the first time. The total genome was 16,416 bp in length. It contained a total of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region (D-loop region). The base composition was A (30.88%), G (13.10%), C (30.89%) and T (25.13%), indicating that the percentage of A+T (56.01%) was higher than G+C (43.99%). Comparisons with the other publicly available Gorilla mitogenome showed conservation of gene order and base composition but considerable nucleotide diversity. This complete mitochondrial genome sequence will provide valuable genetic information for further studies on conservation genetics of the eastern lowland gorilla.

  14. WormBase 2016: expanding to enable helminth genomic research.

    PubMed

    Howe, Kevin L; Bolt, Bruce J; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W

    2016-01-04

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. The cacao Criollo genome v2.0: an improved version of the genome for genetic and functional genomic studies.

    PubMed

    Argout, X; Martin, G; Droc, G; Fouet, O; Labadie, K; Rivals, E; Aury, J M; Lanaud, C

    2017-09-15

    Theobroma cacao L., native to the Amazonian basin of South America, is an economically important fruit tree crop for tropical countries as a source of chocolate. The first draft genome of the species, from a Criollo cultivar, was published in 2011. Although a useful resource, some improvements are possible, including identifying misassemblies, reducing the number of scaffolds and gaps, and anchoring un-anchored sequences to the 10 chromosomes. We used an NGS-based approach to significantly improve the assembly of the Belizean Criollo B97-61/B2 genome. We combined four Illumina large insert size mate paired libraries with 52x of Pacific Biosciences long reads to correct misassembled regions and reduce the number of scaffolds. We then used genotyping by sequencing (GBS) methods to increase the proportion of the assembly anchored to chromosomes. The scaffold number decreased from 4,792 in assembly V1 to 554 in V2, while the scaffold N50 size increased from 0.47 Mb in V1 to 6.5 Mb in V2. A total of 96.7% of the assembly was anchored to the 10 chromosomes compared to 66.8% in the previous version. Unknown sites (Ns) were reduced from 10.8% to 5.7%. In addition, we updated the functional annotations and performed a new RefSeq structural annotation based on RNAseq evidence. Theobroma cacao Criollo genome version 2 will be a valuable resource for the investigation of complex traits at the genomic level and for future comparative genomics and genetics studies in the cacao tree. New functional tools and annotations are available on the Cocoa Genome Hub ( http://cocoa-genome-hub.southgreen.fr ).

  16. Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

    PubMed Central

    2012-01-01

    Background: Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results: The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. 
With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared to 17 (18%) of 96 SSRs. Conclusion: This work reports on the use of a wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large number of sequences were generated from the A genome of hexaploid wheat for comparative genome analysis with the homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
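
    The summary statistics in the record above are simple ratios of the reported totals. The following minimal Python sketch recomputes them using only the numbers quoted in the abstract; the variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (record 16).
total_bp = 11_014_359      # high-quality BAC-end sequence, in bp
bac_ends = 17_591          # number of BAC-end reads
ssr_count = 1_057          # simple sequence repeats found
te_junctions = 7_928       # TE junctions found


def density_kb(sequence_bp: int, n_features: int) -> float:
    """Average spacing between features, in kb of sequence per feature."""
    return sequence_bp / n_features / 1000


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, as a percentage."""
    return 100 * part / whole


mean_read_len = total_bp / bac_ends                    # average BAC-end length in bp
ssr_density = density_kb(total_bp, ssr_count)          # kb per SSR
junction_density = density_kb(total_bp, te_junctions)  # kb per TE junction
isbp_specific = percent(28, 96)                        # 3A-specific ISBP primer pairs
ssr_specific = percent(17, 96)                         # 3A-specific SSR primer pairs

print(f"mean read length: {mean_read_len:.0f} bp")
print(f"one SSR per {ssr_density:.1f} kb, one TE junction per {junction_density:.2f} kb")
print(f"3A-specific: ISBP {isbp_specific:.0f}%, SSR {ssr_specific:.0f}%")
```

    The rounded values agree with the figures reported in the abstract (626 bp, 10.4 kb, 1.39 kb, 29% and 18%).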

  17. The ‘thousand-dollar genome’: an ethical exploration

    PubMed Central

    Dondorp, Wybo J; de Wert, Guido M W R

    2013-01-01

    Sequencing an individual's complete genome is expected to be possible for a relatively low sum, ‘one thousand dollars’, within a few years. Sequencing refers to determining the order of base pairs that make up the genome. The result is a library of three billion letter combinations. Cheap whole-genome sequencing is of greatest importance to medical scientific research. Comparing individual complete genomes will lead to a better understanding of the contribution genetic variation makes to health and disease. As knowledge increases, the ‘thousand-dollar genome’ will also become increasingly important to healthcare. The applications that come within reach raise a number of ethical questions. This monitoring report addresses the issue. PMID:23677179

  18. An integrative approach to energy, carbon, and redox metabolism in the cyanobacterium Synechocystis sp. PCC 6803

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Overbeek, Ross; Fonstein, Veronika; Osterman, Andrei

    2005-02-15

    The team of the Fellowship for Interpretation of Genomes (FIG), under the leadership of Ross Overbeek, began working on this Project in November 2003. During the previous year, the Project was performed at Integrated Genomics Inc. A transition from the industrial environment to the public domain prompted us to adjust some aspects of the Project. Notwithstanding the challenges, we believe that these adjustments had a strong positive impact on our deliverables. Most importantly, the work of the research team led by R. Overbeek resulted in the deployment of a new open source genomic platform, the SEED (Specific Aim 1). This platform provided a foundation for the development of CyanoSEED, a specialized portal to comparative analysis and metabolic reconstruction of all available cyanobacterial genomes (Specific Aim 3). The SEED represents a new generation of software for genome analysis. Briefly, it is a portable and extendable system, containing one of the largest and permanently growing collections of complete and partial genomes. The complete system with annotations and tools is freely available via browsing or via installation on a user's Mac or Linux computer. One of the important unique features of the SEED is the support of metabolic reconstruction and comparative genome analysis via encoding and projection of functional subsystems. During the project period, the FIG research team has validated the new software by developing a significant number of core subsystems, covering many aspects of central metabolism (Specific Aim 2), as well as metabolic areas specific for cyanobacteria and other photoautotrophic organisms (Specific Aim 3). In addition to providing a proof of technology and a starting point for further community-based efforts, these subsystems represent a valuable asset. An extensive coverage of central metabolism provides the bulk of information required for metabolic modeling in Synechocystis sp. PCC 6803. 
Detailed analysis of several subsystems covering energy, carbon, and redox metabolism in the Synechocystis sp. PCC 6803 and other cyanobacteria has been performed (Specific Aim 4). The main objectives for this year (adjusted to reflect a new, public domain, setting of the Project research team) were: Aim 1. To develop, test, and deploy a new open source system, the SEED, for integrating community-based annotation, and comparative analysis of all publicly available microbial genomes. Develop a comprehensive genomic database by integrating within SEED all publicly available complete and nearly complete genome sequences with special emphasis on genomes of cyanobacteria, phototrophic eukaryotes, and anoxygenic phototrophic bacteria--invaluable for comparative genomic studies of energy and carbon metabolism in Synechocystis sp. PCC 6803. Aim 2. To develop the SEED's biological content in the form of a collection of encoded Subsystems largely covering the conserved cellular machinery in prokaryotes (and central metabolic machinery in eukaryotes). Aim 3. To develop, utilizing core SEED technology, the CyanoSEED--a specialized WEB portal for community-based annotation, and comparative analysis of all publicly available cyanobacterial genomes. Encode the set of additional subsystems representing key metabolic transformations in cyanobacteria and other photoautotrophs. We envisioned this resource as complementary to other public access databases for comparative genomic analysis currently available to the cyanobacterial research community. Aim 4. Perform in-depth analysis of several subsystems covering energy, carbon, and redox metabolism in the Synechocystis sp. PCC 6803 and all other cyanobacteria with available genome sequences. Reveal inconsistencies and gaps in the current knowledge of these subsystems. Use functional and genome context analysis tools in CyanoSEED to predict, whenever possible, candidate genes for inferred functional roles. 
To disseminate freely these conjectures and predictions by publishing them on CyanoSEED (http://cyanoseed.thefig.info/) and the Subsystems Forum (http://brucella.uchicago.edu/SubsystemForum/) in order to facilitate experimental analysis by our collaborator on this Project and by other experimentalists working in various fields of cyanobacterial physiology and biotechnology.

  19. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations.

    PubMed

    Cerqueira, Gustavo C; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequenced tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.

  20. [Technology of analysis of epigenetic and structural changes of epithelial tumors genome with NotI-microarrays by the example of human chromosome].

    PubMed

    Pavlova, T V; Kashuba, V I; Muravenko, O V; Yenamandra, S P; Ivanova, T A; Zabarovskaia, V I; Rakhmanaliev, E R; Petrenko, L A; Pronina, I V; Loginov, V I; Iurkevich, O Iu; Kiselev, L L; Zelenin, A V; Zabarovskiĭ, E R

    2009-01-01

    A new comparative genome hybridization technology based on NotI-microarrays is presented (Karolinska Institute International Patent WO02/086163). The method is based on comparative genome hybridization of NotI-probes from tumor and normal genomic DNA, using the principle of new DNA NotI-microarrays. Using this method, 181 NotI linking loci from human chromosome 3 were analyzed in 200 malignant tumor samples from different organs: kidney, lung, breast, ovary, cervix, prostate. Aberrations (deletions, methylation) were identified most frequently (in more than 30% of cases) in NotI-sites located in the MINT24, BHLHB2, RPL15, RARbeta1, ITGA9, RBSP3, VHL and ZIC4 genes, which suggests that these genes are probably involved in cancer development. Methylation of these genomic loci was confirmed by methylation-specific PCR and bisulfite sequencing. The results demonstrate the promise of this method for solving some oncogenomic problems.

  1. Clarification of Taxonomic Status within the Pseudomonas syringae Species Group Based on a Phylogenomic Analysis.

    PubMed

    Gomila, Margarita; Busquets, Antonio; Mulet, Magdalena; García-Valdés, Elena; Lalucat, Jorge

    2017-01-01

    The Pseudomonas syringae phylogenetic group comprises 15 recognized bacterial species and more than 60 pathovars. The classification and identification of strains is relevant for practical reasons but also for understanding the epidemiology and ecology of this group of plant pathogenic bacteria. Genome-based taxonomic analyses have been introduced recently to clarify the taxonomy of the whole genus. A set of 139 draft and complete genome sequences of strains belonging to all species of the P. syringae group available in public databases were analyzed, together with the genomes of closely related species used as outgroups. Comparative genomics based on the genome sequences of the species type strains in the group allowed the delineation of phylogenomic species and demonstrated that a high proportion of strains included in the study are misclassified. Furthermore, representatives of at least 7 putative novel species were detected. It was also confirmed that P. ficuserectae, P. meliae, and P. savastanoi are later synonyms of P. amygdali and that "P. coronafaciens" should be revived as a nomenspecies.

  2. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    PubMed Central

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 streptomycete strains downloaded from the NCBI. In the phylogenetic tree, the thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and a multiple-source subgroup. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  3. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk.

    PubMed

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B; Huson, Daniel H; Frick, Julia-Stefanie

    2016-04-25

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. Comparison of Two Capillary Gel Electrophoresis Systems for Clostridium difficile Ribotyping, Using a Panel of Ribotype 027 Isolates and Whole-Genome Sequences as a Reference Standard

    PubMed Central

    Xiao, Meng; Kong, Fanrong; Jin, Ping; Wang, Qinning; Xiao, Kelin; Jeoffreys, Neisha; James, Gregory

    2012-01-01

    PCR ribotyping is the most commonly used Clostridium difficile genotyping method, but its utility is limited by lack of standardization. In this study, we analyzed four published whole genomes and tested an international collection of 21 well-characterized C. difficile ribotype 027 isolates as the basis for comparison of two capillary gel electrophoresis (CGE)-based ribotyping methods. There were unexpected differences between the 16S-23S rRNA intergenic spacer region (ISR) allelic profiles of the four ribotype 027 genomes, but six bands were identified in all four and a seventh in three genomes. All seven bands and another, not identified in any of the whole genomes, were found in all 21 isolates. We compared sequencer-based CGE (SCGE) with three different primer pairs to the Qiagen QIAxcel CGE (QCGE) platform. Deviations from individual reference/consensus band sizes were smaller for SCGE (0 to 0.2 bp) than for QCGE (4.2 to 9.5 bp). Compared with QCGE, SCGE more readily distinguished bands of similar length (more discriminatory), detected bands of larger size and lower intensity (more sensitive), and assigned band sizes more accurately and reproducibly, making it more suitable for standardization. Specifically, QCGE failed to identify the largest ISR amplicon. Based on several criteria, we recommend the primer set 16S-USA/23S-USA for use in a proposed standard SCGE method. Similar differences between SCGE and QCGE were found on testing of 14 isolates of four other C. difficile ribotypes. Based on our results, ISR profiles based on accurate sequencer-based band lengths would be preferable to agarose gel-based banding patterns for the assignment of ribotypes. PMID:22692737

  5. Phylogenetics of modern birds in the era of genomics

    PubMed Central

    Edwards, Scott V; Bryan Jennings, W; Shedlock, Andrew M

    2005-01-01

    In the 14 years since the first higher-level bird phylogenies based on DNA sequence data, avian phylogenetics has witnessed the advent and maturation of the genomics era, the completion of the chicken genome and a suite of technologies that promise to add considerably to the agenda of avian phylogenetics. In this review, we summarize current approaches and data characteristics of recent higher-level bird studies and suggest a number of as yet untested molecular and analytical approaches for the unfolding tree of life for birds. A variety of comparative genomics strategies, including adoption of objective quality scores for sequence data, analysis of contiguous DNA sequences provided by large-insert genomic libraries, and the systematic use of retroposon insertions and other rare genomic changes all promise an integrated phylogenetics that is solidly grounded in genome evolution. The avian genome is an excellent testing ground for such approaches because of the more balanced representation of single-copy and repetitive DNA regions than in mammals. Although comparative genomics has a number of obvious uses in avian phylogenetics, its application to large numbers of taxa poses a number of methodological and infrastructural challenges, and can be greatly facilitated by a ‘community genomics’ approach in which the modest sequencing throughputs of single PI laboratories are pooled to produce larger, complementary datasets. Although the polymerase chain reaction era of avian phylogenetics is far from complete, the comparative genomics era—with its ability to vastly increase the number and type of molecular characters and to provide a genomic context for these characters—will usher in a host of new perspectives and opportunities for integrating genome evolution and avian phylogenetics. PMID:16024355

  6. Ab Initio structure prediction for Escherichia coli: towards genome-wide protein structure modeling and fold assignment

    PubMed Central

    Xu, Dong; Zhang, Yang

    2013-01-01

    Genome-wide protein structure prediction and structure-based function annotation have been a long-term goal in molecular biology but have not yet become possible due to difficulties in modeling distant-homology targets. We developed a hybrid pipeline combining ab initio folding and template-based modeling for genome-wide structure prediction applied to the Escherichia coli genome. The pipeline was tested on 43 known sequences, where QUARK-based ab initio folding simulation generated models with TM-score 17% higher than that by traditional comparative modeling methods. For 495 unknown hard sequences, 72 are predicted to have a correct fold (TM-score > 0.5) and 321 have a substantial portion of structure correctly modeled (TM-score > 0.35). 317 sequences can be reliably assigned to a SCOP fold family based on structural analogy to existing proteins in PDB. The presented results, as a case study of E. coli, represent promising progress towards genome-wide structure modeling and fold family assignment using state-of-the-art ab initio folding algorithms. PMID:23719418

  7. Association analysis of whole genome sequencing data accounting for longitudinal and family designs.

    PubMed

    Hu, Yijuan; Hui, Qin; Sun, Yan V

    2014-01-01

    Using the whole genome sequencing data and the simulated longitudinal phenotypes for 849 pedigree-based individuals from Genetic Analysis Workshop 18, we investigated various approaches to detecting the association of rare and common variants with blood pressure traits. We compared three strategies for longitudinal data: (a) using the baseline measurement only, (b) using the average from multiple visits, and (c) using all individual measurements. We also compared the power of using all of the pedigree-based data and the unrelated subset. The analyses were performed without knowledge of the underlying simulating model.

  8. Exploiting rice-sorghum synteny for targeted development of EST-SSRs to enrich the sorghum genetic linkage map.

    PubMed

    Ramu, P; Kassahun, B; Senthilvel, S; Ashok Kumar, C; Jayashree, B; Folkertsma, R T; Reddy, L Ananda; Kuruvinashetti, M S; Haussmann, B I G; Hash, C T

    2009-11-01

    The sequencing and detailed comparative functional analysis of genomes of a number of select botanical models open new doors into comparative genomics among the angiosperms, with potential benefits for improvement of many orphan crops that feed large populations. In this study, a set of simple sequence repeat (SSR) markers was developed by mining the expressed sequence tag (EST) database of sorghum. Among the SSR-containing sequences, only those sharing considerable homology with rice genomic sequences across the lengths of the 12 rice chromosomes were selected. Thus, 600 SSR-containing sorghum EST sequences (50 homologous sequences on each of the 12 rice chromosomes) were selected, with the intention of providing coverage for corresponding homologous regions of the sorghum genome. Primer pairs were designed and polymorphism detection ability was assessed using parental pairs of two existing sorghum mapping populations. About 28% of these new markers detected polymorphism in this 4-entry panel. A subset of 55 polymorphic EST-derived SSR markers were mapped onto the existing skeleton map of a recombinant inbred population derived from cross N13 x E 36-1, which is segregating for Striga resistance and the stay-green component of terminal drought tolerance. These new EST-derived SSR markers mapped across all 10 sorghum linkage groups, mostly to regions expected based on prior knowledge of rice-sorghum synteny. The ESTs from which these markers were derived were then mapped in silico onto the aligned sorghum genome sequence, and 88% of the best hits corresponded to linkage-based positions. This study demonstrates the utility of comparative genomic information in targeted development of markers to fill gaps in linkage maps of related crop species for which sufficient genomic tools are not available.
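
    The first step described in the record above, mining sequences for simple sequence repeats, can be illustrated with a regular-expression scan. This is a minimal sketch, not the pipeline used in the study: the abstract does not state the detection criteria, so the motif lengths and minimum repeat counts below are assumptions chosen only for illustration, and the function name `find_ssrs` is hypothetical.

```python
import re

# Assumed thresholds (not from the abstract): motif length -> minimum copies.
DEFAULT_MIN_REPEATS = {2: 5, 3: 4}


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, copies) for each perfect tandem repeat in seq."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    found = []
    for unit_len, copies in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (copies - 1) more copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(2)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAAAAAA
            found.append((match.start(), motif, len(match.group(1)) // unit_len))
    return sorted(found)


print(find_ssrs("TTGAGAGAGAGACC"))    # dinucleotide repeat (GA)5 at position 2
print(find_ssrs("CCATGATGATGATGGG"))  # trinucleotide repeat (ATG)4 at position 2
```

    Sequences that contain a reported repeat would then be compared against the rice genome to choose markers by position, a step this sketch does not attempt.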

  9. Comparative genomics of Clavibacter michiganensis subspecies, pathogens of important agricultural crops.

    PubMed

    Tambong, James T

    2017-01-01

    Subspecies of Clavibacter michiganensis are important phytobacterial pathogens causing devastating diseases in several agricultural crops. The genome organizations of these pathogens are poorly understood. Here, the complete genomes of 5 subspecies (C. michiganensis subsp. michiganensis, Cmi; C. michiganensis subsp. sepedonicus, Cms; C. michiganensis subsp. nebraskensis, Cmn; C. michiganensis subsp. insidiosus, Cmi and C. michiganensis subsp. capsici, Cmc) were analyzed. This study assessed the taxonomic position of the subspecies based on 16S rRNA and genome-based DNA homology and concludes that there is ample evidence to elevate some of the subspecies to species-level. Comparative genomics analysis indicated distinct genomic features evident on the DNA structural atlases and annotation features. Based on orthologous gene analysis, about 2300 CDSs are shared across all the subspecies; and Cms showed the highest number of subspecies-specific CDS, most of which are mobile elements suggesting that Cms could be more prone to translocation of foreign genes. Cms and Cmi had the highest number of pseudogenes, an indication of potential degenerating genomes. The stress response factors that may be involved in cold/heat shock, detoxification, oxidative stress, osmoregulation, and carbon utilization are outlined. For example, the wco-cluster encoding for extracellular polysaccharide II is highly conserved while the sucrose-6-phosphate hydrolase that catalyzes the hydrolysis of sucrose-6-phosphate yielding glucose-6-phosphate and fructose is highly divergent. A unique second form of the enzyme is only present in Cmn NCPPB 2581. Also, twenty-eight plasmid-borne CDSs in the other subspecies were found to have homologues in the chromosomal genome of Cmn which is known not to carry plasmids. These CDSs include pathogenesis-related factors such as Endocellulases E1 and Beta-glucosidase. 
The results presented here provide insight into the functional organization of the genomes of five core C. michiganensis subspecies, enabling a better understanding of these phytobacteria.

  10. Comparative genomics of Clavibacter michiganensis subspecies, pathogens of important agricultural crops

    PubMed Central

    2017-01-01

    Subspecies of Clavibacter michiganensis are important phytobacterial pathogens causing devastating diseases in several agricultural crops. The genome organizations of these pathogens are poorly understood. Here, the complete genomes of 5 subspecies (C. michiganensis subsp. michiganensis, Cmi; C. michiganensis subsp. sepedonicus, Cms; C. michiganensis subsp. nebraskensis, Cmn; C. michiganensis subsp. insidiosus, Cmi and C. michiganensis subsp. capsici, Cmc) were analyzed. This study assessed the taxonomic position of the subspecies based on 16S rRNA and genome-based DNA homology and concludes that there is ample evidence to elevate some of the subspecies to species-level. Comparative genomics analysis indicated distinct genomic features evident on the DNA structural atlases and annotation features. Based on orthologous gene analysis, about 2300 CDSs are shared across all the subspecies; and Cms showed the highest number of subspecies-specific CDS, most of which are mobile elements suggesting that Cms could be more prone to translocation of foreign genes. Cms and Cmi had the highest number of pseudogenes, an indication of potential degenerating genomes. The stress response factors that may be involved in cold/heat shock, detoxification, oxidative stress, osmoregulation, and carbon utilization are outlined. For example, the wco-cluster encoding for extracellular polysaccharide II is highly conserved while the sucrose-6-phosphate hydrolase that catalyzes the hydrolysis of sucrose-6-phosphate yielding glucose-6-phosphate and fructose is highly divergent. A unique second form of the enzyme is only present in Cmn NCPPB 2581. Also, twenty-eight plasmid-borne CDSs in the other subspecies were found to have homologues in the chromosomal genome of Cmn which is known not to carry plasmids. These CDSs include pathogenesis-related factors such as Endocellulases E1 and Beta-glucosidase. 
The results presented here provide insight into the functional organization of the genomes of five core C. michiganensis subspecies, enabling a better understanding of these phytobacteria. PMID:28319117

  11. Genome sequence of Ensifer adhaerens OV14 provides insights into its ability as a novel vector for the genetic transformation of plant genomes.

    PubMed

    Rudder, Steven; Doohan, Fiona; Creevey, Christopher J; Wendt, Toni; Mullins, Ewen

    2014-04-07

    Recently it has been shown that Ensifer adhaerens can be used as a plant transformation technology, transferring genes into several plant genomes when equipped with a Ti plasmid. For this study, we have sequenced the genome of Ensifer adhaerens OV14 (OV14) and compared it with those of Agrobacterium tumefaciens C58 (C58) and Sinorhizobium meliloti 1021 (1021); the latter of which has also demonstrated a capacity to genetically transform crop genomes, albeit at significantly reduced frequencies. The 7.7 Mb OV14 genome comprises two chromosomes and two plasmids. All protein coding regions in the OV14 genome were functionally grouped based on an eggNOG database. No genes homologous to the A. tumefaciens Ti plasmid vir genes appeared to be present in the OV14 genome. Unexpectedly, OV14 and 1021 were found to possess homologs to chromosomal based genes cited as essential to A. tumefaciens T-DNA transfer. Of significance, genes that are non-essential but exert a positive influence on virulence and the ability to genetically transform host genomes were identified in OV14 but were absent from the 1021 genome. This study reveals the presence of homologs to chromosomally based Agrobacterium genes that support T-DNA transfer within the genome of OV14 and other alphaproteobacteria. The sequencing and analysis of the OV14 genome increases our understanding of T-DNA transfer by non-Agrobacterium species and creates a platform for the continued improvement of Ensifer-mediated transformation (EMT).

  12. Genome sequence of Ensifer adhaerens OV14 provides insights into its ability as a novel vector for the genetic transformation of plant genomes

    PubMed Central

    2014-01-01

    Background: Recently it has been shown that Ensifer adhaerens can be used as a plant transformation technology, transferring genes into several plant genomes when equipped with a Ti plasmid. For this study, we have sequenced the genome of Ensifer adhaerens OV14 (OV14) and compared it with those of Agrobacterium tumefaciens C58 (C58) and Sinorhizobium meliloti 1021 (1021); the latter of which has also demonstrated a capacity to genetically transform crop genomes, albeit at significantly reduced frequencies. Results: The 7.7 Mb OV14 genome comprises two chromosomes and two plasmids. All protein coding regions in the OV14 genome were functionally grouped based on an eggNOG database. No genes homologous to the A. tumefaciens Ti plasmid vir genes appeared to be present in the OV14 genome. Unexpectedly, OV14 and 1021 were found to possess homologs to chromosomal based genes cited as essential to A. tumefaciens T-DNA transfer. Of significance, genes that are non-essential but exert a positive influence on virulence and the ability to genetically transform host genomes were identified in OV14 but were absent from the 1021 genome. Conclusions: This study reveals the presence of homologs to chromosomally based Agrobacterium genes that support T-DNA transfer within the genome of OV14 and other alphaproteobacteria. The sequencing and analysis of the OV14 genome increases our understanding of T-DNA transfer by non-Agrobacterium species and creates a platform for the continued improvement of Ensifer-mediated transformation (EMT). PMID:24708309

  13. What constitutes an Arabian Helicobacter pylori? Lessons from comparative genomics.

    PubMed

    Kumar, Narender; Albert, M John; Al Abkal, Hanan; Siddique, Iqbal; Ahmed, Niyaz

    2017-02-01

    Helicobacter pylori, the human gastric pathogen, causes a variety of gastric diseases ranging from mild gastritis to gastric cancer. While the studies on H. pylori are dominated by those based on either East Asian or Western strains, information regarding H. pylori strains prevalent in the Middle East remains scarce. Therefore, we carried out whole-genome sequencing and comparative analysis of three H. pylori strains isolated from three native Arab, Kuwaiti patients. H. pylori strains were sequenced using Illumina platform. The sequence reads were filtered and draft genomes were assembled and annotated. Various pathogenicity-associated regions and phages present within the genomes were identified. Phylogenetic analysis was carried out to determine the genetic relatedness of Kuwaiti strains to various lineages of H. pylori. The core genome content and virulence-related genes were analyzed to assess the pathogenic potential. The three genomes clustered along with HpEurope strains in the phylogenetic tree comprising various H. pylori lineages. A total of 1187 genes spread among various functional classes were identified in the core genome analysis. The three genomes possessed a complete cagPAI and also retained most of the known outer membrane proteins as well as virulence-related genes. The cagA gene in all three strains consisted of an AB-C type EPIYA motif. The comparative genomic analysis of Kuwaiti H. pylori strains revealed a European ancestry and a high pathogenic potential. © 2016 John Wiley & Sons Ltd.

  14. OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

    PubMed

    Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

    2016-02-01

    Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this problem of not being symmetrical, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology, for which both genome sequences were fragmented and only orthologous fragment pairs were taken into consideration for calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn) and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
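
    The idea of restricting the average to orthologous fragment pairs can be illustrated with a toy sketch. This is not the OrthoANI software: the fragment size, the position-wise identity function (a stand-in for BLASTn) and the reciprocal-best-hit rule used to call a pair "orthologous" are all assumptions made for illustration. It only shows why such a pairing rule gives the same value in both directions.

```python
def fragments(seq, size):
    """Cut a sequence into consecutive non-overlapping fragments; the short tail is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def identity(a, b):
    """Fraction of matching positions between two equal-length fragments (toy stand-in for BLASTn)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_hits(queries, subjects):
    """Map each query fragment index to the index of its highest-identity subject fragment."""
    return {
        qi: max(range(len(subjects)), key=lambda si: identity(q, subjects[si]))
        for qi, q in enumerate(queries)
    }


def ortho_ani_sketch(genome_a, genome_b, size=4):
    """Average identity (in percent) over reciprocal-best-hit fragment pairs only."""
    fa, fb = fragments(genome_a, size), fragments(genome_b, size)
    if not fa or not fb:
        return 0.0
    a_to_b = best_hits(fa, fb)
    b_to_a = best_hits(fb, fa)
    # Keep a pair only if each fragment is the other's best hit (assumed orthology rule).
    pairs = [(i, j) for i, j in a_to_b.items() if b_to_a[j] == i]
    if not pairs:
        return 0.0
    return 100.0 * sum(identity(fa[i], fb[j]) for i, j in pairs) / len(pairs)
```

    Because the set of reciprocal pairs is the same whichever genome is treated as the query, ortho_ani_sketch(a, b) and ortho_ani_sketch(b, a) agree up to floating-point rounding, which is the symmetry property the abstract describes.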

  15. Visualization of RNA structure models within the Integrative Genomics Viewer.

    PubMed

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  16. Component identification of electron transport chains in curdlan-producing Agrobacterium sp. ATCC 31749 and its genome-specific prediction using comparative genome and phylogenetic trees analysis.

    PubMed

    Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang

    2011-06-01

    Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in large scale biotechnological production of curdlan. However, little is known about its genomic information. DNA partial sequence of electron transport chains (ETCs) protein genes were obtained in order to understand the components of ETC and genomic-specificity in Agrobacterium sp. ATCC 31749. Degenerate primers were designed according to ETC conserved sequences in other reported species. DNA partial sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by the PCR method using degenerate primers. Based on comparative genomic analysis, nine electron transport elements were ascertained, including NADH ubiquinone oxidoreductase, succinate dehydrogenase complex II, complex III, cytochrome c, ubiquinone biosynthesis protein ubiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb (3)-type terminal oxidase and cytochrome caa (3)-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.

  17. The genomic landscape of rapid, repeated evolutionary rescue from toxic pollution in wild fish

    USDA-ARS's Scientific Manuscript database

    Here we describe evolutionary rescue from intense pollution via multiple modes of selection in killifish populations from 4 urban estuaries of the US eastern seaboard. Comparative transcriptomics and analysis of 384 whole genome sequences show that the functioning of a receptor-based signaling pathw...

  18. Prenatal diagnosis of chromosomal abnormalities using array-based comparative genomic hybridization

    USDA-ARS's Scientific Manuscript database

    This study was designed to evaluate the feasibility of using a targeted array-CGH strategy for prenatal diagnosis of genomic imbalances in a clinical setting of current pregnancies. Women undergoing prenatal diagnosis were counseled and offered array-CGH (BCM V4.0) in addition to routine chromosome ...

  19. Computational approach for elucidating interactions of cross-species miRNAs and their targets in Flaviviruses.

    PubMed

    Shinde, Santosh P; Banerjee, Amit Kumar; Arora, Neelima; Murty, U S N; Sripathi, Venkateswara Rao; Pal-Bhadra, Manika; Bhadra, Utpal

    2015-03-01

    Combating viral diseases has been a challenging task since time immemorial. Available molecular approaches are limited and not very effective for this daunting task. MicroRNA-based therapies have shown promise in recent times. MicroRNAs are tiny non-coding RNAs that regulate translational repression of target mRNA in a highly specific manner. In this study, we have determined the target regions for human and viral microRNAs in the conserved genomic regions of selected viruses of the Flaviviridae family using miRanda and performed a comparative target selectivity analysis among them. Specific target regions were determined and they were compared extensively among themselves by exploring their position to determine the vicinity. Based on the multiplicity and cooperativity analysis, interaction maps were developed manually to represent the interactions between top-ranking miRNAs and genomes of the viruses considered in this study. A self-organizing map (SOM) was used to cluster the best-ranked microRNAs based on the vital physicochemical properties. This study will provide deep insight into the interrelation of the viral and human microRNA interactions with the selected Flaviviridae genomes and will help to identify cross-species microRNA targets on the viral genome.

  20. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    PubMed

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  1. The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics

    PubMed Central

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-01-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232

  2. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes.

    PubMed

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M

    2016-10-19

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.

  3. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  4. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia

    PubMed Central

    2014-01-01

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathway (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715

  5. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

    PubMed

    Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J

    2017-01-18

    The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enables the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusually high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicates a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

  6. Survey of gene splicing algorithms based on reads.

    PubMed

    Si, Xiuhua; Wang, Qian; Zhang, Lei; Wu, Ruo; Ma, Jiquan

    2017-11-02

    Gene splicing is the process of assembling a large number of unordered short sequence fragments to the original genome sequence as accurately as possible. Several popular splicing algorithms based on reads are reviewed in this article, including reference genome algorithms and de novo splicing algorithms (Greedy-extension, Overlap-Layout-Consensus graph, De Bruijn graph). We also discuss a new splicing method based on the MapReduce strategy and Hadoop. By comparing these algorithms, some conclusions are drawn and some suggestions on gene splicing research are made.
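
    One of the de novo approaches named above, the De Bruijn graph, can be sketched in a few lines. This is a minimal illustration rather than any of the reviewed tools: the function name de_bruijn and the choice to keep repeated edges (so that k-mer multiplicity is preserved) are assumptions made here, and error correction and contig extraction are left out.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers, one edge per k-mer occurrence in the reads."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix; duplicates are kept deliberately.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn(["ACGTAC", "GTACGA"], 3).items():
        print(node, "->", successors)
```

    An assembler would then look for a path through this graph that uses every edge, which corresponds to a candidate reconstruction of the original sequence.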

  7. GreenPhylDB v2.0: comparative and functional genomics in plants.

    PubMed

    Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G

    2011-01-01

    GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.

  8. GEAR: genomic enrichment analysis of regional DNA copy number changes.

    PubMed

    Kim, Tae-Min; Jung, Yu-Chae; Rhyu, Mun-Gan; Jung, Myeong Ho; Chung, Yeun-Jun

    2008-02-01

    We developed an algorithm named GEAR (genomic enrichment analysis of regional DNA copy number changes) for functional interpretation of genome-wide DNA copy number changes identified by array-based comparative genomic hybridization. GEAR selects two types of chromosomal alterations with potential biological relevance, i.e. recurrent and phenotype-specific alterations. Then it performs functional enrichment analysis using a priori selected functional gene sets to identify primary and clinical genomic signatures. The genomic signatures identified by GEAR represent functionally coordinated genomic changes, which can provide clues on the underlying molecular mechanisms related to the phenotypes of interest. GEAR can help the identification of key molecular functions that are activated or repressed in the tumor genomes leading to the improved understanding on the tumor biology. GEAR software is available with online manual in the website, http://www.systemsbiology.co.kr/GEAR/.

  9. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261

  10. A pan-genomic approach to understand the basis of host adaptation in Achromobacter.

    PubMed

    Jeukens, J; Freschi, L; Vincent, A T; Emond-Rheault, J G; Kukavica-Ibrulj, I; Charette, S J; Levesque, R C

    2017-04-05

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis (CF) lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the CF lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of A. xylosoxidans, A. insuavis, A. dolens and A. ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared to other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus's resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. MycoCosm, an Integrated Fungal Genomics Resource

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shabalov, Igor; Grigoriev, Igor

    2012-03-16

    MycoCosm is a web-based interactive fungal genomics resource, which was first released in March 2010, in response to an urgent call from the fungal community for integration of all fungal genomes and analytical tools in one place (Pan-fungal data resources meeting, Feb 21-22, 2010, Alexandria, VA). MycoCosm integrates genomics data and analysis tools to navigate through over 100 fungal genomes sequenced at JGI and elsewhere. This resource allows users to explore fungal genomes in the context of both genome-centric analysis and comparative genomics, and promotes user community participation in data submission, annotation and analysis. MycoCosm has over 4500 unique visitors/month or 35000+ visitors/year as well as hundreds of registered users contributing their data and expertise to this resource. Its scalable architecture allows significant expansion of the data expected from JGI Fungal Genomics Program, its users, and integration with external resources used by fungal community.

  12. Alternaria section Alternaria: Species, formae speciales or pathotypes?

    PubMed Central

    Woudenberg, J.H.C.; Seidl, M.F.; Groenewald, J.Z.; de Vries, M.; Stielow, J.B.; Thomma, B.P.H.J.; Crous, P.W.

    2015-01-01

    The cosmopolitan fungal genus Alternaria consists of multiple saprophytic and pathogenic species. Based on phylogenetic and morphological studies, the genus is currently divided into 26 sections. Alternaria sect. Alternaria contains most of the small-spored Alternaria species with concatenated conidia, including important plant, human and postharvest pathogens. Species within sect. Alternaria have been mostly described based on morphology and / or host-specificity, yet molecular variation between them is minimal. To investigate whether the described morphospecies within sect. Alternaria are supported by molecular data, whole-genome sequencing of nine Alternaria morphospecies supplemented with transcriptome sequencing of 12 Alternaria morphospecies as well as multi-gene sequencing of 168 Alternaria isolates was performed. The assembled genomes ranged in size from 33.3–35.2 Mb within sect. Alternaria and from 32.0–39.1 Mb for all Alternaria genomes. The number of repetitive sequences differed significantly between the different Alternaria genomes; ranging from 1.4–16.5 %. The repeat content within sect. Alternaria was relatively low with only 1.4–2.7 % of repeats. Whole-genome alignments revealed 96.7–98.2 % genome identity between sect. Alternaria isolates, compared to 85.1–89.3 % genome identity for isolates from other sections to the A. alternata reference genome. Similarly, 1.4–2.8 % and 0.8–1.8 % single nucleotide polymorphisms (SNPs) were observed in genomic and transcriptomic sequences, respectively, between isolates from sect. Alternaria, while the percentage of SNPs found in isolates from different sections compared to the A. alternata reference genome was considerably higher; 8.0–10.3 % and 6.1–8.5 %. The topology of a phylogenetic tree based on the whole-genome and transcriptome reads was congruent with multi-gene phylogenies based on commonly used gene regions. 
Based on the genome and transcriptome data, a set of core proteins was extracted, and primers were designed on two gene regions with a relatively low degree of conservation within sect. Alternaria (96.8 and 97.3 % conservation). Their potential discriminatory power within sect. Alternaria was tested next to nine commonly used gene regions in sect. Alternaria, namely the SSU, LSU, ITS, gapdh, rpb2, tef1, Alt a 1, endoPG and OPA10-2 gene regions. The phylogenies from the two gene regions with a relatively low conservation, KOG1058 and KOG1077, could not distinguish the described morphospecies within sect. Alternaria more effectively than the phylogenies based on the commonly used gene regions for Alternaria. Based on genome and transcriptome comparisons and molecular phylogenies, Alternaria sect. Alternaria consists of only 11 phylogenetic species and one species complex. Thirty-five morphospecies, which cannot be distinguished based on the multi-gene phylogeny, are synonymised under A. alternata. By providing guidelines for the naming and identification of phylogenetic species in Alternaria sect. Alternaria, this manuscript provides a clear and stable species classification in this section. PMID:26951037

  13. Comparative Analysis of the Base Compositions of the Pre-mRNA 3′ Cleaved-Off Region and the mRNA 3′ Untranslated Region Relative to the Genomic Base Composition in Animals and Plants

    PubMed Central

    Li, Xiu-Qing

    2014-01-01

    The precursor messenger RNA (pre-mRNA) three-prime cleaved-off region (3′COR) and the mRNA three-prime untranslated region (3′UTR) play critical roles in regulating gene expression. The differences in base composition between these regions and the corresponding genomes are still largely uncharacterized in animals and plants. In this study, the base compositions of non-redundant 3′CORs and 3′UTRs were compared with the corresponding whole genomes of eleven animals, four dicotyledonous plants, and three monocotyledonous (cereal) plants. Among the four bases (A, C, G, and U for adenine, cytosine, guanine, and uracil, respectively), U (which corresponds to T, for thymine, in DNA) was the most frequent, A the second most frequent, G the third most frequent, and C the least frequent in most of the species in both the 3′COR and 3′UTR regions. In comparison with the whole genomes, in both regions the U content was usually the most overrepresented (particularly in the monocotyledonous plants), and the C content was the most underrepresented. The order obtained for the species groups, when ranked from high to low according to the U contents in the 3′COR and 3′UTR was as follows: dicotyledonous plants, monocotyledonous plants, non-mammal animals, and mammals. In contrast, the genomic T content was highest in dicotyledonous plants, lowest in monocotyledonous plants, and intermediate in animals. These results suggest the following: 1) there is a mechanism operating in both animals and plants which is biased toward U and against C in the 3′COR and 3′UTR; 2) the 3′UTR and 3′COR, as functional units, minimized the difference between dicotyledonous and monocotyledonous plants, while the dicotyledonous and monocotyledonous genomes evolved into two extreme groups in terms of base composition. PMID:24941005
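
    The kind of comparison described above (base frequencies in a region, ranked and set against the whole genome) can be sketched as follows. This is an illustrative outline only, not the author's pipeline: the function names and the ratio used as a measure of over- or underrepresentation are assumptions made here, and T is counted as U so that DNA and RNA sequences can be compared directly.

```python
from collections import Counter

BASES = "ACGU"


def base_composition(seq):
    """Return the fraction of each base, counting T as U and ignoring any other symbol."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BASES}


def rank_bases(seq):
    """List the four bases from most to least frequent."""
    comp = base_composition(seq)
    return sorted(comp, key=comp.get, reverse=True)


def representation(region, genome):
    """Ratio of regional to genomic frequency per base (above 1 means overrepresented)."""
    reg, gen = base_composition(region), base_composition(genome)
    return {b: (reg[b] / gen[b] if gen[b] else float("nan")) for b in BASES}
```

    For a toy region such as "UUUUAAAGGC", rank_bases gives the order U, A, G, C, the same ordering the abstract reports for most species.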

  14. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens

    PubMed Central

    Rensing, Stefan A; Ick, Julia; Fawcett, Jeffrey A; Lang, Daniel; Zimmer, Andreas; Van de Peer, Yves; Reski, Ralf

    2007-01-01

    Background: Analyses of complete genomes and large collections of gene transcripts have shown that most, if not all seed plants have undergone one or more genome duplications in their evolutionary past. Results: In this study, based on a large collection of EST sequences, we provide evidence that the haploid moss Physcomitrella patens is a paleopolyploid as well. Based on the construction of linearized phylogenetic trees we infer the genome duplication to have occurred between 30 and 60 million years ago. Gene Ontology and pathway association of the duplicated genes in P. patens reveal different biases of gene retention compared with seed plants. Conclusion: Metabolic genes seem to have been retained in excess following the genome duplication in P. patens. This might, at least partly, explain the versatility of metabolism, as described for P. patens and other mosses, in comparison to other land plants. PMID:17683536

  15. Draft genome of neurotropic nematode parasite Angiostrongylus cantonensis, causative agent of human eosinophilic meningitis.

    PubMed

    Yong, Hoi-Sen; Eamsobhana, Praphathip; Lim, Phaik-Eem; Razali, Rozaimi; Aziz, Farhanah Abdul; Rosli, Nurul Shielawati Mohamed; Poole-Johnson, Johan; Anwar, Arif

    2015-08-01

    Angiostrongylus cantonensis is a bursate nematode parasite that causes eosinophilic meningitis (or meningoencephalitis) in humans in many parts of the world. The genomic data from A. cantonensis will form a useful resource for comparative genomic and chemogenomic studies to aid the development of diagnostics and therapeutics. We have sequenced, assembled and annotated the genome of A. cantonensis. The genome size is estimated to be ∼260 Mb, with 17,280 genomic scaffolds, 91X coverage, 81.45% for complete and 93.95% for partial score based on CEGMA analysis of genome completeness. The number of predicted genes of ≥300 bp was 17,482. A total of 7737 predicted protein-coding genes of ≥50 amino acids were identified in the assembled genome. Among the proteins of known function, kinases are the most abundant followed by transferases. The draft genome contains 34 excretory-secretory proteins (ES), a minimum of 44 Nematode Astacin (NAS) metalloproteases, 12 Homeobox (HOX) genes, and 30 neurotransmitters. The assembled genome size (260 Mb) is larger than those of Pristionchus pacificus, Caenorhabditis elegans, Necator americanus, Caenorhabditis briggsae, Trichinella spiralis, Brugia malayi and Loa loa, but smaller than Haemonchus contortus and Ascaris suum. The repeat content (25%) is similar to H. contortus. The GC content (41.17%) is lower compared to P. pacificus (42.7%) and H. contortus (43.1%) but higher compared to C. briggsae (37.69%), A. suum (37.9%) and N. americanus (40.2%) while the scaffold N50 is 42,191. This draft genome will facilitate the understanding of many unresolved issues on the parasite and the disorder it causes. Copyright © 2015 Elsevier B.V. All rights reserved.
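    Two of the assembly statistics quoted in this abstract, scaffold N50 and GC content, follow standard definitions that can be sketched in a few lines. The helper names and the example inputs are hypothetical; the actual values in the abstract come from the authors' own pipeline.

```python
def n50(lengths):
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

def gc_content(seq):
    """GC percentage over the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt

# Toy inputs, invented for illustration only.
example_n50 = n50([2, 3, 4, 5, 6])
example_gc = gc_content("GGCCAATT")
```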

  16. Array-Based Comparative Genomic Hybridization for the Genomewide Detection of Submicroscopic Chromosomal Abnormalities

    PubMed Central

    Vissers, Lisenka E. L. M.; de Vries, Bert B. A.; Osoegawa, Kazutoyo; Janssen, Irene M.; Feuth, Ton; Choy, Chik On; Straatman, Huub; van der Vliet, Walter; Huys, Erik H. L. P. G.; van Rijk, Anke; Smeets, Dominique; van Ravenswaaij-Arts, Conny M. A.; Knoers, Nine V.; van der Burgt, Ineke; de Jong, Pieter J.; Brunner, Han G.; van Kessel, Ad Geurts; Schoenmakers, Eric F. P. M.; Veltman, Joris A.

    2003-01-01

    Microdeletions and microduplications, not visible by routine chromosome analysis, are a major cause of human malformation and mental retardation. Novel high-resolution, whole-genome technologies can improve the diagnostic detection rate of these small chromosomal abnormalities. Array-based comparative genomic hybridization allows such a high-resolution screening by hybridizing differentially labeled test and reference DNAs to arrays consisting of thousands of genomic clones. In this study, we tested the diagnostic capacity of this technology using ∼3,500 fluorescent in situ hybridization–verified clones selected to cover the genome with an average of 1 clone per megabase (Mb). The sensitivity and specificity of the technology were tested in normal-versus-normal control experiments and through the screening of patients with known microdeletion syndromes. Subsequently, a series of 20 cytogenetically normal patients with mental retardation and dysmorphisms suggestive of a chromosomal abnormality were analyzed. In this series, three microdeletions and two microduplications were identified and validated. Two of these genomic changes were identified also in one of the parents, indicating that these are large-scale genomic polymorphisms. Deletions and duplications as small as 1 Mb could be reliably detected by our approach. The percentage of false-positive results was reduced to a minimum by use of a dye-swap-replicate analysis, all but eliminating the need for laborious validation experiments and facilitating implementation in a routine diagnostic setting. This high-resolution assay will facilitate the identification of novel genes involved in human mental retardation and/or malformation syndromes and will provide insight into the flexibility and plasticity of the human genome. PMID:14628292
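    The idea behind a dye-swap replicate can be sketched as follows: a clone is reported as gained or lost only when two hybridizations with exchanged dyes agree. This is a minimal illustration, not the authors' analysis. The function names and the 0.3 log2 threshold are assumptions, and both inputs are assumed to have already been converted to log2(test/reference).

```python
import math

def log2_ratio(test_intensity, reference_intensity):
    """log2 of the test-to-reference signal for one clone."""
    return math.log2(test_intensity / reference_intensity)

def call_clone(forward, swapped, threshold=0.3):
    """Classify a clone from two dye-swapped hybridizations.

    forward and swapped are both log2(test/reference) values; the swapped
    experiment is assumed to have been re-expressed in that orientation.
    The threshold is a hypothetical cut-off chosen for illustration.
    """
    if forward > threshold and swapped > threshold:
        return "gain"
    if forward < -threshold and swapped < -threshold:
        return "loss"
    return "normal"

# Toy intensities, invented for illustration only.
concordant = call_clone(log2_ratio(150, 100), log2_ratio(145, 100))
discordant = call_clone(0.6, 0.05)
```

    Requiring agreement between the two orientations is what suppresses dye-specific artifacts: a signal that appears in only one labeling direction is treated as normal.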

  17. ABACAS: algorithm-based automatic contiguation of assembled sequences

    PubMed Central

    Assefa, Samuel; Keane, Thomas M.; Otto, Thomas D.; Newbold, Chris; Berriman, Matthew

    2009-01-01

    Summary: Due to the availability of new sequencing technologies, we are now increasingly interested in sequencing closely related strains of existing finished genomes. Recently a number of de novo and mapping-based assemblers have been developed to produce high quality draft genomes from new sequencing technology reads. New tools are necessary to take contigs from a draft assembly through to a fully contiguated genome sequence. ABACAS is intended as a tool to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence. The input to ABACAS is a set of contigs which will be aligned to the reference genome, ordered and orientated, visualized in the ACT comparative browser, and optimal primer sequences are automatically generated. Availability and Implementation: ABACAS is implemented in Perl and is freely available for download from http://abacas.sourceforge.net Contact: sa4@sanger.ac.uk PMID:19497936

  18. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives.

    PubMed

    Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.
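    Read-depth analysis is one common family of NGS-based CNV methods: read counts in fixed windows are compared with a baseline, and windows that deviate strongly are flagged. The sketch below is a deliberately simplified illustration under stated assumptions. The function names, the pseudocount of 1, and the log2 cut-offs are hypothetical, and real tools also correct for GC bias and mappability and segment adjacent windows.

```python
import math
from statistics import median

def depth_log2_ratios(window_counts):
    """log2 of each window's read count relative to the sample median.

    A pseudocount of 1 (an assumption) avoids taking the log of zero.
    """
    baseline = median(window_counts)
    return [math.log2((c + 1) / (baseline + 1)) for c in window_counts]

def flag_windows(window_counts, gain_cutoff=0.58, loss_cutoff=-1.0):
    """Return (window index, call) pairs for windows beyond the hypothetical cut-offs."""
    calls = []
    for i, ratio in enumerate(depth_log2_ratios(window_counts)):
        if ratio >= gain_cutoff:
            calls.append((i, "gain"))
        elif ratio <= loss_cutoff:
            calls.append((i, "loss"))
    return calls

# Toy per-window read counts, invented for illustration only.
counts = [100, 98, 102, 205, 210, 101, 45, 99]
calls = flag_windows(counts)
```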

  19. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

    PubMed Central

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) or genotyping arrays have been standard technologies to detect large regions subject to copy number changes in genomes until most recently high-resolution sequence data can be analyzed by next-generation sequencing (NGS). During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169

  20. Comparative chloroplast genomes of eleven Schima (Theaceae) species: Insights into DNA barcoding and phylogeny.

    PubMed

    Yu, Xiang-Qin; Drew, Bryan T; Yang, Jun-Bo; Gao, Lian-Ming; Li, De-Zhu

    2017-01-01

    Schima is an ecologically and economically important woody genus in the tea family (Theaceae). Unresolved species delimitations and phylogenetic relationships within Schima limit our understanding of the genus and hinder utilization of the genus for economic purposes. In the present study, we conducted comparative analysis among the complete chloroplast (cp) genomes of 11 Schima species. Our results indicate that Schima cp genomes possess a typical quadripartite structure, with conserved genomic structure and gene order. The size of the Schima cp genome is about 157 kilo base pairs (kb). They consistently encode 114 unique genes, including 80 protein-coding genes, 30 tRNAs, and 4 rRNAs, with 17 duplicated in the inverted repeat (IR). These cp genomes are highly conserved and do not show obvious expansion or contraction of the IR region. The percent variability of the 68 coding and 93 noncoding (>150 bp) fragments is consistently less than 3%. The seven most widely touted DNA barcode regions as well as one promising barcode candidate showed low sequence divergence. Eight mutational hotspots were identified from the 11 cp genomes. These hotspots may potentially be useful as specific DNA barcodes for species identification of Schima. The 58 cpSSR loci reported here are complementary to the microsatellite markers identified from the nuclear genome, and will be leveraged for further population-level studies. Phylogenetic relationships among the 11 Schima species were resolved with strong support based on the cp genome data set, which corresponds well with the species distribution pattern. The data presented here will serve as a foundation to facilitate species identification, DNA barcoding and phylogenetic reconstructions for future exploration of Schima.

  1. Cost-effectiveness of the Decipher Genomic Classifier to Guide Individualized Decisions for Early Radiation Therapy After Prostatectomy for Prostate Cancer.

    PubMed

    Lobo, Jennifer M; Trifiletti, Daniel M; Sturz, Vanessa N; Dicker, Adam P; Buerki, Christine; Davicioni, Elai; Cooperberg, Matthew R; Karnes, R Jeffrey; Jenkins, Robert B; Den, Robert B; Showalter, Timothy N

    2017-06-01

    Controversy exists regarding the effectiveness of early adjuvant versus salvage radiation therapy after prostatectomy for prostate cancer. Estimates of prostate cancer progression from the Decipher genomic classifier (GC) could guide informed decision-making and improve the outcomes for patients. We developed a Markov model to compare the costs and quality-adjusted life years (QALYs) associated with GC-based treatment decisions regarding adjuvant therapy after prostatectomy with those of 2 control strategies: usual care (determined from patterns of care studies) and the alternative of 100% adjuvant radiation therapy. Using the bootstrapping method of sampling with replacement, the cases of 10,000 patients were simulated during a 10-year time horizon, with each subject having individual estimates for cancer progression (according to GC findings) and noncancer mortality (according to age). GC-based care was more effective and less costly than 100% adjuvant radiation therapy and resulted in cost savings up to an assay cost of $11,402. Compared with usual care, GC-based care resulted in more QALYs. Assuming a $4000 assay cost, the incremental cost-effectiveness ratio was $90,833 per QALY, assuming a 7% usage rate of adjuvant radiation therapy. GC-based care was also associated with a 16% reduction in the percentage of patients with distant metastasis at 5 years compared with usual care. The Decipher GC could be a cost-effective approach for genomics-driven cancer treatment decisions after prostatectomy, with improvements in estimated clinical outcomes compared with usual care. The individualized decision analytic framework applied in the present study offers a flexible approach to estimate the potential utility of genomic assays for personalized cancer medicine. Copyright © 2016 Elsevier Inc. All rights reserved.
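    The incremental cost-effectiveness ratio (ICER) quoted in this abstract is the extra cost of a strategy divided by the extra QALYs it yields relative to a comparator. The sketch below shows only that final arithmetic, not the Markov model itself. The function names and all numbers are hypothetical and are not the study's inputs.

```python
def icer(cost_new, qaly_new, cost_ref, qaly_ref):
    """Incremental cost per QALY gained by the new strategy over the reference."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("strategies yield equal QALYs; ICER is undefined")
    return (cost_new - cost_ref) / delta_qaly

def compare(cost_new, qaly_new, cost_ref, qaly_ref):
    """Label the new strategy as dominant, dominated, or a cost/benefit trade-off."""
    if cost_new < cost_ref and qaly_new > qaly_ref:
        return "dominant"      # more effective and less costly
    if cost_new > cost_ref and qaly_new < qaly_ref:
        return "dominated"     # less effective and more costly
    return "trade-off"         # report the ICER and compare it with a threshold

# Hypothetical per-patient costs (USD) and QALYs, for illustration only.
example_icer = icer(14000, 7.5, 12000, 7.25)
example_label = compare(9000, 7.5, 12000, 7.25)
```

    A strategy that is "more effective and less costly", as reported above for GC-based care versus 100% adjuvant radiation therapy, falls in the dominant case, where no ICER needs to be weighed against a willingness-to-pay threshold.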

  2. Evolutionary and comparative analyses of the soybean genome

    PubMed Central

    Cannon, Steven B.; Shoemaker, Randy C.

    2012-01-01

    The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483

  3. Bluejay 1.0: genome browsing and comparison with rich customization provision and dynamic resource linking

    PubMed Central

    Soh, Jung; Gordon, Paul MK; Taschuk, Morgan L; Dong, Anguo; Ah-Seng, Andrew C; Turinsky, Andrei L; Sensen, Christoph W

    2008-01-01

    Background The Bluejay genome browser has been developed over several years to address the challenges posed by the ever increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback of annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes. PMID:18940007

  4. The Biofuel Feedstock Genomics Resource: a web-based portal and database to enable functional genomics of plant biofuel feedstock species.

    PubMed

    Childs, Kevin L; Konganti, Kranti; Buell, C Robin

    2012-01-01

    Major feedstock sources for future biofuel production are likely to be high biomass producing plant species such as poplar, pine, switchgrass, sorghum and maize. One active area of research in these species is genome-enabled improvement of lignocellulosic biofuel feedstock quality and yield. To facilitate genomic-based investigations in these species, we developed the Biofuel Feedstock Genomic Resource (BFGR), a database and web-portal that provides high-quality, uniform and integrated functional annotation of gene and transcript assembly sequences from species of interest to lignocellulosic biofuel feedstock researchers. The BFGR includes sequence data from 54 species and permits researchers to view, analyze and obtain annotation at the gene, transcript, protein and genome level. Annotation of biochemical pathways permits the identification of key genes and transcripts central to the improvement of lignocellulosic properties in these species. The integrated nature of the BFGR in terms of annotation methods, orthologous/paralogous relationships and linkage to seven species with complete genome sequences allows comparative analyses for biofuel feedstock species with limited sequence resources. Database URL: http://bfgr.plantbiology.msu.edu.

  5. Genomic Approaches to Zebrafish Cancer

    PubMed Central

    2017-01-01

    The zebrafish has emerged as an important model for studying cancer biology. Identification of DNA, RNA and chromatin abnormalities can give profound insight into the mechanisms of tumorigenesis, and there are many techniques for analyzing the genomes of these tumors. Here, I present an overview of the available technologies for analyzing tumor genomes in the zebrafish, including array-based methods as well as next-generation sequencing technologies. I also discuss the ways in which zebrafish tumor genomes can be compared to human genomes using cross-species oncogenomics, which acts to filter genomic noise and ultimately uncover central drivers of malignancy. Finally, I discuss downstream analytic tools, including network analysis, that can help to organize the alterations into coherent biological frameworks that can then be investigated further. PMID:27165352

  6. Microarray-based Comparative Genomic Indexing of the Cronobacter genus (Enterobacter sakazakii)

    USDA-ARS's Scientific Manuscript database

    Cronobacter is a recently defined genus synonymous with Enterobacter sakazakii. This new genus currently comprises 6 genomospecies. To extend our understanding of the genetic relationship between Cronobacter sakazakii BAA-894 and the other species of this genus, microarray-based comparative genomi...

  7. GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes.

    PubMed

    Yuan, Lina; Yu, Yang; Zhu, Yanmin; Li, Yulai; Li, Changqing; Li, Rujiao; Ma, Qin; Siu, Gilman Kit-Hang; Yu, Jun; Jiang, Taijiao; Xiao, Jingfa; Kang, Yu

    2017-01-25

    Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genome assembly, but tend to result in false assembly when the assigned reference has extensive rearrangements. Herein, we present GAAP, a genome assembly pipeline for scaffolding based on core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in order and orientation of the scaffolds and build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. GAAP uses both cGOF and pair-end reads to create assemblies in genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false locations, especially for species with extensive rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS.

  8. The Complete Chloroplast Genome Sequence of a Relict Conifer Glyptostrobus pensilis: Comparative Analysis and Insights into Dynamics of Chloroplast Genome Rearrangement in Cupressophytes and Pinaceae

    PubMed Central

    Zheng, Renhua; Xu, Haibin; Zhou, Yanwei; Li, Meiping; Lu, Fengjuan; Dong, Yini; Liu, Xin; Chen, Jinhui; Shi, Jisen

    2016-01-01

    Glyptostrobus pensilis, belonging to the monotypic genus Glyptostrobus (Family: Cupressaceae), is an ancient conifer that is naturally distributed in low-lying wet areas. Here, we report the complete chloroplast (cp) genome sequence (132,239 bp) of G. pensilis. The G. pensilis cp genome is similar in gene content, organization and genome structure to the sequenced cp genomes from other cupressophytes, especially with respect to the loss of the inverted repeat region A (IRA). Through phylogenetic analysis, we demonstrated that the genus Glyptostrobus is closely related to the genus Cryptomeria, supporting previous findings based on physiological characteristics. Since IRs play an important role in stabilizing the cp genome and conifer cp genomes lost different IR regions after splitting into two clades (cupressophytes and Pinaceae), we performed cp genome rearrangement analysis and found more extensive cp genome rearrangements among the species of cupressophytes relative to Pinaceae. Additional repeat analysis indicated that cupressophyte cp genomes contained fewer potentially functional repeats, especially in Cupressaceae, compared with Pinaceae. These results suggested that the dynamics of cp genome rearrangement in conifers differed because the two clades, Pinaceae and cupressophytes, lost IR copies independently and developed different repeats to complement the residual IRs. In addition, we identified 170 perfect simple sequence repeats that will be useful in future research focusing on the evolution of genetic diversity and conservation of genetic variation for this endangered species in the wild. PMID:27560965
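    A perfect simple sequence repeat is an uninterrupted tandem run of a short motif, so a basic scan can be written with back-references in a regular expression. The sketch below is an illustration only: the abstract does not state which tool or thresholds were used, so the function name, the motif lengths (1 to 3 bp) and the minimum repeat counts are all assumptions.

```python
import re

def find_perfect_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat count) for perfect SSRs with 1-3 bp motifs.

    min_repeats maps motif length to the minimum number of tandem copies;
    the defaults are hypothetical thresholds chosen for illustration.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 5, 3: 4}
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip di-/trinucleotide motifs that are really homopolymer runs.
            if k > 1 and len(set(motif)) == 1:
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence, invented for illustration only.
toy = "GG" + "AT" * 6 + "CCG" + "A" * 11 + "TTC"
ssrs = find_perfect_ssrs(toy)
```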

  9. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  10. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    DOE PAGES

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...

    2015-04-09

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.

  11. Molecular Networking and Pattern-Based Genome Mining Improves discovery of biosynthetic gene clusters and their products from Salinispora species

    PubMed Central

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-01-01

    Summary Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308

  12. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    PubMed Central

    2010-01-01

    Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis, based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come. PMID:20398277

  13. GeNets: a unified web platform for network-based genomic analyses.

    PubMed

    Li, Taibo; Kim, April; Rosenbluh, Joseph; Horn, Heiko; Greenfeld, Liraz; An, David; Zimmer, Andrew; Liberzon, Arthur; Bistline, Jon; Natoli, Ted; Li, Yang; Tsherniak, Aviad; Narayan, Rajiv; Subramanian, Aravind; Liefeld, Ted; Wong, Bang; Thompson, Dawn; Calvo, Sarah; Carr, Steve; Boehm, Jesse; Jaffe, Jake; Mesirov, Jill; Hacohen, Nir; Regev, Aviv; Lage, Kasper

    2018-06-18

    Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.

  14. The pomegranate (Punica granatum L.) genome provides insights into fruit quality and ovule developmental biology.

    PubMed

    Yuan, Zhaohe; Fang, Yanming; Zhang, Taikui; Fei, Zhangjun; Han, Fengming; Liu, Cuiyu; Liu, Min; Xiao, Wei; Zhang, Wenjing; Wu, Shan; Zhang, Mengwei; Ju, Youhui; Xu, Huili; Dai, He; Liu, Yujun; Chen, Yanhui; Wang, Lili; Zhou, Jianqing; Guan, Dian; Yan, Ming; Xia, Yanhua; Huang, Xianbin; Liu, Dongyuan; Wei, Hongmin; Zheng, Hongkun

    2017-12-22

    Pomegranate (Punica granatum L.) has an ancient cultivation history and has become an emerging profitable fruit crop due to its attractive features such as the bright red appearance and the high abundance of medicinally valuable ellagitannin-based compounds in its peel and aril. However, the limited genomic resources have restricted further elucidation of genetics and evolution of these interesting traits. Here, we report a 274-Mb high-quality draft pomegranate genome sequence, which covers approximately 81.5% of the estimated 336-Mb genome, consists of 2177 scaffolds with an N50 size of 1.7 Mb and contains 30 903 genes. Phylogenomic analysis supported that pomegranate belongs to the Lythraceae family rather than the monogeneric Punicaceae family, and comparative analyses showed that pomegranate and Eucalyptus grandis share the paleotetraploidy event. Integrated genomic and transcriptomic analyses provided insights into the molecular mechanisms underlying the biosynthesis of ellagitannin-based compounds, the colour formation in both peels and arils during pomegranate fruit development, and the unique ovule development processes that are characteristic of pomegranate. This genome sequence provides an important resource to expand our understanding of some unique biological processes and to facilitate both comparative biology studies and crop breeding. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  15. Comparative chloroplast genomics and phylogenetics of Fagopyrum esculentum ssp. ancestrale – A wild ancestor of cultivated buckwheat

    PubMed Central

    Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A

    2008-01-01

    Background Chloroplast genome sequences are extremely informative about species-interrelationships owing to its non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. With that background, having access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. Results We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described approach that utilized a PCR-based method and employed universal primers, designed on the scaffold of multiple sequence alignment of chloroplast genomes. The gene content and order in buckwheat chloroplast genome is similar to Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationships of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the bayesian phylogeny inference based on first two codon positions Amborella united with Nymphaeales. Conclusion Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization as has been described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. 
Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales supports the sister relationship between Caryophyllales and asterids. PMID:18492277

  16. WormBase ParaSite - a comprehensive resource for helminth genomics.

    PubMed

    Howe, Kevin L; Bolt, Bruce J; Shafie, Myriam; Kersey, Paul; Berriman, Matthew

    2017-07-01

    The number of publicly available parasitic worm genome sequences has increased dramatically in the past three years, and research interest in helminth functional genomics is now quickly gathering pace in response to the foundation that has been laid by these collective efforts. A systematic approach to the organisation, curation, analysis and presentation of these data is clearly vital for maximising the utility of these data to researchers. We have developed a portal called WormBase ParaSite (http://parasite.wormbase.org) for interrogating helminth genomes on a large scale. Data from over 100 nematode and platyhelminth species are integrated, adding value by way of systematic and consistent functional annotation (e.g. protein domains and Gene Ontology terms), gene expression analysis (e.g. alignment of life-stage specific transcriptome data sets), and comparative analysis (e.g. orthologues and paralogues). We provide several ways of exploring the data, including genome browsers, genome and gene summary pages, text search, sequence search, a query wizard, bulk downloads, and programmatic interfaces. In this review, we provide an overview of the back-end infrastructure and analysis behind WormBase ParaSite, and the displays and tools available to users for interrogating helminth genomic data. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  17. Comparison between genotyping by sequencing and SNP-chip genotyping in QTL mapping in wheat

    USDA-ARS's Scientific Manuscript database

    Array- or chip-based single nucleotide polymorphism (SNP) markers are widely used in genomic studies because of their abundance in a genome and their lower cost per data point compared to older marker technologies. Genotyping by sequencing (GBS), a relatively newer approach to genotyping, suggests equal or...

  18. TabPath: interactive tables for metabolic pathway analysis.

    PubMed

    Moraes, Lauro Ângelo Gonçalves de; Felestrino, Érica Barbosa; Assis, Renata de Almeida Barbosa; Matos, Diogo; Lima, Joubert de Castro; Lima, Leandro de Araújo; Almeida, Nalvo Franco; Setubal, João Carlos; Garcia, Camila Carrião Machado; Moreira, Leandro Marcio

    2018-03-15

    Information about metabolic pathways in a comparative context is one of the most powerful tools for understanding genome-based differences in phenotypes among organisms. Although several platforms exist that provide a wealth of information on metabolic pathways of diverse organisms, the comparison among organisms using metabolic pathways is still a difficult task. We present TabPath (Tables for Metabolic Pathway), a web-based tool to facilitate comparison of metabolic pathways in genomes based on KEGG. From a selection of pathways and genomes of interest on the menu, TabPath generates user-friendly tables that facilitate analysis of variations in metabolism among the selected organisms. TabPath is available at http://200.239.132.160:8686. Contact: lmmorei@gmail.com.

  19. Proteolysis in hyperthermophilic microorganisms

    DOE PAGES

    Ward, Donald E.; Shockley, Keith R.; Chang, Lara S.; ...

    2002-01-01

    Proteases are found in every cell, where they recognize and break down unneeded or abnormal polypeptides or peptide-based nutrients within or outside the cell. Genome sequence data can be used to compare proteolytic enzyme inventories of different organisms as they relate to physiological needs for protein modification and hydrolysis. In this review, we exploit genome sequence data to compare hyperthermophilic microorganisms from the euryarchaeotal genus Pyrococcus, the crenarchaeote Sulfolobus solfataricus, and the bacterium Thermotoga maritima. An overview of the proteases in these organisms is given based on those proteases that have been characterized and on putative proteases that have been identified from genomic sequences, but have yet to be characterized. The analysis revealed both similarities and differences in the mechanisms utilized for proteolysis by each of these hyperthermophiles and indicated how these mechanisms relate to proteolysis in less thermophilic cells and organisms.

  20. Plant comparative genetics after 10 years.

    PubMed

    Gale, M D; Devos, K M

    1998-10-23

    The past 10 years have seen the discovery of unexpected levels of conservation of gene content and gene orders over millions of years of evolution within grasses, crucifers, legumes, some trees, and Solanaceae crops. Within the grasses, which include the three 500-million-ton-plus-per-year crops (wheat, maize, and rice), and the crucifers, which include all the Brassica crops, colinearity looks good enough to do most map-based cloning only in the small genome model species, rice and Arabidopsis. Elsewhere, knowledge gained in a few major crops is being pooled and applied across the board. The extrapolation of information from the well-studied species to orphan crops, which include many tropical species, is providing a solid base for their improvement. Genome rearrangements are giving new insights into evolution. In fact, comparative genetics is the key that will unlock the secrets of crop plants with genomes larger than that of humans.

  1. Molecular Markers and Cotton Genetic Improvement: Current Status and Future Prospects

    PubMed Central

    Malik, Waqas; Iqbal, Muhammad Zaffar; Ali Khan, Asif; Qayyum, Abdul; Ali Abid, Muhammad; Noor, Etrat; Qadir Ahmad, Muhammad; Hasan Abbasi, Ghulam

    2014-01-01

    The narrow genetic base and complex allotetraploid genome of cotton (Gossypium hirsutum L.) are stimulating efforts to find the polymorphism required for marker-based breeding. The availability of draft genome sequences of G. raimondii and G. arboreum, together with next generation sequencing (NGS) technologies, has facilitated the development of high-throughput marker technologies in cotton. The concepts of genetic diversity, QTL mapping, and marker assisted selection (MAS) are evolving into more efficient concepts of linkage disequilibrium, association mapping, and genomic selection, respectively. The objective of the current review is to analyze the pace of evolution in the molecular marker technologies in cotton during the last ten years in the following four areas: (i) comparative analysis of low- and high-throughput marker technologies available in cotton, (ii) genetic diversity in the available wild and improved gene pools of cotton, (iii) identification of the genomic regions within the cotton genome underlying economic traits, and (iv) marker based selection methodologies. Moreover, the applications of marker technologies to enhance the breeding efficiency in cotton are also summarized. The aforementioned genomic technologies and the integration of several other omics resources are expected to enhance cotton productivity and meet the global fiber quantity and quality demands. PMID:25401149

  2. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

    PubMed Central

    Huang, Yuan; Wang, Jun; Yang, Yongping; Fan, Chuanzhu; Chen, Jiahui

    2017-01-01

    Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide much more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in family Salicaceae. Phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Incongruences of phylogeny, however, are observed in genus Populus, which most likely results from homoplasy. By comparing three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes are lost or became pseudogenes, infer that the chloroplast genes horizontally transferred to the nucleus genome. Based on the complete nucleus genome sequences from two Salicaceae species, we remarkably identify that the entire chloroplast genome is indeed transferred and integrated to the nucleus genome in the individual of the reference genome of P. trichocarpa at least once. This observation, along with presence of the large nuclear plastid DNA (NUPTs) and NUPTs-containing multiple chloroplast genes in their original order in the chloroplast genome, favors the DNA-mediated hypothesis of organelle to nucleus DNA transfer. Overall, the phylogenomic analysis using chloroplast complete genomes clearly elucidates the phylogeny of Salicaceae. 
The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in Salicaceae provides resources to better understand the successful adaptation of Salicaceae species. PMID:28676809

  3. Single-cell copy number variation detection

    PubMed Central

    2011-01-01

    Detection of chromosomal aberrations from a single cell by array comparative genomic hybridization (single-cell array CGH), instead of from a population of cells, is an emerging technique. However, such detection is challenging because of the genome artifacts and the DNA amplification process inherent to the single cell approach. Current normalization algorithms result in inaccurate aberration detection for single-cell data. We propose a normalization method based on channel, genome composition and recurrent genome artifact corrections. We demonstrate that the proposed channel clone normalization significantly improves the copy number variation detection in both simulated and real single-cell array CGH data. PMID:21854607

  4. Bacillus subtilis genome diversity.

    PubMed

    Earl, Ashlee M; Losick, Richard; Kolter, Roberto

    2007-02-01

    Microarray-based comparative genomic hybridization (M-CGH) is a powerful method for rapidly identifying regions of genome diversity among closely related organisms. We used M-CGH to examine the genome diversity of 17 strains belonging to the nonpathogenic species Bacillus subtilis. Our M-CGH results indicate that there is considerable genetic heterogeneity among members of this species; nearly one-third of Bsu168-specific genes exhibited variability, as measured by the microarray hybridization intensities. The variable loci include those encoding proteins involved in antibiotic production, cell wall synthesis, sporulation, and germination. The diversity in these genes may reflect this organism's ability to survive in diverse natural settings.

  5. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  6. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  7. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  8. Indexcov: fast coverage quality control for whole-genome sequencing.

    PubMed

    Pedersen, Brent S; Collins, Ryan L; Talkowski, Michael E; Quinlan, Aaron R

    2017-11-01

    The BAM and CRAM formats provide a supplementary linear index that facilitates rapid access to sequence alignments in arbitrary genomic regions. Comparing consecutive entries in a BAM or CRAM index allows one to infer the number of alignment records per genomic region for use as an effective proxy of sequence depth in each genomic region. Based on these properties, we have developed indexcov, an efficient estimator of whole-genome sequencing coverage to rapidly identify samples with aberrant coverage profiles, reveal large-scale chromosomal anomalies, recognize potential batch effects, and infer the sex of a sample. Indexcov is available at https://github.com/brentp/goleft under the MIT license. © The Authors 2017. Published by Oxford University Press.
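
    The idea of comparing consecutive index entries can be sketched as follows. This is a simplified illustration, not indexcov's implementation: it assumes the index offsets have already been extracted into a list of plain, increasing byte positions (real BAM virtual offsets combine a compressed-block offset with a within-block offset), and the function name is hypothetical. The 16,384-bp tile size is the window of the BAM linear index as defined in the SAM specification.

```python
from statistics import median

TILE_BP = 16_384  # genomic window covered by one BAM linear-index entry


def relative_depth(offsets):
    """Estimate relative depth per tile from consecutive index offsets.

    The amount of alignment data between two consecutive entries serves as
    a proxy for the number of reads in that tile; each value is scaled by
    the median of the non-empty tiles, so 1.0 means typical coverage.
    """
    sizes = [b - a for a, b in zip(offsets, offsets[1:])]
    non_empty = [s for s in sizes if s > 0]
    if not non_empty:
        return [0.0] * len(sizes)
    typical = median(non_empty)
    return [s / typical for s in sizes]


# Hypothetical offsets for five consecutive tiles on one chromosome.
toy_offsets = [0, 100, 200, 300, 500]
toy_depth = relative_depth(toy_offsets)  # last tile holds twice the data
```

    Because only the index is read, a scan like this touches a few kilobytes per sample instead of the full alignment file, which is what makes the proxy fast.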

  9. Global MLST of Salmonella Typhi Revisited in Post-genomic Era: Genetic Conservation, Population Structure, and Comparative Genomics of Rare Sequence Types.

    PubMed

    Yap, Kien-Pong; Ho, Wing S; Gan, Han M; Chai, Lay C; Thong, Kwai L

    2016-01-01

    Typhoid fever, caused by Salmonella enterica serovar Typhi, remains an important public health burden in Southeast Asia and other endemic countries. Various genotyping methods have been applied to study the genetic variations of this human-restricted pathogen. Multilocus sequence typing (MLST) is one of the widely accepted methods, and recently, there is a growing interest in the re-application of MLST in the post-genomic era. In this study, we provide the global MLST distribution of S. Typhi utilizing both publicly available 1,826 S. Typhi genome sequences in addition to performing conventional MLST on S. Typhi strains isolated from various endemic regions spanning over a century. Our global MLST analysis confirms the predominance of two sequence types (ST1 and ST2) co-existing in the endemic regions. Interestingly, S. Typhi strains with ST8 are currently confined within the African continent. Comparative genomic analyses of ST8 and other rare STs with genomes of ST1/ST2 revealed unique mutations in important virulence genes such as flhB, sipC, and tviD that may explain the variations that differentiate between seemingly successful (widespread) and unsuccessful (poor dissemination) S. Typhi populations. Large scale whole-genome phylogeny demonstrated evidence of phylogeographical structuring and showed that ST8 may have diverged from the earlier ancestral population of ST1 and ST2, which later lost some of its fitness advantages, leading to poor worldwide dissemination. In response to the unprecedented increase in genomic data, this study demonstrates and highlights the utility of large-scale genome-based MLST as a quick and effective approach to narrow the scope of in-depth comparative genomic analysis and consequently provide new insights into the fine scale of pathogen evolution and population structure.
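
    Genome-based MLST reduces each genome to a short allele profile and looks that profile up in a table of known sequence types. The sketch below assumes the seven housekeeping loci of the Salmonella enterica MLST scheme; the allele numbers, profile table, and sequence type labels are hypothetical placeholders, not real database entries.

```python
# Seven housekeeping loci of the Salmonella enterica MLST scheme.
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")


def assign_sequence_type(alleles, profiles):
    """Return the sequence type whose profile matches, or None if novel.

    alleles:  dict mapping locus name -> allele number for one genome
    profiles: dict mapping a 7-tuple of allele numbers -> sequence type label
    """
    key = tuple(alleles[locus] for locus in LOCI)
    return profiles.get(key)


# Hypothetical profile table; the numbers are illustrative only.
toy_profiles = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}

genome_alleles = dict(zip(LOCI, (1, 1, 2, 1, 1, 1, 1)))
toy_st = assign_sequence_type(genome_alleles, toy_profiles)
```

    A lookup of this kind is cheap enough to run over thousands of genomes first, leaving in-depth comparative analysis for the rare or unmatched profiles.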

  10. Improving prokaryotic transposable elements identification using a combination of de novo and profile HMM methods.

    PubMed

    Kamoun, Choumouss; Payen, Thibaut; Hua-Van, Aurélie; Filée, Jonathan

    2013-10-11

    Insertion Sequences (ISs) and their non-autonomous derivatives (MITEs) are important components of prokaryotic genomes inducing duplication, deletion, rearrangement or lateral gene transfers. Although ISs and MITEs are relatively simple and basic genetic elements, their detection remains a difficult task due to their remarkable sequence diversity. With the advent of high-throughput genome and metagenome sequencing technologies, the development of fast, reliable and sensitive methods of ISs and MITEs detection become an important challenge. So far, almost all studies dealing with prokaryotic transposons have used classical BLAST-based detection methods against reference libraries. Here we introduce alternative methods of detection either taking advantages of the structural properties of the elements (de novo methods) or using an additional library-based method using profile HMM searches. In this study, we have developed three different work flows dedicated to ISs and MITEs detection: the first two use de novo methods detecting either repeated sequences or presence of Inverted Repeats; the third one use 28 in-house transposase alignment profiles with HMM search methods. We have compared the respective performances of each method using a reference dataset of 30 archaeal and 30 bacterial genomes in addition to simulated and real metagenomes. Compared to a BLAST-based method using ISFinder as library, de novo methods significantly improve ISs and MITEs detection. For example, in the 30 archaeal genomes, we discovered 30 new elements (+20%) in addition to the 141 multi-copies elements already detected by the BLAST approach. Many of the new elements correspond to ISs belonging to unknown or highly divergent families. The total number of MITEs has even doubled with the discovery of elements displaying very limited sequence similarities with their respective autonomous partners (mainly in the Inverted Repeats of the elements). 
Concerning metagenomes, with the exception of short-read data (<300 bp), for which both techniques seem equally limited, profile HMM searches considerably improve the detection of transposase-encoding genes (up to +50%) while generating a low level of false positives compared to BLAST-based methods. Compared to classical BLAST-based methods, the sensitivity of the de novo and profile HMM methods developed in this study allows a better and more reliable detection of transposons in prokaryotic genomes and metagenomes. We believe that future studies involving IS and MITE identification in genomic data should combine at least one de novo and one library-based method, with optimal results obtained by running the two de novo methods in addition to a library-based search. For metagenomic data, profile HMM searches should be favored; a BLAST-based step is only useful for the final annotation into groups and families.
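
    One of the de novo approaches mentioned above relies on the presence of Inverted Repeats. As a minimal sketch of that structural property, and not the authors' workflow, the following checks how many bases at the start of a candidate element match the reverse complement of its end. The function names, the exact-match criterion, and the toy sequence are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq, max_len=50):
    """Length of the exact terminal inverted repeat of a candidate element.

    The 5' end is compared base by base with the reverse complement of the
    3' end; counting stops at the first mismatch, at max_len, or at half
    the element length, whichever comes first.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    limit = min(max_len, len(seq) // 2)
    n = 0
    while n < limit and seq[n] == rc[n]:
        n += 1
    return n


# Toy element: 4-bp inverted repeats (GGCA ... TGCC) around a filler.
toy_element = "GGCA" + "TTTTTT" + "TGCC"
toy_tir = terminal_inverted_repeat_length(toy_element)
```

    A real detector would tolerate mismatches and scan whole genomes for candidate boundaries; this sketch only shows the property being searched for.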

  11. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE PAGES

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.; ...

    2018-01-09

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) have been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in other sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus, suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.

  12. Linking secondary metabolites to gene clusters through genome sequencing of six diverse Aspergillus species

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kjerbolling, Inge; Vesth, Tammi C.; Frisvad, Jens C.

    The fungal genus of Aspergillus is highly interesting, containing everything from industrial cell factories over model organisms to human pathogens. In particular, this group has a prolific production of bioactive secondary metabolites (SMs). In this work, four diverse Aspergillus species (A. campestris, A. novofumigatus, A. ochraceoroseus and A. steynii) have been whole genome PacBio sequenced to provide genetic references in three Aspergillus sections. Additionally, A. taichungensis and A. candidus were sequenced for SM elucidation. Thirteen Aspergillus genomes were analysed with comparative genomics to determine phylogeny and genetic diversity, showing that each new genome contains 15–27% genes not found in other sequenced Aspergilli. In particular, the new species A. novofumigatus was compared to the pathogenic species A. fumigatus. This suggests that A. novofumigatus can produce most of the same allergens, virulence and pathogenicity factors as A. fumigatus, suggesting that A. novofumigatus could be as pathogenic as A. fumigatus. Furthermore, SMs were linked to gene clusters based on biological and chemical knowledge and analysis, genome sequences and predictive algorithms.

  13. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

    PubMed

    Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

    2015-01-01

    We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.

  14. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

    PubMed Central

    Menten, Björn; Pattyn, Filip; De Preter, Katleen; Robbrecht, Piet; Michels, Evi; Buysse, Karen; Mortier, Geert; De Paepe, Anne; van Vooren, Steven; Vermeesch, Joris; Moreau, Yves; De Moor, Bart; Vermeulen, Stefan; Speleman, Frank; Vandesompele, Jo

    2005-01-01

    Background The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment. Results We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion ArrayCGHbase is a web based and platform independent arrayCGH data analysis tool, that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at . PMID:15910681

  15. Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

    PubMed

    Khatri, Indu; Tomar, Rajul; Ganesan, K; Prasad, G S; Subramanian, Srikrishna

    2017-03-23

    The probiotic yeast, Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand molecular basis of probiotic-properties ascribed to Sb we determined the complete genomes of two strains of Sb i.e. Biocodex and unique28 and the draft genomes for three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A distinctive feature of Sb from other Sc is absence of Ty elements Ty1, Ty3, Ty4 and associated LTR. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for hexose transporters HXT11 and HXT9, and asparagine-utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into the chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.

  16. Phenetic Comparison of Prokaryotic Genomes Using k-mers

    PubMed Central

    Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques

    2017-01-01

    Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
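
    Alignment-free comparison by k-mer content can be illustrated with a small sketch. It uses the Jaccard distance between k-mer sets as a stand-in measure; this is an assumption chosen for brevity and is not presented as the computation performed by Ray Surveyor. The function names, the value of k, and the toy sequences are likewise hypothetical.

```python
def kmers(seq, k):
    """Set of all overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """1 - |shared k-mers| / |all k-mers| for two sequences."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(genomes, k=4):
    """Pairwise distances for a dict of name -> sequence."""
    names = sorted(genomes)
    return {
        (x, y): jaccard_distance(genomes[x], genomes[y], k)
        for x in names for y in names
    }


# Toy sequences standing in for genomes (hypothetical).
toy = {"g1": "ACGTA", "g2": "ACGTT", "g3": "CCCCC"}
toy_matrix = distance_matrix(toy)
```

    A matrix like this can be passed to any standard distance-based clustering method to obtain a phenetic tree, with no alignment or core-genome curation step beforehand.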

  17. Signatures of cytoplasmic proteins in the exoproteome distinguish community- and hospital-associated methicillin-resistant Staphylococcus aureus USA300 lineages.

    PubMed

    Mekonnen, Solomon A; Palma Medina, Laura M; Glasner, Corinna; Tsompanidou, Eleni; de Jong, Anne; Grasso, Stefano; Schaffer, Marc; Mäder, Ulrike; Larsen, Anders R; Gumpert, Heidi; Westh, Henrik; Völker, Uwe; Otto, Andreas; Becher, Dörte; van Dijl, Jan Maarten

    2017-08-18

    Methicillin-resistant Staphylococcus aureus (MRSA) is the common name for a heterogeneous group of highly drug-resistant staphylococci. Two major MRSA classes are distinguished based on epidemiology, namely community-associated (CA) and hospital-associated (HA) MRSA. Notably, the distinction of CA- and HA-MRSA based on molecular traits remains difficult due to the high genomic plasticity of S. aureus. Here we sought to pinpoint global distinguishing features of CA- and HA-MRSA through a comparative genome and proteome analysis of the notorious MRSA lineage USA300. We show for the first time that CA- and HA-MRSA isolates can be distinguished by 2 distinct extracellular protein abundance clusters that are predictive not only for epidemiologic behavior, but also for their growth and survival within epithelial cells. This 'exoproteome profiling' also groups more distantly related HA-MRSA isolates into the HA exoproteome cluster. Comparative genome analysis suggests that these distinctive features of CA- and HA-MRSA isolates relate predominantly to the accessory genome. Intriguingly, the identified exoproteome clusters differ in the relative abundance of typical cytoplasmic proteins, suggesting that signatures of cytoplasmic proteins in the exoproteome represent a new distinguishing feature of CA- and HA-MRSA. Our comparative genome and proteome analysis focuses attention on potentially distinctive roles of 'liberated' cytoplasmic proteins in the epidemiology and intracellular survival of CA- and HA-MRSA isolates. Such extracellular cytoplasmic proteins were recently invoked in staphylococcal virulence, but their implication in the epidemiology of MRSA is unprecedented.

  18. Comparative fine mapping of the Wax 1 (W1) locus in hexaploid wheat.

    PubMed

    Lu, Ping; Qin, Jinxia; Wang, Guoxin; Wang, Lili; Wang, Zhenzhong; Wu, Qiuhong; Xie, Jingzhong; Liang, Yong; Wang, Yong; Zhang, Deyun; Sun, Qixin; Liu, Zhiyong

    2015-08-01

    By applying comparative genomics analyses, a high-density genetic linkage map of the Wax 1 (W1) locus was constructed as a framework for map-based cloning. Glaucousness is described as the scattering effect of visible light from wax deposited on the cuticle of plant aerial organs. In wheat, the wax on leaves and stems is mainly controlled by two sets of genes: glaucousness loci (W1 and W2) and non-glaucousness loci (Iw1 and Iw2). Bulked segregant analysis (BSA) and simple sequence repeat (SSR) mapping showed that Wax 1 (W1) is located on chromosome arm 2BS between markers Xgwm210 and Xbarc35. By applying comparative genomics analyses, genomic regions collinear with the W1 locus on wheat 2BS were identified in Brachypodium distachyon chromosome 5, rice chromosome 4 and sorghum chromosome 6, respectively. Four STS markers were developed using the Triticum aestivum cv. Chinese Spring 454 contig sequences and the International Wheat Genome Sequencing Consortium (IWGSC) survey sequences. W1 was mapped into a 0.93 cM genetic interval flanked by markers XWGGC3197 and XWGGC2484, which has synteny with genomic regions of 56.5 kb in Brachypodium, 390 kb in rice and 31.8 kb in sorghum. The fine genetic map can serve as a framework for chromosome landing, physical mapping and map-based cloning of W1 in wheat.

  19. The complete chloroplast genome sequences of Lychnis wilfordii and Silene capitata and comparative analyses with other Caryophyllaceae genomes.

    PubMed

    Kang, Jong-Soo; Lee, Byoung Yoon; Kwak, Myounghai

    2017-01-01

    The complete chloroplast genomes of Lychnis wilfordii and Silene capitata were determined and compared with ten previously reported Caryophyllaceae chloroplast genomes. The chloroplast genome sequences of L. wilfordii and S. capitata contain 152,320 bp and 150,224 bp, respectively. The gene contents and orders among 12 Caryophyllaceae species are consistent, but several microstructural changes have occurred. Expansion of the inverted repeat (IR) regions at the large single copy (LSC)/IRb and small single copy (SSC)/IR boundaries led to partial or entire gene duplications. Additionally, rearrangements of the LSC region were caused by gene inversions and/or transpositions. The 18 kb inversions, which occurred three times in different lineages of tribe Sileneae, were thought to be facilitated by the intermolecular duplicated sequences. Sequence analyses of the L. wilfordii and S. capitata genomes revealed 39 and 43 repeats, respectively, including forward, palindromic, and reverse repeats. In addition, a total of 67 and 56 simple sequence repeats were discovered in the L. wilfordii and S. capitata chloroplast genomes, respectively. Finally, we constructed phylogenetic trees of the 12 Caryophyllaceae species and two Amaranthaceae species based on 73 protein-coding genes using both maximum parsimony and likelihood methods.

  20. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    PubMed

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar.

  1. [Comparative results of preimplantation genetic screening by array comparative genomic hybridization and new-generation sequencing].

    PubMed

    Aleksandrova, N V; Shubina, E S; Ekimov, A N; Kodyleva, T A; Mukosey, I S; Makarova, N P; Kulakova, E V; Levkov, L A; Barkov, I Yu; Trofimov, D Yu; Sukhikh, G T

    2017-01-01

    Aneuploidies as quantitative chromosome abnormalities are a main cause of failed development of morphologically normal embryos, implantation failures, and early reproductive losses. Preimplantation genetic screening (PGS) allows a preselection of embryos with a normal karyotype, thus increasing the implantation rate and reducing the frequency of early pregnancy loss after IVF. Modern PGS technologies are based on a genome-wide analysis of the embryo. The first pilot study in Russia was performed to assess the possibility of using semiconductor new-generation sequencing (NGS) as a PGS method. NGS data were collected for 38 biopsied embryos and compared with the data from array comparative genomic hybridization (array-CGH). The concordance between the NGS and array-CGH data was 94.8%. Two samples showed the karyotype 47,XXY by array-CGH and a normal karyotype by NGS. The discrepancies may be explained by loss of efficiency of array-CGH amplicon labeling.

  2. Ecological and evolutionary significance of genomic GC content diversity in monocots

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie; Leitch, Ilia J.; Mucina, Ladislav; Pacini, Ettore; Tichý, Lubomír; Grulich, Vít; Rotreklová, Olga

    2014-01-01

    Genomic DNA base composition (GC content) is predicted to significantly affect genome functioning and species ecology. Although several hypotheses have been put forward to address the biological impact of GC content variation in microbial and vertebrate organisms, the biological significance of GC content diversity in plants remains unclear because of a lack of sufficiently robust genomic data. Using flow cytometry, we report genomic GC contents for 239 species representing 70 of 78 monocot families and compare them with genomic characters, a suite of life history traits and climatic niche data using phylogeny-based statistics. GC content of monocots varied between 33.6% and 48.9%, with several groups exceeding the GC content known for any other vascular plant group, highlighting their unusual genome architecture and organization. GC content showed a quadratic relationship with genome size, with the decreases in GC content in larger genomes possibly being a consequence of the higher biochemical costs of GC base synthesis. Dramatic decreases in GC content were observed in species with holocentric chromosomes, whereas increased GC content was documented in species able to grow in seasonally cold and/or dry climates, possibly indicating an advantage of GC-rich DNA during cell freezing and desiccation. We also show that genomic adaptations associated with changing GC content might have played a significant role in the evolution of the Earth’s contemporary biota, such as the rise of grass-dominated biomes during the mid-Tertiary. One of the major selective advantages of GC-rich DNA is hypothesized to be facilitating more complex gene regulation. PMID:25225383

  3. Phylogenetic analysis of the true water bugs (Insecta: Hemiptera: Heteroptera: Nepomorpha): evidence from mitochondrial genomes

    PubMed Central

    Hua, Jimeng; Li, Ming; Dong, Pengzhi; Cui, Ying; Xie, Qiang; Bu, Wenjun

    2009-01-01

    Background: The true water bugs are grouped in infraorder Nepomorpha (Insecta: Hemiptera: Heteroptera) and are of great economic importance. The phylogenetic relationships within Nepomorpha and the taxonomic hierarchies of Pleoidea and Aphelocheiroidea are uncertain. Most of the previous studies were based on morphological characters without algorithmic assessment. In the latest study, the molecular markers employed in phylogenetic analyses were partial sequences of 16S rDNA and 18S rDNA with a total length about 1 kb. Up to now, no mitochondrial genome of the true water bugs has been sequenced, which is one of the largest data sets that could be compared across animal taxa. In this study we analyzed the unresolved problems in Nepomorpha using evidence from mitochondrial genomes. Results: Nine mitochondrial genomes of Nepomorpha and five of other hemipterans were sequenced. These mitochondrial genomes contain the commonly found 37 genes without gene rearrangements. Based on the nucleotide sequences of mt-genomes, Pleoidea is not a member of the Nepomorpha and Aphelocheiroidea should be grouped back into Naucoroidea. Phylogenetic relationships among the superfamilies of Nepomorpha were resolved robustly. Conclusion: The mt-genome is an effective data source for resolving intraordinal phylogenetic problems at the superfamily level within Heteroptera. The mitochondrial genomes of the true water bugs are typical insect mt-genomes. Based on the nucleotide sequences of the mt-genomes, we propose the Pleoidea to be a separate heteropteran infraorder. The infraorder Nepomorpha consists of five superfamilies with the relationships (Corixoidea + ((Naucoroidea + Notonectoidea) + (Ochteroidea + Nepoidea))). PMID:19523246

  4. Clarification of Taxonomic Status within the Pseudomonas syringae Species Group Based on a Phylogenomic Analysis

    PubMed Central

    Gomila, Margarita; Busquets, Antonio; Mulet, Magdalena; García-Valdés, Elena; Lalucat, Jorge

    2017-01-01

    The Pseudomonas syringae phylogenetic group comprises 15 recognized bacterial species and more than 60 pathovars. The classification and identification of strains is relevant for practical reasons but also for understanding the epidemiology and ecology of this group of plant pathogenic bacteria. Genome-based taxonomic analyses have been introduced recently to clarify the taxonomy of the whole genus. A set of 139 draft and complete genome sequences of strains belonging to all species of the P. syringae group available in public databases were analyzed, together with the genomes of closely related species used as outgroups. Comparative genomics based on the genome sequences of the species type strains in the group allowed the delineation of phylogenomic species and demonstrated that a high proportion of strains included in the study are misclassified. Furthermore, representatives of at least 7 putative novel species were detected. It was also confirmed that P. ficuserectae, P. meliae, and P. savastanoi are later synonyms of P. amygdali and that “P. coronafaciens” should be revived as a nomenspecies. PMID:29270162

  5. Anticipation of Personal Genomics Data Enhances Interest and Learning Environment in Genomics and Molecular Biology Undergraduate Courses

    PubMed Central

    Weber, K. Scott; Jensen, Jamie L.; Johnson, Steven M.

    2015-01-01

    An important discussion at colleges is centered on determining more effective models for teaching undergraduates. As personalized genomics has become more common, we hypothesized it could be a valuable tool to make science education more hands on, personal, and engaging for college undergraduates. We hypothesized that providing students with personal genome testing kits would enhance the learning experience of students in two undergraduate courses at Brigham Young University: Advanced Molecular Biology and Genomics. These courses have an emphasis on personal genomics the last two weeks of the semester. Students taking these courses were given the option to receive personal genomics kits in 2014, whereas in 2015 they were not. Students sent their personal genomics samples in on their own and received the data after the course ended. We surveyed students in these courses before and after the two-week emphasis on personal genomics to collect data on whether anticipation of obtaining their own personal genomic data impacted undergraduate student learning. We also tested to see if specific personal genomic assignments improved the learning experience by analyzing the data from the undergraduate students who completed both the pre- and post-course surveys. Anticipation of personal genomic data significantly enhanced student interest and the learning environment based on the time students spent researching personal genomic material and their self-reported attitudes compared to those who did not anticipate getting their own data. Personal genomics homework assignments significantly enhanced the undergraduate student interest and learning based on the same criteria and a personal genomics quiz. We found that for the undergraduate students in both molecular biology and genomics courses, incorporation of personal genomic testing can be an effective educational tool in undergraduate science education. PMID:26241308

  6. Anticipation of Personal Genomics Data Enhances Interest and Learning Environment in Genomics and Molecular Biology Undergraduate Courses.

    PubMed

    Weber, K Scott; Jensen, Jamie L; Johnson, Steven M

    2015-01-01

    An important discussion at colleges is centered on determining more effective models for teaching undergraduates. As personalized genomics has become more common, we hypothesized it could be a valuable tool to make science education more hands on, personal, and engaging for college undergraduates. We hypothesized that providing students with personal genome testing kits would enhance the learning experience of students in two undergraduate courses at Brigham Young University: Advanced Molecular Biology and Genomics. These courses have an emphasis on personal genomics the last two weeks of the semester. Students taking these courses were given the option to receive personal genomics kits in 2014, whereas in 2015 they were not. Students sent their personal genomics samples in on their own and received the data after the course ended. We surveyed students in these courses before and after the two-week emphasis on personal genomics to collect data on whether anticipation of obtaining their own personal genomic data impacted undergraduate student learning. We also tested to see if specific personal genomic assignments improved the learning experience by analyzing the data from the undergraduate students who completed both the pre- and post-course surveys. Anticipation of personal genomic data significantly enhanced student interest and the learning environment based on the time students spent researching personal genomic material and their self-reported attitudes compared to those who did not anticipate getting their own data. Personal genomics homework assignments significantly enhanced the undergraduate student interest and learning based on the same criteria and a personal genomics quiz. We found that for the undergraduate students in both molecular biology and genomics courses, incorporation of personal genomic testing can be an effective educational tool in undergraduate science education.

  7. Detection of genomic rearrangements in cucumber using genomecmp software

    NASA Astrophysics Data System (ADS)

    Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Comparative genomics is a rapidly evolving science, driven by the increasing amount of genome sequence information available in databases. A simple comparison of general genome features such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences and its applications in plant comparative genomics.

  8. Structure, evolution, and comparative genomics of tetraploid cotton based on a high-density genetic linkage map

    PubMed Central

    Li, Ximei; Jin, Xin; Wang, Hantao; Zhang, Xianlong; Lin, Zhongxu

    2016-01-01

    A high-density linkage map was constructed using 1,885 newly obtained loci and 3,747 previously published loci, which included 5,152 loci with 4696.03 cM in total length and 0.91 cM in mean distance. Homology analysis in the cotton genome further confirmed the 13 expected homologous chromosome pairs and revealed an obvious inversion on Chr10 or Chr20 and repeated inversions on Chr07 or Chr16. In addition, two reciprocal translocations between Chr02 and Chr03 and between Chr04 and Chr05 were confirmed. Comparative genomics between the tetraploid cotton and the diploid cottons showed that no major structural changes exist between DT and D chromosomes but rather between AT and A chromosomes. Blast analysis between the tetraploid cotton genome and the mixed genome of two diploid cottons showed that most AD chromosomes, regardless of whether they are from the AT or DT genome, preferentially matched with the corresponding homologous chromosome in the diploid A genome, and then the corresponding homologous chromosome in the diploid D genome, indicating that the diploid D genome underwent converted evolution by the diploid A genome to form the DT genome during polyploidization. In addition, the results reflected that a series of chromosomal translocations occurred among Chr01/Chr15, Chr02/Chr14, Chr03/Chr17, Chr04/Chr22, and Chr05/Chr19. PMID:27084896

  9. Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

    PubMed

    Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

    2016-09-01

    Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequences of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and can potentially facilitate the engineering of cell lines.

  10. Comparative Genomics Reveals Accelerated Evolution in Conserved Pathways during the Diversification of Anole Lizards

    PubMed Central

    Tollis, Marc; Hutchins, Elizabeth D; Stapley, Jessica; Rupp, Shawn M; Eckalbar, Walter L; Maayan, Inbar; Lasku, Eris; Infante, Carlos R; Dennis, Stuart R; Robertson, Joel A; May, Catherine M; Bermingham, Eldredge; DeNardo, Dale F; Hsieh, Shi-Tong Tonia; Kulathinal, Rob J; McMillan, William Owen; Menke, Douglas B; Pratt, Stephen C; Rawls, Jeffery Alan; Sanjur, Oris; Wilson-Rawls, Jeanne; Wilson Sayres, Melissa A; Fisher, Rebecca E

    2018-01-01

    Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation. Here, we sequence and assemble the draft genomes of three anole species (Anolis frenatus, Anolis auratus, and Anolis apletophallus) for comparison with the available reference genome of Anolis carolinensis. Comparative analyses reveal a rapid background rate of molecular evolution consistent with a model of punctuated equilibrium, and strong purifying selection on functional genomic elements in anoles. We find evidence for accelerated evolution in genes involved in behavior, sensory perception, and reproduction, as well as in genes regulating limb bud development and hindlimb specification. Morphometric analyses of anole fore and hindlimbs corroborated these findings. We detect signatures of positive selection across several genes related to the development and regulation of the forebrain, hormones, and the iguanian lizard dewlap, suggesting molecular changes underlying behavioral adaptations known to reinforce species boundaries were a key component in the diversification of anole lizards. PMID:29360978

  11. Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berka, Randy M.; Grigoriev, Igor V.; Otillar, Robert

    2011-10-02

    Thermostable enzymes and thermophilic cell factories may afford economic advantages in the production of many chemicals and biomass-based fuels. Here we describe and compare the genomes of two thermophilic fungi, Myceliophthora thermophila and Thielavia terrestris. To our knowledge, these genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi. Genome analyses and experimental data suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass. Examination of transcriptome data and secreted proteins suggests that the two fungi use shared approaches in the hydrolysis of cellulose and xylan but distinct mechanisms in pectin degradation. Characterization of the biomass-hydrolyzing activity of recombinant enzymes suggests that these organisms are highly efficient in biomass decomposition at both moderate and high temperatures. Furthermore, we present evidence suggesting that aside from representing a potential reservoir of thermostable enzymes, thermophilic fungi are amenable to manipulation using classical and molecular genetics.

  12. Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berka, Randy M.; Grigoriev, Igor V.; Otillar, Robert

    2011-05-16

    Thermostable enzymes and thermophilic cell factories may afford economic advantages in the production of many chemicals and biomass-based fuels. Here we describe and compare the genomes of two thermophilic fungi, Myceliophthora thermophila and Thielavia terrestris. To our knowledge, these genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi. Genome analyses and experimental data suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass. Examination of transcriptome data and secreted proteins suggests that the two fungi use shared approaches in the hydrolysis of cellulose and xylan but distinct mechanisms in pectin degradation. Characterization of the biomass-hydrolyzing activity of recombinant enzymes suggests that these organisms are highly efficient in biomass decomposition at both moderate and high temperatures. Furthermore, we present evidence suggesting that aside from representing a potential reservoir of thermostable enzymes, thermophilic fungi are amenable to manipulation using classical and molecular genetics.

  13. Genome size of 14 species of fireflies (Insecta, Coleoptera, Lampyridae)

    PubMed Central

    Liu, Gui-Chun; Dong, Zhi-Wei; He, Jin-Wu; Zhao, Ruo-Ping; Wang, Wen; Li, Xue-Yan

    2017-01-01

    Eukaryotic genome size data are important both as the basis for comparative research into genome evolution and as estimators of the cost and difficulty of genome sequencing programs for non-model organisms. In this study, the genome sizes of 14 species of fireflies (Lampyridae) (two genera in Lampyrinae, three genera in Luciolinae, and one genus in subfamily incertae sedis) were estimated by propidium iodide (PI)-based flow cytometry. The haploid genome sizes of Lampyridae ranged from 0.42 to 1.31 pg, a 3.1-fold span. Genome sizes of the fireflies varied within the tested subfamilies and genera. Lamprigera and Pyrocoelia species had large and small genome sizes, respectively. No correlation was found between genome size and morphological traits such as body length, body width, eye width, and antennal length. Our data provide additional information on genome size estimation of the firefly family Lampyridae. Furthermore, this study will help clarify the cost and difficulty of genome sequencing programs for non-model organisms and will help promote studies on firefly genome evolution. PMID:29280364

  14. Comparative Genomics of 12 Strains of Erwinia amylovora Identifies a Pan-Genome with a Large Conserved Core

    PubMed Central

    Mann, Rachel A.; Smits, Theo H. M.; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E.; Plummer, Kim M.; Beer, Steven V.; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1Ea and a putative secondary metabolite pathway only present in Rubus-infecting strains. PMID:23409014
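
    The pan-genome and core-genome notions used above can be sketched with set operations. This is only an illustration under stated assumptions: genes are represented by hypothetical ortholog-group labels, the strain names and gene labels are invented toy data (not E. amylovora data), and the "average core fraction" is assumed to mean the mean share of each strain's genes that belong to the core, which may differ from how the authors computed their 89% figure.

```python
def pan_and_core(strain_genes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Pan-genome = union of all strains' gene sets; core = their intersection."""
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets) if gene_sets else set()
    return pan, core


def mean_core_fraction(strain_genes: dict[str, set[str]], core: set[str]) -> float:
    """Average, over strains, of the share of a strain's genes that are core.

    Assumes at least one strain and that every strain has at least one gene.
    """
    fractions = [len(core) / len(genes) for genes in strain_genes.values()]
    return sum(fractions) / len(fractions)


# Toy presence data: ortholog-group labels per strain (hypothetical).
toy_strains = {
    "strain1": {"g1", "g2", "g3", "g4"},
    "strain2": {"g1", "g2", "g3", "g5"},
    "strain3": {"g1", "g2", "g3"},
}
pan, core = pan_and_core(toy_strains)
avg_core = mean_core_fraction(toy_strains, core)
```

    In this toy case the accessory genome is simply the pan-genome minus the core (pan - core), which is where host-specific determinants of the kind described above would be sought.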

  15. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    PubMed

    Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.

  16. Comparative genomic analysis of clinical and environmental strains provides insight into the pathogenicity and evolution of Vibrio parahaemolyticus.

    PubMed

    Li, Lei; Wong, Hin-chung; Nong, Wenyan; Cheung, Man Kit; Law, Patrick Tik Wan; Kam, Kai Man; Kwan, Hoi Shan

    2014-12-18

    Vibrio parahaemolyticus is a Gram-negative halophilic bacterium. Infections with the bacterium could become systemic and can be life-threatening to immunocompromised individuals. Genome sequences of a few clinical isolates of V. parahaemolyticus are currently available, but the genome dynamics across the species and virulence potential of environmental strains on a genome-scale have not been described before. Here we present genome sequences of four V. parahaemolyticus clinical strains from stool samples of patients and five environmental strains in Hong Kong. Phylogenomics analysis based on single nucleotide polymorphisms revealed a clear distinction between the clinical and environmental isolates. A new gene cluster belonging to the biofilm associated proteins of V. parahaemolyticus was found in clinical strains. In addition, a novel small genomic island frequently found among clinical isolates was reported. A few environmental strains were found harboring virulence genes and prophage elements, indicating their virulence potential. A unique biphenyl degradation pathway was also reported. A database for V. parahaemolyticus (http://kwanlab.bio.cuhk.edu.hk/vp) was constructed here as a platform to access and analyze genome sequences and annotations of the bacterium. We have performed a comparative genomics analysis of clinical and environmental strains of V. parahaemolyticus. Our analyses could facilitate understanding of the phylogenetic diversity and niche adaptation of this bacterium.

  17. Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana.

    PubMed

    Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi

    2014-01-03

    Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare with analogues in Arabidopsis thaliana based on a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after split from A. thaliana. Here we present genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through the employment of HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated the differential expression pattern of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among 3 species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. But for TNL type, there are no significant differences in the orthologous gene pairs between the two species. This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome triplication analysis in B. oleracea, B. rapa and A. thaliana genomes, our study provides insight into the evolutionary history of NBS-encoding genes after divergence of A. thaliana and the Brassica lineage. These results together with expression pattern analysis of NBS-encoding orthologous genes provide a useful resource for functional characterization of these genes and genetic improvement of relevant crops.

  18. A comparative genomics perspective on the genetic content of the alkaliphilic haloarchaeon Natrialba magadii ATCC 43099T

    PubMed Central

    2012-01-01

    Background Natrialba magadii is an aerobic chemoorganotrophic member of the Euryarchaeota and is a dual extremophile requiring alkaline conditions and hypersalinity for optimal growth. The genome sequence of Nab. magadii type strain ATCC 43099 was deciphered to obtain a comprehensive insight into the genetic content of this haloarchaeon and to understand the basis of some of the cellular functions necessary for its survival. Results The genome of Nab. magadii consists of four replicons with a total sequence of 4,443,643 bp and encodes 4,212 putative proteins, some of which contain peptide repeats of various lengths. Comparative genome analyses facilitated the identification of genes encoding putative proteins involved in adaptation to hypersalinity, stress response, glycosylation, and polysaccharide biosynthesis. A proton-driven ATP synthase and a variety of putative cytochromes and other proteins supporting aerobic respiration and electron transfer were encoded by one or more of Nab. magadii replicons. The genome encodes a number of putative proteases/peptidases as well as protein secretion functions. Genes encoding putative transcriptional regulators, basal transcription factors, signal perception/transduction proteins, and chemotaxis/phototaxis proteins were abundant in the genome. Pathways for the biosynthesis of thiamine, riboflavin, heme, cobalamin, coenzyme F420 and other essential co-factors were deduced by in depth sequence analyses. However, approximately 36% of Nab. magadii protein coding genes could not be assigned a function based on Blast analysis and have been annotated as encoding hypothetical or conserved hypothetical proteins. Furthermore, despite extensive comparative genomic analyses, genes necessary for survival in alkaline conditions could not be identified in Nab. magadii. Conclusions Based on genomic analyses, Nab. magadii is predicted to be metabolically versatile and it could use different carbon and energy sources to sustain growth. Nab. magadii has the genetic potential to adapt to its milieu by intracellular accumulation of inorganic cations and/or neutral organic compounds. The identification of Nab. magadii genes involved in coenzyme biosynthesis is a necessary step toward further reconstruction of the metabolic pathways in halophilic archaea and other extremophiles. The knowledge gained from the genome sequence of this haloalkaliphilic archaeon is highly valuable in advancing the applications of extremophiles and their enzymes. PMID:22559199

  19. Comparative genomics of parasitic silkworm microsporidia reveal an association between genome expansion and host adaptation

    PubMed Central

    2013-01-01

    Background Microsporidian Nosema bombycis has received much attention because the pébrine disease of domesticated silkworms results in great economic losses in the silkworm industry. So far, no effective treatment has been found for pébrine. Compared to other known Nosema parasites, N. bombycis can unusually parasitize a broad range of hosts. To gain some insights into the underlying genetic mechanism of pathological ability and host range expansion in this parasite, a comparative genomic approach was conducted. The genomes of two Nosema parasites, N. bombycis and N. antheraeae (an obligatory parasite to undomesticated silkworms Antheraea pernyi), were sequenced and compared with their distantly related species, N. ceranae (an obligatory parasite to honey bees). Results Our comparative genomics analysis shows that the N. bombycis genome has greatly expanded due to the following three molecular mechanisms: 1) the proliferation of host-derived transposable elements, 2) the acquisition of many horizontally transferred genes from bacteria, and 3) the production of abundant gene duplications. To our knowledge, duplicated genes derived not only from small-scale events (e.g., tandem duplications) but also from large-scale events (e.g., segmental duplications) have never been seen so abundant in any reported microsporidia genomes. Our relative dating analysis further indicated that these duplication events have arisen recently over very short evolutionary time. Furthermore, several duplicated genes involved in the cytotoxic metabolic pathway were found to undergo positive selection, suggestive of the role of duplicated genes in the adaptive evolution of pathogenic ability. Conclusions Genome expansion is rarely considered as the evolutionary outcome acting on those highly reduced and compact parasitic microsporidian genomes. This study, for the first time, demonstrates that the parasitic genomes can expand, instead of shrink, through several common molecular mechanisms such as gene duplication, horizontal gene transfer, and transposable element expansion. We also showed that the duplicated genes can serve as raw materials for evolutionary innovations possibly contributing to the increase of pathogenic ability. Based on our research, we propose that duplicated genes of N. bombycis should be treated as primary targets for treatment designs against pébrine. PMID:23496955

  20. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    PubMed

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  1. Genomic analysis of WCP30 Phage of Weissella cibaria for Dairy Fermented Foods.

    PubMed

    Lee, Young-Duck; Park, Jong-Hyun

    2017-01-01

    In this study, we report the morphogenetic analysis and genome sequence of a new WCP30 phage of Weissella cibaria, isolated from a fermented food. Based on its morphology, as observed by transmission electron microscopy, WCP30 phage belongs to the family Siphoviridae. Genomic analysis of WCP30 phage showed that it had a 33,697-bp double-stranded DNA genome with 41.2% G+C content. Bioinformatics analysis of the genome revealed 35 open reading frames. A BLASTN search showed that WCP30 phage had low sequence similarity compared to other phages infecting lactic acid bacteria. This is the first report of the morphological features and complete genome sequence of WCP30 phage, which may be useful for controlling the fermentation of dairy foods.

  2. A combinatorial approach of comprehensive QTL-based comparative genome mapping and transcript profiling identified a seed weight-regulating candidate gene in chickpea

    PubMed Central

    Bajaj, Deepak; Upadhyaya, Hari D.; Khan, Yusuf; Das, Shouvik; Badoni, Saurabh; Shree, Tanima; Kumar, Vinod; Tripathi, Shailesh; Gowda, C. L. L.; Singh, Sube; Sharma, Shivali; Tyagi, Akhilesh K.; Chattopdhyay, Debasis; Parida, Swarup K.

    2015-01-01

    High experimental validation/genotyping success rate (94–96%) and intra-specific polymorphic potential (82–96%) of 1536 SNP and 472 SSR markers showing in silico polymorphism between desi ICC 4958 and kabuli ICC 12968 chickpea was obtained in a 190 mapping population (ICC 4958 × ICC 12968) and 92 diverse desi and kabuli genotypes. A high-density 2001 marker-based intra-specific genetic linkage map comprising eight LGs was constructed and is considerably more saturated (mean map-density: 0.94 cM) than existing intra-specific genetic maps in chickpea. Fifteen robust QTLs (PVE: 8.8–25.8% with LOD: 7.0–13.8) associated with pod and seed number/plant (PN and SN) and 100 seed weight (SW) were identified and mapped on 10 major genomic regions of eight LGs. One 126.8 kb major genomic region harbouring a strong SW-associated robust QTL (Caq'SW1.1: 169.1–171.3 cM) has been delineated by integrating high-resolution QTL mapping with comprehensive marker-based comparative genome mapping and differential expression profiling. This identified one potential regulatory SNP (G/A) in the cis-acting element of candidate ERF (ethylene responsive factor) TF (transcription factor) gene governing seed weight in chickpea. The functionally relevant molecular tags identified have potential to be utilized for marker-assisted genetic improvement of chickpea. PMID:25786576

  3. Reduce Manual Curation by Combining Gene Predictions from Multiple Annotation Engines, a Case Study of Start Codon Prediction

    PubMed Central

    Ederveen, Thomas H. A.; Overmars, Lex; van Hijum, Sacha A. F. T.

    2013-01-01

    Nowadays, prokaryotic genomes are sequenced faster than the capacity to manually curate gene annotations. Automated genome annotation engines provide users a straight-forward and complete solution for predicting ORF coordinates and function. For many labs, the use of AGEs is therefore essential to decrease the time necessary for annotating a given prokaryotic genome. However, it is not uncommon for AGEs to provide different and sometimes conflicting predictions. Combining multiple AGEs might allow for more accurate predictions. Here we analyzed the ab initio open reading frame (ORF) calling performance of different AGEs based on curated genome annotations of eight strains from different bacterial species with GC% ranging from 35–52%. We present a case study which demonstrates a novel way of comparative genome annotation, using combinations of AGEs in a pre-defined order (or path) to predict ORF start codons. The order of AGE combinations is from high to low specificity, where the specificity is based on the eight genome annotations. For each AGE combination we are able to derive a so-called projected confidence value, which is the average specificity of ORF start codon prediction based on the eight genomes. The projected confidence enables estimating likeliness of a correct prediction for a particular ORF start codon by a particular AGE combination, pinpointing ORFs notoriously difficult to predict start codons. We correctly predict start codons for 90.5±4.8% of the genes in a genome (based on the eight genomes) with an accuracy of 81.1±7.6%. Our consensus-path methodology allows a marked improvement over majority voting (9.7±4.4%) and with an optimal path ORF start prediction sensitivity is gained while maintaining a high specificity. PMID:23675487

  4. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data.

    PubMed

    Chi, Bryan; DeLeeuw, Ronald J; Coe, Bradley P; MacAulay, Calum; Lam, Wan L

    2004-02-09

    Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome has made high resolution copy number analysis throughout the genome possible. Since array CGH provides signal ratio for each DNA segment, visualization would require the reassembly of individual data points into chromosome profiles. We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user defined criteria, and retrieve annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
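
    The replicate-averaging step described in this abstract (import a tab-delimited file, then compute the average ratio and standard deviation for each replicate spot) can be illustrated with a minimal Python sketch. This is not SeeGH's code: the column names `clone` and `ratio`, the function name, and the sample rows are hypothetical, and the choice of the sample (n-1) standard deviation is an assumption, since the abstract does not say which estimator is used.

```python
import csv
import io
import statistics
from collections import defaultdict


def summarize_replicates(tsv_text):
    """Map each clone name to (mean ratio, sample standard deviation).

    Assumes a header row with hypothetical columns 'clone' and 'ratio'.
    A clone spotted only once gets a standard deviation of 0.0.
    """
    ratios = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        ratios[row["clone"]].append(float(row["ratio"]))
    return {
        clone: (
            statistics.mean(values),
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for clone, values in ratios.items()
    }


# Made-up example input: two replicate spots for cloneA, one for cloneB.
example = "clone\tratio\ncloneA\t0.10\ncloneA\t0.30\ncloneB\t-0.50\n"
summary = summarize_replicates(example)
```

    A viewer would then join each summarized clone to its chromosomal position before plotting; that lookup is omitted here because the abstract does not describe its data model.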

  5. Detecting the borders between coding and non-coding DNA regions in prokaryotes based on recursive segmentation and nucleotide doublets statistics

    PubMed Central

    2012-01-01

    Background Detecting the borders between coding and non-coding regions is an essential step in genome annotation. Information entropy measures are useful for describing the signals in genome sequences. However, the accuracy of previous methods of finding borders based on entropy segmentation still needs to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Compared with the previous segmentation methods, the experimental results on three bacteria genomes, Rickettsia prowazekii, Borrelia burgdorferi and E. coli, show that our approach improves the accuracy for finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacteria genomes, compared to the A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
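
    To make the idea of entropic segmentation concrete, the following Python sketch scores every possible single cut of a sequence by the length-weighted Jensen-Rényi divergence between the two resulting segments and returns the best one. It is a simplified illustration, not the authors' method: it uses the raw symbols of the input string rather than the 22-symbol doublet/stop-codon alphabet, it finds one cut instead of recursing with a significance test, and the order alpha = 0.5 and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def renyi_entropy(counts, total, alpha=0.5):
    """Rényi entropy of order alpha (0 < alpha < 1), in bits, from symbol counts."""
    return math.log2(sum((c / total) ** alpha for c in counts.values())) / (1 - alpha)


def jensen_renyi(left, right, alpha=0.5):
    """Length-weighted Jensen-Rényi divergence between two segments.

    Computed as the entropy of the pooled segment minus the weighted
    entropies of the two parts, with weights proportional to length.
    """
    n_left, n_right = len(left), len(right)
    n = n_left + n_right
    c_left, c_right = Counter(left), Counter(right)
    pooled = c_left + c_right
    return (
        renyi_entropy(pooled, n, alpha)
        - (n_left / n) * renyi_entropy(c_left, n_left, alpha)
        - (n_right / n) * renyi_entropy(c_right, n_right, alpha)
    )


def best_cut(seq, alpha=0.5):
    """Return (position, divergence) for the single cut with maximal divergence."""
    return max(
        ((i, jensen_renyi(seq[:i], seq[i:], alpha)) for i in range(1, len(seq))),
        key=lambda pair: pair[1],
    )


# Toy sequence with an obvious compositional border after position 10.
position, divergence = best_cut("A" * 10 + "C" * 10)
```

    Recounting both segments at every candidate position makes this quadratic in sequence length; a genome-scale version would update the counts incrementally and apply the same search recursively to each accepted segment.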

  6. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  7. Genome survey and high-density genetic map construction provide genomic and genetic resources for the Pacific White Shrimp Litopenaeus vannamei

    PubMed Central

    Yu, Yang; Zhang, Xiaojun; Yuan, Jianbo; Li, Fuhua; Chen, Xiaohan; Zhao, Yongzhen; Huang, Long; Zheng, Hongkun; Xiang, Jianhai

    2015-01-01

    The Pacific white shrimp Litopenaeus vannamei is the dominant crustacean species in global seafood mariculture. Understanding the genome and genetic architecture is useful for deciphering complex traits and accelerating the breeding program in shrimp. In this study, a genome survey was conducted and a high-density linkage map was constructed using a next-generation sequencing approach. The genome survey was used to identify preliminary genome characteristics and to generate a rough reference for linkage map construction. De novo SNP discovery resulted in 25,140 polymorphic markers. A total of 6,359 high-quality markers were selected for linkage map construction based on marker coverage among individuals and read depths. For the linkage map, a total of 6,146 markers spanning 4,271.43 cM were mapped to 44 sex-averaged linkage groups, with an average marker distance of 0.7 cM. An integration analysis linked 5,885 genome scaffolds and 1,504 BAC clones to the linkage map. Based on the high-density linkage map, several QTLs for body weight and body length were detected. This high-density genetic linkage map reveals basic genomic architecture and will be useful for comparative genomics research, genome assembly and genetic improvement of L. vannamei and other penaeid shrimp species. PMID:26503227

  8. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity.

    PubMed

    Edger, Patrick P; VanBuren, Robert; Colle, Marivi; Poorten, Thomas J; Wai, Ching Man; Niederhuth, Chad E; Alger, Elizabeth I; Ou, Shujun; Acharya, Charlotte B; Wang, Jie; Callow, Pete; McKain, Michael R; Shi, Jinghua; Collier, Chad; Xiong, Zhiyong; Mower, Jeffrey P; Slovin, Janet P; Hytönen, Timo; Jiang, Ning; Childs, Kevin L; Knapp, Steven J

    2018-02-01

    Although draft genomes are available for most agronomically important plant species, the majority are incomplete, highly fragmented, and often riddled with assembly and scaffolding errors. These assembly issues hinder advances in tool development for functional genomics and systems biology. Here we utilized a robust, cost-effective approach to produce high-quality reference genomes. We report a near-complete genome of diploid woodland strawberry (Fragaria vesca) using single-molecule real-time sequencing from Pacific Biosciences (PacBio). This assembly has a contig N50 length of ∼7.9 million base pairs (Mb), representing a ∼300-fold improvement of the previous version. The vast majority (>99.8%) of the assembly was anchored to 7 pseudomolecules using 2 sets of optical maps from Bionano Genomics. We obtained ∼24.96 Mb of sequence not present in the previous version of the F. vesca genome and produced an improved annotation that includes 1496 new genes. Comparative syntenic analyses uncovered numerous, large-scale scaffolding errors present in each chromosome in the previously published version of the F. vesca genome. Our results highlight the need to improve existing short-read based reference genomes. Furthermore, we demonstrate how genome quality impacts commonly used analyses for addressing both fundamental and applied biological questions. © The Authors 2017. Published by Oxford University Press.
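
    The contig N50 quoted in this abstract is the length L such that contigs of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that calculation follows; the function name and the toy contig lengths are made up for illustration and are unrelated to the F. vesca assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy assembly: 54 bases in total, so the threshold is 27 bases.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_n50 = n50(toy_contigs)
```

    Because N50 depends only on the length distribution, it rewards contiguity but says nothing about correctness, which is why the abstract pairs it with optical-map anchoring and synteny checks.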

  9. Comparative genome analysis in the integrated microbial genomes (IMG) system.

    PubMed

    Markowitz, Victor M; Kyrpides, Nikos C

    2007-01-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  10. Characterization and Complete Genome Sequences of Three N4-Like Roseobacter Phages Isolated from the South China Sea.

    PubMed

    Li, Baolian; Zhang, Si; Long, Lijuan; Huang, Sijun

    2016-09-01

    Three bacteriophages (RD-1410W1-01, RD-1410Ws-07, and DS-1410Ws-06) were isolated from the surface water of Sanya Bay, northern South China Sea, on two marine bacteria type strains of the Roseobacter lineage. These phages have an isometric head and a short tail, morphologically belonging to the Podoviridae family. Two of these phages can infect four of seven marine roseobacter strains tested and the other one can infect three of them, showing relatively broader host ranges compared to known N4-like roseophages. One-step growth curves showed that these phages have similar short latent periods (1-2 h) but highly variable burst sizes (27-341 pfu cell⁻¹). Their complete genomes show high level of similarities to known N4-like roseophages in terms of genome size, G + C content, gene content, and arrangement. The morphological and genomic features of these phages indicate that they belong to the N4likevirus genus. Moreover, comparative genomic analysis based on 43 N4-like phages (10 roseobacter phages and 33 phages infecting other lineages of bacteria) revealed a core genome of 18 genes shared by all the 43 phages and 38 genes shared by all the ten roseophages. The 38 core genes of N4-like roseophages nearly make up 70% of each genome in length. Phylogenetic analysis based on the concatenated core gene products showed that our phage isolates represent two new phyletic branches, suggesting the broad genetic diversity of marine N4-like roseophages remains.

  11. Making genomic medicine evidence-based and patient-centered: a structured review and landscape analysis of comparative effectiveness research.

    PubMed

    Phillips, Kathryn A; Deverka, Patricia A; Sox, Harold C; Khoury, Muin J; Sandy, Lewis G; Ginsburg, Geoffrey S; Tunis, Sean R; Orlando, Lori A; Douglas, Michael P

    2017-10-01

    Comparative effectiveness research (CER) in genomic medicine (GM) measures the clinical utility of using genomic information to guide clinical care in comparison to appropriate alternatives. We summarized findings of high-quality systematic reviews that compared the analytic and clinical validity and clinical utility of GM tests. We focused on clinical utility findings to summarize CER-derived evidence about GM and identify evidence gaps and future research needs. We abstracted key elements of study design, GM interventions, results, and study quality ratings from 21 systematic reviews published in 2010 through 2015. More than half (N = 13) of the reviews were of cancer-related tests. All reviews identified potentially important clinical applications of the GM interventions, but most had significant methodological weaknesses that largely precluded any conclusions about clinical utility. Twelve reviews discussed the importance of patient-centered outcomes, although few described evidence about the impact of genomic medicine on these outcomes. In summary, we found a very limited body of evidence about the effect of using genomic tests on health outcomes and many evidence gaps for CER to address. Genet Med advance online publication 13 April 2017.

  12. Making genomic medicine evidence-based and patient-centered: a structured review and landscape analysis of comparative effectiveness research

    PubMed Central

    Phillips, Kathryn A.; Deverka, Patricia A.; Sox, Harold C.; Khoury, Muin J.; Sandy, Lewis G.; Ginsburg, Geoffrey S.; Tunis, Sean R.; Orlando, Lori A.; Douglas, Michael P.

    2017-01-01

    Comparative effectiveness research (CER) in genomic medicine (GM) measures the clinical utility of using genomic information to guide clinical care in comparison to appropriate alternatives. We summarized findings of high-quality systematic reviews that compared the analytic and clinical validity and clinical utility of GM tests. We focused on clinical utility findings to summarize CER-derived evidence about GM and identify evidence gaps and future research needs. We abstracted key elements of study design, GM interventions, results, and study quality ratings from 21 systematic reviews published in 2010 through 2015. More than half (N = 13) of the reviews were of cancer-related tests. All reviews identified potentially important clinical applications of the GM interventions, but most had significant methodological weaknesses that largely precluded any conclusions about clinical utility. Twelve reviews discussed the importance of patient-centered outcomes, although few described evidence about the impact of genomic medicine on these outcomes. In summary, we found a very limited body of evidence about the effect of using genomic tests on health outcomes and many evidence gaps for CER to address. Genet Med advance online publication 13 April 2017 PMID:28406488

  13. Genomic relationships based on X chromosome markers and accuracy of genomic predictions with and without X chromosome markers

    PubMed Central

    2014-01-01

    Background Although the X chromosome is the second largest bovine chromosome, markers on the X chromosome are not used for genomic prediction in some countries and populations. In this study, we presented a method for computing genomic relationships using X chromosome markers, investigated the accuracy of imputation from a low density (7K) to the 54K SNP (single nucleotide polymorphism) panel, and compared the accuracy of genomic prediction with and without using X chromosome markers. Methods The impact of considering X chromosome markers on prediction accuracy was assessed using data from Nordic Holstein bulls and different sets of SNPs: (a) the 54K SNPs for reference and test animals, (b) SNPs imputed from the 7K to the 54K SNP panel for test animals, (c) SNPs imputed from the 7K to the 54K panel for half of the reference animals, and (d) the 7K SNP panel for all animals. Beagle and Findhap were used for imputation. GBLUP (genomic best linear unbiased prediction) models with or without X chromosome markers and with or without a residual polygenic effect were used to predict genomic breeding values for 15 traits. Results Averaged over the two imputation datasets, correlation coefficients between imputed and true genotypes for autosomal markers, pseudo-autosomal markers, and X-specific markers were 0.971, 0.831 and 0.935 when using Findhap, and 0.983, 0.856 and 0.937 when using Beagle. Estimated reliabilities of genomic predictions based on the imputed datasets using Findhap or Beagle were very close to those using the real 54K data. Genomic prediction using all markers gave slightly higher reliabilities than predictions without X chromosome markers. Based on our data which included only bulls, using a G matrix that accounted for sex-linked relationships did not improve prediction, compared with a G matrix that did not account for sex-linked relationships. 
A model that included a polygenic effect did not recover the loss of prediction accuracy from exclusion of X chromosome markers. Conclusions The results from this study suggest that markers on the X chromosome contribute to accuracy of genomic predictions and should be used for routine genomic evaluation. PMID:25080199
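For context, a genomic relationship (G) matrix for autosomal markers is commonly built with VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centred by twice the allele frequency. The sketch below shows only that standard autosomal construction on made-up genotypes. It does not reproduce the X-chromosome method presented in the study, and the function name `vanraden_g` and the example data are illustrative assumptions.

```python
def vanraden_g(genotypes):
    """Build a genomic relationship matrix from 0/1/2 allele counts.

    genotypes: list of rows, one per animal; each row lists allele counts
    (0, 1 or 2) for autosomal markers. Hypothetical helper, not from the paper.
    """
    n_animals = len(genotypes)
    n_markers = len(genotypes[0])
    # Observed allele frequency per marker.
    freqs = [
        sum(row[j] for row in genotypes) / (2 * n_animals)
        for j in range(n_markers)
    ]
    # Scaling factor 2 * sum(p * (1 - p)) used in VanRaden's first method.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    # Centre each allele count by twice the allele frequency.
    z = [[row[j] - 2 * freqs[j] for j in range(n_markers)] for row in genotypes]
    # G = Z Z' / scale
    return [
        [sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n_animals)]
        for i in range(n_animals)
    ]


# Made-up genotypes: 3 animals x 2 markers (not data from the study).
example = [[0, 2], [1, 1], [2, 0]]
g = vanraden_g(example)
```

X-specific markers need different handling in males, who carry a single copy, which is the part this sketch deliberately leaves out.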

  14. Effective de novo assembly of fish genome using haploid larvae.

    PubMed

    Iwasaki, Yuki; Nishiki, Issei; Nakamura, Yoji; Yasuike, Motoshige; Kai, Wataru; Nomura, Kazuharu; Yoshida, Kazunori; Nomura, Yousuke; Fujiwara, Atushi; Kobayashi, Takanori; Ototake, Mitsuru

    2016-02-01

    Recent improvements in next-generation sequencing technology have made whole-genome sequencing possible even for non-model eukaryotic species with no available reference genome. However, de novo assembly of diploid genomes remains a major challenge because of allelic variation. The aim of this study was to determine the feasibility of using the genome of haploid fish larvae for de novo assembly of whole-genome sequences. We compared the efficiency of assembly using the haploid genome of yellowtail (Seriola quinqueradiata) with that using the diploid genome obtained from the dam. De novo assembly from the haploid and the diploid sequence reads (100 million reads per dataset), generated by the Ion Proton sequencer (200 bp), was done with two different assembly algorithms, namely overlap-layout-consensus (OLC) and de Bruijn graph (DBG). Assembly of the haploid genome significantly reduced the total number of contigs (by approximately 22% for OLC and 9% for DBG), with longer average and N50 contig lengths, compared with the diploid genome assembly. The haploid assembly also improved the quality of the scaffolds by reducing the number of regions with unassigned nucleotides (Ns) in OLC-based assemblies (total length of Ns: 45,331,916 bp for haploids and 67,724,360 bp for diploids). The haploid genome assembly appears to be better because allelic variation in the diploid genome disrupts the extension of contigs during the assembly process. Our results indicate that using the genome of haploid larvae leads to a significant improvement in the de novo assembly process, thus providing a novel strategy for the construction of reference genomes from non-model diploid organisms such as fish. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
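The abstract above compares assemblies by N50 contig length. As a reminder of what that statistic measures, here is a minimal sketch of the usual definition: the contig length at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name and the contig lengths are hypothetical and are not data from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("need at least one contig")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half the
        # assembly length defines the N50.
        if running >= half_total:
            return length


# Made-up assemblies with the same total length (3,000 bp).
fragmented = [200, 200, 300, 300, 400, 400, 500, 700]
contiguous = [500, 600, 900, 1000]

n50_fragmented = n50(fragmented)
n50_contiguous = n50(contiguous)
```

With equal total length, the assembly split into fewer, longer contigs gets the higher N50, which is why the statistic is reported alongside contig counts.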

  15. Widespread of horizontal gene transfer in the human genome.

    PubMed

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-04-04

    A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is horizontal gene transfer (HGT), which involves movement of genetic material between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between the human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched in ion binding. There were few overlaps with known horizontal gene transfer regions in the human genome, which indicates that horizontal gene transfer is more common in the human genome than expected. Horizontal gene transfer impacts hundreds of human genes, and this study provides insight into potential mechanisms of HGT in the human genome.

  16. Characterization of hemizygous deletions in Citrus using array-Comparative Genomic Hybridization and microsynteny comparisons with the poplar genome

    PubMed Central

    Ríos, Gabino; Naranjo, Miguel A; Iglesias, Domingo J; Ruiz-Rivero, Omar; Geraud, Marion; Usach, Antonio; Talón, Manuel

    2008-01-01

    Background Many fruit-tree species, including relevant Citrus spp. varieties, exhibit a reproductive biology that impairs breeding and strongly constrains genetic improvement. In citrus, juvenility increases the generation time, while sexual sterility, inbreeding depression and self-incompatibility prevent the production of homozygous cultivars. Genomic technology may provide citrus researchers with a new set of tools to address these restrictions. In this work, we report a valuable genomics-based protocol for the structural analysis of deletion mutations in a heterozygous background. Results Two independent fast neutron mutants of self-incompatible clementine (Citrus clementina Hort. Ex Tan. cv. Clemenules) were the subject of the study. Both mutants, named 39B3 and 39E7, were expected to carry DNA deletions in hemizygous dosage. Array-based Comparative Genomic Hybridization (array-CGH) using a Citrus cDNA microarray allowed the identification of underrepresented genes in these two mutants. Subsequent comparison of the deleted citrus genes with annotated plant genomes, especially poplar, made it possible to predict the presence of a large deletion of about 700 kb in 39B3 and at least two deletions of approximately 100 and 500 kb in 39E7. The deletion in 39B3 was further characterized by PCR on available Citrus BACs, which helped us to build a partial physical map of the deletion. Among the deleted genes, a ClpC-like gene coding for a putative subunit of a multifunctional chloroplastic protease involved in the regulation of chlorophyll b synthesis was directly related to the mutated phenotype, since the mutant showed a reduced chlorophyll a/b ratio in green tissues. Conclusion In this work, we report the use of array-CGH for the successful identification of genes included in a hemizygous deletion induced by fast neutron irradiation in Citrus clementina. 
The study of gene content and order within the 39B3 deletion also led to the unexpected conclusion that microsynteny and local gene colinearity in this species were higher with Populus trichocarpa than with the phylogenetically closer Arabidopsis thaliana. This work corroborates the potential of Citrus genomic resources to assist mutagenesis-based approaches for functional genetics, structural studies and comparative genomics, and hence to facilitate citrus variety improvement. PMID:18691431

  17. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic studies. Reflecting the huge diversity of the microbial world, the number of microbial genome projects has now reached several thousand. To efficiently explore the diversity of the entire microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, an efficient incremental updating procedure can create an extended ortholog table by adding genomes to the default ortholog table generated from the representative set of genomes. Combined with the dynamic orthology calculation for any specified set of organisms, MBGD is an efficient and flexible tool for exploring microbial genome diversity.

  18. An efficient approach to BAC based assembly of complex genomes.

    PubMed

    Visendi, Paul; Berkman, Paul J; Hayashi, Satomi; Golicz, Agnieszka A; Bayer, Philipp E; Ruperao, Pradeep; Hurgobin, Bhavna; Montenegro, Juan; Chan, Chon-Kit Kenneth; Staňková, Helena; Batley, Jacqueline; Šimková, Hana; Doležel, Jaroslav; Edwards, David

    2016-01-01

    There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate 'gold' reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes.

  19. Efficient Breeding by Genomic Mating.

    PubMed

    Akdemir, Deniz; Sánchez, Julio I

    2016-01-01

    Selection in breeding programs can be done by using phenotypes (phenotypic selection), pedigree relationship (breeding value selection) or molecular markers (marker assisted selection or genomic selection). All these methods are based on truncation selection, focusing on the best performance of parents before mating. In this article we propose an approach to breeding, named genomic mating, which focuses on mating instead of truncation selection. Genomic mating uses information in a similar fashion to genomic selection but includes information on complementation of parents to be mated. Following the efficiency frontier surface, genomic mating uses concepts of estimated breeding values, risk (usefulness) and coefficient of ancestry to optimize mating between parents. We used a genetic algorithm to find solutions to this optimization problem, and the results from our simulations comparing genomic selection, phenotypic selection and the mating approach indicate that the genomic mating approach for breeding complex traits is more favorable than phenotypic and genomic selection. Genomic mating is similar to genomic selection in terms of estimating marker effects, but in genomic mating the genetic information and the estimated marker effects are used to decide which genotypes should be crossed to obtain the next breeding population.

  20. Systematic comparison of variant calling pipelines using gold standard personal exome variants

    PubMed Central

    Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

    2015-01-01

    The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, but they show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers: Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes and Ion Proton Variant Caller (TVC). The pipelines were run on twelve data sets for the NA12878 genome sequenced on different platforms, including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
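Benchmarking a pipeline against a gold-standard call set usually comes down to counting true positives, false positives and false negatives, then reporting precision, recall and F1. The sketch below illustrates that bookkeeping under simplifying assumptions: variants are exact (chromosome, position, ref, alt) tuples, and the call sets are invented. It is not the evaluation code used in the study, and the function name `benchmark` is hypothetical.

```python
def benchmark(calls, truth):
    """Compare a set of variant calls with a gold-standard set.

    Both arguments are sets of (chrom, pos, ref, alt) tuples.
    """
    tp = len(calls & truth)   # called and in the gold standard
    fp = len(calls - truth)   # called but absent from the gold standard
    fn = len(truth - calls)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Invented call sets for illustration only.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr2", 75, "T", "C")}
calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
         ("chr2", 50, "G", "A"), ("chr3", 10, "A", "T")}

result = benchmark(calls, truth)
```

A real comparison would also restrict both sets to the high-confidence regions of the benchmark and normalize variant representation before matching. Exact tuple matching, as here, undercounts agreement for indels that can be written in more than one way.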

  1. Extensive Error in the Number of Genes Inferred from Draft Genome Assemblies

    PubMed Central

    Denton, James F.; Lugo-Martinez, Jose; Tucker, Abraham E.; Schrider, Daniel R.; Warren, Wesley C.; Hahn, Matthew W.

    2014-01-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process. PMID:25474019
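The comparison described above, draft assembly against a higher-quality version of the same genome, can be summarized per gene family as inflated, deflated or matching copy number. The following is a minimal sketch of that tally. The family names and counts are invented, and the function is a hypothetical illustration rather than the authors' pipeline.

```python
def family_count_errors(draft, reference):
    """Compare per-family gene counts between two annotations.

    draft, reference: dicts mapping gene-family name -> number of genes.
    A family missing from one dict is treated as having zero genes there.
    """
    families = set(draft) | set(reference)
    # Families with more genes in the draft (e.g. genes split across contigs).
    inflated = sorted(f for f in families
                      if draft.get(f, 0) > reference.get(f, 0))
    # Families with fewer genes in the draft (e.g. collapsed or missing copies).
    deflated = sorted(f for f in families
                      if draft.get(f, 0) < reference.get(f, 0))
    wrong = len(inflated) + len(deflated)
    return {"inflated": inflated, "deflated": deflated,
            "fraction_wrong": wrong / len(families) if families else 0.0}


# Invented counts for five gene families.
reference_counts = {"famA": 2, "famB": 1, "famC": 4, "famD": 1, "famE": 3}
draft_counts = {"famA": 3, "famB": 1, "famC": 2, "famD": 1, "famE": 3}

summary = family_count_errors(draft_counts, reference_counts)
```

Keeping inflated and deflated families separate matters here, since the abstract reports that draft assemblies both add and subtract genes.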

  2. Extensive error in the number of genes inferred from draft genome assemblies.

    PubMed

    Denton, James F; Lugo-Martinez, Jose; Tucker, Abraham E; Schrider, Daniel R; Warren, Wesley C; Hahn, Matthew W

    2014-12-01

    Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.

  3. Fred Hutchinson Cancer Research Center (FHCRC-1): Functional Exploration of the Druggable Genome in MYCN Amplified and Non-amplified Neuroblastoma | Office of Cancer Genomics

    Cancer.gov

    To identify candidate drug targets for neuroblastoma with MYCN amplification, we performed parallel siRNA screens with a druggable genome collection of ~6,700 genes, comparing MYCN-amplified and non-MYCN-amplified cell lines: SK-N-BE2 (MYCN amplified) and SK-N-AS (non-amplified). Hits from each cell line were determined based on their significance with respect to their differential activity in the presence or absence of RA within each cell line. Hits for each cell line were also ranked according to their P-value, based on the three replicates. Integration with gene exp

  4. Streamlined Genome Sequence Compression using Distributed Source Coding

    PubMed Central

    Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel

    2014-01-01

    We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552

  5. Fred Hutchinson Cancer Research Center (FHCRC-1): Functional Exploration of the Druggable Genome in MYCN Amplified and Non-amplified Neuroblastoma | Office of Cancer Genomics

    Cancer.gov

    To identify candidate drug targets for neuroblastoma with MYCN amplification, we performed parallel siRNA screens with a druggable genome collection of ~6,700 genes, comparing MYCN-amplified and non-MYCN-amplified cell lines: SK-N-BE2 (MYCN amplified) and SK-N-AS (non-amplified). Hits from each cell line were determined based on their significance with respect to their differential activity in the presence or absence of RA within each cell line. Hits for each cell line were also ranked according to their P-value, based on the three replicates. Integration with gene expres

  6. Gene-based SNP discovery in tepary bean (Phaseolus acutifolius) and common bean (P. vulgaris) for diversity analysis and comparative mapping.

    PubMed

    Gujaria-Verma, Neha; Ramsay, Larissa; Sharpe, Andrew G; Sanderson, Lacey-Anne; Debouck, Daniel G; Tar'an, Bunyamin; Bett, Kirstin E

    2016-03-15

    Common bean (Phaseolus vulgaris) is an important grain legume and there has been a recent resurgence in interest in its relative, tepary bean (P. acutifolius), owing to this species' ability to better withstand abiotic stresses. Genomic resources are scarce for this minor crop species and a better knowledge of the genome-level relationship between these two species would facilitate improvement in both. High-throughput genotyping has facilitated large-scale single nucleotide polymorphism (SNP) identification leading to the development of molecular markers with associated sequence information that can be used to place them in the context of a full genome assembly. Transcript-based SNPs were identified from six common bean and two tepary bean accessions and a subset were used to generate a 768-SNP Illumina GoldenGate assay for each species. The tepary bean assay was used to assess diversity in wild and cultivated tepary bean and to generate the first gene-based map of the tepary bean genome. Genotypic analyses of the diversity panel showed a clear separation between domesticated and cultivated tepary beans, two distinct groups within the domesticated types, and P. parvifolius was confirmed to be distinct. The genetic map of tepary bean was compared to the common bean genome assembly to demonstrate high levels of collinearity between the two species with differences limited to a few intra-chromosomal rearrangements. The development of the first set of genomic resources specifically for tepary bean has allowed for greater insight into the structure of this species and its relationship to its agriculturally more prominent relative, common bean. These resources will be helpful in the development of efficient breeding strategies for both species and will facilitate the introgression of agriculturally important traits from one crop into the other.

  7. Finding the missing honey bee genes: lessons learned from a genome upgrade.

    PubMed

    Elsik, Christine G; Worley, Kim C; Bennett, Anna K; Beye, Martin; Camara, Francisco; Childers, Christopher P; de Graaf, Dirk C; Debyser, Griet; Deng, Jixin; Devreese, Bart; Elhaik, Eran; Evans, Jay D; Foster, Leonard J; Graur, Dan; Guigo, Roderic; Hoff, Katharina Jasmin; Holder, Michael E; Hudson, Matthew E; Hunt, Greg J; Jiang, Huaiyang; Joshi, Vandita; Khetani, Radhika S; Kosarev, Peter; Kovar, Christie L; Ma, Jian; Maleszka, Ryszard; Moritz, Robin F A; Munoz-Torres, Monica C; Murphy, Terence D; Muzny, Donna M; Newsham, Irene F; Reese, Justin T; Robertson, Hugh M; Robinson, Gene E; Rueppell, Olav; Solovyev, Victor; Stanke, Mario; Stolle, Eckart; Tsuruda, Jennifer M; Vaerenbergh, Matthias Van; Waterhouse, Robert M; Weaver, Daniel B; Whitfield, Charles W; Wu, Yuanqing; Zdobnov, Evgeny M; Zhang, Lan; Zhu, Dianhui; Gibbs, Richard A

    2014-01-30

    The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

  8. Finding the missing honey bee genes: lessons learned from a genome upgrade

    PubMed Central

    2014-01-01

    Background The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination. PMID:24479613

  9. Complete mitochondrial genome and taxonomic revision of Cardiodactylus muiri Otte, 2007 (Gryllidae: Eneopterinae: Lebinthini).

    PubMed

    Dong, Jiajia; Vicente, Natallia; Chintauan-Marquier, Ioana C; Ramadi, Cahyo; Dettai, Agnès; Robillard, Tony

    2017-05-15

    In the present study, we report the high-coverage complete mitochondrial genome (mitogenome) of the cricket Cardiodactylus muiri Otte, 2007. The mitogenome was sequenced using a long-PCR approach on an Ion Torrent Personal Genome Machine (PGM) for next generation sequencing technology. The total length of the amplified mitogenome is 16,328 bp, representing 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes and one noncoding region (D-loop region). The new sets of long-PCR primers reported here are invaluable resources for future comparative evolutionary genomic studies in Orthopteran insects. The new mitogenome sequence is compared with published cricket mitogenomes. In the taxonomic part, we present new records for the species and describe life-history traits, habitat and male calling song of the species; based on observation of new material, the species Cardiodactylus buru Gorochov & Robillard, 2014 is synonymized under C. muiri.

  10. Whole-genome multiple displacement amplification from single cells.

    PubMed

    Spits, Claudia; Le Caignec, Cédric; De Rycke, Martine; Van Haute, Lindsey; Van Steirteghem, André; Liebaers, Inge; Sermon, Karen

    2006-01-01

    Multiple displacement amplification (MDA) is a recently described method of whole-genome amplification (WGA) that has proven efficient in the amplification of small amounts of DNA, including DNA from single cells. Compared with PCR-based WGA methods, MDA generates DNA with a higher molecular weight and shows better genome coverage. This protocol was developed for preimplantation genetic diagnosis, and details a method for performing single-cell MDA using the phi29 DNA polymerase. It can also be useful for the amplification of other minute quantities of DNA, such as from forensic material or microdissected tissue. The protocol includes the collection and lysis of single cells, and all materials and steps involved in the MDA reaction. The whole procedure takes 3 h and generates 1-2 µg of DNA from a single cell, which is suitable for multiple downstream applications, such as sequencing, short tandem repeat analysis or array comparative genomic hybridization.

  11. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

    PubMed

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P; Song, Yun S; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R; Behar, Doron M; Bravi, Claudio M; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T S; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael F; Kivisild, Toomas; Klitz, William; Winkler, Cheryl A; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B; Tishkoff, Sarah A; Watkins, W Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-10-13

    Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

  12. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

    PubMed Central

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P.; Song, Yun S.; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R.; Behar, Doron M.; Bravi, Claudio M.; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L.; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M. Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M.; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B.; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M.; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T. S.; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael; Kivisild, Toomas; Klitz, William; Winkler, Cheryl; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B.; Tishkoff, Sarah A.; Watkins, W. Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-01-01

    We report the Simons Genome Diversity Project (SGDP) dataset: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioral modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that in other non-Africans. PMID:27654912

  13. Reconstruction of metabolic pathways for the cattle genome

    PubMed Central

    Seo, Seongwon; Lewin, Harris A

    2009-01-01

    Background Metabolic reconstruction of microbial, plant and animal genomes is a necessary step toward understanding the evolutionary origins of metabolism and species-specific adaptive traits. The aims of this study were to reconstruct conserved metabolic pathways in the cattle genome and to identify metabolic pathways with missing genes and proteins. The MetaCyc database and PathwayTools software suite were chosen for this work because they are widely used and easy to implement. Results An amalgamated cattle genome database was created using the NCBI and Ensembl cattle genome databases (based on build 3.1) as data sources. PathwayTools was used to create a cattle-specific pathway genome database, which was followed by comprehensive manual curation for the reconstruction of metabolic pathways. The curated database, CattleCyc 1.0, consists of 217 metabolic pathways. A total of 64 mammalian-specific metabolic pathways were modified from the reference pathways in MetaCyc, and two pathways previously identified but missing from MetaCyc were added. Comparative analysis of metabolic pathways revealed the absence of mammalian genes for 22 metabolic enzymes whose activity was reported in the literature. We also identified six human metabolic protein-coding genes for which the cattle ortholog is missing from the sequence assembly. Conclusion CattleCyc is a powerful tool for understanding the biology of ruminants and other cetartiodactyl species. In addition, the approach used to develop CattleCyc provides a framework for the metabolic reconstruction of other newly sequenced mammalian genomes. It is clear that metabolic pathway analysis strongly reflects the quality of the underlying genome annotations. Thus, having well-annotated genomes from many mammalian species hosted in BioCyc will facilitate the comparative analysis of metabolic pathways among different species and a systems approach to comparative physiology. PMID:19284618

  14. Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set.

    PubMed

    Damas, Joana; O'Connor, Rebecca; Farré, Marta; Lenis, Vasileios Panagiotis E; Martell, Henry J; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T; Griffin, Darren K; Larkin, Denis M

    2017-05-01

    Most recent initiatives to sequence and assemble new species' genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon (Columba livia) and peregrine falcon (Falco peregrinus) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE "deserts." This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. © 2017 Damas et al.; Published by Cold Spring Harbor Laboratory Press.

  15. Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set

    PubMed Central

    O'Connor, Rebecca; Lenis, Vasileios Panagiotis E.; Martell, Henry J.; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T.; Griffin, Darren K.; Larkin, Denis M.

    2017-01-01

    Most recent initiatives to sequence and assemble new species’ genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon (Columba livia) and peregrine falcon (Falco peregrinus) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE “deserts.” This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. PMID:27903645

  16. Genome size estimates for crustaceans using Feulgen image analysis densitometry of ethanol-preserved tissues.

    PubMed

    Jeffery, Nicholas W; Gregory, T Ryan

    2014-10-01

    Crustaceans are enormously diverse both phylogenetically and ecologically, but they remain substantially underrepresented in the existing genome size database. An expansion of this dataset could be facilitated if it were possible to obtain genome size estimates from ethanol-preserved specimens. In this study, two tests were performed in order to assess the reliability of genome size data generated using preserved material. First, the results of estimates based on flash-frozen versus ethanol-preserved material were compared across 37 species of crustaceans that differ widely in genome size. Second, a comparison was made of specimens from a single species that had been stored in ethanol for 1-14 years. In both cases, the use of gill tissue in Feulgen image analysis densitometry proved to be a very viable approach. This finding is of direct relevance to both new studies of field-collected crustaceans as well as potential studies based on existing collections. © 2014 International Society for Advancement of Cytometry.

  17. Genome-wide comparison and taxonomic relatedness of multiple Xylella fastidiosa strains reveal the occurrence of three subspecies and a new Xylella species.

    PubMed

    Marcelletti, Simone; Scortichini, Marco

    2016-10-01

    A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. The whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33% of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
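    The tetranucleotide frequency correlation mentioned above can be illustrated with a simplified sketch. It assumes raw 4-mer frequencies compared by a Pearson correlation; published implementations typically correlate z-scores of observed versus expected counts instead, so this is an approximation of the idea and not the authors' procedure. The function names and the toy sequences are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_freqs(seq):
    """Relative frequency of each 4-mer (windows with non-ACGT letters are skipped)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[k] / total for k in KMERS]

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

# Hypothetical toy sequences; real analyses use whole genomes.
genome_a = "ACGTACGTTTGACCA"
genome_b = "ACGTACGTTTGACCT"
similarity = pearson(tetra_freqs(genome_a), tetra_freqs(genome_b))
```

    Values close to 1 indicate similar oligonucleotide composition; on sequences this short the number is only illustrative.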

  18. Genotyping-by-sequencing in an orphan plant species Physocarpus opulifolius helps identify the evolutionary origins of the genus Prunus.

    PubMed

    Buti, Matteo; Sargent, Daniel J; Mhelembe, Khethani G; Delfino, Pietro; Tobutt, Kenneth R; Velasco, Riccardo

    2016-05-11

    The Rosaceae family encompasses numerous genera exhibiting morphological diversification in fruit types and plant habit as well as a wide variety of chromosome numbers. Comparative genomics between various Rosaceous genera has led to the hypothesis that the ancestral genome of the family contained nine chromosomes. However, the synteny studies performed in the Rosaceae to date encompass species with base chromosome numbers x = 7 (Fragaria), x = 8 (Prunus), and x = 17 (Malus), and no study has included species from one of the many Rosaceous genera containing a base chromosome number of x = 9. A genetic linkage map of the species Physocarpus opulifolius (x = 9) was populated with sequence characterised SNP markers using genotyping by sequencing. This allowed, for the first time, the extent of genome diversification in a Rosaceous genus with a base chromosome number of x = 9 to be assessed. Orthologous loci distributed throughout the nine chromosomes of Physocarpus and the eight chromosomes of Prunus were identified, which permitted a meaningful comparison of the genomes of these two genera to be made. The study revealed a high level of macro-synteny between the two genomes, and relatively few chromosomal rearrangements, as has been observed in studies of other Rosaceous genomes, lending further support for a relatively simple model of genomic evolution in Rosaceae.

  19. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  20. Determining protein function and interaction from genome analysis

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Thompson, Michael J.; Pellegrini, Matteo; Yeates, Todd O.

    2004-08-03

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
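    The two ideas in this record can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the patented method: proteins are linked when their presence/absence profiles across genomes are identical, and each fused protein is taken to suggest a link between its two component proteins. All protein and genome names, and the helper function names, are hypothetical.

```python
from collections import defaultdict

def phylogenetic_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one flag per genome."""
    return {
        protein: tuple(1 if g in found_in else 0 for g in genomes)
        for protein, found_in in presence.items()
    }

def group_by_profile(profiles):
    """Group proteins that share an identical presence/absence pattern."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]

def rosetta_stone_links(fusions):
    """Each fused protein AB suggests a functional link between its parts."""
    return sorted({tuple(sorted(parts)) for parts in fusions.values()})

# Hypothetical toy data: which genomes contain a homolog of each protein.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}
profiles = phylogenetic_profiles(presence, genomes)
linked = group_by_profile(profiles)          # [['P1', 'P2']]

# Hypothetical fusion: protein "AB" in another organism aligns to A' and B'.
fusion_links = rosetta_stone_links({"AB": ("B'", "A'")})
```

    Requiring identical profiles is the strictest possible criterion; a practical variant might tolerate a small number of mismatching genomes.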

  1. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    PubMed

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  2. Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

    PubMed Central

    Harris, R. Alan; Wang, Ting; Coarfa, Cristian; Nagarajan, Raman P.; Hong, Chibo; Downey, Sara L.; Johnson, Brett E.; Fouse, Shaun D.; Delaney, Allen; Zhao, Yongjun; Olshen, Adam; Ballinger, Tracy; Zhou, Xin; Forsberg, Kevin J.; Gu, Junchen; Echipare, Lorigail; O’Geen, Henriette; Lister, Ryan; Pelizzola, Mattia; Xi, Yuanxin; Epstein, Charles B.; Bernstein, Bradley E.; Hawkins, R. David; Ren, Bing; Chung, Wen-Yu; Gu, Hongcang; Bock, Christoph; Gnirke, Andreas; Zhang, Michael Q.; Haussler, David; Ecker, Joseph; Li, Wei; Farnham, Peggy J.; Waterland, Robert A.; Meissner, Alexander; Marra, Marco A.; Hirst, Martin; Milosavljevic, Aleksandar; Costello, Joseph F.

    2010-01-01

    Sequencing-based DNA methylation profiling methods are comprehensive and, as accuracy and affordability improve, will increasingly supplant microarrays for genome-scale analyses. Here, four sequencing-based methodologies were applied to biological replicates of human embryonic stem cells to compare their CpG coverage genome-wide and in transposons, resolution, cost, concordance and its relationship with CpG density and genomic context. The two bisulfite methods reached concordance of 82% for CpG methylation levels and 99% for non-CpG cytosine methylation levels. Using binary methylation calls, two enrichment methods were 99% concordant, while regions assessed by all four methods were 97% concordant. To achieve comprehensive methylome coverage while reducing cost, an approach integrating two complementary methods was examined. The integrative methylome profile along with histone methylation, RNA, and SNP profiles derived from the sequence reads allowed genome-wide assessment of allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression. PMID:20852635

  3. Complete plastid genome sequence of goosegrass (Eleusine indica) and comparison with other Poaceae.

    PubMed

    Zhang, Hui; Hall, Nathan; McElroy, J Scott; Lowe, Elijah K; Goertzen, Leslie R

    2017-02-05

    Eleusine indica, also known as goosegrass, is a serious weed in at least 42 countries. In this paper we report the complete plastid genome sequence of goosegrass obtained by de novo assembly of paired-end and mate-paired reads generated by Illumina sequencing of total genomic DNA. The goosegrass plastome is a circular molecule of 135,151 bp in length, consisting of two single-copy regions separated by a pair of inverted repeats (IRs) of 20,919 bases. The large (LSC) and the small (SSC) single-copy regions span 80,667 bases and 12,646 bases, respectively. The plastome of goosegrass has 38.19% GC content and includes 108 unique genes, of which 76 are protein-coding, 28 are transfer RNA, and 4 are ribosomal RNA. The goosegrass plastome sequence was compared to eight other species of Poaceae. Although generally conserved with respect to Poaceae, this genomic resource will be useful for evolutionary studies within this weed species and the genus Eleusine. Copyright © 2016. Published by Elsevier B.V.
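    The figures in this abstract are internally consistent, which a short check makes explicit. The sketch assumes the usual quadripartite layout (one LSC, one SSC and two copies of the IR) and that each IR copy is 20,919 bases; the function name is hypothetical.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: two single-copy regions plus two IR copies."""
    return lsc + ssc + 2 * ir

# Region sizes as reported in the abstract (bases).
total_length = quadripartite_length(lsc=80_667, ssc=12_646, ir=20_919)

# Gene categories as reported: protein-coding, tRNA, rRNA.
unique_genes = 76 + 28 + 4

print(total_length, unique_genes)  # 135151 108
```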

  4. GBOOST: a GPU-based tool for detecting gene-gene interactions in genome-wide case control studies.

    PubMed

    Yung, Ling Sing; Yang, Can; Wan, Xiang; Yu, Weichuan

    2011-05-01

    Collecting millions of genetic variations is feasible with the advanced genotyping technology. With a huge amount of genetic variations data in hand, developing efficient algorithms to carry out the gene-gene interaction analysis in a timely manner has become one of the key problems in genome-wide association studies (GWAS). Boolean operation-based screening and testing (BOOST), a recent work in GWAS, completes gene-gene interaction analysis in 2.5 days on a desktop computer. Compared with central processing units (CPUs), graphic processing units (GPUs) are highly parallel hardware and provide massive computing resources. We are, therefore, motivated to use GPUs to further speed up the analysis of gene-gene interactions. We implement the BOOST method based on a GPU framework and name it GBOOST. GBOOST achieves a 40-fold speedup compared with BOOST. It completes the analysis of Wellcome Trust Case Control Consortium Type 2 Diabetes (WTCCC T2D) genome data within 1.34 h on a desktop computer equipped with Nvidia GeForce GTX 285 display card. GBOOST code is available at http://bioinformatics.ust.hk/BOOST.html#GBOOST.
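    The abstract does not describe the Boolean encoding itself. The sketch below assumes a common bitmask scheme: each SNP is stored as three bitmasks (one per genotype 0, 1, 2), so that a cell of the pairwise genotype table is a bitwise AND followed by a bit count. It is an illustration of that general idea in plain Python, not the BOOST or GBOOST code, and all names and data are hypothetical.

```python
def genotype_masks(genotypes):
    """Return three integer bitmasks; bit i of masks[g] is set when sample i has genotype g."""
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks

def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def pair_table(snp_a, snp_b, sample_mask):
    """3x3 table of genotype-combination counts, restricted to samples in sample_mask."""
    ma, mb = genotype_masks(snp_a), genotype_masks(snp_b)
    return [
        [popcount(ma[a] & mb[b] & sample_mask) for b in range(3)]
        for a in range(3)
    ]

# Hypothetical toy data: six samples, genotypes coded 0/1/2.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [0, 0, 1, 1, 2, 2]
is_case = [1, 1, 1, 0, 0, 0]

case_mask = sum(1 << i for i, c in enumerate(is_case) if c)
control_mask = ((1 << len(is_case)) - 1) ^ case_mask

case_table = pair_table(snp_a, snp_b, case_mask)
control_table = pair_table(snp_a, snp_b, control_mask)
```

    Because every cell is an independent AND plus bit count, the work for different SNP pairs can be distributed across many threads, which is presumably what makes the approach a good fit for GPUs.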

  5. Inactivation of p53 Rescues the Maintenance of High Risk HPV DNA Genomes Deficient in Expression of E6

    PubMed Central

    Lorenz, Laurel D.; Rivera Cardona, Jessenia; Lambert, Paul F.

    2013-01-01

    The human papillomavirus DNA genome undergoes three distinct stages of replication: establishment, maintenance and amplification. We show that the HPV16 E6 protein is required for the maintenance of the HPV16 DNA genome as an extrachromosomal, nuclear plasmid in its natural host cell, the human keratinocyte. Based upon mutational analyses, inactivation of p53 by E6, but not necessarily E6-mediated degradation of p53, was found to correlate with the ability of E6 to support maintenance of the HPV16 genome as a nuclear plasmid. Inactivation of p53 with dominant negative p53 rescued the ability of HPV16 E6STOP and E6SAT mutant genomes to replicate as extrachromosomal genomes, though not to the same degree as observed for the HPV16 E6 wild-type (WT) genome. Inactivation of p53 also rescued the ability of HPV18 and HPV31 E6-deficient genomes to be maintained at copy numbers comparable to that of HPV18 and HPV31 E6WT genomes at early passages, though upon further passaging copy numbers for the HPV18 and 31 E6-deficient genomes lessened compared to that of the WT genomes. We conclude that inactivation of p53 is necessary for maintenance of HPV16 and for HPV18 and 31 to replicate at WT copy number, but that additional functions of E6 independent of inactivating p53 must also contribute to the maintenance of these genomes. Together these results suggest that re-activation of p53 may be a possible means for eradicating extrachromosomal HPV16, 18 or 31 genomes in the context of persistent infections. PMID:24204267

  6. Enhancing genome assemblies by integrating non-sequence based data

    PubMed Central

    2011-01-01

    Introduction Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. Conclusions We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses. PMID:21554765
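    The N50 statistic reported above can be computed as follows. This is a minimal sketch using the standard definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly); the scaffold lengths are hypothetical and only show why joining a few large pieces can raise N50 sharply even when few contigs are affected.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input

# Hypothetical scaffold lengths (kb) before and after joining two scaffolds.
before = [30, 30, 20, 20]
after = [60, 20, 20]          # the two 30 kb scaffolds were merged

print(n50(before), n50(after))  # 30 60
```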

  7. Enhancing genome assemblies by integrating non-sequence based data.

    PubMed

    Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J

    2011-05-28

    Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total) but a 35% increase in N50 scaffold size over the use of sequence-based data alone. We provide a relatively simple pipeline utilizing existing bioinformatics tools to integrate map data into a genome assembly which is available at http://www.mcb.uconn.edu/fac.php?name=paska. While the map data only contributed minimally to assigning the initial contigs to scaffolds in the new assembly, it greatly increased the N50 size. This process added structure to our low coverage assembly, greatly increasing its utility in further analyses.

  8. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

    PubMed

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-07-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases.

  9. Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes

    PubMed Central

    Singh, Param Priya; Arora, Jatin; Isambert, Hervé

    2015-01-01

    Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGDs, that occurred in their jawless ancestor some 500 MY ago. Paralogs retained from WGD, also coined ‘ohnologs’ after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understand the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In the light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights on vertebrate evolution and genetic diseases. PMID:26181593

  10. Methods and Applications of CRISPR-Mediated Base Editing in Eukaryotic Genomes.

    PubMed

    Hess, Gaelen T; Tycko, Josh; Yao, David; Bassik, Michael C

    2017-10-05

    The past several years have seen an explosion in development of applications for the CRISPR-Cas9 system, from efficient genome editing, to high-throughput screening, to recruitment of a range of DNA and chromatin-modifying enzymes. While homology-directed repair (HDR) coupled with Cas9 nuclease cleavage has been used with great success to repair and re-write genomes, recently developed base-editing systems present a useful orthogonal strategy to engineer nucleotide substitutions. Base editing relies on recruitment of cytidine deaminases to introduce changes (rather than double-stranded breaks and donor templates) and offers potential improvements in efficiency while limiting damage and simplifying the delivery of editing machinery. At the same time, these systems enable novel mutagenesis strategies to introduce sequence diversity for engineering and discovery. Here, we review the different base-editing platforms, including their deaminase recruitment strategies and editing outcomes, and compare them to other CRISPR genome-editing technologies. Additionally, we discuss how these systems have been applied in therapeutic, engineering, and research settings. Lastly, we explore future directions of this emerging technology. Copyright © 2017 Elsevier Inc. All rights reserved.

  11. Improved orthologous databases to ease protozoan targets inference.

    PubMed

    Kotowski, Nelson; Jardim, Rodrigo; Dávila, Alberto M R

    2015-09-29

    Homology inference helps on identifying similarities, as well as differences among organisms, which provides a better insight on how closely related one might be to another. In addition, comparative genomics pipelines are widely adopted tools designed using different bioinformatics applications and algorithms. In this article, we propose a methodology to build improved orthologous databases with the potential to aid on protozoan target identification, one of the many tasks which benefit from comparative genomics tools. Our analyses are based on OrthoSearch, a comparative genomics pipeline originally designed to infer orthologs through protein-profile comparison, supported by an HMM, reciprocal best hits based approach. Our methodology allows OrthoSearch to confront two orthologous databases and to generate an improved new one. Such can be later used to infer potential protozoan targets through a similarity analysis against the human genome. The protein sequences of Cryptosporidium hominis, Entamoeba histolytica and Leishmania infantum genomes were comparatively analyzed against three orthologous databases: (i) EggNOG KOG, (ii) ProtozoaDB and (iii) Kegg Orthology (KO). That allowed us to create two new orthologous databases, "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB", with 16,938 and 27,701 orthologous groups, respectively. Such new orthologous databases were used for a regular OrthoSearch run. By confronting "KO + EggNOG KOG" and "KO + EggNOG KOG + ProtozoaDB" databases and protozoan species we were able to detect the following total of orthologous groups and coverage (relation between the inferred orthologous groups and the species total number of proteins): Cryptosporidium hominis: 1,821 (11 %) and 3,254 (12 %); Entamoeba histolytica: 2,245 (13 %) and 5,305 (19 %); Leishmania infantum: 2,702 (16 %) and 4,760 (17 %). Using our HMM-based methodology and the largest created orthologous database, it was possible to infer 13 orthologous groups which represent potential protozoan targets; these were found because of our distant homology approach. We also provide the number of species-specific, pair-to-pair and core groups from such analyses, depicted in Venn diagrams. The orthologous databases generated by our HMM-based methodology provide a broader dataset, with larger amounts of orthologous groups when compared to the original databases used as input. Those may be used for several homology inference analyses, annotation tasks and protozoan targets identification.
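
    The abstract above mentions a reciprocal best hits criterion without spelling it out. The following is a minimal sketch of that general idea only, not OrthoSearch's implementation: the function names, the score dictionaries and the toy identifiers (a1, b1, ...) are all hypothetical, and the scores stand in for whatever similarity measure a real pipeline would produce.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, target) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query, the target with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy, made-up scores for two hypothetical protein sets A and B.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 120.0, ("a3", "b1"): 80.0}
b_vs_a = {("b1", "a1"): 240.0, ("b2", "a2"): 110.0, ("b2", "a1"): 95.0}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

    In this toy input a3 is dropped because its best hit, b1, points back to a1 rather than to a3.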

  12. Structure, evolution, and comparative genomics of tetraploid cotton based on a high-density genetic linkage map.

    PubMed

    Li, Ximei; Jin, Xin; Wang, Hantao; Zhang, Xianlong; Lin, Zhongxu

    2016-06-01

    A high-density linkage map was constructed using 1,885 newly obtained loci and 3,747 previously published loci, which included 5,152 loci with 4696.03 cM in total length and 0.91 cM in mean distance. Homology analysis in the cotton genome further confirmed the 13 expected homologous chromosome pairs and revealed an obvious inversion on Chr10 or Chr20 and repeated inversions on Chr07 or Chr16. In addition, two reciprocal translocations between Chr02 and Chr03 and between Chr04 and Chr05 were confirmed. Comparative genomics between the tetraploid cotton and the diploid cottons showed that no major structural changes exist between DT and D chromosomes but rather between AT and A chromosomes. Blast analysis between the tetraploid cotton genome and the mixed genome of two diploid cottons showed that most AD chromosomes, regardless of whether it is from the AT or DT genome, preferentially matched with the corresponding homologous chromosome in the diploid A genome, and then the corresponding homologous chromosome in the diploid D genome, indicating that the diploid D genome underwent converted evolution by the diploid A genome to form the DT genome during polyploidization. In addition, the results reflected that a series of chromosomal translocations occurred among Chr01/Chr15, Chr02/Chr14, Chr03/Chr17, Chr04/Chr22, and Chr05/Chr19. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  13. Comparative genomic and morphological analyses of Listeria phages isolated from farm environments.

    PubMed

    Denes, Thomas; Vongkamjan, Kitiya; Ackermann, Hans-Wolfgang; Moreno Switt, Andrea I; Wiedmann, Martin; den Bakker, Henk C

    2014-08-01

    The genus Listeria is ubiquitous in the environment and includes the globally important food-borne pathogen Listeria monocytogenes. While the genomic diversity of Listeria has been well studied, considerably less is known about the genomic and morphological diversity of Listeria bacteriophages. In this study, we sequenced and analyzed the genomes of 14 Listeria phages isolated mostly from New York dairy farm environments as well as one related Enterococcus faecalis phage to obtain information on genome characteristics and diversity. We also examined 12 of the phages by electron microscopy to characterize their morphology. These Listeria phages, based on gene orthology and morphology, together with previously sequenced Listeria phages could be classified into five orthoclusters, including one novel orthocluster. One orthocluster (orthocluster I) consists of large genome (~135-kb) myoviruses belonging to the genus “Twort-like viruses,” three orthoclusters (orthoclusters II to IV) contain small-genome (36- to 43-kb) siphoviruses with icosahedral heads, and the novel orthocluster V contains medium-sized-genome (~66-kb) siphoviruses with elongated heads. A novel orthocluster (orthocluster VI) of E. faecalis phages, with medium-sized genomes (~56 kb), was identified, which grouped together and shares morphological features with the novel Listeria phage orthocluster V. This new group of phages (i.e., orthoclusters V and VI) is composed of putative lytic phages that may prove to be useful in phage-based applications for biocontrol, detection, and therapeutic purposes.

  14. An orthology-based analysis of pathogenic protozoa impacting global health: an improved comparative genomics approach with prokaryotes and model eukaryote orthologs.

    PubMed

    Cuadrat, Rafael R C; da Serra Cruz, Sérgio Manuel; Tschoeke, Diogo Antônio; Silva, Edno; Tosta, Frederico; Jucá, Henrique; Jardim, Rodrigo; Campos, Maria Luiza M; Mattoso, Marta; Dávila, Alberto M R

    2014-08-01

    A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belongs to the category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to the category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belongs to the categories J, and 13.92% (98/704) to the O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself for applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools.
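
    The category shares quoted in this abstract can be recomputed from the counts it gives. The short check below uses only numbers stated in the abstract; the dictionary labels and the helper name are ours, introduced for illustration.

```python
# (count in category, core groups in total, percentage as reported)
reported = {
    "COG, category J": (132, 418, 31.58),
    "COG, category O": (41, 418, 9.81),
    "KOG, category J": (151, 704, 21.45),
    "KOG, category O": (98, 704, 13.92),
}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


recomputed = {label: percent(part, whole)
              for label, (part, whole, _) in reported.items()}

for label, (_, _, stated) in reported.items():
    print(f"{label}: recomputed {recomputed[label]:.2f}%, reported {stated:.2f}%")
```

    All four recomputed values agree with the reported percentages to two decimal places.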

  15. An Orthology-Based Analysis of Pathogenic Protozoa Impacting Global Health: An Improved Comparative Genomics Approach with Prokaryotes and Model Eukaryote Orthologs

    PubMed Central

    Cuadrat, Rafael R. C.; da Serra Cruz, Sérgio Manuel; Tschoeke, Diogo Antônio; Silva, Edno; Tosta, Frederico; Jucá, Henrique; Jardim, Rodrigo; Campos, Maria Luiza M.; Mattoso, Marta

    2014-01-01

    A key focus in 21st century integrative biology and drug discovery for neglected tropical and other diseases has been the use of BLAST-based computational methods for identification of orthologous groups in pathogenic organisms to discern orthologs, with a view to evaluate similarities and differences among species, and thus allow the transfer of annotation from known/curated proteins to new/non-annotated ones. We used here a profile-based sensitive methodology to identify distant homologs, coupled to the NCBI's COG (Unicellular orthologs) and KOG (Eukaryote orthologs), permitting us to perform comparative genomics analyses on five protozoan genomes. OrthoSearch was used in five protozoan proteomes showing that 3901 and 7473 orthologs can be identified by comparison with COG and KOG proteomes, respectively. The core protozoa proteome inferred was 418 Protozoa-COG orthologous groups and 704 Protozoa-KOG orthologous groups: (i) 31.58% (132/418) belongs to the category J (translation, ribosomal structure, and biogenesis), and 9.81% (41/418) to the category O (post-translational modification, protein turnover, chaperones) using COG; (ii) 21.45% (151/704) belongs to the categories J, and 13.92% (98/704) to the O using KOG. The phylogenomic analysis showed four well-supported clades for Eukarya, discriminating Multicellular [(i) human, fly, plant and worm] and Unicellular [(ii) yeast, (iii) fungi, and (iv) protozoa] species. These encouraging results attest to the usefulness of the profile-based methodology for comparative genomics to accelerate semi-automatic re-annotation, especially of the protozoan proteomes. This approach may also lend itself for applications in global health, for example, in the case of novel drug target discovery against pathogenic organisms previously considered difficult to research with traditional drug discovery tools. PMID:24960463

  16. Genomic and pedigree-based prediction for leaf, stem, and stripe rust resistance in wheat.

    PubMed

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Huerta-Espino, Julio; Lan, Caixia; Bhavani, Sridhar; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    Genomic prediction for seedling and adult plant resistance to wheat rusts was compared to prediction using few markers as fixed effects in a least-squares approach and pedigree-based prediction. The unceasing plant-pathogen arms race and ephemeral nature of some rust resistance genes have been challenging for wheat (Triticum aestivum L.) breeding programs and farmers. Hence, it is important to devise strategies for effective evaluation and exploitation of quantitative rust resistance. One promising approach that could accelerate gain from selection for rust resistance is 'genomic selection' which utilizes dense genome-wide markers to estimate the breeding values (BVs) for quantitative traits. Our objective was to compare three genomic prediction models including genomic best linear unbiased prediction (GBLUP), GBLUP A that was GBLUP with selected loci as fixed effects and reproducing kernel Hilbert spaces-markers (RKHS-M) with least-squares (LS) approach, RKHS-pedigree (RKHS-P), and RKHS markers and pedigree (RKHS-MP) to determine the BVs for seedling and/or adult plant resistance (APR) to leaf rust (LR), stem rust (SR), and stripe rust (YR). The 333 lines in the 45th IBWSN and the 313 lines in the 46th IBWSN were genotyped using genotyping-by-sequencing and phenotyped in replicated trials. The mean prediction accuracies ranged from 0.31-0.74 for LR seedling, 0.12-0.56 for LR APR, 0.31-0.65 for SR APR, 0.70-0.78 for YR seedling, and 0.34-0.71 for YR APR. For most datasets, the RKHS-MP model gave the highest accuracies, while LS gave the lowest. GBLUP, GBLUP A, RKHS-M, and RKHS-P models gave similar accuracies. Using genome-wide marker-based models resulted in an average of 42% increase in accuracy over LS. We conclude that GS is a promising approach for improvement of quantitative rust resistance and can be implemented in the breeding pipeline.

  17. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE PAGES

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...

    2016-10-13

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  18. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.

  19. IMG/M: integrated genome and metagenome comparative data analysis system

    PubMed Central

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2017-01-01

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135

  20. GStream: Improving SNP and CNV Coverage on Genome-Wide Association Studies

    PubMed Central

    Alonso, Arnald; Marsal, Sara; Tortosa, Raül; Canela-Xandri, Oriol; Julià, Antonio

    2013-01-01

    We present GStream, a method that combines genome-wide SNP and CNV genotyping in the Illumina microarray platform with unprecedented accuracy. This new method outperforms previous well-established SNP genotyping software. More importantly, the CNV calling algorithm of GStream dramatically improves the results obtained by previous state-of-the-art methods and yields an accuracy that is close to that obtained by purely CNV-oriented technologies like Comparative Genomic Hybridization (CGH). We demonstrate the superior performance of GStream using microarray data generated from HapMap samples. Using the reference CNV calls generated by the 1000 Genomes Project (1KGP) and well-known studies on whole genome CNV characterization based either on CGH or genotyping microarray technologies, we show that GStream can increase the number of reliably detected variants up to 25% compared to previously developed methods. Furthermore, the increased genome coverage provided by GStream allows the discovery of CNVs in close linkage disequilibrium with SNPs, previously associated with disease risk in published Genome-Wide Association Studies (GWAS). These results could provide important insights into the biological mechanism underlying the detected disease risk association. With GStream, large-scale GWAS will not only benefit from the combined genotyping of SNPs and CNVs at an unprecedented accuracy, but will also take advantage of the computational efficiency of the method. PMID:23844243

  1. Long-read sequencing improves assembly of Trichinella genomes 10-fold, revealing substantial synteny between lineages diverged over 7 million years.

    PubMed

    Thompson, Peter C; Zarlenga, Dante S; Liu, Ming-Yuan; Rosenthal, Benjamin M

    2017-09-01

    Genome assemblies can form the basis of comparative analyses fostering insight into the evolutionary genetics of a parasite's pathogenicity, host-pathogen interactions, environmental constraints and invasion biology; however, the length and complexity of many parasite genomes has hampered the development of well-resolved assemblies. In order to improve Trichinella genome assemblies, the genome of the sylvatic encapsulated species Trichinella murrelli was sequenced using third-generation, long-read technology and, using syntenic comparisons, scaffolded to a reference genome assembly of Trichinella spiralis, markedly improving both. A high-quality draft assembly for T. murrelli was achieved that totalled 63·2 Mbp, half of which was condensed into 26 contigs each longer than 571 000 bp. When compared with previous assemblies for parasites in the genus, ours required 10-fold fewer contigs, which were five times longer, on average. Better assembly across repetitive regions also enabled resolution of 8 Mbp of previously indeterminate sequence. Furthermore, syntenic comparisons identified widespread scaffold misassemblies in the T. spiralis reference genome. The two new assemblies, organized for the first time into three chromosomal scaffolds, will be valuable resources for future studies linking phenotypic traits within each species to their underlying genetic bases.
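
    The statement that half of the 63·2 Mbp assembly sits in 26 contigs each longer than 571 000 bp reads like an N50/L50-style contiguity summary; that reading is our interpretation, not wording from the abstract. A minimal sketch of how such a statistic is commonly computed follows, using made-up contig lengths rather than the T. murrelli data.

```python
from typing import Iterable, Tuple


def n50_l50(lengths: Iterable[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly;
    L50 is the number of contigs in that set.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise AssertionError("unreachable: the running sum always reaches half")


# Hypothetical contig lengths in bp (total 290, so half is 145).
toy_contigs = [80, 70, 50, 40, 30, 20]
n50, l50 = n50_l50(toy_contigs)
print(n50, l50)
```

    Under this reading, the abstract's figures would correspond to an L50 of 26 and an N50 of roughly 571 kb.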

  2. Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies.

    PubMed

    Feuk, Lars; MacDonald, Jeffrey R; Tang, Terence; Carson, Andrew R; Li, Martin; Rao, Girish; Khaja, Razi; Scherer, Stephen W

    2005-10-01

    With a draft genome-sequence assembly for the chimpanzee available, it is now possible to perform genome-wide analyses to identify, at a submicroscopic level, structural rearrangements that have occurred between chimpanzees and humans. The goal of this study was to investigate chromosomal regions that are inverted between the chimpanzee and human genomes. Using the net alignments for the builds of the human and chimpanzee genome assemblies, we identified a total of 1,576 putative regions of inverted orientation, covering more than 154 mega-bases of DNA. The DNA segments are distributed throughout the genome and range from 23 base pairs to 62 mega-bases in length. For the 66 inversions more than 25 kilobases (kb) in length, 75% were flanked on one or both sides by (often unrelated) segmental duplications. Using PCR and fluorescence in situ hybridization we experimentally validated 23 of 27 (85%) semi-randomly chosen regions; the largest novel inversion confirmed was 4.3 mega-bases at human Chromosome 7p14. Gorilla was used as an out-group to assign ancestral status to the variants. All experimentally validated inversion regions were then assayed against a panel of human samples and three of the 23 (13%) regions were found to be polymorphic in the human genome. These polymorphic inversions include 730 kb (at 7p22), 13 kb (at 7q11), and 1 kb (at 16q24) fragments with a 5%, 30%, and 48% minor allele frequency, respectively. Our results suggest that inversions are an important source of variation in primate genome evolution. The finding of at least three novel inversion polymorphisms in humans indicates this type of structural variation may be a more common feature of our genome than previously realized.

  3. Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies

    PubMed Central

    Schatz, Michael C.; Phillippy, Adam M.; Sommer, Daniel D.; Delcher, Arthur L.; Puiu, Daniela; Narzisi, Giuseppe; Salzberg, Steven L.; Pop, Mihai

    2013-01-01

    Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net. PMID:22199379

  4. proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes.

    PubMed

    Mende, Daniel R; Letunic, Ivica; Huerta-Cepas, Jaime; Li, Simone S; Forslund, Kristoffer; Sunagawa, Shinichi; Bork, Peer

    2017-01-04

    The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, and a transcriptional regulator, among other proteins, most of which are annotated as hypothetical, that were missed during annotation.

  6. Expansion by whole genome duplication and evolution of the sox gene family in teleost fish

    PubMed Central

    Naville, Magali; Volff, Jean-Nicolas

    2017-01-01

    It is now recognized that several rounds of whole genome duplication (WGD) have occurred during the evolution of vertebrates, but the link between WGDs and phenotypic diversification remains unsolved. We have investigated in this study the impact of the teleost-specific WGD on the evolution of the sox gene family in teleostean fishes. The sox gene family, which encodes for transcription factors, has essential role in morphology, physiology and behavior of vertebrates and teleosts, the current largest group of vertebrates. We have first redrawn the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach including phylogenetic and synteny analyses. We noticed, compared to tetrapods, an important expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes, except sox17 paralogs, are derived from the teleost-specific WGD. Then, focusing on five sox genes, analyzing the evolution of coding and non-coding sequences, as well as the expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleostean species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts. PMID:28738066

  7. Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

    PubMed

    Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

    2018-02-14

    Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information but that needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, either based on partial or full gene-length, results in biased estimates of genetic differences and does not capture the true degree of similarity or differences of complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same hospital. We conclude that metagenomics can contribute to the identification of CDI and can assist in characterization of the most probable causative strain in CDI patients.

  8. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    PubMed Central

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. 
These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
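
    The two figures quoted in this abstract (33.3 Mb of sequence, 0.77% coverage) together imply a genome size, which the abstract does not state. The sketch below works through that arithmetic, assuming coverage is simply sequenced bases divided by genome size; the function name is illustrative only, and the authors may have defined coverage differently.

```python
# Figures quoted in the abstract; everything else here is an illustrative assumption.
SEQUENCED_MB = 33.3          # Mb of 454 sequence data analysed
COVERAGE_FRACTION = 0.0077   # 0.77% genome coverage


def implied_genome_size_mb(sequenced_mb: float, coverage_fraction: float) -> float:
    """Genome size (Mb) implied by coverage = sequenced bases / genome size."""
    if coverage_fraction <= 0:
        raise ValueError("coverage_fraction must be positive")
    return sequenced_mb / coverage_fraction


genome_mb = implied_genome_size_mb(SEQUENCED_MB, COVERAGE_FRACTION)

# The abstract reports that the analysed repeats correspond to 35-48% of the genome.
repeat_low_mb = 0.35 * genome_mb
repeat_high_mb = 0.48 * genome_mb

print(f"Implied genome size: {genome_mb:.0f} Mb")
print(f"Repeat fraction spans roughly {repeat_low_mb:.0f} to {repeat_high_mb:.0f} Mb")
```

    Under this assumption the implied size is on the order of 4.3 Gb, which gives a sense of how little sequence was needed relative to the whole genome.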

  9. A novel technique based on in vitro oocyte injection to improve CRISPR/Cas9 gene editing in zebrafish

    PubMed Central

    Xie, Shao-Lin; Bian, Wan-Ping; Wang, Chao; Junaid, Muhammad; Zou, Ji-Xing; Pei, De-Sheng

    2016-01-01

    Contemporary improvements in the type II clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system offer a convenient way for genome editing in zebrafish. However, the low efficiencies of genome editing and germline transmission require time-intensive and laborious screening. Here, we report a method based on in vitro oocyte storage, in which oocytes are injected in advance and incubated in oocyte storage medium, to significantly improve the efficiencies of genome editing and germline transmission by in vitro fertilization (IVF) in zebrafish. Compared to conventional methods, the prior micro-injection of zebrafish oocytes improved the efficiency of genome editing, especially for sgRNAs with low targeting efficiency. Owing to its high throughput, simplicity and flexible design, this novel strategy will provide an efficient alternative for speeding up the generation of heritable mutants in zebrafish using the CRISPR/Cas9 system. PMID:27680290

  10. Whole-genome sequence of Schistosoma haematobium.

    PubMed

    Young, Neil D; Jex, Aaron R; Li, Bo; Liu, Shiping; Yang, Linfeng; Xiong, Zijun; Li, Yingrui; Cantacessi, Cinzia; Hall, Ross S; Xu, Xun; Chen, Fangyuan; Wu, Xuan; Zerlotini, Adhemar; Oliveira, Guilherme; Hofmann, Andreas; Zhang, Guojie; Fang, Xiaodong; Kang, Yi; Campbell, Bronwyn E; Loukas, Alex; Ranganathan, Shoba; Rollinson, David; Rinaldi, Gabriel; Brindley, Paul J; Yang, Huanming; Wang, Jun; Wang, Jian; Gasser, Robin B

    2012-01-15

    Schistosomiasis is a neglected tropical disease caused by blood flukes (genus Schistosoma; schistosomes) and affecting 200 million people worldwide. No vaccines are available, and treatment relies on one drug, praziquantel. Schistosoma haematobium has come into the spotlight as a major cause of urogenital disease, as an agent linked to bladder cancer and as a predisposing factor for HIV/AIDS. The parasite is transmitted to humans from freshwater snails. Worms dwell in blood vessels and release eggs that become embedded in the bladder wall to elicit chronic immune-mediated disease and induce squamous cell carcinoma. Here we sequenced the 385-Mb genome of S. haematobium using Illumina-based technology at 74-fold coverage and compared it to sequences from related parasites. We included genome annotation based on function, gene ontology, networking and pathway mapping. This genome now provides an unprecedented resource for many fundamental research areas and shows great promise for the design of new disease interventions.

  11. Comparative Genomics Reveals Accelerated Evolution in Conserved Pathways during the Diversification of Anole Lizards.

    PubMed

    Tollis, Marc; Hutchins, Elizabeth D; Stapley, Jessica; Rupp, Shawn M; Eckalbar, Walter L; Maayan, Inbar; Lasku, Eris; Infante, Carlos R; Dennis, Stuart R; Robertson, Joel A; May, Catherine M; Crusoe, Michael R; Bermingham, Eldredge; DeNardo, Dale F; Hsieh, Shi-Tong Tonia; Kulathinal, Rob J; McMillan, William Owen; Menke, Douglas B; Pratt, Stephen C; Rawls, Jeffery Alan; Sanjur, Oris; Wilson-Rawls, Jeanne; Wilson Sayres, Melissa A; Fisher, Rebecca E; Kusumi, Kenro

    2018-02-01

    Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation. Here, we sequence and assemble the draft genomes of three anole species (Anolis frenatus, Anolis auratus, and Anolis apletophallus) for comparison with the available reference genome of Anolis carolinensis. Comparative analyses reveal a rapid background rate of molecular evolution consistent with a model of punctuated equilibrium, and strong purifying selection on functional genomic elements in anoles. We find evidence for accelerated evolution in genes involved in behavior, sensory perception, and reproduction, as well as in genes regulating limb bud development and hindlimb specification. Morphometric analyses of anole fore and hindlimbs corroborated these findings. We detect signatures of positive selection across several genes related to the development and regulation of the forebrain, hormones, and the iguanian lizard dewlap, suggesting molecular changes underlying behavioral adaptations known to reinforce species boundaries were a key component in the diversification of anole lizards. © The Author(s) 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  12. Absolute determination of single-stranded and self-complementary adeno-associated viral vector genome titers by droplet digital PCR.

    PubMed

    Lock, Martin; Alvira, Mauricio R; Chen, Shu-Jen; Wilson, James M

    2014-04-01

    Accurate titration of adeno-associated viral (AAV) vector genome copies is critical for ensuring correct and reproducible dosing in both preclinical and clinical settings. Quantitative PCR (qPCR) is the current method of choice for titrating AAV genomes because of the simplicity, accuracy, and robustness of the assay. However, issues with qPCR-based determination of self-complementary AAV vector genome titers, due to primer-probe exclusion through genome self-annealing or through packaging of prematurely terminated defective interfering (DI) genomes, have been reported. Alternative qPCR, gel-based, or Southern blotting titering methods have been designed to overcome these issues but may represent a backward step from standard qPCR methods in terms of simplicity, robustness, and precision. Droplet digital PCR (ddPCR) is a new PCR technique that directly quantifies DNA copies with an unparalleled degree of precision and without the need for a standard curve or for a high degree of amplification efficiency; all properties that lend themselves to the accurate quantification of both single-stranded and self-complementary AAV genomes. Here we compare a ddPCR-based AAV genome titer assay with a standard and an optimized qPCR assay for the titration of both single-stranded and self-complementary AAV genomes. We demonstrate absolute quantification of single-stranded AAV vector genomes by ddPCR with up to 4-fold increases in titer over a standard qPCR titration but with equivalent readout to an optimized qPCR assay. In the case of self-complementary vectors, ddPCR titers were on average 5-, 1.9-, and 2.3-fold higher than those determined by standard qPCR, optimized qPCR, and agarose gel assays, respectively. 
Droplet digital PCR-based genome titering was superior to qPCR in terms of both intra- and interassay precision and is more resistant to PCR inhibitors, a desirable feature for in-process monitoring of early-stage vector production and for vector genome biodistribution analysis in inhibitory tissues.
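
    The abstract notes that ddPCR needs no standard curve but does not spell out why. In the usual formulation, the concentration is derived from the fraction of positive droplets through a Poisson correction. The sketch below illustrates that standard calculation; the droplet counts and the droplet volume are illustrative assumptions, not values from this study.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Mean template copies per droplet from the Poisson correction.

    If copies are distributed randomly, the fraction of empty droplets is
    exp(-lam), so lam = -ln(1 - positive / total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_microliter(positive: int, total: int, droplet_volume_nl: float) -> float:
    """Concentration in the partitioned reaction, in copies per microliter."""
    lam = copies_per_droplet(positive, total)
    return lam / (droplet_volume_nl * 1e-3)  # convert nL to µL


# Hypothetical run: 5,000 positive droplets out of 20,000 accepted droplets,
# with an assumed droplet volume of 0.85 nL (instrument dependent).
lam = copies_per_droplet(5_000, 20_000)
conc = copies_per_microliter(5_000, 20_000, droplet_volume_nl=0.85)
print(f"copies per droplet: {lam:.4f}")
print(f"copies per µL of reaction: {conc:.1f}")
```

    Because the estimate depends only on droplet counts and droplet volume, amplification efficiency affects it only insofar as a droplet is still scored correctly as positive or negative.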

  13. Effect of reference genome selection on the performance of computational methods for genome-wide protein-protein interaction prediction.

    PubMed

    Muley, Vijaykumar Yogesh; Ranjan, Akash

    2012-01-01

    Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to have a shared evolutionary history. This history can be traced out for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs in multiple genomes known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood and co-occurrence of the orthologous protein coding genes in the same cluster or operon. These are collectively known as genomic context methods. On the other hand a method called mirrortree is based on the similarity of phylogenetic trees between two interacting proteins. Comprehensive performance analyses of these methods have been frequently reported in literature. However, very few studies provide insight into the effect of reference genome selection on detection of meaningful protein interactions. We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled in accordance with phylogenetic diversity and relationship between organisms from 565 bacteria. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods. Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that in order to obtain a good performance, it is better to sample few genomes of related genera of prokaryotes from the large number of available genomes. 
Moreover, such a sampling allows for selecting 50-100 genomes for comparable accuracy of predictions when computational resources are limited.

  14. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  15. Chloroplast genes transferred to the nuclear plant genome have adjusted to nuclear base composition and codon usage.

    PubMed Central

    Oliver, J L; Marín, A; Martínez-Zapater, J M

    1990-01-01

    During plant evolution, some plastid genes have been moved to the nuclear genome. These transferred genes are now correctly expressed in the nucleus, their products being transported into the chloroplast. We compared the base compositions, the distributions of some dinucleotides and codon usages of transferred, nuclear and chloroplast genes in two dicots and two monocots plant species. Our results indicate that transferred genes have adjusted to nuclear base composition and codon usage, being now more similar to the nuclear genes than to the chloroplast ones in every species analyzed. PMID:2308837
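
    The comparison described here rests on two simple sequence statistics, base composition and codon usage. As a rough illustration of how such quantities can be computed for a coding sequence, the sketch below uses a made-up toy sequence and hypothetical function names; it is not the authors' procedure.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Toy coding sequence (hypothetical, not from the study).
toy_cds = "ATGGCCGCTTAA"
gc = gc_content(toy_cds)
usage = codon_usage(toy_cds)
print(f"GC content: {gc:.2f}")
print(usage)
```

    Comparing such profiles between transferred, nuclear and chloroplast gene sets is the kind of analysis the abstract summarizes; any distance measure used for that comparison would be a further assumption.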

  16. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High-throughput technology has driven progress in omics studies, including genomics and transcriptomics. We review the improvements in comparative omics studies attributable to the high-throughput measurements of next-generation sequencing. Comparative genomics has been successfully applied to evolutionary analysis, while comparative transcriptomics is used to compare expression profiles between two subjects by differential expression or differential coexpression, which enables its application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressures affecting morphogenesis during development, and previous work has sought to identify the most conserved stages of embryonic development. Earlier measurements in these studies were based on morphological similarity at the macro level, whereas new technology enables detection of similarity in molecular mechanisms at the micro level. Evolutionary models of embryonic development, including the "funnel-like" model and the "hourglass" model, have been evaluated by combining these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has brought EVO-DEVO studies into a new era, technological and material limitations still exist, and further investigations require more refined study designs and procedures.

  17. Clinical utility of an array comparative genomic hybridization analysis for Williams syndrome.

    PubMed

    Yagihashi, Tatsuhiko; Torii, Chiharu; Takahashi, Reiko; Omori, Mikimasa; Kosaki, Rika; Yoshihashi, Hiroshi; Ihara, Masahiro; Minagawa-Kawai, Yasuyo; Yamamoto, Junichi; Takahashi, Takao; Kosaki, Kenjiro

    2014-11-01

    To reveal the relation between intellectual disability and the deleted intervals in Williams syndrome, we performed an array comparative genomic hybridization analysis and standardized developmental testing for 11 patients diagnosed as having Williams syndrome based on fluorescent in situ hybridization testing. One patient had a large 4.2-Mb deletion spanning distally beyond the common 1.5-Mb intervals observed in 10/11 patients. We formulated a linear equation describing the developmental age of the 10 patients with the common deletion; the developmental age of the patient with the 4.2-Mb deletion was significantly below the expectation (developmental age = 0.51 × chronological age). The large deletion may account for the severe intellectual disability; therefore, the use of array comparative genomic hybridization may provide practical information regarding individuals with Williams syndrome. © 2014 Japanese Teratology Society.

  18. Drop-on-Demand Single Cell Isolation and Total RNA Analysis

    PubMed Central

    Moon, Sangjun; Kim, Yun-Gon; Dong, Lingsheng; Lombardi, Michael; Haeggstrom, Edward; Jensen, Roderick V.; Hsiao, Li-Li; Demirci, Utkan

    2011-01-01

    Technologies that rapidly isolate viable single cells from heterogeneous solutions have significantly contributed to the field of medical genomics. Challenges remain both to enable efficient extraction, isolation and patterning of single cells from heterogeneous solutions and to keep them alive during the process, owing to a limited degree of control over single cell manipulation. Here, we present a microdroplet based method to isolate and pattern single cells from heterogeneous cell suspensions (10% target cell mixture), preserve viability of the extracted cells (97.0±0.8%), and obtain genomic information from isolated cells compared to the non-patterned controls. The cell encapsulation process is both experimentally and theoretically analyzed. Using the isolated cells, we identified 11 stem cell markers among 1000 genes and compared them to the controls. This automated platform enabling high-throughput cell manipulation for subsequent genomic analysis employs fewer handling steps compared to existing methods. PMID:21412416

  19. Large-Scale Comparative Phenotypic and Genomic Analyses Reveal Ecological Preferences of Shewanella Species and Identify Metabolic Pathways Conserved at the Genus Level

    PubMed Central

    Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.

    2011-01-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407

  20. Comparative genomic analysis of the genus Staphylococcus including Staphylococcus aureus and its newly described sister species Staphylococcus simiae

    PubMed Central

    2012-01-01

    Background Staphylococcus belongs to the Gram-positive low G + C content group of the Firmicutes division of bacteria. Staphylococcus aureus is an important human and veterinary pathogen that causes a broad spectrum of diseases, and has developed important multidrug resistant forms such as methicillin-resistant S. aureus (MRSA). Staphylococcus simiae was isolated from South American squirrel monkeys in 2000, and is a coagulase-negative bacterium, closely related, and possibly the sister group, to S. aureus. Comparative genomic analyses of closely related bacteria with different phenotypes can provide information relevant to understanding adaptation to host environment and mechanisms of pathogenicity. Results We determined a Roche/454 draft genome sequence for S. simiae and included it in comparative genomic analyses with 11 other Staphylococcus species including S. aureus. A genome based phylogeny of the genus confirms that S. simiae is the sister group to S. aureus and indicates that the most basal Staphylococcus lineage is Staphylococcus pseudintermedius, followed by Staphylococcus carnosus. Given the primary niche of these two latter taxa, compared to the other species in the genus, this phylogeny suggests that human adaptation evolved after the split of S. carnosus. The two coagulase-positive species (S. aureus and S. pseudintermedius) are not phylogenetically closest but share many virulence factors exclusively, suggesting that these genes were acquired by horizontal transfer. Enrichment in genes related to mobile elements such as prophage in S. aureus relative to S. simiae suggests that pathogenesis in the S. aureus group has developed by gene gain through horizontal transfer, after the split of S. aureus and S. simiae from their common ancestor. Conclusions Comparative genomic analyses across 12 Staphylococcus species provide hypotheses about lineages in which human adaptation has taken place and contributions of horizontal transfer in pathogenesis. 
PMID:22272658

  1. Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms.

    PubMed

    Buschiazzo, Emmanuel; Ritland, Carol; Bohlmann, Jörg; Ritland, Kermit

    2012-01-20

    Comparative genomics can inform us about the processes of mutation and selection across diverse taxa. Among seed plants, gymnosperms have been lacking in genomic comparisons. Recent EST and full-length cDNA collections for two conifers, Sitka spruce (Picea sitchensis) and loblolly pine (Pinus taeda), together with full genome sequences for two angiosperms, Arabidopsis thaliana and poplar (Populus trichocarpa), offer an opportunity to infer the evolutionary processes underlying thousands of orthologous protein-coding genes in gymnosperms compared with an angiosperm orthologue set. Based upon pairwise comparisons of 3,723 spruce and pine orthologues, we found an average synonymous genetic distance (dS) of 0.191, and an average dN/dS ratio of 0.314. Using a fossil-established divergence time of 140 million years between spruce and pine, we extrapolated a nucleotide substitution rate of 0.68 × 10^-9 synonymous substitutions per site per year. When compared to angiosperms, this indicates a dramatically slower rate of nucleotide substitution in conifers: on average 15-fold. Coincidentally, we found a three-fold higher dN/dS for the spruce-pine lineage compared to the poplar-Arabidopsis lineage. This joint occurrence of a slower evolutionary rate in conifers with higher dN/dS, and possibly positive selection, showcases the uniqueness of conifer genome evolution. Our results are in line with documented reduced nucleotide diversity, conservative genome evolution and low rates of diversification in conifers on the one hand and numerous examples of local adaptation in conifers on the other hand. We propose that reduced levels of nucleotide mutation in large and long-lived conifer trees, coupled with large effective population size, were the main factors leading to slow substitution rates but retention of beneficial mutations.
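
    The substitution rate quoted in this abstract can be reproduced from the reported dS and divergence time. The sketch below assumes the conventional relation rate = dS / (2T), where the factor of 2 reflects substitutions accumulating along both lineages since the split; the abstract does not state the formula, so this is an inference that happens to match the published figure.

```python
# Values quoted in the abstract.
DS_SPRUCE_PINE = 0.191        # average synonymous distance between orthologues
DIVERGENCE_YEARS = 140e6      # fossil-established spruce-pine divergence time


def substitution_rate(ds: float, divergence_years: float) -> float:
    """Synonymous substitutions per site per year, assuming rate = dS / (2 * T)."""
    if divergence_years <= 0:
        raise ValueError("divergence_years must be positive")
    return ds / (2.0 * divergence_years)


rate = substitution_rate(DS_SPRUCE_PINE, DIVERGENCE_YEARS)
print(f"conifer rate: {rate:.2e} substitutions/site/year")

# The abstract reports a roughly 15-fold slower rate than in angiosperms,
# which would put the implied angiosperm rate near 1e-8.
implied_angiosperm_rate = 15 * rate
print(f"implied angiosperm rate: {implied_angiosperm_rate:.2e}")
```

    The computed conifer rate rounds to 0.68 × 10^-9, in agreement with the abstract.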

  2. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    PubMed

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  3. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum

    PubMed Central

    Rao, Soumya

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens. PMID:28846714

  4. Breaking Lander-Waterman’s Coverage Bound

    PubMed Central

    Nashta-ali, Damoun; Motahari, Seyed Abolfazl; Hosseinkhalaj, Babak

    2016-01-01

    Lander-Waterman’s coverage bound establishes the total number of reads required to cover the whole genome of size G bases. In fact, their bound is a direct consequence of the well-known solution to the coupon collector’s problem, which proves that for such a genome, the total number of bases to be sequenced should be O(G ln G). Although the result leads to a tight bound, it is based on a tacit assumption that the set of reads is first collected through a sequencing process and then processed through a computation process, i.e., there are two different machines: one for sequencing and one for processing. In this paper, we present a significant improvement compared to Lander-Waterman’s result and prove that by combining the sequencing and computing processes, one can re-sequence the whole genome with as few as O(G) sequenced bases in total. Our approach also dramatically reduces the required computational power for the combined process. Simulations are performed on real genomes with different sequencing error rates. The results support our theory predicting the log G improvement on coverage bound and corresponding reduction in the total number of bases required to be sequenced. PMID:27806058
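
    To give a feel for the O(G ln G) figure, the sketch below uses the idealized model in which, at fold coverage c, each base is left uncovered with probability about exp(-c). Requiring fewer than one expected uncovered base then gives c of about ln G, hence about G ln G sequenced bases. Read-length effects and constants are ignored, the function names are illustrative, and this is not the paper's combined sequencing and computing scheme.

```python
import math


def expected_uncovered_bases(genome_size: float, fold_coverage: float) -> float:
    """Expected number of uncovered bases under the idealized exp(-c) model."""
    return genome_size * math.exp(-fold_coverage)


def fold_coverage_for_full_cover(genome_size: float, max_uncovered: float = 1.0) -> float:
    """Fold coverage c at which the expected uncovered bases equal max_uncovered."""
    if genome_size <= 0 or max_uncovered <= 0:
        raise ValueError("arguments must be positive")
    return math.log(genome_size / max_uncovered)


G = 3_000_000_000  # hypothetical genome size in bases
c = fold_coverage_for_full_cover(G)
total_bases_classical = c * G          # grows like G ln G
total_bases_linear = G                 # order of growth claimed in the abstract, constants omitted

print(f"fold coverage needed: {c:.1f}x")
print(f"classical total bases: {total_bases_classical:.2e}")
print(f"ratio classical / linear: {total_bases_classical / total_bases_linear:.1f}")
```

    The ratio printed at the end is simply ln G, the factor the abstract refers to as the log G improvement.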

  5. Integrating Nonadditive Genomic Relationship Matrices into the Study of Genetic Architecture of Complex Traits.

    PubMed

    Nazarian, Alireza; Gezan, Salvador A

    2016-03-01

    The study of genetic architecture of complex traits has been dramatically influenced by implementing genome-wide analytical approaches during recent years. Of particular interest are genomic prediction strategies which make use of genomic information for predicting phenotypic responses instead of detecting trait-associated loci. In this work, we present the results of a simulation study to improve our understanding of the statistical properties of estimation of genetic variance components of complex traits, and of additive, dominance, and genetic effects through best linear unbiased prediction methodology. Simulated dense marker information was used to construct genomic additive and dominance matrices, and multiple alternative pedigree- and marker-based models were compared to determine if including a dominance term into the analysis may improve the genetic analysis of complex traits. Our results showed that a model containing a pedigree- or marker-based additive relationship matrix along with a pedigree-based dominance matrix provided the best partitioning of genetic variance into its components, especially when some degree of true dominance effects was expected to exist. Also, we noted that the use of a marker-based additive relationship matrix along with a pedigree-based dominance matrix had the best performance in terms of accuracy of correlations between true and estimated additive, dominance, and genetic effects. © The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms

    PubMed Central

    Fleischmann, Andreas; Michael, Todd P.; Rivadavia, Fernando; Sousa, Aretuza; Wang, Wenqin; Temsch, Eva M.; Greilhuber, Johann; Müller, Kai F.; Heubl, Günther

    2014-01-01

    Background and Aims Some species of Genlisea possess ultrasmall nuclear genomes, the smallest known among angiosperms, and some have been found to have chromosomes of diminutive size, which may explain why chromosome numbers and karyotypes are not known for the majority of species of the genus. However, other members of the genus do not possess ultrasmall genomes, nor do most taxa studied in related genera of the family or order. This study therefore examined the evolution of genome sizes and chromosome numbers in Genlisea in a phylogenetic context. The correlations of genome size with chromosome number and size, with the phylogeny of the group and with growth forms and habitats were also examined. Methods Nuclear genome sizes were measured from cultivated plant material for a comprehensive sampling of taxa, including nearly half of all species of Genlisea and representing all major lineages. Flow cytometric measurements were conducted in parallel in two laboratories in order to compare the consistency of different methods and controls. Chromosome counts were performed for the majority of taxa, comparing different staining techniques for the ultrasmall chromosomes. Key Results Genome sizes of 15 taxa of Genlisea are presented and interpreted in a phylogenetic context. A high degree of congruence was found between genome size distribution and the major phylogenetic lineages. Ultrasmall genomes with 1C values of <100 Mbp were almost exclusively found in a derived lineage of South American species. The ancestral haploid chromosome number was inferred to be n = 8. Chromosome numbers in Genlisea ranged from 2n = 2x = 16 to 2n = 4x = 32. Ascendant dysploid series (2n = 36, 38) are documented for three derived taxa. The different ploidy levels corresponded to the two subgenera, but were not directly correlated to differences in genome size; the three different karyotype ranges mirrored the different sections of the genus. 
The smallest known plant genomes were not found in G. margaretae, as previously reported, but in G. tuberosa (1C ≈ 61 Mbp) and some strains of G. aurea (1C ≈ 64 Mbp). Conclusions Genlisea is an ideal candidate model organism for the understanding of genome reduction as the genus includes species with both relatively large (∼1700 Mbp) and ultrasmall (∼61 Mbp) genomes. This comparative, phylogeny-based analysis of genome sizes and karyotypes in Genlisea provides essential data for selection of suitable species for comparative whole-genome analyses, as well as for further studies on both the molecular and cytogenetic basis of genome reduction in plants. PMID:25274549

  7. Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences

    DOEpatents

    Eisenberg, David; Marcotte, Edward M.; Pellegrini, Matteo; Thompson, Michael J.; Yeates, Todd O.

    2002-10-15

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequences of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method, a combination of the above two methods is used to predict functional links.
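
    The phylogenetic-profile idea described above can be illustrated with a minimal sketch. It assumes a toy presence/absence table; the protein names, genome names, and the helper functions `phylogenetic_profiles` and `group_by_profile` are hypothetical and are not taken from the patent. Proteins that share an identical profile are reported as candidate functional links.

```python
from collections import defaultdict


def phylogenetic_profiles(presence, genomes):
    """Map each protein to a tuple of 0/1 flags, one flag per genome."""
    return {
        protein: tuple(1 if genome in found_in else 0 for genome in genomes)
        for protein, found_in in presence.items()
    }


def group_by_profile(profiles):
    """Group proteins whose presence/absence profiles are identical."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Keep only profiles shared by at least two proteins.
    return {p: sorted(members) for p, members in groups.items() if len(members) > 1}


# Hypothetical toy data: which genomes contain a homolog of each protein.
genomes = ["g1", "g2", "g3", "g4"]
presence = {
    "P1": {"g1", "g3"},
    "P2": {"g1", "g3"},
    "P3": {"g2", "g4"},
    "P4": {"g1", "g2", "g3", "g4"},
}

profiles = phylogenetic_profiles(presence, genomes)
linked = group_by_profile(profiles)
print(linked)
```

    In this toy table only P1 and P2 share a profile, so they form the single candidate link. A practical analysis would likely tolerate near-identical profiles rather than require exact matches; that relaxation is not shown here.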

  8. Comparative inference of duplicated genes produced by polyploidization in soybean genome.

    PubMed

    Yang, Yanmei; Wang, Jinpeng; Di, Jianyong

    2013-01-01

    Soybean (Glycine max) is one of the most important crop plants for providing protein and oil. It is important to investigate the soybean genome for its economic and scientific value. Polyploidy is a widespread and recursive phenomenon during plant evolution, and it can generate massive numbers of duplicated genes, which are an important resource for genetic innovation. Improved sequence alignment criteria and statistical analysis are used to identify and characterize duplicated genes produced by polyploidization in soybean. Based on the collinearity method, duplicated genes produced by whole genome duplication account for 70.3% in soybean. From the statistical analysis of the molecular distances between duplicated genes, our study indicates that the whole genome duplication event occurred more than once in the genome evolution of soybean, which is often distributed near the ends of chromosomes.

  9. Complete Genome Sequence and Comparative Genomics of a Streptococcus pyogenes emm3 Strain M3-b isolated from a Japanese Patient with Streptococcal Toxic Shock Syndrome.

    PubMed

    Ogura, Kohei; Watanabe, Shinya; Kirikae, Teruo; Miyoshi-Akiyama, Tohru

    2017-01-01

    Epidemiologic typing of Streptococcus pyogenes (GAS) is frequently based on the genotype of the emm gene, which encodes M/Emm protein. In this study, the complete genome sequence of GAS emm3 strain M3-b, isolated from a patient with streptococcal toxic shock syndrome (STSS), was determined. This strain exhibited 99% identity with other complete genome sequences of emm3 strains MGAS315, SSI-1, and STAB902. The complete genomes of five additional strains isolated from Japanese patients with and without STSS were also sequenced. Maximum-likelihood phylogenetic analysis showed that strains M3-b, M3-e, and SSI-1, all of which were isolated from STSS patients, were relatively close.

  10. Radiation hybrid maps of the D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes.

    PubMed

    Kumar, Ajay; Seetan, Raed; Mergoum, Mohamed; Tiwari, Vijay K; Iqbal, Muhammad J; Wang, Yi; Al-Azzam, Omar; Šimková, Hana; Luo, Ming-Cheng; Dvorak, Jan; Gu, Yong Q; Denton, Anne; Kilian, Andrzej; Lazo, Gerard R; Kianian, Shahryar F

    2015-10-16

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high resolution genome maps with saturated marker scaffolds to anchor and orient BAC contigs/sequence scaffolds for whole genome assembly. Radiation hybrid (RH) mapping has proven to be an excellent tool for the development of such maps, for it offers much higher and more uniform marker resolution across the length of the chromosome compared to genetic mapping and does not require marker polymorphism per se, as it is based on presence (retention) vs. absence (deletion) marker assay. In this study, a 178 line RH panel was genotyped with SSRs and DArT markers to develop the first high resolution RH maps of the entire D-genome of Ae. tauschii accession AL8/78. To confirm map order accuracy, the AL8/78-RH maps were compared with: 1) a DArT consensus genetic map constructed using more than 100 bi-parental populations, 2) a RH map of the D-genome of reference hexaploid wheat 'Chinese Spring', and 3) two SNP-based genetic maps, one with anchored D-genome BAC contigs and another with anchored D-genome sequence scaffolds. Using marker sequences, the RH maps were also anchored with a BAC contig based physical map and draft sequence of the D-genome of Ae. tauschii. A total of 609 markers were mapped to 503 unique positions on the seven D-genome chromosomes, with a total map length of 14,706.7 cR. The average distance between any two marker loci was 29.2 cR which corresponds to 2.1 cM or 9.8 Mb. The average mapping resolution across the D-genome was estimated to be 0.34 Mb (Mb/cR) or 0.07 cM (cM/cR). The RH maps showed almost perfect agreement with several published maps with regard to chromosome assignments of markers. The mean rank correlations between the position of markers on AL8/78 maps and the four published maps ranged from 0.75 to 0.92, suggesting a good agreement in marker order. 
With 609 mapped markers, a total of 2481 deletions for the whole D-genome were detected with an average deletion size of 42.0 Mb. A total of 520 markers were anchored to 216 Ae. tauschii sequence scaffolds, 116 of which were not anchored earlier to the D-genome. This study reports the development of the first high resolution RH maps for the D-genome of Ae. tauschii accession AL8/78, which were then used for the anchoring of unassigned sequence scaffolds. This study demonstrates how RH mapping, which offered high and uniform resolution across the length of the chromosome, can facilitate the complete sequence assembly of large and complex plant genomes.
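
    The comparison of marker order between maps can be sketched with a rank correlation. The abstract does not say which rank statistic was used, so the sketch below assumes Spearman's coefficient; the marker positions are made-up toy values and the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy positions of the same five markers on two maps (values are invented).
rh_map_cr = [0.0, 12.5, 30.1, 41.0, 58.3]   # radiation hybrid map, cR
genetic_map_cm = [0.0, 1.1, 0.9, 3.2, 4.0]  # genetic map, cM; markers 2 and 3 swapped

rho = spearman(rh_map_cr, genetic_map_cm)
print(round(rho, 3))
```

    A single swapped pair of adjacent markers among five lowers the coefficient from 1.0 to 0.9, which gives a feel for what values in the reported 0.75 to 0.92 range imply about order agreement.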

  11. Cow genotyping strategies for genomic selection in a small dairy cattle population.

    PubMed

    Jenko, J; Wiggans, G R; Cooper, T A; Eaglen, S A E; Luff, W G de L; Bichard, M; Pong-Wong, R; Woolliams, J A

    2017-01-01

    This study compares how different cow genotyping strategies increase the accuracy of genomic estimated breeding values (EBV) in dairy cattle breeds with low numbers. In these breeds, few sires have progeny records, and genotyping cows can improve the accuracy of genomic EBV. The Guernsey breed is a small dairy cattle breed with approximately 14,000 recorded individuals worldwide. Predictions of phenotypes of milk yield, fat yield, protein yield, and calving interval were made for Guernsey cows from England and Guernsey Island using genomic EBV, with training sets including 197 de-regressed proofs of genotyped bulls, with cows selected from among 1,440 genotyped cows using different genotyping strategies. Accuracies of predictions were tested using 10-fold cross-validation among the cows. Genomic EBV were predicted using 4 different methods: (1) pedigree BLUP, (2) genomic BLUP using only bulls, (3) univariate genomic BLUP using bulls and cows, and (4) bivariate genomic BLUP. Genotyping cows with phenotypes and using their data for the prediction of single nucleotide polymorphism effects increased the correlation between genomic EBV and phenotypes compared with using only bulls by 0.163±0.022 for milk yield, 0.111±0.021 for fat yield, and 0.113±0.018 for protein yield; a decrease of 0.014±0.010 for calving interval from a low base was the only exception. Genetic correlation between phenotypes from bulls and cows were approximately 0.6 for all yield traits and significantly different from 1. Only a very small change occurred in correlation between genomic EBV and phenotypes when using the bivariate model. It was always better to genotype all the cows, but when only half of the cows were genotyped, a divergent selection strategy was better compared with the random or directional selection approach. Divergent selection of 30% of the cows remained superior for the yield traits in 8 of 10 folds. Copyright © 2017 American Dairy Science Association. 
Published by Elsevier Inc. All rights reserved.
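
    The divergent selection strategy mentioned above can be sketched as follows, assuming it means choosing cows from both phenotypic extremes. That reading, the function name `divergent_selection`, the equal split between tails, and the toy milk yields are all assumptions rather than details given in the abstract.

```python
def divergent_selection(phenotypes, fraction):
    """Pick animals from the lowest and highest phenotype tails.

    phenotypes: dict mapping animal ID to a phenotype value.
    fraction: share of all animals to select in total (both tails combined).
    """
    ranked = sorted(phenotypes, key=phenotypes.get)  # ascending by phenotype
    n_select = round(len(ranked) * fraction)
    n_low = n_select // 2
    n_high = n_select - n_low
    high_tail = ranked[-n_high:] if n_high else []
    return ranked[:n_low] + high_tail


# Invented milk-yield records for ten cows.
phenotypes = {
    "cow01": 5200, "cow02": 6100, "cow03": 4800, "cow04": 7300, "cow05": 5900,
    "cow06": 6600, "cow07": 4500, "cow08": 7000, "cow09": 5500, "cow10": 6300,
}

selected = divergent_selection(phenotypes, 0.4)
print(selected)
```

    Random selection would draw the same number of cows without regard to phenotype, and directional selection would take them from one tail only; those variants are not shown.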

  12. Genomic comparison of multi-drug resistant invasive and colonizing Acinetobacter baumannii isolated from diverse human body sites reveals genomic plasticity.

    PubMed

    Sahl, Jason W; Johnson, J Kristie; Harris, Anthony D; Phillippy, Adam M; Hsiao, William W; Thom, Kerri A; Rasko, David A

    2011-06-04

    Acinetobacter baumannii has recently emerged as a significant global pathogen, with a surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isolates from blood, peri-anal, and wound sources were examined in an attempt to identify genetic features that could be correlated to each isolation source. Pulsed-field gel electrophoresis, multi-locus sequence typing and antibiotic resistance profiles demonstrated genotypic and phenotypic variation. Each isolate was sequenced to high-quality draft status, which allowed for comparative genomic analyses with existing A. baumannii genomes. A high resolution, whole genome alignment method detailed the phylogenetic relationships of sequenced A. baumannii and found no correlation between phylogeny and body site of isolation. This method identified genomic regions unique to both those isolates found on the surface of the skin or in wounds, termed colonization isolates, and those identified from body fluids, termed invasive isolates; these regions may play a role in the pathogenesis and spread of this important pathogen. A PCR-based screen of 74 A. baumannii isolates demonstrated that these unique genes are not exclusive to either phenotype or isolation source; however, a conserved genomic region exclusive to all sequenced A. baumannii was identified and verified. The results of the comparative genome analysis and PCR assay show that A. baumannii is a diverse and genomically variable pathogen that appears to have the potential to cause a range of human disease regardless of the isolation source.

  13. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    PubMed

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, Solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus.

  14. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective

    PubMed Central

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, Solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus. PMID:26513163

  15. The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

    PubMed Central

    2011-01-01

    Background Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637), one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp) and pRK2 (5,360 bp), are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks): it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain' for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models. PMID:21208457

  16. A computer-based education intervention to enhance surrogates' informed consent for genomics research.

    PubMed

    Shelton, Ann K; Freeman, Bradley D; Fish, Anne F; Bachman, Jean A; Richardson, Lloyd I

    2015-03-01

    Many research studies conducted today in critical care have a genomics component. Patients' surrogates asked to authorize participation in genomics research for a loved one in the intensive care unit may not be prepared to make informed decisions about a patient's participation in the research. To examine the effectiveness of a new, computer-based education module on surrogates' understanding of the process of informed consent for genomics research. A pilot study was conducted with visitors in the waiting rooms of 2 intensive care units in a Midwestern tertiary care medical center. Visitors were randomly assigned to the experimental (education module plus a sample genomics consent form; n = 65) or the control (sample genomics consent form only; n = 69) group. Participants later completed a test on informed genomics consent. Understanding the process of informed consent was greater (P = .001) in the experimental group than in the control group. Specifically, compared with the control group, the experimental group had a greater understanding of 8 of 13 elements of informed consent: intended benefits of research (P = .02), definition of surrogate consenter (P = .001), withdrawal from the study (P = .001), explanation of risk (P = .002), purpose of the institutional review board (P = .001), definition of substituted judgment (P = .03), compensation for harm (P = .001), and alternative treatments (P = .004). Computer-based education modules may be an important addition to conventional approaches for obtaining informed consent in the intensive care unit. Preparing patients' family members who may consider serving as surrogate consenters is critical to facilitating genomics research in critical care. ©2015 American Association of Critical-Care Nurses.

  17. Relevance of genetic relationship in GWAS and genomic prediction.

    PubMed

    Pereira, Helcio Duarte; Soriano Viana, José Marcelo; Andrade, Andréa Carla Bastos; Fonseca E Silva, Fabyano; Paes, Geísa Pinheiro

    2018-02-01

    The objective of this study was to analyze the relevance of relationship information on the identification of low heritability quantitative trait loci (QTLs) from a genome-wide association study (GWAS) and on the genomic prediction of complex traits in human, animal and cross-pollinating populations. The simulation-based data sets included 50 samples of 1000 individuals of seven populations derived from a common population with linkage disequilibrium. The populations had non-inbred and inbred progeny structure (50 to 200) with varying number of members (5 to 20). The individuals were genotyped for 10,000 single nucleotide polymorphisms (SNPs) and phenotyped for a quantitative trait controlled by 10 QTLs and 90 minor genes showing dominance. The SNP density was 0.1 cM and the narrow sense heritability was 25%. The QTL heritabilities ranged from 1.1 to 2.9%. We applied mixed model approaches for both GWAS and genomic prediction using pedigree-based and genomic relationship matrices. For GWAS, the observed false discovery rate was kept below the significance level of 5%, the power of detection for the low heritability QTLs ranged from 14 to 50%, and the average bias between significant SNPs and a QTL ranged from less than 0.01 to 0.23 cM. The QTL detection power was consistently higher using genomic relationship matrix. Regardless of population and training set size, genomic prediction provided higher prediction accuracy of complex trait when compared to pedigree-based prediction. The accuracy of genomic prediction when there is relatedness between individuals in the training set and the reference population is much higher than the value for unrelated individuals.
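
    A genomic relationship matrix of the kind referred to above can be sketched from SNP genotypes. The abstract does not state which estimator was used; the sketch below assumes the widely used VanRaden-style form G = ZZ'/(2 Σ p(1 − p)), with genotypes coded 0/1/2 and allele frequencies estimated from the sample. The function name and the tiny genotype table are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a genomic relationship matrix from 0/1/2 SNP genotype codes.

    genotypes: list of rows, one row per individual, one column per SNP.
    Assumes the form G = Z Z' / (2 * sum(p * (1 - p))), with Z = M - 2p.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all SNPs are monomorphic; matrix is undefined")
    # Centre each genotype by twice the allele frequency.
    centered = [[row[j] - 2 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Invented genotypes: individuals 0 and 1 are identical, individual 2 is opposite.
genotypes = [
    [0, 1, 2],
    [0, 1, 2],
    [2, 1, 0],
]

G = genomic_relationship_matrix(genotypes)
for row in G:
    print([round(value, 2) for value in row])
```

    In a mixed model this matrix would take the place of the pedigree-based relationship matrix; fitting that model is beyond this sketch.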

  18. Caveat emptor: limitations of the automated reconstruction of metabolic pathways in Plasmodium.

    PubMed

    Ginsburg, Hagai

    2009-01-01

    The functional reconstruction of metabolic pathways from an annotated genome is a tedious and demanding enterprise. Automation of this endeavor using bioinformatics algorithms could cope with the ever-increasing number of sequenced genomes and accelerate the process. Here, the manual reconstruction of metabolic pathways in the functional genomic database of Plasmodium falciparum--Malaria Parasite Metabolic Pathways--is described and compared with pathways generated automatically as they appear in PlasmoCyc, metaSHARK and the Kyoto Encyclopedia for Genes and Genomes. A critical evaluation of this comparison discloses that the automatic reconstruction of pathways generates manifold paths that need an expert manual verification to accept some and reject most others based on manually curated gene annotation.

  19. Comparative aerial- and ground-based high-throughput phenotyping for the genetic dissection of NDVI as a proxy for drought-adaptive traits in durum wheat

    USDA-ARS's Scientific Manuscript database

    High-throughput phenotyping platforms (HTPPs) provide novel opportunities to more effectively dissect the genetic basis of drought-adaptive traits. This genome-wide association study (GWAS) compares the results obtained with two Unmanned Aerial Vehicles (UAVs) and a ground-based platform used to mea...

  20. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

    PubMed

    Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

    2014-11-07

    Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.

  1. Likelihood-based gene annotations for gap filling and quality assessment in genome-scale metabolic models

    DOE PAGES

    Benedict, Matthew N.; Mundy, Michael B.; Henry, Christopher S.; ...

    2014-10-16

    Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface.

  2. Likelihood-Based Gene Annotations for Gap Filling and Quality Assessment in Genome-Scale Metabolic Models

    PubMed Central

    Benedict, Matthew N.; Mundy, Michael B.; Henry, Christopher S.; Chia, Nicholas; Price, Nathan D.

    2014-01-01

    Genome-scale metabolic models provide a powerful means to harness information from genomes to deepen biological insights. With exponentially increasing sequencing capacity, there is an enormous need for automated reconstruction techniques that can provide more accurate models in a short time frame. Current methods for automated metabolic network reconstruction rely on gene and reaction annotations to build draft metabolic networks and algorithms to fill gaps in these networks. However, automated reconstruction is hampered by database inconsistencies, incorrect annotations, and gap filling largely without considering genomic information. Here we develop an approach for applying genomic information to predict alternative functions for genes and estimate their likelihoods from sequence homology. We show that computed likelihood values were significantly higher for annotations found in manually curated metabolic networks than those that were not. We then apply these alternative functional predictions to estimate reaction likelihoods, which are used in a new gap filling approach called likelihood-based gap filling to predict more genomically consistent solutions. To validate the likelihood-based gap filling approach, we applied it to models where essential pathways were removed, finding that likelihood-based gap filling identified more biologically relevant solutions than parsimony-based gap filling approaches. We also demonstrate that models gap filled using likelihood-based gap filling provide greater coverage and genomic consistency with metabolic gene functions compared to parsimony-based approaches. Interestingly, despite these findings, we found that likelihoods did not significantly affect consistency of gap filled models with Biolog and knockout lethality data. 
This indicates that the phenotype data alone cannot necessarily be used to discriminate between alternative solutions for gap filling and therefore, that the use of other information is necessary to obtain a more accurate network. All described workflows are implemented as part of the DOE Systems Biology Knowledgebase (KBase) and are publicly available via API or command-line web interface. PMID:25329157

  3. Minimal Absent Words in Four Human Genome Assemblies

    PubMed Central

    Garcia, Sara P.; Pinho, Armando J.

    2011-01-01

    Minimal absent words have been computed in genomes of organisms from all domains of life. Here, we aim to contribute to the catalogue of human genomic variation by investigating the variation in number and content of minimal absent words within a species, using four human genome assemblies. We compare the reference human genome GRCh37 assembly, the HuRef assembly of the genome of Craig Venter, the NA12878 assembly from cell line GM12878, and the YH assembly of the genome of a Han Chinese individual. We find the variation in number and content of minimal absent words between assemblies more significant for large and very large minimal absent words, where the biases of sequencing and assembly methodologies become more pronounced. Moreover, we find generally greater similarity between the human genome assemblies sequenced with capillary-based technologies (GRCh37 and HuRef) than between the human genome assemblies sequenced with massively parallel technologies (NA12878 and YH). Finally, as expected, we find the overall variation in number and content of minimal absent words within a species to be generally smaller than the variation between species. PMID:22220210
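
    The notion of a minimal absent word can be made concrete with a brute-force sketch. It assumes the standard definition: a word is absent if it does not occur in the sequence, and minimal if both its longest proper prefix and its longest proper suffix do occur. The function names are hypothetical, only a single strand is scanned, and this enumeration is meant for short sequences, not for whole genomes, which need suffix-based index structures.

```python
def substrings_of_length(seq, k):
    """Return the set of all length-k substrings that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minimal_absent_words(seq, max_len, alphabet="ACGT"):
    """Enumerate minimal absent words of length 2..max_len by brute force."""
    maws = set()
    present_prev = substrings_of_length(seq, 1)  # occurring words of length k - 1
    for k in range(2, max_len + 1):
        present_k = substrings_of_length(seq, k)
        for prefix in present_prev:          # longest proper prefix must occur
            for letter in alphabet:
                word = prefix + letter
                # The word itself must be absent and its longest suffix must occur.
                if word not in present_k and word[1:] in present_prev:
                    maws.add(word)
        present_prev = present_k
    return maws


maws = minimal_absent_words("ACGTA", max_len=3)
print(sorted(maws))
```

    For the toy sequence ACGTA, every dinucleotide other than AC, CG, GT and TA is a minimal absent word, and TAC is the only one of length three, since TA and AC both occur but TAC does not.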

  4. Ortholog Identification and Comparative Analysis of Microbial Genomes Using MBGD and RECOG.

    PubMed

    Uchiyama, Ikuo

    2017-01-01

    Comparative genomics is becoming an essential approach for identification of genes associated with a specific function or phenotype. Here, we introduce the microbial genome database for comparative analysis (MBGD), which is a comprehensive ortholog database among the microbial genomes available so far. MBGD contains several precomputed ortholog tables including the standard ortholog table covering the entire taxonomic range and taxon-specific ortholog tables for various major taxa. In addition, MBGD allows the users to create an ortholog table within any specified set of genomes through dynamic calculations. In particular, MBGD has a "My MBGD" mode where users can upload their original genome sequences and incorporate them into orthology analysis. The created ortholog table can serve as the basis for various comparative analyses. Here, we describe the use of MBGD and briefly explain how to utilize the orthology information during comparative genome analysis in combination with the stand-alone comparative genomics software RECOG, focusing on the application to comparison of closely related microbial genomes.

  5. Exploring the post-genomic world: differing explanatory and manipulatory functions of post-genomic sciences

    PubMed Central

    Holmes, Christina; Carlson, Siobhan M.; McDonald, Fiona; Jones, Mavis; Graham, Janice

    2016-01-01

    Richard Lewontin proposed that the ability of a scientific field to create a narrative for public understanding garners it social relevance. This article applies Lewontin's conceptual framework of the functions of science (manipulatory and explanatory) to compare and explain the current differences in perceived societal relevance of genetics/genomics and proteomics. We provide three examples to illustrate the social relevance and strong cultural narrative of genetics/genomics for which no counterpart exists for proteomics. We argue that the major difference between genetics/genomics and proteomics is that genomics has a strong explanatory function, due to the strong cultural narrative of heredity. Based on qualitative interviews and observations of proteomics conferences, we suggest that the nature of proteins, lack of public understanding, and theoretical complexity exacerbate this difference for proteomics. Lewontin's framework suggests that social scientists may find that omics sciences affect social relations in different ways than past analyses of genetics. PMID:27134568

  6. Update on Genomic Databases and Resources at the National Center for Biotechnology Information.

    PubMed

    Tatusova, Tatiana

    2016-01-01

    The National Center for Biotechnology Information (NCBI), as a primary public repository of genomic sequence data, collects and maintains enormous amounts of heterogeneous data. Data for genomes, genes, gene expressions, gene variation, gene families, proteins, and protein domains are integrated with the analytical, search, and retrieval resources through the NCBI website. The text-based search and retrieval system provides a fast and easy way to navigate across diverse biological databases. Comparative genome analysis tools lead to further understanding of evolutionary processes, quickening the pace of discovery. Recent technological innovations have ignited an explosion in genome sequencing that has fundamentally changed our understanding of the biology of living organisms. This huge increase in DNA sequence data presents new challenges for the information management system and the visualization tools. New strategies have been designed to bring order to this genome sequence shockwave and improve the usability of associated data.

  7. Draft genome sequence of Thermoanaerobacterium sp. strain PSU-2 isolated from thermophilic hydrogen producing reactor.

    PubMed

    O-Thong, Sompong; Khongkliang, Peerawat; Mamimin, Chonticha; Singkhala, Apinya; Prasertsan, Poonsuk; Birkeland, Nils-Kåre

    2017-06-01

    Thermoanaerobacterium sp. strain PSU-2 was isolated from a thermophilic hydrogen-producing reactor, subjected to draft genome sequencing by 454 pyrosequencing, and annotated with RAST. The draft genome sequence of strain PSU-2 contains 2,552,497 bases with an estimated G + C content of 35.2%, 2555 CDS, 8 rRNAs and 57 tRNAs. Genes responsible for carbohydrate metabolism, amino acids and derivatives, and protein metabolism accounted for 17.7%, 14.39% and 9.81%, respectively. Strain PSU-2 also had genes responsible for hydrogen biosynthesis as well as genes related to Ni-Fe hydrogenase. Comparative genomic analysis indicates that strain PSU-2 shares about 94% genome sequence similarity with Thermoanaerobacterium xylanolyticum LX-11. The nucleotide sequence of this draft genome was deposited into DDBJ/ENA/GenBank under the accession MSQD00000000.
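
    The G + C content quoted above is the percentage of G and C bases among the unambiguous bases of the sequence. A minimal sketch of that calculation follows; the function name `gc_content` is hypothetical and ambiguous bases such as N are assumed to be excluded from the denominator, which may differ from the convention used by the annotation pipeline.

```python
def gc_content(seq: str) -> float:
    """Return the G + C percentage of a nucleotide sequence.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    characters such as N are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


if __name__ == "__main__":
    # Toy sequence, not data from strain PSU-2.
    print(round(gc_content("ATGCGCNNAT"), 1))
```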

  8. Genomic features of bacterial adaptation to plants

    PubMed Central

    Levy, Asaf; Gonzalez, Isai Salas; Mittelviefhaus, Maximilian; Clingenpeel, Scott; Paredes, Sur Herrera; Miao, Jiamin; Wang, Kunru; Devescovi, Giulia; Stillman, Kyra; Monteiro, Freddy; Alvarez, Bryan Rangel; Lundberg, Derek S.; Lu, Tse-Yuan; Lebeis, Sarah; Jin, Zhao; McDonald, Meredith; Klein, Andrew P.; Feltcher, Meghan E.; del Rio, Tijana Glavina; Grant, Sarah R.; Doty, Sharon L.; Ley, Ruth E.; Zhao, Bingyu; Venturi, Vittorio; Pelletier, Dale A.; Vorholt, Julia A.; Tringe, Susannah G.; Woyke, Tanja; Dangl, Jeffery L.

    2017-01-01

    Plants intimately associate with diverse bacteria. Plant-associated (PA) bacteria have ostensibly evolved genes enabling adaptation to the plant environment. However, the identities of such genes are mostly unknown and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3837 bacterial genomes to identify thousands of PA gene clusters. Genomes of PA bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant associated genomes. We experimentally validated candidates from two sets of PA genes, one involved in plant colonization, the other serving in microbe-microbe competition between PA bacteria. We also identified 64 PA protein domains that potentially mimic plant domains; some are shared with PA fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides leads for efficient and sustainable agriculture through microbiome engineering. PMID:29255260

  9. Exploring the post-genomic world: differing explanatory and manipulatory functions of post-genomic sciences.

    PubMed

    Holmes, Christina; Carlson, Siobhan M; McDonald, Fiona; Jones, Mavis; Graham, Janice

    2016-01-02

    Richard Lewontin proposed that the ability of a scientific field to create a narrative for public understanding garners it social relevance. This article applies Lewontin's conceptual framework of the functions of science (manipulatory and explanatory) to compare and explain the current differences in perceived societal relevance of genetics/genomics and proteomics. We provide three examples to illustrate the social relevance and strong cultural narrative of genetics/genomics for which no counterpart exists for proteomics. We argue that the major difference between genetics/genomics and proteomics is that genomics has a strong explanatory function, due to the strong cultural narrative of heredity. Based on qualitative interviews and observations of proteomics conferences, we suggest that the nature of proteins, lack of public understanding, and theoretical complexity exacerbate this difference for proteomics. Lewontin's framework suggests that social scientists may find that omics sciences affect social relations in different ways than past analyses of genetics.

  10. All about the Human Genome Project (HGP)

    MedlinePlus

    ... CSER), and Genome Sequencing Informatics Tools (GS-IT) Comparative Genomics Background information prepared for the media on ... other species to the human sequence. Background on Comparative Genomic Analysis New Process to Prioritize Animal Genomes ...

  11. AGORA : Organellar genome annotation from the amino acid and nucleotide references.

    PubMed

    Jung, Jaehee; Kim, Jong Im; Jeong, Young-Sik; Yi, Gangman

    2018-03-29

    Next-generation sequencing (NGS) technologies have led to the accumulation of high-throughput sequence data from various organisms in biology. To apply gene annotation of organellar genomes for various organisms, more optimized tools for functional gene annotation are required. Almost all gene annotation tools are mainly focused on the chloroplast genome of land plants or the mitochondrial genome of animals. We have developed a web application, AGORA, for the fast, user-friendly, and improved annotation of organellar genomes. AGORA annotates genes based on a BLAST-based homology search and clustering with selected reference sequences from the NCBI database or user-defined uploaded data. AGORA can annotate the functional genes in almost all mitochondrion and plastid genomes of eukaryotes. The gene annotation of a genome with an exon-intron structure within a gene or inverted repeat region is also available. It provides information on the start and end positions of each gene, BLAST results compared with the reference sequence, and visualization of the gene map by OGDRAW. Users can freely use the software, and the accessible URL is https://bigdata.dongguk.edu/gene_project/AGORA/. The main module of the tool is implemented in Python and PHP, and the web page is built with HTML and CSS to support all browsers. Contact: gangman@dongguk.edu.

  12. Comparative Mitogenomics of Plant Bugs (Hemiptera: Miridae): Identifying the AGG Codon Reassignments between Serine and Lysine

    PubMed Central

    Wang, Pei; Song, Fan; Cai, Wanzhi

    2014-01-01

    Insect mitochondrial genomes are very important for understanding molecular evolution as well as for phylogenetic and phylogeographic studies of insects. The Miridae are the largest family of Heteroptera, encompassing more than 11,000 described species, and are of great economic importance. To better understand the diversity and the evolution of plant bugs, we sequenced five new mitochondrial genomes and present the first comparative analysis of the nine mitochondrial genomes of mirids available to date. Our results showed that gene content, gene arrangement, base composition and sequences of mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. The control region possessed several distinct characteristics, including variable size, abundant tandem repetitions, and intra-genus conservation, and was useful in evolutionary and population genetic studies. The AGG codon reassignments between serine and lysine were investigated in the genus Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the anticodons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family-level relationships of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409

  13. Innovative assembly strategy contributes to understanding the evolution and conservation genetics of the endangered Solenodon paradoxus from the island of Hispaniola.

    PubMed

    Grigorev, Kirill; Kliver, Sergey; Dobrynin, Pavel; Komissarov, Aleksey; Wolfsberger, Walter; Krasheninnikova, Ksenia; Afanador-Hernández, Yashira M; Brandt, Adam L; Paulino, Liz A; Carreras, Rosanna; Rodríguez, Luis E; Núñez, Adrell; Brandt, Jessica R; Silva, Filipe; Hernández-Martich, J David; Majeske, Audrey J; Antunes, Agostinho; Roca, Alfred L; O'Brien, Stephen J; Martínez-Cruzado, Juan Carlos; Oleksyk, Taras K

    2018-03-16

    Solenodons are insectivores living in Hispaniola and Cuba that form an isolated branch in the tree of placental mammals, highly divergent from other eulipotyphlan insectivores. The history, unique biology and adaptations of these enigmatic venomous species could be illuminated by the availability of genome data, but a whole genome assembly for solenodons has not been previously performed, partially due to the difficulty in obtaining samples from the field. Island isolation and reduced numbers have likely resulted in high homozygosity within the Hispaniolan solenodon (Solenodon paradoxus); thus we tested the performance of several assembly strategies on the genome of this genetically impoverished species. The string-graph based assembly strategy seemed a better choice compared to the conventional de Bruijn graph approach, due to the high levels of homozygosity, which is often a hallmark of endemic or endangered species. A consensus reference genome was assembled from sequences of five individuals from the southern subspecies (S. p. woodi). In addition, we obtained additional sequence from one sample of the northern subspecies (S. p. paradoxus). The resulting genome assemblies were compared to each other, and annotated for genes, with a specific emphasis on venom genes, repeats, variable microsatellite loci and other genomic variants. Phylogenetic positioning and selection signatures were inferred based on 4,416 single copy orthologs from 10 other mammals. We estimated that solenodons diverged from other extant mammals 73.6 Mya. Patterns of SNP variation allowed us to infer population demography, which supported a subspecies split within the Hispaniolan solenodon at least 300 Kya.

  14. Assessing genome-wide copy number variation in the Han Chinese population.

    PubMed

    Lu, Jianqi; Lou, Haiyi; Fu, Ruiqing; Lu, Dongsheng; Zhang, Feng; Wu, Zhendong; Zhang, Xi; Li, Changhua; Fang, Baijun; Pu, Fangfang; Wei, Jingning; Wei, Qian; Zhang, Chao; Wang, Xiaoji; Lu, Yan; Yan, Shi; Yang, Yajun; Jin, Li; Xu, Shuhua

    2017-10-01

    Copy number variation (CNV) is a valuable source of genetic diversity in the human genome and a well-recognised cause of various genetic diseases. However, CNVs have been considerably under-represented in population-based studies, particularly in the Han Chinese, which is the largest ethnic group in the world. We aimed to build a representative CNV map for the Han Chinese population. We conducted a genome-wide CNV study involving 451 male Han Chinese samples from 11 geographical regions encompassing 28 dialect groups, representing a less-biased panel compared with the currently available data. We detected CNVs by using a 4.2M NimbleGen comparative genomic hybridisation array and whole-genome deep sequencing of 51 samples to optimise the filtering conditions in CNV discovery. A comprehensive Han Chinese CNV map was built based on a set of high-quality variants (positive predictive value >0.8, with sizes ranging from 369 bp to 4.16 Mb and a median of 5907 bp). The map consists of 4012 CNV regions (CNVRs), and more than half are novel to the 30 East Asian CNV Project and the 1000 Genomes Project Phase 3. We further identified 81 CNVRs specific to regional groups, which was indicative of the subpopulation structure within the Han Chinese population. Our data are complementary to public data sources, and the CNV map may facilitate the identification of pathogenic CNVs and further biomedical research studies involving the Han Chinese population.

  15. Enrichment of Root Endophytic Bacteria from Populus deltoides and Single-Cell-Genomics Analysis

    DOE PAGES

    Utturkar, Sagar M.; Cude, W. Nathan; Robeson, Jr., Michael S.; ...

    2016-07-15

    Bacterial endophytes that colonize Populus trees contribute to nutrient acquisition, prime immunity responses, and directly or indirectly increase both above- and below-ground biomasses. Endophytes are embedded within plant material, so physical separation and isolation are difficult tasks. Application of culture-independent methods, such as metagenome or bacterial transcriptome sequencing, has been limited due to the predominance of DNA from the plant biomass. In this paper, we present a modified differential and density gradient centrifugation-based protocol for the separation of endophytic bacteria from Populus roots. This protocol achieved substantial reduction in contaminating plant DNA, allowed enrichment of endophytic bacteria away from the plant material, and enabled single-cell genomics analysis. Four single-cell genomes were selected for whole-genome amplification based on their rarity in the microbiome (potentially uncultured taxa) as well as their inferred abilities to form associations with plants. Bioinformatics analyses, including assembly, contamination removal, and completeness estimation, were performed to obtain single-amplified genomes (SAGs) of organisms from the phyla Armatimonadetes, Verrucomicrobia, and Planctomycetes, which were unrepresented in our previous cultivation efforts. Finally, comparative genomic analysis revealed unique characteristics of each SAG that could facilitate future cultivation efforts for these bacteria.

  16. A case of female epispadias.

    PubMed

    Tantibhedhyangkul, Julierut; Copland, Susannah D; Haqq, Andrea M; Price, Thomas M

    2008-11-01

    Objective: To present a case of unrecognized female epispadias. Design: Case report. Setting: University-based reproductive endocrinology and fertility clinic. Patient: A 16-year-old girl with epispadias, history of mild urinary incontinence, auditory neuropathy, and functional hyperandrogenism. Intervention: None. Main outcome measure: Peripheral blood array-based comparative genomic hybridization. Results: The patient was referred for evaluation of excessive weight gain, secondary amenorrhea, and abnormal external genitalia. Examination under anesthesia revealed bilateral labia minora hypertrophy, bifid clitoris, and a patulous urethra, consistent with female epispadias. Hormonal evaluation showed functional hyperandrogenism, and peripheral blood array-based comparative genomic hybridization showed no chromosomal deletions or duplications. Conclusions: Female epispadias is a rare abnormality, not commonly recognized by most practitioners. The diagnosis is supported by a history of urinary incontinence and physical findings of bifid clitoris and patulous urethra. The condition can have serious physical and psychological consequences leading to a gross disruption of social function.

  17. Phylogenetic relationship and virulence inference of Streptococcus Anginosus Group: curated annotation and whole-genome comparative analysis support distinct species designation

    PubMed Central

    2013-01-01

    Background The Streptococcus Anginosus Group (SAG) represents three closely related species of the viridans group streptococci recognized as commensal bacteria of the oral, gastrointestinal and urogenital tracts. The SAG also cause severe invasive infections, and are pathogens during cystic fibrosis (CF) pulmonary exacerbation. Little genomic information or description of virulence mechanisms is currently available for SAG. We conducted intra- and inter-species whole-genome comparative analyses with 59 publicly available Streptococcus genomes and seven in-house closed high quality finished SAG genomes: S. constellatus (3), S. intermedius (2), and S. anginosus (2). For each SAG species, we sequenced at least one numerically dominant strain from CF airways recovered during acute exacerbation and an invasive, non-lung isolate. We also evaluated microevolution that occurred within two isolates that were cultured from one individual one year apart. Results The SAG genomes were most closely related to S. gordonii and S. sanguinis, based on shared orthologs, and harbor a similar number of proteins within each COG category as other Streptococcus species. Numerous characterized streptococcus virulence factor homologs were identified within the SAG genomes, including adherence, invasion, spreading factors, LPxTG cell wall proteins, and two component histidine kinases known to be involved in virulence gene regulation. Mobile elements, primarily integrative conjugative elements and bacteriophage, account for greater than 10% of the SAG genomes. S. anginosus was the most variable species sequenced in this study, yielding both the smallest and the largest SAG genomes containing multiple genomic rearrangements, insertions and deletions. In contrast, within the S. constellatus and S. intermedius species, there was extensive continuous synteny, with only slight differences in genome size between strains. Within S. constellatus we were able to determine important SNPs and changes in VNTR numbers that occurred over the course of one year. Conclusions The comparative genomic analysis of the SAG clarifies the phylogenetics of these bacteria and supports the distinct species classification. Numerous potential virulence determinants were identified and provide a foundation for further studies into SAG pathogenesis. Furthermore, the data may be used to enable the development of rapid diagnostic assays and therapeutics for these pathogens. PMID:24341328

  18. The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes.

    PubMed

    Clark, Samuel A; Hickey, John M; Daetwyler, Hans D; van der Werf, Julius H J

    2012-02-09

    The theory of genomic selection is based on the prediction of the effects of genetic markers in linkage disequilibrium with quantitative trait loci. However, genomic selection also relies on relationships between individuals to accurately predict genetic value. This study aimed to examine the importance of information on relatives versus that of unrelated or more distantly related individuals on the estimation of genomic breeding values. Simulated and real data were used to examine the effects of various degrees of relationship on the accuracy of genomic selection. Genomic Best Linear Unbiased Prediction (gBLUP) was compared to two pedigree based BLUP methods, one with a shallow one generation pedigree and the other with a deep ten generation pedigree. The accuracy of estimated breeding values for different groups of selection candidates that had varying degrees of relationships to a reference data set of 1750 animals was investigated. The gBLUP method predicted breeding values more accurately than BLUP. The most accurate breeding values were estimated using gBLUP for closely related animals. Similarly, the pedigree based BLUP methods were also accurate for closely related animals, however when the pedigree based BLUP methods were used to predict unrelated animals, the accuracy was close to zero. In contrast, gBLUP breeding values, for animals that had no pedigree relationship with animals in the reference data set, allowed substantial accuracy. An animal's relationship to the reference data set is an important factor for the accuracy of genomic predictions. Animals that share a close relationship to the reference data set had the highest accuracy from genomic predictions. However a baseline accuracy that is driven by the reference data set size and the overall population effective population size enables gBLUP to estimate a breeding value for unrelated animals within a population (breed), using information previously ignored by pedigree based BLUP methods.

  19. The Draft Genome and Transcriptome of Amaranthus hypochondriacus: A C4 Dicot Producing High-Lysine Edible Pseudo-Cereal

    PubMed Central

    Sunil, Meeta; Hariharan, Arun K.; Nayak, Soumya; Gupta, Saurabh; Nambisan, Suran R.; Gupta, Ravi P.; Panda, Binay; Choudhary, Bibha; Srinivasan, Subhashini

    2014-01-01

    Grain amaranths, edible C4 dicots, produce pseudo-cereals high in lysine. Lysine being one of the most limiting essential amino acids in cereals and C4 photosynthesis being one of the most sought-after phenotypes in protein-rich legume crops, the genome of one of the grain amaranths is likely to play a critical role in crop research. We have sequenced the genome and transcriptome of Amaranthus hypochondriacus, a diploid (2n = 32) belonging to the order Caryophyllales with an estimated genome size of 466 Mb. Of the 411 linkage single-nucleotide polymorphisms (SNPs) reported for grain amaranths, 355 SNPs (86%) are represented in the scaffolds and 74% of the 8.6 billion bases of the sequenced transcriptome map to the genomic scaffolds. The genome of A. hypochondriacus codes for at least 24,829 proteins, shares the paleohexaploidy event with species under the superorders Rosids and Asterids, harbours 1 SNP in 1,000 bases, and contains 13.76% of repeat elements. Annotation of all the genes in the lysine biosynthetic pathway using comparative genomics and expression analysis offers insights into the high-lysine phenotype. As the first grain species under Caryophyllales and the first C4 dicot genome reported, the work presented here will be beneficial in improving crops and in expanding our understanding of angiosperm evolution. PMID:25071079

  20. Noninvasive Online Measurement of Genome Lengths of Mammalian Tissues in Bulk by 14 MeV Neutron Atometry

    NASA Astrophysics Data System (ADS)

    Maglich, Bogdan; Radovic, Anna; Druey, Christian

    2012-10-01

    Genome length, L, the number of DNA nucleotide base pairs in a cell, of bovine (b) and porcine (p) tissues, the closest to the human genome, was hitherto measured by genomic sequencing: Lb=3, Lp=2.7 giga base pairs (Gbp) [1,2], errors not given. We report measurements of Lb/Lp and of Lb and Lp without sequencing, by atometry [3,4]. The number of O and C atoms, N, in nucleotide molecules was obtained from the prompt γ rate, G, emitted in inelastic scattering of 14 MeV neutrons on the C and O nuclei in the nucleotide molecules. Since G is proportional to N, Lb/Lp=Gb/Gp. Porcine and bovine meat was irradiated for 30 min. From the measured G we obtained Lb/Lp=1.28±0.02, 16% greater than [1,2]. We obtained absolute Lb=1.65/f, Lp=1.28/f Gbp, 0.3

  1. NIH tools facilitate matching cancer drugs with gene targets

    Cancer.gov

    A new study details how a suite of web-based tools provides the research community with greatly improved capacity to compare data derived from large collections of genomic information against thousands of drugs. By comparing drugs and genetic targets, re

  2. Whole genome sequencing and comparative genomics of closely related Fusarium Head Blight fungi: Fusarium graminearum, F. meridionale and F. asiaticum.

    PubMed

    Walkowiak, Sean; Rowland, Owen; Rodrigue, Nicolas; Subramaniam, Rajagopal

    2016-12-09

    The Fusarium graminearum species complex is composed of many distinct fungal species that cause several diseases in economically important crops, including Fusarium Head Blight of wheat. Despite being closely related, these species and individuals within species have distinct phenotypic differences in toxin production and pathogenicity, with some isolates reported as non-pathogenic on certain hosts. In this report, we compare genomes and gene content of six new isolates from the species complex, including the first available genomes of F. asiaticum and F. meridionale, with four other genomes reported in previous studies. A comparison of genome structure and gene content revealed a 93-99% overlap across all ten genomes. We identified more than 700 k base pairs (kb) of single nucleotide polymorphisms (SNPs), insertions, and deletions (indels) within common regions of the genome, which validated the species and genetic populations reported within species. We constructed a non-redundant pan gene list containing 15,297 genes from the ten genomes and among them 1827 genes or 12% were absent in at least one genome. These genes were co-localized in telomeric regions and select regions within chromosomes with a corresponding increase in SNPs and indels. Many are also predicted to encode for proteins involved in secondary metabolism and other functions associated with disease. Genes that were common between isolates contained high levels of nucleotide variation and may be pseudogenes, allelic, or under diversifying selection. The genomic resources we have contributed will be useful for the identification of genes that contribute to the phenotypic variation and niche specialization that have been reported among members of the F. graminearum species complex.
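
    The pan gene list described above is, in set terms, the union of the gene sets of all genomes, and the genes "absent in at least one genome" are those outside the intersection. The sketch below illustrates this with invented isolate and gene identifiers; the function name `pan_core_accessory` is hypothetical, and real analyses first need ortholog clustering to decide which genes count as the same.

```python
def pan_core_accessory(genomes):
    """Split gene content into pan, core and accessory sets.

    genomes: dict mapping a genome name to the set of gene (cluster)
    identifiers it contains. Identifiers are assumed to be comparable
    across genomes, i.e. orthologs already share one identifier.
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)            # present in any genome
    core = set.intersection(*gene_sets)      # present in every genome
    accessory = pan - core                   # absent in at least one genome
    return pan, core, accessory


if __name__ == "__main__":
    # Toy data, not the Fusarium gene lists.
    toy = {
        "isolate_A": {"g1", "g2", "g3"},
        "isolate_B": {"g1", "g2", "g4"},
        "isolate_C": {"g1", "g3", "g4"},
    }
    pan, core, accessory = pan_core_accessory(toy)
    print(len(pan), len(core), len(accessory))
    # The abstract's figures: 1827 of 15,297 genes is about 12%.
    print(round(100 * 1827 / 15297))
```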

  3. Complete Chloroplast Genome Sequences of Mongolia Medicine Artemisia frigida and Phylogenetic Relationships with Other Plants

    PubMed Central

    Liu, Yue; Huo, Naxin; Dong, Lingli; Wang, Yi; Zhang, Shuixian; Young, Hugh A.; Feng, Xiaoxiao; Gu, Yong Qiang

    2013-01-01

    Background Artemisia frigida Willd. is an important Mongolian traditional medicinal plant with pharmacological functions of stanch and detumescence. However, there is little sequence and genomic information available for Artemisia frigida, which makes phylogenetic identification, evolutionary studies, and genetic improvement of its value very difficult. We report the complete chloroplast genome sequence of Artemisia frigida based on 454 pyrosequencing. Methodology/Principal Findings The complete chloroplast genome of Artemisia frigida is 151,076 bp, including a large single copy (LSC) region of 82,740 bp, a small single copy (SSC) region of 18,394 bp and a pair of inverted repeats (IRs) of 24,971 bp each. The genome contains 114 unique genes and 18 duplicated genes. The chloroplast genome of Artemisia frigida contains a small 3.4 kb inversion within a large 23 kb inversion in the LSC region, a unique feature in Asteraceae. The gene order in the SSC region of Artemisia frigida is inverted compared with the other 6 Asteraceae species with sequenced chloroplast genomes. This inversion is likely caused by an intramolecular recombination event that occurred only in Artemisia frigida. The existence of rich SSR loci in the Artemisia frigida chloroplast genome provides a rare opportunity to study population genetics of this Mongolian medicinal plant. Phylogenetic analysis demonstrates a sister relationship between Artemisia frigida and four other species in Asteraceae, including Ageratina adenophora, Helianthus annuus, Guizotia abyssinica and Lactuca sativa, based on 61 protein-coding sequences. Furthermore, Artemisia frigida was placed in the tribe Anthemideae in the subfamily Asteroideae (Asteraceae) based on ndhF and trnL-F sequence comparisons. Conclusion The chloroplast genome sequence of Artemisia frigida was assembled and analyzed in this study, representing the first plastid genome sequenced in the Anthemideae tribe. This complete chloroplast genome sequence will be useful for molecular ecology and molecular phylogeny studies within Artemisia species and also within the Asteraceae family. PMID:23460871
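
    The region sizes reported above can be checked against the total genome length: a chloroplast genome with the usual quadripartite structure consists of one LSC, one SSC and two copies of the inverted repeat. The short sketch below performs that arithmetic; the function name `quadripartite_length` is hypothetical and the figures are those quoted in the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastid genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    # Sizes in bp as reported for Artemisia frigida.
    total = quadripartite_length(lsc=82_740, ssc=18_394, ir=24_971)
    print(total)  # 151076, matching the reported genome size
```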

  4. ISOL@: an Italian SOLAnaceae genomics resource.

    PubMed

    Chiusano, Maria Luisa; D'Agostino, Nunzio; Traini, Alessandra; Licciardello, Concetta; Raimondo, Enrico; Aversano, Mario; Frusciante, Luigi; Monti, Luigi

    2008-03-26

    Present-day '-omics' technologies produce overwhelming amounts of data which include genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms which underpin the system functionality. The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data-sources and the analysis of the information content based on comparative approaches are key methods for meaningful biological interpretations. In the frame of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource. ISOLA (available at http://biosrv.cab.unina.it/isola) represents a trial platform and it is conceived as a multi-level computational environment. ISOLA currently consists of two main levels: the genome and the expression level. The cornerstone of the genome level is represented by the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. Instead, the basic element of the expression level is the transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs). The cross-talk between the genome and the expression levels is based on data source sharing and on tools that enhance data quality, that extract information content from the levels' under parts and produce value-added biological knowledge. ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data based on effective integration to acquire biological knowledge and to approach a systems biology view. Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial computational environment where different aspects and details are maintained as they are relevant for the analysis of the organization, the functionality and the evolution of the Solanaceae family.

  5. Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer

    PubMed Central

    Lam, Lucia L.; Ghadessi, Mercedeh; Erho, Nicholas; Vergara, Ismael A.; Alshalalfa, Mohammed; Buerki, Christine; Haddad, Zaid; Sierocinski, Thomas; Triche, Timothy J.; Skinner, Eila C.; Davicioni, Elai; Daneshmand, Siamak; Black, Peter C.

    2014-01-01

    Background Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone. Methods Transcriptome-wide expression profiles were generated using 1.4 million feature-arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided. Results A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets. 
Conclusions The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management. PMID:25344601
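The classifiers above are compared by AUC. As a minimal sketch of what that statistic measures, the Python function below computes it as the fraction of (recurrence, no-recurrence) patient pairs in which the recurrence case receives the higher score, with ties counted as one half. The function name and the example scores are hypothetical and are not taken from the study.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability that a positive case outscores a negative one."""
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores for patients with and without recurrence.
recurred = [0.9, 0.8, 0.4]
not_recurred = [0.7, 0.3, 0.2]
example_auc = auc(recurred, not_recurred)  # 8 of 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which is the scale on which the reported values of 0.73 to 0.86 should be read.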

  6. Analysis of infant isolates of Bifidobacterium breve by comparative genome hybridization indicates the existence of new subspecies with marked infant specificity.

    PubMed

    Boesten, Rolf; Schuren, Frank; Wind, Richèle D; Knol, Jan; de Vos, Willem M

    2011-09-01

    A total of 20 Bifidobacterium strains were isolated from fecal samples of 4 breast- and bottle-fed infants and all were characterized as Bifidobacterium breve based on 16S rRNA gene sequence and metabolic analysis. These isolates were further characterized and compared to the type strains of B. breve and 7 other Bifidobacterium spp. by comparative genome hybridization. For this purpose, we constructed and used a DNA-based microarray containing over 2000 randomly cloned DNA fragments from B. breve type strain LMG13208. This molecular analysis revealed a high degree of genomic variation between the isolated strains and allowed the vast majority to be grouped into 4 clusters. One cluster contained a single isolate that was virtually indistinguishable from the B. breve type strain. The 3 other clusters included 19 B. breve strains that differed considerably from all type strains. Remarkably, each of the 4 clusters included strains that were isolated from a single infant, indicating that a niche adaptation may contribute to variation within the B. breve species. Based on genomic hybridization data, the new B. breve isolates were estimated to contain approximately 60-90% of the genes of the B. breve type strain, attesting to the existence of various subspecies within the species B. breve. Further bioinformatic analysis identified several hundred diagnostic clones specific to the genomic clustering of the B. breve isolates. Molecular analysis of representatives of these revealed that annotated genes from the conserved B. breve core encoded mainly housekeeping functions, while the strain-specific genes were predicted to code for functions related to life style, such as carbohydrate metabolism and transport. This is compatible with genetic adaptation of the strains to their niche, a combination of infants and diet. Copyright © 2011 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  7. CRISPR-STOP: gene silencing through base-editing-induced nonsense mutations.

    PubMed

    Kuscu, Cem; Parlak, Mahmut; Tufan, Turan; Yang, Jiekun; Szlachta, Karol; Wei, Xiaolong; Mammadov, Rashad; Adli, Mazhar

    2017-07-01

    CRISPR-Cas9-induced DNA damage may have deleterious effects at high-copy-number genomic regions. Here, we use CRISPR base editors to knock out genes by changing single nucleotides to create stop codons. We show that the CRISPR-STOP method is an efficient and less deleterious alternative to wild-type Cas9 for gene-knockout studies. Early stop codons can be introduced in ∼17,000 human genes. CRISPR-STOP-mediated targeted screening demonstrates comparable efficiency to WT Cas9, which indicates the suitability of our approach for genome-wide functional screenings.
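The codon logic behind this approach can be sketched briefly. A cytosine base editor converts C to T, so the sense-strand codons CAA, CAG and CGA can become TAA, TAG and TGA; editing the opposite strand appears as G-to-A on the sense strand, which can turn TGG into TAG, TGA or TAA. The Python sketch below scans an in-frame coding sequence for these four codons. The function name and the example sequence are hypothetical, and the sketch ignores PAM placement and the editing window, both of which a real guide design would have to check.

```python
# Codons that a C->T edit (sense strand) or a G->A change (sense-strand view
# of a C->T edit on the antisense strand) can convert into a stop codon.
STOP_CONVERTIBLE = {
    "CAA": "TAA",          # Gln -> stop
    "CAG": "TAG",          # Gln -> stop
    "CGA": "TGA",          # Arg -> stop
    "TGG": "TAG/TGA/TAA",  # Trp -> stop, via the antisense strand
}


def find_stop_convertible_codons(cds):
    """Return (codon_index, codon, resulting_stop) for each candidate codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    hits = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in STOP_CONVERTIBLE:
            hits.append((i // 3, codon, STOP_CONVERTIBLE[codon]))
    return hits


# Hypothetical coding sequence: ATG CAG TGG CGA TAA
candidates = find_stop_convertible_codons("ATGCAGTGGCGATAA")
```

Earlier hits in the list are usually the more interesting ones for a knockout, since a stop codon near the start of the coding sequence truncates more of the protein.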

  8. Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans.

    PubMed

    Haraksingh, Rajini R; Abyzov, Alexej; Urban, Alexander Eckehart

    2017-04-24

    High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 and ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. 
This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.

  9. Is quantitative PCR for the pneumolysin (ply) gene useful for detection of pneumococcal lower respiratory tract infection?

    PubMed

    Abdeldaim, G; Herrmann, B; Korsgaard, J; Olcén, P; Blomberg, J; Strålin, K

    2009-06-01

    The pneumolysin (ply) gene is widely used as a target in PCR assays for Streptococcus pneumoniae in respiratory secretions. However, false-positive results with conventional ply-based PCR have been reported. The aim here was to study the performance of a quantitative ply-based PCR for the identification of pneumococcal lower respiratory tract infection (LRTI). In a prospective study, fibreoptic bronchoscopy was performed in 156 hospitalized adult patients with LRTI and 31 controls who underwent bronchoscopy because of suspicion of malignancy. Among the LRTI patients and controls, the quantitative ply-based PCR applied to bronchoalveolar lavage (BAL) fluid was positive at ≥10^3 genome copies/mL in 61% and 71% of the subjects, at ≥10^5 genome copies/mL in 40% and 58% of the subjects, and at ≥10^7 genome copies/mL in 15% and 3.2% of the subjects, respectively. Using BAL fluid culture, blood culture, and/or a urinary antigen test, S. pneumoniae was identified in 19 LRTI patients. As compared with these diagnostic methods used in combination, quantitative ply-based PCR showed sensitivities and specificities of 89% and 43% at a cut-off of 10^3 genome copies/mL, of 84% and 66% at a cut-off of 10^5 genome copies/mL, and of 53% and 90% at a cut-off of 10^7 genome copies/mL, respectively. In conclusion, a high cut-off with the quantitative ply-based PCR was required to reach acceptable specificity. However, as a high cut-off resulted in low sensitivity, quantitative ply-based PCR does not appear to be clinically useful. Quantitative PCR methods for S. pneumoniae using alternative gene targets should be evaluated.
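The trade-off described above follows directly from how sensitivity and specificity are defined on a 2x2 table. The Python sketch below is a minimal illustration with a hypothetical function name. The abstract reports only percentages, so the counts are assumptions: 17 true positives out of the 19 reference-positive patients would be consistent with the reported 89% sensitivity, while the negative-side counts (59 and 78) are purely illustrative values chosen to give a specificity near 43%.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    sensitivity = tp / (tp + fn)  # positives detected among truly infected
    specificity = tn / (tn + fp)  # negatives confirmed among truly uninfected
    return sensitivity, specificity


# Assumed counts (not reported in the abstract), low cut-off scenario:
# 17 of 19 reference-positive patients test positive,
# 59 of 137 reference-negative patients test negative.
sens, spec = sensitivity_specificity(tp=17, fn=2, tn=59, fp=78)
```

Raising the cut-off moves patients from the test-positive to the test-negative column, which can only lower `tp` and `fp`; sensitivity therefore falls as specificity rises, as the reported figures show.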

  10. Comparative functional pan-genome analyses to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon metabolism in the genus Mycobacterium.

    PubMed

    Kweon, Ohgew; Kim, Seong-Jae; Blom, Jochen; Kim, Sung-Kwan; Kim, Bong-Soo; Baek, Dong-Heon; Park, Su Inn; Sutherland, John B; Cerniglia, Carl E

    2015-02-14

    The bacterial genus Mycobacterium is of great interest in the medical and biotechnological fields. Despite a flood of genome sequencing and functional genomics data, significant gaps in knowledge between genome and phenome seriously hinder efforts toward the treatment of mycobacterial diseases and practical biotechnological applications. In this study, we propose the use of systematic, comparative functional pan-genomic analysis to build connections between genomic dynamics and phenotypic evolution in polycyclic aromatic hydrocarbon (PAH) metabolism in the genus Mycobacterium. Phylogenetic, phenotypic, and genomic information for 27 completely genome-sequenced mycobacteria was systematically integrated to reconstruct a mycobacterial phenotype network (MPN) with a pan-genomic concept at a network level. In the MPN, mycobacterial phenotypes show typical scale-free relationships. PAH degradation is an isolated phenotype with the lowest connection degree, consistent with phylogenetic and environmental isolation of PAH degraders. A series of functional pan-genomic analyses provide conserved and unique types of genomic evidence for strong epistatic and pleiotropic impacts on evolutionary trajectories of the PAH-degrading phenotype. Under strong natural selection, the detailed gene gain/loss patterns from horizontal gene transfer (HGT)/deletion events hypothesize a plausible evolutionary path, an epistasis-based birth and pleiotropy-dependent death, for PAH metabolism in the genus Mycobacterium. This study generated a practical mycobacterial compendium of phenotypic and genomic changes, focusing on the PAH-degrading phenotype, with a pan-genomic perspective of the evolutionary events and the environmental challenges. 
Our findings suggest that when selection acts on PAH metabolism, only a small fraction of possible trajectories is likely to be observed, owing mainly to a combination of the ambiguous phenotypic effects of PAHs and the corresponding pleiotropy- and epistasis-dependent evolutionary adaptation. Evolutionary constraints on the selection of trajectories, like those seen in PAH-degrading phenotypes, are likely to apply to the evolution of other phenotypes in the genus Mycobacterium.

  11. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  12. Comparative Genomic and Morphological Analyses of Listeria Phages Isolated from Farm Environments

    PubMed Central

    Denes, Thomas; Ackermann, Hans-Wolfgang; Moreno Switt, Andrea I.; Wiedmann, Martin; den Bakker, Henk C.

    2014-01-01

    The genus Listeria is ubiquitous in the environment and includes the globally important food-borne pathogen Listeria monocytogenes. While the genomic diversity of Listeria has been well studied, considerably less is known about the genomic and morphological diversity of Listeria bacteriophages. In this study, we sequenced and analyzed the genomes of 14 Listeria phages isolated mostly from New York dairy farm environments as well as one related Enterococcus faecalis phage to obtain information on genome characteristics and diversity. We also examined 12 of the phages by electron microscopy to characterize their morphology. These Listeria phages, based on gene orthology and morphology, together with previously sequenced Listeria phages could be classified into five orthoclusters, including one novel orthocluster. One orthocluster (orthocluster I) consists of large-genome (∼135-kb) myoviruses belonging to the genus “Twort-like viruses,” three orthoclusters (orthoclusters II to IV) contain small-genome (36- to 43-kb) siphoviruses with icosahedral heads, and the novel orthocluster V contains medium-sized-genome (∼66-kb) siphoviruses with elongated heads. A novel orthocluster (orthocluster VI) of E. faecalis phages, with medium-sized genomes (∼56 kb), was identified, which grouped together and shares morphological features with the novel Listeria phage orthocluster V. This new group of phages (i.e., orthoclusters V and VI) is composed of putative lytic phages that may prove to be useful in phage-based applications for biocontrol, detection, and therapeutic purposes. PMID:24837381

  13. Genome size of termites (Insecta, Dictyoptera, Isoptera) and wood roaches (Insecta, Dictyoptera, Cryptocercidae)

    NASA Astrophysics Data System (ADS)

    Koshikawa, Shigeyuki; Miyazaki, Satoshi; Cornette, Richard; Matsumoto, Tadao; Miura, Toru

    2008-09-01

    The evolution of genome size has been discussed in relation to the evolution of various biological traits. In the present study, the genome sizes of 22 dictyopteran species were estimated by Feulgen image analysis densitometry and 4',6-diamidino-2-phenylindole (DAPI)-based flow cytometry. The haploid genome sizes (C-values) of termites (Isoptera) ranged from 0.58 to 1.90 pg, and those of Cryptocercus wood roaches (Cryptocercidae) were 1.16 to 1.32 pg. Compared to known values of other cockroaches (Blattaria) and mantids (Mantodea), these values are low. A relatively small genome size appears to be a (syn)apomorphy of Isoptera + Cryptocercus, together with their sociality. In some phylogenetic groups, genome size evolution is thought to be influenced by selective pressure on a particular trait, such as cell size or rate of development. The present results raise the possibility that genome size is influenced by selective pressures on traits associated with the evolution of sociality.

  14. Genome size of termites (Insecta, Dictyoptera, Isoptera) and wood roaches (Insecta, Dictyoptera, Cryptocercidae).

    PubMed

    Koshikawa, Shigeyuki; Miyazaki, Satoshi; Cornette, Richard; Matsumoto, Tadao; Miura, Toru

    2008-09-01

    The evolution of genome size has been discussed in relation to the evolution of various biological traits. In the present study, the genome sizes of 22 dictyopteran species were estimated by Feulgen image analysis densitometry and 4',6-diamidino-2-phenylindole (DAPI)-based flow cytometry. The haploid genome sizes (C-values) of termites (Isoptera) ranged from 0.58 to 1.90 pg, and those of Cryptocercus wood roaches (Cryptocercidae) were 1.16 to 1.32 pg. Compared to known values of other cockroaches (Blattaria) and mantids (Mantodea), these values are low. A relatively small genome size appears to be a (syn)apomorphy of Isoptera + Cryptocercus, together with their sociality. In some phylogenetic groups, genome size evolution is thought to be influenced by selective pressure on a particular trait, such as cell size or rate of development. The present results raise the possibility that genome size is influenced by selective pressures on traits associated with the evolution of sociality.

  15. A draft genome assembly of the army worm, Spodoptera frugiperda.

    PubMed

    Kakumani, Pavan Kumar; Malhotra, Pawan; Mukherjee, Sunil K; Bhatnagar, Raj K

    2014-08-01

    Spodoptera is an agriculturally important pest insect, and studies of its biology have been limited by the unavailability of its genome. In the present study, the genomic DNA was sequenced and assembled into 37,243 scaffolds with a total size of 358 Mb and an N50 of 53.7 kb. Based on degree of identity, we could anchor 305 Mb of the genome onto all 28 chromosomes of Bombyx mori. Repeat elements were identified, which account for 20.28% of the total genome. Further, we predicted 11,595 genes, with an average intron length of 726 bp. The genes were annotated and domain analysis revealed that Sf genes share a significant homology and expression pattern with B. mori, despite differences in KOG gene categories and representation of certain protein families. The present study of the Sf genome should help in the characterization of cellular pathways to understand its biology, and in comparative evolutionary studies among lepidopteran family members to help annotate their genomes. Copyright © 2014 Elsevier Inc. All rights reserved.
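The N50 quoted for this assembly is a standard contiguity statistic: the length of the scaffold at which, walking from longest to shortest, the running total first reaches half of the assembly size. A minimal Python sketch follows; the function name and the toy scaffold lengths are hypothetical and unrelated to the Spodoptera data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly
            return length


# Toy assembly: total length 32, half is 16, reached at the second 8.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Because the statistic depends only on the length distribution, a high N50 indicates contiguity but says nothing about whether the scaffolds are correctly assembled.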

  16. PARRoT- a homology-based strategy to quantify and compare RNA-sequencing from non-model organisms.

    PubMed

    Gan, Ruei-Chi; Chen, Ting-Wen; Wu, Timothy H; Huang, Po-Jung; Lee, Chi-Ching; Yeh, Yuan-Ming; Chiu, Cheng-Hsun; Huang, Hsien-Da; Tang, Petrus

    2016-12-22

    Next-generation sequencing promises the de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genomic sequences, and even fewer have well-defined or curated annotations. For transcriptome studies focusing on organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. However, things become even more complicated when multiple transcriptomes are compared. Here, we propose a new analysis strategy and methods for quantifying expression levels which not only generate a virtual reference from sequencing data, but also provide comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled together for de novo assembly. The assembled contigs are searched against NCBI NR databases to find potential homolog sequences. Based on the search results, a set of virtual transcripts is generated and serves as a reference transcriptome. By using the same reference, normalized quantification values including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM) can be obtained that are comparable across transcriptome datasets. In order to demonstrate the feasibility of our strategy, we implement it in the web service PARRoT. PARRoT stands for Pipeline for Analyzing RNA Reads of Transcriptomes. It analyzes gene expression profiles for two transcriptome sequencing datasets. For better understanding of the biological meaning of the comparison among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within just three hours. 
In this study, we proposed and implemented a strategy to analyze transcriptomes from non-reference organisms which offers the opportunity to quantify and compare transcriptome profiles through a homolog-based virtual transcriptome reference. By using the homolog-based reference, our strategy effectively avoids the problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on the field of comparative genomics for non-model organisms. We have implemented PARRoT as a web service which is freely available at http://parrot.cgu.edu.tw.
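For orientation, the two normalizations that eRPKM and eTPM are named after can be sketched as follows. This Python example uses the textbook RPKM and TPM definitions applied to read counts on a shared reference; it is not PARRoT's own estimator, whose details the abstract does not give. The function names, transcript identifiers and counts are hypothetical.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    return {t: c * 1e9 / (lengths[t] * total_reads) for t, c in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {t: c / lengths[t] for t, c in counts.items()}
    scale = sum(rates.values())
    if scale == 0:
        raise ValueError("no mapped reads")
    return {t: r / scale * 1e6 for t, r in rates.items()}


# Hypothetical virtual transcripts: read counts and lengths in bp.
counts = {"vt1": 100, "vt2": 300}
lengths = {"vt1": 1000, "vt2": 3000}
rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
```

In this toy case both transcripts have the same read density, so they receive equal values under either measure even though their raw counts differ threefold. TPM values always sum to one million within a sample, which is one reason they are often preferred for cross-sample comparison.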

  17. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477

  18. Comprehensive genomic analysis of a plant growth-promoting rhizobacterium Pantoea agglomerans strain P5.

    PubMed

    Shariati J, Vahid; Malboobi, Mohammad Ali; Tabrizi, Zeinab; Tavakol, Elahe; Owilia, Parviz; Safari, Maryam

    2017-11-15

    In this study, we provide a comparative genomic analysis of Pantoea agglomerans strain P5 and 10 closely related strains based on phylogenetic analyses. A next-generation shotgun strategy was implemented using the Illumina HiSeq 2500 technology followed by core- and pan-genome analysis. The genome of P. agglomerans strain P5 has an assembly size of 5,082,485 bp with 55.4% G + C content. P. agglomerans consists of 2981 core and 3159 accessory genes for Coding DNA Sequences (CDSs) based on the pan-genome analysis. Strain P5 can be grouped closely with strains PG734 and 299 R using pan and core genes, respectively. All the predicted and annotated gene sequences were allocated to KEGG pathways. Accordingly, genes involved in plant growth-promoting (PGP) ability, including phosphate solubilization, IAA and siderophore production, acetoin and 2,3-butanediol synthesis and bacterial secretion, were assigned. This study provides an in-depth view of the PGP characteristics of strain P5, highlighting its potential use in agriculture as a biofertilizer.
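The core and accessory gene counts reported above come from partitioning the pan-genome by presence across strains. A minimal Python sketch of that partition is shown below, assuming genes have already been clustered into families so that each strain can be represented as a set of family identifiers. The function name, the second and third strain labels, and the family names are hypothetical.

```python
def core_and_accessory(gene_sets):
    """Split the pan-genome into core (in every strain) and accessory families."""
    sets = list(gene_sets.values())
    if not sets:
        raise ValueError("at least one strain is required")
    pan = set().union(*sets)         # every family seen in any strain
    core = set.intersection(*sets)   # families shared by all strains
    return core, pan - core


# Hypothetical gene-family content of three strains.
strains = {
    "P5": {"famA", "famB", "famC"},
    "strain_x": {"famA", "famB", "famD"},
    "strain_y": {"famA", "famC", "famD"},
}
core, accessory = core_and_accessory(strains)
```

Real pan-genome pipelines spend most of their effort on the ortholog clustering step that this sketch takes as given; the set arithmetic itself is the simple part.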

  19. DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique

    PubMed Central

    Li, Pinghao; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila; Jiang, Xiaoqian

    2013-01-01

    Genome data are becoming increasingly important for modern medicine. As the rate of increase in DNA sequencing outstrips the rate of increase in disk storage capacity, the storage and data transferring of large genome data are becoming important concerns for biomedical researchers. We propose a two-pass lossless genome compression algorithm, which highlights the synthesis of complementary contextual models, to improve the compression performance. The proposed framework could handle genome compression with and without reference sequences, and demonstrated performance advantages over best existing algorithms. The method for reference-free compression led to bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, which were approximately 3.7% and 2.6% better than the state-of-the-art algorithms. Regarding performance with reference, we tested on the first Korean personal genome sequence data set, and our proposed method demonstrated a 189-fold compression rate, reducing the raw file size from 2986.8 MB to 15.8 MB at a comparable decompression cost with existing algorithms. DNAcompact is freely available at https://sourceforge.net/projects/dnacompact/ for research purposes. PMID:24282536
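The figures quoted above can be checked with simple arithmetic. The Python sketch below recomputes the fold compression from the two reported file sizes and expresses a bits-per-base rate as a saving relative to the naive 2 bits per base needed for a four-letter alphabet. The 2-bit baseline is an assumption introduced here for illustration, not a comparison made in the abstract, and the function names are hypothetical.

```python
def compression_ratio(raw_size, compressed_size):
    """Fold compression: raw size divided by compressed size (same units)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return raw_size / compressed_size


def saving_vs_two_bit(bits_per_base):
    """Fractional saving relative to a plain 2-bit encoding of A, C, G, T."""
    return 1.0 - bits_per_base / 2.0


# Reported sizes for the Korean personal genome, in MB.
fold = compression_ratio(2986.8, 15.8)
# Reported reference-free bit rate for bacteria.
bacteria_saving = saving_vs_two_bit(1.720)
```

The ratio works out to about 189, matching the reported 189-fold figure, and 1.720 bits per base is roughly a 14% saving over the 2-bit baseline, which illustrates how much harder reference-free compression is than reference-based compression.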

  20. CRISPR/Cas9-Based Multiplex Genome Editing in Monocot and Dicot Plants.

    PubMed

    Ma, Xingliang; Liu, Yao-Guang

    2016-07-01

    The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome targeting system has been applied to a variety of organisms, including plants. Compared to other genome-targeting technologies such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), the CRISPR/Cas9 system is easier to use and has much higher editing efficiency. In addition, multiple "single guide RNAs" (sgRNAs) with different target sequences can be designed to direct the Cas9 protein to multiple genomic sites for simultaneous multiplex editing. Here, we present a procedure for highly efficient multiplex genome targeting in monocot and dicot plants using a versatile and robust CRISPR/Cas9 vector system, emphasizing the construction of binary constructs with multiple sgRNA expression cassettes in one round of cloning using Golden Gate ligation. We also describe the genotyping of targeted mutations in transgenic plants by direct Sanger sequencing followed by decoding of superimposed sequencing chromatograms containing biallelic or heterozygous mutations using the Web-based tool DSDecode. © 2016 by John Wiley & Sons, Inc.

  1. PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data

    NASA Astrophysics Data System (ADS)

    Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y.

    2017-01-01

    The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. Combining PaPrBaG with existing approaches further improves prediction results.

  2. Complete genome sequence of Lactobacillus plantarum LZ206, a potential probiotic strain with antimicrobial activity against food-borne pathogenic microorganisms.

    PubMed

    Li, Ping; Gu, Qing; Zhou, Qingqing

    2016-11-20

    Lactobacilli strains have been considered as important candidates for manufacturing "natural food", due to their antimicrobial properties and generally regarded as safe (GRAS) status. Lactobacillus plantarum LZ206 is a potential probiotic strain isolated from raw cow milk, with antimicrobial activity against various pathogens, including Gram-positive bacteria (Staphylococcus aureus and Listeria monocytogenes), Gram-negative bacteria (Escherichia coli and Salmonella enterica), and the fungus Candida albicans. To better understand the molecular basis of its antimicrobial activity, the entire genome of LZ206 was sequenced. The genome of LZ206 was found to contain a circular 3,212,951-bp chromosome, two circular plasmids and one predicted linear plasmid. A plantaricin gene cluster, which is responsible for bacteriocin biosynthesis and could be associated with its broad-spectrum antimicrobial activity, was identified based on comparative genomic analysis. Whole genome sequencing of L. plantarum LZ206 might facilitate its application in protecting food products from pathogen contamination in the dairy industry. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut

    PubMed Central

    2012-01-01

    Background Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut. Results A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. 
A high degree of macrosynteny was observed when comparing the homoeologous linkage groups between A (A. duranensis) and B (A. batizocoi) genomes. Comparison of the A- and B-genome genetic linkage maps also showed a total of five inversions and one major reciprocal translocation between two pairs of chromosomes under our current mapping resolution. Conclusions Our findings will contribute to understanding tetraploid peanut genome origin and evolution and eventually promote its genetic improvement. The newly developed EST-SSR markers will enrich current molecular marker resources in peanut. PMID:23140574

  4. Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3), and comparison of the closely related E. coli B and K-12 genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Studier, F.W.; Daegelen, P.; Lenski, R. E.

    2009-12-01

    Each difference between the genome sequences of Escherichia coli B strains REL606 and BL21(DE3) can be interpreted in light of known laboratory manipulations plus a gene conversion between ribosomal RNA operons. Two treatments with 1-methyl-3-nitro-1-nitrosoguanidine in the REL606 lineage produced at least 93 single-base-pair mutations (~90% GC-to-AT transitions) and 3 single-base-pair GC deletions. Two UV treatments in the BL21(DE3) lineage produced only 4 single-base-pair mutations but 16 large deletions. P1 transductions from K-12 into the two B lineages produced 317 single-base-pair differences and 9 insertions or deletions, reflecting differences between B DNA in BL21(DE3) and integrated restriction fragments of K-12 DNA inherited by REL606. Two sites showed selective enrichment of spontaneous mutations. No unselected spontaneous single-base-pair mutations were evident. The genome sequences revealed that a progenitor of REL606 had been misidentified, explaining initially perplexing differences. Limited sequencing of other B strains defined characteristic properties of B and allowed assembly of the inferred genome of the ancestral B of Delbrueck and Luria. Comparison of the B and K-12 genomes shows that more than half of the 3793 proteins of their basic genomes are predicted to be identical, although ~310 appear to be functional in either B or K-12 but not in both. The ancestral basic genome appears to have had ~4039 coding sequences occupying ~4.0 Mbp. Repeated horizontal transfer from diverged Escherichia coli genomes and homologous recombination may explain the observed variable distribution of single-base-pair differences. Fifteen sites are occupied by phage-related elements, but only six by comparable elements at the same site. More than 50 sites are occupied by IS elements in both B and K, 16 in common, and likely founding IS elements are identified. 
A signature of widespread cryptic phage P4-type mobile elements was identified. Complex deletions (dense clusters of small deletions and substitutions) apparently removed nonessential genes from ~30 sites in the basic genomes.

  5. Accuracy of Genomic Prediction for Foliar Terpene Traits in Eucalyptus polybractea.

    PubMed

    Kainer, David; Stone, Eric A; Padovan, Amanda; Foley, William J; Külheim, Carsten

    2018-06-11

    Unlike agricultural crops, most forest species have not had millennia of improvement through phenotypic selection, but can contribute energy and material resources and possibly help alleviate climate change. Yield gains similar to those achieved in agricultural crops over millennia could be made in forestry species with the use of genomic methods in a much shorter time frame. Here we compare various methods of genomic prediction for eight traits related to foliar terpene yield in Eucalyptus polybractea, a tree grown predominantly for the production of Eucalyptus oil. The genomic markers used in this study are derived from shallow whole genome sequencing of a population of 480 trees. We compare the traditional pedigree-based additive best linear unbiased predictors (ABLUP), genomic BLUP (GBLUP), BayesB genomic prediction model, and a form of GBLUP based on weighting markers according to their influence on traits (BLUP|GA). Predictive ability is assessed under varying marker densities of 10,000, 100,000 and 500,000 SNPs. Our results show that BayesB and BLUP|GA perform best across the eight traits. Predictive ability was higher for individual terpene traits, such as foliar α-pinene and 1,8-cineole concentration (0.59 and 0.73, respectively), than aggregate traits such as total foliar oil concentration (0.38). This is likely a function of the trait architecture and markers used. BLUP|GA was the best model for the two biomass related traits, height and 1 year change in height (0.25 and 0.19, respectively). Predictive ability increased with marker density for most traits, but with diminishing returns. The results of this study are a solid foundation for yield improvement of essential oil producing eucalypts. New markets such as biopolymers and terpene-derived biofuels could benefit from rapid yield increases in undomesticated oil-producing species. Copyright © 2018, G3: Genes, Genomes, Genetics.

  6. Genome assortment, not serogroup, defines Vibrio cholerae pandemic strains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brettin, Thomas S; Bruce, David C; Challacombe, Jean F

    2009-01-01

    Vibrio cholerae, the causative agent of cholera, is a bacterium autochthonous to the aquatic environment, and a serious public health threat. V. cholerae serogroup O1 is responsible for the previous two cholera pandemics, in which classical and El Tor biotypes were dominant in the 6th and the current 7th pandemics, respectively. Cholera researchers continually face newly emerging and re-emerging pathogenic clones carrying combinations of new serogroups as well as of phenotypic and genotypic properties. These genotype and phenotype changes have hampered control of the disease. Here we compare the complete genome sequences of 23 strains of V. cholerae isolated from a variety of sources and geographical locations over the past 98 years in an effort to elucidate the evolutionary mechanisms governing genetic diversity and genesis of new pathogenic clones. The genome-based phylogeny revealed 12 distinct V. cholerae phyletic lineages, of which one, designated the V. cholerae core genome (CG), comprises both O1 classical and El Tor biotypes. All 7th pandemic clones share nearly identical gene content, i.e., the same genome backbone. The transition from 6th to 7th pandemic strains is defined here as a 'shift' between pathogenic clones belonging to the same O1 serogroup, but from significantly different phyletic lineages within the CG clade. In contrast, transition among clones during the present 7th pandemic period can be characterized as a 'drift' between clones, differentiated mainly by varying composition of laterally transferred genomic islands, resulting in emergence of variants, exemplified by V. cholerae serogroup O139 and V. cholerae O1 El Tor hybrid clones that produce cholera toxin of classical biotype. Based on the comprehensive comparative genomics presented in this study it is concluded that V. 
cholerae undergoes extensive genetic recombination via lateral gene transfer, and, therefore, genome assortment, not serogroup, should be used to define pathogenic V. cholerae clones.

  7. A web server for mining Comparative Genomic Hybridization (CGH) data

    NASA Astrophysics Data System (ADS)

    Liu, Jun; Ranka, Sanjay; Kahveci, Tamer

    2007-11-01

    Advances in cytogenetics and molecular biology have established that chromosomal alterations are critical in the pathogenesis of human cancer. Recurrent chromosomal alterations provide cytological and molecular markers for the diagnosis and prognosis of disease. They also facilitate the identification of genes that are important in carcinogenesis, which in the future may help in the development of targeted therapy. A large and growing amount of cancer genetic data is now publicly available. There is a need for public domain tools that allow users to analyze their data and visualize the results. This chapter describes a web-based software tool that will allow researchers to analyze and visualize Comparative Genomic Hybridization (CGH) datasets. It employs novel data mining methodologies for clustering and classification of CGH datasets as well as algorithms for identifying important markers (small sets of genomic intervals with aberrations) that are potentially cancer signatures. The developed software will help in understanding the relationships between genomic aberrations and cancer types.

  8. Application of Nexus copy number software for CNV detection and analysis.

    PubMed

    Darvishi, Katayoon

    2010-04-01

    Among human structural genomic variation, copy number variants (CNVs) are the most frequently known component, comprised of gains/losses of DNA segments that are generally 1 kb in length or longer. Array-based comparative genomic hybridization (aCGH) has emerged as a powerful tool for detecting genomic copy number variants (CNVs). With the rapid increase in the density of array technology and with the adaptation of new high-throughput technology, a reliable and computationally scalable method for accurate mapping of recurring DNA copy number aberrations has become a main focus in research. Here we introduce Nexus Copy Number software, a platform-independent tool, to analyze the output files of all types of commercial and custom-made comparative genomic hybridization (CGH) and single-nucleotide polymorphism (SNP) arrays, such as those manufactured by Affymetrix, Agilent Technologies, Illumina, and Roche NimbleGen. It also supports data generated by various array image-analysis software tools such as GenePix, ImaGene, and BlueFuse. (c) 2010 by John Wiley & Sons, Inc.

  9. Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.

    PubMed

    Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping

    2016-01-01

    The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium.

  10. Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species

    PubMed Central

    Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F.; Wang, Kunbo

    2016-01-01

    The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527

  11. Estimation and Partitioning of Heritability in Human Populations using Whole Genome Analysis Methods

    PubMed Central

    Vinkhuyzen, Anna AE; Wray, Naomi R; Yang, Jian; Goddard, Michael E; Visscher, Peter M

    2014-01-01

    Understanding genetic variation of complex traits in human populations has moved from the quantification of the resemblance between close relatives to the dissection of genetic variation into the contributions of individual genomic loci. But major questions remain unanswered: how much phenotypic variation is genetic, how much of the genetic variation is additive and what is the joint distribution of effect size and allele frequency at causal variants? We review and compare three whole-genome analysis methods that use mixed linear models (MLM) to estimate genetic variation, using the relationship between close or distant relatives based on pedigree or SNPs. We discuss theory, estimation procedures, bias and precision of each method and review recent advances in the dissection of additive genetic variation of complex traits in human populations that are based upon the application of MLM. Using genome wide data, SNPs account for far more of the genetic variation than the highly significant SNPs associated with a trait, but they do not account for all of the genetic variance estimated by pedigree based methods. We explain possible reasons for this ‘missing’ heritability. PMID:23988118

  12. Genome and transcriptome adaptation accompanying emergence of the definitive type 2 host-restricted Salmonella enterica serovar Typhimurium pathovar.

    PubMed

    Kingsley, Robert A; Kay, Sally; Connor, Thomas; Barquist, Lars; Sait, Leanne; Holt, Kathryn E; Sivaraman, Karthi; Wileman, Thomas; Goulding, David; Clare, Simon; Hale, Christine; Seshasayee, Aswin; Harris, Simon; Thomson, Nicholas R; Gardner, Paul; Rabsch, Wolfgang; Wigley, Paul; Humphrey, Tom; Parkhill, Julian; Dougan, Gordon

    2013-08-27

    Salmonella enterica serovar Typhimurium definitive type 2 (DT2) is host restricted to Columba livia (rock or feral pigeon) but is also closely related to S. Typhimurium isolates that circulate in livestock and cause a zoonosis characterized by gastroenteritis in humans. DT2 isolates formed a distinct phylogenetic cluster within S. Typhimurium based on whole-genome-sequence polymorphisms. Comparative genome analysis of DT2 94-213 and S. Typhimurium SL1344, DT104, and D23580 identified few differences in gene content with the exception of variations within prophages. However, DT2 94-213 harbored 22 pseudogenes that were intact in other closely related S. Typhimurium strains. We report a novel in silico approach to identify single amino acid substitutions in proteins that have a high probability of a functional impact. One polymorphism identified using this method, a single-residue deletion in the Tar protein, abrogated chemotaxis to aspartate in vitro. DT2 94-213 also exhibited an altered transcriptional profile in response to culture at 42°C compared to that of SL1344. Such differentially regulated genes included a number involved in flagellum biosynthesis and motility. IMPORTANCE Whereas Salmonella enterica serovar Typhimurium can infect a wide range of animal species, some variants within this serovar exhibit a more limited host range and altered disease potential. Phylogenetic analysis based on whole-genome sequences can identify lineages associated with specific virulence traits, including host adaptation. This study represents one of the first to link pathogen-specific genetic signatures, including coding capacity, genome degradation, and transcriptional responses to host adaptation within a Salmonella serovar. We performed comparative genome analysis of reference and pigeon-adapted definitive type 2 (DT2) S. Typhimurium isolates alongside phenotypic and transcriptome analyses, to identify genetic signatures linked to host adaptation within the DT2 lineage.

  13. Effective normalization for copy number variation detection from whole genome sequencing.

    PubMed

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms FREEC and CNV-seq using whole genome sequencing data from 8 individuals spanning four populations. We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 - 0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is substantial reduction of deletion CNV calls. 
The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls. Choice of read-count normalization methodology has a substantial effect on CNV calls and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
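
    The Jaccard index used above to compare CNV call sets can be illustrated with a short sketch. The abstract does not say whether the index was computed over base pairs or over whole regions; the version below assumes base-pair coverage, and the function names and the (chrom, start, end) half-open tuple layout are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def by_chrom(calls):
    """Group (chrom, start, end) calls by chromosome and merge each group."""
    grouped = defaultdict(list)
    for chrom, start, end in calls:
        grouped[chrom].append((start, end))
    return {chrom: merge(ivs) for chrom, ivs in grouped.items()}


def overlap_length(a, b):
    """Total overlap in bp between two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard_bp(calls_a, calls_b):
    """Jaccard index |A and B| / |A or B| measured in covered base pairs."""
    a, b = by_chrom(calls_a), by_chrom(calls_b)
    inter = sum(overlap_length(a[c], b[c]) for c in a.keys() & b.keys())
    len_a = sum(e - s for ivs in a.values() for s, e in ivs)
    len_b = sum(e - s for ivs in b.values() for s, e in ivs)
    union = len_a + len_b - inter
    return inter / union if union else 0.0


# Toy call sets (not data from the study).
calls_a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
calls_b = [("chr1", 250, 400)]
print(round(jaccard_bp(calls_a, calls_b), 4))
```

    A region-level variant (counting calls that overlap at all, rather than shared base pairs) would give different values, so the two definitions should not be compared directly.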

  14. Essie: A Concept-based Search Engine for Structured Biomedical Text

    PubMed Central

    Ide, Nicholas C.; Loane, Russell F.; Demner-Fushman, Dina

    2007-01-01

    This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie’s design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie’s performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain. PMID:17329729

  15. Biodiversity of Vibrios

    PubMed Central

    Thompson, Fabiano L.; Iida, Tetsuya; Swings, Jean

    2004-01-01

    Vibrios are ubiquitous and abundant in the aquatic environment. A high abundance of vibrios is also detected in tissues and/or organs of various marine algae and animals, e.g., abalones, bivalves, corals, fish, shrimp, sponges, squid, and zooplankton. Vibrios harbour a wealth of diverse genomes as revealed by different genomic techniques including amplified fragment length polymorphism, multilocus sequence typing, repetitive extragenic palindrome PCR, ribotyping, and whole-genome sequencing. The 74 species of this group are distributed among four different families, i.e., Enterovibrionaceae, Photobacteriaceae, Salinivibrionaceae, and Vibrionaceae. Two new genera, i.e., Enterovibrio norvegicus and Grimontia hollisae, and 20 novel species, i.e., Enterovibrio coralii, Photobacterium eurosenbergii, V. brasiliensis, V. chagasii, V. coralliillyticus, V. crassostreae, V. fortis, V. gallicus, V. hepatarius, V. hispanicus, V. kanaloaei, V. neonatus, V. neptunius, V. pomeroyi, V. pacinii, V. rotiferianus, V. superstes, V. tasmaniensis, V. ezurae, and V. xuii, have been described in the last few years. Comparative genome analyses have already revealed a variety of genomic events, including mutations, chromosomal rearrangements, loss of genes by decay or deletion, and gene acquisitions through duplication or horizontal transfer (e.g., in the acquisition of bacteriophages, pathogenicity islands, and super-integrons), that are probably important driving forces in the evolution and speciation of vibrios. Whole-genome sequencing and comparative genomics through the application of, e.g., microarrays will facilitate the investigation of the gene repertoire at the species level. Based on such new genomic information, the taxonomy and the species concept for vibrios will be reviewed in the next years. PMID:15353563

  16. Distinct p53 genomic binding patterns in normal and cancer-derived human cells

    PubMed Central

    McCorkle, Sean R; McCombie, WR; Dunn, John J

    2011-01-01

    Here, we report genome-wide analysis of the tumor suppressor p53 binding sites in normal human cells. 743 high-confidence ChIP-seq peaks representing putative genomic binding sites were identified in normal IMR90 fibroblasts using a reference chromatin sample. More than 40% were located within 2 kb of a transcription start site (TSS), a distribution similar to that documented for individually studied, functional p53 binding sites and, to date, not observed by previous p53 genome-wide studies. Nearly half of the high-confidence binding sites in the IMR90 cells reside in CpG islands in marked contrast to sites reported in cancer-derived cells. The distinct genomic features of the IMR90 binding sites do not reflect a distinct preference for specific sequences, since the de novo developed p53 motif based on our study is similar to those reported by genome-wide studies of cancer cells. More likely, the different chromatin landscape in normal, compared with cancer-derived cells, influences p53 binding via modulating availability of the sites. We compared the IMR90 ChIP-seq peaks to the recently published IMR90 methylome and demonstrated that they are enriched at hypomethylated DNA. Our study represents the first genome-wide, de novo mapping of p53 binding sites in normal human cells and reveals that p53 binding sites reside in distinct genomic landscapes in normal and cancer-derived human cells. PMID:22127205

  17. The complete chloroplast genome sequence of Dodonaea viscosa: comparative and phylogenetic analyses.

    PubMed

    Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo

    2018-02-01

    The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolution and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, there is no complete chloroplast genome of genus Dodonaea that has been reported before. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) 87,204 bp, and small single copy (SSC) 17,972 bp. The annotation analysis revealed a total of 115 unique genes of which 81 were protein coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of Sapindaceae family. In addition, SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.

  18. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    PubMed Central

    Sievers, Aaron; Bosiek, Katharina; Bisch, Marc; Dreessen, Chris; Riedel, Jascha; Froß, Patrick; Hausmann, Michael; Hildenbrand, Georg

    2017-01-01

    In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we successfully checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distribution of k-mers) was developed and used. The results were analyzed by including positional information based on visualizations as genomic maps and by applying basic vector correlation methods. This analysis was concentrated on small word lengths (1 ≤ k ≤ 4) on relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of a chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae formerly identified by independent, mostly alignment-based methods, were confirmed. Correlations between the k-mer content and several genes in these genomes have been found, showing similarities between classified and unclassified viruses, which may be potentially useful for further taxonomic research. Furthermore, unknown k-mer correlations in the genomes of Human Herpesviruses (HHVs), which are probably of major biological function, are found and described. Using the chromosomes of a chimpanzee and human that are currently known, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. 
Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the high pace of standard k-mer analysis. PMID:28422050
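
    The idea of local k-mer spectra with positional resolution can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the fixed-size sliding window, the function names, and the use of Pearson correlation as the "basic vector correlation method" are all assumptions made for the example.

```python
from itertools import product
from math import sqrt


def kmer_spectrum(seq, k):
    """Count each k-mer over A, C, G, T in seq, in lexicographic order.

    k-mers containing other symbols (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            counts[pos] += 1
    return counts


def local_spectra(seq, k, window, step):
    """Return (window_start, spectrum) pairs for windows along seq."""
    seq = seq.upper()
    return [
        (start, kmer_spectrum(seq[start:start + window], k))
        for start in range(0, len(seq) - window + 1, step)
    ]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Toy sequence (not data from the study): compare the spectra of
# consecutive windows to see where local composition changes.
toy = "ACGTACGTAAAAAAAATTTTTTTT"
spectra = local_spectra(toy, k=2, window=8, step=8)
for (s1, v1), (s2, v2) in zip(spectra, spectra[1:]):
    print(s1, s2, round(pearson(v1, v2), 3))
```

    For the small word lengths discussed above (1 ≤ k ≤ 4) each spectrum has at most 256 entries, so plain lists are sufficient; larger k would call for a sparse representation.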

  19. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE PAGES

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...

    2016-12-20

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  20. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  1. Genome-Based Taxonomic Classification of Bacteroidetes

    PubMed Central

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339
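
    The abstract above contrasts G+C content values quoted in species descriptions with values calculated directly from genome sequences. A minimal sketch of such a direct calculation follows, assuming the genome is available as a list of contig strings and that ambiguous bases (such as N) are excluded from the denominator. The function name and the handling of ambiguous bases are illustrative assumptions, not the authors' pipeline.

```python
def gc_content(contigs):
    """Return G+C content (percent) pooled over all contigs.

    Assumption: only unambiguous bases (A, C, G, T) are counted;
    anything else, such as N, is ignored.
    """
    gc = 0
    at = 0
    for contig in contigs:
        seq = contig.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / total


# Hypothetical toy input: two short contigs, one with ambiguous bases.
toy_genome = ["ATGC", "GGCCNN"]
toy_gc = gc_content(toy_genome)
```

    Pooling base counts across contigs, rather than averaging per-contig percentages, weights each contig by its length.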

  2. Genome-wide linkage disequilibrium and genetic diversity in five populations of Australian domestic sheep.

    PubMed

    Al-Mamun, Hawlader Abdullah; Clark, Samuel A; Kwan, Paul; Gondro, Cedric

    2015-11-24

    Knowledge of the genetic structure and overall diversity of livestock species is important to maximise the potential of genome-wide association studies and genomic prediction. Measures such as linkage disequilibrium (LD), effective population size (Ne), heterozygosity, fixation index (FST) and runs of homozygosity (ROH) are widely used and help to improve our knowledge about genetic diversity in animal populations. The development of high-density single nucleotide polymorphism (SNP) arrays and the subsequent genotyping of large numbers of animals have greatly increased the accuracy of these population-based estimates. In this study, we used the Illumina OvineSNP50 BeadChip array to estimate and compare LD (measured by r² and D'), Ne, heterozygosity, FST and ROH in five Australian sheep populations: three pure breeds, i.e., Merino (MER), Border Leicester (BL) and Poll Dorset (PD), and two crossbred populations, i.e., F1 crosses of Merino and Border Leicester (MxB) and MxB crossed to Poll Dorset (MxBxP). Compared to other livestock species, the sheep populations that were analysed in this study had low levels of LD and high levels of genetic diversity. The rate of LD decay was greater in Merino than in the other pure breeds. Over short distances (<10 kb), the levels of LD were higher in BL and PD than in MER. Similarly, BL and PD had comparatively smaller Ne than MER. Observed heterozygosity in the pure breeds ranged from 0.3 in BL to 0.38 in MER. Genetic distances between breeds were modest compared to other livestock species (highest FST = 0.063) but the genetic diversity within breeds was high. Based on ROH, two chromosomal regions showed evidence of strong recent selection. This study shows that there is a large range of genome diversity in Australian sheep breeds, especially in Merino sheep. The observed range of diversity will influence the design of genome-wide association studies and the results that can be obtained from them. This knowledge will also be useful to design reference populations for genomic prediction of breeding values in sheep.
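
    The two LD measures named in the abstract above can be illustrated with the standard textbook definitions for a pair of biallelic loci. The sketch below assumes phased haplotype counts are already available. SNP array genotypes are unphased, so a real analysis would first need phasing or a genotype-based estimator, and the abstract does not say which software the authors used. The function and argument names are hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (r_squared, abs_d_prime) from counts of the four haplotypes.

    D       = p_AB - p_A * p_B
    r^2     = D^2 / (p_A (1 - p_A) p_B (1 - p_B))
    |D'|    = |D| / D_max, where D_max depends on the sign of D
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic")
    d = p_AB - p_A * p_B
    r_squared = d * d / denom
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return r_squared, abs(d) / d_max


# Hypothetical counts: two loci in complete LD, and two in equilibrium.
complete = ld_stats(50, 0, 0, 50)
independent = ld_stats(25, 25, 25, 25)
```

    The two measures behave differently: |D'| reaches 1 whenever at least one of the four haplotypes is absent, whereas r² reaches 1 only when the two loci carry identical information.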

  3. Translational bioinformatics in the cloud: an affordable alternative

    PubMed Central

    2010-01-01

    With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073

  4. Contribution of radiation hybrids to genome mapping in domestic animals.

    PubMed

    Faraut, T; de Givry, S; Hitte, C; Lahbib-Mansais, Y; Morisson, M; Milan, D; Schiex, T; Servin, B; Vignal, A; Galibert, F; Yerle, M

    2009-01-01

    Radiation hybrid mapping emerged at the end of the 1990s as a successful and complementary approach to map genomes, essentially because of its ability to bridge the gaps between genetic and clone-based physical maps, but also using comparative mapping approaches, between 'gene-rich' and 'gene-poor' maps. Since its early development in humans, radiation hybrid mapping played a pivotal role in the process of mapping animal genomes, especially mammalian ones. We review here all the different steps involved in radiation hybrid mapping from the constitution of panels to the construction of maps. A description of its contribution to whole genome maps with a special emphasis on domestic animals will also be presented. Finally, current applications of radiation hybrid mapping in the context of whole genome assemblies will be described. (c) 2009 S. Karger AG, Basel.

  5. NGSmethDB 2017: enhanced methylomes and differential methylation

    PubMed Central

    Lebrón, Ricardo; Gómez-Martín, Cristina; Carpena, Pedro; Bernaola-Galván, Pedro; Barturen, Guillermo; Hackenberg, Michael; Oliver, José L.

    2017-01-01

    The 2017 update of NGSmethDB stores whole genome methylomes generated from short-read data sets obtained by bisulfite sequencing (WGBS) technology. To generate high-quality methylomes, stringent quality controls were integrated with third-party software, adding also a two-step mapping process to exploit the advantages of the new genome assembly models. The samples were all profiled under constant parameter settings, thus enabling comparative downstream analyses. Besides a significant increase in the number of samples, NGSmethDB now includes two additional data-types, which are a valuable resource for the discovery of methylation epigenetic biomarkers: (i) differentially methylated single-cytosines; and (ii) methylation segments (i.e. genome regions of homogeneous methylation). The NGSmethDB back-end is now based on MongoDB, a NoSQL hierarchical database using JSON-formatted documents and dynamic schemas, thus accelerating sample comparative analyses. Besides conventional database dumps, track hubs were implemented, which improved database access, visualization in genome browsers and comparative analyses to third-party annotations. In addition, the database can also be accessed through a RESTful API. Lastly, a Python client and a multiplatform virtual machine allow for program-driven access from user desktop. This way, private methylation data can be compared to NGSmethDB without the need to upload them to public servers. Database website: http://bioinfo2.ugr.es/NGSmethDB. PMID:27794041

  6. Genome-wide Target Enrichment-aided Chip Design: a 66 K SNP Chip for Cashmere Goat.

    PubMed

    Qiao, Xian; Su, Rui; Wang, Yang; Wang, Ruijun; Yang, Ting; Li, Xiaokai; Chen, Wei; He, Shiyang; Jiang, Yu; Xu, Qiwu; Wan, Wenting; Zhang, Yaolei; Zhang, Wenguang; Chen, Jiang; Liu, Bin; Liu, Xin; Fan, Yixing; Chen, Duoyuan; Jiang, Huaizhi; Fang, Dongming; Liu, Zhihong; Wang, Xiaowen; Zhang, Yanjun; Mao, Danqing; Wang, Zhiying; Di, Ran; Zhao, Qianjun; Zhong, Tao; Yang, Huanming; Wang, Jian; Wang, Wen; Dong, Yang; Chen, Xiaoli; Xu, Xun; Li, Jinquan

    2017-08-17

    Compared with the commercially available single nucleotide polymorphism (SNP) chip based on the Bead Chip technology, the solution hybrid selection (SHS)-based target enrichment SNP chip is not only design-flexible, but also cost-effective for genotype sequencing. In this study, we propose to design an animal SNP chip using the SHS-based target enrichment strategy for the first time. As an update to the international collaboration on goat research, a 66 K SNP chip for cashmere goat was created from the whole-genome sequencing data of 73 individuals. Verification of this 66 K SNP chip with the whole-genome sequencing data of 436 cashmere goats showed that the SNP call rates were between 95.3% and 99.8%. The average sequencing depth for target SNPs was 40X. The capture regions were shown to be 200 bp that flank target SNPs. This chip was further tested in a genome-wide association analysis of cashmere fineness (fiber diameter). Several top hit loci were found marginally associated with signaling pathways involved in hair growth. These results demonstrate that the 66 K SNP chip is a useful tool in the genomic analyses of cashmere goats. The successful chip design shows that the SHS-based target enrichment strategy could be applied to SNP chip design in other species.

  7. Clonality: an R package for testing clonal relatedness of two tumors from the same patient based on their genomic profiles.

    PubMed

    Ostrovnaya, Irina; Seshan, Venkatraman E; Olshen, Adam B; Begg, Colin B

    2011-06-15

    If a cancer patient develops multiple tumors, it is sometimes impossible to determine whether these tumors are independent or clonal based solely on pathological characteristics. Investigators have studied how to address this diagnostic challenge by comparing the presence of loss of heterozygosity (LOH) at selected genetic locations of tumor samples, or by comparing genomewide copy number array profiles. We have previously developed statistical methodology to compare such genomic profiles for evidence of clonality. We assembled the software for these tests in a new R package called 'Clonality'. For LOH profiles, the package contains significance tests. The analysis of copy number profiles includes a likelihood ratio statistic and reference distribution, as well as an option to produce various plots that summarize the results. Availability: Bioconductor (http://bioconductor.org/packages/release/bioc/html/Clonality.html) and http://www.mskcc.org/mskcc/html/13287.cfm.

  8. Chromhome: A rich internet application for accessing comparative chromosome homology maps

    PubMed Central

    Nagarajan, Sridevi; Rens, Willem; Stalker, James; Cox, Tony; Ferguson-Smith, Malcolm A

    2008-01-01

    Background: Comparative genomics has become a significant research area in recent years, following the availability of a number of sequenced genomes. The comparison of genomes is of great importance in the analysis of functionally important genome regions. It can also be used to understand the phylogenetic relationships of species and the mechanisms leading to rearrangement of karyotypes during evolution. Many species have been studied at the cytogenetic level by cross species chromosome painting. With the large amount of such information, it has become vital to computerize the data and make them accessible worldwide. Chromhome is a comprehensive web application that is designed to provide cytogenetic comparisons among species and to fulfil this need. Results: The Chromhome application architecture is multi-tiered with an interactive client layer, business logic and database layers. Enterprise Java platform with the open source framework OpenLaszlo is used to implement the Rich Internet Chromhome Application. Cross species comparative mapping raw data are collected and the processed information is stored in the MySQL Chromhome database. Chromhome Release 1.0 contains 109 homology maps from 51 species. The data cover species from 14 orders and 30 families. The homology map displays all the chromosomes of the compared species as one image, making comparisons among species easier. Inferred data also provides maps of homologous regions that could serve as a guideline for researchers involved in phylogenetic or evolution based studies. Conclusion: Chromhome provides a useful resource for comparative genomics, holding graphical homology maps of a wide range of species. It brings together cytogenetic data of many genomes under one roof. Inferred painting can often determine the chromosomal homologous regions between two species, if each has been compared with a common third species.
Inferred painting greatly reduces the need to map entire genomes and helps focus only on relevant regions of the chromosomes of the species under study. Future releases of Chromhome will accommodate more species and their respective gene and BAC maps, in addition to chromosome painting data. Chromhome application provides a single-page interface (SPI) with desktop style layout, delivering a better and richer user experience. PMID:18366796
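
    The idea behind inferred painting, as described above, is that homology between two species can be derived when each has been compared with a common third species. The sketch below illustrates that composition step at whole-chromosome granularity only. Real painting data are segmental, and the dictionary layout, function name and chromosome labels are hypothetical rather than taken from Chromhome.

```python
def infer_homology(a_to_ref, b_to_ref):
    """Infer homologous chromosomes between species A and B.

    Each argument maps a chromosome name to the set of reference-species
    chromosomes it shares homology with (for example, from painting data).
    Two chromosomes are reported as homologous if they share at least one
    reference chromosome.
    """
    inferred = {}
    for a_chr, a_refs in a_to_ref.items():
        matches = {
            b_chr for b_chr, b_refs in b_to_ref.items() if a_refs & b_refs
        }
        if matches:
            inferred[a_chr] = matches
    return inferred


# Hypothetical painting results against a shared reference species R.
species_a = {"A1": {"R1", "R2"}, "A2": {"R3"}}
species_b = {"B1": {"R2"}, "B2": {"R3", "R4"}, "B3": {"R5"}}
inferred_map = infer_homology(species_a, species_b)
```

    Because the inference passes through the reference species, it can only suggest candidate regions. A shared reference chromosome does not guarantee that the same segment of it is involved in both species.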

  9. Chromhome: a rich internet application for accessing comparative chromosome homology maps.

    PubMed

    Nagarajan, Sridevi; Rens, Willem; Stalker, James; Cox, Tony; Ferguson-Smith, Malcolm A

    2008-03-26

    Comparative genomics has become a significant research area in recent years, following the availability of a number of sequenced genomes. The comparison of genomes is of great importance in the analysis of functionally important genome regions. It can also be used to understand the phylogenetic relationships of species and the mechanisms leading to rearrangement of karyotypes during evolution. Many species have been studied at the cytogenetic level by cross species chromosome painting. With the large amount of such information, it has become vital to computerize the data and make them accessible worldwide. Chromhome (http://www.chromhome.org) is a comprehensive web application that is designed to provide cytogenetic comparisons among species and to fulfil this need. The Chromhome application architecture is multi-tiered with an interactive client layer, business logic and database layers. Enterprise Java platform with the open source framework OpenLaszlo is used to implement the Rich Internet Chromhome Application. Cross species comparative mapping raw data are collected and the processed information is stored in the MySQL Chromhome database. Chromhome Release 1.0 contains 109 homology maps from 51 species. The data cover species from 14 orders and 30 families. The homology map displays all the chromosomes of the compared species as one image, making comparisons among species easier. Inferred data also provides maps of homologous regions that could serve as a guideline for researchers involved in phylogenetic or evolution based studies. Chromhome provides a useful resource for comparative genomics, holding graphical homology maps of a wide range of species. It brings together cytogenetic data of many genomes under one roof. Inferred painting can often determine the chromosomal homologous regions between two species, if each has been compared with a common third species.
Inferred painting greatly reduces the need to map entire genomes and helps focus only on relevant regions of the chromosomes of the species under study. Future releases of Chromhome will accommodate more species and their respective gene and BAC maps, in addition to chromosome painting data. Chromhome application provides a single-page interface (SPI) with desktop style layout, delivering a better and richer user experience.

  10. Combined CRISPRi/a-Based Chemical Genetic Screens Reveal that Rigosertib Is a Microtubule-Destabilizing Agent. | Office of Cancer Genomics

    Cancer.gov

    Chemical libraries paired with phenotypic screens can now readily identify compounds with therapeutic potential. A central limitation to exploiting these compounds, however, has been in identifying their relevant cellular targets. Here, we present a two-tiered CRISPR-mediated chemical-genetic strategy for target identification: combined genome-wide knockdown and overexpression screening as well as focused, comparative chemical-genetic profiling.

  11. Inferring transposons activity chronology by TRANScendence - TEs database and de-novo mining tool.

    PubMed

    Startek, Michał Piotr; Nogły, Jakub; Gromadka, Agnieszka; Grzebelus, Dariusz; Gambin, Anna

    2017-10-16

    The constant progress in sequencing technology leads to ever increasing amounts of genomic data. In the light of current evidence transposable elements (TEs for short) are becoming useful tools for learning about the evolution of the host genome. Therefore the software for genome-wide detection and analysis of TEs is of great interest. Here we describe a computational tool for mining, classifying and storing TEs from newly sequenced genomes. This is an online, web-based, user-friendly service, enabling users to upload their own genomic data, and perform de-novo searches for TEs. The detected TEs are automatically analyzed, compared to reference databases, annotated, clustered into families, and stored in a TE repository. Also, the genome-wide nesting structure of the elements found is detected and analyzed by a new method for inferring the evolutionary history of TEs. We illustrate the functionality of our tool by performing a full-scale analysis of the TE landscape in the Medicago truncatula genome. TRANScendence is an effective tool for the de-novo annotation and classification of transposable elements in newly-acquired genomes. Its streamlined interface makes it well-suited for evolutionary studies.

  12. Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

    PubMed Central

    Krishnan, Neeraja M.

    2017-01-01

    Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357

  13. Comparative Genomics of Burkholderia singularis sp. nov., a Low G+C Content, Free-Living Bacterium That Defies Taxonomic Dissection of the Genus Burkholderia

    PubMed Central

    Vandamme, Peter; Peeters, Charlotte; De Smet, Birgit; Price, Erin P.; Sarovich, Derek S.; Henry, Deborah A.; Hird, Trevor J.; Zlosnik, James E. A.; Mayo, Mark; Warner, Jeffrey; Baker, Anthony; Currie, Bart J.; Carlier, Aurélien

    2017-01-01

    Four Burkholderia pseudomallei-like isolates of human clinical origin were examined by a polyphasic taxonomic approach that included comparative whole genome analyses. The results demonstrated that these isolates represent a rare and unusual, novel Burkholderia species for which we propose the name B. singularis. The type strain is LMG 28154T (=CCUG 65685T). Its genome sequence has an average mol% G+C content of 64.34%, which is considerably lower than that of other Burkholderia species. The reduced G+C content of strain LMG 28154T was characterized by a genome wide AT bias that was not due to reduced GC-biased gene conversion or reductive genome evolution, but might have been caused by an altered DNA base excision repair pathway. B. singularis can be differentiated from other Burkholderia species by multilocus sequence analysis, MALDI-TOF mass spectrometry and a distinctive biochemical profile that includes the absence of nitrate reduction, a mucoid appearance on Columbia sheep blood agar, and a slowly positive oxidase reaction. Comparisons with publicly available whole genome sequences demonstrated that strain TSV85, an Australian water isolate, also represents the same species and therefore, to date, B. singularis has been recovered from human or environmental samples on three continents. PMID:28932212

  14. eHive: an artificial intelligence workflow system for genomic analysis.

    PubMed

    Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier

    2010-05-11

    The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
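
    Two points from the eHive abstract above lend themselves to a small illustration: the roughly quadratic growth of pairwise work with the number of species, and the blackboard pattern in which autonomous workers claim jobs from a shared store. In eHive the blackboard is a MySQL database and the worker is a Perl script. The in-memory Python list below is only a stand-in for that design, and every name in it is hypothetical rather than part of eHive's API.

```python
from itertools import combinations


def pairwise_jobs(species):
    """One job per unordered species pair: n * (n - 1) / 2 jobs in total."""
    return [
        {"pair": pair, "status": "READY", "worker": None}
        for pair in combinations(species, 2)
    ]


def run_worker(blackboard, worker_name, align):
    """Claim READY jobs from the shared blackboard until none remain.

    Returns the number of jobs this worker completed. Failed jobs are
    marked FAILED and stay on the board so that they could be retried.
    """
    completed = 0
    for job in blackboard:
        if job["status"] != "READY":
            continue
        job["status"], job["worker"] = "CLAIMED", worker_name
        try:
            job["result"] = align(*job["pair"])
            job["status"] = "DONE"
            completed += 1
        except Exception:
            job["status"] = "FAILED"
    return completed


# 50 species, as in the abstract, with a placeholder alignment function.
species = [f"species_{i}" for i in range(50)]
board = pairwise_jobs(species)
done = run_worker(board, "worker_1", lambda a, b: f"{a}_vs_{b}")
```

    With 50 species the board holds 50 × 49 / 2 = 1225 pairwise jobs. A real deployment would also need atomic job claiming so that concurrent workers never take the same job, which this single-process sketch does not address.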

  15. Draft Genomes, Phylogenetic Reconstruction, and Comparative Genomics of Two Novel Cohabiting Bacterial Symbionts Isolated from Frankliniella occidentalis

    PubMed Central

    Facey, Paul D.; Méric, Guillaume; Hitchings, Matthew D.; Pachebat, Justin A.; Hegarty, Matt J.; Chen, Xiaorui; Morgan, Laura V.A.; Hoeppner, James E.; Whitten, Miranda M.A.; Kirk, William D.J.; Dyson, Paul J.; Sheppard, Sam K.; Sol, Ricardo Del

    2015-01-01

    Obligate bacterial symbionts are widespread in many invertebrates, where they are often confined to specialized host cells and are transmitted directly from mother to progeny. Increasing numbers of these bacteria are being characterized but questions remain about their population structure and evolution. Here we take a comparative genomics approach to investigate two prominent bacterial symbionts (BFo1 and BFo2) isolated from geographically separated populations of western flower thrips, Frankliniella occidentalis. Our multifaceted approach to classifying these symbionts includes concatenated multilocus sequence analysis (MLSA) phylogenies, ribosomal multilocus sequence typing (rMLST), construction of whole-genome phylogenies, and in-depth genomic comparisons. We showed that the BFo1 genome clusters more closely to species in the genus Erwinia, and is a putative close relative to Erwinia aphidicola. BFo1 is also likely to have shared a common ancestor with Erwinia pyrifoliae/Erwinia amylovora and the nonpathogenic Erwinia tasmaniensis and genetic traits similar to Erwinia billingiae. The BFo1 genome contained virulence factors found in the genus Erwinia but represented a divergent lineage. In contrast, we showed that BFo2 belongs within the Enterobacteriales but does not group closely with any currently known bacterial species. Concatenated MLSA phylogenies indicate that it may have shared a common ancestor with the Erwinia and Pantoea genera, and based on the clustering of rMLST genes, it was most closely related to Pantoea ananatis but represented a divergent lineage. We reconstructed a core genome of a putative common ancestor of Erwinia and Pantoea and compared this with the genomes of BFo bacteria. BFo2 possessed none of the virulence determinants that were omnipresent in the Erwinia and Pantoea genera. Taken together, these data are consistent with BFo2 representing a highly novel species that may be related to known Pantoea. PMID:26185096

  16. Draft Genomes, Phylogenetic Reconstruction, and Comparative Genomics of Two Novel Cohabiting Bacterial Symbionts Isolated from Frankliniella occidentalis.

    PubMed

    Facey, Paul D; Méric, Guillaume; Hitchings, Matthew D; Pachebat, Justin A; Hegarty, Matt J; Chen, Xiaorui; Morgan, Laura V A; Hoeppner, James E; Whitten, Miranda M A; Kirk, William D J; Dyson, Paul J; Sheppard, Sam K; Del Sol, Ricardo

    2015-07-15

    Obligate bacterial symbionts are widespread in many invertebrates, where they are often confined to specialized host cells and are transmitted directly from mother to progeny. Increasing numbers of these bacteria are being characterized but questions remain about their population structure and evolution. Here we take a comparative genomics approach to investigate two prominent bacterial symbionts (BFo1 and BFo2) isolated from geographically separated populations of western flower thrips, Frankliniella occidentalis. Our multifaceted approach to classifying these symbionts includes concatenated multilocus sequence analysis (MLSA) phylogenies, ribosomal multilocus sequence typing (rMLST), construction of whole-genome phylogenies, and in-depth genomic comparisons. We showed that the BFo1 genome clusters more closely to species in the genus Erwinia, and is a putative close relative to Erwinia aphidicola. BFo1 is also likely to have shared a common ancestor with Erwinia pyrifoliae/Erwinia amylovora and the nonpathogenic Erwinia tasmaniensis and genetic traits similar to Erwinia billingiae. The BFo1 genome contained virulence factors found in the genus Erwinia but represented a divergent lineage. In contrast, we showed that BFo2 belongs within the Enterobacteriales but does not group closely with any currently known bacterial species. Concatenated MLSA phylogenies indicate that it may have shared a common ancestor with the Erwinia and Pantoea genera, and based on the clustering of rMLST genes, it was most closely related to Pantoea ananatis but represented a divergent lineage. We reconstructed a core genome of a putative common ancestor of Erwinia and Pantoea and compared this with the genomes of BFo bacteria. BFo2 possessed none of the virulence determinants that were omnipresent in the Erwinia and Pantoea genera. Taken together, these data are consistent with BFo2 representing a highly novel species that may be related to known Pantoea. © The Author(s) 2015.
Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  17. Clinical significance of rare copy number variations in epilepsy: a case-control survey using microarray-based comparative genomic hybridization.

    PubMed

    Striano, Pasquale; Coppola, Antonietta; Paravidino, Roberta; Malacarne, Michela; Gimelli, Stefania; Robbiano, Angela; Traverso, Monica; Pezzella, Marianna; Belcastro, Vincenzo; Bianchi, Amedeo; Elia, Maurizio; Falace, Antonio; Gazzerro, Elisabetta; Ferlazzo, Edoardo; Freri, Elena; Galasso, Roberta; Gobbi, Giuseppe; Molinatto, Cristina; Cavani, Simona; Zuffardi, Orsetta; Striano, Salvatore; Ferrero, Giovanni Battista; Silengo, Margherita; Cavaliere, Maria Luigia; Benelli, Matteo; Magi, Alberto; Piccione, Maria; Dagna Bricarelli, Franca; Coviello, Domenico A; Fichera, Marco; Minetti, Carlo; Zara, Federico

    2012-03-01

    Objective: To perform an extensive search for genomic rearrangements by microarray-based comparative genomic hybridization in patients with epilepsy. Design: Prospective cohort study. Setting: Epilepsy centers in Italy. Patients: Two hundred seventy-nine patients with unexplained epilepsy, 265 individuals with nonsyndromic mental retardation but no epilepsy, and 246 healthy control subjects were screened by microarray-based comparative genomic hybridization. Main Outcome Measures: Identification of copy number variations (CNVs) and gene enrichment. Results: Rare CNVs occurred in 26 patients (9.3%) and 16 healthy control subjects (6.5%) (P = .26). The CNVs identified in patients were larger (P = .03) and showed higher gene content (P = .02) than those in control subjects. The CNVs larger than 1 megabase (P = .002) and including more than 10 genes (P = .005) occurred more frequently in patients than in control subjects. Nine patients (34.6%) among those harboring rare CNVs showed rearrangements associated with emerging microdeletion or microduplication syndromes. Mental retardation and neuropsychiatric features were associated with rare CNVs (P = .004), whereas epilepsy type was not. The CNV rate in patients with epilepsy and mental retardation or neuropsychiatric features is not different from that observed in patients with mental retardation only. Moreover, significant enrichment of genes involved in ion transport was observed within CNVs identified in patients with epilepsy. Conclusions: Patients with epilepsy show a significantly increased burden of large, rare, gene-rich CNVs, particularly when associated with mental retardation and neuropsychiatric features. The limited overlap between CNVs observed in the epilepsy group and those observed in the group with mental retardation only as well as the involvement of specific (ion channel) genes indicate a specific association between the identified CNVs and epilepsy.
Screening for CNVs should be performed for diagnostic purposes preferentially in patients with epilepsy and mental retardation or neuropsychiatric features.
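
    The headline comparison in the abstract above (rare CNVs in 26 of 279 patients versus 16 of 246 control subjects) can be reproduced as a simple comparison of two proportions. The abstract does not state which test produced P = .26, so the uncorrected Pearson chi-square test used below is an assumption and its P value need not match the published one exactly. The function name is hypothetical.

```python
import math


def compare_proportions(hits_a, n_a, hits_b, n_b):
    """Return (rate_a, rate_b, chi2, p) for a 2x2 table.

    Uses the uncorrected Pearson chi-square statistic with one degree of
    freedom; the P value is the upper tail, erfc(sqrt(chi2 / 2)).
    """
    a, b = hits_a, n_a - hits_a
    c, d = hits_b, n_b - hits_b
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2))
    return hits_a / n_a, hits_b / n_b, chi2, p


# Counts taken from the abstract: patients versus healthy control subjects.
rate_patients, rate_controls, chi2, p_value = compare_proportions(26, 279, 16, 246)
```

    The two rates round to the 9.3% and 6.5% given in the abstract, and the difference is not significant at the 0.05 level under this test either. That is in line with the authors' emphasis on CNV size and gene content rather than raw frequency.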

  18. RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”

    PubMed Central

    Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu

    2012-01-01

    Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
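
    The "disproportionate" share of sRNAs in the strain-specific region can be checked with a simple one-sided binomial tail. This is a rough sketch, not the authors' analysis: it assumes sRNAs would otherwise fall uniformly along the genome, and it rounds the approximate percentages in the abstract (about 30% of 94 sRNAs, about 18% of the genome) to whole counts.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_srnas = 94                      # sRNAs identified by the transcriptome map
unique_fraction = 0.18            # approximate share of genome unique to strain 2336
observed = round(0.30 * n_srnas)  # approximate count of sRNAs in the unique region
expected = unique_fraction * n_srnas

# Chance of seeing at least `observed` sRNAs there under uniform placement.
p_value = binomial_upper_tail(observed, n_srnas, unique_fraction)
print(f"observed {observed}, expected {expected:.1f}, one-sided p = {p_value:.4f}")
```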

  19. RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".

    PubMed

    Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu

    2012-01-01

    Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.

  20. Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

    PubMed Central

    Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

    2000-01-01

    Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638

  1. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

    PubMed

    Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

    2008-01-15

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.

  2. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes

    PubMed Central

    Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.

    2008-01-01

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818

  3. Comparative genome analysis of Basidiomycete fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs, including the majority of wood-decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum, we compared the genomes of 35 basidiomycetes, including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum, with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  4. Comparison of Models and Whole-Genome Profiling Approaches for Genomic-Enabled Prediction of Septoria Tritici Blotch, Stagonospora Nodorum Blotch, and Tan Spot Resistance in Wheat.

    PubMed

    Juliana, Philomin; Singh, Ravi P; Singh, Pawan K; Crossa, Jose; Rutkoski, Jessica E; Poland, Jesse A; Bergstrom, Gary C; Sorrells, Mark E

    2017-07-01

    The leaf spotting diseases in wheat that include Septoria tritici blotch (STB) caused by Zymoseptoria tritici, Stagonospora nodorum blotch (SNB) caused by Parastagonospora nodorum, and tan spot (TS) caused by Pyrenophora tritici-repentis pose challenges to breeding programs in selecting for resistance. A promising approach that could enable selection prior to phenotyping is genomic selection that uses genome-wide markers to estimate breeding values (BVs) for quantitative traits. To evaluate this approach for seedling and/or adult plant resistance (APR) to STB, SNB, and TS, we compared the predictive ability of least-squares (LS) approach with genomic-enabled prediction models including genomic best linear unbiased predictor (GBLUP), Bayesian ridge regression (BRR), Bayes A (BA), Bayes B (BB), Bayes Cπ (BC), Bayesian least absolute shrinkage and selection operator (BL), and reproducing kernel Hilbert spaces markers (RKHS-M), a pedigree-based model (RKHS-P) and RKHS markers and pedigree (RKHS-MP). We observed that LS gave the lowest prediction accuracies and RKHS-MP, the highest. The genomic-enabled prediction models and RKHS-P gave similar accuracies. The increase in accuracy using genomic prediction models over LS was 48%. The mean genomic prediction accuracies were 0.45 for STB (APR), 0.55 for SNB (seedling), 0.66 for TS (seedling) and 0.48 for TS (APR). We also compared markers from two whole-genome profiling approaches: genotyping by sequencing (GBS) and diversity arrays technology sequencing (DArTseq) for prediction. While GBS markers performed slightly better than DArTseq, combining markers from the two approaches did not improve accuracies. We conclude that implementing GS in breeding for these diseases would help to achieve higher accuracies and rapid gains from selection. Copyright © 2017 Crop Science Society of America.

  5. A high-resolution genetic linkage map and QTL fine mapping for growth-related traits and sex in the Yangtze River common carp (Cyprinus carpio haematopterus).

    PubMed

    Feng, Xiu; Yu, Xiaomu; Fu, Beide; Wang, Xinhua; Liu, Haiyang; Pang, Meixia; Tong, Jingou

    2018-04-02

    A high-density genetic linkage map is essential for QTL fine mapping, comparative genome analysis, identification of candidate genes and marker-assisted selection for economic traits in aquaculture species. The Yangtze River common carp (Cyprinus carpio haematopterus) is one of the most important aquacultured strains in China. However, only limited genetic and genomic resources have been developed for genetic improvement of economic traits in this strain. A high-resolution genetic linkage map was constructed by using 7820 2b-RAD (2b-restriction site-associated DNA) and 295 microsatellite markers in an F2 family of the Yangtze River common carp (C. c. haematopterus). The length of the map was 4586.56 cM with an average marker interval of 0.57 cM. Comparative genome mapping revealed that a high proportion (70%) of markers differed in chromosome location between C. c. haematopterus and another common carp strain (subspecies), C. c. carpio. A clear 2:1 relationship was observed between C. c. haematopterus linkage groups (LGs) and zebrafish (Danio rerio) chromosomes. Based on the genetic map, 21 QTLs for growth-related traits were detected on 12 LGs, and contributed values of phenotypic variance explained (PVE) ranging from 16.3 to 38.6%, with LOD scores ranging from 4.02 to 11.13. A genome-wide significant QTL (LOD = 10.83) and three chromosome-wide significant QTLs (mean LOD = 4.84) for sex were mapped on LG50 and LG24, respectively. A 1.4 cM confidence interval of QTL for all growth-related traits showed conserved synteny with a 2.06 M segment on chromosome 14 of D. rerio. Five potential candidate genes were identified by BLAST search in this genomic region, including a well-studied multi-functional growth related gene, Apelin. We mapped a set of suggestive and significant QTLs for growth-related traits and sex based on a high-density genetic linkage map using SNP and microsatellite markers for Yangtze River common carp. 
Several candidate growth genes were also identified from the QTL regions by comparative mapping. This genetic map would provide a basis for genome assembly and comparative genomics studies, and those QTL-derived candidate genes and genetic markers are useful genomic resources for marker-assisted selection (MAS) of growth-related traits in the Yangtze River common carp.

  6. SPOCS: software for predicting and visualizing orthology/paralogy relationships among genomes.

    PubMed

    Curtis, Darren S; Phillips, Aaron R; Callister, Stephen J; Conlan, Sean; McCue, Lee Ann

    2013-10-15

    At the rate that prokaryotic genomes can now be generated, comparative genomics studies require a flexible method for quickly and accurately predicting orthologs among the rapidly changing set of genomes available. SPOCS implements a graph-based ortholog prediction method to generate a simple tab-delimited table of orthologs and in addition, html files that provide a visualization of the predicted ortholog/paralog relationships to which gene/protein expression metadata may be overlaid. A SPOCS web application is freely available at http://cbb.pnnl.gov/portal/tools/spocs.html. Source code for Linux systems is also freely available under an open source license at http://cbb.pnnl.gov/portal/software/spocs.html; the Boost C++ libraries and BLAST are required.

  7. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
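
    The site-specific model described above assigns each genomic position a probability for every mutation type through multinomial logistic regression. The sketch below shows only the prediction step for a single position. The outcome labels, the three features (chosen to echo the predictors named in the abstract) and all coefficient values are hypothetical placeholders, not fitted estimates from the paper.

```python
from math import exp

def mutation_probabilities(features, coefficients):
    """Multinomial-logit probabilities for one genomic position.

    `coefficients` maps each mutation type to (intercept, weights); the
    reference outcome "none" (no mutation) has a linear predictor of 0.
    """
    scores = {"none": 0.0}
    for outcome, (intercept, weights) in coefficients.items():
        scores[outcome] = intercept + sum(w * x for w, x in zip(weights, features))
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {outcome: exp(score - top) for outcome, score in scores.items()}
    total = sum(exps.values())
    return {outcome: value / total for outcome, value in exps.items()}

# Hypothetical site: (phyloP conservation, replication timing, expression level).
site = (1.2, 0.4, 0.8)
# Hypothetical coefficients: large negative intercepts reflect that mutations are rare.
coefs = {
    "transition": (-12.0, (-0.30, 0.50, -0.10)),
    "transversion": (-13.0, (-0.20, 0.40, -0.05)),
}
probs = mutation_probabilities(site, coefs)
print(probs)
```

    Because the probabilities are computed per position, explanatory variables never have to be averaged over a region, which is the drawback of regional models that the abstract highlights.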

  8. Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

    PubMed

    Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

    2016-05-12

    In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (Fws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide Fws fixation index (r = -0.88, P < 0.001). However, the microsatellite analysis revealed that most isolates contained mixed genotypes, even those that had no detectable genome sequence heterogeneity. Random sampling of different numbers of SNPs showed that an Fws index derived from ten or more SNPs with minor allele frequencies of >10% had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (Fws).
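
    The within-isolate fixation index contrasts diversity inside one infection with diversity in the local parasite population. A minimal sketch follows, assuming biallelic SNPs, per-SNP heterozygosity of 2p(1 - p), and a simple ratio of summed heterozygosities. Published pipelines typically bin SNPs by minor allele frequency and fit a regression instead, so this is a simplification rather than the authors' procedure.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def within_isolate_fixation_index(within_freqs, population_freqs):
    """Simplified index: 1 - (within-isolate diversity / population diversity).

    Both arguments list allele frequencies for the same SNPs in the same
    order: read-based frequencies within one isolate, and frequencies in the
    local population. Values near 1 suggest a single dominant genotype;
    lower values suggest a mixed infection.
    """
    h_within = sum(heterozygosity(p) for p in within_freqs)
    h_population = sum(heterozygosity(p) for p in population_freqs)
    if h_population == 0:
        raise ValueError("population shows no diversity at these SNPs")
    return 1.0 - h_within / h_population

# Hypothetical allele frequencies at four SNPs.
population = [0.30, 0.45, 0.20, 0.50]
clonal_isolate = [0.0, 1.0, 0.0, 1.0]      # every SNP fixed within the isolate
mixed_isolate = [0.25, 0.40, 0.10, 0.55]   # several genotypes present

print(within_isolate_fixation_index(clonal_isolate, population))
print(within_isolate_fixation_index(mixed_isolate, population))
```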

  9. Comprehensive Genomic Profiling Aids in Distinguishing Metastatic Recurrence from Second Primary Cancers

    PubMed Central

    Weinberg, Benjamin A.; Gowen, Kyle; Lee, Thomas K.; Ou, Sai‐Hong Ignatius; Bristow, Robert; Krill, Lauren; Almira‐Suarez, M. Isabel; Ali, Siraj M.; Miller, Vincent A.; Liu, Stephen V.

    2017-01-01

    Abstract Background. Metastatic recurrence after treatment for locoregional cancer is a major cause of morbidity and cancer‐specific mortality. Distinguishing metastatic recurrence from the development of a second primary cancer has important prognostic and therapeutic value and represents a difficult clinical scenario. Advances beyond histopathological comparison are needed. We sought to interrogate the ability of comprehensive genomic profiling (CGP) to aid in distinguishing between these clinical scenarios. Materials and Methods. We identified three prospective cases of recurrent tumors in patients previously treated for localized cancers in which histologic analyses suggested subsequent development of a distinct second primary. Paired samples from the original primary and recurrent tumor were subjected to hybrid capture next‐generation sequencing‐based CGP to identify base pair substitutions, insertions, deletions, copy number alterations (CNA), and chromosomal rearrangements. Genomic profiles between paired samples were compared using previously established statistical clonality assessment software to gauge relatedness beyond global CGP similarities. Results. A high degree of similarity was observed among genomic profiles from morphologically distinct primary and recurrent tumors. Genomic information suggested reclassification as recurrent metastatic disease, and patients received therapy for metastatic disease based on the molecular determination. Conclusions. Our cases demonstrate an important adjunct role for CGP technologies in separating metastatic recurrence from development of a second primary cancer. Larger series are needed to confirm our observations, but comparative CGP may be considered in patients for whom distinguishing metastatic recurrence from a second primary would alter the therapeutic approach. Implications for Practice. 
Distinguishing a metastatic recurrence from a second primary cancer can represent a difficult clinicopathologic problem but has important prognostic and therapeutic implications. Approaches to aid histologic analysis may improve clinician and pathologist confidence in this increasingly common clinical scenario. Our series provides early support for incorporating paired comprehensive genomic profiling in clinical situations in which determination of metastatic recurrence versus a distinct second primary cancer would influence patient management. PMID:28193735

  10. Interactive or static reports to guide clinical interpretation of cancer genomics.

    PubMed

    Gray, Stacy W; Gagan, Jeffrey; Cerami, Ethan; Cronin, Angel M; Uno, Hajime; Oliver, Nelly; Lowenstein, Carol; Lederman, Ruth; Revette, Anna; Suarez, Aaron; Lee, Charlotte; Bryan, Jordan; Sholl, Lynette; Van Allen, Eliezer M

    2018-05-01

    Misinterpretation of complex genomic data presents a major challenge in the implementation of precision oncology. We sought to determine whether interactive genomic reports with embedded clinician education and optimized data visualization improved genomic data interpretation. We conducted a randomized, vignette-based survey study to determine whether exposure to interactive reports for a somatic gene panel, as compared to static reports, improves physicians' genomic comprehension and report-related satisfaction (overall scores calculated across 3 vignettes, range 0-18 and 1-4, respectively, higher score corresponding with improved endpoints). One hundred and five physicians at a tertiary cancer center participated (29% participation rate): 67% medical, 20% pediatric, 7% radiation, and 7% surgical oncology; 37% female. Prior to viewing the case-based vignettes, 34% of the physicians reported difficulty making treatment recommendations based on the standard static report. After vignette/report exposure, physicians' overall comprehension scores did not differ by report type (mean score: interactive 11.6 vs static 10.5, difference = 1.1, 95% CI, -0.3, 2.5, P = .13). However, physicians exposed to the interactive report were more likely to correctly assess sequencing quality (P < .001) and understand when reports needed to be interpreted with caution (eg, low tumor purity; P = .02). Overall satisfaction scores were higher in the interactive group (mean score 2.5 vs 2.1, difference = 0.4, 95% CI, 0.2-0.7, P = .001). Interactive genomic reports may improve physicians' ability to accurately assess genomic data and increase report-related satisfaction. Additional research in users' genomic needs and efforts to integrate interactive reports into electronic health records may facilitate the implementation of precision oncology.

  11. DArT Markers Effectively Target Gene Space in the Rye Genome

    PubMed Central

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes. PMID:27833625

  12. DArT Markers Effectively Target Gene Space in the Rye Genome.

    PubMed

    Gawroński, Piotr; Pawełkowicz, Magdalena; Tofil, Katarzyna; Uszyński, Grzegorz; Sharifova, Saida; Ahluwalia, Shivaksh; Tyrka, Mirosław; Wędzony, Maria; Kilian, Andrzej; Bolibok-Brągoszewska, Hanna

    2016-01-01

    Large genome size and complexity hamper considerably the genomics research in relevant species. Rye (Secale cereale L.) has one of the largest genomes among cereal crops and repetitive sequences account for over 90% of its length. Diversity Arrays Technology is a high-throughput genotyping method, in which a preferential sampling of gene-rich regions is achieved through the use of methylation sensitive restriction enzymes. We obtained sequences of 6,177 rye DArT markers and following a redundancy analysis assembled them into 3,737 non-redundant sequences, which were then used in homology searches against five Pooideae sequence sets. In total 515 DArT sequences could be incorporated into publicly available rye genome zippers providing a starting point for the integration of DArT- and transcript-based genomics resources in rye. Using Blast2Go pipeline we attributed putative gene functions to 1101 (29.4%) of the non-redundant DArT marker sequences, including 132 sequences with putative disease resistance-related functions, which were found to be preferentially located in the 4RL and 6RL chromosomes. Comparative analysis based on the DArT sequences revealed obvious inconsistencies between two recently published high density consensus maps of rye. Furthermore we demonstrated that DArT marker sequences can be a source of SSR polymorphisms. Obtained data demonstrate that DArT markers effectively target gene space in the large, complex, and repetitive rye genome. Through the annotation of putative gene functions and the alignment of DArT sequences relative to reference genomes we obtained information, that will complement the results of the studies, where DArT genotyping was deployed, by simplifying the gene ontology and microcolinearity based identification of candidate genes.

  13. Small RNA-based prediction of hybrid performance in maize.

    PubMed

    Seifert, Felix; Thiemann, Alexander; Schrag, Tobias A; Rybka, Dominika; Melchinger, Albrecht E; Frisch, Matthias; Scholten, Stefan

    2018-05-21

    Small RNA (sRNA) sequences are known to have a broad impact on gene regulation by various mechanisms. Their performance for the prediction of hybrid traits has not yet been analyzed. Our objective was to analyze the relation of parental sRNA expression with the performance of their hybrids, to develop an sRNA-based prediction approach, and to compare it to more common SNP and mRNA transcript based predictions using a factorial mating scheme of a maize hybrid breeding program. Correlation of genomic differences and messenger RNA (mRNA) or sRNA expression differences between parental lines with the performance of their hybrids revealed that sRNAs showed an inverse relationship in contrast to the other two data types. We associated differences for SNPs, mRNA and sRNA expression between parental inbred lines with the performance of their hybrid combinations and developed two prediction approaches using distance measures based on associated markers. Cross-validations revealed parental differences in sRNA expression to be strong predictors for hybrid performance for grain yield in maize, comparable to genomic and mRNA data. The integration of both positively and negatively associated markers in the prediction approaches enhanced the prediction accuracy. The associated sRNAs belong predominantly to the canonical size classes of 22- and 24-nt that show specific genomic mapping characteristics. Expression profiles of sRNA are a promising alternative to SNPs or mRNA expression profiles for hybrid prediction, especially for plant species without reference genome or transcriptome information. The characteristics of the sRNAs we identified suggest that association studies based on breeding populations facilitate the identification of sRNAs involved in hybrid performance.

  14. DNA Asymmetric Strand Bias Affects the Amino Acid Composition of Mitochondrial Proteins

    PubMed Central

    Min, Xiang Jia; Hickey, Donal A.

    2007-01-01

    Abstract Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We compared orthologous genes between animal mitochondrial genomes that show large differences in GC and AT skews. Specifically, we compared the mitochondrial genomes of mammals, which are characterized by a negative GC skew and a positive AT skew, to those of flatworms, which show the opposite skews for both GC and AT base pairs. We found that the mammalian proteins are highly enriched in amino acids encoded by CA-rich codons (as predicted by their negative GC and positive AT skews), whereas their flatworm orthologs were enriched in amino acids encoded by GT-rich codons (also as predicted from their skews). We found that these differences in mitochondrial strand asymmetry (measured as GC and AT skews) can have very large, predictable effects on the composition of the encoded proteins. PMID:17974594
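
    The skews discussed in this abstract can be illustrated with a small calculation. The sketch below assumes the commonly used definitions GC skew = (G - C) / (G + C) and AT skew = (A - T) / (A + T), computed on a single strand; the abstract does not spell out its exact formula, and the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def skew(seq: str, x: str, y: str) -> float:
    """Return (x - y) / (x + y) for the counts of bases x and y on one strand."""
    counts = Counter(seq.upper())
    total = counts[x] + counts[y]
    if total == 0:
        return 0.0  # neither base occurs, so the skew is undefined; report 0
    return (counts[x] - counts[y]) / total


def gc_skew(seq: str) -> float:
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    return skew(seq, "A", "T")


# Hypothetical toy strand (not real mitochondrial data): rich in C and A.
toy_strand = "AAACCCCGTA"
print(gc_skew(toy_strand))  # negative: more C than G
print(at_skew(toy_strand))  # positive: more A than T
```

    A strand with a negative GC skew and a positive AT skew, as in this toy example, is the pattern the abstract attributes to mammalian mitochondrial genomes.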

  15. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea.

    PubMed

    Makarova, Kira S; Sorokin, Alexander V; Novichkov, Pavel S; Wolf, Yuri I; Koonin, Eugene V

    2007-11-27

    An evolutionary classification of genes from sequenced genomes that distinguishes between orthologs and paralogs is indispensable for genome annotation and evolutionary reconstruction. Shortly after multiple genome sequences of bacteria, archaea, and unicellular eukaryotes became available, an attempt at such a classification was implemented in Clusters of Orthologous Groups of proteins (COGs). Rapid accumulation of genome sequences creates opportunities for refining COGs but also represents a challenge because of error amplification. One of the practical strategies involves construction of refined COGs for phylogenetically compact subsets of genomes. New Archaeal Clusters of Orthologous Genes (arCOGs) were constructed for 41 archaeal genomes (13 Crenarchaeota, 27 Euryarchaeota and one Nanoarchaeon) using an improved procedure that employs a similarity tree between smaller, group-specific clusters, semi-automatically partitions orthology domains in multidomain proteins, and uses profile searches for identification of remote orthologs. The annotation of arCOGs is a consensus between three assignments based on the COGs, the CDD database, and the annotations of homologs in the NR database. The 7538 arCOGs, on average, cover approximately 88% of the genes in a genome compared with approximately 76% coverage in COGs. The finer granularity of ortholog identification in the arCOGs is apparent from the fact that 4538 arCOGs correspond to 2362 COGs; approximately 40% of the arCOGs are new. The archaeal gene core (protein-coding genes found in all 41 genomes) consists of 166 arCOGs. The arCOGs were used to reconstruct gene loss and gene gain events during archaeal evolution and gene sets of ancestral forms. The Last Archaeal Common Ancestor (LACA) is conservatively estimated to possess 996 genes compared to 1245 and 1335 genes for the last common ancestors of Crenarchaeota and Euryarchaeota, respectively.
It is inferred that LACA was a chemoautotrophic hyperthermophile that, in addition to the core archaeal functions, encoded more idiosyncratic systems, e.g., the CASS systems of antivirus defense and some toxin-antitoxin systems. The arCOGs provide a convenient, flexible framework for functional annotation of archaeal genomes, comparative genomics and evolutionary reconstructions. Genomic reconstructions suggest that the last common ancestor of archaea might have been (nearly) as advanced as the modern archaeal hyperthermophiles. ArCOGs and related information are available at: ftp://ftp.ncbi.nih.gov/pub/koonin/arCOGs/.
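
    Two of the summary statistics in this abstract, the gene core (clusters found in every genome) and the average fraction of genes covered by clusters, reduce to simple set operations once cluster assignments exist. The sketch below is only an illustration under stated assumptions: the input layout (a mapping from genome to gene to cluster id, with None for unassigned genes), the function names, and all identifiers in the toy data are hypothetical, and it does not reproduce the arCOG construction procedure itself.

```python
def core_clusters(genomes):
    """Return the cluster ids that occur in every genome."""
    per_genome = [
        {cluster for cluster in genes.values() if cluster is not None}
        for genes in genomes.values()
    ]
    return set.intersection(*per_genome) if per_genome else set()


def mean_coverage(genomes):
    """Average, over genomes, of the fraction of genes assigned to a cluster."""
    fractions = [
        sum(cluster is not None for cluster in genes.values()) / len(genes)
        for genes in genomes.values()
        if genes
    ]
    return sum(fractions) / len(fractions)


# Hypothetical toy input: gene id -> cluster id (None = not in any cluster).
toy_genomes = {
    "genome_A": {"a1": "cluster_x", "a2": "cluster_y", "a3": None},
    "genome_B": {"b1": "cluster_x", "b2": "cluster_z"},
    "genome_C": {"c1": "cluster_x", "c2": "cluster_y", "c3": "cluster_z", "c4": None},
}

print(core_clusters(toy_genomes))   # only cluster_x is present in all three
print(mean_coverage(toy_genomes))   # mean of 2/3, 1 and 3/4
```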

  16. Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

    PubMed

    Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

    2013-12-17

    The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration.
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization.

  17. Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life

    PubMed Central

    2013-01-01

    Background The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. Results To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. 
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Conclusions Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization. PMID:24451181

  18. Inclusion of Population-specific Reference Panel from India to the 1000 Genomes Phase 3 Panel Improves Imputation Accuracy.

    PubMed

    Ahmad, Meraj; Sinha, Anubhav; Ghosh, Sreya; Kumar, Vikrant; Davila, Sonia; Yajnik, Chittaranjan S; Chandak, Giriraj R

    2017-07-27

    Imputation is a computational method based on the principle of haplotype sharing allowing enrichment of genome-wide association study datasets. It depends on the haplotype structure of the population and density of the genotype data. The 1000 Genomes Project led to the generation of imputation reference panels which have been used globally. However, recent studies have shown that population-specific panels provide better enrichment of genome-wide variants. We compared the imputation accuracy using the 1000 Genomes phase 3 reference panel and a panel generated from genome-wide data on 407 individuals from Western India (WIP). The concordance of imputed variants was cross-checked with next-generation re-sequencing data on a subset of genomic regions. Further, using the genome-wide data from 1880 individuals, we demonstrate that WIP works better than the 1000 Genomes phase 3 panel and, when merged with it, significantly improves the imputation accuracy throughout the minor allele frequency range. We also show that imputation using only the South Asian component of the 1000 Genomes phase 3 panel works as well as the merged panel, making it a computationally less intensive job. Thus, our study stresses that imputation accuracy using the 1000 Genomes phase 3 panel can be further improved by including population-specific reference panels from South Asia.
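
    The concordance check mentioned in this abstract can be pictured as the share of variants whose imputed genotype matches the genotype observed by sequencing. The abstract does not define its metric, so the sketch below assumes the simplest version: genotypes coded as alternate-allele counts (0, 1 or 2) and concordance taken over variants present in both call sets. The function name, variant ids and values are hypothetical.

```python
def genotype_concordance(imputed, sequenced):
    """Fraction of shared variants whose imputed genotype equals the sequenced one."""
    shared = imputed.keys() & sequenced.keys()
    if not shared:
        raise ValueError("no variants in common between the two call sets")
    matches = sum(imputed[variant] == sequenced[variant] for variant in shared)
    return matches / len(shared)


# Hypothetical genotypes for one individual, coded as alternate-allele counts.
imputed_calls = {"var1": 0, "var2": 1, "var3": 2, "var4": 1}
sequenced_calls = {"var1": 0, "var2": 1, "var3": 1, "var5": 2}

# Three variants are shared (var1, var2, var3); two of them agree.
print(genotype_concordance(imputed_calls, sequenced_calls))
```

    Published evaluations often stratify such a measure by minor allele frequency, in line with the abstract's statement about accuracy across the frequency range; that stratification is left out here.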

  19. Comparative Genomic Analysis of Two Clonally Related Multidrug Resistant Mycobacterium tuberculosis by Single Molecule Real Time Sequencing.

    PubMed

    Leung, Kenneth Siu-Sing; Siu, Gilman Kit-Hang; Tam, Kingsley King-Gee; To, Sabrina Wai-Chi; Rajwani, Rahim; Ho, Pak-Leung; Wong, Samson Sai-Yin; Zhao, Wei W; Ma, Oliver Chiu-Kit; Yam, Wing-Cheong

    2017-01-01

    Background: Multidrug-resistant tuberculosis (MDR-TB) is posing a major threat to global TB control. In this study, we focused on two consecutive MDR-TB isolates obtained from the same patient before and after the initiation of anti-TB treatment. To better understand the genomic characteristics of MDR-TB, Single Molecule Real-Time (SMRT) Sequencing and comparative genomic analyses were performed to identify mutations that contributed to the stepwise development of drug resistance and growth fitness in MDR-TB under in vivo challenge of anti-TB drugs. Result: Both the pre-treatment and post-treatment strains demonstrated concordant phenotypic and genotypic susceptibility profiles toward rifampicin, pyrazinamide, streptomycin, fluoroquinolones, aminoglycosides, cycloserine, ethionamide, and para-aminosalicylic acid. However, although both strains carried identical missense mutations at rpoB S531L, inhA C-15T, and embB M306V, MYCOTB Sensititre assay showed that the post-treatment strain had 16-, 8-, and 4-fold elevation in the minimum inhibitory concentrations (MICs) toward rifabutin, isoniazid, and ethambutol respectively. The results have indicated the presence of additional resistance-related mutations governing the stepwise development of MDR-TB. Further comparative genomic analyses have identified three additional polymorphisms between the clinical isolates. These include a single nucleotide deletion at nucleotide position 360 of rv0888 in the pre-treatment strain, and a missense mutation at rv3303c (lpdA) V44I and a 6-bp inframe deletion at codon 67-68 in rv2071c (cobM) in the post-treatment strain. Multiple sequence alignment showed that these mutations were occurring at highly conserved regions among pathogenic mycobacteria. Using structural-based and sequence-based algorithms, we further predicted that the mutations potentially have a deleterious effect on protein function.
Conclusion: This is the first study that compared the full genomes of two clonally-related MDR-TB clinical isolates during the course of anti-TB treatment. Our work has demonstrated the robustness of SMRT Sequencing in identifying mutations among MDR-TB clinical isolates. Comparative genome analysis also suggested novel mutations at rv0888, lpdA, and cobM that might explain the difference in antibiotic resistance and growth pattern between the two MDR-TB strains.

  20. HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads

    PubMed Central

    Li, Pinghao; Jiang, Xiaoqian; Wang, Shuang; Kim, Jihoon; Xiong, Hongkai; Ohno-Machado, Lucila

    2014-01-01

    Background and objective Short-read sequencing is becoming the standard of practice for the study of structural variants associated with disease. However, with the growth of sequence data largely surpassing reasonable storage capability, the biomedical community is challenged with the management, transfer, archiving, and storage of sequence data. Methods We developed Hierarchical mUlti-reference Genome cOmpression (HUGO), a novel compression algorithm for aligned reads in the sorted Sequence Alignment/Map (SAM) format. We first aligned short reads against a reference genome and stored exactly mapped reads for compression. For the inexactly mapped or unmapped reads, we realigned them against different reference genomes using an adaptive scheme by gradually shortening the read length. Regarding the base quality value, we offer lossy and lossless compression mechanisms. The lossy compression mechanism for the base quality values uses k-means clustering, where a user can adjust the balance between decompression quality and compression rate. The lossless compression can be produced by setting k (the number of clusters) to the number of different quality values. Results The proposed method produced a compression ratio in the range 0.5–0.65, which corresponds to 35–50% storage savings based on experimental datasets. The proposed approach achieved 15% more storage savings over CRAM and comparable compression ratio with Samcomp (CRAM and Samcomp are two of the state-of-the-art genome compression algorithms). The software is freely available at https://sourceforge.net/projects/hierachicaldnac/ under a General Public License (GPL). Limitation Our method requires having different reference genomes and prolongs the execution time for additional alignments. Conclusions The proposed multi-reference-based compression algorithm for aligned reads outperforms existing single-reference based algorithms. PMID:24368726
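
    The quality-value step described in this abstract can be sketched with a plain one-dimensional k-means. The code below is not the HUGO implementation: the function names, the deterministic initialisation and the toy quality values are assumptions made for illustration. It shows the property the abstract states, namely that choosing k equal to the number of distinct quality values makes the quantisation lossless, and it restates the relation between compression ratio and storage savings.

```python
def kmeans_1d(values, k, iterations=50):
    """Plain Lloyd's k-means on a list of numbers; returns the k cluster centers."""
    distinct = sorted(set(values))
    n = len(distinct)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic start (an assumption of this sketch): evenly spaced distinct values.
    if k == 1:
        centers = [distinct[n // 2]]
    else:
        centers = [distinct[(i * (n - 1)) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        updated = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return centers


def quantize(values, centers):
    """Replace each value by its nearest center, rounded to an integer score."""
    return [round(min(centers, key=lambda c: abs(v - c))) for v in values]


def storage_savings(compression_ratio):
    """Savings as a fraction, with ratio = compressed size / original size."""
    return 1 - compression_ratio


# Hypothetical base quality scores for one read.
qualities = [2, 2, 30, 31, 40, 40, 40, 12]

lossy = quantize(qualities, kmeans_1d(qualities, k=2))
lossless = quantize(qualities, kmeans_1d(qualities, k=len(set(qualities))))
print(lossy)      # only two distinct scores remain
print(lossless)   # identical to the input
```

    A smaller k leaves fewer distinct symbols to encode at the cost of fidelity, which is the trade-off between decompression quality and compression rate that the abstract leaves to the user.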
