Science.gov

Sample records for gene genomic structure

  1. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  2. Genomic structure of the human prion protein gene.

    PubMed Central

    Puckett, C; Concannon, P; Casey, C; Hood, L

    1991-01-01

    Creutzfeldt-Jakob disease and Gerstmann-Sträussler syndrome are rare degenerative disorders of the nervous system which have been genetically linked to the prion protein (PrP) gene. The PrP gene encodes a host glycoprotein of unknown function and is located on the short arm of chromosome 20, a region with few known genes or anonymous markers. The complete structure of the PrP gene in man has not been determined despite considerable interest in its relationship to these unusual disorders. We have determined that the human PrP gene has the same simple genomic structure seen in the hamster gene and consists of two exons and a single intron. In contrast to the hamster PrP gene, the human gene appears to have a single major transcriptional start site. The region immediately 5' of the transcriptional start site of the human PrP gene demonstrates the GC-rich features commonly seen in housekeeping genes. Curiously, the genomic clone we have isolated contains a 24-bp deletion that removes one of five octameric peptide repeats predicted to form a β-pleated sheet in this region of the PrP. We have also identified 5' of the PrP gene an RFLP which has a high degree of heterozygosity and which should serve as a useful marker for the pter-12 region of human chromosome 20. PMID:1678248

  3. Genome structure and gene content in protist mitochondrial DNAs.

    PubMed

    Gray, M W; Lang, B F; Cedergren, R; Golding, G B; Lemieux, C; Sankoff, D; Turmel, M; Brossard, N; Delage, E; Littlejohn, T G; Plante, I; Rioux, P; Saint-Louis, D; Zhu, Y; Burger, G

    1998-02-15

    Although the collection of completely sequenced mitochondrial genomes is expanding rapidly, only recently has a phylogenetically broad representation of mtDNA sequences from protists (mostly unicellular eukaryotes) become available. This review surveys the 23 complete protist mtDNA sequences that have been determined to date, commenting on such aspects as mitochondrial genome structure, gene content, ribosomal RNA, introns, transfer RNAs and the genetic code and phylogenetic implications. We also illustrate the utility of a comparative genomics approach to gene identification by providing evidence that orfB in plant and protist mtDNAs is the homolog of atp8, the gene in animal and fungal mtDNA that encodes subunit 8 of the F0 portion of mitochondrial ATP synthase. Although several protist mtDNAs, like those of animals and most fungi, are seen to be highly derived, others appear to have retained a number of features of the ancestral, proto-mitochondrial genome. Some of these ancestral features are also shared with plant mtDNA, although the latter have evidently expanded considerably in size, if not in gene content, in the course of evolution. Comparative analysis of protist mtDNAs is providing a new perspective on mtDNA evolution: how the original mitochondrial genome was organized, what genes it contained, and in what ways it must have changed in different eukaryotic phyla.

  4. Gene3D: comprehensive structural and functional annotation of genomes.

    PubMed

    Yeats, Corin; Lees, Jonathan; Reid, Adam; Kellam, Paul; Martin, Nigel; Liu, Xinhui; Orengo, Christine

    2008-01-01

    Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein-protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk/ PMID:18032434

  5. Recognizing genes and other components of genomic structure

    SciTech Connect

    Burks, C.; Myers, E. (Dept. of Computer Science); Stormo, G.D. (Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are, as composite entities, the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  6. The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels

    PubMed Central

    Chang, Shengxin; Wang, Yankun; Lu, Jiangjie; Gai, Junyi; Li, Jijie; Chu, Pu; Guan, Rongzhan; Zhao, Tuanjie

    2013-01-01

    Determining mitochondrial genomes is important for elucidating vital activities of seed plants. Mitochondrial genomes are specific to each plant species because of their variable size, complex structures and patterns of gene losses and gains during evolution. This complexity has made research on the soybean mitochondrial genome difficult compared with its nuclear and chloroplast genomes. The present study helps to solve a 30-year mystery regarding the most complex mitochondrial genome structure, showing that pairwise rearrangements among the many large repeats may produce an enriched molecular pool of 760 circles in seed plants. The soybean mitochondrial genome harbors 58 genes of known function in addition to 52 predicted open reading frames of unknown function. The genome contains sequences of multiple identifiable origins, including 6.8 kb and 7.1 kb DNA fragments that have been transferred from the nuclear and chloroplast genomes, respectively, and some horizontal DNA transfers. The soybean mitochondrial genome has lost 16 genes, including nine protein-coding genes and seven tRNA genes; however, it has acquired five chloroplast-derived genes during evolution. Four tRNA genes, common among the three genomes, are derived from the chloroplast. Sizeable DNA transfers to the nucleus, with pericentromeric regions as hotspots, are observed, including DNA transfers of 125.0 kb and 151.6 kb identified unambiguously from the soybean mitochondrial and chloroplast genomes, respectively. The soybean nuclear genome has acquired five genes from its mitochondrial genome. These results provide biological insights into the mitochondrial genome of seed plants, and are especially helpful for deciphering vital activities in soybean. PMID:23431381

  7. Assessment of phylogenetic structure in genome size--gene content correlations.

    PubMed

    Prasad, Vibhu Ranjan; Isler, Karin

    2012-05-01

    Gene content and gene-coding percentage can be predicted from genome size in newly sequenced organisms. Here, we investigate whether these predictions are influenced by phylogenetic relationships between the involved species. Combining a highly resolved phylogenetic tree with a large compilation of gene content data, our results reveal the presence of significant phylogenetic structure in the correlations between genome size and gene content in both bacteria and eukaryotes. The variation in log(gene content) explained by log(genome size) in combination with phylogeny was found to be 97% in bacteria and 55% in eukaryotes. Further, in bacteria, gene-coding percentages are only significantly correlated to genome size if phylogenetic information is taken into account in the analyses. These findings support the usage of phylogenetic correlation models for gene content predictions.

  8. The Complete Chloroplast Genome Sequence of Podocarpus lambertii: Genome Structure, Evolutionary Aspects, Gene Content and SSR Detection

    PubMed Central

    Vieira, Leila do Nascimento; Faoro, Helisson; Rogalski, Marcelo; Fraga, Hugo Pacheco de Freitas; Cardoso, Rodrigo Luis Alves; de Souza, Emanuel Maltempi; de Oliveira Pedrosa, Fábio; Nodari, Rubens Onofre; Guerra, Miguel Pedro

    2014-01-01

    Background: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. Methodology/Principal Findings: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. Conclusion: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in the Podocarpaceae cp genome. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of this genus. PMID

  9. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation.

    PubMed

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-06-20

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733
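
    The frameshift and splice-site checks mentioned in this abstract can be illustrated with a small Python sketch. It is not CESAR's algorithm, whose internals the abstract does not describe. The function names and toy sequences are hypothetical, and the sketch assumes a simple pairwise alignment with "-" as the gap character and only the canonical AG/GT splice dinucleotides.

```python
import re


def frameshifting_gaps(ref_aln: str, query_aln: str):
    """List gap runs whose length is not a multiple of 3.

    A gap in the reference row is reported as an insertion in the query,
    a gap in the query row as a deletion. Each run is judged on its own,
    so compensating frameshifts are NOT recognised (a simplification).
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for kind, row in (("insertion", ref_aln), ("deletion", query_aln)):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            if length % 3 != 0:
                hits.append((match.start(), length, kind))
    return sorted(hits)


def has_canonical_splice_sites(acceptor: str, donor: str) -> bool:
    """True if the two bases before the exon are AG and the two after it are GT."""
    return acceptor.upper() == "AG" and donor.upper() == "GT"


# Toy alignment (invented): a 1-bp deletion in the query breaks the frame.
reference = "ATGGCCAAATTT"
query_bad = "ATG-CCAAATTT"
query_ok = "ATG---AAATTT"
print(frameshifting_gaps(reference, query_bad))
print(frameshifting_gaps(reference, query_ok))
```

    In this toy case the single-base gap is flagged while the three-base gap is not. A realigner of the kind the abstract describes would go further and test whether an alternative alignment removes the apparent frameshift.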

  10. Effects of genome structure variation, homeologous genes and repetitive DNA on polyploid crop research in the age of genomics.

    PubMed

    Fu, Donghui; Mason, Annaliese S; Xiao, Meili; Yan, Hui

    2016-01-01

    Compared to diploid species, allopolyploid crop species possess more complex genomes, higher productivity, and greater adaptability to changing environments. Next generation sequencing techniques have produced high-density genetic maps, whole genome sequences, transcriptomes and epigenomes for important polyploid crops. However, several problems interfere with the full application of next generation sequencing techniques to these crops. Firstly, different types of genomic variation affect sequence assembly and QTL mapping. Secondly, duplicated or homoeologous genes can diverge in function and then lead to emergence of many minor QTL, which increases difficulties in fine mapping, cloning and marker assisted selection. Thirdly, repetitive DNA sequences arising in polyploid crop genomes also impact sequence assembly, and are increasingly being shown to produce small RNAs to regulate gene expression and hence phenotypic traits. We propose that these three key features should be considered together when analyzing polyploid crop genomes. It is apparent that dissection of genomic structural variation, elucidation of the function and mechanism of interaction of homoeologous genes, and investigation of the de novo roles of repeat sequences in agronomic traits are necessary for genomics-based crop breeding in polyploids.

  11. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  13. Mammalian homeobox-containing genes: genome organization, structure, expression and evolution.

    PubMed

    Schughart, K; Kappen, C; Ruddle, F H

    1988-12-01

    Mammalian homeo box-containing genes have been isolated by their sequence similarity to Drosophila homeotic selector genes. About 20 murine homeo box genes have been identified to date and their expression and structural organization have been described in detail. Most homeo box gene loci are organized in at least three major gene clusters in the mouse and human genome. The structure of homeo box genes within these clusters is very similar and in this paper the murine Hox-2.2 gene will be discussed as an example. Homeo box genes are expressed in region-specific patterns during different stages of vertebrate development and almost all mammalian homeo box genes are expressed in the central nervous system (CNS) of the developing embryo. Within the developing CNS of mouse embryos the anterior boundaries of expression are specific for each gene. Comparisons of nucleotide and amino acid sequences as well as the analysis of the structural organization of murine and human homeo box genes reveal strong paralogous relationships between genes in different clusters. These findings suggest that the homeo box gene clusters evolved in two steps. First, an ancestral gene cluster was created by duplications of individual genes along one linkage group and in a subsequent step duplications of the ancestral gene complex gave rise to the three (or possibly four) gene clusters observed in mouse and human to date. The possibility of the homeo box genes representing a functional array of genetic switches will be discussed.

  14. Genomic structure of the human PAX2 gene

    SciTech Connect

    Sanyanusin, P.; Norrish, J.H.; Ward, T.A.

    1996-07-01

    Recent evidence indicates that Fgf8 is expressed during vertebrate development in multiple locations involved in the patterning and outgrowth of important embryo structures. Cloning and analysis of the murine gene revealed at least eight potential protein isoforms that share a common carboxyl region, encoded by exons 2 and 3, but possess different amino termini, generated by alternative splicing of RNA encoded by multiple 5' exons (exons 1A, 1B, 1C, the human FGF8 gene). Human FGF-8 isoforms are identical to their murine counterparts in the common carboxyl region. Four of the human isoforms are identical to, or very similar to, the murine isoforms in the amino termini. However, four of the potential murine isoforms do not have corresponding human isoforms due to marked sequence divergence, leading to a blocked reading frame in exon 1B of FGF8. The lack of the four murine isoforms in humans raises the question of their function in murine development. 18 refs., 2 figs.

  15. Dinoflagellate Gene Structure and Intron Splice Sites in a Genomic Tandem Array.

    PubMed

    Mendez, Gregory S; Delwiche, Charles F; Apt, Kirk E; Lippmeier, J Casey

    2015-01-01

    Dinoflagellates are one of the last major lineages of eukaryotes for which little is known about genome structure and organization. We report here the sequence and gene structure of a clone isolated from a cosmid library which, to our knowledge, represents the largest contiguously sequenced, dinoflagellate genomic, tandem gene array. These data, combined with information from a large transcriptomic library, allowed a high level of confidence of every base pair call. This degree of confidence is not possible with PCR-based contigs. The sequence contains an intron-rich set of five highly expressed gene repeats arranged in tandem. One of the tandem repeat gene members contains an intron 26,372 bp long. This study characterizes a splice site consensus sequence for dinoflagellate introns. Two to nine base pairs around the 3' splice site are repeated by an identical two to nine base pairs around the 5' splice site. The 5' and 3' splice sites are in the same locations within each repeat so that the repeat is found only once in the mature mRNA. This identically repeated intron boundary sequence might be useful in gene modeling and annotation of genomes.
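
    The repeated intron-boundary sequence described in this abstract can be made concrete with a toy example. The Python sketch below is only an illustration under stated assumptions: the helper names and all sequences are invented, coordinates are 0-based and half-open, and nothing here reproduces the authors' analysis or their reported consensus.

```python
def splice(genomic: str, intron_start: int, intron_end: int) -> str:
    """Remove genomic[intron_start:intron_end] and join the flanking exons."""
    return genomic[:intron_start] + genomic[intron_end:]


def boundary_repeat(genomic: str, intron_start: int, intron_end: int) -> str:
    """Return the sequence around the 5' splice site that recurs around the 3' site.

    The match is extended base by base to the left and to the right of both
    splice positions for as long as the two boundaries agree.
    """
    intron_len = intron_end - intron_start
    left = 0
    while (left < intron_len and intron_start - left - 1 >= 0
           and genomic[intron_start - left - 1] == genomic[intron_end - left - 1]):
        left += 1
    right = 0
    while (right < intron_len and intron_end + right < len(genomic)
           and genomic[intron_start + right] == genomic[intron_end + right]):
        right += 1
    return genomic[intron_start - left:intron_start + right]


# Invented toy gene: exon, repeat, intron body, repeat, exon.
exon1, repeat, body, exon2 = "ATGCCT", "CAGGTA", "TTTCCCTTTCCC", "CGTTAA"
genomic = exon1 + repeat + body + repeat + exon2

# Place both splice sites at the same offset (3) inside each repeat copy.
start = len(exon1) + 3
end = len(exon1) + len(repeat) + len(body) + 3

mrna = splice(genomic, start, end)
print(boundary_repeat(genomic, start, end))  # the shared boundary sequence
print(mrna.count(repeat))                    # copies of the repeat left in the mRNA
```

    With this construction the mature sequence equals exon1 + repeat + exon2, so the repeat survives exactly once, and shifting both splice positions by the same amount anywhere inside the repeat gives the same product. That ambiguity is why such a repeat could be a useful cue when modeling genes.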

  16. Refined physical mapping and genomic structure of the EXTL1 gene.

    PubMed

    Wuyts, W; Spieker, N; Van Roy, N; De Boulle, K; De Paepe, A; Willems, P J; Van Hul, W; Versteeg, R; Speleman, F

    1999-01-01

    Recently, the EXTL1 gene, a member of the EXT tumor suppressor gene family, has been mapped to 1p36, a chromosome region which is frequently implicated in a wide variety of malignancies, including breast carcinoma, colorectal cancer and neuroblastoma. In this study, we show that the EXTL1 gene is located between the genetic markers D1S511 and D1S234 within 200 kb of the LAP18 gene on chromosome 1p36.1, a region which has been proposed to harbor a tumor suppressor gene implicated in MYCN-amplified neuroblastomas. In addition, we determined the genomic structure of the EXTL1 gene, revealing that the EXTL1 coding sequence spans 11 exons within a 50-kb region.

  17. Physical mapping and genomic structure of the human TNFR2 gene

    SciTech Connect

    Beltinger, C.P.; White, P.S.; Maris, J.M.

    1996-07-01

    The tumor necrosis factor receptor 2 (TNFR2) gene localizes to 1p36.2, a genomic region characteristically deleted in neuroblastomas and other malignancies. In addition, TNFR2 is the principal mediator of the effects of TNF on cellular immunity, and it may cooperate with TNFR1 in the killing of nonlymphoid cells. Therefore, we undertook an analysis of the genomic structure and precise physical mapping of this gene. The TNFR2 gene is contained on 10 exons that span 26 kb. Most of the functional domains of TNFR2 are encoded by separate exons, and each of the repeats of the extracellular cysteine-rich domain is interrupted by an intron. The genomic structure reveals a close relationship to TNFR1, another member of the TNFR superfamily. Based on electrophoretic analysis of yeast artificial chromosomes, TNFR2 maps within 400 kb of the genetic marker D1S434. In addition, we have identified a new polymorphic dinucleotide repeat within intron 4 of TNFR2. The genetic sequence information and exon-intron boundaries we have determined will facilitate mutational analysis of this gene to determine its potential role in neuroblastoma, as well as in other cancers with characteristic deletions or rearrangements of 1p36. 52 refs., 3 figs., 1 tab.

  18. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  19. Genomic structure and chromosomal assignment of the mouse Ku70 gene

    SciTech Connect

    Takiguchi, Yuichi; Kurimasa, Akihiro; Chen, Fanqing

    1996-07-01

    DNA-dependent protein kinase (DNA-PK) consists of three polypeptide subunits: Ku70, Ku80, and the DNA-PK catalytic subunit (DNA-PKcs). Mammalian mutants deficient in either Ku80 or DNA-PKcs function have been shown to be lacking in DNA double-strand break repair and V(D)J recombination, respectively. The precise role of the Ku70 gene in this process has not yet been determined, in part because no cell lines, animals, or human diseases involved with deficiencies in this gene have yet been identified. Both the human and the mouse Ku70 cDNAs have been cloned, and the human gene has been mapped to chromosome 22q13. The original mouse cDNA clones, however, lacked a complete 5'-region, and none of the mammalian Ku70 genomic sequences have been characterized. This report contains an analysis of the 5'-region of the mouse cDNA sequence, a characterization of the mouse Ku70 genomic structure, and fluorescence in situ hybridization data that map the mouse gene to chromosome 15. The deduced amino acid sequence of the mouse gene consists of 608 amino acids compared to 609 for the human gene. The genomic sequence is 24 kb and consists of 13 exons, including an untranslated first exon. Sequences from the upstream region of exon 1 revealed four consensus GC box sequences and a strong transcription initiation site at a reasonable location. The assignment of the mouse Ku70 gene to chromosome 15 is consistent with the syntenic relationship of this gene in human (chromosome 22q13) and mouse and adds to the comparative mapping data for the genes involved in the SCID phenotype. 39 refs., 3 figs.

  20. Genomic structure and chromosomal mapping of the murine CD40 gene

    SciTech Connect

    Grimaldi, J.C.; Chang, R.; Howard, M.; Cockayne, D.A.; Torres, R.; Clark, E.A.; Kozak, C.A.

    1992-12-15

    The B cell-associated surface molecule, CD40, is likely to play a central role in the expansion of Ag-stimulated B cells, and their interaction with activated Th cells. In this study the authors have isolated genomic clones of murine CD40 from a mouse liver genomic DNA library. Comparison with the murine CD40 cDNA sequence revealed the presence of nine exons that together contain the entire murine CD40 coding region, and span approximately 16.3 kb of genomic DNA. The intron/exon structure of the CD40 gene resembles that of the low affinity nerve growth factor receptor gene, a close homolog of both human and murine CD40. In both cases the functional domains of the receptor molecules are separated onto different exons throughout the genes. Southern blot analysis demonstrated that murine CD40 is a single copy gene that maps in the distal region of mouse chromosome 2. 58 refs., 4 figs., 1 tab.

  1. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  2. Structure and organization of Marchantia polymorpha chloroplast genome. I. Cloning and gene identification.

    PubMed

    Ohyama, K; Fukuzawa, H; Kohchi, T; Sano, T; Sano, S; Shirai, H; Umesono, K; Shiki, Y; Takeuchi, M; Chang, Z

    1988-09-20

    We have determined the complete nucleotide sequence of chloroplast DNA from a liverwort, Marchantia polymorpha, using a clone bank of chloroplast DNA fragments. The circular genome consists of 121,024 base-pairs and includes two large inverted repeats (IRA and IRB, each 10,058 base-pairs), a large single-copy region (LSC, 81,095 base-pairs), and a small single-copy region (SSC, 19,813 base-pairs). The nucleotide sequence was analysed with a computer to deduce the entire gene organization, assuming the universal genetic code and the presence of introns in the coding sequences. We detected 136 possible genes, 103 gene products of which are related to known stable RNA or protein molecules. Stable RNA genes for four species of ribosomal RNA and 32 species of tRNA were located, although one of the tRNA genes may be defective. Twenty genes encoding polypeptides involved in photosynthesis and electron transport were identified by comparison with known chloroplast genes. Twenty-five open reading frames (ORFs) show structural similarities to Escherichia coli RNA polymerase subunits, 19 ribosomal proteins and two related proteins. Seven ORFs are comparable with human mitochondrial NADH dehydrogenase genes. A computer-aided homology search predicted possible chloroplast homologues of bacterial proteins; two ORFs for bacterial 4Fe-4S-type ferredoxin, two for distinct subunits of a protein-dependent transport system, one ORF for a component of nitrogenase, and one for an antenna protein of a light-harvesting complex. The other 33 ORFs, consisting of 29 to 2136 codons, remain to be identified, but some of them seem to be conserved in evolution. Detailed information on gene identification is presented in the accompanying papers. We postulated that there were 22 introns in 20 genes (8 tRNA genes and 12 ORFs), which may be classified into the groups I and II found in fungal mitochondrial genes. The structural gene for ribosomal protein S12 is trans-split on the opposite DNA strand
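
    The region lengths quoted in this abstract can be checked against the reported genome size with simple arithmetic. The sketch below is illustrative only: the function and constant names are hypothetical, and the only inputs are the figures stated in the abstract (two inverted repeats, one large and one small single-copy region, 136 possible genes of which 103 are related to known products).

```python
# Region lengths in base pairs, as quoted in the abstract above.
INVERTED_REPEAT_BP = 10_058      # each of IRA and IRB
LARGE_SINGLE_COPY_BP = 81_095    # LSC
SMALL_SINGLE_COPY_BP = 19_813    # SSC
REPORTED_GENOME_BP = 121_024

# Gene counts, as quoted in the abstract above.
POSSIBLE_GENES = 136
GENES_WITH_KNOWN_RELATIVES = 103


def circular_genome_size(ir_bp: int, lsc_bp: int, ssc_bp: int) -> int:
    """Sum the four regions of a genome with two inverted repeats (hypothetical helper)."""
    return 2 * ir_bp + lsc_bp + ssc_bp


total_bp = circular_genome_size(
    INVERTED_REPEAT_BP, LARGE_SINGLE_COPY_BP, SMALL_SINGLE_COPY_BP
)
unidentified_orfs = POSSIBLE_GENES - GENES_WITH_KNOWN_RELATIVES

print(total_bp, total_bp == REPORTED_GENOME_BP)
print(unidentified_orfs)
```

    Under these figures the four regions sum to the reported 121,024 base-pairs, and the difference between the two gene counts matches the 33 ORFs described as remaining to be identified.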

  3. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    SciTech Connect

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5'-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promoter for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.

  4. Domain organization, genomic structure, evolution, and regulation of expression of the aggrecan gene family.

    PubMed

    Schwartz, N B; Pirok, E W; Mensch, J R; Domowicz, M S

    1999-01-01

    Proteoglycans are complex macromolecules, consisting of a polypeptide backbone to which are covalently attached one or more glycosaminoglycan chains. Molecular cloning has allowed identification of the genes encoding the core proteins of various proteoglycans, leading to a better understanding of the diversity of proteoglycan structure and function, as well as to the evolution of a classification of proteoglycans on the basis of emerging gene families that encode the different core proteins. One such family includes several proteoglycans that have been grouped with aggrecan, the large aggregating chondroitin sulfate proteoglycan of cartilage, based on a high number of sequence similarities within the N- and C-terminal domains. Thus far these proteoglycans include versican, neurocan, and brevican. It is now apparent that these proteins, as a group, are truly a gene family with shared structural motifs on the protein and nucleotide (mRNA) levels, and with nearly identical genomic organizations. Clearly a common ancestral origin is indicated for the members of the aggrecan family of proteoglycans. However, differing patterns of amplification and divergence have also occurred within certain exons across species and family members, leading to the class-characteristic protein motifs in the central carbohydrate-rich region exclusively. Thus the overall domain organization strongly suggests that sequence conservation in the terminal globular domains underlies common functions, whereas differences in the central portions of the genes account for functional specialization among the members of this gene family.

  5. The population genomics of begomoviruses: global scale population structure and gene flow

    PubMed Central

    2010-01-01

    Background The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could facilitate population genetics studies

  6. Analysis of the murine Dtk gene identifies conservation of genomic structure within a new receptor tyrosine kinase subfamily

    SciTech Connect

    Lewis, P.M.; Crosier, K.E.; Crosier, P.S.

    1996-01-01

    The receptor tyrosine kinase Dtk/Tyro 3/Sky/rse/brt/tif is a member of a new subfamily of receptors that also includes Axl/Ufo/Ark and Eyk/Mer. These receptors are characterized by the presence of two immunoglobulin-like loops and two fibronectin type III repeats in their extracellular domains. The structure of the murine Dtk gene has been determined. The gene consists of 21 exons that are distributed over 21 kb of genomic DNA. An isoform of Dtk is generated by differential splicing of exons from the 5' region of the gene. The overall genomic structure of Dtk is virtually identical to that determined for the human UFO gene. This particular genomic organization is likely to have been duplicated and closely maintained throughout evolution. 38 refs., 3 figs., 1 tab.

  7. A highly conserved gene island of three genes on chromosome 3B of hexaploid wheat: diverse gene function and genomic structure maintained in a tightly linked block

    PubMed Central

    2010-01-01

    Background The complexity of the wheat genome has resulted from waves of retrotransposable element insertions. Gene deletions and disruptions generated by the fast replacement of repetitive elements in wheat have resulted in disruption of colinearity at a micro (sub-megabase) level among the cereals. In view of genomic changes that are possible within a given time span, conservation of genes between species tends to imply an important functional or regional constraint that does not permit a change in genomic structure. The ctg1034 contig completed in this paper was initially studied because it was assigned to the Sr2 resistance locus region, but detailed mapping studies subsequently assigned it to the long arm of 3B and revealed its unusual features. Results BAC shotgun sequencing of the hexaploid wheat (Triticum aestivum cv. Chinese Spring) genome has been used to assemble a group of 15 wheat BACs from the chromosome 3B physical map FPC contig ctg1034 into a 783,553 bp genomic sequence. This ctg1034 sequence was annotated for biological features such as genes and transposable elements. A three-gene island was identified among >80% repetitive DNA sequence. Bioinformatics analysis revealed no observable similarity in the functions of the three genes. The ctg1034 gene island also displayed complete conservation of gene order and orientation with syntenic gene islands found in publicly available genome sequences of Brachypodium distachyon, Oryza sativa, Sorghum bicolor and Zea mays, even though the intergenic space and introns were divergent. Conclusion We propose that ctg1034 is located within the heterochromatic C-band region of deletion bin 3BL7 based on the identification of heterochromatic tandem repeats and presence of significant matches to chromodomain-containing gypsy LTR retrotransposable elements. We also speculate that this location, among other highly repetitive sequences, may account for the relative stability in gene order and orientation within the gene

  8. Comparative mapping, genomic structure, and expression analysis of eight pseudo-response regulator genes in Brassica rapa.

    PubMed

    Kim, Jin A; Kim, Jung Sun; Hong, Joon Ki; Lee, Yeon-Hee; Choi, Beom-Soon; Seol, Young-Joo; Jeon, Chang Hoo

    2012-05-01

    Circadian clocks regulate plant growth and development in response to environmental factors. In this function, clocks influence the adaptation of species to changes in location or climate. Circadian-clock genes have been the subject of intense study in models such as Arabidopsis thaliana but the results may not necessarily reflect clock functions in species with polyploid genomes, such as Brassica species, that include multiple copies of clock-related genes. The triplicate genome of Brassica rapa retains high sequence-level co-linearity with Arabidopsis genomes. In B. rapa we had previously identified five orthologs of the five known Arabidopsis pseudo-response regulator (PRR) genes that are key regulators of the circadian clock in this species. Three of these B. rapa genes, BrPRR1, BrPRR5, and BrPRR7, are present in two copies each in the B. rapa genome, for a total of eight B. rapa PRR (BrPRR) orthologs. We have now determined sequences and expression characteristics of the eight BrPRR genes and mapped their positions in the B. rapa genome. Although both members of each paralogous pair exhibited the same expression pattern, some variation in their gene structures was apparent. The BrPRR genes are tightly linked to several flowering genes. The knowledge about genome location, copy number variation and structural diversity of these B. rapa clock genes will improve our understanding of clock-related functions in this important crop. This will facilitate the development of Brassica crops for optimal growth in new environments and under changing conditions.

  9. Genes, genome and Gestalt.

    PubMed

    Grisolia, Cesar Koppe

    2005-01-01

    According to Gestalt thinking, biological systems cannot be viewed as the sum of their elements, but as processes of the whole. To understand organisms we must start from the whole, observing how the various parts are related. In genetics, we must observe the genome over and above the sum of its genes. Either loss or addition of one gene in a genome can change the function of the organism. Genomes are organized in networks of genes, which need to be well integrated. In the case of genetically modified organisms (GMOs), for example, soybeans, rats, Anopheles mosquitoes, and pigs, the insertion of an exogenous gene into a receptive organism generally causes disturbance in the networks, resulting in the breakdown of gene interactions. In these cases, genetic modification increased the genetic load of the GMO and consequently decreased its adaptability (fitness). Therefore, it is hard to claim that the production of such organisms with an increased genetic load does not have ethical implications.

  10. Human Txk: genomic organization, structure and contiguous physical linkage with the Tec gene.

    PubMed

    Ohta, Y; Haire, R N; Amemiya, C T; Litman, R T; Träger, T; Riess, O; Litman, G W

    1996-02-15

    Txk is a Tec-family tyrosine kinase expressed in mouse and human T lymphocytes. Among the Tec kinases, Txk is unique in that its amino terminal region does not include a pleckstrin homology domain or other known extended functional region. Txk is encoded at human chromosome 4p12 and at a recognized region of conserved synteny on mouse chromosome 5. The genomic organization of Txk consists of 15 exons with strong exon-intron organizational homology to Btk, the only other Tec-family kinase for which the genomic structure is fully known. The human Tec gene also maps to 4p12 and, based on limited studies reported here, possesses organizational homology with Btk and Txk. We have sequenced a continuous region of DNA that contains 3' Tec and 5' Txk exons separated by only an approximately 1.5 kb intergenic region containing the putative promoter region of Txk. The close physical linkage of these Tec-family tyrosine kinases, which are expressed in different hematopoietic cell lineages, suggests their potential for coordinate cis-regulation.

  11. The mouse formin (Fmn) gene: Genomic structure, novel exons, and genetic mapping

    SciTech Connect

    Wang, C.C.; Chan, D.C.; Leder, P.

    1997-02-01

    Mutations in the mouse formin (Fmn) gene, formerly known as the limb deformity (ld) gene, give rise to recessively inherited limb deformities and renal malformations or aplasia. The Fmn gene encodes many differentially processed transcripts that are expressed in both adult and embryonic tissues. To study the genomic organization of the Fmn locus, we have used Fmn probes to isolate and characterize genomic clones spanning 500 kb. Our analysis of these clones shows that the Fmn gene is composed of at least 24 exons and spans 400 kb. We have identified two novel exons that are expressed in the developing embryonic limb bud as well as adult tissues such as brain and kidney. We have also used a microsatellite polymorphism from within the Fmn gene to map it genetically to a 2.2-cM interval between D2Mit58 and D2Mit103. 36 refs., 6 figs., 1 tab.

  12. Genome-Wide Analysis of the Expansin Gene Superfamily Reveals Grapevine-Specific Structural and Functional Characteristics

    PubMed Central

    Tornielli, Giovanni Battista; Fasoli, Marianna; Venturini, Luca; Pezzotti, Mario; Zenoni, Sara

    2013-01-01

    Background Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP) comprises four distinct families: expansin A (EXPA), expansin B (EXPB), expansin-like A (EXLA) and expansin-like B (EXLB). There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera) genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. Methodology/Principal Findings We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon–intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa), compared to those from Arabidopsis thaliana and rice (Oryza sativa). We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. Conclusion Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the functional

  13. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    SciTech Connect

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A; Lindroth, Richard L; Yuan, Yinan

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  14. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    PubMed

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants. PMID:26943367

  16. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models

    PubMed Central

    Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-01-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants. PMID:26943367

  17. Human tissue factor pathway inhibitor (TFPI) gene: Complete genomic structure and localization on the genetic map of chromosome 2q

    SciTech Connect

    Enjyoji, Kei-ichi; Emi, Mitsuru; Mukai, Tsunehiro; Imada, Motohiro; Kato, Hisao; Leppert, M.L.; Lalouel, J.M. (Univ. of Utah Medical School, Salt Lake City, UT)

    1993-08-01

    Tissue factor pathway inhibitor (TFPI), a protease inhibitor that circulates in association with plasma lipoproteins (VLDL, LDL and HDL), helps to regulate the extrinsic blood coagulation cascade. The authors have cloned a 125-kb genomic region containing the entire human TFPI gene on six overlapping cosmids and prepared a restriction map of this contig to clarify gene structure. More than half (45 kb) of the 85-kb gene is occupied with 5' noncoding elements: coding begins at exon 3. A HindIII RFLP identified with one cosmid was genotyped in the CEPH panel of 559 reference families. Linkage analysis using markers on human chromosome 2 located the TFPI gene on 2q, 36 cM proximal to D2S43(pYNZ15) and 13 cM distal to the crystalline γ-polypeptide locus CRYGP1(p5G1). 31 refs., 3 figs., 3 tabs.

  18. Characterization of the genomic structure of the mouse APLP1 gene

    SciTech Connect

    Zhong, Sue; Wu, Kuo; Black, I.B.; Schaar, D.G.

    1996-02-15

    This article reports on the organization of the mouse APLP1 gene, which encodes an evolutionarily conserved amyloid precursor-like protein. The amyloid beta protein, important in Alzheimer disease, is derived from these precursor proteins. It is hoped that investigating the expression and structure of this murine gene will reveal more about the function and regulation of the human homologue. 27 refs., 2 figs.

  19. Genomic organization and chromosomal localization of the murine 2 P domain potassium channel gene Kcnk8: conservation of gene structure in 2 P domain potassium channels.

    PubMed

    Bockenhauer, D; Nimmakayalu, M A; Ward, D C; Goldstein, S A; Gallagher, P G

    2000-12-31

    A 2 P domain potassium channel expressed in eye, lung, and stomach, Kcnk8, has recently been identified. To initiate further biochemical and genetic studies of this channel, we assembled the murine Kcnk8 cDNA sequence, characterized the genomic structure of the Kcnk8 gene, determined its chromosomal localization, and analyzed its activity in a Xenopus laevis oocyte expression system. The composite cDNA has an open reading frame of 1029 bp and encodes a protein of 343 amino acids with a predicted molecular mass of 36 kDa. Structure analyses predict 2 P domains and four potential transmembrane helices with a potential single EF-hand motif and four potential SH3-binding motifs in the COOH-terminus. Cloning of the Kcnk8 chromosomal gene revealed that it is composed of three exons distributed over 4 kb of genomic DNA. Genome database searching revealed that one of the intron/exon boundaries identified in Kcnk8 is present in other mammalian 2 P domain potassium channel genes and many C. elegans 2 P domain potassium channel genes, revealing evolutionary conservation of gene structure. Using fluorescence in situ hybridization, the murine Kcnk8 gene was mapped to chromosome 19, 2B, the locus of the murine dancer phenotype, and syntenic to 11q11-11q13, the location of the human homologue. No significant currents were generated in a Xenopus laevis oocyte expression system using the composite Kcnk8 cDNA sequence, suggesting that, as with many potassium channels, additional channel subunits, modulator substances, or cellular chaperones are required for channel function.
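
    The relationship between the reported open reading frame length and protein length follows from the triplet genetic code. The sketch below is a minimal illustration, not the authors' method: the helper name is hypothetical, and it assumes the quoted 1029 bp counts only amino-acid-coding codons (the abstract does not say whether a stop codon is included).

```python
CODON_LENGTH_BP = 3  # one amino acid is specified by three nucleotides


def protein_length_from_orf(orf_bp: int) -> int:
    """Return the number of codons in an ORF of the given length (hypothetical helper).

    Assumes every codon in the stated length encodes an amino acid.
    """
    if orf_bp <= 0 or orf_bp % CODON_LENGTH_BP != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // CODON_LENGTH_BP


# Figure quoted in the abstract above: a 1029 bp open reading frame.
kcnk8_residues = protein_length_from_orf(1029)
print(kcnk8_residues)
```

    Under that assumption the result agrees with the 343 amino acids stated in the abstract.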

  20. The mouse Muc5b mucin gene: cDNA and genomic structures, chromosomal localization and expression.

    PubMed Central

    Escande, Fabienne; Porchet, Nicole; Aubert, Jean-Pierre; Buisine, Marie-Pierre

    2002-01-01

    We report here the isolation and characterization of the mouse Muc5b mucin gene (mMuc5b). We determined its complete cDNA sequence, its genomic organization, and chromosomal localization. Moreover, we analyzed the expression of this gene by reverse-transcription PCR and in situ hybridization. The structure of the gene was determined from a genomic cosmid clone that encompasses the entire mMuc5b gene, including the 5'-flanking region. The mMuc5b gene spans approximately 36 kb and contains 49 exons. It is located on mouse distal chromosome 7. mMuc5b encodes at least two transcripts by alternative splicing of the second exon, the longest one being 14.9 kb in length. The deduced peptide contains 4782 amino acids. Its central region can be subdivided into 10 imperfect repeats, each composed of a cysteine-rich domain followed by a threonine, serine, and proline-rich mucin-type domain. It is flanked by cysteine-rich domains similar to cysteine-rich domains of pre-pro-von Willebrand factor. Comparison with its human homologue MUC5B revealed common features including high sequence similarities in the 5' and 3' regions, and the conservation of the genomic organization. In contrast, mMuc5b differs from its human homologue, since no highly tandemly repeated sequences could be identified within its central region. mMuc5b is expressed mainly in laryngeal mucous glands, and to a lesser extent in stomach and duodenum. PMID:11964160

  1. Evolutionary genomics: transdomain gene transfers.

    PubMed

    Bordenstein, Seth R

    2007-11-01

    Biologists have until now conceded that bacterial gene transfer to multicellular animals is relatively uncommon in Nature. A new study showing promiscuous insertions of bacterial endosymbiont genes into invertebrate genomes ushers in a shift in this paradigm.

  2. Genomic structure, gene expression, and promoter analysis of human multidrug resistance-associated protein 7

    SciTech Connect

    Kao, Hsin-Hsin; Chang, Ming-Shi; Cheng, Jan-Fang; Huang, Jin-Ding

    2002-03-15

    The multidrug resistance-associated protein (MRP) subfamily transporters, which are associated with anticancer drug efflux, have been implicated in the multidrug resistance of cancer cells. The genomic organization of human multidrug resistance-associated protein 7 (MRP7) was identified. The human MRP7 gene, consisting of 22 exons and 21 introns, greatly differs from other members of the human MRP subfamily. A splicing variant of human MRP7, MRP7A, expressed in most human tissues, was also characterized. The 1.93-kb promoter region of MRP7 was isolated and shown to support luciferase activity at a level 4- to 5-fold greater than that of the SV40 promoter. Basal MRP7 gene expression was regulated by 2 regions in the 5'-flanking region, at 1,780 to 1,287 bp and at 611 to 208 bp. In Madin-Darby canine kidney (MDCK) cells, MRP7 promoter activity was increased by 226 percent by genotoxic 2-acetylaminofluorene and 347 percent by the histone deacetylase inhibitor, trichostatin A. The protein was expressed in the membrane fraction of transfected MDCK cells.

  3. From genes to genome biology

    SciTech Connect

    Pennisi, E.

    1996-06-21

    This article describes a change in the approach to mapping genomes, from looking at one gene at a time, to other approaches. Strategies include everything from lab techniques to computer programs designed to analyze whole batches of genes at once. Also included is an update on the work on the human genome.

  4. Genomic structure of the choroideremia (CHM) gene and mutation analysis in CHM patients

    SciTech Connect

    Bokhoven, H. van; Hurk, J. van den; Bogerd, L.

    1994-09-01

    We have isolated the complete open reading frame (ORF) of the choroideremia (CHM) gene and elucidated its exon-intron structure. The ORF of the CHM gene is located on 15 exons and encodes a protein of 653 amino acids. Among 75 CHM patients investigated for large structural abnormalities, 15 (20%) showed deletions of one or more exons of the gene. The deletions vary in size from a few kb spanning one exon to more than 10 megabases encompassing a large part of Xq21. In addition, we have positioned the X-chromosomal breakpoint in a CHM female with an X;7 translocation between exons 3 and 4. Fine mapping of the deletions indicates that there is no clustering of deletion breakpoints. Moreover, only 2 deletions are located entirely within the CHM gene, indicating that most deletions can be detected by PCR amplification of exons 1 and 15. From within the CHM gene we identified two microsatellite markers, a (CA)n- and a [(TA)4-12C]n-like repeat, which should be very valuable for CHM diagnostics in clear-cut CHM families. In patients in which the diagnosis of choroideremia is less obvious, mutation analysis can be performed by PCR-SSCP analysis and direct sequencing. The feasibility of this approach was illustrated by the finding of 10 causative mutations in 12 Danish CHM families investigated. Interestingly, all CHM gene mutations detected thus far give rise to the introduction of a premature stop codon. Missense mutations thus far have not been found.

  5. Genomic clones of Aspergillus nidulans containing alcA, the structural gene for alcohol dehydrogenase and alcR, a regulatory gene for ethanol metabolism.

    PubMed

    Doy, C H; Pateman, J A; Olsen, J E; Kane, H J; Creaser, E H

    1985-04-01

    Our aim was to obtain from Aspergillus nidulans a genomic bank and then clone a region we expected from earlier genetic mapping to contain two closely linked genes, alcA, the structural gene for alcohol dehydrogenase (ADH) and alcR, a positive trans-acting regulatory gene for ethanol metabolism. The expression of alcA is repressed by carbon catabolites. A genomic restriction fragment characteristic of the alcA-alcR region was identified, cloned in pBR322, and used to select from a genomic bank in lambda EMBL3A three overlapping clones covering 24 kb of DNA. Southern genomic analysis of wild-type, alcA and alcR mutants showed that the mutants contained extra DNA at sites near the center of the cloned DNA and are close together, as expected for alcA and alcR. Transcription from the cloned DNA and hybridization with a clone carrying the Saccharomyces cerevisiae gene for ADHI (ADC1) are both confined to the alcA-alcR region. At least one of several species of mature mRNA is about 1 kb, the size required to code for ADH. For all species, carbon catabolite repression overrides control by induction. The overall characteristics of transcription, hybridization to ADC1 and earlier work suggest that alcA consists of a number of exons and/or that the alcA-alcR region represents a cluster of alcA-related genes or sequences.

  6. Genomic structure of PEX13, a candidate peroxisome biogenesis disorder gene.

    PubMed

    Björkman, J; Stetten, G; Moore, C S; Gould, S J; Crane, D I

    1998-12-15

    The peroxisome biogenesis disorders (PBDs) are a set of lethal genetic diseases characterized by peroxisomal metabolic deficiencies, multisystem abnormalities, mental retardation, and premature death. These disorders are genetically heterogeneous and are caused by mutations in genes, termed PEX genes, required for import of proteins into the peroxisomal matrix. We have previously reported the identification of human PEX13, the gene encoding the docking factor for the PTS1 receptor, or PEX5 protein. As such, mutations in PEX13 would be expected to abrogate peroxisomal protein import and result in PBD phenotypes. We report here the structure of the human PEX13 gene. PEX13 spans approximately 11 kb on chromosome 2 and contains four exons, one more than previously thought. The corrected PEX13 cDNA is predicted to encode a protein product with a molecular mass of 44,312 Da. We examined the ability of PEX13 expression to rescue the peroxisomal protein import defects of fibroblast cells representing all known PBD complementation groups. No complementation was observed, suggesting that this gene is not mutated in any set of existing patients. However, given that complementation group assignments have been determined for only a subset of PBD patients, it is possible that PEX13-deficient patients may exist at a low frequency within our existing PBD patient population or within ethnic groups underrepresented in our patient pool.

  7. Genomic structure and chromosomal mapping of the human CD22 gene

    SciTech Connect

    Wilson, G.L.; Kozlow, E.; Kehrl, J.H.; Najfeld, V.; Menniger, J.; Ward, D.

    1993-06-01

    The human CD22 gene is expressed specifically in B lymphocytes and likely has an important function in cell-cell interactions. A nearly full length human CD22 cDNA clone was used to isolate genomic clones that span the CD22 gene. The CD22 gene is spread over 22 kb of DNA and is composed of 15 exons. The first exon contains the major transcriptional start sites. The translation initiation codon is located in exon 3, which also encodes a portion of the signal peptide. Exons 4 to 10 encode the seven Ig domains of CD22, exon 11 encodes the transmembrane domain, exons 12 to 15 encode the intracytoplasmic domain of CD22, and exon 15 also contains the 3' untranslated region. A minor form of CD22 mRNA likely results from splicing of exon 5 to exon 8, skipping exons 6 and 7. A 4.6-kb XbaI fragment of the CD22 gene was used to map the chromosomal location of CD22 by fluorescence in situ hybridization. The hybridization locus was identified by combining fluorescent images of the probe with the chromosomal banding pattern generated by an Alu probe. The results demonstrate that CD22 is located within the band region q13.1 of chromosome 19. Two closely clustered major transcription start sites and several minor start sites were mapped by primer extension. Similarly to many other lymphoid-specific genes, the CD22 promoter lacks an obvious TATA box. Approximately 4 kb of DNA 5' of the transcription start sites were sequenced and found to contain multiple Alu elements. Potential binding sites for the transcriptional factors NF-kB, AP-1, and Oct-2 are located within 300 bp 5' of the major transcription start sites. A 400-bp fragment (bp -339 through +71) of the CD22 promoter region was subcloned into a pGEM-chloramphenicol acetyltransferase vector and after transfection into B and T cells was found to be active in both B and T cells. 45 refs., 7 figs., 2 tabs.

  8. Visualizing conserved gene location across microbe genomes

    NASA Astrophysics Data System (ADS)

    Shaw, Chris D.

    2009-01-01

    This paper introduces an analysis-based zoomable visualization technique for displaying the location of genes across many related species of microbes. The purpose of this visualization is to enable a biologist to examine the layout of genes in the organism of interest with respect to the gene organization of related organisms. During the genomic annotation process, the ability to observe gene organization in common with previously annotated genomes can help a biologist better confirm the structure and function of newly analyzed microbe DNA sequences. We have developed a visualization and analysis tool that enables the biologist to observe and examine gene organization among genomes, in the context of the primary sequence of interest. This paper describes the visualization and analysis steps, and presents a case study using a number of Rickettsia genomes.

  9. Localization, expression and genomic structure of the gene encoding the human serine protease testisin.

    PubMed

    Hooper, J D; Bowen, N; Marshall, H; Cullen, L M; Sood, R; Daniels, R; Stuttgen, M A; Normyle, J F; Higgs, D R; Kastner, D L; Ogbourne, S M; Pera, M F; Jazwinska, E C; Antalis, T M

    2000-06-21

    Testisin is a recently identified human serine protease expressed by premeiotic testicular germ cells and is a candidate tumor suppressor for testicular cancer. Here, we report the characterization of the gene encoding testisin, designated PRSS21, and its localization on the short arm of human chromosome 16 (16p13.3) between the microsatellite marker D16S246 and the radiation hybrid breakpoint CY23HA. We have further refined the localization to cosmid 406D6 in this interval and have established that the gene is approximately 4.5 kb in length, and contains six exons and five intervening introns. The structure of PRSS21 is very similar to the human prostasin gene (PRSS8) which maps nearby on 16p11.2, suggesting that these genes may have evolved through gene duplication. Sequence analysis showed that the two known isoforms of testisin are generated by alternative pre-mRNA splicing. A major transcription initiation site was identified 97 nucleotides upstream of the testisin translation start and conforms to a consensus initiator element. The region surrounding the transcription initiation site lacks a TATA consensus sequence, but contains a CCAAT sequence and includes a CpG island. The 5'-flanking region contains several consensus response elements including Sp1, AP1 and several testis-specific elements. Analysis of testisin gene expression in tumor cell lines shows that testisin is not expressed in testicular tumor cells but is aberrantly expressed in some tumor cell lines of non-testis origin. These data provide the basis for identifying potential genetic alterations of PRSS21 that may underlie both testicular abnormalities and tumorigenesis. PMID:11004480

  10. Genomic Survey, Gene Expression Analysis and Structural Modeling Suggest Diverse Roles of DNA Methyltransferases in Legumes

    PubMed Central

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2) subfamilies based on their domain organization. First three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems have been reported to assign structural features in context of biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, structural model of DRM suggested that circular permutation of motifs does not have any effect on overall structure of DNA methyltransferase domain. These results provide valuable insights into role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes. PMID:24586452

  11. Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cycles of whole genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied...

  12. Clustering of gene ontology terms in genomes.

    PubMed

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and what kind of functions the end products of these genes are involved in, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. PMID:24995610

  13. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage

    PubMed Central

    2012-01-01

    Background Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants. PMID:22925495

  14. Alternative splicing and genomic structure of the Wilms tumor gene WT1.

    PubMed Central

    Haber, D A; Sohn, R L; Buckler, A J; Pelletier, J; Call, K M; Housman, D E

    1991-01-01

    The chromosome 11p13 Wilms tumor susceptibility gene WT1 appears to play a crucial role in regulating the proliferation and differentiation of nephroblasts and gonadal tissue. The WT1 gene consists of 10 exons, encoding a complex pattern of mRNA species: four distinct transcripts are expressed, reflecting the presence or absence of two alternative splices. Splice I consists of a separate exon, encoding 17 amino acids, which is inserted between the proline-rich amino terminus and the zinc finger domains. Splice II arises from the use of an alternative 5' splice junction and results in the insertion of 3 amino acids between zinc fingers 3 and 4. RNase protection analysis demonstrates that the most prevalent splice variant in both human and mouse is that which contains both alternative splices, whereas the least common is the transcript missing both splices. The relative distribution of splice variants is highly conserved between normal fetal kidney tissue and Wilms tumors that have intact WT1 transcripts. The ratio of these different WT1 mRNA species is also maintained as a function of development in the mouse kidney and in various mouse tissues expressing WT1. The conservation in structure and relative levels of each of the four WT1 mRNA species suggests that each encoded polypeptide makes a significant contribution to normal gene function. The control of cellular proliferation and differentiation exerted by the WT1 gene products may involve interactions between four polypeptides with distinct targets and functions. PMID:1658787

  15. Macronuclear genome structure of the ciliate Nyctotherus ovalis: Single-gene chromosomes and tiny introns

    PubMed Central

    Ricard, Guénola; de Graaf, Rob M; Dutilh, Bas E; Duarte, I; van Alen, Theo A; van Hoek, Angela HAM; Boxma, Brigitte; van der Staay, Georg WM; Moon-van der Staay, Seung Yeo; Chang, Wei-Jen; Landweber, Laura F; Hackstein, Johannes HP; Huynen, Martijn A

    2008-01-01

    Background Nyctotherus ovalis is a single-celled eukaryote that has hydrogen-producing mitochondria and lives in the hindgut of cockroaches. Like all members of the ciliate taxon, it has two types of nuclei, a micronucleus and a macronucleus. N. ovalis generates its macronuclear chromosomes by forming polytene chromosomes that subsequently develop into macronuclear chromosomes by DNA elimination and rearrangement. Results We examined the structure of these gene-sized macronuclear chromosomes in N. ovalis. We determined the telomeres, subtelomeric regions, UTRs, coding regions and introns by sequencing a large set of macronuclear DNA sequences (4,242) and cDNAs (5,484) and comparing them with each other. The telomeres consist of repeats CCC(AAAACCCC)n, similar to those in spirotrichous ciliates such as Euplotes, Sterkiella (Oxytricha) and Stylonychia. Per sequenced chromosome we found evidence for either a single protein-coding gene, a single tRNA, or the complete ribosomal RNAs cluster. Hence the chromosomes appear to encode single transcripts. In the short subtelomeric regions we identified a few overrepresented motifs that could be involved in gene regulation, but there is no consensus polyadenylation site. The introns are short (21–29 nucleotides), and a significant fraction (1/3) of the tiny introns is conserved in the distantly related ciliate Paramecium tetraurelia. As has been observed in P. tetraurelia, the N. ovalis introns tend to contain in-frame stop codons or have a length that is not dividable by three. This pattern causes premature termination of mRNA translation in the event of intron retention, and potentially degradation of unspliced mRNAs by the nonsense-mediated mRNA decay pathway. Conclusion The combination of short leaders, tiny introns and single genes leads to very minimal macronuclear chromosomes. The smallest we identified contained only 150 nucleotides. PMID:19061489

  16. Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer’s Disease

    PubMed Central

    Stein, Jason L.; Hua, Xue; Morra, Jonathan H.; Lee, Suh; Hibar, Derrek P.; Ho, April J.; Leow, Alex D.; Toga, Arthur W.; Sul, Jae Hoon; Kang, Hyun Min; Eskin, Eleazar; Saykin, Andrew J.; Shen, Li; Foroud, Tatiana; Pankratz, Nathan; Huentelman, Matthew J.; Craig, David W.; Gerber, Jill D.; Allen, April N.; Corneveaux, Jason J.; Stephan, Dietrich A.; Webster, Jennifer; DeChairo, Bryan M.; Potkin, Steven G.; Jack, Clifford R.; Weiner, Michael W.; Thompson, Paul M.

    2010-01-01

    In a genome-wide association study of structural brain degeneration, we mapped the 3D profile of temporal lobe volume differences in 742 brain MRI scans of Alzheimer’s disease patients, mildly impaired, and healthy elderly subjects. After searching 546,314 genomic markers, 2 single nucleotide polymorphisms (SNPs) were associated with bilateral temporal lobe volume (P < 5×10^−7). One SNP, rs10845840, is located in the GRIN2B gene which encodes the N-Methyl-D-Aspartate (NMDA) glutamate receptor NR2B subunit. This protein - involved in learning and memory, and excitotoxic cell death - has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer’s disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI subjects versus controls (odds ratio = 1.273; P = 0.039) and were associated with the mini-mental state exam (MMSE; t = −2.114; P = 0.035) demonstrating a negative effect on global cognitive function. Voxelwise maps of genetic association of this SNP with regional brain volumes, revealed intense temporal lobe effects (FDR correction at q = 0.05; critical P = 0.0257). This study uses large-scale brain mapping for gene discovery with implications for Alzheimer’s disease. PMID:20197096

  17. Evolution of Pulmonate Gastropod Mitochondrial Genomes: Comparisons of Gene Organizations of Euhadra, Cepaea and Albinaria and Implications of Unusual tRNA Secondary Structures

    PubMed Central

    Yamazaki, N.; Ueshima, R.; Terrett, J. A.; Yokobori, S. I.; Kaifu, M.; Segawa, R.; Kobayashi, T.; Numachi, K. I.; Ueda, T.; Nishikawa, K.; Watanabe, K.; Thomas, R. H.

    1997-01-01

    Complete gene organizations of the mitochondrial genomes of three pulmonate gastropods, Euhadra herklotsi, Cepaea nemoralis and Albinaria coerulea, permit comparisons of their gene organizations. Euhadra and Cepaea are classified in the same superfamily, Helicoidea, yet they show several differences in the order of tRNA and protein coding genes. Albinaria is distantly related to the other two genera but shares the same gene order in one part of its mitochondrial genome with Euhadra and in another part with Cepaea. Despite their small size (14.1-14.5 kbp), these snail mtDNAs encode 13 protein genes, two rRNA genes and at least 22 tRNA genes. These genomes exhibit several unusual or unique features compared to other published metazoan mitochondrial genomes, including those of other molluscs. Several tRNAs predicted from the DNA sequences possess bizarre structures lacking either the T stem or the D stem, similar to the situation seen in nematode mt-tRNAs. The acceptor stems of many tRNAs show a considerable number of mismatched basepairs, indicating that the RNA editing process recently demonstrated in Euhadra is widespread in the pulmonate gastropods. Strong selection acting on mitochondrial genomes of these animals would have resulted in frequent occurrence of the mismatched basepairs in regions of overlapping genes. PMID:9055084

  18. The human mitochondrial elongation factor tu (EF-Tu) gene: cDNA sequence, genomic localization, genomic structure, and identification of a pseudogene.

    PubMed

    Ling, M; Merante, F; Chen, H S; Duff, C; Duncan, A M; Robinson, B H

    1997-09-15

    The human mitochondrial elongation factor Tu (EF-Tu) is nuclear-encoded and functions in the translational apparatus of mitochondria. The complete human EF-Tu cDNA sequence of 1677 base pairs (bp) with a 101 bp 5'-untranslated region, a 1368 bp coding region, and a 207 bp 3'-untranslated region, has been determined and updated. The predicted protein from this cDNA sequence is approximately 49.8 kDa in size and is composed of 455 amino acids (aa) with a putative N-terminal mitochondrial leader sequence of approximately 50 aa residues. The predicted amino acid sequence shows high similarity to other EF-Tu protein sequences from ox, yeast, and bacteria, and also shows limited similarity to human cytosolic elongation factor 1 alpha. The complete size of this cDNA (1677 bp) obtained by cloning and sequencing was confirmed by Northern blot analysis, which showed a single transcript (mRNA) of approximately 1.7 kb in human liver. The genomic structure of this EF-Tu gene has been determined for the first time. This gene contains nine introns with a predicted size of approximately 3.6 kilobases (kb) and has been mapped to chromosome 16p11.2. In addition, an intronless pseudogene of approximately 1.7 kb with 92.6% nucleotide sequence similarity to the EF-Tu gene has also been identified and mapped to chromosome 17q11.2. PMID:9332382

  19. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions.

  20. Genome-wide Analyses of the Structural Gene Families Involved in the Legume-specific 5-Deoxyisoflavonoid Biosynthesis of Lotus japonicus

    PubMed Central

    Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio

    2007-01-01

    Abstract A model legume Lotus japonicus (Regel) K. Larsen is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to the legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters in the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423

  1. Lessons from Structural Genomics

    PubMed Central

    Terwilliger, Thomas C.; Stuart, David; Yokoyama, Shigeyuki

    2010-01-01

    A decade of structural genomics, the large-scale determination of protein structures, has generated a wealth of data and many important lessons for structural biology and for future large-scale projects. These lessons include a confirmation that it is possible to construct large-scale facilities that can determine the structures of a hundred or more proteins per year, that these structures can be of high quality, and that these structures can have an important impact. Technology development has played a critical role in structural genomics, the difficulties at each step of determining a structure of a particular protein can be quantified, and validation of technologies is nearly as important as the technologies themselves. Finally, rapid deposition of data in public databases has increased the impact and usefulness of the data and international cooperation has advanced the field and improved data sharing. PMID:19416074

  2. Sequencing and analysis of the prolate-headed lactococcal bacteriophage c2 genome and identification of the structural genes.

    PubMed

    Lubbers, M W; Waterfield, N R; Beresford, T P; Le Page, R W; Jarvis, A W

    1995-12-01

    The 22,163-bp genome of the lactococcal prolate-headed phage c2 was sequenced. Thirty-nine open reading frames (ORFs), early and late promoters, and a putative transcription terminator were identified. Twenty-two ORFs were in the early gene region, and 17 were in the late gene region. Putative genes for a DNA polymerase, a recombination protein, a sigma factor protein, a transcription regulatory protein, holin proteins, and a terminase were identified. Transcription of the early and late genes proceeded divergently from a noncoding 611-bp region. A 521-bp fragment contained within the 611-bp intergenic region could act as an origin of replication in Lactococcus lactis. Three major structural proteins, with sizes of 175, 90, and 29 kDa, and eight minor proteins, with sizes of 143, 82, 66, 60, 44, 42, 32, and 28 kDa, were identified. Several of these proteins appeared to be posttranslationally modified by proteolytic cleavage. The 175- and 90-kDa proteins were identified as the major phage head proteins, and the 29- and 60-kDa proteins were identified as the major tail protein and (possibly) the tail adsorption protein, respectively. The head proteins appeared to be covalently linked multimers of the same 30-kDa gene product. Phage c2 and prolate-headed lactococcal phage bIL67 (C. Schouler, S. D. Ehrlich, and M.-C. Chopin, Microbiology 140:3061-3069, 1994) shared 80% nucleotide sequence identity. However, several DNA deletions or insertions which corresponded to the loss or acquisition of specific ORFs, respectively, were noted. The identification of direct nucleotide repeats flanking these sequences indicated that recombination may be important in the evolution of these phages.(ABSTRACT TRUNCATED AT 250 WORDS)

  3. Genomic structure of the human plasma prekallikrein gene, identification of allelic variants, and analysis in end-stage renal disease.

    PubMed

    Yu, H; Anderson, P J; Freedman, B I; Rich, S S; Bowden, D W

    2000-10-15

    Kallikreins are serine proteases that catalyze the release of kinins and other vasoactive peptides. Previously, we have studied one tissue-specific (H. Yu et al., 1996, J. Am. Soc. Nephrol. 7: 2559-2564) and one plasma-specific (H. Yu et al., 1998, Hypertension 31: 906-911) human kallikrein gene in end-stage renal disease (ESRD). Short sequence repeat polymorphisms for the human plasma kallikrein gene (KLKB1; previously known as KLK3) on chromosome 4 were associated with ESRD in an African American study population. This study of KLKB1 in ESRD has been extended by determining the genomic structure of KLKB1 and searching for allelic variants that may be associated with ESRD. Exon-spanning PCR primer sets were identified by serial testing of primer pairs designed from KLKB1 cDNA sequence and DNA sequencing of PCR products. Like the rat plasma kallikrein gene and the closely related human factor XI gene, the human KLKB1 gene contains 15 exons and 14 introns. The longest intron, F, is almost 12 kb long. The total length of the gene is approximately 30 kb. Sequence of the 5'-proximal promoter region of KLKB1 was obtained by shotgun cloning of genomic fragments from a bacterial artificial clone containing the KLKB1 gene, followed by screening of the clones using exon 1-specific probes. Primers flanking the exons and 5'-proximal promoter region were used to screen for allelic variants in the genomic DNA from ESRD patients and controls using the single-strand conformation polymorphism technique. We identified 12 allelic variants in the 5'-proximal promoter and 7 exons. Of note was a common polymorphism (30% of the population) at position 521 of KLKB1 cDNA, which leads to the replacement of asparagine with a serine at position 124 in the heavy chain of the A2 domain of the protein. In addition, an A716C polymorphism in exon 7 resulting in the amino acid change H189P in the A3 domain of the heavy chain was observed in 5 patients belonging to 3 ESRD families. A third

  5. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.

    PubMed

    Martínez-García, Pedro J; Crepeau, Marc W; Puiu, Daniela; Gonzalez-Ibeas, Daniel; Whalen, Jeanne; Stevens, Kristian A; Paul, Robin; Butterfield, Timothy S; Britton, Monica T; Reagan, Russell L; Chakraborty, Sandeep; Walawage, Sriema L; Vasquez-Gross, Hans A; Cardeno, Charis; Famula, Randi A; Pratt, Kevin; Kuruganti, Sowmya; Aradhya, Mallikarjuna K; Leslie, Charles A; Dandekar, Abhaya M; Salzberg, Steven L; Wegrzyn, Jill L; Langley, Charles H; Neale, David B

    2016-09-01

    The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
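
    For readers unfamiliar with the N50 statistic quoted in this abstract, the following is a minimal Python sketch of how such a value is commonly computed. The function name `n50` and its optional `genome_size` argument are illustrative assumptions and are not taken from the cited work; supplying an estimated genome size mirrors the abstract's "based on a genome size of 606 Mbp" wording (a variant often called NG50). The lengths in the usage line are made-up toy values, not walnut data.

```python
def n50(lengths, genome_size=None):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the reference total. By default the
    reference total is the sum of the lengths; if genome_size is
    given (an assumed, externally estimated size), that value is
    used instead.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # The sequences cover less than half of the reference total.
    return 0


if __name__ == "__main__":
    print(n50([10, 6, 5, 4, 3, 2]))  # 6: 10 + 6 = 16 covers half of 30
```

    Sorting once and accumulating keeps the sketch simple; for very large assemblies the same logic applies unchanged.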

  6. Dynamic structures in phytoplasma genomes: sequence variable mosaics (SVMs) of clustered genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Emergence of the phytoplasma clade from an Acholeplasma-like ancestor gave rise to an intriguing group of cell wall-less prokaryotes through a remarkable and continuing evolutionary process. In a ceaseless progression, phytoplasmas have evolved reduced genomes, losing biochemical pathways for synth...

  7. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  8. The impact of genome-wide supported schizophrenia risk variants in the neurogranin gene on brain structure and function.

    PubMed

    Walton, Esther; Geisler, Daniel; Hass, Johanna; Liu, Jingyu; Turner, Jessica; Yendiki, Anastasia; Smolka, Michael N; Ho, Beng-Choon; Manoach, Dara S; Gollub, Randy L; Roessner, Veit; Calhoun, Vince D; Ehrlich, Stefan

    2013-01-01

    The neural mechanisms underlying genetic risk for schizophrenia, a highly heritable psychiatric condition, are still under investigation. New schizophrenia risk genes discovered through genome-wide association studies (GWAS), such as neurogranin (NRGN), can be used to identify these mechanisms. In this study we examined the association of two common NRGN risk single nucleotide polymorphisms (SNPs) with functional and structural brain-based intermediate phenotypes for schizophrenia. We obtained structural, functional MRI and genotype data of 92 schizophrenia patients and 114 healthy volunteers from the multisite Mind Clinical Imaging Consortium study. Two schizophrenia-associated NRGN SNPs (rs12807809 and rs12541) were tested for association with working memory-elicited dorsolateral prefrontal cortex (DLPFC) activity and surface-wide cortical thickness. NRGN rs12541 risk allele homozygotes (TT) displayed increased working memory-related activity in several brain regions, including the left DLPFC, left insula, left somatosensory cortex and the cingulate cortex, when compared to non-risk allele carriers. NRGN rs12807809 non-risk allele (C) carriers showed reduced cortical gray matter thickness compared to risk allele homozygotes (TT) in an area comprising the right pericalcarine gyrus, the right cuneus, and the right lingual gyrus. Our study highlights the effects of schizophrenia risk variants in the NRGN gene on functional and structural brain-based intermediate phenotypes for schizophrenia. These results support recent GWAS findings and further implicate NRGN in the pathophysiology of schizophrenia by suggesting that genetic NRGN risk variants contribute to subtle changes in neural functioning and anatomy that can be quantified with neuroimaging methods. PMID:24098564

  9. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

    PubMed Central

    Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

    2012-01-01

    Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and wide-spread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921

  10. Gene structure and expression

    SciTech Connect

    Hawkins, J.

    1990-01-01

    This book describes the structure of genes in molecular terms and summarizes present knowledge about how their activity is regulated. It covers a range of topics, including a review of the structure and replication of DNA, transcription and translation, prokaryotic and eukaryotic gene organization and expression, retroviruses and oncogenes. The book also includes a chapter on the methodology of DNA manipulation including sections on site-directed mutagenesis, the polymerase chain reaction, reporter genes and restriction fragment length polymorphisms. The hemoglobin gene system and the genetics of the proteins of the immune system are presented in the latter half of the book to show the structure and expression of the most well-studied systems in higher eukaryotes. The final chapter reviews the differences between prokaryotic and the eukaryotic genomes.

  11. Comparative assessment of the pig, mouse, and human genomes: A structural and functional analysis of genes involved in immunity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A detailed analysis was conducted on portions of the porcine, murine, and human genome associated with the immune response. It was found that non-protein coding RNA/DNA that potentially interact and regulate gene expression, nucleotide similarity, isochore type, and the similarity of 5’ and 3’ UTR ...

  12. Genomic structure and chromosomal mapping of the gene coding for ICBP90, a protein involved in the regulation of the topoisomerase IIalpha gene expression.

    PubMed

    Hopfner, R; Mousli, M; Garnier, J M; Redon, R; du Manoir, S; Chatton, B; Ghyselinck, N; Oudet, P; Bronner, C

    2001-03-21

    We have recently identified a novel CCAAT box binding protein (ICBP90) involved in the regulation of topoisomerase IIalpha gene expression. We have observed that it is expressed in non-tumoral proliferating human lung fibroblast cells whereas in HeLa cells, a tumoral cell line, ICBP90 was still present even when cells were at confluence. In the present study, we have determined the ICBP90 gene structure by screening of a human placenta genomic library and PCR analysis. We report that the ICBP90 gene spans about 35.8 kb and contains six coding exons named A to F. In the 5' upstream sequence of the region containing the coding exons, two additional exons (I and II) were found. Additionally, an internal splicing site was found in exon A. A promoter region, including three putative Sp1 binding sites between exons I and A, was identified by transient transfection. Northern blot analysis of several cancer cell lines revealed the existence of two ICBP90 mRNA species of 5.1 and 4.3 kb that are transcribed from the gene. The relative amounts of these mRNAs depended on the cell type. In MOLT-4 cells and Burkitt's lymphoma Raji cells, the 4.3 kb or the 5.1 kb transcripts were mainly observed, respectively. In other cell lines, such as HL-60 cells, chronic myelogenous leukaemia K-562, lung carcinoma A549, HeLa or colorectal SW480, both 4.3 and 5.1 kb forms of ICBP90 mRNA could be detected. Interestingly, western blot analysis showed several ICBP90 protein bands in HeLa but only a single band in MOLT-4 cell extracts. Taken together our results are consistent with the ICBP90 gene exhibiting alternative splicing and promoter usage in a cell-specific manner. PMID:11290415

  13. Genomic structure and chromosomal localization of GML (GPI-anchored molecule-like protein), a gene induced by p53

    SciTech Connect

    Kimura, Yasutoshi; Furuhata, Tomohisa; Nakamura, Yusuke

    1997-05-01

    Among its known functions, tumor suppressor gene p53 serves as a transcriptional regulator and mediates various signals through activation of downstream genes. We recently identified a novel gene, GML (glycosylphosphatidylinositol (GPI)-anchored molecule-like protein), whose expression is specifically induced by wildtype p53. To characterize the GML gene further, we determined 35.8 kb of DNA sequence that included a consensus binding sequence for p53 and the entire GML gene. The GML gene consists of four exons, and the p53-binding sequence is present in the 5'-flanking region. In genomic organization this gene resembles genes encoding murine Ly-6 glycoproteins, a human homologue of the Ly-6 family called RIG-E, and CD59; products of these genes, known as GPI-anchored proteins, are variously involved in signal transduction, cell-cell adhesion, and cell-matrix attachment. FISH analysis revealed that the GML gene is located on human chromosome 8q24.3. Genes encoding at least two other GPI-anchored molecules, E48 and RIG-E, are also located in this region. 20 refs., 2 figs., 1 tab.

  14. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    PubMed Central

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  15. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    PubMed

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing.

  16. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity.

    PubMed

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation-induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome.

  17. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity

    PubMed Central

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation–induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome. PMID:25922569
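
    The preference for A- and T-ending codons (NNA and NNT) described in this abstract can be illustrated with a rough Python sketch that tallies the third base of each codon in a coding sequence. This is a simplified illustration, not the authors' method: the function name `third_base_fractions` is hypothetical, the input is assumed to be an in-frame DNA coding sequence, and the tally does not group codons by the amino acid they encode, as a full synonymous-codon analysis would.

```python
from collections import Counter


def third_base_fractions(cds):
    """Return the fraction of codons ending in A, C, G and T.

    cds is assumed to be an in-frame DNA coding sequence; any
    trailing bases that do not form a complete codon are ignored.
    Codons whose third base is not A, C, G or T (for example N)
    count toward the total but not toward any reported base.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons:
        return {}
    counts = Counter(codon[2] for codon in codons)
    return {base: counts.get(base, 0) / len(codons) for base in "ACGT"}


if __name__ == "__main__":
    # Toy sequence (ATG GCA GCT TAA), not data from the cited study.
    print(third_base_fractions("ATGGCAGCTTAA"))
```

    On the toy sequence, half of the codons end in A and a quarter each in G and T; a plastid-wide analysis would apply the same count per gene and per synonymous codon family.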

  18. 5' flanking sequence and genomic structure of Egr-1, a murine mitogen inducible zinc finger encoding gene.

    PubMed

    Tsai-Morris, C H; Cao, X M; Sukhatme, V P

    1988-09-26

    Egr-1 is a murine zinc finger encoding cDNA whose expression is modulated by a variety of ligand-receptor interactions and is often coregulated with c-fos (1). This study reports the isolation of a mouse Egr-1 genomic clone, its intron-exon structure, and 935 bp of 5' flanking sequence. The gene spans about 3.8 kb and consists of 2 exons and one 700 bp intron. S1 nuclease protection and primer extension analysis were used to define the transcription initiation site. "TATA" and "CCAAT" sequences were located at nucleotides -26 and -337 respectively. In addition, there exist five elements whose sequence is nearly identical to the inner core 10 nucleotide region (CCATATTAGG) of the c-fos serum response element, four Sp1 consensus sequences, two AP1 target sequence analogs, and two potential cAMP response elements. These results will ultimately lead to a detailed definition of the intracellular events regulating Egr-1 expression.

  19. Chromosomal localization, genomic structure, and allelic polymorphism of the human CD79a (Ig-α/mb-1) gene

    SciTech Connect

    Hashimoto, S.; Gregersen, P.K.; Chiorazzi, N.; Mohrenweiser, H.W.

    1994-12-31

    The germline DNA sequence of the human CD79a (Ig-α/mb-1) gene was determined by polymerase chain reaction sequencing of a cosmid clone derived from an arrayed human chromosome 19 library. The CD79a gene was localized to chromosome 19q13.2; this localization places the gene within the CEA-like gene cluster with the following gene order: -CEA-CGM1-CD79a-RPS11-ATP1A3-BGP-CGM9-. The genomic organization of the human CD79a gene resembles the mouse counterpart with five exons interrupted by four introns. Computer analyses suggest the presence of transcription regulatory elements known to be important in the regulation of mouse CD79a (AP-1, EBF, AP-2, MUF2, and SP-1 sites), as well as elements not found in the mouse gene (an NF-κB binding site and a series of E-box motifs). Similar to the mouse gene, the 5' flanking region of human CD79a lacks a TATA box; however, unlike mouse CD79a, a classical octamer motif could not be identified in the human gene. Finally, a new Rsa I restriction fragment length polymorphism was defined in the non-coding regions of the human gene. 64 refs., 4 figs., 2 tabs.

  20. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides.

    PubMed

    Benítez-Páez, Alfonso; Moreno, F Javier; Sanz, María L; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  1. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    PubMed Central

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  2. Characterization of gene rearrangements resulted from genomic structural aberrations in human esophageal squamous cell carcinoma KYSE150 cells.

    PubMed

    Hao, Jia-Jie; Gong, Ting; Zhang, Yu; Shi, Zhi-Zhou; Xu, Xin; Dong, Jin-Tang; Zhan, Qi-Min; Fu, Song-Bin; Wang, Ming-Rong

    2013-01-15

    Chromosomal rearrangements and the genes they involve have been reported to play important roles in the development and progression of human malignancies, but the gene rearrangements in esophageal squamous cell carcinoma (ESCC) remain to be identified. In the present study, array-based comparative genomic hybridization (array-CGH) was performed on the ESCC cell line KYSE150. Eight disrupted genes were detected according to the obviously distinct unbalanced breakpoints. The splitting of these genes was validated by dual-color fluorescence in-situ hybridization (FISH). By using rapid amplification of cDNA ends (RACE), genome walking and sequencing analysis, we further identified gene disruptions and rearrangements. A fusion transcript DTL-1q42.2 was derived from an intrachromosomal rearrangement of chromosome 1. Highly amplified segments of DTL and PTPRD were self-rearranged. The sequences on either side of the junctions possess micro-homology with each other. FISH results indicated that the split DTL and PTPRD were also involved in comprising parts of the derivative chromosomes resulting from t(1q;9p;12p) and t(9;1;9). Further, we found that regions harboring DTL (1q32.3) and PTPRD (9p23) were also split in ESCC tumors. The data supplement significant information on the existing genetic background of KYSE150, which may be used as a model for studying these gene rearrangements.

  3. The region of phage T4 genes 34, 33 and 59: primary structures and organization on the genome.

    PubMed Central

    Hahn, S; Kruse, U; Rüger, W

    1986-01-01

    The product of gene 33 is essential for the regulation of late transcription and gene product 59 is required in recombination, DNA repair and replication. The exact functions of both proteins are not known. Restriction fragments spanning the genomic area of genes 33 and 59 have been cloned into phage M13 and a 4.9 kb nucleotide sequence has been determined. Translation of the DNA sequence predicted that gp33 contains 112 amino acids with a mol.wt. of 12.816 kd while gp59 is composed of 217 amino acids adding up to a mol.wt. of 25.967 kd. The genomic area studied here also contains 3 open reading frames of genes not identified to date and it is thought to include the NH2-terminal part of g34. One of the open reading frames seems to code for the 10 kd protein, probably involved in the regulation of transcription of bacteriophage T4. This protein is predicted to consist of 89 amino acid residues with a mol.wt. of 10.376 kd. Gene 33 and the gene for the 10 kd protein were cloned separately on high expression vectors resulting in over-production of the two proteins. PMID:3797242

  4. Human molybdopterin synthase gene: genomic structure and mutations in molybdenum cofactor deficiency type B.

    PubMed Central

    Reiss, J; Dorche, C; Stallmeyer, B; Mendel, R R; Cohen, N; Zabot, M T

    1999-01-01

    Biosynthesis of the molybdenum cofactor (MoCo) can be divided into (1) the formation of a precursor and (2) the latter's subsequent conversion, by molybdopterin synthase, into the organic moiety of MoCo. These two steps are reflected by the complementation groups A and B and the two formally distinguished types of MoCo deficiency that have an identical phenotype. Both types of MoCo deficiency result in a pleiotropic loss of all molybdoenzyme activities and cause severe neurological damage. MOCS1 is defective in patients with group A deficiency and has been shown to encode two enzymes for early synthesis via a bicistronic transcript with two consecutive open reading frames (ORFs). MOCS2 encodes the small and large subunits of molybdopterin synthase via a single transcript with two overlapping reading frames. This gene was mapped to 5q and comprises seven exons. The coding sequence and all splice site-junction sequences were screened for mutations, in MoCo-deficient patients in whom a previous search for MOCS1 mutations had been negative. In seven of the eight patients whom we investigated, we identified MOCS2 mutations that, by their nature, are most likely responsible for the deficiency. Three different frameshift mutations were observed, with one of them found on 7 of 14 identified alleles. Furthermore, a start-codon mutation and a missense mutation of a highly conserved amino acid residue were found. The locations of the mutations confirm the functional role of both ORFs. One of the patients with identified MOCS2 mutations had been classified as type B, in complementation studies. These findings support the hypothetical mechanism, for both forms of MoCo deficiency, that formerly had been established by cell-culture experiments. PMID:10053004

  5. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    PubMed

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into the S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus.

  6. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... guía de genómica A Brief Guide to Genomics DNA, Genes and Genomes Deoxyribonucleic acid (DNA) is the ... and lead to a disease such as cancer. DNA Sequencing Sequencing simply means determining the exact order ...

  7. Genome-wide identification, structural analysis and new insights into late embryogenesis abundant (LEA) gene family formation pattern in Brassica napus

    PubMed Central

    Liang, Yu; Xiong, Ziyi; Zheng, Jianxiao; Xu, Dongyang; Zhu, Zeyang; Xiang, Jun; Gan, Jianping; Raboanatahiry, Nadia; Yin, Yongtai; Li, Maoteng

    2016-01-01

    Late embryogenesis abundant (LEA) proteins are a diverse and large group of polypeptides that play important roles in desiccation and freezing tolerance in plants. The LEA family has been systematically characterized in some plants but not Brassica napus. In this study, 108 BnLEA genes were identified in the B. napus genome and classified into eight families based on their conserved domains. Protein sequence alignments revealed an abundance of alanine, lysine and glutamic acid residues in BnLEA proteins. The BnLEA gene structure has few introns (<3), and they are distributed unevenly across all 19 chromosomes in B. napus, occurring as gene clusters in chromosomes A9, C2, C4 and C5. More than two-thirds of the BnLEA genes are associated with segmental duplication. Synteny analysis revealed that most LEA genes are conserved, although gene losses or gains were also identified. These results suggest that segmental duplication and whole-genome duplication played a major role in the expansion of the BnLEA gene family. Expression profiles analysis indicated that expression of most BnLEAs was increased in leaves and late stage seeds. This study presents a comprehensive overview of the LEA gene family in B. napus and provides new insights into the formation of this family. PMID:27072743

  8. Genome-wide identification, structural analysis and new insights into late embryogenesis abundant (LEA) gene family formation pattern in Brassica napus.

    PubMed

    Liang, Yu; Xiong, Ziyi; Zheng, Jianxiao; Xu, Dongyang; Zhu, Zeyang; Xiang, Jun; Gan, Jianping; Raboanatahiry, Nadia; Yin, Yongtai; Li, Maoteng

    2016-01-01

    Late embryogenesis abundant (LEA) proteins are a diverse and large group of polypeptides that play important roles in desiccation and freezing tolerance in plants. The LEA family has been systematically characterized in some plants but not Brassica napus. In this study, 108 BnLEA genes were identified in the B. napus genome and classified into eight families based on their conserved domains. Protein sequence alignments revealed an abundance of alanine, lysine and glutamic acid residues in BnLEA proteins. The BnLEA gene structure has few introns (<3), and they are distributed unevenly across all 19 chromosomes in B. napus, occurring as gene clusters in chromosomes A9, C2, C4 and C5. More than two-thirds of the BnLEA genes are associated with segmental duplication. Synteny analysis revealed that most LEA genes are conserved, although gene losses or gains were also identified. These results suggest that segmental duplication and whole-genome duplication played a major role in the expansion of the BnLEA gene family. Expression profiles analysis indicated that expression of most BnLEAs was increased in leaves and late stage seeds. This study presents a comprehensive overview of the LEA gene family in B. napus and provides new insights into the formation of this family. PMID:27072743

  9. Gene Chips and Functional Genomics

    NASA Astrophysics Data System (ADS)

    Hamadeh, Hisham; Afshari, Cynthia

    2000-11-01

    These past few years of scientific discovery will undoubtedly be remembered as the "genomics era," the period in which biologists succeeded in enumerating the sequence of nucleotides making up all, or at least most, of human DNA. And while this achievement has been heralded as a technological feat equal to the moon landing, it is only the first of many advances in DNA technology. Scientists are now faced with the task of understanding the meaning of the DNA sequence. Specifically, they want to learn how the DNA code relates to protein function. An important tool in the study of "functional genomics," is the cDNA microarray—also known as the gene chip. Inspired by computer microchips, gene chips allow scientists to monitor the expression of hundreds, even thousands, of genes in a fraction of the time it used to take to monitor the expression of a single one. By altering the conditions under which a particular tissue expresses genes—say, by exposing it to toxins or growth factors—scientists can determine the suite of genes expressed in different situations and hence start to get a handle on the function of these genes. The authors discuss this important new technology and some of its practical applications.

  10. Heat Shock Protein 70 and 90 Genes in the Harmful Dinoflagellate Cochlodinium polykrikoides: Genomic Structures and Transcriptional Responses to Environmental Stresses

    PubMed Central

    Guo, Ruoyu; Youn, Seok Hyun; Ki, Jang-Seu

    2015-01-01

    The marine dinoflagellate Cochlodinium polykrikoides is responsible for harmful algal blooms in aquatic environments and has spread into the world's oceans. As a microeukaryote, it seems to have distinct genomic characteristics, such as gene structure and regulation. In the present study, we characterized heat shock protein (HSP) 70/90 of C. polykrikoides and evaluated their transcriptional responses to environmental stresses. Both HSPs contained the conserved motif patterns, showing the highest homology with those of other dinoflagellates. Genomic analysis showed that CpHSP70 had no intron but was encoded in a tandem arrangement, with copies separated by intergenic spacers. However, CpHSP90 had one intron in the coding genomic regions, and no intergenic region was found. Phylogenetic analyses of the separate HSPs showed that CpHSP70 was closely related to that of the dinoflagellate Crypthecodinium cohnii and CpHSP90 to those of other Gymnodiniales in dinoflagellates. Gene expression analyses showed that both HSP genes were upregulated by separate treatments with the algicides CuSO4 and NaOCl; however, they displayed a downregulation pattern with PCB treatment. The transcription of CpHSP90 and CpHSP70 showed similar expression patterns under the same toxicant treatment, suggesting that both genes might have cooperative functions in toxicant-induced gene regulation in the dinoflagellate. PMID:26064872

  11. The primary structure and genomic organization of five novel transcripts located close to the Huntington's disease gene on human chromosome 4p16.3.

    PubMed

    Hadano, S; Ishida, Y; Ikeda, J E

    1998-06-30

    Five distinct novel transcripts (RES4-22, -23, -24, -25 and -26) that mapped to the 1-Mb interval between D4S180 and D4S183 on human chromosome 4p16.3 close to the Huntington's disease (HD) gene were isolated, and the structure and exon/intron organization of each gene were thoroughly analyzed. The transcripts of the RES4-22, -23 and -24 genes each have several isoforms by alternative splicing and these have also been defined. Two transcripts, RES4-24 and RES4-25, reside in the same genomic region with opposite polarities and they also clearly overlap. Among these transcripts, RES4-26 was found to encode a novel zinc finger protein. The transcript map based upon our current level of analysis combined with data from previous studies reveals the gene-rich nature and the intricate organization of the genes in the HD locus.

  12. Expression analysis, genomic structure, and mapping to 7q31 of the human sperm adhesion molecule gene SPAM1

    SciTech Connect

    Jones, M.H.; Davey, P.M.; Aplin, H.; Affara, N.A.

    1995-10-10

    During the course of systematic sequence tag analysis of clones isolated from an adult testis cDNA library, clones 296 and 576 were found to detect 71-74% sequence identity to the guinea pig sperm surface protein PH-20. This surface protein is involved in sperm-egg adhesion in the guinea pig. Nucleotide sequence for 1919 bp of human DNA from a series of overlapping cDNA clones isolated from a testis cDNA library confirmed the sequence identity within a 1527-bp open reading frame to be 71-74% to the guinea pig gene and the similarity to be 60% for the predicted protein of 509 amino acids. Southern blot analysis of human genomic DNA and DNA from somatic cell hybrids indicates that the gene (SPAM1) is unique and does not form part of a larger family and that it maps to chromosome 7. Fluorescence in situ hybridization with yeast artificial chromosome (YAC) clones isolated from the CEPH megaYAC library has refined this localization to 7q31. PCR analysis of genomic DNA and YAC clone DNA has shown that the 1919 bp of the gene that has been cloned covers approximately 11 kb of genomic DNA and is encoded by at least 4 exons. Northern analysis of poly(A)⁻ mRNA from a range of 16 human tissues has demonstrated that expression of the gene as a single 2.4-kb transcript is strictly limited to the testis. 19 refs., 3 figs.

  13. Characterization of promoter region and genomic structure of the murine and human genes encoding Src like adapter protein.

    PubMed

    Kratchmarova, I; Sosinowski, T; Weiss, A; Witter, K; Vincenz, C; Pandey, A

    2001-01-10

    Src-like adapter protein (SLAP) was identified as a signaling molecule in a yeast two-hybrid system using the cytoplasmic domain of EphA2, a receptor protein tyrosine kinase (Pandey et al., 1995. Characterization of a novel Src-like adapter protein that associates with the Eck receptor tyrosine kinase. J. Biol. Chem. 270, 19201-19204). It is very similar to members of the Src family of cytoplasmic tyrosine kinases in that it contains very homologous SH3 and SH2 domains (Abram and Courtneidge, 2000. Src family tyrosine kinases and growth factor signaling. Exp. Cell. Res. 254, 1-13.). However, instead of a kinase domain at the C-terminus, it contains a unique C-terminal region. In order to exclude the possibility that an alternative form exists, we have isolated genomic clones containing the murine Slap gene as well as the human SLA gene. The coding regions of murine Slap and human SLA genes contain seven exons and six introns. The absence of any kinase domain in the genomic region confirms its designation as an adapter protein. Additionally, we have cloned and sequenced approximately 2.6 kb of the region 5' to the initiator methionine of the murine Slap gene. When subcloned upstream of a luciferase gene, this fragment increased the transcriptional activity about 6-fold in a human Jurkat T cell line and approximately 52-fold in a murine T cell line, indicating that this region contains promoter elements that dictate SLAP expression. We have also cloned the promoter region of the human SLA gene. Since SLAP is transcriptionally regulated by retinoic acid and by activation of B cells, the cloning of its promoter region will permit a detailed analysis of the elements required for its transcriptional regulation.

  14. Whole genome sequencing in patients with retinitis pigmentosa reveals pathogenic DNA structural changes and NEK2 as a new disease gene

    PubMed Central

    Nishiguchi, Koji M.; Tearle, Richard G.; Liu, Yangfan P.; Oh, Edwin C.; Miyake, Noriko; Benaglio, Paola; Harper, Shyana; Koskiniemi-Kuendig, Hanna; Venturini, Giulia; Sharon, Dror; Koenekoop, Robert K.; Nakamura, Makoto; Kondo, Mineo; Ueno, Shinji; Yasuma, Tetsuhiro R.; Beckmann, Jacques S.; Ikegawa, Shiro; Matsumoto, Naomichi; Terasaki, Hiroko; Berson, Eliot L.; Katsanis, Nicholas; Rivolta, Carlo

    2013-01-01

    We performed whole genome sequencing in 16 unrelated patients with autosomal recessive retinitis pigmentosa (ARRP), a disease characterized by progressive retinal degeneration and caused by mutations in over 50 genes, in search of pathogenic DNA variants. Eight patients were from North America, whereas eight were Japanese, a population for which ARRP seems to have different genetic drivers. Using a specific workflow, we assessed both the coding and noncoding regions of the human genome, including the evaluation of highly polymorphic SNPs, structural and copy number variations, as well as 69 control genomes sequenced by the same procedures. We detected homozygous or compound heterozygous mutations in 7 genes associated with ARRP (USH2A, RDH12, CNGB1, EYS, PDE6B, DFNB31, and CERKL) in eight patients, three Japanese and five Americans. Fourteen of the 16 mutant alleles identified were previously unknown. Among these, there was a 2.3-kb deletion in USH2A and an inverted duplication of ∼446 kb in EYS, which would have likely escaped conventional screening techniques or exome sequencing. Moreover, in another Japanese patient, we identified a homozygous frameshift (p.L206fs), absent in more than 2,500 chromosomes from ethnically matched controls, in the ciliary gene NEK2, encoding a serine/threonine-protein kinase. Inactivation of this gene in zebrafish induced retinal photoreceptor defects that were rescued by human NEK2 mRNA. In addition to identifying a previously undescribed ARRP gene, our study highlights the importance of rare structural DNA variations in Mendelian diseases and advocates the need for screening approaches that transcend the analysis of the coding sequences of the human genome. PMID:24043777

  15. Horizontal gene transfer and the rock record: comparative genomics of phylogenetically distant bacteria that induce wrinkle structure formation in modern sediments.

    PubMed

    Flood, B E; Bailey, J V; Biddle, J F

    2014-03-01

    Wrinkle structures are sedimentary features that are produced primarily through the trapping and binding of siliciclastic sediments by mat-forming micro-organisms. Wrinkle structures and related sedimentary structures in the rock record are commonly interpreted to represent the stabilizing influence of cyanobacteria on sediments because cyanobacteria are known to produce similar textures and structures in modern tidal flat settings. However, other extant bacteria such as filamentous representatives of the family Beggiatoaceae can also interact with sediments to produce sedimentary features that morphologically resemble many of those associated with cyanobacteria-dominated mats. While Beggiatoa spp. and cyanobacteria are metabolically and phylogenetically distant, genomic analyses show that the two groups share hundreds of homologous genes, likely as the result of horizontal gene transfer. The comparative genomics results described here suggest that some horizontally transferred genes may code for phenotypic traits such as filament formation, chemotaxis, and the production of extracellular polymeric substances that potentially underlie the similar biostabilizing influences of these organisms on sediments. We suggest that the ecological utility of certain basic life modes such as the construction of mats and biofilms, coupled with the lateral mobility of genes in the microbial world, introduces an element of uncertainty into the inference of specific phylogenetic origins from gross morphological features preserved in the ancient rock record. PMID:24382125

  16. Whole-genome DNA methylation patterns and complex associations with gene structure and expression during flower development in Arabidopsis.

    PubMed

    Yang, Hongxing; Chang, Fang; You, Chenjiang; Cui, Jie; Zhu, Genfeng; Wang, Lei; Zheng, Yu; Qi, Ji; Ma, Hong

    2015-01-01

    Flower development is a complex process requiring proper spatiotemporal expression of numerous genes. Accumulating evidence indicates that epigenetic mechanisms, including DNA methylation, play essential roles in modulating gene expression. However, few studies have examined the relationship between DNA methylation and floral gene expression on a genomic scale. Here we present detailed analyses of DNA methylomes at single-base resolution for three Arabidopsis floral periods: meristems, early flowers and late flowers. We detected 1.5 million methylcytosines, and estimated the methylation levels for 24 035 genes. We found that many cytosine sites were methylated de novo from the meristem to the early flower stage, and many sites were demethylated from early to late flowers. A comparison of the transcriptome data of the same three periods revealed that the methylation and demethylation processes were correlated with expression changes of >3000 genes, many of which are important for normal flower development. We also found different methylation patterns for three sequence contexts (mCG, mCHG and mCHH) and in different genic regions, potentially with different roles in gene expression.

  17. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype.

    PubMed

    Linz, John E; Wee, Josephine; Roze, Ludmila V

    2014-08-01

    The filamentous fungi Aspergillus parasiticus and Aspergillus flavus produce the carcinogenic secondary metabolite aflatoxin on susceptible crops. These species differ in the quantity of aflatoxins B1, B2, G1, and G2 produced in culture, in the ability to produce the mycotoxin cyclopiazonic acid, and in morphology of mycelia and conidiospores. To understand the genetic basis for differences in biochemistry and morphology, we conducted next-generation sequence (NGS) analysis of the A. parasiticus strain SU-1 genome and comparative gene expression (RNA sequence analysis [RNA Seq]) analysis of A. parasiticus SU-1 and A. flavus strain NRRL 3357 (3357) grown under aflatoxin-inducing and -noninducing culture conditions. Although A. parasiticus SU-1 and A. flavus 3357 are highly similar in genome structure and gene organization, we observed differences in the presence of specific mycotoxin gene clusters and differential expression of specific mycotoxin genes and gene clusters that help explain differences in the type and quantity of mycotoxins synthesized. Using computer-aided analysis of secondary metabolite clusters (antiSMASH), we demonstrated that A. parasiticus SU-1 and A. flavus 3357 may carry up to 93 secondary metabolite gene clusters, and surprisingly, up to 10% of the genome appears to be dedicated to secondary metabolite synthesis. The data also suggest that fungus-specific zinc binuclear cluster (C6) transcription factors play an important role in regulation of secondary metabolite cluster expression. Finally, we identified uniquely expressed genes in A. parasiticus SU-1 that encode C6 transcription factors and genes involved in secondary metabolism and stress response/cellular defense. Future work will focus on these differentially expressed A. parasiticus SU-1 loci to reveal their role in determining distinct species characteristics. PMID:24951444

  18. cDNA cloning, genomic structure, and chromosome mapping of the human epithelial membrane protein CL-20 gene (EMP1), a member of the PMP22 family.

    PubMed

    Chen, Y; Medvedev, A; Ruzanov, P; Marvin, K W; Jetten, A M

    1997-04-01

    CL-20 is a novel gene encoding a protein that is structurally related to but distinct from the peripheral myelin protein PMP22. Like PMP22, CL-20 is likely to play important roles in the regulation of cell proliferation, differentiation, and cell death. In this study, we describe the cloning and sequencing of a cDNA encoding the human homologue of CL-20 and characterize the genomic structure of this gene. The hCL-20 gene (HGMW-approved symbol EMP1) encodes a protein of 157 amino acids that exhibits 76% identity to the rabbit CL-20 and to the rat EMP-1, which have been described recently, and 39% identity to human PMP22. CL-20 contains four hydrophobic domains, suggesting that it is an integral membrane protein. In particular the second hydrophobic domain encoded within the fourth exon is highly conserved among CL-20, EMP-1, and PMP22, suggesting a functional role for this region. CL-20 mRNA is abundant in squamous-differentiated bronchial epithelial cells; however, low levels of CL-20 mRNA can be detected in several human tissues by Northern analysis. Retinoic acid, which inhibits squamous differentiation, represses CL-20 expression in normal human bronchial epithelial cells. The genomic structure of the hCL-20 gene was analyzed using a P1 vector containing this gene. The hCL-20 gene contains five exons about 0.2, 0.12, 0.1, 0.14, and 2.2 kb and four introns about 15, 1.9, 0.1, and 0.7 kb. We have mapped the hCL-20 gene to chromosome 12p12 by fluorescence in situ hybridization. PMID:9126480
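
    As a rough illustration of the genomic structure quoted in this abstract, the approximate exon and intron sizes can be summed to estimate the span of the gene. This is only a sketch: the function name gene_span_kb is hypothetical, and the sizes are the rounded "about" values from the abstract, so the totals are approximate.

```python
# Approximate sizes in kb, copied from the abstract above (rounded values).
exons_kb = [0.2, 0.12, 0.1, 0.14, 2.2]
introns_kb = [15, 1.9, 0.1, 0.7]


def gene_span_kb(exons, introns):
    """Return (exonic total, intronic total, overall span) in kb.

    Hypothetical helper for illustration; not part of the cited study.
    """
    # Exons and introns alternate, so n exons are separated by n - 1 introns.
    if len(introns) != len(exons) - 1:
        raise ValueError("expected exactly one fewer intron than exons")
    exonic = sum(exons)
    intronic = sum(introns)
    return exonic, intronic, exonic + intronic


exonic, intronic, span = gene_span_kb(exons_kb, introns_kb)
print(f"exonic ~{exonic:.2f} kb, intronic ~{intronic:.1f} kb, span ~{span:.1f} kb")
```

    With these rounded figures the exons add up to a little under 3 kb and the whole gene to roughly 20 kb, most of it contributed by the first intron.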

  19. Evolutionary genomics: new genes for new jobs.

    PubMed

    Presgraves, Daven C

    2005-01-26

    Whole genome sequence analyses have confirmed that gene duplication and divergence play major roles in genome evolution. But the details of how young, functionally redundant gene duplicates escape mutational degradation have remained elusive. Several recent studies show that new genes survive because they evolve new, and sometimes essential, functions.

  20. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  1. Human retina-specific amine oxidase: genomic structure of the gene (AOC2), alternatively spliced variant, and mRNA expression in retina.

    PubMed

    Imamura, Y; Noda, S; Mashima, Y; Kudoh, J; Oguchi, Y; Shimizu, N

    1998-07-15

    Previously, we reported the isolation of cDNA for human retina-specific amine oxidase (RAO) and the expression of RAO exclusively in retina. Bacterial artificial chromosome clones containing the human RAO gene (AOC2) were mapped to human chromosome 17q21 (Imamura et al., 1997, Genomics 40: 277-283). Here, we report the complete genomic structure of the RAO gene, including 5' flanking sequence, and mRNA expression in retina. The human RAO gene spans 6 kb and is composed of four exons corresponding to the amino acid sequence 1-530, 530-598, 598-641, and 642-729 separated by three introns of 3000, 310, and 351 bp. Screening of a human retina cDNA library revealed the existence of an alternatively spliced cDNA variant with an additional 81 bp at the end of exon 2. The sizes of exons and the locations of exon/intron boundaries in the human RAO gene showed remarkable similarity to those of the human kidney diamine oxidase gene (AOC1). In situ hybridization revealed that mRNA coding for RAO is expressed preferentially in the ganglion cell layer of the mouse retina. We designed four sets of PCR primers to amplify four exons, which will be valuable for analyzing mutations in patients with ocular diseases affecting the retinal ganglion cell layer.

  2. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots of the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  3. Genomics of local adaptation with gene flow.

    PubMed

    Tigano, Anna; Friesen, Vicki L

    2016-05-01

    Gene flow is a fundamental evolutionary force in adaptation that is especially important to understand as humans are rapidly changing both the natural environment and natural levels of gene flow. Theory proposes a multifaceted role for gene flow in adaptation, but it focuses mainly on the disruptive effect that gene flow has on adaptation when selection is not strong enough to prevent the loss of locally adapted alleles. The role of gene flow in adaptation is now better understood due to the recent development of both genomic models of adaptive evolution and genomic techniques, which both point to the importance of genetic architecture in the origin and maintenance of adaptation with gene flow. In this review, we discuss three main topics on the genomics of adaptation with gene flow. First, we investigate selection on migration and gene flow. Second, we discuss the three potential sources of adaptive variation in relation to the role of gene flow in the origin of adaptation. Third, we explain how local adaptation is maintained despite gene flow: we provide a synthesis of recent genomic models of adaptation, discuss the genomic mechanisms and review empirical studies on the genomics of adaptation with gene flow. Despite predictions on the disruptive effect of gene flow in adaptation, an increasing number of studies show that gene flow can promote adaptation, that local adaptations can be maintained despite high gene flow, and that genetic architecture plays a fundamental role in the origin and maintenance of local adaptation with gene flow.

  4. PLAZA: A Comparative Genomics Resource to Study Gene and Genome Evolution in Plants

    PubMed Central

    Proost, Sebastian; Van Bel, Michiel; Sterck, Lieven; Billiau, Kenny; Van Parys, Thomas; Van de Peer, Yves; Vandepoele, Klaas

    2009-01-01

    The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage. PMID:20040540

  5. PLAZA: a comparative genomics resource to study gene and genome evolution in plants.

    PubMed

    Proost, Sebastian; Van Bel, Michiel; Sterck, Lieven; Billiau, Kenny; Van Parys, Thomas; Van de Peer, Yves; Vandepoele, Klaas

    2009-12-01

    The number of sequenced genomes of representatives within the green lineage is rapidly increasing. Consequently, comparative sequence analysis has significantly altered our view on the complexity of genome organization, gene function, and regulatory pathways. To explore all this genome information, a centralized infrastructure is required where all data generated by different sequencing initiatives is integrated and combined with advanced methods for data mining. Here, we describe PLAZA, an online platform for plant comparative genomics (http://bioinformatics.psb.ugent.be/plaza/). This resource integrates structural and functional annotation of published plant genomes together with a large set of interactive tools to study gene function and gene and genome evolution. Precomputed data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, intraspecies whole-genome dot plots, and genomic colinearity between species. Through the integration of high confidence Gene Ontology annotations and tree-based orthology between related species, thousands of genes lacking any functional description are functionally annotated. Advanced query systems, as well as multiple interactive visualization tools, are available through a user-friendly and intuitive Web interface. In addition, detailed documentation and tutorials introduce the different tools, while the workbench provides an efficient means to analyze user-defined gene sets through PLAZA's interface. In conclusion, PLAZA provides a comprehensible and up-to-date research environment to aid researchers in the exploration of genome information within the green plant lineage.

  6. Genomic organization of mouse gene zfp162.

    PubMed

    Wrehlke, C; Wiedemeyer, W R; Schmitt-Wrede, H P; Mincheva, A; Lichter, P; Wunderlich, F

    1999-05-01

    We report the cloning and characterization of the alternatively spliced mouse gene zfp162, formerly termed mzfm, the homolog of the human ZFM1 gene encoding the splicing factor SF1 and a putative signal transduction and activation of RNA (STAR) protein. The zfp162 gene is about 14 kb long and consists of 14 exons and 13 introns. Comparison of zfp162 with the genomic sequences of ZFM1/SF1 revealed that the exon-intron structure and exon sequences are well conserved between the genes, whereas the introns differ in length and sequence composition. Using fluorescent in situ hybridization, the zfp162 gene was assigned to chromosome 19, region B. Screening of a genomic library integrated in lambda DASH II resulted in the identification of the 5'-flanking region of zfp162. Sequence analysis of this region showed that zfp162 is a TATA-less gene containing an initiator control element and two CCAAT boxes. The promoter exhibits the following motifs: AP-2, CRE, Ets, GRE, HNF5, MRE, SP-1, TRE, TCF1, and PU.1. The core promoter, from position -331 to -157, contains the motifs CRE, SP-1, MRE, and AP-2, as determined in transfected CHO-K1 cells and IC-21 cells by reporter gene assay using a secreted form of human placental alkaline phosphatase. The occurrence of PU.1/GRE supports the view that the zfp162 gene encodes a protein involved not only in nuclear RNA metabolism, as the human ZFM1/SF1, but also in as yet unknown macrophage-inherent functions. PMID:10360842

  7. [Integration of different T-DNA structures of ACC oxidase gene into carnation genome extended cut flower vase-life differently].

    PubMed

    Yu, Yi-Xun; Bao, Man-Zhu

    2004-09-01

    The carnation (Dianthus caryophyllus L.) cultivar 'Master' was transformed, via Agrobacterium tumefaciens, with four T-DNA structures containing sense, antisense, sense direct-repeat and antisense direct-repeat copies of the ACC oxidase (ACO) gene. Southern blotting showed that the foreign gene was integrated into the carnation genome, and 14 transgenic lines were obtained. The transgenic plants were transplanted to soil and grew normally in the greenhouse. Of the 12 transgenic lines screened, the cut-flower vase life of 8 lines was up to 11 days and the longest was 12.8 days, whereas the vase life of the control was 5.8 days at 25 °C. The vase life of 2 out of 3 lines with a single sense ACO gene was the same as that of the control, while the vase life of 3 out of 4 lines with a single antisense ACO gene was prolonged. The vase life of cut flowers of all 5 lines with direct-repeat ACO genes was prolonged by about 6 days, while the vase life of 3 out of 7 lines with a single ACO gene was the same as that of the control. During the senescence of cut flowers, the ethylene production of most of the transgenic lines decreased significantly, and ethylene production was not detectable in lines T456, T556 and T575. The results demonstrate that the antisense foreign gene inhibits expression of the endogenous gene more strongly than the sense one. Both sense direct-repeat and antisense direct-repeat foreign genes can suppress endogenous gene expression more strongly than single foreign genes. The transgenic lines obtained in this research are useful for minimizing transportation and storage expenses for carnation cut flowers.

  8. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  9. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    PubMed Central

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.; Bogdanove, Adam J.

    2016-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456

  10. Genomic Structure and Identification of Novel Mutations in Usherin, the Gene Responsible for Usher Syndrome Type IIa

    PubMed Central

    Weston, M. D.; Eudy, J. D.; Fujita, S.; Yao, S.-F.; Usami, S.; Cremers, C.; Greenburg, J.; Ramesar, R.; Martini, A.; Moller, C.; Smith, R. J.; Sumegi, J.; Kimberling, William J.

    2000-01-01

    Usher syndrome type IIa (USHIIa) is an autosomal recessive disorder characterized by moderate to severe sensorineural hearing loss and progressive retinitis pigmentosa. This disorder maps to human chromosome 1q41. Recently, mutations in USHIIa patients were identified in a novel gene isolated from this chromosomal region. The USH2A gene encodes a protein with a predicted molecular weight of 171.5 kD and possesses laminin epidermal growth factor as well as fibronectin type III domains. These domains are observed in other protein components of the basal lamina and extracellular matrixes; they may also be observed in cell-adhesion molecules. The intron/exon organization of the gene whose protein we name “Usherin” was determined by direct sequencing of PCR products and cloned genomic DNA with cDNA-specific primers. The gene is encoded by 21 exons and spans a minimum of 105 kb. A mutation search of 57 independent USHIIa probands was performed with a combination of direct sequencing and heteroduplex analysis of PCR-amplified exons. Fifteen new mutations were found. Of 114 independent USH2A alleles, 58 harbored probable pathologic mutations. Ten cases of USHIIa were true homozygotes and 10 were compound heterozygotes; 18 heterozygotes with only one identifiable mutation were observed. Sixty-five percent (38/58) of cases had at least one mutation, and 51% (58/114) of the total number of possible mutations were identified. The allele 2299delG (previously reported as 2314delG) was the most frequent mutant allele observed (16%; 31/192). Three new missense mutations (C319Y, N346H, and C419F) were discovered; all were restricted to the previously unreported laminin domain VI region of Usherin. The possible significance of this domain, known to be necessary for laminin network assembly, is discussed in the context of domain VI mutations from other proteins. PMID:10729113

  11. A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

    PubMed

    Perualila-Tan, Nolen; Kasim, Adetayo; Talloen, Willem; Verbist, Bie; Göhlmann, Hinrich W H; Shkedy, Ziv

    2016-08-01

    The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity, taking into account the influence of the chemical structure of the compound on both variables. The model allows detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50 and their associations can be simultaneously investigated. Two oncology projects are used to illustrate the applicability and usefulness of the joint model to integrate multi-source high-dimensional information to aid drug discovery. PMID:27269248
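
    The idea of an association between gene expression and bioactivity "conditioned on the chemical structure" can be illustrated with a small sketch. It is not the authors' model: it assumes a single binary fingerprint feature, removes its effect from both variables by subtracting per-group means, and then correlates the residuals. The function names (residuals, pearson, adjusted_association) and the toy numbers are hypothetical.

```python
import math
from statistics import mean


def residuals(values, feature):
    """Subtract the mean of each fingerprint group (0/1) from its members."""
    group_means = {
        g: mean(v for v, z in zip(values, feature) if z == g)
        for g in set(feature)
    }
    return [v - group_means[z] for v, z in zip(values, feature)]


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjusted_association(gene, activity, feature):
    """Correlation of gene expression and bioactivity after removing
    the effect of one binary structural feature from both (assumed setup)."""
    return pearson(residuals(gene, feature), residuals(activity, feature))


# Toy data (invented): six compounds, one fingerprint bit, one gene, one pIC50.
feature = [0, 0, 0, 1, 1, 1]
gene = [1, 2, 3, 11, 12, 13]
activity = [5, 6, 7, 1, 2, 3]
raw = pearson(gene, activity)
adjusted = adjusted_association(gene, activity, feature)
```

    In this toy data the raw correlation is negative because the structural feature shifts the two variables in opposite directions, while the adjusted association is positive. This is the kind of difference that conditioning on structure is meant to expose.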

  12. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg₂(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.
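
    As a rough illustration of the quantities mentioned in this abstract, the sketch below picks k from the genome length and computes the Shannon entropy of the observed k-mer distribution. It does not reproduce the paper's informational indexes. Rounding the base-2 logarithm to the nearest integer is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def kmer_length(n: int) -> int:
    """k chosen as the base-2 logarithm of the genome length n.
    Rounding to the nearest integer is an assumption of this sketch."""
    return max(1, round(math.log2(n)))


def kmer_entropy(genome: str, k: int) -> float:
    """Shannon entropy (in bits) of the empirical k-mer distribution,
    counting every overlapping window of length k."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy sequence (invented); real genomes are millions of bases long.
genome = "ACGTACGTTTGACCAG"
k = kmer_length(len(genome))       # log2(16) = 4
entropy = kmer_entropy(genome, k)
```

    For a sequence of length n there are n - k + 1 overlapping windows, so the entropy can never exceed log2(n - k + 1) bits. Comparing the observed value with that ceiling is one simple way to see how far a sequence is from a random one.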

  13. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg₂(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155

  14. Informational laws of genome structures.

    PubMed

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg₂(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155

  15. Molecular Characterization of Two Lactate Dehydrogenase Genes with a Novel Structural Organization on the Genome of Lactobacillus sp. Strain MONT4

    PubMed Central

    Weekes, Jennifer; Yüksel, Gülhan Ü.

    2004-01-01

    Two lactate dehydrogenase (ldh) genes from Lactobacillus sp. strain MONT4 were cloned by complementation in Escherichia coli DC1368 (ldh pfl) and were sequenced. The sequence analysis revealed a novel genomic organization of the ldh genes. Subcloning of the individual ldh genes and their Northern blot analyses indicated that the genes are monocistronic. PMID:15466577

  16. Persistence drives gene clustering in bacterial genomes

    PubMed Central

    Fang, Gang; Rocha, Eduardo PC; Danchin, Antoine

    2008-01-01

    Background: Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results: We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion: We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes, maintaining their size fairly constant, that drives persistent gene clustering. A further selective stabilization process might contribute to maintaining the clustering. PMID:18179692

  17. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter.

    PubMed

    Sheppard, Samuel K; Jolley, Keith A; Maiden, Martin C J

    2012-01-01

    Campylobacteriosis remains a major human public health problem world-wide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, particularly the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying those genes or allelic variants responsible for host-association and increased human disease risk, but the diversity of Campylobacter genomes present challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website. PMID:24704917

  18. Evolution of the P-type II ATPase gene family in the fungi and presence of structural genomic changes among isolates of Glomus intraradices

    PubMed Central

    Corradi, Nicolas; Sanders, Ian R

    2006-01-01

    Background: The P-type II ATPase gene family encodes proteins with an important role in adaptation of the cell to variation in external K+, Ca2+ and Na+ concentrations. The presence of P-type II gene subfamilies that are specific for certain kingdoms has been reported but was sometimes contradicted by discovery of previously unknown homologous sequences in newly sequenced genomes. Members of this gene family have been sampled in all of the fungal phyla except the arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota), which are known to play a key role in terrestrial ecosystems and to be genetically highly variable within populations. Here we used highly degenerate primers on AMF genomic DNA to increase the sampling of fungal P-Type II ATPases and to test previous predictions about their evolution. In parallel, homologous sequences of the P-type II ATPases have been used to determine the nature and amount of polymorphism that is present at these loci among isolates of Glomus intraradices harvested from the same field. Results: In this study, four P-type II ATPase sub-families have been isolated from three AMF species. We show that, contrary to previous predictions, P-type IIC ATPases are present in all basal fungal taxa. Additionally, P-Type IIE ATPases should no longer be considered as exclusive to the Ascomycota and the Basidiomycota, since we also demonstrate their presence in the Zygomycota. Finally, a comparison of homologous sequences encoding P-type IID ATPases showed unexpectedly that indel mutations among coding regions, as well as specific gene duplications occur among AMF individuals within the same field. Conclusion: On the basis of these results we suggest that the diversification of P-Type IIC and E ATPases followed the diversification of the extant fungal phyla with independent events of gene gains and losses. Consistent with recent findings on the human genome, but at a much smaller geographic scale, we provided evidence that structural genomic

  19. KEGG: kyoto encyclopedia of genes and genomes.

    PubMed

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/).

  20. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.
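
    The probabilistic segmentation step described above can be sketched with dynamic programming. The example assumes a fixed dictionary in which each word has an independent probability. It computes the total likelihood summed over all segmentations and the single most probable segmentation. It does not build the dictionary, which the abstract says is done concurrently, and the function name, dictionary and probabilities are invented for illustration.

```python
def segment(seq, dictionary):
    """Return (total likelihood over all segmentations, best segmentation).

    dictionary maps each word to its probability; words are assumed to be
    emitted independently, one after another.
    """
    n = len(seq)
    max_len = max(len(w) for w in dictionary)
    total = [0.0] * (n + 1)   # sum over all segmentations of seq[:i]
    best = [0.0] * (n + 1)    # probability of the best segmentation of seq[:i]
    back = [0] * (n + 1)      # start index of the last word in that best path
    total[0] = best[0] = 1.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            p = dictionary.get(seq[start:end])
            if p is None:
                continue
            total[end] += p * total[start]
            if p * best[start] > best[end]:
                best[end] = p * best[start]
                back[end] = start
    if best[n] == 0.0:
        return 0.0, []        # sequence cannot be segmented with this dictionary
    words, pos = [], n
    while pos > 0:            # trace back the most probable path
        words.append(seq[back[pos]:pos])
        pos = back[pos]
    return total[n], words[::-1]


# Toy dictionary (invented): four single letters plus one longer "word".
toy_dictionary = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.2}
likelihood, words = segment("GTATAC", toy_dictionary)
```

    Here the segmentation G | TATA | C is preferred over six single letters because it uses fewer words. In a full method the word probabilities would themselves be re-estimated from such likelihoods, which is outside the scope of this sketch.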

  1. Genomic organization of the adrenoleukodystrophy gene

    SciTech Connect

    Sarde, C.O.; Mosser, J.; Kretz, C.

    1994-07-01

    Adrenoleukodystrophy (ALD), the most frequent peroxisomal disorder, is a severe neurodegenerative disease associated with an impairment of very long chain fatty acids β-oxidation. The authors have recently identified by positional cloning the gene responsible for ALD, located in Xq28. It encodes a new member of the "ABC" superfamily of membrane-associated transporters that shows, in particular, significant homology to the 70-kDa peroxisomal membrane protein (PMP70). They report here a detailed characterization of the ALD gene structure. It extends over 21 kb and consists of 10 exons. To facilitate the detection of mutations in ALD patients, they have determined the intronic sequences flanking the exons as well as the sequence of the 3' untranslated region and of the immediate 5' promoter region. Sequences present in distal exons cross-hybridize strongly to additional sequences in the human genome. The ALD gene has been positioned on a pulsed-field map between DXS15 and the L1CAM gene, about 650 kb upstream from the color pigment genes. The frequent occurrence of color vision anomalies observed in patients with adrenomyeloneuropathy (the adult onset form of ALD) thus does not represent a contiguous gene syndrome but a secondary manifestation of ALD. 37 refs., 6 figs.

  2. Directed self-assembly, genomic assembly complexity and the formation of biological structure, or, what are the genes for nacre?

    PubMed

    Cartwright, Julyan H E

    2016-03-13

    Biology uses dynamical mechanisms of self-organization and self-assembly of materials, but it also choreographs and directs these processes. The difference between abiotic self-assembly and a biological process is rather like the difference between setting up and running an experiment to make a material remotely compared with doing it in one's own laboratory: with a remote experiment (say, on the International Space Station) everything must be set up beforehand to let the experiment run 'hands off', but in the laboratory one can intervene at any point in a 'hands-on' approach. It is clear that the latter process, of directed self-assembly, can allow much more complicated experiments and produce far more complex structures than self-assembly alone. This control over self-assembly in biology is exercised at certain key waypoints along a trajectory and the process may be quantified in terms of the genomic assembly complexity of a biomaterial.

  3. Directed self-assembly, genomic assembly complexity and the formation of biological structure, or, what are the genes for nacre?

    PubMed

    Cartwright, Julyan H E

    2016-03-13

    Biology uses dynamical mechanisms of self-organization and self-assembly of materials, but it also choreographs and directs these processes. The difference between abiotic self-assembly and a biological process is rather like the difference between setting up and running an experiment to make a material remotely compared with doing it in one's own laboratory: with a remote experiment (say, on the International Space Station) everything must be set up beforehand to let the experiment run 'hands off', but in the laboratory one can intervene at any point in a 'hands-on' approach. It is clear that the latter process, of directed self-assembly, can allow much more complicated experiments and produce far more complex structures than self-assembly alone. This control over self-assembly in biology is exercised at certain key waypoints along a trajectory and the process may be quantified in terms of the genomic assembly complexity of a biomaterial. PMID:26857670

  4. A unified gene catalog for the laboratory mouse reference genome.

    PubMed

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI, and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
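
    The binning of features into the six overlap categories can be sketched as follows. This is not the fjoin algorithm: it uses a simple all-pairs comparison rather than an efficient join, ignores chromosome and strand, and assumes that a category is assigned to each connected group of overlapping features. The feature tuples and function names are hypothetical.

```python
from collections import defaultdict


def category(n_a, n_b):
    """Name the overlap category for a group with n_a features from set A
    and n_b features from set B."""
    if n_a == 1 and n_b == 1:
        return "1:1"
    if n_b == 0:
        return "1:0"
    if n_a == 0:
        return "0:1"
    if n_a == 1:
        return "1:n"
    if n_b == 1:
        return "n:1"
    return "n:m"


def bin_features(set_a, set_b):
    """Group features (id, start, end) that overlap, directly or through a
    chain of overlaps, and bin each group by category."""
    parent = {("A", f[0]): ("A", f[0]) for f in set_a}
    parent.update({("B", f[0]): ("B", f[0]) for f in set_b})

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a_id, a_start, a_end in set_a:
        for b_id, b_start, b_end in set_b:
            if a_start <= b_end and b_start <= a_end:   # inclusive overlap
                parent[find(("A", a_id))] = find(("B", b_id))

    groups = defaultdict(lambda: ([], []))
    for side, fid in parent:
        groups[find((side, fid))][0 if side == "A" else 1].append(fid)

    bins = defaultdict(list)
    for a_ids, b_ids in groups.values():
        bins[category(len(a_ids), len(b_ids))].append((sorted(a_ids), sorted(b_ids)))
    return dict(bins)


# Toy annotations (invented coordinates on one chromosome).
set_a = [("a1", 100, 200), ("a2", 300, 400), ("a3", 500, 600), ("a4", 900, 950)]
set_b = [("b1", 150, 250), ("b2", 310, 320), ("b3", 380, 390), ("b4", 700, 800)]
bins = bin_features(set_a, set_b)
```

    In this toy input a1 and b1 form a 1:1 pair, a2 overlaps both b2 and b3 (1:n), a3 and a4 have no counterpart (1:0), and b4 is unmatched (0:1). Groups that land in the 1:n, n:1 and n:m bins are the ones most likely to need a closer look.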

  5. KEGG: Kyoto Encyclopedia of Genes and Genomes.

    PubMed

    Ogata, H; Goto, S; Sato, K; Fujibuchi, W; Bono, H; Kanehisa, M

    1999-01-01

    Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  6. Mobilized retrotransposon Tos17 of rice by alien DNA introgression transposes into genes and causes structural and methylation alterations of a flanking genomic region.

    PubMed

    Han, F P; Liu, Z L; Tan, M; Hao, S; Fedak, G; Liu, B

    2004-01-01

    Tos17 is a copia-like endogenous retrotransposon of rice, which can be activated by various stresses such as tissue culture and alien DNA introgression. To confirm element mobilization by introgression and to study possible structural and epigenetic effects of Tos17 insertion on its target sequences, we isolated all flanking regions of Tos17 in an introgressed rice line (Tong35) that contains a minute amount of genomic DNA from wild rice (Zizania latifolia). It was found that there has been apparent but limited mobilization of Tos17 in this introgression line, as reflected by an increased but stable copy number of the element in progeny of the line. Three of the five activated copies of the element have transposed into genes. Based on sequence analysis and Southern blot hybridization with several double-enzyme digests, no structural change in Tos17 could be inferred in the introgression line. Cytosine methylation status at all seven CCGG sites within Tos17 was also identical between the introgression line and its rice parent (Matsumae), with all sites being heavily methylated. In contrast, changes in structure and cytosine methylation patterns were detected in one of the three low-copy genomic regions that flank newly transposed Tos17, and all changes are stably inherited through selfed generations. PMID:15703040

  7. Assessing the gene space in draft genomes.

    PubMed

    Parra, Genis; Bradnam, Keith; Ning, Zemin; Keane, Thomas; Korf, Ian

    2009-01-01

    Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. As one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete or completable the catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs), that are extremely highly conserved and which we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and complements the commonly used N50 length and x-fold coverage values.
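
    The two kinds of metric mentioned above can be sketched briefly. The snippet below computes N50 from a list of contig lengths and the proportion of a core gene set found in an assembly. It does not reproduce the authors' mapping procedure: the set of mapped CEGs is taken as given, and the function names and gene identifiers are hypothetical.

```python
def n50(lengths):
    """Return N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 requires at least one contig length")


def ceg_proportion(mapped_cegs, all_cegs):
    """Fraction of the core gene set that was mapped in the assembly."""
    core = set(all_cegs)
    return len(core & set(mapped_cegs)) / len(core)


# Hypothetical draft assembly: contig lengths and mapped core genes.
contig_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
core_genes = ["ceg1", "ceg2", "ceg3", "ceg4"]
mapped = {"ceg1", "ceg2", "ceg3"}

print("N50:", n50(contig_lengths))
print("CEG proportion:", ceg_proportion(mapped, core_genes))
```

    The two numbers answer different questions: N50 describes contiguity, while the CEG proportion estimates how much of the expected gene content is present, which is why the abstract presents them as complementary.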

  8. JGI Plant Genomics Gene Annotation Pipeline

    SciTech Connect

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. To use these large amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice, except for chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  9. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays

    PubMed Central

    Mak, Angel C. Y.; Lai, Yvonne Y. Y.; Lam, Ernest T.; Kwok, Tsz-Piu; Leung, Alden K. Y.; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R.; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W. C.; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J. K.; Li, Catherine M. L.; Li, Jing-Woei; Yim, Aldrin K. Y.; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y.; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. PMID:26510793

  10. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

    PubMed

    Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.

  11. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  12. Genomic evidence for adaptation by gene duplication.

    PubMed

    Qian, Wenfeng; Zhang, Jianzhi

    2014-08-01

    Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation. PMID:24904045

  13. Genome evolution in maize: from genomes back to genes.

    PubMed

    Schnable, James C

    2015-01-01

    Maize occupies dual roles as both (a) one of the big-three grain species (along with rice and wheat) responsible for providing more than half of the calories consumed around the world, and (b) a model system for plant genetics and cytogenetics dating back to the origin of the field of genetics in the early twentieth century. The long history of genetic investigation in this species combined with modern genomic and quantitative genetic data has provided particular insight into the characteristics of genes linked to phenotypes and how these genes differ from many other sequences in plant genomes that are not easily distinguishable based on molecular data alone. These recent results suggest that the number of genes in plants that make significant contributions to phenotype may be lower than the number of genes defined by current molecular criteria, and also indicate that syntenic conservation has been underemphasized as a marker for gene function. PMID:25494463

  14. From gene action to reactive genomes

    PubMed Central

    Keller, Evelyn Fox

    2014-01-01

    Poised at a critical turning point in the history of genetics, recent work (e.g. in genomics, epigenetics, genomic plasticity) obliges us to critically reexamine many of our most basic concepts. For example, I argue that genomic research supports a radical transformation in our understanding of the genome – a shift from an earlier conception of that entity as an effectively static collection of active genes to that of a dynamic and reactive system dedicated to the context specific regulation of protein-coding sequences. PMID:24882822

  15. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  16. Reproduction-related genes in the pearl oyster genome.

    PubMed

    Matsumoto, Toshie; Masaoka, Tetsuji; Fujiwara, Atsushi; Nakamura, Yoji; Satoh, Nori; Awaji, Masahiko

    2013-10-01

    Molluscan reproduction has been a target of biological research because of the various reproductive strategies that have evolved in this phylum. It has also been studied for the development of fisheries technologies, particularly aquaculture. Although fundamental processes of reproduction in other phyla, such as vertebrates and arthropods, have been well studied, information on the molecular mechanisms of molluscan reproduction remains limited. The recently released draft genome of the pearl oyster Pinctada fucata provides a novel and powerful platform for obtaining structural information on the genes and proteins involved in bivalve reproduction. In the present study, we analyzed the pearl oyster draft genome to screen reproduction-related genes. Analysis was mainly conducted for genes reported from other molluscs for encoding orthologs of reproduction-related proteins in other phyla. The gene search in the P. fucata gene models (version 1.1) and genome assembly (version 1.0) were performed using Genome Browser and BLAST software. The obtained gene models were then BLASTP searched against a public database to confirm the best-hit sequences. As a result, more than 40 gene models were identified with high accuracy to encode reproduction-related genes reported for P. fucata and other molluscs. These include vasa, nanos, doublesex- and mab-3-related transcription factor, 5-hydroxytryptamine (5-HT) receptors, vitellogenin, estrogen receptor, and others. The set of reproduction-related genes of P. fucata identified in the present study constitute a new tool for research on bivalve reproduction at the molecular level.

  17. An integrated approach to structural genomics.

    PubMed

    Heinemann, U; Frevert, J; Hofmann, K; Illing, G; Maurer, C; Oschkinat, H; Saenger, W

    2000-01-01

    Structural genomics aims at determining a set of protein structures that will represent all domain folds present in the biosphere. These structures can be used as the basis for the homology modelling of the majority of all remaining protein domains or, indeed, proteins. Structural genomics therefore promises to provide a comprehensive structural description of the protein universe. To achieve this, a broad scientific effort is required. The Berlin-based "Protein Structure Factory" (PSF) plans to contribute to this effort by setting up a local infrastructure for the low-cost, high-throughput analysis of soluble human proteins. In close collaboration with the German Human Genome Project (DHGP) protein-coding genes will be expressed in Escherichia coli or yeast. Affinity-tagged proteins will be purified semi-automatically for biophysical characterization and structure analysis by X-ray diffraction methods and NMR spectroscopy. In all steps of the structure analysis process, possibilities for automation, parallelization and standardization will be explored. Major new facilities that are created for the PSF include a robotic station for large-scale protein crystallization, an NMR center and an experimental station for protein crystallography at the synchrotron storage ring BESSY II in Berlin. PMID:11063780

  18. Characterization of the Genomic Xist Locus in Rodents Reveals Conservation of Overall Gene Structure and Tandem Repeats but Rapid Evolution of Unique Sequence

    PubMed Central

    Nesterova, Tatyana B.; Slobodyanyuk, Sergey Ya.; Elisaphenko, Eugene A.; Shevchenko, Alexander I.; Johnston, Colette; Pavlova, Marina E.; Rogozin, Igor B.; Kolesnikov, Nikolay N.; Brockdorff, Neil; Zakian, Suren M.

    2001-01-01

    The Xist locus plays a central role in the regulation of X chromosome inactivation in mammals, although its exact mode of action remains to be elucidated. Evolutionary studies are important in identifying conserved genomic regions and defining their possible function. Here we report cloning, sequence analysis, and detailed characterization of the Xist gene from four closely related species of common vole (field mouse), Microtus arvalis. Our analysis reveals that there is overall conservation of Xist gene structure both between different vole species and relative to mouse and human Xist/XIST. Within transcribed sequence, there is significant conservation over five short regions of unique sequence and also over Xist-specific tandem repeats. The majority of unique sequences, however, are evolving at an unexpectedly high rate. This is also evident from analysis of flanking sequences, which reveals a very high rate of rearrangement and invasion of dispersed repeats. We discuss these results in the context of Xist gene function and evolution. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AJ310127–AJ310130 and AJ311670.] PMID:11337478

  19. Gene family evolution across 12 Drosophila genomes.

    PubMed

    Hahn, Matthew W; Han, Mira V; Han, Sang-Gook

    2007-11-01

    Comparison of whole genomes has revealed large and frequent changes in the size of gene families. These changes occur because of high rates of both gene gain (via duplication) and loss (via deletion or pseudogenization), as well as the evolution of entirely new genes. Here we use the genomes of 12 fully sequenced Drosophila species to study the gain and loss of genes at unprecedented resolution. We find large numbers of both gains and losses, with over 40% of all gene families differing in size among the Drosophila. Approximately 17 genes are estimated to be duplicated and fixed in a genome every million years, a rate on par with that previously found in both yeast and mammals. We find many instances of extreme expansions or contractions in the size of gene families, including the expansion of several sex- and spermatogenesis-related families in D. melanogaster that also evolve under positive selection at the nucleotide level. Newly evolved gene families in our dataset are associated with a class of testes-expressed genes known to have evolved de novo in a number of cases. Gene family comparisons also allow us to identify a number of annotated D. melanogaster genes that are unlikely to encode functional proteins, as well as to identify dozens of previously unannotated D. melanogaster genes with conserved homologs in the other Drosophila. Taken together, our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. It is likely that this genomic revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species.

  20. Genome structure analysis of molluscs revealed whole genome duplication and lineage specific repeat variation.

    PubMed

    Yoshida, Masa-aki; Ishikura, Yukiko; Moritaki, Takeya; Shoguchi, Eiichi; Shimizu, Kentaro K; Sese, Jun; Ogura, Atsushi

    2011-09-01

    Comparative genome structure analysis allows us to identify novel genes, repetitive sequences and gene duplications. To explore lineage-specific genomic changes in the molluscs, which are a good model for the development of the nervous system in invertebrates, we conducted comparative genome structure analyses of three molluscs (pygmy squid, nautilus and scallop) using partial genome shotgun sequencing. The elements with the greatest effect on genome structural change are repetitive elements (REs), which cause expansion of genome size, and whole genome duplication, which produces a large number of novel functional genes. Therefore, we investigated the variation and proportion of REs and whole genome duplication. We first identified variations of REs in the three molluscan genomes by homology-based and de novo RE detection. The proportions of REs were 9.2%, 4.0%, and 3.8% in the pygmy squid, nautilus and scallop, respectively. We then estimated the genome sizes of the species as 2.1, 4.2 and 1.8 Gb, respectively, with 2× coverage frequency and DNA sequencing theory. We also performed a gene duplication assay based on coding genes, and found that large-scale duplication events occurred after divergence from the limpet Lottia, an out-group of the three molluscan species. Comparison of all the results suggested that RE expansion was not related to the increase in genome size of nautilus. Despite its close relationship to nautilus, the squid has the largest proportion of REs and a smaller genome size than nautilus. We also identified lineage-specific RE and gene-family expansions, possibly related to acquisition of the most complicated eye and brain systems in the three species.

  1. Genomic disorders: A window into human gene and genome evolution

    PubMed Central

    Carvalho, Claudia M. B.; Zhang, Feng; Lupski, James R.

    2010-01-01

    Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events. PMID:20080665

  2. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  3. Unmet Challenges of Structural Genomics

    PubMed Central

    Chruszcz, Maksymilian; Domagalski, Marcin; Osinski, Tomasz; Wlodawer, Alexander; Minor, Wladek

    2010-01-01

    Structural genomics (SG) programs have developed during the last decade many novel methodologies for faster and more accurate structure determination. These new tools and approaches led to determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at molecular levels. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3-D models. This situation has prompted us to review the challenges that remain unmet by structural genomics, as well as the areas in which the potential impact of SG could exceed what has been achieved so far. PMID:20810277

  4. Gene duplication and transfer events in plant mitochondria genome

    SciTech Connect

    Xiong Aisheng; Peng Rihe; Zhuang Jing; Gao Feng; Zhu Bo; Fu Xiaoyan; Xue Yong; Jin Xiaofen; Tian Yongsheng; Zhao Wei; Yao Quanhong

    2008-11-07

    Gene or genome duplication events increase the amount of genetic material available to increase the genomic, and thereby phenotypic, complexity of organisms during evolution. Gene duplication and transfer events have been important to molecular evolution in all three domains of life, and may be the first step in the emergence of new gene functions. Gene transfer events have been proposed as another accelerator of evolution. The duplicated gene or genome, mainly nuclear, has been the subject of several recent reviews. In addition to the nuclear genome, organisms have organelle genomes, including mitochondrial genome. In this review, we briefly summarize gene duplication and transfer events in the plant mitochondrial genome.

  5. Structural analysis of the CD11b gene and phylogenetic analysis of the α-integrin gene family demonstrate remarkable conservation of genomic organization and suggest early diversification during evolution

    SciTech Connect

    Fleming, J.C.; Gonzalez, D.A.; Tenen, D.G.; Pahl, H.L. (Harvard Medical School, Boston, MA); Smith, T.F.

    1993-01-15

    CD11b is a member of the β2 subfamily of the human leukocyte integrins. Its expression is limited to mature myeloid and NK cells and is up-regulated during the course of granulocytic and monocytic differentiation. The CD11b/CD18 (Mo1) heterodimer promotes adhesion of granulocytes and monocytes to C3bi-coated bacteria and endothelial cells. In an attempt to relate the exon structure to the known functional domains, as well as to identify and study cis-acting elements that are involved in its tissue-specific expression, the authors have isolated genomic clones encoding CD11b, deduced the exon/intron organization, and determined the transcriptional start site. The CD11b gene spans 55 kb and is encoded by 30 exons. Its structure closely resembles that of CD11c, another of the three leukocyte integrin α-chains, and suggests that these two genes arose by a gene duplication event. Furthermore, comparison of the CD11b gene structure with that of platelet glycoprotein IIb and Drosophila PS2 suggests how the human leukocyte integrins evolved and dispersed during the course of evolution. 67 refs., 5 figs., 2 tabs.

  6. The mitochondrial genome of Xiphinema americanum sensu stricto (Nematoda: Enoplea): considerable economization in the length and structural features of encoded genes.

    PubMed

    He, Y; Jones, J; Armstrong, M; Lamberti, F; Moens, M

    2005-12-01

    The complete sequence of the mitochondrial genome of the plant parasitic nematode Xiphinema americanum sensu stricto has been determined. At 12,626 bp it is the smallest metazoan mitochondrial genome reported to date. Genes are transcribed from both strands. Genes coding for 12 proteins, 2 rRNAs and 17 putative tRNAs (with the tRNA-C, I, N, S1, S2 missing) are predicted from the sequence. The arrangement of genes within the X. americanum mitochondrial genome is unique and includes gene overlaps. Comparisons with the mtDNA of other nematodes show that the small size of the X. americanum mtDNA is due to a combination of factors. The two mitochondrial rRNA genes are considerably smaller than those of other nematodes, with most of the protein encoding and tRNA genes also slightly smaller. In addition, five tRNA genes are absent, lengthy noncoding regions are not present in the mtDNA, and several gene overlaps are present.

  7. Single Nucleotide Polymorphisms Reveal Genetic Structuring of the Carpathian Newt and Provide Evidence of Interspecific Gene Flow in the Nuclear Genome

    PubMed Central

    Zieliński, Piotr; Dudek, Katarzyna; Stuglik, Michał Tadeusz; Liana, Marcin; Babik, Wiesław

    2014-01-01

    Genetic variation within species is commonly structured in a hierarchical manner which may result from superimposition of processes acting at different spatial and temporal scales. In organisms of limited dispersal ability, signatures of past subdivision are detectable for a long time. Studies of contemporary genetic structure in such taxa inform about the history of isolation, range changes and local admixture resulting from geographically restricted hybridization with related species. Here we use a set of 139 transcriptome-derived, unlinked nuclear single nucleotide polymorphisms (SNP) to assess the genetic structure of the Carpathian newt (Lissotriton montandoni, Lm) and introgression from its congener, the smooth newt (L. vulgaris, Lv). Two substantially differentiated groups of Lm populations likely originated from separate refugia, both located in the Eastern Carpathians. The colonization of the present range in north-western and south-western directions was accompanied by a modest loss of variation; admixture between the two groups has occurred in the middle of the Eastern Carpathians. Local, apparently recent introgression of Lv alleles into several Lm populations was detected, demonstrating increased power for admixture detection in comparison to a previous study based on a limited number of microsatellite markers. The level of introgression was higher in Lm populations classified as admixed than in syntopic populations. We discuss the possible causes and propose further tests to distinguish between alternatives. Several outlier loci were identified in tests of interspecific differentiation, suggesting genomic heterogeneity of gene flow between species. PMID:24820116

  8. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    PubMed Central

    Nakashima, Nobutaka; Miyazaki, Kentaro

    2014-01-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features of, and recent advances in, methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering. PMID:24552876

  9. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  10. Functional Insights from Structural Genomics

    SciTech Connect

    Forouhar,F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded β-barrel with a novel fold (V. parahaemolyticus VPA1032).

  11. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, spanning target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information for more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  12. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  13. Complete structure, genomic organization, and expression of channel catfish (Ictalurus punctatus, Rafinesque 1818) matrix metalloproteinase-9 gene

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In the course of studying pathogenesis of enteric septicemia of catfish, we noted that the channel catfish (CC) matrix metalloproteinase-9 (MMP-9) expressed sequence tag (EST) was up-regulated after early Edwardsiella ictaluri infection. In this study, the CC MMP-9 gene was cloned, sequenced and ch...

  14. The genomic environment around the Aromatase gene: evolutionary insights

    PubMed Central

    Castro, L Filipe C; Santos, Miguel M; Reis-Henriques, Maria A

    2005-01-01

    Background The cytochrome P450 aromatase (CYP19) catalyses the aromatisation of androgens to estrogens, a key mechanism in vertebrate reproductive physiology. A current evolutionary hypothesis suggests that the CYP19 gene arose at the origin of vertebrates, given that it has not been found outside this clade. The human CYP19 gene is located in one of the proposed MHC-paralogon regions (HSA15q). At present it is unclear whether this genomic location is ancestral (which would suggest an invertebrate origin for CYP19) or derived (genomic location with no evolutionary meaning). The distinction between these possibilities should help to clarify the timing of the CYP19 emergence and which taxa should be investigated. Results Here we determine the "genomic environment" around CYP19 in three vertebrate species: Homo sapiens, Tetraodon nigroviridis and Xenopus tropicalis. Paralogy studies and phylogenetic analysis of six gene families suggest that the CYP19 gene region was structured through "en bloc" genomic duplication (as part of the MHC-paralogon formation). Four gene families have specifically duplicated in the vertebrate lineage. Moreover, the mapping location of the different paralogues is consistent with a model of "en bloc" duplication. Furthermore, we also determine that this region has retained the same gene content since the divergence of Actinopterygii and Tetrapods. A single inversion in gene order has taken place, probably in the mammalian lineage. Finally, we describe the first invertebrate CYP19 sequence, from Branchiostoma floridae. Conclusion Contrary to previous suggestions, our data indicate an invertebrate origin for the aromatase gene, given the striking conservation pattern in both gene order and gene content, and the presence of aromatase in amphioxus. We propose that CYP19 duplicated in the vertebrate lineage to yield four paralogues, followed by the subsequent loss of all but one gene in vertebrate evolution. Finally, we suggest that agnathans and

  15. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger, complex multimodular (composite or chimeric) and mostly multi-functional proteins through fusion of two or more individual genes that have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adapting to its environment. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace back the evolutionary processes from the ancient smaller and uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis exist due to the complexity of this biological and evolutionary process. We report from this study a new strategy to computationally identify multimodular proteins using completed genome sequences, and the results surveyed from 22 organisms, with the data from over 40 organisms to be presented during the meeting. Additional information is contained in the original extended abstract.

  16. The fungal mitochondrial genome project: evolution of fungal mitochondrial genomes and their gene expression.

    PubMed

    Paquin, B; Laforest, M J; Forget, L; Roewer, I; Wang, Z; Longcore, J; Lang, B F

    1997-05-01

    The goal of the fungal mitochondrial genome project (FMGP) is to sequence complete mitochondrial genomes for a representative sample of the major fungal lineages; to analyze the genome structure, gene content, and conserved sequence elements of these sequences; and to study the evolution of gene expression in fungal mitochondria. By using our new sequence data for evolutionary studies, we were able to construct phylogenetic trees that provide further solid evidence that animals and fungi share a common ancestor to the exclusion of chlorophytes and protists. With a database comprising multiple mitochondrial gene sequences, the level of support for our mitochondrial phylogenies is unprecedented, in comparison to trees inferred with nuclear ribosomal RNA sequences. We also found several new molecular features in the mitochondrial genomes of lower fungi, including: (1) tRNA editing, which is the same type as that found in the mitochondria of the amoeboid protozoan Acanthamoeba castellanii; (2) two novel types of putative mobile DNA elements, one encoding a site-specific endonuclease that confers mobility on the element, and the other constituting a class of highly compact, structured elements; and (3) a large number of introns, which provide insights into intron origins and evolution. Here, we present an overview of these results, and discuss examples of the diversity of structures found in the fungal mitochondrial genome.

  17. Pseudomonas aeruginosa Genomic Structure and Diversity

    PubMed Central

    Klockgether, Jens; Cramer, Nina; Wiehlmann, Lutz; Davenport, Colin F.; Tümmler, Burkhard

    2011-01-01

    The Pseudomonas aeruginosa genome (G + C content 65–67%, size 5.5–7 Mbp) is made up of a single circular chromosome and a variable number of plasmids. Sequencing of complete genomes or blocks of the accessory genome has revealed that the genome encodes a large repertoire of transporters, transcriptional regulators, and two-component regulatory systems which reflects its metabolic diversity to utilize a broad range of nutrients. The conserved core component of the genome is largely collinear among P. aeruginosa strains and exhibits an interclonal sequence diversity of 0.5–0.7%. Only a few loci of the core genome are subject to diversifying selection. Genome diversity is mainly caused by accessory DNA elements located in 79 regions of genome plasticity that are scattered around the genome and show an anomalous usage of mono- to tetradecanucleotides. Genomic islands of the pKLC102/PAGI-2 family that integrate into tRNALys or tRNAGly genes represent hotspots of inter- and intraclonal genomic diversity. The individual islands differ in their repertoire of metabolic genes that make a large contribution to the pangenome. In order to unravel intraclonal diversity of P. aeruginosa, the genomes of two members of the PA14 clonal complex from diverse habitats and geographic origin were compared. The genome sequences differed by less than 0.01% from each other. One hundred ninety-eight of the 231 single nucleotide substitutions (SNPs) were non-randomly distributed in the genome. Non-synonymous SNPs were mainly found in an integrated Pf1-like phage and in genes involved in transcriptional regulation, membrane and extracellular constituents, transport, and secretion. In summary, P. aeruginosa is endowed with a highly conserved core genome of low sequence diversity and a highly variable accessory genome that communicates with other pseudomonads and genera via horizontal gene transfer. PMID:21808635

  18. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite

  19. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials
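
    The TRN20-TST80 strategy described in the abstract above can be illustrated with a minimal sketch of a single random partition. This is not the authors' code: the function name `trn20_tst80_split`, the `MEX_` accession labels, and the fixed seed are hypothetical, and the abstract does not say how many random partitions were drawn or how the models were fitted.

```python
import random

def trn20_tst80_split(accessions, train_fraction=0.2, seed=0):
    """Randomly partition accessions into training and testing sets.

    Hypothetical helper: illustrates one random 20%/80% partition only.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    shuffled = list(accessions)        # copy so the input is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder labels standing in for the 8416 Mexican landrace accessions.
mexican = [f"MEX_{i}" for i in range(8416)]
train, test = trn20_tst80_split(mexican)
print(len(train), len(test))
```

    In practice such a split would be repeated over many random partitions and the prediction accuracies averaged; the number of repetitions is an assumption left open here.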

  20. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA and 8 rRNA. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
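
    As a quick consistency check on the figures quoted in the abstract above, the region lengths and gene counts can be summed. This small sketch uses only the numbers given in the abstract; the dictionary keys are illustrative labels, and it assumes the 52,196 bp figure is the combined length of the inverted repeat pair, which is the reading under which the regions add up to the stated total.

```python
# Region lengths (bp) as quoted in the abstract.
regions_bp = {
    "LSC": 87_109,      # large single-copy region
    "SSC": 18_436,      # small single-copy region
    "IR_pair": 52_196,  # both inverted repeats combined (assumed reading)
}
total_bp = sum(regions_bp.values())

# Gene counts as quoted in the abstract.
genes = {"protein_coding": 96, "tRNA": 40, "rRNA": 8}
total_genes = sum(genes.values())

print(total_bp, total_genes)
```

    Both sums agree with the totals reported in the abstract (157,741 bp and 144 genes).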

  1. Chloroplast genome structure in Ilex (Aquifoliaceae).

    PubMed

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA and 8 rRNA. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489

  2. Genes after the human genome project.

    PubMed

    Baetu, Tudor M

    2012-03-01

    While the Human Genome Nomenclature Committee (HGNC) concept of the gene can accommodate a wide variety of genomic sequences contributing to phenotypic outcomes, it fails to specify how sequences should be grouped when dealing with complex loci consisting of adjacent/overlapping sequences contributing to the same phenotype, distant sequences shown to contribute to the same gene product, and partially overlapping sequences identified by different techniques. The purpose of this paper is to review recently proposed concepts of the gene and critically assess how well they succeed in addressing the above problems while preserving the degree of generality achieved by the HGNC concept. I conclude that a dynamic interplay between mapping and syntax-based concepts is required in order to satisfy these desiderata.

  3. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  4. Lectin genes in the Frankia alni genome.

    PubMed

    Pujic, Petar; Fournier, Pascale; Alloisio, Nicole; Hay, Anne-Emmanuelle; Maréchal, Joelle; Anchisi, Stéphanie; Normand, Philippe

    2012-01-01

    Frankia alni strain ACN14a's genome was scanned for the presence of determinants involved in interactions with its host plant, Alnus spp. One such determinant type is lectin, proteins that bind specifically to sugar motifs. The genome of F. alni was found to contain 7 such lectin-coding genes, five of which were of the ricinB-type. The proteins coded by these genes contain either only the lectin domain, or also a heat shock protein or a serine-threonine kinase domain upstream. These lectins were found to have several homologs in Streptomyces spp., and a few in other bacterial genomes among which none in Frankia EAN1pec and CcI3 and two in strain EUN1f. One of these F. alni genes, FRAAL0616, was cloned in E. coli, fused with a reporter gene yielding a fusion protein that was found to bind to both root hairs and to bacterial hyphae. This protein was also found to modify the dynamics of nodule formation in A. glutinosa, resulting in a higher number of nodules per root. Its role could thus be to permit binding of microbial cells to root hairs and help symbiosis to occur under conditions of low Frankia cell counts such as in pioneer situations. PMID:22159868

  5. The d4 gene family in the human genome

    SciTech Connect

    Chestkov, A.V.; Baka, I.D.; Kost, M.V.

    1996-08-15

    The d4 domain, a novel zinc finger-like structural motif, was first revealed in the rat neuro-d4 protein. Here we demonstrate that the d4 domain is conserved in evolution and that three related genes form a d4 family in the human genome. The human neuro-d4 is very similar to rat neuro-d4 at both the amino acid and the nucleotide levels. Moreover, the same splice variants have been detected among rat and human neuro-d4 transcripts. This gene has been localized on chromosome 19, and two other genes, members of the d4 family isolated by screening of the human genomic library at low stringency, have been mapped to chromosomes 11 and 14. The gene on chromosome 11 is the homolog of the ubiquitously expressed mouse gene ubi-d4/requiem, which is required for cell death after deprivation of trophic factors. A gene with a conserved d4 domain has been found in the genome of the nematode Caenorhabditis elegans. The conservation of d4 proteins from nematodes to vertebrates suggests that they have a general importance, but a diversity of d4 proteins expressed in vertebrate nervous systems suggests that some family members have special functions. 11 refs., 2 figs.

  6. Draft Genome Sequence and Gene Annotation of the Entomopathogenic Fungus Verticillium hemipterigenum

    PubMed Central

    Horn, Fabian; Habel, Andreas; Scharf, Daniel H.; Dworschak, Jan; Brakhage, Axel A.; Guthke, Reinhard

    2015-01-01

    Verticillium hemipterigenum (anamorph Torrubiella hemipterigena) is an entomopathogenic fungus and produces a broad range of secondary metabolites. Here, we present the draft genome sequence of the fungus, including gene structure and functional annotation. Genes were predicted incorporating RNA-Seq data and functionally annotated to provide the basis for further genome studies. PMID:25614560

  7. Genomic structure and evolution of multigene families: "flowers" on the human genome.

    PubMed

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures "Flowers" because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated to 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033

  8. Exon structure of the human dystrophin gene

    SciTech Connect

    Roberts, R.G.; Coffey, A.J.; Bobrow, M.; Bentley, D.R.

    1993-05-01

    Application of a novel vectorette PCR approach to defining intron-exon boundaries has permitted completion of analysis of the exon structure of the largest and most complex known human gene. The authors present here a summary of the exon structure of the entire human dystrophin gene, together with the sizes of genomic HindIII fragments recognized by each exon, and (where available) GenBank accession numbers for adjacent intron sequences. 20 refs., 1 tab.

  9. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  10. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  11. Characterization of the Poplar Pan-Genome by Genome-Wide Identification of Structural Variation.

    PubMed

    Pinosio, Sara; Giacomello, Stefania; Faivre-Rampant, Patricia; Taylor, Gail; Jorge, Veronique; Le Paslier, Marie Christine; Zaina, Giusi; Bastien, Catherine; Cattonaro, Federica; Marroni, Fabio; Morgante, Michele

    2016-10-01

    Many recent studies have emphasized the important role of structural variation (SV) in determining human genetic and phenotypic variation. In plants, studies aimed at elucidating the extent of SV are still in their infancy. Evidence has indicated a high presence and an active role of SV in driving plant genome evolution in different plant species. With the aim of characterizing the size and the composition of the poplar pan-genome, we performed a genome-wide analysis of structural variation in three intercrossable poplar species: Populus nigra, Populus deltoides, and Populus trichocarpa. We detected a total of 7,889 deletions and 10,586 insertions relative to the P. trichocarpa reference genome, covering respectively 33.2 Mb and 62.9 Mb of genomic sequence, and 3,230 genes affected by copy number variation (CNV). The majority of the detected variants are inter-specific in agreement with a recent origin following separation of species. Insertions and deletions (INDELs) were preferentially located in low-gene density regions of the poplar genome and were, for the majority, associated with the activity of transposable elements. Genes affected by SV showed lower-than-average expression levels and higher levels of dN/dS, suggesting that they are subject to relaxed selective pressure or correspond to pseudogenes. Functional annotation of genes affected by INDELs showed over-representation of categories associated with transposable elements activity, while genes affected by genic CNVs showed enrichment in categories related to resistance to stress and pathogens. This study provides a genome-wide catalogue of SV and the first insight on functional and structural properties of the poplar pan-genome. PMID:27499133

  12. Characterization of the Poplar Pan-Genome by Genome-Wide Identification of Structural Variation

    PubMed Central

    Pinosio, Sara; Giacomello, Stefania; Faivre-Rampant, Patricia; Taylor, Gail; Jorge, Veronique; Le Paslier, Marie Christine; Zaina, Giusi; Bastien, Catherine; Cattonaro, Federica; Marroni, Fabio; Morgante, Michele

    2016-01-01

    Many recent studies have emphasized the important role of structural variation (SV) in determining human genetic and phenotypic variation. In plants, studies aimed at elucidating the extent of SV are still in their infancy. Evidence has indicated a high presence and an active role of SV in driving plant genome evolution in different plant species. With the aim of characterizing the size and the composition of the poplar pan-genome, we performed a genome-wide analysis of structural variation in three intercrossable poplar species: Populus nigra, Populus deltoides, and Populus trichocarpa. We detected a total of 7,889 deletions and 10,586 insertions relative to the P. trichocarpa reference genome, covering respectively 33.2 Mb and 62.9 Mb of genomic sequence, and 3,230 genes affected by copy number variation (CNV). The majority of the detected variants are inter-specific in agreement with a recent origin following separation of species. Insertions and deletions (INDELs) were preferentially located in low-gene density regions of the poplar genome and were, for the majority, associated with the activity of transposable elements. Genes affected by SV showed lower-than-average expression levels and higher levels of dN/dS, suggesting that they are subject to relaxed selective pressure or correspond to pseudogenes. Functional annotation of genes affected by INDELs showed over-representation of categories associated with transposable elements activity, while genes affected by genic CNVs showed enrichment in categories related to resistance to stress and pathogens. This study provides a genome-wide catalogue of SV and the first insight on functional and structural properties of the poplar pan-genome. PMID:27499133

  13. Analysis of the pdx-1 (snz-1/sno-1) region of the Neurospora crassa genome: correlation of pyridoxine-requiring phenotypes with mutations in two structural genes.

    PubMed Central

    Bean, L E; Dvorachek, W H; Braun, E L; Errett, A; Saenz, G S; Giles, M D; Werner-Washburne, M; Nelson, M A; Natvig, D O

    2001-01-01

    We report the analysis of a 36-kbp region of the Neurospora crassa genome, which contains homologs of two closely linked stationary phase genes, SNZ1 and SNO1, from Saccharomyces cerevisiae. Homologs of SNZ1 encode extremely highly conserved proteins that have been implicated in pyridoxine (vitamin B6) metabolism in the filamentous fungi Cercospora nicotianae and in Aspergillus nidulans. In N. crassa, SNZ and SNO homologs map to the region occupied by pdx-1 (pyridoxine requiring), a gene that has been known for several decades, but which was not sequenced previously. In this study, pyridoxine-requiring mutants of N. crassa were found to possess mutations that disrupt conserved regions in either the SNZ or SNO homolog. Previously, nearly all of these mutants were classified as pdx-1. However, one mutant with a disrupted SNO homolog was at one time designated pdx-2. It now appears appropriate to reserve the pdx-1 designation for the N. crassa SNZ homolog and pdx-2 for the SNO homolog. We further report annotation of the entire 36,030-bp region, which contains at least 12 protein coding genes, supporting a previous conclusion of high gene densities (12,000-13,000 total genes) for N. crassa. Among genes in this region other than SNZ and SNO homologs, there was no evidence of shared function. Four of the genes in this region appear to have been lost from the S. cerevisiae lineage. PMID:11238395

  14. Analysis of the pdx-1 (snz-1/sno-1) region of the Neurospora crassa genome: correlation of pyridoxine-requiring phenotypes with mutations in two structural genes.

    PubMed

    Bean, L E; Dvorachek, W H; Braun, E L; Errett, A; Saenz, G S; Giles, M D; Werner-Washburne, M; Nelson, M A; Natvig, D O

    2001-03-01

    We report the analysis of a 36-kbp region of the Neurospora crassa genome, which contains homologs of two closely linked stationary phase genes, SNZ1 and SNO1, from Saccharomyces cerevisiae. Homologs of SNZ1 encode extremely highly conserved proteins that have been implicated in pyridoxine (vitamin B6) metabolism in the filamentous fungi Cercospora nicotianae and in Aspergillus nidulans. In N. crassa, SNZ and SNO homologs map to the region occupied by pdx-1 (pyridoxine requiring), a gene that has been known for several decades, but which was not sequenced previously. In this study, pyridoxine-requiring mutants of N. crassa were found to possess mutations that disrupt conserved regions in either the SNZ or SNO homolog. Previously, nearly all of these mutants were classified as pdx-1. However, one mutant with a disrupted SNO homolog was at one time designated pdx-2. It now appears appropriate to reserve the pdx-1 designation for the N. crassa SNZ homolog and pdx-2 for the SNO homolog. We further report annotation of the entire 36,030-bp region, which contains at least 12 protein coding genes, supporting a previous conclusion of high gene densities (12,000-13,000 total genes) for N. crassa. Among genes in this region other than SNZ and SNO homologs, there was no evidence of shared function. Four of the genes in this region appear to have been lost from the S. cerevisiae lineage.

  15. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    PubMed

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  16. FGF: a web tool for Fishing Gene Family in a whole genome database.

    PubMed

    Zheng, Hongkun; Shi, Junjie; Fang, Xiaodong; Li, Yuan; Vang, Søren; Fan, Wei; Wang, Junyi; Zhang, Zhang; Wang, Wen; Kristiansen, Karsten; Wang, Jun

    2007-07-01

    Gene duplication is an important process in evolution. The availability of genome sequences of a number of organisms has made it possible to conduct comprehensive searches for duplicated genes enabling informative studies of their evolution. We have established the FGF (Fishing Gene Family) program to efficiently search for and identify gene families. The FGF output displays the results as visual phylogenetic trees including information on gene structure, chromosome position, duplication fate and selective pressure. It is particularly useful to identify pseudogenes and detect changes in gene structure. FGF is freely available on a web server at http://fgf.genomics.org.cn/

  17. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-04

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered in synthetic biology.

  18. Genome-level identification, gene expression, and comparative analysis of porcine β-defensin genes

    PubMed Central

    2012-01-01

    Background Beta-defensins (β-defensins) are innate immune peptides with evolutionary conservation across a wide range of species and have been suggested to play important roles in innate immune reactions against pathogens. However, the complete β-defensin repertoire in the pig has not been fully addressed. Result A BLAST analysis was performed against the available pig genomic sequence in the NCBI database to identify β-defensin-related sequences using previously reported β-defensin sequences of pigs, humans, and cattle. The porcine β-defensin gene clusters were mapped to chromosomes 7, 14, 15 and 17. The gene expression analysis of 17 newly annotated porcine β-defensin genes across 15 tissues using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) showed differences in their tissue distribution, with the kidney and testis having the largest pBD expression repertoire. We also analyzed single nucleotide polymorphisms (SNPs) in the mature peptide region of pBD genes from 35 pigs of 7 breeds. We found 8 cSNPs in 7 pBDs. Conclusion We identified 29 porcine β-defensin (pBD) gene-like sequences, including 17 unreported pBDs in the porcine genome. Comparative analysis of β-defensin genes in the pig genome with those in human and cattle genomes showed structural conservation of β-defensin syntenic regions among these species. PMID:23150902

  19. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    PubMed Central

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion The observed differences in genomic structure between C. japonica and other land plants, including

  20. Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries.

    PubMed Central

    Ruberti, I; Fragapane, P; Pierandrei-Amaldi, P; Beccari, E; Amaldi, F; Bozzoni, I

    1982-01-01

    Using a cDNA clone for the histone H3 we have isolated, from two genomic libraries of Xenopus laevis and Xenopus tropicalis, clones containing four different histone gene clusters. The structural organization of X. laevis histone genes has been determined by restriction mapping, Southern blot hybridization and translation of the mRNAs which hybridize to the various restriction fragments. The arrangement of the histone genes in X. tropicalis has been determined by Southern analysis using X. laevis genomic fragments, containing individual genes, as probes. Histone genes are clustered in the genome of X. laevis and X. tropicalis and, compared to invertebrates, show a higher organization heterogeneity as demonstrated by structural analysis of the four genomic clones. In fact, the order of the genes within individual clusters is not conserved. PMID:6296782

  1. Genomic organization of the human skeletal muscle sodium channel gene

    SciTech Connect

    George, A.L. Jr.; Iyer, G.S.; Kleinfield, R.; Kallen, R.G.; Barchi, R.L.

    1993-03-01

    Voltage-dependent sodium channels are essential for normal membrane excitability and contractility in adult skeletal muscle. The gene encoding the principal sodium channel α-subunit isoform in human skeletal muscle (SCN4A) has recently been shown to harbor point mutations in certain hereditary forms of periodic paralysis. The authors have carried out an analysis of the detailed structure of this gene including delineation of intron-exon boundaries by genomic DNA cloning and sequence analysis. The complete coding region of SCN4A is found in 32.5 kb of genomic DNA and consists of 24 exons (54 to >2.2 kb) and 23 introns (97 bp-4.85 kb). The exon organization of the gene shows no relationship to the predicted functional domains of the channel protein and splice junctions interrupt many of the transmembrane segments. The genomic organization of sodium channels may have been partially conserved during evolution as evidenced by the observation that 10 of the 24 splice junctions in SCN4A are positioned in homologous locations in a putative sodium channel gene in Drosophila (para). The information presented here should be extremely useful both for further identifying sodium channel mutations and for gaining a better understanding of sodium channel evolution. 39 refs., 5 figs., 2 tabs.

  2. Diversity of laccase-coding genes in Fusarium oxysporum genomes.

    PubMed

    Kwiatos, Natalia; Ryngajłło, Małgorzata; Bielecki, Stanisław

    2015-01-01

    Multiple studies confirm the role of laccase in fungal pathogenicity and lignocellulose degradation. In spite of broad genomic research, laccases from the plant wilt pathogen Fusarium oxysporum are still not characterized. The study aimed to identify F. oxysporum genes that may encode laccases sensu stricto and to characterize the proteins in silico in order to facilitate further research on their impact on the mentioned processes. Twelve sequenced F. oxysporum genomes available on the Broad Institute of Harvard and MIT (2015) website were analyzed and three genes that may encode laccases sensu stricto were found. Their amino acid sequences possess all features essential for their catalytic activity; moreover, the homology models proved the characteristic 3D laccase structures. The study sheds light on F. oxysporum as a new source of multicopper oxidases, enzymes with possible high redox potential and broad perspective in biotechnological applications.

  3. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  4. Structural and operational complexity of the Geobacter sulfurreducens genome

    PubMed Central

    Qiu, Yu; Cho, Byung-Kwan; Park, Young Seoub; Lovley, Derek; Palsson, Bernhard Ø.; Zengler, Karsten

    2010-01-01

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes. PMID:20592237

  5. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    PubMed

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  6. The mouse gene for vascular endothelial growth factor. Genomic structure, definition of the transcriptional unit, and characterization of transcriptional and post-transcriptional regulatory sequences.

    PubMed

    Shima, D T; Kuroki, M; Deutsch, U; Ng, Y S; Adamis, A P; D'Amore, P A

    1996-02-16

    We describe the genomic organization and functional characterization of the mouse gene encoding vascular endothelial growth factor (VEGF), a polypeptide implicated in embryonic vascular development and postnatal angiogenesis. The coding region for mouse VEGF is interrupted by seven introns and encompasses approximately 14 kilobases. Organization of exons suggests that, similar to the human VEGF gene, alternative splicing generates the 120-, 164-, and 188-amino acid isoforms, but does not predict a fourth VEGF isoform corresponding to human VEGF206. Approximately 1.2 kilobases of 5'-flanking region have been sequenced, and primer extension analysis identified a single major transcription initiation site, notably lacking TATA or CCAT consensus sequences. The 5'-flanking region is sufficient to promote a 7-fold induction of basal transcription. The genomic region encoding the 3'-untranslated region was determined by Northern and nuclease mapping analysis. Investigation of mRNA sequences responsible for the rapid turnover of VEGF mRNA (mRNA half-life, <1 h) (Shima, D. T., Deutsch, U., and D'Amore, P. A. (1995) FEBS Lett. 370, 203-208) revealed that the 3'-untranslated region was sufficient to trigger the rapid turnover of a normally long-lived reporter mRNA in vitro. These data and reagents will allow the molecular and genetic analysis of mechanisms that control the developmental and pathological expression of VEGF.

  7. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708
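    The counts reported in the abstract above (138 validated fusions, six of them missed by INTEGRATE) imply a detection rate that can be worked out directly. The short Python sketch below only restates that arithmetic; the variable names are illustrative, and the rate is computed relative to the validated set described in the abstract, not taken from the paper itself.

```python
# Counts taken from the abstract; variable names are illustrative only.
validated_fusions = 138   # 131 novel + 7 previously reported
missed_by_tool = 6        # validated fusions the tool did not report

detected = validated_fusions - missed_by_tool
recall = detected / validated_fusions  # fraction of validated fusions recovered

print(f"{detected}/{validated_fusions} = {recall:.1%}")  # prints "132/138 = 95.7%"
```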

  8. Landscape genomics of Populus trichocarpa: the role of hybridization, limited gene flow, and natural selection in shaping patterns of population structure.

    PubMed

    Geraldes, Armando; Farzaneh, Nima; Grassa, Christopher J; McKown, Athena D; Guy, Robert D; Mansfield, Shawn D; Douglas, Carl J; Cronk, Quentin C B

    2014-11-01

    Populus trichocarpa is an ecologically important tree across western North America. We used a large population sample of 498 accessions over a wide geographical area genotyped with a 34K Populus SNP array to quantify geographical patterns of genetic variation in this species (landscape genomics). We present evidence that three processes contribute to the observed patterns: (1) introgression from the sister species P. balsamifera, (2) isolation by distance (IBD), and (3) natural selection. Introgression was detected only at the margins of the species' distribution. IBD was significant across the sampled area as a whole, but no evidence of restricted gene flow was detected in a core of drainages from southern British Columbia (BC). We identified a large number of FST outliers. Gene Ontology analyses revealed that FST outliers are overrepresented in genes involved in circadian rhythm and response to red/far-red light when the entire dataset is considered, whereas in southern BC heat response genes are overrepresented. We also identified strong correlations between geoclimate variables and allele frequencies at FST outlier loci that provide clues regarding the selective pressures acting at these loci.

  9. Wolbachia genome integrated in an insect chromosome: Evolution and fate of laterally transferred endosymbiont genes

    PubMed Central

    Nikoh, Naruo; Tanaka, Kohjiro; Shibata, Fukashi; Kondo, Natsuko; Hizume, Masahiro; Shimada, Masakazu; Fukatsu, Takema

    2008-01-01

    Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we thoroughly investigated the bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis. Exhaustive PCR detection and Southern blot analysis suggested that ∼30% of Wolbachia genes, in terms of the gene repertoire of wMel, are present on the insect nuclear genome. Fluorescent in situ hybridization located the transferred genes on the proximal region of the basal short arm of the X chromosome. Molecular evolutionary and other lines of evidence indicated that the transferred genes are probably derived from a single lateral transfer event. The transferred genes were, for the length examined, structurally disrupted, freed from functional constraints, and transcriptionally inactive. Hence, most, if not all, of the transferred genes have been pseudogenized. Notwithstanding this, the transferred genes were ubiquitously detected from Japanese and Taiwanese populations of C. chinensis, while the number of the transferred genes detected differed between the populations. The transferred genes were not detected from congenic beetle species, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago. These features of the laterally transferred endosymbiont genes are compared with the evolutionary patterns of mitochondrial and plastid genome fragments acquired by nuclear genomes through recent endosymbiotic gene transfers. PMID:18073380

  11. Elucidation of operon structures across closely related bacterial genomes.

    PubMed

    Zhou, Chuan; Ma, Qin; Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. As genomes evolve, operon structures undergo changes that could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG), and a pair of nodes is connected if any two genes, one from each of the corresponding OGGs, are located in the same operon as immediate neighbors in any of the considered genomes. By identifying the connected components of this graph, we found that genes in a connected component are likely to be functionally related and that these components tend to form tree-like topologies, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as the ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional role of the central node of the component, such as an ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722
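The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene identifiers, the OGG names, the toy operons and the labelling rules in `classify` are all assumptions made for illustration. Nodes are OGGs, an edge joins two OGGs whose genes are immediate neighbors in some operon, and each connected component is labelled as a path, a star, another tree, or a non-tree.

```python
from collections import defaultdict, deque

# Hypothetical toy input: operons from several genomes, each an ordered gene list.
OPERONS = [
    ["a_rpsJ", "a_rplC", "a_rplD"],   # genome A
    ["b_rplC", "b_rplD", "b_rplW"],   # genome B
    ["a_sbp1", "a_perm"],             # genome A
    ["b_sbp2", "b_perm"],             # genome B
    ["c_perm", "c_sbp3"],             # genome C
]
# Hypothetical mapping gene -> orthologous gene group (here: the name after "_").
GENE_TO_OGG = {gene: gene.split("_", 1)[1] for operon in OPERONS for gene in operon}


def build_ogg_graph(operons, gene_to_ogg):
    """Connect two OGGs if their genes are immediate neighbors in any operon."""
    graph = defaultdict(set)
    for operon in operons:
        oggs = [gene_to_ogg[gene] for gene in operon]
        for ogg in oggs:
            graph[ogg]  # register the node even if it stays isolated
        for left, right in zip(oggs, oggs[1:]):
            if left != right:  # ignore adjacent genes from the same OGG
                graph[left].add(right)
                graph[right].add(left)
    return graph


def connected_components(graph):
    """Breadth-first search; returns each component as a sorted list of OGGs."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def classify(component, graph):
    """Label a connected component by its topology (illustrative rules)."""
    n = len(component)
    degrees = [len(graph[node]) for node in component]
    if sum(degrees) // 2 != n - 1:
        return "other"       # connected but has a cycle, so not a tree
    if n == 1:
        return "singleton"
    if max(degrees) <= 2:
        return "path"        # a tree in which no node branches
    if max(degrees) == n - 1:
        return "star"        # one hub joined to every other node
    return "tree"


graph = build_ogg_graph(OPERONS, GENE_TO_OGG)
labels = {tuple(c): classify(c, graph) for c in connected_components(graph)}
for component, label in labels.items():
    print(label, component)
```

In this toy data the ribosomal-protein OGGs chain into a path across two genomes, while the permease OGG becomes the hub of a star because it neighbors a different substrate-binding OGG in each genome. A three-node tree is both a path and a star; the sketch reports it as a path.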

  12. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    SciTech Connect

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  13. p63 gene structure in the phylum Mollusca.

    PubMed

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    The roles of the p53 family ancestor (p63) in organisms' responses to stressful environmental conditions (mainly pollution) have been studied among molluscs, especially in the genus Mytilus, within the last 15 years. Nevertheless, information about the structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in a mollusc, the Mediterranean mussel Mytilus galloprovincialis, and confirm its similarity to the vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata) found only one p53 family member, present in a single copy per haploid genome. Comparative analysis of those orthologues additionally confirmed the conserved p63 gene structure. This conserved structure can help to complement and/or revise gene annotations of any future p63 genomic sequence records in molluscs, and also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for a reliable and complete gene annotation.

  14. CYP2D6: novel genomic structures and alleles

    PubMed Central

    Kramer, Whitney E.; Walker, Denise L.; O’Kane, Dennis J.; Mrazek, David A.; Fisher, Pamela K.; Dukek, Brian A.; Bruflat, Jamie K.; Black, John L.

    2010-01-01

    Objective: CYP2D6 is a polymorphic gene. It has been observed to be deleted, to be duplicated and to undergo recombination events involving the CYP2D7 pseudogene and surrounding sequences. The objective of this study was to discover the genomic structure of CYP2D6 recombinants that interfere with clinical genotyping platforms that are available today. Methods: Clinical samples containing rare homozygous CYP2D6 alleles, ambiguous readouts, and those with duplication signals and two different alleles were analyzed by long-range PCR amplification of individual genes, PCR fragment analysis, allele-specific primer extension assay, and DNA sequencing to characterize alleles and genomic structure. Results: Novel alleles, genomic structures, and the DNA sequence of these structures are described. Interestingly, in 49 of 50 DNA samples that had CYP2D6 gene duplications or multiplications where two alleles were detected, the chromosome containing the duplication or multiplication had identical tandem alleles. Conclusion: Several new CYP2D6 alleles and genomic structures are described which will be useful for CYP2D6 genotyping. The findings suggest that the recombination events responsible for CYP2D6 duplications and multiplications are because of mechanisms other than interchromosomal crossover during meiosis. PMID:19741566

  15. Identification and characterization of essential genes in the human genome

    PubMed Central

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W.; Krupczak, Kevin M.; Post, Yorick; Wei, Jenny J.; Lander, Eric S.; Sabatini, David M.

    2015-01-01

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  16. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    PubMed

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned by using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with Pinctada maxima. The phylogenetic tree showed that it was closely related to P. maxima, based on the amino acid sequences. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Several transcriptional factors such as GATA-1, AP-1, and SP1 were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P<0.001). The expression level at salinity 27‰ was significantly higher than that at other salinities (P<0.05). Expression reached a minimum when the algal food concentration was 16×10^4 cells/mL, which was significantly lower than the level observed at 8×10^4 cells/mL and 20×10^4 cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata. PMID:27129943
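The sequence lengths quoted in this abstract are internally consistent, which the short Python check below makes explicit. All numbers come from the abstract. The stop-codon accounting and the mature-protein length are derived here under the usual assumptions (one terminal stop codon, signal peptide cleaved) and are not stated in the source.

```python
# Lengths reported in the abstract.
UTR5_BP = 17             # 5'-untranslated region
UTR3_BP = 118            # 3'-untranslated region
ORF_BP = 1569            # open reading frame
CDNA_BP = 1704           # full-length cDNA
PROTEIN_AA = 522         # encoded polypeptide
SIGNAL_PEPTIDE_AA = 20   # N-terminal signal peptide
EXONS, INTRONS = 9, 8


def codons_in_orf(orf_bp):
    """Number of codons in an ORF, including the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3


cdna_from_parts = UTR5_BP + ORF_BP + UTR3_BP     # UTRs plus ORF
encoded_residues = codons_in_orf(ORF_BP) - 1     # assumes one stop codon
# Assumption: the signal peptide is removed from the mature enzyme.
mature_aa = PROTEIN_AA - SIGNAL_PEPTIDE_AA

print("cDNA length from parts:", cdna_from_parts)
print("residues encoded by ORF:", encoded_residues)
print("mature protein length (assumed cleavage):", mature_aa)
```

The exon and intron counts also agree with each other, since n exons are separated by n - 1 introns.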

  18. Chapter 6: Structural variation and medical genomics.

    PubMed

    Raphael, Benjamin J

    2012-01-01

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  19. Identifying potential cancer driver genes by genomic data integration

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-12-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis.

  20. Identifying potential cancer driver genes by genomic data integration

    PubMed Central

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-01-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis. PMID:24346768

  1. Two duplicated chicken-type lysozyme genes in disc abalone Haliotis discus discus: molecular aspects in relevance to structure, genomic organization, mRNA expression and bacteriolytic function.

    PubMed

    Umasuthan, Navaneethaiyer; Bathige, S D N K; Kasthuri, Saranya Revathy; Wan, Qiang; Whang, Ilson; Lee, Jehee

    2013-08-01

    Lysozymes are crucial antibacterial proteins that are associated with catalytic cleavage of peptidoglycan and subsequent bacteriolysis. The present study describes the identification of two lysozyme genes from disc abalone Haliotis discus discus and their characterization at the sequence, genomic, transcriptional and functional levels. Two cDNAs and BAC clones bearing lysozyme genes were isolated from abalone transcriptome and BAC genomic libraries, respectively, and their sequences were determined. The deduced amino acid sequences harbored a chicken-type lysozyme (LysC) family profile and exhibited conserved characteristics of LysC family members, including active residues (Glu and Asp) and a GS(S/T)DYGIFQINS motif, suggesting that they are LysC counterparts in disc abalone; they were designated abLysC1 and abLysC2. While abLysC1 represented the homolog recently reported in Ezo abalone [1], abLysC2 shared significant identity with LysC homologs. Unlike vertebrate LysCs, the coding sequences of the abLysCs were distributed across five exons interrupted by four introns. Both abLysCs showed a broad mRNA distribution, with highest levels in mantle (abLysC1) and hepatopancreas (abLysC2), suggesting that their main roles are likely in defense and digestion, respectively. Investigation of temporal transcriptional profiles after LPS and pathogen challenges revealed induced responses of abLysCs in gills and hemocytes. The in vitro muramidase activity of purified recombinant (r)abLysC proteins was evaluated, and the findings indicated that they are active in the acidic pH range (3.5-6.5) and over a broad temperature range (20-60 °C), and are influenced by ionic strength. When the antibacterial spectra of the (r)abLysCs were examined, they displayed differential activities against both Gram-positive and Gram-negative strains, providing evidence for their involvement in bacteriolytic function in abalone physiology.

  2. Exploring structural variants in environmentally sensitive gene families.

    PubMed

    Young, Nevin Dale; Zhou, Peng; Silverstein, Kevin At

    2016-04-01

    Environmentally sensitive plant gene families, such as NBS-LRRs, receptor kinases, defensins and others, are known to be highly variable. However, most existing strategies for discovering and describing structural variation in complex gene families provide incomplete and imperfect results. The move to de novo genome assemblies for multiple accessions or individuals within a species is enabling more comprehensive and accurate insights about gene family variation. Earlier array-based genome hybridization and sequence-based read mapping methods were limited by their reliance on a reference genome and by misplacement of paralogous sequences. Variant discovery based on de novo genome assemblies overcomes the problems arising from a reference genome and reduces sequence misplacement. As de novo genome sequencing moves to the use of longer reads, artifacts will be minimized, intact tandem gene clusters will be constructed accurately, and insights into rapid evolution will become feasible. PMID:26855303

  3. Genome structure of bdelloid rotifers: shaped by asexuality or desiccation?

    PubMed

    Gladyshev, Eugene A; Arkhipova, Irina R

    2010-01-01

    Bdelloid rotifers are microscopic invertebrate animals best known for their ancient asexuality and the ability to survive desiccation at any life stage. Both factors are expected to have a profound influence on their genome structure. Recent molecular studies demonstrated that, although the gene-rich regions of bdelloid genomes are organized as colinear pairs of closely related sequences and depleted in repetitive DNA, subtelomeric regions harbor diverse transposable elements and horizontally acquired genes of foreign origin. Although asexuality is expected to result in depletion of deleterious transposons, only desiccation appears to have the power to produce all the uncovered genomic peculiarities. Repair of desiccation-induced DNA damage would require the presence of a homologous template, maintaining colinear pairs in gene-rich regions and selecting against insertion of repetitive DNA that might cause chromosomal rearrangements. Desiccation may also induce a transient state of competence in recovering animals, allowing them to acquire environmental DNA. Even if bdelloids engage in rare or obscure forms of sexual reproduction, all these features could still be present. The relative contribution of asexuality and desiccation to genome organization may be clarified by analyzing whole-genome sequences and comparing foreign gene and transposon content in species which lost the ability to survive desiccation.

  4. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  5. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    PubMed Central

    Lashermes, Philippe; Hueber, Yann; Combes, Marie-Christine; Severac, Dany; Dereeper, Alexis

    2016-01-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing. PMID:27440920

  6. Genomic scan for genes predisposing to schizophrenia

    SciTech Connect

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping near D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.
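For readers unfamiliar with the lod scores quoted above, the Python sketch below computes a two-point lod score at recombination fraction θ for the simplest textbook case: fully informative, phase-known meioses. This is a deliberate simplification and not the study's method, which evaluated pedigree likelihoods under dominant and recessive disease models. The function name and the example counts are hypothetical.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """Lod score log10[L(theta) / L(0.5)] for phase-known meioses.

    L(theta) = theta**k * (1 - theta)**(n - k), where k of n meioses
    are recombinant. theta = 0.5 corresponds to no linkage.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    likelihood = theta ** recombinants * (1.0 - theta) ** non_recombinants
    if likelihood == 0.0:
        # A single recombinant is impossible under complete linkage (theta = 0).
        return float("-inf")
    return math.log10(likelihood) - meioses * math.log10(0.5)


# Hypothetical counts, for illustration only.
for k, n, theta in [(0, 10, 0.0), (1, 10, 0.05), (4, 10, 0.05), (1, 10, 0.0)]:
    print(f"k={k} n={n} theta={theta}: lod={two_point_lod(k, n, theta):.2f}")
```

By convention a lod score below -2 is taken as evidence against linkage at that θ, which is how the abstract's counts of loci scoring below -2.0 at θ = 0.0 and θ = 0.05 are read.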

  7. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5’ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3’ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited to GenBank under KT445934 accession number. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity. PMID:27299357
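The element lengths listed in this abstract add up to the reported cluster length, as the short Python check below shows. The lengths are taken from the abstract. The 1-based coordinates are derived here under the assumption that the elements are contiguous and in the listed 5' to 3' order; they are not quoted from the GenBank record.

```python
# Element lengths (bp) as reported in the abstract, in 5' to 3' order.
ELEMENTS = [
    ("5'ETS", 1836),
    ("18S rRNA", 1823),
    ("ITS1", 2530),
    ("5.8S rRNA", 157),
    ("ITS2", 733),
    ("28S rRNA", 4441),
    ("3'ETS", 343),
]
REPORTED_CLUSTER_BP = 11863


def element_coordinates(elements):
    """Return 1-based inclusive (start, end) per element, assuming contiguity."""
    coords = {}
    start = 1
    for name, length in elements:
        end = start + length - 1
        coords[name] = (start, end)
        start = end + 1
    return coords


total_bp = sum(length for _, length in ELEMENTS)
coordinates = element_coordinates(ELEMENTS)

print("sum of elements:", total_bp, "reported:", REPORTED_CLUSTER_BP)
for name, (start, end) in coordinates.items():
    print(f"{name}: {start}-{end}")
```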

  8. Genome-wide consequences of deleting any single gene

    PubMed Central

    Teng, Xinchen; Dayhoff-Brannigan, Margaret; Cheng, Wen-Chih; Gilbert, Catherine E.; Sing, Cierra N.; Diny, Nicola L.; Wheelan, Sarah J.; Dunham, Maitreya J.; Boeke, Jef D.; Pineda, Fernando J.; Hardwick, J. Marie

    2013-01-01

    Summary: Loss or duplication of chromosome segments can lead to further genomic changes associated with cancer. However, it is not known if only a select subset of genes is responsible for driving further changes. To determine if perturbation of any given gene in a genome suffices to drive subsequent genetic changes, we analyzed the yeast knockout collection for secondary mutations of functional consequence. Unlike wild type, most gene knockout strains were found to have one additional mutant gene affecting nutrient responses and/or heat-stress-induced cell death. Moreover, independent knockouts of the same gene often evolved mutations in the same secondary gene. Genome sequencing identified acquired mutations in several human tumor suppressor homologs. Thus, mutation of any single gene may cause a genomic imbalance with consequences sufficient to drive adaptive genetic changes. This complicates genetic analyses, but is a logical consequence of losing a functional unit originally acquired under pressure during evolution. PMID:24211263

  9. Structural genomics for science and society.

    PubMed

    Hol, W G

    2000-11-01

    The field of robotics is affecting structural biology, enabling the era of structural genomics. The potential impact on protein fold prediction, biology, protein engineering and medicine is immense. Unraveling mysteries in the protein structure universe will require a dedicated effort for decades to come with computational toxicology as possibly a century long challenge.

  10. Evolution of mammalian genome organization inferred from comparative gene mapping

    PubMed Central

    Murphy, William J; Stanyon, Roscoe; O'Brien, Stephen J

    2001-01-01

    Comparative genome analyses, including chromosome painting in over 40 diverse mammalian species, ordered gene maps from several representatives of different mammalian and vertebrate orders, and large-scale sequencing of the human and mouse genomes are beginning to provide insight into the rates and patterns of chromosomal evolution on a whole-genome scale, as well as into the forces that have sculpted the genomes of extant mammalian species. PMID:11423011

  11. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation, performed on the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and the results were compared to the reference annotation.

  12. Evidence-based gene predictions in plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  13. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

    PubMed

    Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat.

  14. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

    PubMed Central

    Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  15. Genome-wide characterization of the Pectate Lyase-like (PLL) genes in Brassica rapa.

    PubMed

    Jiang, Jingjing; Yao, Lina; Miao, Ying; Cao, Jiashu

    2013-11-01

    Pectate lyases (PL) depolymerize demethylated pectin (pectate, EC 4.2.2.2) by catalyzing the eliminative cleavage of α-1,4-glycosidic linked galacturonan. Pectate Lyase-like (PLL) genes are one of the largest and most complex families in plants. However, studies on the phylogeny, gene structure, and expression of PLL genes are limited. To understand the potential functions of PLL genes in plants, we characterized their intron-exon structure, phylogenetic relationships, and protein structures, and measured their expression patterns in various tissues, specifically the reproductive tissues in Brassica rapa. Sequence alignments revealed two characteristic motifs in PLL genes. The chromosome location analysis indicated that 18 of the 46 PLL genes were located in the least fractionated sub-genome (LF) of B. rapa, while 16 were located in the medium fractionated sub-genome (MF1) and 12 in the more fractionated sub-genome (MF2). Quantitative RT-PCR analysis showed that BrPLL genes were expressed in various tissues, with most of them being expressed in flowers. Detailed qRT-PCR analysis identified 11 pollen specific PLL genes and several other genes with unique spatial expression patterns. In addition, some duplicated genes showed similar expression patterns. The phylogenetic analysis identified three PLL gene subfamilies in plants, among which subfamily II might have evolved from gene neofunctionalization or subfunctionalization. Therefore, this study opens the possibility for exploring the roles of PLL genes during plant development.

  16. Comparative genomics on Vangl1 and Vangl2 genes.

    PubMed

    Katoh, Yuriko; Katoh, Masaru

    2005-05-01

    WNT signals are transduced to the beta-catenin pathway or the planar cell polarity (PCP) pathway. WNT - beta-catenin pathway is implicated in carcinogenesis, while WNT-PCP pathway is implicated in cell motility and metastasis. Drosophila Van Gogh (Vang), Frizzled (Fz), Starry night (Stan), Prickle (Pk) and Diego (Dgo) are PCP signaling molecules. Vangl1 (Strabismus 2) and Vangl2 (Strabismus 1 or Ltap) are mammalian homologs of Drosophila Vang interacting with PRICKLE1, PRICKLE2, ANKRD6, DVL1, DVL2, DVL3, KAI1 and MAGI3. Here we identified and characterized rat Vangl1 and Vangl2 genes by using bioinformatics. Rat Vangl1 gene, consisting of eight exons, was located within AC098913.7 and AC108524.6 genome sequences. Rat Vangl2 gene, consisting of eight exons, was located within AC118856.3 and AC115243.5 genome sequences. Exon-intron structure of mammalian Vangl1 and Vangl2 orthologs was well conserved. E47 and double ELK1-binding sites were conserved among promoters of mammalian Vangl1 orthologs. PAX4, NFkappaB, HNF4, SOX9, RFX1, and POU2F1 (OCT1)-binding sites were conserved among promoters of mammalian Vangl2 orthologs. Rat Vangl1 (526 aa) and Vangl2 (521 aa) were four-transmembrane proteins with 71.5% total-amino-acid identity. Ser cluster motif (SxxSxxSxxSxxSxxS) in the N-terminal cytoplasmic region and PDZ-binding motif in the C-terminal cytoplasmic tail were evolutionarily conserved among vertebrate Vangl1 and Vangl2 orthologs. This is the first report on rat Vangl1 and Vangl2 genes as well as on comparative genomics for Vangl1 and Vangl2 orthologs.

  17. Mechanisms and Dynamics of Orphan Gene Emergence in Insect Genomes

    PubMed Central

    Wissler, Lothar; Gadau, Jürgen; Simola, Daniel F.; Helmkampf, Martin; Bornberg-Bauer, Erich

    2013-01-01

    Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes (such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements) act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes have their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and supporting a recently proposed dynamic model of frequent gene birth and death. PMID:23348040

  18. Primate genome architecture influences structural variation mechanisms and functional consequences.

    PubMed

    Gokcumen, Omer; Tischler, Verena; Tica, Jelena; Zhu, Qihui; Iskow, Rebecca C; Lee, Eunjung; Fritz, Markus Hsi-Yang; Langdon, Amy; Stütz, Adrian M; Pavlidis, Pavlos; Benes, Vladimir; Mills, Ryan E; Park, Peter J; Lee, Charles; Korbel, Jan O

    2013-09-24

    Although nucleotide resolution maps of genomic structural variants (SVs) have provided insights into the origin and impact of phenotypic diversity in humans, comparable maps in nonhuman primates have thus far been lacking. Using massively parallel DNA sequencing, we constructed fine-resolution genomic structural variation maps in five chimpanzees, five orang-utans, and five rhesus macaques. The SV maps, which are comprised of thousands of deletions, duplications, and mobile element insertions, revealed a high activity of retrotransposition in macaques compared with great apes. By comparison, nonallelic homologous recombination is specifically active in the great apes, which is correlated with architectural differences between the genomes of great apes and macaque. Transcriptome analyses across nonhuman primates and humans revealed effects of species-specific whole-gene duplication on gene expression. We identified 13 gene duplications coinciding with the species-specific gain of tissue-specific gene expression in keeping with a role of gene duplication in the promotion of diversification and the acquisition of unique functions. Differences in the present day activity of SV formation mechanisms that our study revealed may contribute to ongoing diversification and adaptation of great ape and Old World monkey lineages.

  19. Primate genome architecture influences structural variation mechanisms and functional consequences

    PubMed Central

    Gokcumen, Omer; Tischler, Verena; Tica, Jelena; Zhu, Qihui; Iskow, Rebecca C.; Lee, Eunjung; Fritz, Markus Hsi-Yang; Langdon, Amy; Stütz, Adrian M.; Pavlidis, Pavlos; Benes, Vladimir; Mills, Ryan E.; Park, Peter J.; Lee, Charles; Korbel, Jan O.

    2013-01-01

    Although nucleotide resolution maps of genomic structural variants (SVs) have provided insights into the origin and impact of phenotypic diversity in humans, comparable maps in nonhuman primates have thus far been lacking. Using massively parallel DNA sequencing, we constructed fine-resolution genomic structural variation maps in five chimpanzees, five orang-utans, and five rhesus macaques. The SV maps, which are comprised of thousands of deletions, duplications, and mobile element insertions, revealed a high activity of retrotransposition in macaques compared with great apes. By comparison, nonallelic homologous recombination is specifically active in the great apes, which is correlated with architectural differences between the genomes of great apes and macaque. Transcriptome analyses across nonhuman primates and humans revealed effects of species-specific whole-gene duplication on gene expression. We identified 13 gene duplications coinciding with the species-specific gain of tissue-specific gene expression in keeping with a role of gene duplication in the promotion of diversification and the acquisition of unique functions. Differences in the present day activity of SV formation mechanisms that our study revealed may contribute to ongoing diversification and adaptation of great ape and Old World monkey lineages. PMID:24014587

  20. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    PubMed

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  1. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  2. PIECE: a database for plant gene structure comparison and evolution.

    PubMed

    Wang, Yi; You, Frank M; Lazo, Gerard R; Luo, Ming-Cheng; Thilmony, Roger; Gordon, Sean; Kianian, Shahryar F; Gu, Yong Q

    2013-01-01

    Gene families often show degrees of differences in terms of exon-intron structures depending on their distinct evolutionary histories. Comparative analysis of gene structures is important for understanding their evolutionary and functional relationships within plant species. Here, we present a comparative genomics database named PIECE (http://wheat.pw.usda.gov/piece) for Plant Intron and Exon Comparison and Evolution studies. The database contains all the annotated genes extracted from 25 sequenced plant genomes. These genes were classified based on Pfam motifs. Phylogenetic trees were pre-constructed for each gene category. PIECE provides a user-friendly interface for different types of searches and a graphical viewer for displaying a gene structure pattern diagram linked to the resulting bootstrapped dendrogram for each gene family. The gene structure evolution of orthologous gene groups was determined using the GLOOME, Exalign and GECA software programs that can be accessed within the database. PIECE also provides a web server version of the software, GSDraw, for drawing schematic diagrams of gene structures. PIECE is a powerful tool for comparing gene sequences and provides valuable insights into the evolution of gene structure in plant genomes.

  3. PIECE: a database for plant gene structure comparison and evolution

    PubMed Central

    Wang, Yi; You, Frank M.; Lazo, Gerard R.; Luo, Ming-Cheng; Thilmony, Roger; Gordon, Sean; Kianian, Shahryar F.; Gu, Yong Q.

    2013-01-01

    Gene families often show degrees of differences in terms of exon–intron structures depending on their distinct evolutionary histories. Comparative analysis of gene structures is important for understanding their evolutionary and functional relationships within plant species. Here, we present a comparative genomics database named PIECE (http://wheat.pw.usda.gov/piece) for Plant Intron and Exon Comparison and Evolution studies. The database contains all the annotated genes extracted from 25 sequenced plant genomes. These genes were classified based on Pfam motifs. Phylogenetic trees were pre-constructed for each gene category. PIECE provides a user-friendly interface for different types of searches and a graphical viewer for displaying a gene structure pattern diagram linked to the resulting bootstrapped dendrogram for each gene family. The gene structure evolution of orthologous gene groups was determined using the GLOOME, Exalign and GECA software programs that can be accessed within the database. PIECE also provides a web server version of the software, GSDraw, for drawing schematic diagrams of gene structures. PIECE is a powerful tool for comparing gene sequences and provides valuable insights into the evolution of gene structure in plant genomes. PMID:23180792

  4. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  5. Genome-wide analysis reveals recurrent structural abnormalities of TP63 and other p53-related genes in peripheral T-cell lymphomas

    PubMed Central

    Vasmatzis, George; Johnson, Sarah H.; Knudson, Ryan A.; Ketterling, Rhett P.; Braggio, Esteban; Fonseca, Rafael; Viswanatha, David S.; Law, Mark E.; Kip, N. Sertac; Özsan, Nazan; Grebe, Stefan K.; Frederick, Lori A.; Eckloff, Bruce W.; Thompson, E. Aubrey; Kadin, Marshall E.; Milosevic, Dragana; Porcher, Julie C.; Asmann, Yan W.; Smith, David I.; Kovtun, Irina V.; Ansell, Stephen M.; Dogan, Ahmet

    2012-01-01

    Peripheral T-cell lymphomas (PTCLs) are aggressive malignancies of mature T lymphocytes with 5-year overall survival rates of only ∼ 35%. Improvement in outcomes has been stymied by poor understanding of the genetics and molecular pathogenesis of PTCL, with a resulting paucity of molecular targets for therapy. We developed bioinformatic tools to identify chromosomal rearrangements using genome-wide, next-generation sequencing analysis of mate-pair DNA libraries and applied these tools to 16 PTCL patient tissue samples and 6 PTCL cell lines. Thirteen recurrent abnormalities were identified, of which 5 involved p53-related genes (TP53, TP63, CDKN2A, WWOX, and ANKRD11). Among these abnormalities were novel TP63 rearrangements encoding fusion proteins homologous to ΔNp63, a dominant-negative p63 isoform that inhibits the p53 pathway. TP63 rearrangements were seen in 11 (5.8%) of 190 PTCLs and were associated with inferior overall survival; they also were detected in 2 (1.2%) of 164 diffuse large B-cell lymphomas. As TP53 mutations are rare in PTCL compared with other malignancies, our findings suggest that a constellation of alternate genetic abnormalities may contribute to disruption of p53-associated tumor suppressor function in PTCL. PMID:22855598

  6. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and is thus of significant biological importance. Increased DUS density is expected to enhance DNA uptake and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717
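
    The DUS density comparison described in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the motif used in the usage note (GCCGTCTGAA, commonly cited as the 10-mer Neisseria DUS) is an assumption not stated in the abstract, and the gene sequence is made up for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping occurrences of a motif on both strands of seq."""
    seq = seq.upper()
    total = 0
    # A set avoids double counting when the motif is its own reverse complement.
    for pattern in {motif.upper(), reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            total += 1
            start = seq.find(pattern, start + 1)
    return total


def motif_density_per_kb(seq: str, motif: str) -> float:
    """Motif occurrences per 1,000 bp; 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return 1000.0 * count_motif(seq, motif) / len(seq)
```

    Calling motif_density_per_kb(gene_sequence, "GCCGTCTGAA") for each gene and averaging within functional groups would give the kind of per-group densities that the abstract compares; the statistical test used to call a difference significant is not specified in the abstract and is left out here.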

  7. Hemipteran genomics and psyllid gene expression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research on psyllid genomics, while still in its infancy, is providing information which will aid strategies to suppress...

  8. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses

    PubMed Central

    Ward, John M.; Mäser, Pascal; Schroeder, Julian I.

    2016-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide–gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport. PMID:18842100

  9. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  10. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  11. Genome-editing Technologies for Gene and Cell Therapy

    PubMed Central

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  12. Genomic structure, chromosomal localization and expression profile of a novel melanoma differentiation associated (mda-7) gene with cancer specific growth suppressing and apoptosis inducing properties.

    SciTech Connect

    Huang, E. Y.; Madireddi, M. T.; Gopalkrishnan, R. V.; Leszczyniecka, M.; Su, Z. Z.; Lebedeva, I. V.; Kang, D. C.; Jian, H.; Lin, J. J.; Alexandre, D.; Chen, Y.; Vozhilla, N.; Mei, M. X.; Christiansen, K. A.; Sivo, F.; Goldstein, N. I.; Chada, S.; Huberman, E.; Pestka, S.; Fisher, P. B.; Biochip Technology Center; Columbia Univ.; Introgen Therapeutics Inc.; UMDNJ-Robert Wood Johnson Medical School

    2001-10-25

    Abnormalities in cellular differentiation are frequent occurrences in human cancers. Treatment of human melanoma cells with recombinant fibroblast interferon (IFN-beta) and the protein kinase C activator mezerein (MEZ) results in an irreversible loss in growth potential, suppression of tumorigenic properties and induction of terminal cell differentiation. Subtraction hybridization identified melanoma differentiation associated gene-7 (mda-7), as a gene induced during these physiological changes in human melanoma cells. Ectopic expression of mda-7 by means of a replication defective adenovirus results in growth suppression and induction of apoptosis in a broad spectrum of additional cancers, including melanoma, glioblastoma multiforme, osteosarcoma and carcinomas of the breast, cervix, colon, lung, nasopharynx and prostate. In contrast, no apparent harmful effects occur when mda-7 is expressed in normal epithelial or fibroblast cells. Human clones of mda-7 were isolated and its organization resolved in terms of intron/exon structure and chromosomal localization. Hu-mda-7 encompasses seven exons and six introns and encodes a protein with a predicted size of 23.8 kDa, consisting of 206 amino acids. Hu-mda-7 mRNA is stably expressed in the thymus, spleen and peripheral blood leukocytes. De novo mda-7 mRNA expression is also detected in human melanocytes and expression is inducible in cells of melanocyte/melanoma lineage and in certain normal and cancer cell types following treatment with a combination of IFN-beta plus MEZ. Mda-7 expression is also induced during megakaryocyte differentiation induced in human hematopoietic cells by treatment with TPA (12-O-tetradecanoyl phorbol-13-acetate). In contrast, de novo expression of mda-7 is not detected nor is it inducible by IFN-beta+MEZ in a spectrum of additional normal and cancer cells. No correlation was observed between induction of mda-7 mRNA expression and growth suppression following treatment with IFN-beta+MEZ and

  13. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein coding DNAs also exhibit palindromic structures, although they do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  14. Complete female mitochondrial genome of Anodonta anatina (Mollusca: Unionidae): confirmation of a novel protein-coding gene (F ORF).

    PubMed

    Soroka, Marianna; Burzyński, Artur

    2015-04-01

    Freshwater mussels are among the animals that have two different, gender-specific mitochondrial genomes. We sequenced complete female mitochondrial genomes from five individuals of Anodonta anatina, a bivalve species common in the Palearctic ecozone. The length of the genome was variable: 15,637-15,653 bp. This variation was almost entirely confined to the non-coding parts, which constituted approximately 5% of the genome. Nucleotide diversity was moderate, at 0.3%. Nucleotide composition was typically biased towards AT (66.0%). All genes normally seen in animal mtDNA were identified, as well as the ORF characteristic for unionid mitochondrial genomes, bringing the total number of genes present to 38. If this additional ORF does encode a protein, it must evolve under a very relaxed selection since all substitutions within this gene were non-synonymous. The gene order and structure of the genome were identical to those of all female mitochondrial genomes described in unionid bivalves except the Gonideini.
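
    The two summary statistics quoted in this record, AT content and nucleotide diversity, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' method: the function names are hypothetical, the sequences are assumed to be pre-aligned to equal length, and any gap or ambiguity character is simply treated as a mismatch when it differs between two sequences.

```python
from itertools import combinations


def at_content(seq: str) -> float:
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)


def nucleotide_diversity(aligned: list[str]) -> float:
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching columns for every pair, then normalise by
    # (number of pairs) x (alignment length).
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)
```

    For five aligned genomes, nucleotide_diversity would average over the ten possible pairs; a result of 0.003 corresponds to the 0.3% figure reported above, and at_content returning 0.66 corresponds to the reported 66.0% AT bias.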

  15. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression.

    PubMed

    Ay, Ferhat; Bunnik, Evelien M; Varoquaux, Nelle; Bol, Sebastiaan M; Prudhomme, Jacques; Vert, Jean-Philippe; Noble, William Stafford; Le Roch, Karine G

    2014-06-01

    The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.

  16. Structural Variation Mutagenesis of the Human Genome: Impact on Disease and Evolution

    PubMed Central

    Lupski, James R.

    2015-01-01

    Watson-Crick base-pair changes, or single-nucleotide variants (SNV), have long been known as a source of mutations. However, the extent to which DNA structural variation, including duplication and deletion copy number variants (CNV) and copy number neutral inversions and translocations, contributes to human genome variation and disease has been appreciated only recently. Moreover, the potential complexity of structural variants (SV) was not envisioned; thus, the frequency of complex genomic rearrangements (CGR) and how such events form remained a mystery. The concept of genomic disorders, diseases due to genomic rearrangements and not sequence-based changes for which genomic architecture incites genomic instability, delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. PMID:25892534

  17. Gene protein products of SA11 simian rotavirus genome.

    PubMed Central

    Arias, C F; López, S; Espejo, R T

    1982-01-01

    When MA104 cells were infected with SA11 rotavirus, 12 protein classes, absent in mock-infected cells, could be distinguished by polyacrylamide gel electrophoresis. At least two of these proteins were glycosylated, and their synthesis could be blocked with tunicamycin. The oligosaccharides of both glycoproteins were cleaved by endo-beta-N-acetylglucosaminidase H, suggesting that they were residues of the "high-mannose" type. Of the 12 viral polypeptides observed in infected cells, 1 was probably the apoprotein of one of these glycoproteins; 5, including 1 glycoprotein, were structural components of the virions, whereas the other 6, including a second and possibly third glycoprotein, were nonstructural viral proteins. When the 11 double-stranded RNA genome segments of SA11 were translated, after denaturation, in an RNA-dependent cell-free translation system, at least 11 different polypeptides were synthesized. Ten of these polypeptides had electrophoretic migration patterns equal to those of viral proteins observed in tunicamycin-treated infected cells. Nine of the 11 double-stranded RNA genome segments were resolved by polyacrylamide gel electrophoresis and were translated individually. Two were not resolved from each other and therefore were translated together. Correlation of each synthesized polypeptide with an individual RNA segment allowed us to make a probable gene-coding assignment for the different SA11 genome segments. Images PMID:6283128

  18. The inheritance of organelle genes and genomes: patterns and mechanisms.

    PubMed

    Xu, Jianping

    2005-12-01

    Unlike nuclear genes and genomes, the inheritance of organelle genes and genomes does not follow Mendel's laws. In this mini-review, I summarize recent research progress on the patterns and mechanisms of the inheritance of organelle genes and genomes. While most sexual eukaryotes show uniparental inheritance of organelle genes and genomes in some progeny at least part of the time, increasing evidence indicates that strictly uniparental inheritance is rare and that organelle inheritance patterns are very diverse and complex. In contrast with the predominance of uniparental inheritance in multicellular organisms, organelle genes in eukaryotic microorganisms, such as protists, algae, and fungi, typically show a greater diversity of inheritance patterns, with sex-determining loci playing significant roles. The diverse patterns of inheritance are matched by the rich variety of potential mechanisms. Indeed, many factors, both deterministic and stochastic, can influence observed patterns of organelle inheritance. Interestingly, in multicellular organisms, progeny from interspecific crosses seem to exhibit more frequent paternal leakage and biparental organelle genome inheritance than those from intraspecific crosses. The recent observation of a sex-determining gene in the basidiomycete yeast Cryptococcus neoformans, which controls mitochondrial DNA inheritance, has opened up potentially exciting research opportunities for identifying specific molecular genetic pathways that control organelle inheritance, as well as for testing evolutionary hypotheses regarding the prevalence of uniparental inheritance of organelle genes and genomes.

  19. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    PubMed Central

    2012-01-01

    Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes the above issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), to handle the large number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by the variable selection procedure of the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions Based on the machine-learning principle, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction. PMID:22554139
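
    The screening idea in the abstract above can be illustrated with a minimal sketch. It is not the EPISIS implementation: it assumes additive 0/1/2 genotype coding, uses the product of two genotypes as a single interaction term, and ranks pairs by absolute Pearson correlation with case/control status as a stand-in for the marginal ranking statistic. The function names and toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_pairs(genotypes, phenotype, keep):
    """Rank every SNP-SNP pair by marginal association and keep the top `keep`.

    genotypes: dict mapping SNP name -> list of 0/1/2 allele counts per sample
    phenotype: list of 0/1 case-control labels, one per sample
    """
    scores = []
    for a, b in combinations(sorted(genotypes), 2):
        # Assumed interaction coding: product of the two genotype values.
        interaction = [ga * gb for ga, gb in zip(genotypes[a], genotypes[b])]
        scores.append((abs(pearson(interaction, phenotype)), (a, b)))
    # Highest score first; ties are broken by pair name for determinism.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [pair for _, pair in scores[:keep]]


# Toy data: a sample is a case only when both snp1 and snp2 carry a minor allele.
toy_genotypes = {
    "snp1": [1, 1, 2, 0, 0, 1, 2, 0],
    "snp2": [1, 2, 1, 1, 0, 0, 1, 2],
    "snp3": [0, 1, 0, 1, 0, 1, 0, 1],
}
toy_phenotype = [1, 1, 1, 0, 0, 0, 1, 0]
top_pairs = screen_pairs(toy_genotypes, toy_phenotype, keep=1)
```

    In a full analysis, the retained pairs would then enter a joint model (for example a penalized logistic regression); that second step is omitted here.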

  20. Genome engineering and gene expression control for bacterial strain development.

    PubMed

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to be engineered by homologous recombination. Due to the increasing demands on high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen.

  1. Higher plant mitochondrial DNA: Genomes, genes, mutants, transcription, translation

    SciTech Connect

    Not Available

    1986-01-01

    This volume contains brief summaries of 63 presentations given at the International Workshop on Higher Plant Mitochondrial DNA. The presentations are organized into topical discussions addressing plant genomes, mitochondrial genes, cytoplasmic male sterility, transcription, translation, plasmids and tissue culture. (DT)

  2. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes.

  3. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  4. Comparative analysis of essential genes in prokaryotic genomic islands.

    PubMed

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-07-30

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are unlikely to be located in GIs, but a systematic analysis of essential genes in GIs has not been performed before. Here, we have analyzed the essential genes in 28 prokaryotes by statistical methods and concluded that essential genes are significantly less frequent in GIs than outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give clues to the functionality of essential genes in genomic islands.

  5. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    PubMed

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to identify the composition and function of the pan-genome and core genome. The results showed that the 132,143 protein-coding genes in these genomes fell into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with the dispensable genome. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of persistent Brucella infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.
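
    The split into core, strain-specific and dispensable clusters reported above can be sketched as follows. This is only an illustration under common pan-genome conventions, not the study's pipeline: it assumes a cluster is "core" when present in every genome, "strain-specific" when present in exactly one, and "dispensable" otherwise. The function name and toy data are hypothetical; the final lines check that the three cluster counts quoted in the abstract sum to the quoted total.

```python
from collections import Counter


def classify_clusters(presence, n_genomes):
    """Label each gene cluster by how many genomes carry it.

    presence: dict mapping cluster id -> set of genome ids containing the cluster
              (each cluster is assumed to occur in at least one genome)
    """
    labels = {}
    for cluster, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            labels[cluster] = "core"             # present in every genome
        elif k == 1:
            labels[cluster] = "strain-specific"  # present in exactly one genome
        else:
            labels[cluster] = "dispensable"      # present in some, but not all
    return labels


# Toy example with three genomes A, B and C.
toy_presence = {
    "c1": {"A", "B", "C"},
    "c2": {"A"},
    "c3": {"A", "C"},
    "c4": {"B"},
}
toy_labels = classify_clusters(toy_presence, n_genomes=3)
toy_counts = Counter(toy_labels.values())

# Cluster counts as quoted in the abstract; they should add up to 5369.
reported = {"core": 1710, "strain-specific": 1182, "dispensable": 2477}
reported_total = sum(reported.values())
```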

  6. Full genome comparison and characterization of avian H10 viruses with different pathogenicity in Mink (Mustela vison) reveals genetic and functional differences in the non-structural gene

    PubMed Central

    2010-01-01

    Background The unique property of some avian H10 viruses, particularly the ability to cause severe disease in mink without prior adaptation, enabled our study. Building on previous experimental data and genetic characterization, we investigated the possible influence of different genes on the virulence of these H10 avian influenza viruses in mink. Results Phylogenetic analysis revealed a close relationship between the viruses studied. Our study also showed that there are no genetic differences in receptor specificity or the cleavability of the haemagglutinin proteins of these viruses, regardless of whether they are of low or high pathogenicity in mink. In poly I:C-stimulated mink lung cells, the NS1 protein of the influenza A virus showing high pathogenicity in mink down-regulated type I interferon promoter activity to a greater extent than the NS1 protein of the virus showing low pathogenicity in mink. Conclusions Differences in pathogenicity and virulence in mink between these strains could be related to clear amino acid differences in the non-structural 1 (NS1) protein. The NS gene of mink/84 appears to have contributed to the virulence of the virus in mink by helping the virus evade the innate immune responses. PMID:20591155

  7. Circular structures in retroviral and cellular genomes.

    PubMed

    Albert, F G; Bronson, E C; Fitzgerald, D J; Anderson, J N

    1995-10-01

    A computer program for predicting DNA bending from nucleotide sequence was used to identify circular structures in retroviral and cellular genomes. An 830-base pair circular structure was located in a control region near the center of the genome of the human immunodeficiency virus type I (HIV-I). This unusual structure displayed relatively smooth planar bending throughout its length. The structure is conserved in diverse isolates of HIV-I, HIV-II, and simian immunodeficiency viruses, which implies that it is under selective constraints. A search of all sequences in the GenBank data base was carried out in order to identify similar circular structures in cellular DNA. The results revealed that the structures are associated with a wide range of sequences that undergo recombination, including most known examples of DNA inversion and subtelomeric translocation systems. Circular structures were also associated with replication and transposition systems where DNA looping has been implicated in the generation of large protein-DNA complexes. Experimental evidence for the structures was provided by studies which demonstrated that two sequences detected as circular by computer preferentially formed covalently closed circles during ligation reactions in vitro when compared to nonbent fragments, bent fragments with noncircular shapes, and total genomic DNA. In addition, a single T-->C substitution in one of these sequences rendered it less planar as seen by computer analysis and significantly reduced its rate of ligase-catalyzed cyclization. These results permit us to speculate that intrinsically circular structures facilitate DNA looping during formation of the large protein-DNA complexes that are involved in site- and region-specific recombination and in other genomic processes. PMID:7559522

  8. Genomic variants of genes associated with three horticultural traits in apple revealed by genome re-sequencing

    PubMed Central

    Zhang, Shijie; Chen, Weiping; Xin, Lu; Gao, Zhihong; Hou, Yingjun; Yu, Xinyi; Zhang, Zhen; Qu, Shenchun

    2014-01-01

    The apple (Malus × domestica Borkh.) cultivar ‘Su Shuai’ exhibits greater disease resistance, shorter internodes and lighter fruit flavor compared with its parents ‘Golden Delicious’ and ‘Indo’. To obtain a comprehensive overview of the sequence variation underlying these three horticultural traits, the genomes of ‘Su Shuai’ and ‘Indo’ were resequenced using next-generation sequencing and compared to the genome of ‘Golden Delicious’. A wide range of genetic variations was detected, including 2 454 406 and 18 749 349 single nucleotide polymorphisms (SNPs) and 59 547 and 50 143 structural variants (SVs) in the ‘Indo’ and ‘Su Shuai’ genomes, respectively. Among the SVs in ‘Su Shuai’, 17 genes related to disease resistance, 10 genes related to gibberellin (GA) and 19 genes associated with fruit flavor were identified. The expression patterns of eight of the SV genes were examined using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The results of this study illustrate the genomic variation in these cultivars and provide evidence for a genetic basis for the horticultural traits of disease resistance, short internodes and lighter flavor exhibited in these cultivars. These results provide a genetic basis for the phenotypic characteristics of ‘Su Shuai’ and, as such, these SVs could serve as gene-specific molecular markers in marker-assisted breeding of apples. PMID:26504548

  9. LCGserver: A Webserver for Exploring Evolutionary Trajectory of Gene Orders in a Large Number of Genomes.

    PubMed

    Wang, Dapeng; Yu, Jun

    2015-09-01

    Genes and chromosomes are highly organized; together with protein-coding sequence, gene structure at the per-gene level and gene order at the cluster level are both variable across lineages and under natural selection. How gene order and chromosome organization are related and selected remains to be illuminated. The number of newly sequenced genomes from various taxa has been increasing rapidly, but there have not been easy-to-use web tools that allow better visualization of gene order in a large genome collection. Here, we describe a webserver, LCGserver (http://lcgbase.big.ac.cn/LCGserver/), for exploring evolutionary dynamics of gene orders over diverse lineages. This server provides gene order information at three levels: single gene, paired gene (a minimal cluster), and clustered gene (more than two genes). The most distinctive feature of LCGserver is alignment and visualization of neighboring genes based on orthology, allowing users to inspect all conserved and dynamic events of gene order along chromosomes in a lineage-specific manner. In addition, it categorizes paired genes into six patterns and identifies fully conserved gene clusters within and among lineages.

  10. Recent segmental and gene duplications in the mouse genome

    PubMed Central

    Cheung, Joseph; Wilson, Michael D; Zhang, Junjun; Khaja, Razi; MacDonald, Jeffrey R; Heng, Henry HQ; Koop, Ben F; Scherer, Stephen W

    2003-01-01

    Background The high quality of the mouse genome draft sequence and its associated annotations are an invaluable biological resource. Identifying recent duplications in the mouse genome, especially in regions containing genes, may highlight important events in recent murine evolution. In addition, detecting recent sequence duplications can reveal potentially problematic regions of the genome assembly. We use BLAST-based computational heuristics to identify large (≥ 5 kb) and recent (≥ 90% sequence identity) segmental duplications in the mouse genome sequence. Here we present a database of recently duplicated regions of the mouse genome found in the mouse genome sequencing consortium (MGSC) February 2002 and February 2003 assemblies. Results We determined that 33.6 Mb of 2,695 Mb (1.2%) of sequence from the February 2003 mouse genome sequence assembly is involved in recent segmental duplications, which is less than that observed in the human genome (around 3.5-5%). From this dataset, 8.9 Mb (26%) of the duplication content consisted of 'unmapped' chromosome sequence. Moreover, we suspect that an additional 18.5 Mb of sequence is involved in duplication artifacts arising from sequence misassignment errors in this genome assembly. By searching for genes that are located within these regions, we identified 675 genes that mapped to duplicated regions of the mouse genome. Sixteen of these genes appear to have been duplicated independently in the human genome. From our dataset we further characterized a 42 kb recent segmental duplication of Mater, a maternal-effect gene essential for embryogenesis in mice. Conclusion Our results provide an initial analysis of the recently duplicated sequence and gene content of the mouse genome. Many of these duplicated loci, as well as regions identified to be involved in potential sequence misassignment errors, will require further mapping and sequencing to achieve accuracy. A Genome Browser database was set up to display the

  11. Discovery and classification of homeobox genes in animal genomes.

    PubMed

    Marlétaz, Ferdinand; Paps, Jordi; Maeso, Ignacio; Holland, Peter W H

    2014-01-01

    The diversification of homeobox genes is of great interest to evolutionary and developmental biology. To generate a catalogue of all homeobox genes within species of interest, it is necessary to sequence complete genomes. It is now possible for small research projects and individual laboratories to determine near-complete genome sequences of animal species. We provide bioinformatic methods for assembling draft genome sequences from any animal species, including read filtering and error correction, plus methods for extracting and classifying all homeobox sequences. PMID:25151154

  12. A data management system for structural genomics

    PubMed Central

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-01-01

    Background Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. Results We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. Conclusion Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  13. A data management system for structural genomics.

    PubMed

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-06-21

    BACKGROUND: Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. RESULTS: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. CONCLUSION: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements.

  14. Adaptive gene expression divergence inferred from population genomics.

    PubMed

    Holloway, Alisha K; Lawniczak, Mara K N; Mezey, Jason G; Begun, David J; Jones, Corbin D

    2007-10-01

    Detailed studies of individual genes have shown that gene expression divergence often results from adaptive evolution of regulatory sequence. Genome-wide analyses, however, have yet to unite patterns of gene expression with polymorphism and divergence to infer population genetic mechanisms underlying expression evolution. Here, we combined genomic expression data--analyzed in a phylogenetic context--with whole genome light-shotgun sequence data from six Drosophila simulans lines and reference sequences from D. melanogaster and D. yakuba. These data allowed us to use molecular population genetics to test for neutral versus adaptive gene expression divergence on a genomic scale. We identified recent and recurrent adaptive evolution along the D. simulans lineage by contrasting sequence polymorphism within D. simulans to divergence from D. melanogaster and D. yakuba. Genes that evolved higher levels of expression in D. simulans have experienced adaptive evolution of the associated 3' flanking and amino acid sequence. Concomitantly, these genes are also decelerating in their rates of protein evolution, which is in agreement with the finding that highly expressed genes evolve slowly. Interestingly, adaptive evolution in 5' cis-regulatory regions did not correspond strongly with expression evolution. Our results provide a genomic view of the intimate link between selection acting on a phenotype and associated genic evolution.

  15. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  16. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    SciTech Connect

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives when

  17. The impact of extremophiles on structural genomics (and vice versa).

    PubMed

    Jenney, Francis E; Adams, Michael W W

    2008-01-01

    The advent of the complete genome sequences of various organisms in the mid-1990s raised the issue of how one could determine the function of hypothetical proteins. While insight might be obtained from a 3D structure, the chances of being able to predict such a structure are limited for the deduced amino acid sequence of any uncharacterized gene. A template for modeling is required, but there was only a low probability of finding a protein closely related in sequence with an available structure. Thus, in the late 1990s, an international effort known as structural genomics (SG) was initiated, its primary goal to "fill sequence-structure space" by determining the 3D structures of representatives of all known protein families. This was to be achieved mainly by X-ray crystallography and it was estimated that at least 5,000 new structures would be required. While the proteins (genes) for SG have subsequently been derived from hundreds of different organisms, extremophiles and particularly thermophiles have been specifically targeted due to the increased stability and ease of handling of their proteins, relative to those from mesophiles. This review summarizes the significant impact that extremophiles and proteins derived from them have had on SG projects worldwide. To what extent SG has influenced the field of extremophile research is also discussed.

  18. Evolution of genes and genomes on the Drosophila phylogeny.

    PubMed

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R; Bergman, Casey M; Oliver, Brian; Markow, Therese A; Kaufman, Thomas C; Kellis, Manolis; Gelbart, William; Iyer, Venky N; Pollard, Daniel A; Sackton, Timothy B; Larracuente, Amanda M; Singh, Nadia D; Abad, Jose P; Abt, Dawn N; Adryan, Boris; Aguade, Montserrat; Akashi, Hiroshi; Anderson, Wyatt W; Aquadro, Charles F; Ardell, David H; Arguello, Roman; Artieri, Carlo G; Barbash, Daniel A; Barker, Daniel; Barsanti, Paolo; Batterham, Phil; Batzoglou, Serafim; Begun, Dave; Bhutkar, Arjun; Blanco, Enrico; Bosak, Stephanie A; Bradley, Robert K; Brand, Adrianne D; Brent, Michael R; Brooks, Angela N; Brown, Randall H; Butlin, Roger K; Caggese, Corrado; Calvi, Brian R; Bernardo de Carvalho, A; Caspi, Anat; Castrezana, Sergio; Celniker, Susan E; Chang, Jean L; Chapple, Charles; Chatterji, Sourav; Chinwalla, Asif; Civetta, Alberto; Clifton, Sandra W; Comeron, Josep M; Costello, James C; Coyne, Jerry A; Daub, Jennifer; David, Robert G; Delcher, Arthur L; Delehaunty, Kim; Do, Chuong B; Ebling, Heather; Edwards, Kevin; Eickbush, Thomas; Evans, Jay D; Filipski, Alan; Findeiss, Sven; Freyhult, Eva; Fulton, Lucinda; Fulton, Robert; Garcia, Ana C L; Gardiner, Anastasia; Garfield, David A; Garvin, Barry E; Gibson, Greg; Gilbert, Don; Gnerre, Sante; Godfrey, Jennifer; Good, Robert; Gotea, Valer; Gravely, Brenton; Greenberg, Anthony J; Griffiths-Jones, Sam; Gross, Samuel; Guigo, Roderic; Gustafson, Erik A; Haerty, Wilfried; Hahn, Matthew W; Halligan, Daniel L; Halpern, Aaron L; Halter, Gillian M; Han, Mira V; Heger, Andreas; Hillier, LaDeana; Hinrichs, Angie S; Holmes, Ian; Hoskins, Roger A; Hubisz, Melissa J; Hultmark, Dan; Huntley, Melanie A; Jaffe, David B; Jagadeeshan, Santosh; Jeck, William R; Johnson, Justin; Jones, Corbin D; Jordan, William C; Karpen, Gary H; Kataoka, Eiko; Keightley, Peter D; Kheradpour, Pouya; Kirkness, Ewen F; Koerich, Leonardo B; Kristiansen, Karsten; Kudrna, Dave; Kulathinal, Rob J; Kumar, Sudhir; Kwok, 
Roberta; Lander, Eric; Langley, Charles H; Lapoint, Richard; Lazzaro, Brian P; Lee, So-Jeong; Levesque, Lisa; Li, Ruiqiang; Lin, Chiao-Feng; Lin, Michael F; Lindblad-Toh, Kerstin; Llopart, Ana; Long, Manyuan; Low, Lloyd; Lozovsky, Elena; Lu, Jian; Luo, Meizhong; Machado, Carlos A; Makalowski, Wojciech; Marzo, Mar; Matsuda, Muneo; Matzkin, Luciano; McAllister, Bryant; McBride, Carolyn S; McKernan, Brendan; McKernan, Kevin; Mendez-Lago, Maria; Minx, Patrick; Mollenhauer, Michael U; Montooth, Kristi; Mount, Stephen M; Mu, Xu; Myers, Eugene; Negre, Barbara; Newfeld, Stuart; Nielsen, Rasmus; Noor, Mohamed A F; O'Grady, Patrick; Pachter, Lior; Papaceit, Montserrat; Parisi, Matthew J; Parisi, Michael; Parts, Leopold; Pedersen, Jakob S; Pesole, Graziano; Phillippy, Adam M; Ponting, Chris P; Pop, Mihai; Porcelli, Damiano; Powell, Jeffrey R; Prohaska, Sonja; Pruitt, Kim; Puig, Marta; Quesneville, Hadi; Ram, Kristipati Ravi; Rand, David; Rasmussen, Matthew D; Reed, Laura K; Reenan, Robert; Reily, Amy; Remington, Karin A; Rieger, Tania T; Ritchie, Michael G; Robin, Charles; Rogers, Yu-Hui; Rohde, Claudia; Rozas, Julio; Rubenfield, Marc J; Ruiz, Alfredo; Russo, Susan; Salzberg, Steven L; Sanchez-Gracia, Alejandro; Saranga, David J; Sato, Hajime; Schaeffer, Stephen W; Schatz, Michael C; Schlenke, Todd; Schwartz, Russell; Segarra, Carmen; Singh, Rama S; Sirot, Laura; Sirota, Marina; Sisneros, Nicholas B; Smith, Chris D; Smith, Temple F; Spieth, John; Stage, Deborah E; Stark, Alexander; Stephan, Wolfgang; Strausberg, Robert L; Strempel, Sebastian; Sturgill, David; Sutton, Granger; Sutton, Granger G; Tao, Wei; Teichmann, Sarah; Tobari, Yoshiko N; Tomimura, Yoshihiko; Tsolas, Jason M; Valente, Vera L S; Venter, Eli; Venter, J Craig; Vicario, Saverio; Vieira, Filipe G; Vilella, Albert J; Villasante, Alfredo; Walenz, Brian; Wang, Jun; Wasserman, Marvin; Watts, Thomas; Wilson, Derek; Wilson, Richard K; Wing, Rod A; Wolfner, Mariana F; Wong, Alex; Wong, Gane Ka-Shu; Wu, Chung-I; Wu, 
Gabriel; Yamamoto, Daisuke; Yang, Hsiao-Pei; Yang, Shiaw-Pyng; Yorke, James A; Yoshida, Kiyohito; Zdobnov, Evgeny; Zhang, Peili; Zhang, Yu; Zimin, Aleksey V; Baldwin, Jennifer; Abdouelleil, Amr; Abdulkadir, Jamal; Abebe, Adal; Abera, Brikti; Abreu, Justin; Acer, St Christophe; Aftuck, Lynne; Alexander, Allen; An, Peter; Anderson, Erica; Anderson, Scott; Arachi, Harindra; Azer, Marc; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Berlin, Aaron; Bessette, Daniel; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Bourzgui, Imane; Brown, Adam; Cahill, Patrick; Channer, Sheridon; Cheshatsang, Yama; Chuda, Lisa; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Costello, Maura; D'Aco, Katie; Daza, Riza; De Haan, Georgius; DeGray, Stuart; DeMaso, Christina; Dhargay, Norbu; Dooley, Kimberly; Dooley, Erin; Doricent, Missole; Dorje, Passang; Dorjee, Kunsang; Dupes, Alan; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Fisher, Sheila; Foley, Chelsea D; Franke, Alicia; Friedrich, Dennis; Gadbois, Loryn; Gearin, Gary; Gearin, Christina R; Giannoukos, Georgia; Goode, Tina; Graham, Joseph; Grandbois, Edward; Grewal, Sharleen; Gyaltsen, Kunsang; Hafez, Nabil; Hagos, Birhane; Hall, Jennifer; Henson, Charlotte; Hollinger, Andrew; Honan, Tracey; Huard, Monika D; Hughes, Leanne; Hurhula, Brian; Husby, M Erii; Kamat, Asha; Kanga, Ben; Kashin, Seva; Khazanovich, Dmitry; Kisner, Peter; Lance, Krista; Lara, Marcia; Lee, William; Lennon, Niall; Letendre, Frances; LeVine, Rosie; Lipovsky, Alex; Liu, Xiaohong; Liu, Jinlei; Liu, Shangtao; Lokyitsang, Tashi; Lokyitsang, Yeshi; Lubonja, Rakela; Lui, Annie; MacDonald, Pen; Magnisalis, Vasilia; Maru, Kebede; Matthews, Charles; McCusker, William; McDonough, Susan; Mehta, Teena; Meldrim, James; Meneus, Louis; Mihai, Oana; Mihalev, Atanas; Mihova, Tanya; Mittelman, Rachel; Mlenga, Valentine; Montmayeur, Anna; Mulrain, Leonidas; Navidi, Adam; Naylor, Jerome; Negash, Tamrat; Nguyen, 
Thu; Nguyen, Nga; Nicol, Robert; Norbu, Choe; Norbu, Nyima; Novod, Nathaniel; O'Neill, Barry; Osman, Sahal; Markiewicz, Eva; Oyono, Otero L; Patti, Christopher; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Raghuraman, Sujaa; Rege, Filip; Reyes, Rebecca; Rise, Cecil; Rogov, Peter; Ross, Keenan; Ryan, Elizabeth; Settipalli, Sampath; Shea, Terry; Sherpa, Ngawang; Shi, Lu; Shih, Diana; Sparrow, Todd; Spaulding, Jessica; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Strader, Christopher; Tesfaye, Senait; Thomson, Talene; Thoulutsang, Yama; Thoulutsang, Dawa; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Young, Geneva; Yu, Qing; Zembek, Lisa; Zhong, Danni; Zimmer, Andrew; Zwirko, Zac; Jaffe, David B; Alvarez, Pablo; Brockman, Will; Butler, Jonathan; Chin, CheeWhye; Gnerre, Sante; Grabherr, Manfred; Kleber, Michael; Mauceli, Evan; MacCallum, Iain

    2007-11-01

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

  19. Gene Islands Integrated into tRNAGly Genes Confer Genome Diversity on a Pseudomonas aeruginosa Clone

    PubMed Central

    Larbig, Karen D.; Christmann, Andreas; Johann, André; Klockgether, Jens; Hartsch, Thomas; Merkl, Rainer; Wiehlmann, Lutz; Fritz, Hans-Joachim; Tümmler, Burkhard

    2002-01-01

    Intraclonal genome diversity of Pseudomonas aeruginosa was studied in one of the most diverse mosaic regions of the P. aeruginosa chromosome. The ca. 110-kb large hypervariable region located near the lipH gene in two members of the predominant P. aeruginosa clone C, strain C and strain SG17M, was sequenced. In both strains the region consists of an individual strain-specific gene island of 111 (strain C) or 106 (SG17M) open reading frames (ORFs) and of a 7-kb stretch of clone C-specific sequence of 9 ORFs. The gene islands are integrated into conserved tRNAGly genes and have a bipartite structure. The first part adjacent to the tRNA gene consists of strain-specific ORFs encoding metabolic functions and transporters, the majority of which have homologs of known function in other eubacteria, such as hemophores, cytochrome c biosynthesis, or mercury resistance. The second part is made up mostly of ORFs of yet-unknown function. Forty-seven of these ORFs are mutual homologs with a pairwise amino acid sequence identity of 35 to 88% and are arranged in the same order in the two gene islands. We hypothesize that this novel type of gene island derives from mobile elements which, upon integration, endow the recipient with strain-specific metabolic properties, thus possibly conferring on it a selective advantage in its specific habitat. PMID:12426355

  20. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome.

    PubMed

    Ogihara, Yasunari; Yamazaki, Yukiko; Murai, Koji; Kanno, Akira; Terachi, Toru; Shiina, Takashi; Miyashita, Naohiko; Nasuda, Shuhei; Nakamura, Chiharu; Mori, Naoki; Takumi, Shigeo; Murata, Minoru; Futo, Satoshi; Tsunewaki, Koichiro

    2005-01-01

    The application of a new gene-based strategy for sequencing the wheat mitochondrial genome shows its structure to be a 452 528 bp circular molecule, and provides nucleotide-level evidence of intra-molecular recombination. Single, reciprocal and double recombinant products, and the nucleotide sequences of the repeats that mediate their formation have been identified. The genome has 55 genes with exons, including 35 protein-coding, 3 rRNA and 17 tRNA genes. Nucleotide sequences of seven wheat genes have been determined here for the first time. Nine genes have an exon-intron structure. Gene amplification responsible for the production of multicopy mitochondrial genes, in general, is species-specific, suggesting the recent origin of these genes. About 16, 17, 15, 3.0 and 0.2% of wheat mitochondrial DNA (mtDNA) may be of genic (including introns), open reading frame, repetitive sequence, chloroplast and retro-element origin, respectively. The gene order of the wheat mitochondrial gene map shows little synteny to the rice and maize maps, indicative that thorough gene shuffling occurred during speciation. Almost all unique mtDNA sequences of wheat, as compared with rice and maize mtDNAs, are redundant DNA. Features of the gene-based strategy are discussed, and a mechanistic model of mitochondrial gene amplification is proposed. PMID:16260473

  1. Structural dynamics of cereal mitochondrial genomes as revealed by complete nucleotide sequencing of the wheat mitochondrial genome

    PubMed Central

    Ogihara, Yasunari; Yamazaki, Yukiko; Murai, Koji; Kanno, Akira; Terachi, Toru; Shiina, Takashi; Miyashita, Naohiko; Nasuda, Shuhei; Nakamura, Chiharu; Mori, Naoki; Takumi, Shigeo; Murata, Minoru; Futo, Satoshi; Tsunewaki, Koichiro

    2005-01-01

    The application of a new gene-based strategy for sequencing the wheat mitochondrial genome shows its structure to be a 452 528 bp circular molecule, and provides nucleotide-level evidence of intra-molecular recombination. Single, reciprocal and double recombinant products, and the nucleotide sequences of the repeats that mediate their formation have been identified. The genome has 55 genes with exons, including 35 protein-coding, 3 rRNA and 17 tRNA genes. Nucleotide sequences of seven wheat genes have been determined here for the first time. Nine genes have an exon–intron structure. Gene amplification responsible for the production of multicopy mitochondrial genes, in general, is species-specific, suggesting the recent origin of these genes. About 16, 17, 15, 3.0 and 0.2% of wheat mitochondrial DNA (mtDNA) may be of genic (including introns), open reading frame, repetitive sequence, chloroplast and retro-element origin, respectively. The gene order of the wheat mitochondrial gene map shows little synteny to the rice and maize maps, indicative that thorough gene shuffling occurred during speciation. Almost all unique mtDNA sequences of wheat, as compared with rice and maize mtDNAs, are redundant DNA. Features of the gene-based strategy are discussed, and a mechanistic model of mitochondrial gene amplification is proposed. PMID:16260473

  2. The cavefish genome reveals candidate genes for eye loss.

    PubMed

    McGaugh, Suzanne E; Gross, Joshua B; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O'Quin, Kelly E; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M J; Stahl, Bethany A; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human diseases, including retinal dysfunction. PMID:25329095

  3. The cavefish genome reveals candidate genes for eye loss

    PubMed Central

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human diseases, including retinal dysfunction. PMID:25329095

  4. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    PubMed Central

    2010-01-01

    Background: The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results: To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions: Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides
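
    The core/accessory split reported in this abstract can be illustrated with plain set operations. This is a minimal sketch, assuming each genome has already been reduced to a set of gene-family identifiers; the function name, strain labels and gene labels are hypothetical and the toy data are not from the study.

```python
from typing import Dict, Set, Tuple


def split_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) gene-family sets for a collection of genomes."""
    gene_sets = list(genomes.values())
    # Pan-genome: every gene family seen in at least one genome.
    pan = set().union(*gene_sets)
    # Core genome: gene families present in every genome.
    core = set.intersection(*gene_sets) if gene_sets else set()
    # Accessory genome: everything in the pan-genome that is not core.
    return core, pan - core


# Toy input with made-up identifiers.
toy = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2", "g5"},
}
core, accessory = split_pangenome(toy)
print(sorted(core), sorted(accessory))
```

    Real analyses first cluster genes into orthologous families, a step this sketch takes as given.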

  5. Mapping ancestral genomes with massive gene loss: A matrix sandwich problem

    PubMed Central

    Gavranović, Haris; Chauve, Cedric; Salse, Jérôme; Tannier, Eric

    2011-01-01

    Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto–monocotyledon whose evolution involved massive gene loss following a whole genome duplication. Results: We propose a mapping approach based on the combinatorial notion of ‘sandwich consecutive ones matrix’, which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. Availability: Upon request to the authors. Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr PMID:21685079
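
    The abstract does not define the "sandwich consecutive ones matrix" formally. One reading, assumed here, is that each row has markers that must be present and markers that may be present (for example, possibly lost genes), and that a column order is acceptable if every row can be completed to a single run of consecutive ones. The sketch below checks a given column order under that assumption; all names are hypothetical, and it neither searches for an order nor reproduces the authors' heuristic.

```python
from typing import Sequence


def row_fits(required: Sequence[int], allowed: Sequence[int],
             order: Sequence[int]) -> bool:
    """True if the row can be filled to consecutive ones under this column order."""
    # Positions (in the ordering) of columns that must contain a 1.
    positions = [i for i, col in enumerate(order) if required[col]]
    if not positions:
        return True
    lo, hi = positions[0], positions[-1]
    # Every column between the first and last required 1 must be allowed to be 1.
    return all(allowed[order[i]] for i in range(lo, hi + 1))


def order_is_valid(m_required, m_allowed, order: Sequence[int]) -> bool:
    """Check all rows of the (required, allowed) matrix pair against one order."""
    return all(row_fits(r, a, order) for r, a in zip(m_required, m_allowed))


# Toy matrices: rows are ancestral adjacency groups, columns are markers.
m_required = [[1, 0, 1], [0, 1, 1]]
loose = [[1, 1, 1], [0, 1, 1]]    # column 1 may be filled in row 0
strict = [[1, 0, 1], [0, 1, 1]]   # column 1 is forbidden in row 0
print(order_is_valid(m_required, loose, [0, 1, 2]))
```

    With the strict matrix the identity order fails, while the order [0, 2, 1] succeeds, which shows how permitted gaps widen the set of admissible marker orders.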

  6. The CACTA transposon Bot1 played a major role in Brassica genome divergence and gene proliferation.

    PubMed

    Alix, Karine; Joets, Johann; Ryder, Carol D; Moore, Jay; Barker, Guy C; Bailey, John P; King, Graham J; Pat Heslop-Harrison, John S

    2008-12-01

    We isolated and characterized a Brassica C genome-specific CACTA element, which was designated Bot1 (Brassica oleracea transposon 1). After analysing phylogenetic relationships, copy numbers and sequence similarity of Bot1 and Bot1 analogues in B. oleracea (C genome) versus Brassica rapa (A genome), we concluded that Bot1 has encountered several rounds of amplification in the oleracea genome only, and has played a major role in the recent rapa and oleracea genome divergence. We performed in silico analyses of the genomic organization and internal structure of Bot1, and established which segment of Bot1 is C-genome specific. Our work reports a fully characterized Brassica repetitive sequence that can distinguish the Brassica A and C chromosomes in the allotetraploid Brassica napus, by fluorescent in situ hybridization. We demonstrated that Bot1 carries a host S locus-associated SLL3 gene copy. We speculate that Bot1 was involved in the proliferation of SLL3 around the Brassica genome. The present study reinforces the assumption that transposons are a major driver of genome and gene evolution in higher plants.

  7. GenePING: secure, scalable management of personal genomic data

    PubMed Central

    Adida, Ben; Kohane, Isaac S

    2006-01-01

    Background: Patient genomic data are rapidly becoming part of clinical decision making. Within a few years, full genome expression profiling and genotyping will be affordable enough to perform on every individual. The management of such sizeable, yet fine-grained, data in compliance with privacy laws and best practices presents significant security and scalability challenges. Results: We present the design and implementation of GenePING, an extension to the PING personal health record system that supports secure storage of large, genome-sized datasets, as well as efficient sharing and retrieval of individual datapoints (e.g. SNPs, rare mutations, gene expression levels). Even with full access to the raw GenePING storage, an attacker cannot discover any stored genomic datapoint on any single patient. Given a large-enough number of patient records, an attacker cannot discover which data corresponds to which patient, or even the size of a given patient's record. The computational overhead of GenePING's security features is a small constant, making the system usable, even in emergency care, on today's hardware. Conclusion: GenePING is the first personal health record management system to support the efficient and secure storage and sharing of large genomic datasets. GenePING is available online at , licensed under the LGPL. PMID:16638151

  8. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    SciTech Connect

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  9. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    PubMed

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  10. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant. PMID:26252423

  11. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant.
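
    The figures quoted in this abstract imply a rough gene-density contrast, which the following back-of-the-envelope sketch works out. It assumes, purely for illustration, that the ~1.7 Gb of BAC sequence is non-redundant and that the 2/3 gene estimate is exact; neither assumption is stated in the record, and the function name is hypothetical.

```python
# Figures taken from the abstract; treated here as exact for illustration only.
GENOME_GB = 5.1        # barley genome size
SEQUENCED_GB = 1.7     # approximate sequence obtained from the 15 622 BACs
GENE_FRACTION = 2 / 3  # estimated share of Morex genes in that sequence


def relative_gene_density(genome_gb, sequenced_gb, gene_fraction):
    """Return (sequenced share of genome, gene-density ratio sequenced:remaining)."""
    seq_share = sequenced_gb / genome_gb
    density_in = gene_fraction / seq_share               # genes per unit of sequenced genome
    density_out = (1 - gene_fraction) / (1 - seq_share)  # genes per unit of the remainder
    return seq_share, density_in / density_out


share, ratio = relative_gene_density(GENOME_GB, SEQUENCED_GB, GENE_FRACTION)
print(f"sequenced share of genome: {share:.0%}")
print(f"gene density, sequenced vs. remaining: about {ratio:.1f}x")
```

    Under these assumptions about one third of the genome carries two thirds of the genes, so the sequenced BACs would be roughly four times as gene-dense as the rest.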

  12. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome

    PubMed Central

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S.

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  13. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome.

    PubMed

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  14. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all the genes. Their diversity is critical to the recognition and response to attack from diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, increased by a combination of worldwide spread of pathogens, exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain where the valuable cultivars are sterile, parthenocarpic and mostly triploid so conventional genetic analysis and breeding is impossible. In this paper, we review the nature of R genes and the key motifs, particularly in the Nucleotide Binding Sites (NBS), Leucine Rich Repeat (LRR) gene class. We present data about identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species with the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and from various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters, one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for identification and selection of resistance genes within the genus which may be useful for plant selection and breeding and also for defining relationships and genome evolution

  15. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all the genes. Their diversity is critical to the recognition and response to attack from diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, increased by a combination of worldwide spread of pathogens, exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain where the valuable cultivars are sterile, parthenocarpic and mostly triploid so conventional genetic analysis and breeding is impossible. In this paper, we review the nature of R genes and the key motifs, particularly in the Nucleotide Binding Sites (NBS), Leucine Rich Repeat (LRR) gene class. We present data about identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species with the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and from various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters, one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for identification and selection of resistance genes within the genus which may be useful for plant selection and breeding and also for defining relationships and genome evolution

  16. Genome engineering using a synthetic gene circuit in Bacillus subtilis

    PubMed Central

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-01-01

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR gene and the Pspac–chloramphenicol resistance gene (cat) were located on a helper plasmid. In the presence of xylose, inactivation of XylR by xylose induced LacI expression; LacI then repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was easily removed in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. PMID:25552415
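
    The counter-selection logic described in this abstract can be summarized as a small truth table. The sketch below is an idealized on/off model written for illustration: the function and parameter names are hypothetical, and real promoters are leaky and quantitative, which a boolean model ignores.

```python
from itertools import product


def survives_chloramphenicol(has_pxyl_laci: bool,
                             has_helper_plasmid: bool,
                             xylose: bool) -> bool:
    """Idealized boolean model of the circuit; True means the cell expresses cat."""
    if not has_helper_plasmid:
        return False                       # cat is carried only on the helper plasmid
    xylr_represses_pxyl = not xylose       # xylose relieves XylR repression of Pxyl
    laci_made = has_pxyl_laci and not xylr_represses_pxyl
    cat_expressed = not laci_made          # LacI represses Pspac-cat
    return cat_expressed


# Enumerate every combination of (Pxyl-lacI present, helper plasmid present, xylose added).
for state in product([True, False], repeat=3):
    print(state, survives_chloramphenicol(*state))
```

    In this model, with xylose and chloramphenicol both present, only cells that have lost Pxyl-lacI while keeping the helper plasmid survive, which is the selection for the recombination event that the abstract describes.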

  17. The Quality and Validation of Structures from Structural Genomics

    PubMed Central

    Domagalski, Marcin J.; Zheng, Heping; Zimmerman, Matthew D.; Dauter, Zbigniew; Wlodawer, Alexander; Minor, Wladek

    2014-01-01

    Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein–ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement. PMID:24203341

  18. Structure and evolution of Paramecium hemoglobin genes.

    PubMed

    Yamauchi, K; Tada, H; Usuki, I

    1995-10-17

    Hemoglobin (Hb) genes have been cloned from three different species of ciliated protists, P. multimicronucleatum, P. triaurelia and P. jenningsi. Southern blotting of the genomic DNAs using the P. caudatum Hb cDNA showed both intraspecies variation in different stocks of P. caudatum and interspecies variation within the genus Paramecium. The isolated Hb genes were composed of 118, 117 and 117 codons, and interrupted by a short intron with 27, 29 and 29 bp at the same position, in P. multimicronucleatum, P. triaurelia and P. jenningsi, respectively. This suggests that the one-intron and two-exon structure has been conserved in the Hb genes in this genus. The amino acid sequences of the Paramecium Hbs were more than 87% identical to one another and homologous to those from the other ciliated protists Tetrahymena thermophila and T. pyriformis, the green alga Chlamydomonas eugametos, and the cyanobacterium Nostoc commune, all of which consist of about 120 amino acid residues (120-aa group). In particular, the amino acid sequences of the P. triaurelia and P. jenningsi Hbs were the same, although there were 20 nucleotide differences between the coding regions in the two genes. A maximum likelihood inference as to the phylogenetic relationships among these genes suggests that the Paramecium Hb genes have evolved more rapidly than the other genes in the 120-aa group, and that P. triaurelia and P. jenningsi are sibling species and the P. aurelia complex became a small cell after it separated from P. jenningsi.

  19. Genomic landscape of DNA repair genes in cancer.

    PubMed

    Chae, Young Kwang; Anker, Jonathan F; Carneiro, Benedito A; Chandra, Sunandana; Kaplan, Jason; Kalyan, Aparna; Santa-Maria, Cesar A; Platanias, Leonidas C; Giles, Francis J

    2016-04-26

    DNA repair genes are frequently mutated in cancer, yet limited data exist regarding the overall genomic landscape and functional implications of these alterations in their entirety. We created comprehensive lists of DNA repair genes and indirect caretakers. Mutation, copy number variation (CNV), and expression frequencies of these genes were analyzed in COSMIC. Mutation co-occurrence, clinical outcomes, and mutation burden were analyzed in TCGA. We report the 20 genes most frequently with mutations (n > 19,689 tumor samples for each gene), CNVs (n > 1,556), or up- or down-regulated (n = 7,998). Mutual exclusivity was observed as no genes displayed both high CNV gain and loss or high up- and down-regulation, and CNV gain and loss positively correlated with up- and down-regulation, respectively. Co-occurrence of mutations differed between cancers, and mutations in many DNA repair genes were associated with higher total mutation burden. Mutation and CNV frequencies offer insights into which genes may play tumor suppressive or oncogenic roles, such as NEIL2 and RRM2B, respectively. Mutual exclusivities within CNV and expression frequencies, and correlations between CNV and expression, support the functionality of these genomic alterations. This study provides comprehensive lists of candidate genes as potential biomarkers for genomic instability, novel therapeutic targets, or predictors of immunotherapy efficacy.

  20. Genomic landscape of DNA repair genes in cancer

    PubMed Central

    Carneiro, Benedito A.; Chandra, Sunandana; Kaplan, Jason; Kalyan, Aparna; Santa-Maria, Cesar A.; Platanias, Leonidas C.; Giles, Francis J.

    2016-01-01

    DNA repair genes are frequently mutated in cancer, yet limited data exist regarding the overall genomic landscape and functional implications of these alterations in their entirety. We created comprehensive lists of DNA repair genes and indirect caretakers. Mutation, copy number variation (CNV), and expression frequencies of these genes were analyzed in COSMIC. Mutation co-occurrence, clinical outcomes, and mutation burden were analyzed in TCGA. We report the 20 genes most frequently with mutations (n > 19,689 tumor samples for each gene), CNVs (n > 1,556), or up- or down-regulated (n = 7,998). Mutual exclusivity was observed as no genes displayed both high CNV gain and loss or high up- and down-regulation, and CNV gain and loss positively correlated with up- and down-regulation, respectively. Co-occurrence of mutations differed between cancers, and mutations in many DNA repair genes were associated with higher total mutation burden. Mutation and CNV frequencies offer insights into which genes may play tumor suppressive or oncogenic roles, such as NEIL2 and RRM2B, respectively. Mutual exclusivities within CNV and expression frequencies, and correlations between CNV and expression, support the functionality of these genomic alterations. This study provides comprehensive lists of candidate genes as potential biomarkers for genomic instability, novel therapeutic targets, or predictors of immunotherapy efficacy. PMID:27004405

  1. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT). PMID:21721943

  2. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  3. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  4. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

    PubMed Central

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  5. Genome-wide analysis and identification of genes related to expansin gene family in indica rice.

    PubMed

    Hemalatha, N; Rajesh, M K; Narayanan, N K

    2011-01-01

    In this study, we carried out genome-wide analyses to explore the expansin gene family in the genome of indica rice. Reference nucleotides were chosen as query sequences for searches in the indica rice genome database. Clones having genomic sequences similar to expansin were taken and converted to amino acid sequences. Putative sequences were searched against the PROSITE and Pfam databases, and 21 sequences carrying expansin gene family signatures were obtained. The presence of transmembrane domains was also predicted for all 21 expansin proteins. A phylogenetic tree was generated from the alignments of the protein sequences to examine the phylogenetic relationship of indica rice expansin proteins.

  6. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments.

    PubMed

    Cotmore, S F; McKie, V C; Anderson, L J; Astell, C R; Tattersall, P

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Restriction endonuclease fragments of this cloned B19 genome were treated with BAL 31 and shotgun cloned into the open reading frame expression vector pJS413. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  7. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. Availability: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924
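
    To make "weakly supported parts" concrete, the sketch below collapses low-support internal branches of a gene tree into multifurcations. This is not the ProfileNJ code: it assumes, for illustration, that such a contraction is a natural first step before any re-resolution, and it leaves out the re-resolution with a species tree and a distance matrix entirely. The Node class, the 0.8 threshold and the example tree are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None   # leaf label; None for internal nodes
    support: float = 1.0         # support of the branch above this node
    children: List["Node"] = field(default_factory=list)


def contract_weak_branches(node: Node, threshold: float) -> Node:
    """Return a copy in which internal branches with support < threshold are collapsed."""
    new_children: List[Node] = []
    for child in node.children:
        child = contract_weak_branches(child, threshold)
        if child.children and child.support < threshold:
            new_children.extend(child.children)  # splice grandchildren into the parent
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def newick(node: Node) -> str:
    """Topology-only Newick string (no branch lengths or supports)."""
    if not node.children:
        return node.name or ""
    return "(" + ",".join(newick(c) for c in node.children) + ")"


# Hypothetical gene tree: ((a,b) support 0.95, ((c,d) support 0.40, e) support 0.90)
tree = Node(children=[
    Node(support=0.95, children=[Node("a"), Node("b")]),
    Node(support=0.90, children=[
        Node(support=0.40, children=[Node("c"), Node("d")]),
        Node("e"),
    ]),
])
print(newick(contract_weak_branches(tree, threshold=0.8)))  # ((a,b),(c,d,e))
```

    The weakly supported (c,d) grouping disappears and c, d and e become an unresolved trio. That unresolved part is what a method of this kind would then need to resolve using additional information.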

  8. Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome.

    PubMed

    Ahn, Seung-Joon; Dermauw, Wannes; Wybouw, Nicky; Heckel, David G; Van Leeuwen, Thomas

    2014-07-01

    UDP-glycosyltransferases (UGTs) catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, altering them into more water-soluble metabolites. Thereby, UGTs play an important role in the detoxification of xenobiotics and in the regulation of endobiotics. Recently, the genome sequence was reported for the two-spotted spider mite, Tetranychus urticae, a polyphagous herbivore damaging a number of agricultural crops. Although various gene families implicated in xenobiotic metabolism have been documented in T. urticae, UGTs so far have not. We identified 80 UGT genes in the T. urticae genome, the largest number of UGT genes in a metazoan species reported so far. Phylogenetic analysis revealed that lineage-specific gene expansions increased the diversity of the T. urticae UGT repertoire. Genomic distribution, intron-exon structure and structural motifs in the T. urticae UGTs were also described. In addition, expression profiling after host-plant shifts and in acaricide resistant lines supported an important role for UGT genes in xenobiotic metabolism. Expanded searches of UGTs in other arachnid species (Subphylum Chelicerata), including a spider, a scorpion, two ticks and two predatory mites, unexpectedly revealed the complete absence of UGT genes. However, a centipede (Subphylum Myriapoda) and a water flea and a crayfish (Subphylum Crustacea) contain UGT genes in their genomes similar to insect UGTs, suggesting that the UGT gene family might have been lost early in the Chelicerata lineage and subsequently re-gained in the tetranychid mites. Sequence similarity of T. urticae UGTs and bacterial UGTs and their phylogenetic reconstruction suggest that spider mites acquired UGT genes from bacteria by horizontal gene transfer. Our findings show a unique evolutionary history of the T. urticae UGT gene family among other arthropods and provide important clues to its functions in relation to detoxification and thereby host

  9. Reading Genomes and Controlling Gene Expression

    NASA Astrophysics Data System (ADS)

    Libchaber, Albert

    2000-03-01

    Molecular recognition of DNA sequences is achieved by DNA hybridization of complementary sequences. We present various scenarios for optimization, leading to microarrays and global measurement. Gene expression can be controlled using gene constructs immobilized on a template with micron scale temperature heaters. We will discuss and present results on protein microarrays.

  10. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  11. The evolution of chloroplast genome structure in ferns.

    PubMed

    Wolf, Paul G; Roper, Jessie M; Duffy, Aaron M

    2010-09-01

    The plastid genome (plastome) is a rich source of phylogenetic and other comparative data in plants. Most land plants possess a plastome of similar structure. However, in a major group of plants, the ferns, a unique plastome structure has evolved. The gene order in ferns has been explained by a series of genomic inversions relative to the plastome organization of seed plants. Here, we examine for the first time the structure of the plastome across fern phylogeny. We used a PCR-based strategy to map and partially sequence plastomes. We found that a pair of partially overlapping inversions in the region of the inverted repeat occurred in the common ancestor of most ferns. However, the ancestral (seed plant) structure is still found in early diverging branches leading to the osmundoid and filmy fern lineages. We found that a second pair of overlapping inversions occurred on a branch leading to the core leptosporangiates. We also found that the unique placement of the gene matK in ferns (lacking a flanking intron) is not a result of a large-scale inversion, as previously thought. This is because the intron loss maps to an earlier point on the phylogeny than the nearby inversion. We speculate on why inversions may occur in pairs and what this may mean for the dynamics of plastome evolution.

  12. Functional and Evolutionary Characterization of a Gene Transfer Agent's Multilocus "Genome".

    PubMed

    Hynes, Alexander P; Shakya, Migun; Mercer, Ryan G; Grüll, Marc P; Bown, Luke; Davidson, Fraser; Steffen, Ekaterina; Matchem, Heidi; Peach, Mandy E; Berger, Tim; Grebe, Katherine; Zhaxybayeva, Olga; Lang, Andrew S

    2016-10-01

    Gene transfer agents (GTAs) are phage-like particles that can package and transfer a random piece of the producing cell's genome, but are unable to transfer all the genes required for their own production. As such, GTAs represent an evolutionary conundrum: are they selfish genetic elements propagating through an unknown mechanism, defective viruses, or viral structures "repurposed" by cells for gene exchange, as their name implies? In Rhodobacter capsulatus, production of the R. capsulatus GTA (RcGTA) particles is associated with a cluster of genes resembling a small prophage. Utilizing transcriptomic, genetic and biochemical approaches, we report that the RcGTA "genome" consists of at least 24 genes distributed across five distinct loci. We demonstrate that, of these additional loci, two are involved in cell recognition and binding and one in the production and maturation of RcGTA particles. The five RcGTA "genome" loci are widespread within Rhodobacterales, but not all loci have the same evolutionary histories. Specifically, two of the loci have been subject to frequent, probably virus-mediated, gene transfer events. We argue that it is unlikely that RcGTA is a selfish genetic element. Instead, our findings are compatible with the scenario that RcGTA is a virus-derived element maintained by the producing organism due to a selective advantage of within-population gene exchange. The modularity of the RcGTA "genome" is presumably a result of selection on the host organism to retain GTA functionality. PMID:27343288

  13. Mapping and annotating obesity-related genes in pig and human genomes.

    PubMed

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are therefore necessary to further dissect the genetics of obesity. The pig has emerged as one of the most attractive models because of its similarity to humans in the mechanisms regulating fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on the STRING protein interaction network. Conclusions. The collected set consists of 361 obesity-related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only 3 human genes have no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity-related genes are mostly involved in regulation and signaling processes/pathways, and relevant connections emerge between obesity-related genes and diseases such as cancer and infectious diseases.

  14. The structural code of cyanobacterial genomes

    PubMed Central

    Lehmann, Robert; Machné, Rainer; Herzel, Hanspeter

    2014-01-01

    A periodic bias in nucleotide frequency with a period of about 11 bp is characteristic for bacterial genomes. This signal is commonly interpreted to relate to the helical pitch of negatively supercoiled DNA. Functions in supercoiling-dependent RNA transcription or as a ‘structural code’ for DNA packaging have been suggested. Cyanobacterial genomes showed especially strong periodic signals and, on the other hand, DNA supercoiling and supercoiling-dependent transcription are highly dynamic and underlie circadian rhythms of these phototrophic bacteria. Focusing on this phylum and dinucleotides, we find that a minimal motif of AT-tracts (AT2) yields the strongest signal. Strong genome-wide periodicity is ancestral to a clade of unicellular and polyploid species but lost upon morphological transitions into two baeocyte-forming and a symbiotic species. The signal is intermediate in heterocystous species and weak in monoploid picocyanobacteria. A pronounced ‘structural code’ may support efficient nucleoid condensation and segregation in polyploid cells. The major source of the AT2 signal are protein-coding regions, where it is encoded preferentially in the first and third codon positions. The signal shows only few relations to supercoiling-dependent and diurnal RNA transcription in Synechocystis sp. PCC 6803. Strong and specific signals in two distinct transposons suggest roles in transposase transcription and transpososome formation. PMID:25056315
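    The kind of periodic signal described in the abstract above can be illustrated with a rough Python sketch that counts how often a short motif recurs at each distance along a sequence; a periodic bias then shows up as peaks at multiples of the period. The function name `distance_counts`, the motif "AA", and the synthetic test sequence are assumptions made for illustration only; they are not the method or data of the paper.

```python
from collections import Counter

def distance_counts(seq, motif="AA", max_dist=40):
    """Count pairs of motif occurrences separated by each distance 1..max_dist."""
    m = len(motif)
    # Start positions of every (possibly overlapping) motif occurrence
    positions = [i for i in range(len(seq) - m + 1) if seq[i:i + m] == motif]
    pos_set = set(positions)
    counts = Counter()
    for p in positions:
        for k in range(1, max_dist + 1):
            if p + k in pos_set:
                counts[k] += 1
    return counts

# Hypothetical synthetic sequence: one "AA" every 11 bp on a background of C
seq = ("AA" + "C" * 9) * 50
counts = distance_counts(seq)
best = max(counts, key=counts.get)  # distance with the most recurring pairs
print(best, counts[best])
```

    On real genomes the peaks are far weaker than in this synthetic case, so such counts would normally be compared against a shuffled or background-corrected expectation before any period is inferred.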

  15. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples

    PubMed Central

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S.; Kebebew, Electron

    2015-01-01

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenoma and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma, whereas there are 808 and 1085 dysregulated genes in ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin M signaling induces caspase 3-dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin M signaling as a plausible druggable pathway for therapeutics. PMID:26446994

  16. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples.

    PubMed

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S; Kebebew, Electron

    2015-10-30

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenoma and ACC gene expression profiles with more than four-fold expression differences and an adjusted P-value < 0.05 revealed no major differences in normal versus adrenocortical adenoma, whereas there are 808 and 1085 dysregulated genes in ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin M signaling induces caspase 3-dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin M signaling as a plausible druggable pathway for therapeutics.

  17. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  18. Genomic organization of SLC3A1, a transporter gene mutated in cystinuria

    SciTech Connect

    Pras, E.; Sood, R.; Raben, N.

    1996-08-15

    The SLC3A1 gene encodes a transport protein for cystine and the dibasic amino acids. Recently mutations in this gene have been shown to cause cystinuria. We report the genomic structure and organization of SLC3A1, which is composed of 10 exons and spans nearly 45 kb. Until now screening for mutations in SLC3A1 has been based on RT-PCR amplification of illegitimate mRNA transcripts from white blood cells. In this report we provide primers for amplification of exons from genomic DNA, thus simplifying the process of screening for SLC3A1 mutations in cystinuria. 20 refs., 3 figs., 2 tabs.

  19. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-01

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research. PMID:26638067

  1. Precision medicine in breast cancer: genes, genomes, and the future of genomically driven treatments.

    PubMed

    Stover, Daniel G; Wagle, Nikhil

    2015-04-01

    Remarkable progress in sequencing technology over the past 20 years has made it possible to comprehensively profile tumors and identify clinically relevant genomic alterations. In breast cancer, the most common malignancy affecting women, we are now increasingly able to use this technology to help specify the use of therapies that target key molecular and genetic dependencies. Large sequencing studies have confirmed the role of well-known cancer-related genes and have also revealed numerous other genes that are recurrently mutated in breast cancer. This growing understanding of patient-to-patient variability at the genomic level in breast cancer is advancing our ability to direct the appropriate treatment to the appropriate patient at the appropriate time--a hallmark of "precision cancer medicine." This review focuses on the technological advances that have catalyzed these developments, the landscape of mutations in breast cancer, the clinical impact of genomic profiling, and the incorporation of genomic information into clinical care and clinical trials.

  2. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    PubMed

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-01

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

  3. GenePainter: a fast tool for aligning gene structures of eukaryotic protein families, visualizing the alignments and mapping gene structures onto protein structures

    PubMed Central

    2013-01-01

    Background. All sequenced eukaryotic genomes have been shown to possess at least a few introns. This includes those unicellular organisms that were previously suspected to be intron-less. Therefore, gene splicing must have been present at least in the last common ancestor of the eukaryotes. To explain the evolution of introns, two mutually exclusive concepts have been developed. The introns-early hypothesis says that the very first protein-coding genes already contained introns, while the introns-late concept asserts that eukaryotic genes gained introns only after the emergence of the eukaryotic lineage. A very important aspect in this respect is the conservation of intron positions within homologous genes of different taxa. Results. GenePainter is a standalone application for mapping gene structure information onto protein multiple sequence alignments. Based on the multiple sequence alignments, the gene structures are aligned down to single nucleotides. GenePainter accounts for variable lengths in exons and introns, respects split codons at intron junctions and is able to handle sequencing and assembly errors, which are possible reasons for frame-shifts in exons and gaps in genome assemblies. Thus, even gene structures of considerably divergent proteins can properly be compared, as is needed in phylogenetic analyses. Conserved intron positions can also be mapped to user-provided protein structures. For their visualization, GenePainter provides scripts for the molecular graphics system PyMOL. Conclusions. GenePainter is a tool to analyse gene structure conservation providing various visualization options. A stable version of GenePainter for all operating systems as well as documentation and example data are available at http://www.motorprotein.de/genepainter.html. PMID:23496949

  4. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).

  5. Integrative Genomics Identifies Gene Signature Associated with Melanoma Ulceration

    PubMed Central

    Toth, Reka; Vizkeleti, Laura; Herandez-Vargas, Hector; Lazar, Viktoria; Emri, Gabriella; Szatmari, Istvan; Herceg, Zdenko; Adany, Roza; Balazs, Margit

    2013-01-01

    Background. Despite the extensive research approaches applied to characterise malignant melanoma, no specific molecular markers are available that are clearly related to the progression of this disease. In this study, our aims were to define a gene expression signature associated with the clinical outcome of melanoma patients and to provide an integrative interpretation of the gene expression, copy number alteration (CNA), and promoter methylation patterns that contribute to clinically relevant molecular functional alterations. Methods. Gene expression profiles were determined using the Affymetrix U133 Plus2.0 array. The NimbleGen Human CGH Whole-Genome Tiling array was used to define CNAs, and the Illumina GoldenGate Methylation platform was applied to characterise the methylation patterns of overlapping genes. Results. We identified two subclasses of primary melanoma: one representing patients with better prognoses and the other being characteristic of patients with unfavourable outcomes. We assigned 1,080 genes as being significantly correlated with ulceration; 987 genes were downregulated and significantly enriched in the p53, Nf-kappaB, and WNT/beta-catenin pathways. Through integrated genome analysis, we defined 150 downregulated genes whose expression correlated with copy number losses in ulcerated samples. These genes were significantly enriched on chromosome 6q and 10q, which contained a total of 36 genes. Ten of these genes were downregulated and involved in cell-cell and cell-matrix adhesion or apoptosis. The expression and methylation patterns of additional genes exhibited an inverse correlation, suggesting that transcriptional silencing of these genes is driven by epigenetic events. Conclusion. Using an integrative genomic approach, we were able to identify functionally relevant molecular hotspots characterised by copy number losses and promoter hypermethylation in distinct molecular subtypes of melanoma that contribute to specific transcriptomic silencing

  6. Structure and content of the Entamoeba histolytica genome.

    PubMed

    Clark, C G; Alsmark, U C M; Tazreiter, M; Saito-Nakano, Y; Ali, V; Marion, S; Weber, C; Mukherjee, C; Bruchhaus, I; Tannich, E; Leippe, M; Sicheritz-Ponten, T; Foster, P G; Samuelson, J; Noël, C J; Hirt, R P; Embley, T M; Gilchrist, C A; Mann, B J; Singh, U; Ackers, J P; Bhattacharya, S; Bhattacharya, A; Lohia, A; Guillén, N; Duchêne, M; Nozaki, T; Hall, N

    2007-01-01

    The intestinal parasite Entamoeba histolytica is one of the first protists for which a draft genome sequence has been published. Although the genome is still incomplete, it is unlikely that many genes are missing from the list of those already identified. In this chapter we summarise the features of the genome as they are currently understood and provide previously unpublished analyses of many of the genes.

  7. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    SciTech Connect

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  8. Genome-wide identification and evolution of HECT genes in soybean.

    PubMed

    Meng, Xianwen; Wang, Chen; Rahman, Siddiq Ur; Wang, Yaxu; Wang, Ailan; Tao, Shiheng

    2015-04-16

    Proteins containing domains homologous to the E6-associated protein (E6-AP) carboxyl terminus (HECT) are an important class of E3 ubiquitin ligases involved in the ubiquitin proteasome pathway. HECT-type E3s play crucial roles in plant growth and development. However, current understanding of plant HECT genes and their evolution is very limited. In this study, we performed a genome-wide analysis of the HECT domain-containing genes in soybean. Using high-quality genome sequences, we identified 19 soybean HECT genes. The predicted HECT genes were distributed unevenly across 15 of 20 chromosomes. Nineteen of these genes were inferred to be segmentally duplicated gene pairs, suggesting that in soybean, segmental duplications have made a significant contribution to the expansion of the HECT gene family. Phylogenetic analysis showed that these HECT genes can be divided into seven groups, among which gene structure and domain architecture were relatively well conserved. The Ka/Ks ratios show that after the duplication events, duplicated HECT genes underwent purifying selection. Moreover, expression analysis reveals that 15 of the HECT genes in soybean are differentially expressed in 14 tissues, and are often highly expressed in the flowers and roots. In summary, this work provides useful information on which further functional studies of soybean HECT genes can be based.
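    The Ka/Ks reasoning mentioned in the abstract above follows a common convention: a ratio well below 1 is read as purifying selection, a ratio near 1 as neutral evolution, and a ratio above 1 as positive selection. A minimal sketch of that rule is shown here; the function name `classify_selection`, the tolerance `tol`, and the example values are hypothetical and are not taken from the study.

```python
def classify_selection(ka, ks, tol=0.05):
    """Interpret a Ka/Ks ratio using the conventional thresholds around 1.

    ka  -- nonsynonymous substitution rate
    ks  -- synonymous substitution rate (must be positive)
    tol -- assumed width of the band around 1 treated as 'neutral'
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio < 1 - tol:
        return "purifying"
    if ratio > 1 + tol:
        return "positive"
    return "neutral"

# Hypothetical duplicated gene pair with Ka = 0.02 and Ks = 0.20
print(classify_selection(0.02, 0.20))
```

    In practice Ka and Ks are estimated from codon alignments by dedicated software, and very small Ks values make the ratio unreliable; the sketch only covers the final interpretation step.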

  9. Analyses of the Complete Genome and Gene Expression of Chloroplast of Sweet Potato [Ipomoea batata

    PubMed Central

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family with a genome size between 2,200 and 3,000 Mb. The genomic resources for this crop are deficient due to its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, which is a circular molecule of 161,303 bp in the typical quadripartite structure with large (LSC) and small (SSC) single-copy regions separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes of which there are 72 single-copy and 11 double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of Ipomoea (L.) species and some basal important angiosperms, respectively. Some boundary gene-flow and gene gain-and-loss events were identified at intra- and inter-species levels. In addition, by comparison with the transcriptome sequences of sweet potato, the RNA editing events and differential expression of the chloroplast functional genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa, and the result may contribute to a better understanding of the evolution of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression. PMID:25874767

  10. A white spruce gene catalog for conifer genome analyses.

    PubMed

    Rigault, Philippe; Boyle, Brian; Lepage, Pierre; Cooke, Janice E K; Bousquet, Jean; MacKay, John J

    2011-09-01

    Several angiosperm plant genomes, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla reduces our understanding of plant evolution and restricts the potential impacts of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag sequencing and full-length cDNA sequencing to facilitate genome characterizations, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. Expressed sequence tags, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. We estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes owing to incomplete, partially sequenced, and unsampled transcripts and that its transcriptome could span up to 47 Mb. These estimates are in the same range as the Arabidopsis and rice transcriptomes. Next-generation methods confirmed and enhanced the catalog by providing deeper coverage for rare transcripts, by extending many incomplete clusters, and by augmenting the overall transcriptome coverage to 38 Mb of unique sequence. Genomic sample sequencing at 8.5% of the 19.8-Gb P. glauca genome identified 1,495 clusters representing highly repeated sequences among the cDNA clusters. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.

  11. Genome-editing technologies for gene correction of hemophilia.

    PubMed

    Park, Chul-Yong; Lee, Dongjin R; Sung, Jin Jea; Kim, Dong-Wook

    2016-09-01

    Hemophilia is caused by various mutations in blood coagulation factor genes, including factor VIII (FVIII) and factor IX (FIX), that encode key proteins in the blood clotting pathway. Although the addition of therapeutic genes or infusion of clotting factors may be used to remedy hemophilia's symptoms, no permanent cure for the disease exists. Moreover, patients often develop neutralizing antibodies or experience adverse effects that limit the therapy's benefits. However, targeted gene therapy involving the precise correction of these mutated genes at the genome level using programmable nucleases is a promising strategy. These nucleases can induce double-strand breaks (DSBs) on genomes, and repairs of such induced DSBs by the two cellular repair systems enable a targeted gene correction. Going beyond cultured cell systems, we are now entering the age of direct gene correction in vivo using various delivery tools. Here, we describe the current status of in vivo and ex vivo genome-editing technology related to potential hemophilia gene correction and the prominent issues surrounding its application in patients with monogenic diseases.

  12. Molecular cloning, genomic organization, and chromosomal localization of the human pancreatitis-associated protein (PAP) gene

    SciTech Connect

    Dusetti, N.J.; Frigerio, J.M.; Dagorn, J.C.; Iovanna, J.L.; Fox, M.F.; Swallow, D.M.

    1994-01-01

    Pancreatitis-associated protein (PAP) is a secretory pancreatic protein present in small amounts in normal pancreas and overexpressed during the acute phase of pancreatitis. In this paper, the authors describe the cloning, characterization, and chromosomal mapping of the human PAP gene. The gene spans 2748 bp and contains six exons interrupted by five introns. The gene has a typical promoter containing the sequences TATAAA and CCAAT 28 and 52 bp upstream of the cap site, respectively. They found striking similarities in genomic organization as well as in the promoter sequences between the human and rat PAP genes. The human PAP gene was mapped to chromosome 2p12 using rodent-human hybrid cells and in situ chromosomal hybridization. This localization coincides with that of the reg/lithostathine gene, which encodes a pancreatic secretory protein structurally related to PAP, suggesting that both genes derived from the same ancestral gene by duplication. 35 refs., 4 figs., 1 tab.

  13. Genome-wide analysis of the GRAS gene family in Chinese cabbage (Brassica rapa ssp. pekinensis).

    PubMed

    Song, Xiao-Ming; Liu, Tong-Kun; Duan, Wei-Ke; Ma, Qing-Hua; Ren, Jun; Wang, Zhen; Li, Ying; Hou, Xi-Lin

    2014-01-01

    The GRAS gene family is one of the most important families of transcriptional regulators. In this study, 48 GRAS genes are identified from Chinese cabbage, and they are classified into eight groups according to the classification of Arabidopsis. The characterization, classification, gene structure and phylogenetic construction of GRAS proteins are performed. Distribution mapping shows that GRAS proteins are nonrandomly localized in 10 chromosomes. Fifty-five orthologous gene pairs are shared by Chinese cabbage and Arabidopsis, and interaction networks of these orthologous genes are constructed. The expansion of GRAS genes in Chinese cabbage results from genome triplication. Among the 17 species examined, 14 higher plants carry the GRAS genes, whereas two lower plants and one fungal species do not. Furthermore, the expression patterns of GRAS genes exhibit differences in three tissues based on RNA-seq data. Taken together, this comprehensive analysis will provide rich resources for studying GRAS protein functions in Chinese cabbage.

  14. Bacterial Genes in the Aphid Genome: Absence of Functional Gene Transfer from Buchnera to Its Host

    PubMed Central

    Nikoh, Naruo; McCutcheon, John P.; Kudo, Toshiaki; Miyagishima, Shin-ya; Moran, Nancy A.; Nakabachi, Atsushi

    2010-01-01

    Genome reduction is typical of obligate symbionts. In cellular organelles, this reduction partly reflects transfer of ancestral bacterial genes to the host genome, but little is known about gene transfer in other obligate symbioses. Aphids harbor anciently acquired obligate mutualists, Buchnera aphidicola (Gammaproteobacteria), which have highly reduced genomes (420–650 kb), raising the possibility of gene transfer from ancestral Buchnera to the aphid genome. In addition, aphids often harbor other bacteria that also are potential sources of transferred genes. Previous limited sampling of genes expressed in bacteriocytes, the specialized cells that harbor Buchnera, revealed that aphids acquired at least two genes from bacteria. The newly sequenced genome of the pea aphid, Acyrthosiphon pisum, presents the first opportunity for a complete inventory of genes transferred from bacteria to the host genome in the context of an ancient obligate symbiosis. Computational screening of the entire A. pisum genome, followed by phylogenetic and experimental analyses, provided strong support for the transfer of 12 genes or gene fragments from bacteria to the aphid genome: three LD–carboxypeptidases (LdcA1, LdcA2, ψLdcA), five rare lipoprotein As (RlpA1-5), N-acetylmuramoyl-L-alanine amidase (AmiD), 1,4-beta-N-acetylmuramidase (bLys), DNA polymerase III alpha chain (ψDnaE), and ATP synthase delta chain (ψAtpH). Buchnera was the apparent source of two highly truncated pseudogenes (ψDnaE and ψAtpH). Most other transferred genes were closely related to genes from relatives of Wolbachia (Alphaproteobacteria). At least eight of the transferred genes (LdcA1, AmiD, RlpA1-5, bLys) appear to be functional, and expression of seven (LdcA1, AmiD, RlpA1-5) is highly upregulated in bacteriocytes. The LdcAs and RlpAs appear to have been duplicated after transfer. Our results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the host

  15. Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes.

    PubMed

    Lin, Yongxiang; Cheng, Ying; Jin, Jing; Jin, Xiaolei; Jiang, Haiyang; Yan, Hanwei; Cheng, Beijiu

    2014-01-01

    Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicus, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicus, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species. PMID:25047803

  16. Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes.

    PubMed

    Lin, Yongxiang; Cheng, Ying; Jin, Jing; Jin, Xiaolei; Jiang, Haiyang; Yan, Hanwei; Cheng, Beijiu

    2014-01-01

    Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicus, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicus, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species.

  17. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
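
    The abstract above uses the contig N50 as its measure of assembly quality. As a minimal sketch, assuming the common definition (the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length), N50 can be computed as follows. The function name and the toy contig lengths are illustrative only and are not taken from the marmoset assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    total assembled length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), not real assembly data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

    Merging short contigs into longer ones moves more of the total length into fewer sequences, which is why the statistic rises when unmapped contigs are joined to chromosomal ones.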

  18. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  19. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions.

    PubMed

    Di, Zhiyong; Yu, Yao; Wu, Yingliang; Hao, Pei; He, Yawen; Zhao, Huabin; Li, Yixue; Zhao, Guoping; Li, Xuan; Li, Wenxin; Cao, Zhijian

    2015-06-01

    Homeobox genes belong to a large gene group, which encodes the famous DNA-binding homeodomain that plays a key role in development and cellular differentiation during embryogenesis in animals. Here, one hundred forty-nine homeobox genes were identified from the Asian scorpion, Mesobuthus martensii (Chelicerata: Arachnida: Scorpiones: Buthidae) based on our newly assembled genome sequence with approximately 248× coverage. The identified homeobox genes were categorized into eight classes including 82 families: 67 ANTP class genes, 33 PRD genes, 11 LIM genes, five POU genes, six SINE genes, 14 TALE genes, five CUT genes, two ZF genes and six unclassified genes. Transcriptome data confirmed that more than half of the genes were expressed in adults. The homeobox gene diversity of the eight classes is similar to the previously analyzed Mandibulata arthropods. Interestingly, it is hypothesized that the scorpion M. martensii may have two Hox clusters. The first complete genome-wide analysis of homeobox genes in Chelicerata not only reveals the repertoire of scorpion, arachnid and chelicerate homeobox genes, but also shows some insights into the evolution of arthropod homeobox genes.

  20. Genomic characterisation of Wongabel virus reveals novel genes within the Rhabdoviridae.

    PubMed

    Gubala, Aneta J; Proll, David F; Barnard, Ross T; Cowled, Chris J; Crameri, Sandra G; Hyatt, Alex D; Boyle, David B

    2008-06-20

    Viruses belonging to the family Rhabdoviridae infect a variety of different hosts, including insects, vertebrates and plants. Currently, there are approximately 200 ICTV-recognised rhabdoviruses isolated around the world. However, the majority remain poorly characterised and only a fraction have been definitively assigned to genera. The genomic and transcriptional complexity displayed by several of the characterised rhabdoviruses indicates large diversity and complexity within this family. To enable an improved taxonomic understanding of this family, it is necessary to gain further information about the poorly characterised members of this family. Here we present the complete genome sequence and predicted transcription strategy of Wongabel virus (WONV), a previously uncharacterised rhabdovirus isolated from biting midges (Culicoides austropalpalis) collected in northern Queensland, Australia. The 13,196 nucleotide genome of WONV encodes five typical rhabdovirus genes N, P, M, G and L. In addition, the WONV genome contains three genes located between the P and M genes (U1, U2, U3) and two open reading frames overlapping with the N and G genes (U4, U5). These five additional genes and their putative protein products appear to be novel, and their functions are unknown. Predictive analysis of the U5 gene product revealed characteristics typical of viroporins, and indicated structural similarities with the alpha-1 protein (putative viroporin) of viruses in the genus Ephemerovirus. Phylogenetic analyses of the N and G proteins of WONV indicated closest similarity with the avian-associated Flanders virus; however, the genomes of these two viruses are significantly diverged. WONV displays a novel and unique genome structure that has not previously been described for any animal rhabdovirus.

  1. Surfeit locus gene homologs are widely distributed in invertebrate genomes.

    PubMed

    Armes, N; Fried, M

    1996-10-01

    The mouse Surfeit locus contains six sequence-unrelated genes (Surf-1 to -6) arranged in the tightest gene cluster so far described for mammals. The organization and juxtaposition of five of the Surfeit genes (Surf-1 to -5) are conserved between mammals and birds, and this may reflect a functional or regulatory requirement for the gene clustering. We have undertaken an evolutionary study to determine whether the Surfeit genes are conserved and clustered in invertebrate genomes. Drosophila melanogaster and Caenorhabditis elegans homologs of the mouse Surf-4 gene, which encodes an integral membrane protein associated with the endoplasmic reticulum, have been isolated. The amino acid sequences of the Drosophila and C. elegans homologs are highly conserved in comparison with the mouse Surf-4 protein. In particular, a dilysine motif implicated in endoplasmic reticulum localization of the mouse protein is conserved in the invertebrate homologs. We show that the Drosophila Surf-4 gene, which is transcribed from a TATA-less promoter, is not closely associated with other Drosophila Surfeit gene homologs but rather is located upstream from sequences encoding a homolog of a yeast seryl-tRNA synthetase protein. There are at least two closely linked Surf-3/rpL7a genes or highly polymorphic alleles of a single Surf-3/rpL7a gene in the C. elegans genome. The chromosomal locations of the C. elegans Surf-1, Surf-3/rpL7a, and Surf-4 genes have been determined. In D. melanogaster the Surf-3/rpL7a, Surf-4, and Surf-5 gene homologs and in C. elegans the Surf-1, Surf-3/rpL7a, Surf-4, and Surf-5 gene homologs are located on completely different chromosomes, suggesting that any requirement for the tight clustering of the genes in the Surfeit locus is restricted to vertebrate lineages.

  2. Comparative genomics on nemo-like kinase gene.

    PubMed

    Katoh, Masuko; Katoh, Masaru

    2005-06-01

    WNT signals are transduced to the planar cell polarity (PCP) pathway or the beta-catenin pathway. Drosophila Frizzled (Fz), Starry night (Stan), Van Gogh (Vang), Dishevelled (Dsh), Prickle (Pk), Diego (Dgo) and Nemo (Nmo) are implicated in the PCP signaling pathway. Choi and Benzer identified Drosophila Nmo in 1994, and Brott et al identified mouse Nemo-like kinase (Nlk) in 1998. Nlk positively regulates the PCP pathway, and negatively regulates the beta-catenin pathway. Here, we identified and characterized rat Nlk gene, Nlk2 gene and Nlkp pseudogene by using bioinformatics. Nlk gene, consisting of 11 exons, was mapped to rat chromosome 10q25. Rat Nlk gene encoded 515-aa Nlk protein with the serine/threonine kinase domain, poly(His) tracts and poly(Ala) tract, which showed 100, 99.8, 97.1 and 89.5% total-amino-acid identity with mouse Nlk, human NLK, Xenopus nlk and zebrafish nlk, respectively. Rat Nlk2 gene and Nlkp pseudogene were mapped to rat chromosome 13p13 and 2q44, respectively. Nlk2 gene and Nlkp pseudogene, consisting of a single exon, were not evolutionarily conserved. Nlk2 gene and Nlkp pseudogene were predicted as retrotransposed Nlk homologs within the rat genome. Nlk2 gene encoded a 480-aa Nlk2 protein with partial deletion within the kinase domain, which was predicted as the dominant negative Nlk homolog. This is the first report on the Nlk gene and retrotransposed Nlk homologs within the rat genome.

  3. Positionally biased gene loss after whole genome duplication: Evidence from human, yeast, and plant

    PubMed Central

    Makino, Takashi; McLysaght, Aoife

    2012-01-01

    Whole genome duplication (WGD) has made a significant contribution to many eukaryotic genomes including yeast, plants, and vertebrates. Following WGD, some ohnologs (WGD paralogs) remain in the genome arranged in blocks of conserved gene order and content (paralogons). However, the most common outcome is loss of one of the ohnolog pair. It is unclear what factors, if any, govern gene loss from paralogons. Recent studies have reported physical clustering (genetic linkage) of functionally linked (interacting) genes in the human genome and propose a biological significance for the clustering of interacting genes such as coexpression or preservation of epistatic interactions. Here we conduct a novel test of a hypothesis that functionally linked genes in the same paralogon are preferentially retained in cis after WGD. We compare the number of protein–protein interactions (PPIs) between linked singletons within a paralogon (defined as cis-PPIs) with that of PPIs between singletons across paralogon pairs (defined as trans-PPIs). We find that paralogons in which the number of cis-PPIs is greater than that of trans-PPIs are significantly enriched in human and yeast. The trend is similar in plants, but it is difficult to assess statistical significance due to multiple, overlapping WGD events. Interestingly, human singletons participating in cis-PPIs tend to be classified into “response to stimulus.” We uncover strong evidence of biased gene loss after WGD, which further supports the hypothesis of biologically significant gene clusters in eukaryotic genomes. These observations give us new insight for understanding the evolution of genome structure and of protein interaction networks. PMID:22835904
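
    The comparison described above (interactions between singletons within one paralogon, cis-PPIs, versus interactions between singletons across a paralogon pair, trans-PPIs) can be sketched as a simple count. This is only an illustration under stated assumptions: the function name, the input format (two disjoint gene sets plus a list of interacting gene pairs) and the toy gene names are hypothetical, and the sketch does not reproduce the authors' statistical test.

```python
def count_cis_trans(paralogon_a, paralogon_b, interactions):
    """Count cis- and trans-PPIs for one pair of paralogons.

    paralogon_a, paralogon_b: iterables of singleton gene names
        (assumed disjoint).
    interactions: iterable of (gene1, gene2) pairs; order is ignored,
        and duplicate pairs and self-interactions are skipped.
    Returns a (cis, trans) tuple.
    """
    a, b = set(paralogon_a), set(paralogon_b)
    seen = set()
    cis = trans = 0
    for g1, g2 in interactions:
        key = frozenset((g1, g2))
        if g1 == g2 or key in seen:
            continue
        seen.add(key)
        if (g1 in a and g2 in a) or (g1 in b and g2 in b):
            cis += 1      # both partners lie in the same paralogon
        elif (g1 in a and g2 in b) or (g1 in b and g2 in a):
            trans += 1    # partners lie in the two paired paralogons
        # pairs involving genes outside both paralogons are ignored
    return cis, trans


# Hypothetical gene names and interactions, for illustration only.
block_a = {"A1", "A2", "A3"}
block_b = {"B1", "B2"}
ppis = [("A1", "A2"), ("A2", "A1"), ("A2", "A3"),
        ("B1", "B2"), ("A1", "B1"), ("A3", "X9")]
print(count_cis_trans(block_a, block_b, ppis))
```

    A paralogon pair with more cis-PPIs than trans-PPIs, as in this toy input, is the kind of case the abstract reports as enriched in human and yeast.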

  4. The Isochore Structure of the Human Genome

    NASA Astrophysics Data System (ADS)

    Petrov, Dimitri; Arndt, Peter F.; Hwa, Terence

    2002-03-01

    Most of the genome of a warm-blooded vertebrate is a mosaic of very long (>200,000 bp) DNA segments, the isochores. These isochores are fairly homogeneous in base composition and are distinguished by their guanine-cytosine (GC) content. With the emergence of sequence data from different organisms, we were able to study the isochore structure on scales up to the length of chromosomes. We observed interesting long-range correlations and explored the possible mechanism(s) using sequence evolution models with mutation rates measured from the repetitive elements in the different isochores.
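
    Because isochores are characterized by their GC content, a first look at isochore-like structure usually starts from a GC profile along the sequence. The following is a minimal sketch, assuming fixed-size non-overlapping windows by default; the function names, the window size and the toy sequence are illustrative assumptions and are not the method used by the authors.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0  # window contains only ambiguous bases such as N
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step=None):
    """Return (start, GC fraction) for each full window along seq."""
    step = step or window
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a GC-rich stretch followed by an AT-rich stretch.
toy = "GGGCGCCG" + "ATTAATAT"
for start, gc in windowed_gc(toy, window=8):
    print(start, gc)
```

    For real isochore-scale analysis the window would be far larger than in this toy input, on the order of the segment lengths quoted in the abstract.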

  5. In-silico human genomics with GeneCards

    PubMed Central

    2011-01-01

    Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help to introduce order into the increasing chaos of information flow. As a consequence of viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools. PMID:22155609

  6. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth.

    PubMed

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-12-22

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker.

  7. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker. PMID:26689832

  8. Gene discovery in the Acanthamoeba castellanii genome

    SciTech Connect

    Anderson, Iain J.; Watkins, Russell F.; Samuelson, John; Spencer, David F.; Majoros, William H.; Gray, Michael W.; Loftus, Brendan J.

    2005-08-01

    Acanthamoeba castellanii is a free-living amoeba found in soil, freshwater, and marine environments and an important predator of bacteria. Acanthamoeba castellanii is also an opportunistic pathogen of clinical interest, responsible for several distinct diseases in humans. In order to provide a genomic platform for the study of this ubiquitous and important protist, we generated a sequence survey of approximately 0.5× coverage of the genome. The data predict that A. castellanii exhibits a greater biosynthetic capacity than the free-living Dictyostelium discoideum and the parasite Entamoeba histolytica, providing an explanation for the ability of A. castellanii to inhabit a diversity of environments. Alginate lyase may provide access to bacteria within biofilms by breaking down the biofilm matrix, and polyhydroxybutyrate depolymerase may facilitate utilization of the bacterial storage compound polyhydroxybutyrate as a food source. Enzymes for the synthesis and breakdown of cellulose were identified, and they likely participate in encystation and excystation as in D. discoideum. Trehalose-6-phosphate synthase is present, suggesting that trehalose plays a role in stress adaptation. Detection and response to a number of stress conditions is likely accomplished with a large set of signal transduction histidine kinases and a set of putative receptor serine/threonine kinases similar to those found in E. histolytica. Serine, cysteine and metalloproteases were identified, some of which are likely involved in pathogenicity.

  9. Genome-wide comparative analysis reveals possible common ancestors of nucleotide-binding sites domain containing genes in hybrid Citrus sinensis genome and original Citrus clementina genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We identified and re-annotated candidate disease resistance (R) genes with nucleotide-binding sites (NBS) domain from a Citrus clementina genome and two complete Citrus sinensis genome sequences (one from the USA and one from China). We found similar numbers of NBS genes from three citrus genomes, r...

  10. A genome-wide analysis of the expansin genes in Malus × domestica.

    PubMed

    Zhang, Shizhong; Xu, Ruirui; Gao, Zheng; Chen, Changtian; Jiang, Zesheng; Shu, Huairui

    2014-04-01

    Expansins were first identified as cell wall-loosening proteins; they are involved in regulating cell expansion, fruit softening and many other physiological processes. However, our knowledge about the expansin family members and their evolutionary relationships in fruit trees, such as apple, is limited. In this study, we identified 41 members of the expansin gene family in the genome of apple (Malus × domestica L. Borkh). Phylogenetic analysis revealed that expansin genes in apple could be divided into four subfamilies according to their gene structures and protein motifs. By phylogenetic analysis of the expansins in five plants (Arabidopsis, rice, poplar, grape and apple), the expansins were divided into 17 subgroups. Our gene duplication analysis revealed that whole-genome and chromosomal-segment duplications contributed to the expansion of Mdexpansins. The microarray and expressed sequence tag (EST) data showed that 34 Mdexpansin genes could be divided into five groups by the EST analysis; they may also play different roles during fruit development. An expression model for MdEXPA16 and MdEXPA20 showed their potential role in developing fruit. Overall, our study provides useful data and novel insights into the functions and regulatory mechanisms of the expansin genes in apple, as well as their evolution and divergence. As the first step towards genome-wide analysis of the expansin genes in apple, our results have established a solid foundation for future studies on the function of the expansin genes in fruit development.

  11. Variations and classification of toxic epitopes related to celiac disease among α-gliadin genes from four Aegilops genomes.

    PubMed

    Li, Jie; Wang, Shunli; Li, Shanshan; Ge, Pei; Li, Xiaohui; Ma, Wujun; Zeller, F J; Hsam, Sai L K; Yan, Yueming

    2012-07-01

    The α-gliadins are associated with human celiac disease. A total of 23 noninterrupted full open reading frame α-gliadin genes and 19 pseudogenes were cloned and sequenced from the C, M, N, and U genomes of four diploid Aegilops species. Sequence comparison of α-gliadin genes from Aegilops and Triticum species demonstrated extensive allelic variation at the Gli-2 loci of the four Aegilops genomes. Specific structural features were found, including the compositions and variations of two polyglutamine domains (QI and QII) and four T cell stimulatory toxic epitopes. The mean numbers of glutamine residues in the QI domain in C and N genomes and the QII domain in C, N, and U genomes were much higher than those in Triticum genomes, and the QI domain in C and N genomes and the QII domain in C, M, N, and U genomes displayed greater length variations. Interestingly, the types and numbers of the four T cell stimulatory toxic epitopes in α-gliadins from the four Aegilops genomes were significantly fewer than those from Triticum A, B, D, and their progenitor genomes. Relationships were found between the structural variations of the two polyglutamine domains and the distributions of the four T cell stimulatory toxic epitopes, allowing the α-gliadin genes from the Aegilops and Triticum genomes to be classified into three groups.

  12. A Genomic Signature and the Identification of New Sporulation Genes

    PubMed Central

    Abecasis, Ana B.; Serrano, Mónica; Alves, Renato; Quintais, Leonor

    2013-01-01

    Bacterial endospores are the most resistant cell type known to humans, as they are able to withstand extremes of temperature, pressure, chemical injury, and time. They are also of interest because the endospore is the infective particle in a variety of human and livestock diseases. Endosporulation is characterized by the morphogenesis of an endospore within a mother cell. Based on the genes known to be involved in endosporulation in the model organism Bacillus subtilis, a conserved core of about 100 genes was derived, representing the minimal machinery for endosporulation. The core was used to define a genomic signature of about 50 genes that are able to distinguish endospore-forming organisms, based on complete genome sequences, and we show this 50-gene signature is robust against phylogenetic proximity and other artifacts. This signature includes previously uncharacterized genes that we can now show are important for sporulation in B. subtilis and/or are under developmental control, thus further validating this genomic signature. We also predict that a series of polyextremophilic organisms, as well as several gut bacteria, are able to form endospores, and we identified 3 new loci essential for sporulation in B. subtilis: ytaF, ylmC, and ylzA. In all, the results support the view that endosporulation likely evolved once, at the base of the Firmicutes phylum, and is unrelated to other bacterial cell differentiation programs and that this involved the evolution of new genes and functions, as well as the cooption of ancestral, housekeeping functions. PMID:23396918

  13. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    SciTech Connect

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  14. Re-Examining the Gene in Personalized Genomics

    ERIC Educational Resources Information Center

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  15. Meet me halfway: when genomics meets structural bioinformatics.

    PubMed

    Gong, Sungsam; Worth, Catherine L; Cheng, Tammy M K; Blundell, Tom L

    2011-06-01

    The DNA sequencing technology developed by Frederick Sanger in the 1970s established genomics as the basis of comparative genetics. The recent invention of next-generation sequencing (NGS) platforms has added a new dimension to genome research by generating ultra-fast and high-throughput sequencing data in an unprecedented manner. The advent of NGS technology also provides the opportunity to study genetic diseases where sequence variants or mutations are sought to establish a causal relationship with disease phenotypes. However, it is not a trivial task to seek genetic variants responsible for genetic diseases, and it is even harder for complex diseases such as diabetes and cancers. In such polygenic diseases, multiple genes and alleles, which can exist in healthy individuals, come together to contribute to common disease phenotypes in a complex manner. Hence, it is desirable to have an approach that integrates omics data with both knowledge of protein structure and function and an understanding of networks/pathways, i.e. functional genomics and systems biology; in this way, genotype-phenotype relationships can be better understood. In this review, we bring this 'bottom-up' approach alongside the current NGS-driven genetic study of genetic variations and disease aetiology. We describe experimental and computational techniques for assessing genetic variants and their deleterious effects on protein structure and function. PMID:21350909

  17. Functional genomics: Probing plant gene function and expression with transposons

    PubMed Central

    Martienssen, Robert A.

    1998-01-01

    Transposable elements provide a convenient and flexible means to disrupt plant genes, so allowing their function to be assessed. By engineering transposons to carry reporter genes and regulatory signals, the expression of target genes can be monitored and to some extent manipulated. Two strategies for using transposons to assess gene function are outlined here: First, the PCR can be used to identify plants that carry insertions into specific genes from among pools of heavily mutagenized individuals (site-selected transposon mutagenesis). This method requires that high copy transposons be used and that a relatively large number of reactions be performed to identify insertions into genes of interest. Second, a large library of plants, each carrying a unique insertion, can be generated. Each insertion site then can be amplified and sequenced systematically. These two methods have been demonstrated in maize, Arabidopsis, and other plant species, and the relative merits of each are discussed in the context of plant genome research. PMID:9482828

  18. Structure of the human retinoblastoma gene

    SciTech Connect

    Hong, F.D.; Huang, Hueijen S.; To, Hoang; Young, Lihjiuan S.; Oro, A.; Bookstein, R.; Lee, E.Y.H.P.; Lee, Wenhwa

    1989-07-01

    Complete inactivation of the human retinoblastoma gene (RB) is believed to be an essential step in tumorigenesis of several different cancers. To provide a framework for understanding inactivation mechanisms, the structure of RB was delineated. The RB transcript is encoded in 27 exons dispersed over about 200 kilobases (kb) of genomic DNA. The length of individual exons ranges from 31 to 1,889 base pairs (bp). The largest intron spans >60 kb and the smallest one has only 80 bp. Deletion of exons 13-17 is frequently observed in various types of tumors, including retinoblastoma, breast cancer, and osteosarcoma, and the presence of a potential hot spot for recombination in the region is predicted. A putative leucine-zipper motif is exclusively encoded by exon 20. The detailed RB structure presented should prove useful in defining potential functional domains of its encoded protein. Transcription of RB is initiated at multiple positions and the sequences surrounding the initiation sites have a high G+C content. A typical upstream TATA box is not present. Localization of the RB promoter region was accomplished by utilizing a heterologous expression system containing a bacterial chloramphenicol acetyltransferase gene. Deletion analysis revealed that a region as small as 70 bp is sufficient for RB promoter activity, similar to other previously characterized G+C-rich gene promoters. Several direct repeats and possible stem-and-loop structures are found in the promoter region.

  19. Epigenomics and the structure of the living genome.

    PubMed

    Friedman, Nir; Rando, Oliver J

    2015-10-01

    Eukaryotic genomes are packaged into an extensively folded state known as chromatin. Analysis of the structure of eukaryotic chromosomes has been revolutionized by development of a suite of genome-wide measurement technologies, collectively termed "epigenomics." We review major advances in epigenomic analysis of eukaryotic genomes, covering aspects of genome folding at scales ranging from whole chromosome folding down to nucleotide-resolution assays that provide structural insights into protein-DNA interactions. We then briefly outline several challenges remaining and highlight new developments such as single-cell epigenomic assays that will help provide us with a high-resolution structural understanding of eukaryotic genomes.

  20. Comparative 3D genome structure analysis of the fission and the budding yeast.

    PubMed

    Gong, Ke; Tjong, Harianto; Zhou, Xianghong Jasmine; Alber, Frank

    2015-01-01

    We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes and the extent of nuclear territories for genes and chromosomes. Despite those differences, functional similarities are remarkably well maintained, which is evident when comparing spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species.

  1. Comparative genomics and transcriptomics of trait-gene association

    PubMed Central

    2012-01-01

    Background: The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results: Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions: This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis. PMID:23181781

  2. An integrated map of structural variation in 2,504 human genomes.

    PubMed

    Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Hsi-Yang Fritz, Markus; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Paolo Casale, Francesco; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Jasmine Mu, Xinmeng; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki; Muzny, Donna M; Nelson, Bradley J; Noor, Amina; Parrish, Nicholas F; Pendleton, Matthew; Quitadamo, Andrew; Raeder, Benjamin; Schadt, Eric E; Romanovitch, Mallory; Schlattl, Andreas; Sebra, Robert; Shabalin, Andrey A; Untergasser, Andreas; Walker, Jerilyn A; Wang, Min; Yu, Fuli; Zhang, Chengsheng; Zhang, Jing; Zheng-Bradley, Xiangqun; Zhou, Wanding; Zichner, Thomas; Sebat, Jonathan; Batzer, Mark A; McCarroll, Steven A; Mills, Ryan E; Gerstein, Mark B; Bashir, Ali; Stegle, Oliver; Devine, Scott E; Lee, Charles; Eichler, Evan E; Korbel, Jan O

    2015-10-01

    Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association. PMID:26432246

  4. The genome and structural proteome of an ocean siphovirus: a new window into the cyanobacterial 'mobilome'.

    PubMed

    Sullivan, Matthew B; Krastins, Bryan; Hughes, Jennifer L; Kelly, Libusha; Chase, Michael; Sarracino, David; Chisholm, Sallie W

    2009-11-01

    Prochlorococcus, an abundant phototroph in the oceans, are infected by members of three families of viruses: myo-, podo- and siphoviruses. Genomes of myo- and podoviruses isolated on Prochlorococcus contain DNA replication machinery and virion structural genes homologous to those from coliphages T4 and T7 respectively. They also contain a suite of genes of cyanobacterial origin, most notably photosynthesis genes, which are expressed during infection and appear integral to the evolutionary trajectory of both host and phage. Here we present the first genome of a cyanobacterial siphovirus, P-SS2, which was isolated from Atlantic slope waters using a Prochlorococcus host (MIT9313). The P-SS2 genome is larger than, and considerably divergent from, previously sequenced siphoviruses. It appears most closely related to lambdoid siphoviruses, with which it shares 13 functional homologues. The approximately 108 kb P-SS2 genome encodes 131 predicted proteins and notably lacks photosynthesis genes which have consistently been found in other marine cyanophage, but does contain 14 other cyanobacterial homologues. While only six structural proteins were identified from the genome sequence, 35 proteins were detected experimentally; these mapped onto capsid and tail structural modules in the genome. P-SS2 is potentially capable of integration into its host as inferred from bioinformatically identified genetic machinery int, bet, exo and a 53 bp attachment site. The host attachment site appears to be a genomic island that is tied to insertion sequence (IS) activity that could facilitate mobility of a gene involved in the nitrogen-stress response. The homologous region and a secondary IS-element hot-spot in Synechococcus RS9917 are further evidence of IS-mediated genome evolution coincident with a probable relic prophage integration event. This siphovirus genome provides a glimpse into the biology of a deep-photic zone phage as well as the ocean cyanobacterial prophage and IS element

  5. Genome-Wide Scans for Delineation of Candidate Genes Regulating Seed-Protein Content in Chickpea

    PubMed Central

    Upadhyaya, Hari D.; Bajaj, Deepak; Narnoliya, Laxmi; Das, Shouvik; Kumar, Vinod; Gowda, C. L. L.; Sharma, Shivali; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    Identification of potential genes/alleles governing complex seed-protein content (SPC) is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in a traditional bi-parental mapping population and differential expression profiling for the first time to understand the complex genetic architecture of the quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphism) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150–200 kb LD (linkage disequilibrium) decay] was utilized. This led to identification of seven most effective genomic loci (genes) associated [10–20% with 41% combined PVE (phenotypic variation explained)] with SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 × ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (>four fold) of six SPC-associated genes particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with a high level of contrasting SPC (21–22%) was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating SPC trait in chickpea. Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with SPC trait was found to be the most

  7. Diversity of 5S rRNA genes within individual prokaryotic genomes.

    PubMed

    Pei, Anna; Li, Hongru; Oberdorf, William E; Alekseyenko, Alexander V; Parsons, Tamasha; Yang, Liying; Gerz, Erika A; Lee, Peng; Xiang, Charlie; Nossa, Carlos W; Pei, Zhiheng

    2012-10-01

    We examined intragenomic variation of paralogous 5S rRNA genes to evaluate the concept of ribosomal constraints. In a dataset containing 1161 genomes from 779 unique species, 96 species exhibited > 3% diversity. Twenty-seven species with > 10% diversity contained a total of 421 mismatches between all pairs of the most dissimilar copies of 5S rRNA genes. The large majority (401 of 421) of the diversified positions were conserved at the secondary structure level. The high diversity was associated with partial rRNA operon, split operon, or spacer length-related divergence. In total, these findings indicated that there are tight ribosomal constraints on paralogous 5S rRNA genes in a genome despite the high degree of diversity at the primary structure level.
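
    The diversity figures quoted in this abstract can be illustrated with a minimal sketch. It assumes that "diversity" means the percentage of mismatched positions between two already-aligned gene copies; the abstract does not state the exact procedure, so the function names, the gap handling (gaps are compared like any other character) and the toy sequences below are illustrative assumptions, not the authors' method.

```python
from itertools import combinations


def percent_diversity(a: str, b: str) -> float:
    """Percent of mismatched positions between two aligned, equal-length sequences.

    Assumption: the inputs are already aligned; a gap character is compared
    like any other symbol.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * mismatches / len(a)


def max_intragenomic_diversity(copies):
    """Largest pairwise percent diversity among the gene copies of one genome."""
    return max(
        (percent_diversity(a, b) for a, b in combinations(copies, 2)),
        default=0.0,  # a genome with a single copy has no pairwise diversity
    )


# Toy, made-up sequences (not real 5S rRNA genes): one mismatch in ten positions.
toy_copies = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
print(max_intragenomic_diversity(toy_copies))

# Share of diversified positions conserved at the secondary structure level,
# using the counts reported in the abstract (401 of 421).
print(round(100 * 401 / 421, 1))
```

    With the counts given in the abstract, 401 of 421 corresponds to roughly 95% of the diversified positions being conserved at the secondary structure level.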

  9. No genes for intelligence in the fluid genome.

    PubMed

    Ho, Mae-Wan

    2013-01-01

    Revolution is brewing belatedly within the heartlands of the genetic determinist establishment still in denial about the fluid genome that makes identifying genes even for common disease well-nigh impossible. The fruitless hunt for intelligence genes serves to expose the poverty of an obsolete paradigm that is obstructing knowledge and preventing fruitful policies from being widely implemented. Genome-wide scans using state-of-the-art technologies on extensive databases have failed to find a single gene for intelligence; instead, environment and maternal effects may account for most, if not all, correlation among relatives, while identical twins diverge genetically and epigenetically throughout life. Abundant evidence points to the enormous potential for improving intellectual abilities (and health) through simple environmental and social interventions. PMID:23865113

  10. Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome

    PubMed Central

    Lipatov, Mikhail; Lenkov, Kapa; Petrov, Dmitri A; Bergman, Casey M

    2005-01-01

    Background: Recent analysis of the human and mouse genomes has shown that a substantial proportion of protein coding genes and cis-regulatory elements contain transposable element (TE) sequences, implicating TE domestication as a mechanism for the origin of genetic novelty. To understand the general role of TE domestication in eukaryotic genome evolution, it is important to assess the acquisition of functional TE sequences by host genomes in a variety of different species, and to understand in greater depth the population dynamics of these mutational events. Results: Using an in silico screen for host genes that contain TE sequences, we identified a set of 63 mature "chimeric" transcripts supported by expressed sequence tag (EST) evidence in the Drosophila melanogaster genome. We found a paucity of chimeric TEs relative to expectations derived from non-chimeric TEs, indicating that the majority (~80%) of TEs that generate chimeric transcripts are deleterious and are not observed in the genome sequence. Using a pooled-PCR strategy to assay the presence of gene-TE chimeras in wild strains, we found that over half of the observed chimeric TE insertions are restricted to the sequenced strain, and ~15% are found at high frequencies in North American D. melanogaster populations. Estimated population frequencies of chimeric TEs did not differ significantly from non-chimeric TEs, suggesting that the distribution of fitness effects for the observed subset of chimeric TEs is indistinguishable from the general set of TEs in the genome sequence. Conclusion: In contrast to mammalian genomes, we found that fewer than 1% of Drosophila genes produce mRNAs that include bona fide TE sequences. This observation can be explained by the results of our population genomic analysis, which indicates that most potential chimeric TEs in D. melanogaster are deleterious but that a small proportion may contribute to the evolution of novel gene sequences such as nested or intercalated gene

  11. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    PubMed Central

    Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation of pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY genes were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768

  12. (Structure and expression of nuclear genes encoding rubisco activase)

    SciTech Connect

    Zielinski, R.E.

    1990-01-01

    Our activities during the past year have centered around two basic aspects of the project: describing more thoroughly the diurnal and light irradiance effects on activase gene expression in barley; and isolating and structurally characterizing cDNA and genomic DNA sequences encoding activase from barley. Three appendices are included that summarize these activities.

  13. The Rhodomonas salina mitochondrial genome: bacteria-like operons, compact gene arrangement and complex repeat region.

    PubMed

    Hauth, Amy M; Maier, Uwe G; Lang, B Franz; Burger, Gertraud

    2005-01-01

    To gain insight into the mitochondrial genome structure and gene content of a putatively ancestral group of eukaryotes, the cryptophytes, we sequenced the complete mitochondrial DNA of Rhodomonas salina. The 48 063 bp circular-mapping molecule codes for 2 rRNAs, 27 tRNAs and 40 proteins including 23 components of oxidative phosphorylation, 15 ribosomal proteins and two subunits of tat translocase. One potential protein (ORF161) is without assigned function. Only two introns occur in the genome; both are present within cox1, belong to group II and contain RT open reading frames. Primitive genome features include bacteria-like rRNAs and tRNAs, ribosomal protein genes organized in large clusters resembling bacterial operons and the presence of otherwise rare genes such as rps1 and tatA. The highly compact gene organization contrasts with the presence of a 4.7 kb long, repeat-containing intergenic region. Repeat motifs approximately 40-700 bp long occur up to 31 times, forming a complex repeat structure. Tandem repeats are the major arrangement but the region also includes a large, approximately 3 kb, inverted repeat and several potentially stable approximately 40-80 bp long hairpin structures. We provide evidence that the large repeat region is involved in replication and transcription initiation, predict a promoter motif that occurs in three locations and discuss two likely scenarios of how this highly structured repeat region might have evolved.

  14. Genome-Wide Characterization and Expression Profiles of the Superoxide Dismutase Gene Family in Gossypium.

    PubMed

    Zhang, Jingbo; Li, Bo; Yang, Yang; Hu, Wenran; Chen, Fangyuan; Xie, Lixia; Fan, Ling

    2016-01-01

    Superoxide dismutase (SOD), a group of significant and ubiquitous enzymes, plays a critical role in plant growth and development. This gene family has previously been investigated in Arabidopsis and rice but has not yet been characterized in cotton. In this study, we performed a genome-wide analysis of the SOD gene family in cotton for the first time. Our results showed that 10 genes of the SOD gene family were identified in Gossypium arboreum and Gossypium raimondii, including 6 Cu-Zn-SODs, 2 Fe-SODs, and 2 Mn-SODs. The chromosomal distribution analysis revealed that SOD genes are distributed across 7 chromosomes in Gossypium arboreum and 8 chromosomes in Gossypium raimondii. Segmental duplication is the predominant duplication event and a major contributor to the expansion of the SOD gene family. Gene structure and protein structure analysis showed that SOD genes have conserved exon/intron arrangement and motif composition. Microarray-based expression analysis revealed that SOD genes have important functions in abiotic stress. Moreover, the tissue-specific expression profile reveals the functional divergence of SOD genes in the development of different organs of cotton. Taken together, this study provides new insights into the putative functions of the SOD gene family in cotton. The findings of the present investigation could help in understanding the role of the SOD gene family in various aspects of the life cycle of cotton. PMID:27660755

  16. Genome-Wide Characterization and Expression Profiles of the Superoxide Dismutase Gene Family in Gossypium

    PubMed Central

    Zhang, Jingbo; Li, Bo; Yang, Yang; Hu, Wenran; Chen, Fangyuan; Xie, Lixia

    2016-01-01

    Superoxide dismutase (SOD), a group of significant and ubiquitous enzymes, plays a critical role in plant growth and development. This gene family has previously been investigated in Arabidopsis and rice but has not yet been characterized in cotton. In this study, we performed a genome-wide analysis of the SOD gene family in cotton for the first time. Our results showed that 10 genes of the SOD gene family were identified in Gossypium arboreum and Gossypium raimondii, including 6 Cu-Zn-SODs, 2 Fe-SODs, and 2 Mn-SODs. The chromosomal distribution analysis revealed that SOD genes are distributed across 7 chromosomes in Gossypium arboreum and 8 chromosomes in Gossypium raimondii. Segmental duplication is the predominant duplication event and a major contributor to the expansion of the SOD gene family. Gene structure and protein structure analysis showed that SOD genes have conserved exon/intron arrangement and motif composition. Microarray-based expression analysis revealed that SOD genes have important functions in abiotic stress. Moreover, the tissue-specific expression profile reveals the functional divergence of SOD genes in the development of different organs of cotton. Taken together, this study provides new insights into the putative functions of the SOD gene family in cotton. The findings of the present investigation could help in understanding the role of the SOD gene family in various aspects of the life cycle of cotton. PMID:27660755

  17. Diversity of human tRNA genes from the 1000-genomes project

    PubMed Central

    Parisien, Marc; Wang, Xiaoyun; Pan, Tao

    2013-01-01

    The sequence diversity of individual human genomes has been extensively analyzed for variations and phenotypic implications for mRNA, miRNA, and long non-coding RNA genes. Transfer RNA (tRNA) also exhibits large sequence diversity in the human genome, but tRNA gene sequence variation and potential functional implications in individual human genomes have not been investigated. Here we capitalize on the sequencing data from the 1000-genomes project to examine the diversity of tRNA genes in the human population. Previous analysis of the reference human genome indicated an unexpectedly large number of diverse tRNA genes beyond the necessity of translation, suggesting that some tRNA transcripts may perform non-canonical functions. We found 24 new tRNA sequences in > 1% and 76 new tRNA sequences in > 0.2% of all individuals, indicating that tRNA genes are also subject to evolutionary changes in the human population. Unexpectedly, two abundant new tRNA genes contain base-pair mismatches in the anticodon stem. We experimentally determined that these two new tRNAs have altered structures in vitro; however, one new tRNA is not aminoacylated but extremely stable in HeLa cells, suggesting that this new tRNA can be used for non-canonical function. Our results show that at the scale of human population, tRNA genes are more diverse than conventionally understood, and some new tRNAs may perform non-canonical, extra-translational functions that may be linked to human health and disease. PMID:24448271

  18. Diversity of human tRNA genes from the 1000-genomes project.

    PubMed

    Parisien, Marc; Wang, Xiaoyun; Pan, Tao

    2013-12-01

    The sequence diversity of individual human genomes has been extensively analyzed for variations and phenotypic implications for mRNA, miRNA, and long non-coding RNA genes. Transfer RNA (tRNA) also exhibits large sequence diversity in the human genome, but tRNA gene sequence variation and potential functional implications in individual human genomes have not been investigated. Here we capitalize on the sequencing data from the 1000-genomes project to examine the diversity of tRNA genes in the human population. Previous analysis of the reference human genome indicated an unexpectedly large number of diverse tRNA genes beyond the necessity of translation, suggesting that some tRNA transcripts may perform non-canonical functions. We found 24 new tRNA sequences in > 1% and 76 new tRNA sequences in > 0.2% of all individuals, indicating that tRNA genes are also subject to evolutionary changes in the human population. Unexpectedly, two abundant new tRNA genes contain base-pair mismatches in the anticodon stem. We experimentally determined that these two new tRNAs have altered structures in vitro; however, one new tRNA is not aminoacylated but extremely stable in HeLa cells, suggesting that this new tRNA can be used for non-canonical function. Our results show that at the scale of human population, tRNA genes are more diverse than conventionally understood, and some new tRNAs may perform non-canonical, extra-translational functions that may be linked to human health and disease.

  19. Genome-wide identification, phylogeny, and expression of fibroblast growth genes in common carp.

    PubMed

    Jiang, Likun; Zhang, Songhao; Dong, Chuanju; Chen, Baohua; Feng, Jingyan; Peng, Wenzhu; Mahboob, Shahid; Al-Ghanim, Khalid A; Xu, Peng

    2016-03-10

    Fibroblast growth factors (FGFs) are a large family of polypeptide growth factors, which are found in organisms ranging from nematodes to humans. In vertebrates, a number of FGFs have been shown to play important roles in developing embryos and adult organisms. Among the vertebrate species, FGFs are highly conserved in both gene structure and amino-acid sequence. However, studies on teleost FGFs are mainly limited to model species, hence we investigated FGFs in the common carp genome. We identified 35 FGFs in the common carp genome. Phylogenetic analysis revealed that most of the FGFs are highly conserved, though recent gene duplication and gene losses do exist. By examining the copy number of FGFs in several vertebrate genomes, we found that eight FGFs in common carp have undergone gene duplications, including FGF6a, FGF6b, FGF7, FGF8b, FGF10a, FGF11b, FGF13a, and FGF18b. The expression patterns of all FGFs were examined in various tissues, including the blood, brain, gill, heart, intestine, muscle, skin, spleen and kidney, showing that most of the FGFs were ubiquitously expressed, indicating their critical role in common carp. To some extent, examination of gene families with detailed phylogenetic or orthology analysis verified the authenticity and accuracy of assembly and annotation of the recently published common carp whole genome sequences. Gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp FGF gene family provides an important genomic resource for future biochemical, physiological, and phylogenetic studies on FGFs in teleosts.

  20. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    ABSTRACT Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814

  1. Gene transfer with subsequent removal of the selection gene from the host genome.

    PubMed Central

    Dale, E C; Ow, D W

    1991-01-01

    A general method of gene transfer that does not leave behind a selectable marker in the host genome is described. A luciferase gene was introduced into the tobacco genome by using the hygromycin phosphotransferase gene (hpt) as a linked selectable marker. Flanked by recombination sites from the bacteriophage P1 Cre/lox recombination system, the hpt gene was subsequently excised from the plant genome by the Cre recombinase. The Cre-catalyzed excision event in the plant genome was precise and conservative--i.e., without loss or alteration of nucleotides in the recombinant site. After removal of the Cre-encoding locus by genetic segregation, plants were obtained that had incorporated only the desired transgene. Gene transfer without the incorporation of antibiotic-resistance markers in the host genome should ease public concerns over the field release of transgenic organisms expressing such traits. Moreover, it would obviate the need for different selectable markers in subsequent rounds of gene transfer into the same host. PMID:1660141

  2. Population and Functional Genomics of Neisseria Revealed with Gene-by-Gene Approaches

    PubMed Central

    Harrison, Odile B.

    2016-01-01

    Rapid low-cost whole-genome sequencing (WGS) is revolutionizing microbiology; however, complementary advances in accessible, reproducible, and rapid analysis techniques are required to realize the potential of these data. Here, investigations of the genus Neisseria illustrated the gene-by-gene conceptual approach to the organization and analysis of WGS data. Using the gene and its link to phenotype as a starting point, the BIGSdb database, which powers the PubMLST databases, enables the assembly of large open-access collections of annotated genomes that provide insight into the evolution of the Neisseria, the epidemiology of meningococcal and gonococcal disease, and mechanisms of Neisseria pathogenicity. PMID:27098959

  3. Genomic analyses of bacterial porin-cytochrome gene clusters

    DOE PAGES

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  4. Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

    PubMed Central

    Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique; Taconnat, Ludivine; Bitton, Frédérique; Balzergue, Sandrine; Jullien, Pauline E; Ingouff, Mathieu; Thareau, Vincent; Schiex, Thomas; Lecharny, Alain; Renou, Jean-Pierre

    2007-01-01

    Background Since the finishing of the sequencing of the Arabidopsis thaliana genome, the Arabidopsis community and the annotator centers have been working on the improvement of gene annotation at the structural and functional levels. In this context, we have used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24 576 CATMA v2 GSTs, 677 are in regions considered as intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data-mining to characterize novel genes and improve gene models. Results The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR approaches for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously constituted of two merged CDS. Conclusion This work is a noticeable step forward in the improvement of the Arabidopsis genome annotation. We increased the number of Arabidopsis validated genes by 465 novel transcribed genes to which we associated several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs. PMID:17980019

  5. Genome-wide analysis of homeobox gene family in legumes: identification, gene duplication and expression profiling.

    PubMed

    Bhattacharjee, Annapurna; Ghangal, Rajesh; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species, including chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues/developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has significantly contributed to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development.

  6. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2014-01-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy, and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:24510847

  7. The banana E2 gene family: Genomic identification, characterization, expression profiling analysis.

    PubMed

    Dong, Chen; Hu, Huigang; Jue, Dengwei; Zhao, Qiufang; Chen, Hongliang; Xie, Jianghui; Jia, Liqiang

    2016-04-01

    The E2 is at the center of a cascade of Ub1 transfers, and it links activation of the Ub1 by E1 to its eventual E3-catalyzed attachment to substrate. Although genome-wide analysis of this family has been performed in some species, little is known about E2 genes in banana. In this study, 74 E2 genes of banana were identified and phylogenetically clustered into thirteen subgroups. The predicted banana E2 genes were distributed across all 11 chromosomes at different densities. Additionally, the E2 domain, gene structure and motif compositions were analyzed. The expression of all of the banana E2 genes was analyzed in the root, stem, leaf, flower organs, five stages of fruit development and under abiotic stresses. All of the banana E2 genes, with the exception of a few genes in each group, were expressed in at least one of the organs or fruit developmental stages, which indicated that the E2 genes might be involved in various aspects of the physiological and developmental processes of the banana. Quantitative RT-PCR (qRT-PCR) analysis showed that 45 E2s were induced under drought and 33 E2s under salt stress. To the best of our knowledge, this report describes the first genome-wide analysis of the banana E2 gene family, and the results should provide valuable information for understanding the classification, cloning and putative functions of this family. PMID:26940488

  8. Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study.

    PubMed

    Lim, Kah Wai; Jenjaroenpun, Piroon; Low, Zhen Jie; Khong, Zi Jian; Ng, Yi Siang; Kuznetsov, Vladimir Andreevich; Phan, Anh Tuân

    2015-06-23

    Duplex stem-loops and four-stranded G-quadruplexes have been implicated in (patho)biological processes. Overlap of stem-loop- and quadruplex-forming sequences could give rise to quadruplex-duplex hybrids (QDH), which combine features of both structural forms and could exhibit unique properties. Here, we present a combined genomic and structural study of stem-loop-containing quadruplex sequences (SLQS) in the human genome. Based on a maximum loop length of 20 nt, our survey identified 80 307 SLQS, embedded within 60 172 unique clusters. Our analysis suggested that these should cover close to half of total SLQS in the entire genome. Among these, 48 508 SLQS were strand-specifically located in genic/promoter regions, with the majority of genes displaying a low number of SLQS. Notably, genes containing abundant SLQS clusters were strongly associated with brain tissues. Enrichment analysis of SLQS-positive genes and mapping of SLQS onto transcriptional/mutagenesis hotspots and cancer-associated genes provided a statistical framework supporting the biological involvement of SLQS. In vitro formation of diverse QDH by selected SLQS hits was successfully verified by nuclear magnetic resonance spectroscopy. Folding topologies of two SLQS were elucidated in detail. We also demonstrated that sequence changes at mutation/single-nucleotide polymorphism loci could affect the structural conformations adopted by SLQS. Thus, our predicted SLQS offer novel insights into the potential involvement of QDH in diverse (patho)biological processes and could represent novel regulatory signals.

  9. Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study

    PubMed Central

    Lim, Kah Wai; Jenjaroenpun, Piroon; Low, Zhen Jie; Khong, Zi Jian; Ng, Yi Siang; Kuznetsov, Vladimir Andreevich; Phan, Anh Tuân

    2015-01-01

    Duplex stem-loops and four-stranded G-quadruplexes have been implicated in (patho)biological processes. Overlap of stem-loop- and quadruplex-forming sequences could give rise to quadruplex–duplex hybrids (QDH), which combine features of both structural forms and could exhibit unique properties. Here, we present a combined genomic and structural study of stem-loop-containing quadruplex sequences (SLQS) in the human genome. Based on a maximum loop length of 20 nt, our survey identified 80 307 SLQS, embedded within 60 172 unique clusters. Our analysis suggested that these should cover close to half of total SLQS in the entire genome. Among these, 48 508 SLQS were strand-specifically located in genic/promoter regions, with the majority of genes displaying a low number of SLQS. Notably, genes containing abundant SLQS clusters were strongly associated with brain tissues. Enrichment analysis of SLQS-positive genes and mapping of SLQS onto transcriptional/mutagenesis hotspots and cancer-associated genes provided a statistical framework supporting the biological involvement of SLQS. In vitro formation of diverse QDH by selected SLQS hits was successfully verified by nuclear magnetic resonance spectroscopy. Folding topologies of two SLQS were elucidated in detail. We also demonstrated that sequence changes at mutation/single-nucleotide polymorphism loci could affect the structural conformations adopted by SLQS. Thus, our predicted SLQS offer novel insights into the potential involvement of QDH in diverse (patho)biological processes and could represent novel regulatory signals. PMID:25958397

  10. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes.

    PubMed

    Chan, Patricia P; Lowe, Todd M

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/.

  11. Stability domains of actin genes and genomic evolution

    NASA Astrophysics Data System (ADS)

    Carlon, E.; Dkhissi, A.; Malki, M. Lejard; Blossey, R.

    2007-11-01

    In eukaryotic genes, the protein coding sequence is split into several fragments, the exons, separated by noncoding DNA stretches, the introns. Prokaryotes do not have introns in their genomes. We report calculations of the stability domains of actin genes for various organisms in the animal, plant, and fungi kingdoms. Actin genes have been chosen because they have been highly conserved during evolution. In these genes, all introns were removed so as to mimic ancient genes at the time of the early eukaryotic development, i.e., before intron insertion. Common stability boundaries are found in evolutionarily distant organisms, which implies that these boundaries date from the early origin of eukaryotes. In general, the boundaries correspond with intron positions in the actins of vertebrates and other animals, but not much for plants and fungi. The sharpest boundary is found in a locus where fungi, algae, and animals have introns in positions separated by one nucleotide only, which identifies a hot spot for insertion. These results suggest that some introns may have been incorporated into the genomes through a thermodynamically driven mechanism, in agreement with previous observations on human genes. They also suggest a different mechanism for intron insertion in plants and animals.

  12. Systematically fragmented genes in a multipartite mitochondrial genome

    PubMed Central

    Vlcek, Cestmir; Marande, William; Teijeiro, Shona; Lukeš, Julius; Burger, Gertraud

    2011-01-01

    Arguably, the most bizarre mitochondrial DNA (mtDNA) is that of the euglenozoan eukaryote Diplonema papillatum. The genome consists of numerous small circular chromosomes none of which appears to encode a complete gene. For instance, the cox1 coding sequence is spread out over nine different chromosomes in non-overlapping pieces (modules), which are transcribed separately and joined to a contiguous mRNA by trans-splicing. Here, we examine how many genes are encoded by Diplonema mtDNA and whether all are fragmented and their transcripts trans-spliced. Module identification is challenging due to the sequence divergence of Diplonema mitochondrial genes. By employing the most sensitive protein profile search algorithms and comparing genomic with cDNA sequence, we recognize a total of 11 typical mitochondrial genes. The 10 protein-coding genes are systematically chopped up into three to 12 modules of 60–350 bp in length. The corresponding mRNAs are all trans-spliced. Identification of ribosomal RNAs is most difficult. So far, we only detect the 3′-module of the large subunit ribosomal RNA (rRNA); it does not trans-splice with other pieces. The small subunit rRNA gene remains elusive. Our results open new intriguing questions about the biochemistry and evolution of mitochondrial trans-splicing in Diplonema. PMID:20935050

  13. Genome structure and transcriptional regulation of human coronavirus NL63

    PubMed Central

    Pyrc, Krzysztof; Jebbink, Maarten F; Berkhout, Ben; van der Hoek, Lia

    2004-01-01

    Background: Two human coronaviruses have been known since the 1960s: HCoV-229E and HCoV-OC43. SARS-CoV was discovered in the early spring of 2003, followed by the identification of HCoV-NL63, the fourth member of the Coronaviridae family that infects humans. In this study, we describe the genome structure and the transcription strategy of HCoV-NL63 by experimental analysis of the viral subgenomic mRNAs. Results: The genome of HCoV-NL63 has the following gene order: 1a-1b-S-ORF3-E-M-N. The GC content of the HCoV-NL63 genome is extremely low (34%) compared to other coronaviruses, and we therefore performed additional analysis of the nucleotide composition. Overall, the RNA genome is very low in C and high in U, and this is also reflected in the codon usage. Inspection of the nucleotide composition along the genome indicates that the C-count increases significantly in the last one-third of the genome at the expense of U and G. We document the production of subgenomic (sg) mRNAs coding for the S, ORF3, E, M and N proteins. We did not detect any additional sg mRNA. Furthermore, we sequenced the 5' end of all sg mRNAs, confirming the presence of an identical leader sequence in each sg mRNA. Northern blot analysis indicated that the expression level among the sg mRNAs differs significantly, with the sg mRNA encoding nucleocapsid (N) being the most abundant. Conclusions: The presented data give insight into the viral evolution and mutational patterns in the coronaviral genome. Furthermore, our data show that HCoV-NL63 employs the discontinuous replication strategy with generation of subgenomic mRNAs during the (-) strand synthesis. Because HCoV-NL63 has a low pathogenicity and is able to grow easily in cell culture, this virus can be a powerful tool to study SARS coronavirus pathogenesis. PMID:15548333
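
    The composition analysis described in this abstract (overall GC content, then base fractions inspected along the genome) can be illustrated with a minimal Python sketch. This is not the authors' method: the function names, the window size and the step are hypothetical choices made for illustration, and T is simply counted as U so that DNA-style sequence files can be used.

```python
from collections import Counter

BASES = "ACGU"

def base_fractions(seq):
    """Return the fraction of A, C, G and U in seq (T is counted as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(b for b in seq if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: counts[b] / total for b in BASES}

def gc_content(seq):
    """Fraction of G plus C among the recognised bases."""
    fractions = base_fractions(seq)
    return fractions["G"] + fractions["C"]

def windowed_composition(seq, window, step):
    """Yield (start, fractions) for windows of length `window` taken every `step` bases."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, base_fractions(seq[start:start + window])

if __name__ == "__main__":
    toy = "AAAAUUUUGGCC"  # toy sequence, not real HCoV-NL63 data
    print(round(gc_content(toy), 3))
    for start, fractions in windowed_composition(toy, window=4, step=4):
        print(start, fractions)
```

    Plotting the per-window C fraction against the window start would be one way to look for the kind of regional shift in composition that the abstract reports.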

  14. Confluence of genes, environment, development, and behavior in a post Genome-Wide Association Study world.

    PubMed

    Vrieze, Scott I; Iacono, William G; McGue, Matt

    2012-11-01

    This article serves to outline a research paradigm to investigate main effects and interactions of genes, environment, and development on behavior and psychiatric illness. We provide a historical context for candidate gene studies and genome-wide association studies, including benefits, limitations, and expected payoffs. Using substance use and abuse as our driving example, we then turn to the importance of etiological psychological theory in guiding genetic, environmental, and developmental research, as well as the utility of refined phenotypic measures, such as endophenotypes, in the pursuit of etiological understanding and focused tests of genetic and environmental associations. Phenotypic measurement has received considerable attention in the history of psychology and is informed by psychometrics, whereas the environment remains relatively poorly measured and is often confounded with genetic effects (i.e., gene-environment correlation). Genetically informed designs, which are no longer limited to twin and adoption studies thanks to ever-cheaper genotyping, are required to understand environmental influences. Finally, we outline the vast amount of individual difference in structural genomic variation, most of which remains to be leveraged in genetic association tests. Although the genetic data can be massive and burdensome (tens of millions of variants per person), we argue that improved understanding of genomic structure and function will provide investigators with new tools to test specific a priori hypotheses derived from etiological psychological theory, much like current candidate gene research but with less confusion and more payoff than candidate gene research has to date. PMID:23062291

  15. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. 32 novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript for characterizing real or theoretically possible variations in genomic and protein sequences can be applied to any organism. PMID:24453480

  16. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa

    PubMed Central

    Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are the key players in plant growth development involving leaf formation, phototropism, root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs including 15 same species (BrIAA-BrIAA) and 36 cross species (BrIAA-AtIAA) IAA genes. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house developed microarray data from mature leaves of Chiifu and RcBr lines. Despite their huge morphological difference, tissue specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single Nucleotide Polymorphisms between IAA genes of reference genome Chiifu and RcBr were focused and identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops and we believe our study would provide a foundation in understanding the Aux/IAA genes in B. rapa. PMID

  17. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa.

    PubMed

    Paul, Parameswari; Dhandapani, Vignesh; Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are the key players in plant growth development involving leaf formation, phototropism, root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs including 15 same species (BrIAA-BrIAA) and 36 cross species (BrIAA-AtIAA) IAA genes. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house developed microarray data from mature leaves of Chiifu and RcBr lines. Despite their huge morphological difference, tissue specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single Nucleotide Polymorphisms between IAA genes of reference genome Chiifu and RcBr were focused and identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops and we believe our study would provide a foundation in understanding the Aux/IAA genes in B. rapa.

  19. Genome-wide analysis of the R2R3-MYB transcription factor gene family in sweet orange (Citrus sinensis).

    PubMed

    Liu, Chaoyang; Wang, Xia; Xu, Yuantao; Deng, Xiuxin; Xu, Qiang

    2014-10-01

    MYB transcription factors represent one of the largest gene families in plant genomes. Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide, and recently the genome has been sequenced. This provides an opportunity to investigate the organization and evolutionary characteristics of sweet orange MYB genes from a whole-genome view. In the present study, we identified 100 R2R3-MYB genes in the sweet orange genome. A comprehensive analysis of this gene family was performed, including the phylogeny, gene structure, chromosomal localization and expression pattern analyses. The 100 genes were divided into 29 subfamilies based on the sequence similarity and phylogeny, and the classification was also well supported by the highly conserved exon/intron structures and motif composition. The phylogenomic comparison of the MYB gene family among sweet orange and related plant species, Arabidopsis, cacao and papaya, suggested the existence of functional divergence during evolution. Expression profiling indicated that sweet orange R2R3-MYB genes exhibited distinct temporal and spatial expression patterns. Our analysis suggested that the sweet orange MYB genes may play important roles in different plant biological processes, some of which may be potentially involved in citrus fruit quality. These results will be useful for future functional analysis of the MYB gene family in sweet orange.

  20. Population structure and minimum core genome typing of Legionella pneumophila

    PubMed Central

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires’ disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates and were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs which were likely to be results of recombination, the remaining 9,165 SNPs were referred to as MCG SNPs. Population structure analysis based on MCG divided the 53 L. pneumophila into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. MCG groups were also supported by phylogenetic analysis and may be considered as robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results obtained in this study provided significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563
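
    The definition of the minimum core genome used in this abstract (single-copy orthologous genes shared by all isolates) can be sketched as a simple filter over per-isolate copy counts. The sketch below is illustrative only and is not the authors' pipeline: the input layout (isolate name mapped to a dictionary of ortholog family and copy number), the function name and the toy family identifiers are all assumptions. The last lines only re-check the SNP arithmetic reported in the abstract.

```python
def minimum_core_genome(copy_counts):
    """Return ortholog families present in exactly one copy in every isolate.

    copy_counts: dict mapping isolate name -> dict of {family_id: copy number}.
    (Hypothetical input layout, chosen for illustration.)
    """
    isolates = list(copy_counts.values())
    if not isolates:
        return set()
    # A core family must occur in the first isolate, so it is enough to test those.
    candidates = set(isolates[0])
    return {
        family
        for family in candidates
        if all(isolate.get(family, 0) == 1 for isolate in isolates)
    }

if __name__ == "__main__":
    toy = {  # toy data, not real L. pneumophila annotations
        "isolate_A": {"fam1": 1, "fam2": 1, "fam3": 2},
        "isolate_B": {"fam1": 1, "fam2": 1, "fam3": 1},
        "isolate_C": {"fam1": 1, "fam3": 1},
    }
    print(sorted(minimum_core_genome(toy)))

    # Consistency check of the SNP counts quoted in the abstract.
    total_snps = 323_224
    recombination_snps = 314_059
    print(total_snps - recombination_snps)
```

    In the toy data only fam1 qualifies: fam2 is absent from one isolate and fam3 is duplicated in another.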

  1. Rapid and efficient genome-wide characterization of Xanthomonas TAL effector genes

    PubMed Central

    Yu, Yan-Hua; Lu, Ye; He, Yong-Qiang; Huang, Sheng; Tang, Ji-Liang

    2015-01-01

    Xanthomonas TALE transcriptional activators act as virulence or avirulence factors by activating host disease susceptibility or resistance genes. Their specificity is determined by a tandem repeat domain. Some Xanthomonas pathogens contain 10–30 TALEs per strain. Although TALEs play critical roles in pathogenesis, their study has so far been limited to a few examples, due to their highly repetitive gene structure and extreme similarity among different members, which hinder sequencing and assembly. To facilitate TALE studies, we developed an efficient and rapid pipeline for genome-wide cloning of as many tal genes as possible from a strain. Here, we report the pipeline and its use to identify all 18 tal genes from a newly isolated strain of the rice pathogen Xanthomonas oryzae. Target prediction revealed a number of potential rice targets including several notable genes such as genes encoding SWEET, WRKY, Hen1, and BAK1 proteins, which provide candidates for further experimental functional analysis of the TALEs. PMID:26271455

  2. Rapid and efficient genome-wide characterization of Xanthomonas TAL effector genes.

    PubMed

    Yu, Yan-Hua; Lu, Ye; He, Yong-Qiang; Huang, Sheng; Tang, Ji-Liang

    2015-01-01

    Xanthomonas TALE transcriptional activators act as virulence or avirulence factors by activating host disease susceptibility or resistance genes. Their specificity is determined by a tandem repeat domain. Some Xanthomonas pathogens contain 10-30 TALEs per strain. Although TALEs play critical roles in pathogenesis, their study has so far been limited to a few examples, due to their highly repetitive gene structure and extreme similarity among different members, which hinder sequencing and assembly. To facilitate TALE studies, we developed an efficient and rapid pipeline for genome-wide cloning of as many tal genes as possible from a strain. Here, we report the pipeline and its use to identify all 18 tal genes from a newly isolated strain of the rice pathogen Xanthomonas oryzae. Target prediction revealed a number of potential rice targets including several notable genes such as genes encoding SWEET, WRKY, Hen1, and BAK1 proteins, which provide candidates for further experimental functional analysis of the TALEs. PMID:26271455

  3. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    SciTech Connect

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  4. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    PubMed Central

    Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-01

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence. PMID:23357949

  5. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria Coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  6. Genomic organization and evolution of ruminant lysozyme c genes.

    PubMed

    Irwin, David M

    2015-01-18

    Ruminant stomach lysozyme is a long-established model of adaptive gene evolution. Evolution of stomach lysozyme function required changes in the site of expression of the lysozyme c gene and changes in the enzymatic properties of the enzyme. In ruminant mammals, these changes were associated with a change in the size of the lysozyme c gene family. The recent release of near complete genome sequences from several ruminant species allows a more complete examination of the evolution and diversification of the lysozyme c gene family. Here we characterize the size of the lysozyme c gene family in extant ruminants and demonstrate that their pecoran ruminant ancestor had a family of at least 10 lysozyme c genes, which included at least two pseudogenes. Evolutionary analysis of the ruminant lysozyme c gene sequences demonstrates that each of the four exons of the lysozyme c gene has a unique evolutionary history, indicating that they participated independently in concerted evolution. These analyses also show that episodic changes in the evolutionary constraints on the protein sequences occurred, with lysozyme c genes expressed in the abomasum of the stomach of extant ruminant species showing the greatest levels of selective constraints.

  7. Transport genes and chemotaxis in Laribacter hongkongensis: a genome-wide analysis

    PubMed Central

    2011-01-01

    Background: Laribacter hongkongensis is a Gram-negative, sea gull-shaped rod associated with community-acquired gastroenteritis. The bacterium has been found in diverse freshwater environments including fish, frogs and drinking water reservoirs. Using the complete genome sequence data of L. hongkongensis, we performed a comprehensive analysis of putative transport-related genes and genes related to chemotaxis, motility and quorum sensing, which may help the bacterium adapt to the changing environments and combat harmful substances. Results: A genome-wide analysis using the Transport Classification Database (TCDB), similarity and keyword searches revealed the presence of a large diversity of transporters (n = 457) and genes related to chemotaxis (n = 52) and flagellar biosynthesis (n = 40) in the L. hongkongensis genome. The transporters included those from all seven major transporter categories, which may allow the uptake of essential nutrients or ions, and extrusion of metabolic end products and hazardous substances. L. hongkongensis is unique among closely related members of the Neisseriaceae family in possessing a higher number of proteins related to transport of ammonium, urea and dicarboxylate, which may reflect the importance of nitrogen and dicarboxylate metabolism in this asaccharolytic bacterium. Structural modeling of two C4-dicarboxylate transporters showed that they possessed similar structures to the determined structures of other DctP-TRAP transporters, with one having an unusual disulfide bond. Diverse mechanisms for iron transport, including hemin transporters for iron acquisition from host proteins, were also identified. In addition to the chemotaxis and flagella-related genes, the L. hongkongensis genome also contained two copies of qseB/qseC homologues of the AI-3 quorum sensing system. Conclusions: The large number of diverse transporters and genes involved in chemotaxis, motility and quorum sensing suggested that the bacterium may utilize a complex system to

  8. Genomic organization of the human NSP gene, prototype of a novel gene family encoding reticulons

    SciTech Connect

    Roebroek, A.J.M.; Ayoubi, T.A.Y.; Velde, H.J.K. van de; Schoenmakers, E.F.P.M.; Pauli, I.G.L.; Van De Ven, W.J.M.

    1996-03-01

    Recently, cDNA cloning and expression of three mRNA variants of the human NSP gene were described. This neuroendocrine-specific gene encodes three NSP protein isoforms with unique amino-terminal parts, but common carboxy-terminal parts. The proteins, of as yet unknown function, are associated with the endoplasmic reticulum and therefore are named NSP reticulons. Potentially, these proteins are neuroendocrine markers of a novel category in human lung cancer diagnosis. Here, the genomic organization of this gene was studied by analysis of genomic clones isolated from lambda phage and YAC libraries. The NSP exons were found to be dispersed over a genomic region of about 275 kb. The present elucidation of the genomic organization of the NSP gene explains the generation of NSP mRNA variants encoding NSP protein isoforms. Multiple promoters rather than alternative splicing of internal exons seem to be involved in this diversity. Furthermore, comparison of NSP genomic and cDNA sequences with databank nucleotide sequences resulted in the discovery of other human members of this novel family of reticulon-encoding genes. 25 refs., 4 figs.

  9. Identification of novel virulence-associated genes via genome analysis of hypothetical genes.

    PubMed

    Garbom, Sara; Forsberg, Ake; Wolf-Watz, Hans; Kihlberg, Britt-Marie

    2004-03-01

    The sequencing of bacterial genomes has opened new perspectives for identification of targets for treatment of infectious diseases. We have identified a set of novel virulence-associated genes (vag genes) by comparing the genome sequences of six human pathogens that are known to cause persistent or chronic infections in humans: Yersinia pestis, Neisseria gonorrhoeae, Helicobacter pylori, Borrelia burgdorferi, Streptococcus pneumoniae, and Treponema pallidum. This comparison was limited to genes annotated as hypothetical in the T. pallidum genome project. Seventeen genes with unknown functions were found to be conserved among these pathogens. Insertional inactivation of 14 of these genes generated nine mutants that were attenuated for virulence in a mouse infection model. Out of these nine genes, five were found to be specifically associated with virulence in mice as demonstrated by infection with Yersinia pseudotuberculosis in-frame deletion mutants. In addition, these five vag genes were essential only in vivo, since all the mutants were able to grow in vitro. These genes are broadly conserved among bacteria. Therefore, we propose that the corresponding vag gene products may constitute novel targets for antimicrobial therapy and that some vag mutants could serve as carrier strains for live vaccines. PMID:14977936

  10. Genomic discovery of potent chromatin insulators for human gene therapy.

    PubMed

    Liu, Mingdong; Maurano, Matthew T; Wang, Hao; Qi, Heyuan; Song, Chao-Zhong; Navas, Patrick A; Emery, David W; Stamatoyannopoulos, John A; Stamatoyannopoulos, George

    2015-02-01

    Insertional mutagenesis and genotoxicity, which usually manifest as hematopoietic malignancy, represent major barriers to realizing the promise of gene therapy. Although insulator sequences that block transcriptional enhancers could mitigate or eliminate these risks, so far no human insulators with high functional potency have been identified. Here we describe a genomic approach for the identification of compact sequence elements that function as insulators. These elements are highly occupied by the insulator protein CTCF, are DNase I hypersensitive and represent only a small minority of the CTCF recognition sequences in the human genome. We show that the elements identified acted as potent enhancer blockers and substantially decreased the risk of tumor formation in a cancer-prone animal model. The elements are small, can be efficiently accommodated by viral vectors and have no detrimental effects on viral titers. The insulators we describe here are expected to increase the safety of gene therapy for genetic diseases.

  11. Evolutionary Design of Gene Networks: Forced Evolution by Genomic Parasites

    PubMed Central

    Spirov, A. V.; Zagriychuk, E. A.; Holloway, D. M.

    2014-01-01

    The co-evolution of species with their genomic parasites (transposons) is thought to be one of the primary ways of rewiring gene regulatory networks (GRNs). We develop a framework for conducting evolutionary computations (EC) using the transposon mechanism. We find that the selective pressure of transposons can speed evolutionary searches for solutions and lead to outgrowth of GRNs (through co-option of new genes to acquire insensitivity to the attacking transposons). We test the approach by finding GRNs which can solve a fundamental problem in developmental biology: how GRNs in early embryo development can robustly read maternal signaling gradients, despite continued attacks on the genome by transposons. We observed co-evolutionary oscillations in the abundance of particular GRNs and their transposons, reminiscent of predator-prey or host-parasite dynamics. PMID:25558118

  12. Sequence, genomic structure, and chromosomal assignment of human DOC-2

    SciTech Connect

    Albertsen, H.M.; Williams, B.; Smith, S.A.

    1996-04-15

    DOC-2 is a human gene originally identified as a 767-bp cDNA fragment isolated from normal ovarian epithelial cells by differential display against ovarian carcinoma cells. We have now determined the complete cDNA sequence of the 3.2-kb DOC-2 transcript and localized the gene to chromosome 5. A 12.5-kb genomic fragment at the 5'-end of DOC-2 has also been sequenced, revealing the intron-exon structure of the first eight exons (788 bases) of the DOC-2 gene. Translation of the DOC-2 cDNA predicts a hydrophobic protein of 770 amino acid residues with a molecular weight of 82.5 kDa. Comparison of the DNA and amino acid sequences of DOC-2 to publicly accessible sequence databases revealed 83% identity to p96, a murine-responsive phosphoprotein. In addition, about 45% identity was observed between the first 140 N-terminal residues of DOC-2 and the Caenorhabditis elegans M110.5 and Drosophila melanogaster Dab genes. 14 refs., 3 figs.

  13. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics.

    PubMed

    Tanifuji, Goro; Archibald, John M; Hashimoto, Tetsuo

    2016-01-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts. PMID:26888293

  14. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics.

    PubMed

    Tanifuji, Goro; Archibald, John M; Hashimoto, Tetsuo

    2016-02-18

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  15. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    NASA Astrophysics Data System (ADS)

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-02-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  16. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    PubMed Central

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-01-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts. PMID:26888293

  17. Genomic aberrations frequently alter chromatin regulatory genes in chordoma.

    PubMed

    Wang, Lu; Zehir, Ahmet; Nafa, Khedoudja; Zhou, Nengyi; Berger, Michael F; Casanova, Jacklyn; Sadowska, Justyna; Lu, Chao; Allis, C David; Gounder, Mrinal; Chandhanayingyong, Chandhanarat; Ladanyi, Marc; Boland, Patrick J; Hameed, Meera

    2016-07-01

    Chordoma is a rare primary bone neoplasm that is resistant to standard chemotherapies. Despite aggressive surgical management, local recurrence and metastasis are not uncommon. To identify the specific genetic aberrations that play key roles in chordoma pathogenesis, we utilized a genome-wide high-resolution SNP-array and next generation sequencing (NGS)-based molecular profiling platform to study 24 patient samples with typical histopathologic features of chordoma. Matching normal tissues were available for 16 samples. SNP-array analysis revealed nonrandom copy number losses across the genome, frequently involving 3, 9p, 1p, 14, 10, and 13. In contrast, copy number gain is uncommon in chordomas. Two minimum deleted regions were observed on 3p within a ∼8 Mb segment at 3p21.1-p21.31, which overlaps SETD2, BAP1 and PBRM1. The minimum deleted region on 9p was mapped to the CDKN2A locus at 9p21.3, and homozygous deletion of CDKN2A was detected in 5/22 chordomas (∼23%). NGS-based molecular profiling demonstrated an extremely low mutation rate in chordomas, with an average of 0.5 mutations per sample for the 16 cases with matched normal. When the mutated genes were grouped based on molecular functions, many of the mutation events (∼40%) were found in chromatin regulatory genes. The combined copy number and mutation profiling revealed that SETD2 is the single gene affected most frequently in chordomas, either by deletion or by mutations. Our study demonstrated that chordoma belongs to the C-class (copy number changes) tumors whose oncogenic signature is non-random multiple copy number losses across the genome and genomic aberrations frequently alter chromatin regulatory genes. © 2016 Wiley Periodicals, Inc.

  18. Genomic Aberrations Frequently Alter Chromatin Regulatory Genes in Chordoma

    PubMed Central

    Wang, Lu; Zehir, Ahmet; Nafa, Khedoudja; Zhou, Nengyi; Berger, Michael F.; Casanova, Jacklyn; Sadowska, Justyna; Lu, Chao; Allis, C. David; Gounder, Mrinal; Chandhanayingyong, Chandhanarat; Ladanyi, Marc; Boland, Patrick J; Hameed, Meera

    2016-01-01

    Chordoma is a rare primary bone neoplasm that is resistant to standard chemotherapies. Despite aggressive surgical management, local recurrence and metastasis are not uncommon. To identify the specific genetic aberrations that play key roles in chordoma pathogenesis, we utilized a genome-wide high-resolution SNP-array and next generation sequencing (NGS)-based molecular profiling platform to study 24 patient samples with typical histopathologic features of chordoma. Matching normal tissues were available for 16 samples. SNP-array analysis revealed nonrandom copy number losses across the genome, frequently involving 3, 9p, 1p, 14, 10, and 13. In contrast, copy number gain is uncommon in chordomas. Two minimum deleted regions were observed on 3p within a ~8 Mb segment at 3p21.1–p21.31, which overlaps SETD2, BAP1 and PBRM1. The minimum deleted region on 9p was mapped to the CDKN2A locus at 9p21.3, and homozygous deletion of CDKN2A was detected in 5/22 chordomas (~23%). NGS-based molecular profiling demonstrated an extremely low mutation rate in chordomas, with an average of 0.5 mutations per sample for the 16 cases with matched normal. When the mutated genes were grouped based on molecular functions, many of the mutation events (~40%) were found in chromatin regulatory genes. The combined copy number and mutation profiling revealed that SETD2 is the single gene affected most frequently in chordomas, either by deletion or by mutations. Our study demonstrated that chordoma belongs to the C-class (copy number changes) tumors whose oncogenic signature is non-random multiple copy number losses across the genome and genomic aberrations frequently alter chromatin regulatory genes. PMID:27072194

  19. Re-examining the Gene in Personalized Genomics

    NASA Astrophysics Data System (ADS)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called 'direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  20. Metabolic Genes within Cyanophage Genomes: Implications for Diversity and Evolution

    PubMed Central

    Gao, E-Bin; Huang, Youhua; Ning, Degang

    2016-01-01

    Cyanophages, a group of viruses specifically infecting cyanobacteria, are genetically diverse and extensively abundant in water environments. As a result of selective pressure, cyanophages often acquire a range of metabolic genes from host genomes. The host-derived genes make a significant contribution to the ecological success of cyanophages. In this review, we summarize the host-derived metabolic genes, as well as their origin and roles in cyanophage evolution and important host metabolic pathways, such as the light-dependent reactions of photosynthesis, the pentose phosphate pathway, nutrient acquisition and nucleotide biosynthesis. We also discuss the suitability of the host-derived metabolic genes as potential diagnostic markers for the detection of genetic diversity of cyanophages in natural environments. PMID:27690109

  1. Evolutionary genomics of Salmonella: Gene acquisitions revealed by microarray analysis

    PubMed Central

    Porwollik, Steffen; Wong, Rita Mei-Yi; McClelland, Michael

    2002-01-01

    The presence of homologues of Salmonella enterica sv. Typhimurium LT2 genes was assessed in 22 other Salmonella including members of all seven subspecies and Salmonella bongori. Genomes were hybridized to a microarray of over 97% of the 4,596 annotated ORFs in the LT2 genome. A phylogenetic tree based on homologue content, relative to LT2, was largely concordant with previous studies using sequence information from several loci. Based on the topology of this tree, homologues of genes in LT2 acquired by various clades were predicted including 513 homologues acquired by the ancestor of all Salmonella, 111 acquired by S. enterica, 105 by diphasic Salmonella, and 216 by subspecies 1, most of which are of unknown function. Because this subspecies is responsible for almost all Salmonella infections of mammals and birds, these genes will be of particular interest for further mechanistic studies. Overall, a high level of gene gain, loss, or rapid divergence was predicted along all lineages. For example, at least 425 close homologues of LT2 genes may have been laterally transferred into Salmonella and then between Salmonella lineages. PMID:12072558

  2. Genome-wide analysis reveals gene expression and metabolic network dynamics during embryo development in Arabidopsis.

    PubMed

    Xiang, Daoquan; Venglat, Prakash; Tibiche, Chabane; Yang, Hui; Risseeuw, Eddy; Cao, Yongguo; Babic, Vivijan; Cloutier, Mathieu; Keller, Wilf; Wang, Edwin; Selvaraj, Gopalan; Datla, Raju

    2011-05-01

    Embryogenesis is central to the life cycle of most plant species. Despite its importance, because of the difficulty associated with embryo isolation, global gene expression programs involved in plant embryogenesis, especially the early events following fertilization, are largely unknown. To address this gap, we have developed methods to isolate whole live Arabidopsis (Arabidopsis thaliana) embryos as young as zygote and performed genome-wide profiling of gene expression. These studies revealed insights into patterns of gene expression relating to: maternal and paternal contributions to zygote development, chromosomal level clustering of temporal expression in embryogenesis, and embryo-specific functions. Functional analysis of some of the modulated transcription factor encoding genes from our data sets confirmed that they are critical for embryogenesis. Furthermore, we constructed stage-specific metabolic networks mapped with differentially regulated genes by combining the microarray data with the available Kyoto Encyclopedia of Genes and Genomes metabolic data sets. Comparative analysis of these networks revealed the network-associated structural and topological features, pathway interactions, and gene expression with reference to the metabolic activities during embryogenesis. Together, these studies have generated comprehensive gene expression data sets for embryo development in Arabidopsis and may serve as an important foundational resource for other seed plants. PMID:21402797

  3. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    SciTech Connect

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  4. Prophage, phiPV83-pro, carrying panton-valentine leukocidin genes, on the Staphylococcus aureus P83 chromosome: comparative analysis of the genome structures of phiPV83-pro, phiPVL, phi11, and other phages.

    PubMed

    Zou, D; Kaneko, J; Narita, S; Kamio, Y

    2000-12-01

    Staphylococcus aureus P83 has Panton-Valentine leukocidin (PVL)-like genes, lukM and lukF-PV. Here, the lukM and lukF-PV genes were found on the genome of a prophage, which was designated phiPV83-pro. The precise genome size was 45,636 bp, with att core sequences of 10 base pairs. Sixty-four ORFs were identified on the phiPV83-pro genome, including two extra operons, lukM-lukF-PV and orfs63-64. The lukM-lukF-PV cluster was located 2.1 kb upstream of the attL site. The most striking feature of the phiPV83-pro genome was that it was composed of at least 4 regions from phi11, phiPVL, and other phages, i.e., (i) att sites identical with those of phi11, (ii) a cos sequence and the genes encoding packaging and head proteins of phiPVL (occupying half of phiPV83-pro), and (iii) the other two regions, which showed no significant similarity with known phages (occupying about 40% of phiPV83-pro). Furthermore, two insertion sequences, ISSA1 and ISSA2, were integrated into the attL site and orf44, respectively. PhiPV83-pro was not induced as phage particles from S. aureus P83 even after treatment with mitomycin C. The insertion of ISSA1 into the attL site was one of the reasons for the failure to induce phage particles by mitomycin C treatment of strain P83.

  5. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    SciTech Connect

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
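
    The two-model idea in the abstract above can be illustrated with a toy classifier: fit one Markov chain to "typical" sequences and one to "atypical" sequences, then label a new sequence by whichever model gives it the higher log-likelihood. This is only a minimal sketch under stated assumptions. It uses plain homogeneous chains with add-one smoothing, and the function names, the chain order, and the toy training strings are all hypothetical; it is not the GeneMark.hmm implementation.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Fit P(next base | previous `order` bases) with additive smoothing.

    Hypothetical helper: returns a function prob(context, base).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context, base):
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount  # 4 bases: A, C, G, T
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob


def log_likelihood(seq, model, order=2):
    """Sum of log transition probabilities along the sequence."""
    return sum(
        math.log(model(seq[i - order:i], seq[i]))
        for i in range(order, len(seq))
    )


def classify(seq, typical, atypical, order=2):
    """Label a sequence by the model under which it is more likely."""
    diff = log_likelihood(seq, typical, order) - log_likelihood(seq, atypical, order)
    return "typical" if diff >= 0 else "atypical"


# Toy training data (assumed, for illustration only).
typical_model = train_markov(["GC" * 30])
atypical_model = train_markov(["AT" * 30])
```

    With these toy models, `classify("GCGCGCGC", typical_model, atypical_model)` returns "typical" and `classify("ATATATAT", typical_model, atypical_model)` returns "atypical". A realistic gene finder would also model codon position and sequence boundaries, which this sketch omits.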

  6. The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence.

    PubMed

    Fizames, Cécile; Muños, Stéphane; Cazettes, Céline; Nacry, Philippe; Boucherez, Jossia; Gaymard, Frédéric; Piquemal, David; Delorme, Valérie; Commes, Thérèse; Doumas, Patrick; Cooke, Richard; Marti, Jacques; Sentenac, Hervé; Gojon, Alain

    2004-01-01

    Large-scale identification of genes expressed in roots of the model plant Arabidopsis was performed by serial analysis of gene expression (SAGE), on a total of 144,083 sequenced tags, representing at least 15,964 different mRNAs. For tag to gene assignment, we developed a computational approach based on 26,620 genes annotated from the complete sequence of the genome. The procedure selected warrants the identification of the genes corresponding to the majority of the tags found experimentally, with a high level of reliability, and provides a reference database for SAGE studies in Arabidopsis. This new resource allowed us to characterize the expression of more than 3,000 genes, for which there is no expressed sequence tag (EST) or cDNA in the databases. Moreover, 85% of the tags were specific for one gene. To illustrate this advantage of SAGE for functional genomics, we show that our data allow an unambiguous analysis of most of the individual genes belonging to 12 different ion transporter multigene families. These results indicate that, compared with EST-based tag to gene assignment, the use of the annotated genome sequence greatly improves gene identification in SAGE studies. However, more than 6,000 different tags remained with no gene match, suggesting that a significant proportion of transcripts present in the roots originate from yet unknown or wrongly annotated genes. The root transcriptome characterized in this study markedly differs from those obtained in other organs, and provides a unique resource for investigating the functional specificities of the root system. As an example of the use of SAGE for transcript profiling in Arabidopsis, we report here the identification of 270 genes differentially expressed between roots of plants grown either with NO3- or NH4NO3 as N source.
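
    The tag-to-gene assignment described above can be sketched as follows: derive one "virtual" tag per annotated transcript, index the tags, and then label each experimentally observed tag as matching one gene, several genes, or none. This is a minimal sketch, not the authors' procedure. The anchoring site (CATG, the NlaIII site commonly used in SAGE), the 10-base tag length, and all function and gene names are assumptions made for illustration.

```python
from collections import defaultdict

ANCHOR = "CATG"  # assumed anchoring-enzyme site (NlaIII)
TAG_LEN = 10     # assumed tag length, excluding the anchor site


def virtual_tag(transcript):
    """Return the tag following the 3'-most anchor site, or None."""
    pos = transcript.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = transcript[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def build_tag_index(transcripts):
    """Map each virtual tag to the set of genes that produce it."""
    index = defaultdict(set)
    for gene, seq in transcripts.items():
        tag = virtual_tag(seq)
        if tag is not None:
            index[tag].add(gene)
    return index


def assign(observed_tags, index):
    """Label each observed tag as unique, ambiguous, or no_match."""
    result = {}
    for tag in observed_tags:
        genes = sorted(index.get(tag, set()))
        if len(genes) == 1:
            status = "unique"
        elif genes:
            status = "ambiguous"
        else:
            status = "no_match"
        result[tag] = (status, genes)
    return result


# Hypothetical transcripts, for illustration only.
transcripts = {
    "geneA": "AAA" + "CATG" + "ACGTACGTAC" + "GG",
    "geneB": "TT" + "CATG" + "ACGTACGTAC",
    "geneC": "CATG" + "GGGGGGGGGG" + "A",
}
index = build_tag_index(transcripts)
calls = assign(["GGGGGGGGGG", "ACGTACGTAC", "TTTTTTTTTT"], index)
```

    In this toy data, the first observed tag maps uniquely to geneC, the second is shared by geneA and geneB, and the third has no match. The fraction of "unique" calls corresponds to the tag specificity the abstract reports, and the "no_match" calls correspond to tags with no annotated gene.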

  7. Child Development and Structural Variation in the Human Genome

    ERIC Educational Resources Information Center

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  8. Nonclinical and clinical Enterococcus faecium strains, but not Enterococcus faecalis strains, have distinct structural and functional genomic features.

    PubMed

    Kim, Eun Bae; Marco, Maria L

    2014-01-01

    Certain strains of Enterococcus faecium and Enterococcus faecalis contribute beneficially to animal health and food production, while others are associated with nosocomial infections. To determine whether there are structural and functional genomic features that are distinct between nonclinical (NC) and clinical (CL) strains of those species, we analyzed the genomes of 31 E. faecium and 38 E. faecalis strains. Hierarchical clustering of 7,017 orthologs found in the E. faecium pangenome revealed that NC strains clustered into two clades and are distinct from CL strains. NC E. faecium genomes are significantly smaller than CL genomes, and this difference was partly explained by significantly fewer mobile genetic elements (ME), virulence factors (VF), and antibiotic resistance (AR) genes. E. faecium ortholog comparisons identified 68 and 153 genes that are enriched for NC and CL strains, respectively. Proximity analysis showed that CL-enriched loci, and not NC-enriched loci, are more frequently colocalized on the genome with ME. In CL genomes, AR genes are also colocalized with ME, and VF are more frequently associated with CL-enriched loci. Genes in 23 functional groups are also differentially enriched between NC and CL E. faecium genomes. In contrast, differences were not observed between NC and CL E. faecalis genomes despite their having larger genomes than E. faecium. Our findings show that unlike E. faecalis, NC and CL E. faecium strains are equipped with distinct structural and functional genomic features indicative of adaptation to different environments.

  9. Genomic location of the major ribosomal protein gene locus determines Vibrio cholerae global growth and infectivity.

    PubMed

    Soler-Bistué, Alfonso; Mondotte, Juan A; Bland, Michael Jason; Val, Marie-Eve; Saleh, María-Carla; Mazel, Didier

    2015-04-01

    The effects on cell physiology of gene order within the bacterial chromosome are poorly understood. In silico approaches have shown that genes involved in transcription and translation processes, in particular ribosomal protein (RP) genes, localize near the replication origin (oriC) in fast-growing bacteria suggesting that such a positional bias is an evolutionarily conserved growth-optimization strategy. Such genomic localization could either provide a higher dosage of these genes during fast growth or facilitate the assembly of ribosomes and transcription foci by keeping physically close the many components of these macromolecular machines. To explore this, we used novel recombineering tools to create a set of Vibrio cholerae strains in which S10-spec-α (S10), a locus bearing half of the ribosomal protein genes, was systematically relocated to alternative genomic positions. We show that the relative distance of S10 to the origin of replication tightly correlated with a reduction of S10 dosage, mRNA abundance and growth rate within these otherwise isogenic strains. Furthermore, this was accompanied by a significant reduction in the host-invasion capacity in Drosophila melanogaster. Both phenotypes were rescued in strains bearing two S10 copies highly distal to oriC, demonstrating that replication-dependent gene dosage reduction is the main mechanism behind these alterations. Hence, S10 positioning connects genome structure to cell physiology in Vibrio cholerae. Our results show experimentally for the first time that genomic positioning of genes involved in the flux of genetic information conditions global growth control and hence bacterial physiology and potentially its evolution.

  10. Overlapping genes in the human and mouse genomes

    PubMed Central

    Sanna, Chaitanya R; Li, Wen-Hsiung; Zhang, Liqing

    2008-01-01

    Background Increasing evidence suggests that overlapping genes are much more common in eukaryotic genomes than previously thought. In this study we identified and characterized the overlapping genes in a set of 13,484 pairs of human-mouse orthologous genes. Results About 10% of the genes under study are overlapping genes, the majority of which are different-strand overlaps. The majority of the same-strand overlaps are embedded forms, whereas most different-strand overlaps are not embedded and in the convergent transcription orientation. Most of the same-strand overlapping gene pairs show at least a tenfold difference in length, much larger than the length difference between non-overlapping neighboring gene pairs. The length difference between the two different-strand overlapping genes is less dramatic. Over 27% of the different-strand-overlap relationships are shared between human and mouse, compared to only ~8% conservation for same-strand-overlap relationships. More than 96% of the same-strand and different-strand overlaps that are not shared between human and mouse have both genes located on the same chromosomes in the species that does not show the overlap. We examined the causes of transition between the overlapping and non-overlapping states in the two species and found that 3' UTR change plays an important role in the transition. Conclusion Our study contributes to the understanding of the evolutionary transition between overlapping genes and non-overlapping genes and demonstrates the high rates of evolutionary changes in the un-translated regions. PMID:18410680
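
    The overlap categories named in the abstract (same-strand versus different-strand, embedded, convergent) can be derived from gene coordinates and strand alone. The following is a minimal sketch under assumed conventions: coordinates are 1-based and inclusive with start <= end, and the `Gene` class, the function name, and the "partial" and "divergent" labels are hypothetical additions, not terms taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive; start <= end
    end: int
    strand: str  # "+" or "-"


def classify_pair(a, b):
    """Classify the overlap relationship between two genes on one chromosome."""
    if a.end < b.start or b.end < a.start:
        return "non-overlapping"

    # One gene lies entirely within the span of the other.
    embedded = (a.start <= b.start and b.end <= a.end) or (
        b.start <= a.start and a.end <= b.end
    )

    if a.strand == b.strand:
        return "same-strand embedded" if embedded else "same-strand partial"
    if embedded:
        return "different-strand embedded"

    # Partial overlap on opposite strands: the leftmost gene decides orientation.
    left = a if a.start <= b.start else b
    # Left gene on "+" means both genes are transcribed toward each other,
    # so their 3' ends overlap (convergent); otherwise their 5' ends overlap.
    if left.strand == "+":
        return "different-strand convergent"
    return "different-strand divergent"
```

    For example, `classify_pair(Gene("a", 100, 500, "+"), Gene("b", 400, 900, "-"))` returns "different-strand convergent", while a short gene lying wholly inside a longer gene on the same strand is reported as "same-strand embedded".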

  11. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools. PMID:27536341

  12. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    PubMed

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  13. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes

    PubMed Central

    Biankin, Andrew V.; Waddell, Nicola; Kassahn, Karin S.; Gingras, Marie-Claude; Muthuswamy, Lakshmi B.; Johns, Amber L.; Miller, David K.; Wilson, Peter J.; Patch, Ann-Marie; Wu, Jianmin; Chang, David K.; Cowley, Mark J.; Gardiner, Brooke B.; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J.; Gill, Anthony J.; Pinho, Andreia V.; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J. Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R. Scott; Humphris, Jeremy L.; Kaplan, Warren; Jones, Marc D.; Colvin, Emily K.; Nagrial, Adnan M.; Humphrey, Emily S.; Chou, Angela; Chin, Venessa T.; Chantrill, Lorraine A.; Mawson, Amanda; Samra, Jaswinder S.; Kench, James G.; Lovell, Jessica A.; Daly, Roger J.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M.; Fisher, William E.; Brunicardi, F. Charles; Hodges, Sally E.; Reid, Jeffrey G.; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R.; Dinh, Huyen; Buhay, Christian J.; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E.; Yung, Christina K.; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A.; Petersen, Gloria M.; Gallinger, Steven; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Schulick, Richard D.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A.; Mann, Karen M.; Jenkins, Nancy A.; Perez-Mancera, Pedro A.; Adams, David J.; Largaespada, David A.; Wessels, Lodewyk F. A.; Rust, Alistair G.; Stein, Lincoln D.; Tuveson, David A.; Copeland, Neal G.; Musgrove, Elizabeth A.; Scarpa, Aldo; Eshleman, James R.; Hudson, Thomas J.; Sutherland, Robert L.; Wheeler, David A.; Pearson, John V.; McPherson, John D.; Gibbs, Richard A.; Grimmond, Sean M.

    2012-01-01

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis. PMID:23103869

  14. The Effect of Stress on Genome Regulation and Structure

    PubMed Central

    Madlung, Andreas; Comai, Luca

    2004-01-01

    • Background Stresses exert evolutionary pressures on all organisms, which have developed sophisticated responses to cope and survive. These responses involve cellular physiology, gene regulation and genome remodelling. • Scope In this review, the effects of stress on genomes and the connected responses are considered. Recent developments in our understanding of epigenetic genome regulation, including the role of RNA interference (RNAi), suggest a function for this in stress initiation and response. We review our knowledge of how different stresses, tissue culture, pathogen attack, abiotic stress, and hybridization, affect genomes. Using allopolyploid hybridization as an example, we examine mechanisms that may mediate genomic responses, focusing on RNAi-mediated perturbations. • Conclusions A common response to stresses may be the relaxation of epigenetic regulation, leading to activation of suppressed sequences and secondary effects as regulatory systems attempt to re-establish genomic order. PMID:15319229

  15. Genome size diversity in angiosperms and its influence on gene space.

    PubMed

    Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

    2015-12-01

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C=5.7Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression. PMID:26605684

  17. Sugarcane Functional Genomics: Gene Discovery for Agronomic Trait Development

    PubMed Central

    Menossi, M.; Silva-Filho, M. C.; Vincentz, M.; Van-Sluys, M.-A.; Souza, G. M.

    2008-01-01

    Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390

  18. Evolutionary conservation analysis between the essential and nonessential genes in bacterial genomes.

    PubMed

    Luo, Hao; Gao, Feng; Lin, Yan

    2015-08-14

    Essential genes are thought to be critical for the survival of organisms under certain circumstances, and natural selection acting on essential genes is expected to be stricter than on nonessential ones. Up to now, essential genes have been identified in approximately thirty bacterial organisms by experimental methods. In this paper, we performed a comprehensive comparison between the essential and nonessential genes in the genomes of 23 bacterial species based on the Ka/Ks ratio, and found that essential genes are more evolutionarily conserved than nonessential genes in most of the bacteria examined. Furthermore, we also analyzed the conservation by functional clusters with the clusters of orthologous groups (COGs), and found that the essential genes in the functional categories of G (Carbohydrate transport and metabolism), H (Coenzyme transport and metabolism), I (Lipid transport and metabolism), J (Translation, ribosomal structure and biogenesis), K (Transcription) and L (Replication, recombination and repair) tend to be more evolutionarily conserved than the corresponding nonessential genes in bacteria. The results suggest that the essential genes in these subcategories are subject to stronger selective pressure than the nonessential genes, and therefore provide more insight into the evolutionary conservation of essential and nonessential genes in complex biological processes.
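
    As a rough illustration of the comparison described above, the Ka/Ks ratio divides the nonsynonymous substitution rate (Ka) by the synonymous rate (Ks); values well below 1 are conventionally read as purifying selection. The sketch below assumes Ka and Ks have already been estimated for each gene by an external tool. The function names and input values are hypothetical and are not taken from the paper.

```python
from statistics import median
from typing import Iterable, List, Optional, Tuple


def ka_ks(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero or negative (ratio undefined)."""
    if ks <= 0:
        return None
    return ka / ks


def median_ka_ks(pairs: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Median Ka/Ks over (Ka, Ks) pairs, skipping pairs with an undefined ratio."""
    ratios: List[float] = []
    for ka, ks in pairs:
        ratio = ka_ks(ka, ks)
        if ratio is not None:
            ratios.append(ratio)
    return median(ratios) if ratios else None


# Made-up (Ka, Ks) values for two gene sets, purely for illustration.
essential = [(0.01, 0.50), (0.02, 0.40), (0.03, 0.60)]
nonessential = [(0.05, 0.50), (0.08, 0.40), (0.06, 0.00)]

# A lower median ratio for one set would be consistent with stronger
# purifying selection on that set, under the usual interpretation.
essential_more_conserved = median_ka_ks(essential) < median_ka_ks(nonessential)
```

    A real analysis would also need a statistical test across genes and species, which this sketch leaves out.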

  19. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis.

    PubMed

    Bi, Changwei; Xu, Yiqing; Ye, Qiaolin; Yin, Tongming; Ye, Ning

    2016-01-01

    WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which is found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (group I-III), with five subgroups (IIa-IIe) in group II. With multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon-intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. With the analyses of WRKY gene family in willow, it is not only beneficial to complete the functional and annotation information of WRKY genes family in woody plants, but also provide important references to investigate the expansion and evolution of
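
    The canonical WRKYGQK heptapeptide and the three variants listed in the abstract (WRKYGRK, WKKYGQK, WRKYGKK) can be located in a protein sequence with a simple pattern search. This is a minimal sketch, not the authors' method (they describe multiple sequence alignment plus manual search); the function name and the example sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Canonical heptapeptide followed by the three variants named in the abstract.
WRKY_HEPTAPEPTIDES = ("WRKYGQK", "WRKYGRK", "WKKYGQK", "WRKYGKK")
WRKY_PATTERN = re.compile("|".join(WRKY_HEPTAPEPTIDES))


def find_wrky_motifs(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, motif) for each heptapeptide found in the sequence."""
    return [(m.start(), m.group()) for m in WRKY_PATTERN.finditer(protein.upper())]


# Made-up sequence containing one canonical motif and one variant.
hits = find_wrky_motifs("MAAWRKYGQKTTTWKKYGQKLL")
```

    Matching on the heptapeptide alone is only a first filter; assigning a protein to a WRKY group would also require inspecting the zinc finger motif.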

  20. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis

    PubMed Central

    Ye, Qiaolin; Yin, Tongming

    2016-01-01

    WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which is found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole genome sequencing of Salix suchowensis, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (group I–III), with five subgroups (IIa–IIe) in group II. With multiple sequence alignment and manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. With the analyses of WRKY gene family in willow, it is not only beneficial to complete the functional and annotation information of WRKY genes family in woody plants, but also provide important references to investigate the expansion and evolution

  2. Draft Genome of the Wheat Rust Pathogen (Puccinia triticina) Unravels Genome-Wide Structural Variations during Evolution.

    PubMed

    Kiran, Kanti; Rawal, Hukam C; Dubey, Himanshu; Jaswal, Rajdeep; Devanna, B N; Gupta, Deepak Kumar; Bhardwaj, Subhash C; Prasad, P; Pal, Dharam; Chhuneja, Parveen; Balasubramanian, P; Kumar, J; Swami, M; Solanke, Amolkumar U; Gaikwad, Kishor; Singh, Nagendra K; Sharma, Tilak Raj

    2016-01-01

    Leaf rust is one of the most important diseases of wheat and is caused by Puccinia triticina, a highly variable rust pathogen prevalent worldwide. Decoding the genome of this pathogen will help in unraveling the molecular basis of its evolution and in the identification of genes responsible for its various biological functions. We generated high quality draft genome sequences (approximately 100-106 Mb) of two races of P. triticina: the variable and virulent Race77 and the old, avirulent Race106. The genomes of races 77 and 106 had 33X and 27X coverage, respectively. We predicted 27678 and 26384 genes, with average lengths of 1,129 and 1,086 bases in races 77 and 106, respectively, and found that the genomes consisted of 37.49% and 39.99% repetitive sequences. Genome wide comparative analysis revealed that Race77 differs substantially from Race106 with regard to segmental duplication (SD), repeat element, and SNP/InDel characteristics. Comparative analyses showed that Race77 is a recent, highly variable and adapted race compared with Race106. Further sequence analyses of 13 additional pathotypes of Race77 clearly differentiated the recent, active and virulent, from the older pathotypes. Average densities of 2.4 SNPs and 0.32 InDels per kb were obtained for all P. triticina pathotypes. Secretome analysis demonstrated that Race77 has more virulence factors than Race106, which may be responsible for the greater degree of adaptation of this pathogen. We also found that genes under greater selection pressure were conserved in the genomes of both races, and may affect functions crucial for the higher levels of virulence factors in Race77. This study provides insights into the genome structure, genome organization, molecular basis of variation, and pathogenicity of P. triticina. The genome sequence data generated in this study have been submitted to public domain databases and will be an important resource for comparative genomics studies of the more than 4000 existing

  3. OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes

    PubMed Central

    Thybert, David; Avner, Stéphane; Lucchetti-Miganeh, Céline; Chéron, Angélique; Barloy-Hubler, Frédérique

    2008-01-01

    Background Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified with regard to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes due mainly to some erroneous entries in databases, mistake propagation, absence of updating and disparity in function description. Description In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated to subfamilies of oxidative stress enzymes, and a new anchor-driven annotator, for ab initio detection of ROS/RNS response genes. All complete Bacterial and Archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a Graphical User Interface. Conclusion OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise function to hypothetical annotated proteins. OxyGene is freely available at: PMID:19117520

  4. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631

  5. Genome structures and transcriptomes signify niche adaptation for the multiple-ion-tolerant extremophyte Schrenkiella parvula.

    PubMed

    Oh, Dong-Ha; Hong, Hyewon; Lee, Sang Yeol; Yun, Dae-Jin; Bohnert, Hans J; Dassanayake, Maheshi

    2014-04-01

    Schrenkiella parvula (formerly Thellungiella parvula), a close relative of Arabidopsis (Arabidopsis thaliana) and Brassica crop species, thrives on the shores of Lake Tuz, Turkey, where soils accumulate high concentrations of multiple-ion salts. Despite the stark differences in adaptations to extreme salt stresses, the genomes of S. parvula and Arabidopsis show extensive synteny. S. parvula completes its life cycle in the presence of Na⁺, K⁺, Mg²⁺, Li⁺, and borate at soil concentrations lethal to Arabidopsis. Genome structural variations, including tandem duplications and translocations of genes, interrupt the colinearity observed throughout the S. parvula and Arabidopsis genomes. Structural variations distinguish homologous gene pairs characterized by divergent promoter sequences and basal-level expression strengths. Comparative RNA sequencing reveals the enrichment of ion-transport functions among genes with higher expression in S. parvula, while pathogen defense-related genes show higher expression in Arabidopsis. Key stress-related ion transporter genes in S. parvula showed increased copy number, higher transcript dosage, and evidence for subfunctionalization. This extremophyte offers a framework to identify the requisite adjustments of genomic architecture and expression control for a set of genes found in most plants in a way to support distinct niche adaptation and lifestyles. PMID:24563282

  6. The mitochondrial genome of the screamer louse Bothriometopus (Phthiraptera: Ischnocera): effects of extensive gene rearrangements on the evolution of the genome.

    PubMed

    Cameron, Stephen L; Johnson, Kevin P; Whiting, Michael F

    2007-11-01

    Mitochondrial (mt) genome rearrangement has generally been studied with respect to the phenomenon itself, focusing on their phylogenetic distribution and causal mechanisms. Rearrangements have additional significance through effects on substitution, transcription, and mRNA processing. Lice are an ideal group in which to study the interactions between rearrangements and these factors due to the heightened rearrangement rate within this group. The entire mt genome of the screamer louse Bothriometopus was sequenced and compared to previously sequenced louse genomes. The mt genome is 15,564 bp, circular, and all genes are encoded on the same strand. The gene arrangement differs radically from both other louse species and the ancestral insect. Nucleotide composition is A+T biased, but there is no skew which may be due to reversal of replication direction or a transcriptional effect. Bothriometopus has both tRNA duplication and concerted evolution which has not been observed previously. Eleven of the 13 protein-coding genes have 3' end stem-loop structures which may allow mRNA processing without flanking tRNAs and so facilitate gene rearrangements. There are five candidate control regions capable of forming stem-loop structures. Two are structurally more similar to the control regions of other insect species than those of other lice. Analyses of Bothriometopus demonstrate that louse mt genomes, in addition to being extensively rearranged, differ significantly from most insect species in nucleotide composition biases, tRNA evolution, protein-coding gene structures and putative signaling sites such as the control region. These may be either a cause or a consequence of gene rearrangements. PMID:17925995
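
    The statement that composition is A+T biased but shows no skew refers to two separate quantities. A+T content is the fraction of A and T bases, while strand skew is conventionally computed as AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C) on one strand. The sketch below computes all three for an arbitrary sequence; the function name and the toy input are hypothetical, and the actual Bothriometopus sequence is not reproduced here.

```python
from collections import Counter
from typing import Dict


def composition_stats(seq: str) -> Dict[str, float]:
    """Return A+T content, AT skew and GC skew for one strand of a DNA sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    total = a + t + g + c  # ambiguous bases such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return {
        "at_content": (a + t) / total,
        # Skew is 0 when the two bases are equally frequent on this strand.
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Toy sequence: strongly A+T rich, yet A and T are balanced, so AT skew is 0.
stats = composition_stats("AAATTTAAATTTGC")
```

    A sequence can therefore be heavily A+T biased and still have skew values near zero, which is the pattern the abstract reports.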

  7. Microcollinearity in an ethylene receptor coding gene region of the Coffea canephora genome is extensively conserved with Vitis vinifera and other distant dicotyledonous sequenced genomes

    PubMed Central

    Guyot, Romain; de la Mare, Marion; Viader, Véronique; Hamon, Perla; Coriton, Olivier; Bustamante-Porras, José; Poncet, Valérie; Campa, Claudine; Hamon, Serge; de Kochko, Alexandre

    2009-01-01

    Background Coffea canephora, also called Robusta, belongs to the Rubiaceae, the fourth largest angiosperm family. This diploid species (2x = 2n = 22) has a fairly small genome size of ≈ 690 Mb and despite its extreme economic importance, particularly for developing countries, knowledge of its genome composition, structure and evolution remains very limited. Here, we report the 160 kb of the first C. canephora Bacterial Artificial Chromosome (BAC) clone ever sequenced and its fine analysis. Results This clone contains the CcEIN4 gene, encoding an ethylene receptor, and twenty other predicted genes showing a high gene density of one gene per 7.8 kb. Most of them display perfect matches with C. canephora expressed sequence tags or show transcriptional activities through PCR amplifications on cDNA libraries. Twenty-three transposable elements, mainly Class II transposon derivatives, were identified at this locus. Most of these Class II elements are Miniature Inverted-repeat Transposable Elements (MITE) known to be closely associated with plant genes. This BAC composition gives a pattern similar to those found in gene rich regions of Solanum lycopersicum and Medicago truncatula genomes indicating that the CcEIN4 regions may belong to a gene rich region in the C. canephora genome. Comparative sequence analysis indicated an extensive conservation between C. canephora and most of the reference dicotyledonous genomes studied in this work, such as tomato (S. lycopersicum), grapevine (V. vinifera), barrel medic (M. truncatula), black cottonwood (Populus trichocarpa) and Arabidopsis thaliana. The highest degree of microcollinearity was found between C. canephora and V. vinifera, which belong respectively to the Asterids and Rosids, two clades that diverged more than 114 million years ago. Conclusion This study provides a first glimpse of C. canephora genome composition and evolution. Our data revealed a remarkable conservation of the microcollinearity between C. canephora and V

  8. Genome-Wide Identification and Functional Classification of Tomato (Solanum lycopersicum) Aldehyde Dehydrogenase (ALDH) Gene Superfamily

    PubMed Central

    Lopez-Valverde, Francisco J.; Robles-Bolivar, Paula; Lima-Cabello, Elena; Gachomo, Emma W.; Kotchoni, Simeon O.

    2016-01-01

    Aldehyde dehydrogenases (ALDHs) constitute a protein superfamily that catalyzes the oxidation of aldehyde molecules into their corresponding non-toxic carboxylic acids and responds to different environmental stresses, offering promising genetic approaches for improving plant adaptation. The aim of the current study is the systematic identification and functional analysis of the S. lycopersicum ALDH gene superfamily. We performed genome-based identification and functional classification of ALDH genes, together with analyses of phylogenetic relationships, structure and catalytic domains, and microarray-based gene expression. Twenty-nine unique tomato ALDH sequences encoding 11 ALDH families were identified, including a unique member of the family 19 ALDH. Phylogenetic analysis revealed 13 groups, with a conserved relationship among ALDH families. Functional structure analysis of ALDH2 showed a catalytic mechanism involving a Cys-Glu couple. However, the analysis of ALDH3 showed no functional gene duplication or potential neo-functionalities. Gene expression analysis reveals that particular ALDH genes, such as ALDH2B7, might respond to wounding stress by increasing their expression. Overall, this study reveals the complexity of the S. lycopersicum ALDH gene superfamily and offers new insights into the structure-functional features and evolution of ALDH gene families in vascular plants. The functional characterization of ALDHs is valuable for promoting molecular breeding in tomato for the improvement of stress tolerance and signaling. PMID:27755582

  9. Genome-wide analysis and expression profiling of the phospholipase D gene family in Gossypium arboreum.

    PubMed

    Tang, Kai; Dong, Chunjuan; Liu, Jinyuan

    2016-02-01

    The plant phospholipase D (PLD) plays versatile roles in multiple aspects of plant growth, development, and stress responses. However, until now, our knowledge concerning the PLD gene family members and their expression patterns in cotton has been limited. In this study, we performed for the first time a genome-wide analysis and expression profiling of the PLD gene family in Gossypium arboreum, and a total of 19 non-redundant PLD genes (GaPLDs) were identified. Based on the phylogenetic analysis, they were divided into six well-supported clades (α, β/γ, δ, ε, ζ and φ). Most of the GaPLD genes within the same clade showed similar exon-intron organization and highly conserved motif structures. Additionally, the chromosomal distribution pattern revealed that GaPLD genes were unevenly distributed across 10 of the 13 cotton chromosomes. Segmental duplication is the major contributor to the expansion of the GaPLD gene family and is estimated to have occurred from 19.61 to 20.44 million years ago, when a recent large-scale genome duplication occurred in cotton. Moreover, the expression profiling provides evidence of the functional divergence of GaPLD genes in cotton and sheds some new light on the molecular mechanisms of GaPLDα1 and GaPLDδ2 in fiber development. PMID:26718354

  10. Mapping Our Genes: The Genome Projects: How Big, How Fast

    DOE R&D Accomplishments Database

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  11. Mapping our genes: The genome projects: How big, how fast

    SciTech Connect

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  12. Genome-Wide Profiling of PARP1 Reveals an Interplay with Gene Regulatory Regions and DNA Methylation

    PubMed Central

    Nalabothula, Narasimharao; Al-jumaily, Taha; Eteleeb, Abdallah M.; Flight, Robert M.; Xiaorong, Shao; Moseley, Hunter; Rouchka, Eric C.; Fondufe-Mittendorf, Yvonne N.

    2015-01-01

    Poly (ADP-ribose) polymerase-1 (PARP1) is a nuclear enzyme involved in DNA repair, chromatin remodeling and gene expression. PARP1 interactions with chromatin architectural multi-protein complexes (i.e., nucleosomes) alter chromatin structure, resulting in changes in gene expression. Chromatin structure impacts gene regulatory processes including transcription, splicing, DNA repair, replication and recombination. It is important to delineate whether PARP1 randomly associates with nucleosomes or is present at specific nucleosome regions throughout the cell genome. We performed genome-wide association studies in breast cancer cell lines to address these questions. Our studies show that PARP1 associates with epigenetic regulatory elements genome-wide, such as active histone marks, CTCF and DNase hypersensitive sites. Additionally, the binding of PARP1 to chromatin genome-wide is mutually exclusive with DNA methylation patterns, suggesting a functional interplay between PARP1 and DNA methylation. Indeed, inhibition of PARylation results in genome-wide changes in DNA methylation patterns. Our results suggest that PARP1 controls the fidelity of gene transcription and marks actively transcribed gene regions by selectively binding to transcriptionally active chromatin. These studies provide a platform for developing our understanding of PARP1’s role in gene regulation. PMID:26305327

  13. Genomic Structure of an Economically Important Cyanobacterium, Arthrospira (Spirulina) platensis NIES-39

    PubMed Central

    Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

    2010-01-01

    A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057

  14. Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida

    PubMed Central

    Collén, Jonas; Porcel, Betina; Carré, Wilfrid; Ball, Steven G.; Chaparro, Cristian; Tonon, Thierry; Barbeyron, Tristan; Michel, Gurvan; Noel, Benjamin; Valentin, Klaus; Elias, Marek; Artiguenave, François; Arun, Alok; Aury, Jean-Marc; Barbosa-Neto, José F.; Bothwell, John H.; Bouget, François-Yves; Brillet, Loraine; Cabello-Hurtado, Francisco; Capella-Gutiérrez, Salvador; Charrier, Bénédicte; Cladière, Lionel; Cock, J. Mark; Coelho, Susana M.; Colleoni, Christophe; Czjzek, Mirjam; Da Silva, Corinne; Delage, Ludovic; Denoeud, France; Deschamps, Philippe; Dittami, Simon M.; Gabaldón, Toni; Gachon, Claire M. M.; Groisillier, Agnès; Hervé, Cécile; Jabbari, Kamel; Katinka, Michael; Kloareg, Bernard; Kowalczyk, Nathalie; Labadie, Karine; Leblanc, Catherine; Lopez, Pascal J.; McLachlan, Deirdre H.; Meslet-Cladiere, Laurence; Moustafa, Ahmed; Nehr, Zofia; Nyvall Collén, Pi; Panaud, Olivier; Partensky, Frédéric; Poulain, Julie; Rensing, Stefan A.; Rousvoal, Sylvie; Samson, Gaelle; Symeonidi, Aikaterini; Weissenbach, Jean; Zambounis, Antonios; Wincker, Patrick; Boyen, Catherine

    2013-01-01

    Red seaweeds are key components of coastal ecosystems and are economically important as food and as a source of gelling agents, but their genes and genomes have received little attention. Here we report the sequencing of the 105-Mbp genome of the florideophyte Chondrus crispus (Irish moss) and the annotation of the 9,606 genes. The genome features an unusual structure characterized by gene-dense regions surrounded by repeat-rich regions dominated by transposable elements. Despite its fairly large size, this genome shows features typical of compact genomes, e.g., on average only 0.3 introns per gene, short introns, low median distance between genes, small gene families, and no indication of large-scale genome duplication. The genome also gives insights into the metabolism of marine red algae and adaptations to the marine environment, including genes related to halogen metabolism, oxylipins, and multicellularity (microRNA processing and transcription factors). Particularly interesting are features related to carbohydrate metabolism, which include a minimalistic gene set for starch biosynthesis, the presence of cellulose synthases acquired before the primary endosymbiosis showing the polyphyly of cellulose synthesis in Archaeplastida, and cellulases absent in terrestrial plants as well as the occurrence of a mannosylglycerate synthase potentially originating from a marine bacterium. To explain the observations on genome structure and gene content, we propose an evolutionary scenario involving an ancestral red alga that was driven by early ecological forces to lose genes, introns, and intergenic DNA; this loss was followed by an expansion of genome size as a consequence of activity of transposable elements. PMID:23503846