Science.gov

Sample records for gene genomic structure

  1. Genomic structure of the human caldesmon gene.

    PubMed Central

    Hayashi, K; Yano, H; Hashida, T; Takeuchi, R; Takeda, O; Asada, K; Takahashi, E; Kato, I; Sobue, K

    1992-01-01

    The high molecular weight caldesmon (h-CaD) is predominantly expressed in smooth muscles, whereas the low molecular weight caldesmon (l-CaD) is widely distributed in nonmuscle tissues and cells. Changes in CaD isoform expression are closely correlated with the phenotypic modulation of smooth muscle cells. During a search for isoform diversity of human CaDs, l-CaD cDNAs were cloned from HeLa S3 cells. HeLa l-CaD I is composed of 558 amino acids, whereas 26 amino acids (residues 202-227 of HeLa l-CaD I) are deleted in HeLa l-CaD II. The short amino-terminal sequence of HeLa l-CaDs differs from that of fibroblast (WI-38) l-CaD II and human aorta h-CaD. We also identified WI-38 l-CaD I, which contains a 26-amino acid insertion relative to WI-38 l-CaD II. To reveal the molecular events underlying the regulation of CaD isoform expression, we determined the genomic structure of the human CaD gene. The human CaD gene is composed of 14 exons and was mapped to a single locus, 7q33-q34. The 26-amino acid insertion is encoded in exon 4 and is specifically spliced into the mRNAs for both h-CaD and l-CaDs I. Exon 3 encodes the central repeating domain specific to h-CaD (residues 208-436) together with the domain common to all CaDs (residues 73-207 for h-CaD and WI-38 l-CaDs, and residues 68-201 for HeLa l-CaDs). The regulation of h- and l-CaD expression is thought to depend on selection between the two 5' splice sites within exon 3. Thus, the switch in expression between l-CaD and h-CaD might be caused by this splicing pathway. PMID:1465449

  2. Genome Editing of Structural Variations: Modeling and Gene Correction.

    PubMed

    Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

    2016-07-01

    The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. PMID:27016031

  3. Recognizing genes and other components of genomic structure

    SciTech Connect

    Burks, C.; Myers, E. (Dept. of Computer Science); Stormo, G.D. (Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focused on current needs and future strategies for developing the ability to identify and predict the presence of complex functional units in sequenced but otherwise uncharacterized genomic DNA. We addressed the need for computationally based, automatic tools that synthesize available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are, as composite entities, the true objects of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise among molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  4. Rapid evolution and complex structural organization in genomic regions harboring multiple prolamin genes in the polyploid wheat genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genes encoding wheat prolamins belong to complicated multi-gene families in the wheat genome. To understand the structural complexity of storage protein loci, we sequenced and analyzed orthologous regions containing both gliadin and LMW-glutenin genes from the A and B genomes of a tetraploid wheat ...

  5. The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels

    PubMed Central

    Chang, Shengxin; Wang, Yankun; Lu, Jiangjie; Gai, Junyi; Li, Jijie; Chu, Pu; Guan, Rongzhan; Zhao, Tuanjie

    2013-01-01

    Determining mitochondrial genomes is important for elucidating vital activities of seed plants. Mitochondrial genomes are specific to each plant species because of their variable size, complex structures and patterns of gene losses and gains during evolution. This complexity has made research on the soybean mitochondrial genome difficult compared with its nuclear and chloroplast genomes. The present study helps to solve a 30-year mystery regarding the most complex mitochondrial genome structure, showing that pairwise rearrangements among the many large repeats may produce an enriched molecular pool of 760 circles in seed plants. The soybean mitochondrial genome harbors 58 genes of known function in addition to 52 predicted open reading frames of unknown function. The genome contains sequences of multiple identifiable origins, including 6.8 kb and 7.1 kb DNA fragments that have been transferred from the nuclear and chloroplast genomes, respectively, and some horizontal DNA transfers. The soybean mitochondrial genome has lost 16 genes, including nine protein-coding genes and seven tRNA genes; however, it has acquired five chloroplast-derived genes during evolution. Four tRNA genes, common among the three genomes, are derived from the chloroplast. Sizeable DNA transfers to the nucleus, with pericentromeric regions as hotspots, are observed, including DNA transfers of 125.0 kb and 151.6 kb identified unambiguously from the soybean mitochondrial and chloroplast genomes, respectively. The soybean nuclear genome has acquired five genes from its mitochondrial genome. These results provide biological insights into the mitochondrial genome of seed plants, and are especially helpful for deciphering vital activities in soybean. PMID:23431381

  6. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant-pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric because of the introduction of the former species (H. irregulare) from North America into Italy, where the two freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved in either transcriptional or mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  7. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation

    PubMed Central

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-01-01

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner), which realigns coding exons while considering the reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This identifies thousands of additional conserved exons, and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733

  8. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation.

    PubMed

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-06-20


  9. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy that allows them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of its venom peptides and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly understood. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of the C. tribblei genome was generated; although it consisted of a large number of fragments, the assembly accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements, comprising 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats, were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated at between 148 and 193) belonging to different superfamilies, with complete or nearly complete exon regions, were observed, ~60% of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of their introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate that introns are important in facilitating recombination, evolution, and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  10. Dinoflagellate Gene Structure and Intron Splice Sites in a Genomic Tandem Array.

    PubMed

    Mendez, Gregory S; Delwiche, Charles F; Apt, Kirk E; Lippmeier, J Casey

    2015-01-01

    Dinoflagellates are one of the last major eukaryotic lineages for which little is known about genome structure and organization. We report here the sequence and gene structure of a clone isolated from a cosmid library which, to our knowledge, represents the largest contiguously sequenced dinoflagellate genomic tandem gene array. These data, combined with information from a large transcriptomic library, allowed a high level of confidence in every base pair call. This degree of confidence is not possible with PCR-based contigs. The sequence contains an intron-rich set of five highly expressed gene repeats arranged in tandem. One of the tandem repeat gene members contains an intron 26,372 bp long. This study characterizes a splice site consensus sequence for dinoflagellate introns: two to nine base pairs around the 3' splice site are repeated identically around the 5' splice site. The 5' and 3' splice sites are in the same locations within each repeat, so the repeat is found only once in the mature mRNA. This identically repeated intron boundary sequence might be useful in gene modeling and genome annotation. PMID:25963315

  11. Physical mapping and genomic structure of the human TNFR2 gene

    SciTech Connect

    Beltinger, C.P.; White, P.S.; Maris, J.M.

    1996-07-01

    The tumor necrosis factor receptor 2 (TNFR2) gene localizes to 1p36.2, a genomic region characteristically deleted in neuroblastomas and other malignancies. In addition, TNFR2 is the principal mediator of the effects of TNF on cellular immunity, and it may cooperate with TNFR1 in the killing of nonlymphoid cells. We therefore analyzed the genomic structure of this gene and determined its precise physical map position. The TNFR2 gene comprises 10 exons spanning 26 kb. Most of the functional domains of TNFR2 are encoded by separate exons, and each of the repeats of the extracellular cysteine-rich domain is interrupted by an intron. The genomic structure reveals a close relationship to TNFR1, another member of the TNFR superfamily. Based on electrophoretic analysis of yeast artificial chromosomes, TNFR2 maps within 400 kb of the genetic marker D1S434. In addition, we have identified a new polymorphic dinucleotide repeat within intron 4 of TNFR2. The sequence information and exon-intron boundaries we have determined will facilitate mutational analysis of this gene to determine its potential role in neuroblastoma, as well as in other cancers with characteristic deletions or rearrangements of 1p36. 52 refs., 3 figs., 1 tab.

  12. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  13. The genomic structure of human BTK, the defective gene in X-linked agammaglobulinemia

    SciTech Connect

    Rohrer, J.; Parolini, O.; Conley, M.E.; Belmont, J.W.

    1994-12-31

    It has recently been demonstrated that mutations in the gene for Bruton's tyrosine kinase (BTK) are responsible for X-linked agammaglobulinemia. Southern blot analysis and sequencing of cDNA were used to document deletions, insertions, and single base pair substitutions. To facilitate analysis of BTK regulation and to permit the development of assays that could be used to screen genomic DNA for mutations in BTK, the authors determined the genomic organization of this gene. Subcloning of a cosmid and a yeast artificial chromosome showed that BTK is divided into 19 exons spanning 37 kilobases of genomic DNA. Analysis of the region 5' to the first untranslated exon revealed no consensus TATAA or CAAT boxes; however, three retinoic acid binding sites were identified in this region. Comparison of the structure of BTK with that of other nonreceptor tyrosine kinases, including SRC, FES, and CSK, demonstrated a lack of conservation of exon borders. Information obtained in this study will contribute to the understanding of the evolution of nonreceptor tyrosine kinases. It will also be useful in diagnostic studies, including carrier detection, and in studies directed toward gene therapy or gene replacement. 29 refs., 2 figs., 2 tabs.

  14. Computational Integration of Structural and Functional Genomics Data Across Species to Develop Information on Porcine Inflammatory Gene Regulatory Pathway

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  15. BIOINFORMATIC INTEGRATION OF STRUCTURAL AND FUNCTIONAL GENOMICS DATA ACROSS SPECIES TO DEVELOP PORCINE INFLAMMATORY GENE REGULATORY PATHWAY INFORMATION

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip data from RNA ...

  16. Computational Integration Of Structural And Functional Genomics Data Across Species To Develop Porcine Inflammatory Gene Regulatory Pathway Information

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  17. Genomic structure of the human D-site binding protein (DBP) gene

    SciTech Connect

    Shutler, G.; Glassco, T.; Kang, Xiaolin

    1996-06-15

    The human gene for the D-site binding protein (DBP) has been sequenced and characterized. This gene is a member of the b/ZIP family of transcription factors and is one of three genes forming the PAR subfamily. DBP has been implicated in the diurnal regulation of a variety of liver-specific genes. Examination of the genomic structure of DBP reveals that the gene is divided into four exons and is contained within a relatively compact region of approximately 6 kb. These exons appear to correspond to functional divisions of the DBP protein. Exon 1 contains a long 5' UTR, and the conservation of small open reading frames within this region between the rat and human genes suggests that it may play a role in translational control. Exon 2 contains a limited region of similarity to the other PAR domain genes, which may be part of a potential activation domain. Exon 3 contains the PAR domain and differs by only 1 of 71 amino acids between rat and human. Exon 4, containing both the basic and the leucine zipper domains, is likewise highly conserved. The overall sequence similarity between the rat and human cDNAs is 82% at the nucleic acid level and 92% at the protein level. Comparison of the rat and human proximal promoters reveals extensive sequence conservation, with two previously characterized DNA binding sites conserved at both the functional and sequence levels. 31 refs., 4 figs.

  18. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    SciTech Connect

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons, including 10 coding exons, were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5'-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that results from the use of an alternative splice acceptor site located in exon 10. The use of this site produces a transcript lacking the codon for a single glutamate. The promoter for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing because in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.

  19. The population genomics of begomoviruses: global scale population structure and gene flow

    PubMed Central

    2010-01-01

    Background The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets available is that for the dicotyledonous plant-infecting genus Begomovirus, in the family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can be further subdivided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified.
We believe that the global Begomovirus population structure revealed here could facilitate population genetics studies

  20. The human BARX2 gene: genomic structure, chromosomal localization, and single nucleotide polymorphisms.

    PubMed

    Hjalt, T A; Murray, J C

    1999-12-15

    The BARX genes 1 and 2 are Bar class homeobox genes expressed in craniofacial structures during development. In this report, we present the genomic structure, chromosomal localization, and polymorphic markers of BARX2. The gene has four exons, ranging in size from 85 to 1099 bp. BARX2 is localized on human chromosome 11q25, as determined by radiation hybrid mapping. Mutations in PITX2 cause some cases of Rieger syndrome, an autosomal dominant disorder affecting the eyes, teeth, and umbilicus. In the mouse, Barx2 is coexpressed with Pitx2 in several tissues; on the basis of this coexpression, BARX2 was considered a candidate gene for those cases of Rieger syndrome that cannot be attributed to mutations of PITX2. DNA from Rieger patients was subjected to single-strand conformation polymorphism screening of the BARX2 coding region. Three single nucleotide polymorphisms were found in a normal population, but no etiologic mutations were detected in over 100 cases of Rieger syndrome or in individuals with related ocular disorders. PMID:10644443

  21. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    PubMed Central

    2012-01-01

    Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13

  22. Analysis of the murine Dtk gene identifies conservation of genomic structure within a new receptor tyrosine kinase subfamily

    SciTech Connect

    Lewis, P.M.; Crosier, K.E.; Crosier, P.S.

    1996-01-01

    The receptor tyrosine kinase Dtk/Tyro 3/Sky/rse/brt/tif is a member of a new subfamily of receptors that also includes Axl/Ufo/Ark and Eyk/Mer. These receptors are characterized by the presence of two immunoglobulin-like loops and two fibronectin type III repeats in their extracellular domains. The structure of the murine Dtk gene has been determined. The gene consists of 21 exons that are distributed over 21 kb of genomic DNA. An isoform of Dtk is generated by differential splicing of exons from the 5' region of the gene. The overall genomic structure of Dtk is virtually identical to that determined for the human UFO gene. This particular genomic organization is likely to have been duplicated and closely maintained throughout evolution. 38 refs., 3 figs., 1 tab.

  23. The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

    SciTech Connect

    Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

    1997-03-01

    The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those of the genes encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, each encoded on a single exon, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results is that one or more exon duplication and insertion events distributed this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.

  24. The mouse formin (Fmn) gene: Genomic structure, novel exons, and genetic mapping

    SciTech Connect

    Wang, C.C.; Chan, D.C.; Leder, P.

    1997-02-01

    Mutations in the mouse formin (Fmn) gene, formerly known as the limb deformity (ld) gene, give rise to recessively inherited limb deformities and renal malformations or aplasia. The Fmn gene encodes many differentially processed transcripts that are expressed in both adult and embryonic tissues. To study the genomic organization of the Fmn locus, we have used Fmn probes to isolate and characterize genomic clones spanning 500 kb. Our analysis of these clones shows that the Fmn gene is composed of at least 24 exons and spans 400 kb. We have identified two novel exons that are expressed in the developing embryonic limb bud as well as adult tissues such as brain and kidney. We have also used a microsatellite polymorphism from within the Fmn gene to map it genetically to a 2.2-cM interval between D2Mit58 and D2Mit103. 36 refs., 6 figs., 1 tab.

  5. Structural Variants in the Soybean Genome Localize to Clusters of Biotic Stress-Response Genes

    PubMed Central

    McHale, Leah K.; Haun, William J.; Xu, Wayne W.; Bhaskar, Pudota B.; Anderson, Justin E.; Hyten, David L.; Gerhardt, Daniel J.; Jeddeloh, Jeffrey A.; Stupar, Robert M.

    2012-01-01

    Genome-wide structural and gene content variations are hypothesized to drive important phenotypic variation within a species. Structural and gene content variations were assessed among four soybean (Glycine max) genotypes using array hybridization and targeted resequencing. Many chromosomes exhibited relatively low rates of structural variation (SV) among genotypes. However, several regions exhibited both copy number and presence-absence variation, the most prominent found on chromosomes 3, 6, 7, 16, and 18. Interestingly, the regions most enriched for SV were specifically localized to gene-rich regions that harbor clustered multigene families. The most abundant classes of gene families associated with these regions were the nucleotide-binding and receptor-like protein classes, both of which are important for plant biotic defense. The colocalization of SV with plant defense response signal transduction pathways provides insight into the mechanisms of soybean resistance gene evolution and may inform the development of new approaches to resistance gene cloning. PMID:22696021

  6. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    PubMed

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903

  7. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea)

    PubMed Central

    Gao, Feng; Song, Weibo; Katz, Laura A.

    2014-01-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that: 1) alternative processing is extensive among gene families; and 2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals: 1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and 2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903

  8. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    SciTech Connect

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A; Lindroth, Richard L; Yuan, Yinan

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  9. Human tissue factor pathway inhibitor (TFPI) gene: Complete genomic structure and localization on the genetic map of chromosome 2q

    SciTech Connect

    Enjyoji, Kei-ichi; Emi, Mitsuru; Mukai, Tsunehiro; Imada, Motohiro; Kato, Hisao; Leppert, M.L.; Lalouel, J.M. (Univ. of Utah Medical School, Salt Lake City, UT)

    1993-08-01

    Tissue factor pathway inhibitor (TFPI), a protease inhibitor that circulates in association with plasma lipoproteins (VLDL, LDL and HDL), helps to regulate the extrinsic blood coagulation cascade. The authors have cloned a 125-kb genomic region containing the entire human TFPI gene on six overlapping cosmids and prepared a restriction map of this contig to clarify gene structure. More than half (45 kb) of the 85-kb gene is occupied by 5′ noncoding elements: coding begins at exon 3. A HindIII RFLP identified with one cosmid was genotyped in the CEPH panel of 559 reference families. Linkage analysis using markers on human chromosome 2 located the TFPI gene on 2q, 36 cM proximal to D2S43 (pYNZ15) and 13 cM distal to the crystallin γ-polypeptide locus CRYGP1 (p5G1). 31 refs., 3 figs., 3 tabs.

  10. Genomic structure and complete nucleotide sequence of the Batten disease gene, CLN3

    SciTech Connect

    Mitchison, H.M.; Munroe, P.B.; O'Rawe, A.M.

    1997-03-01

    We recently cloned a cDNA for CLN3, the gene for juvenile-onset neuronal ceroid lipofuscinosis or Batten disease. To resolve the genomic organization, we used a cosmid clone containing CLN3 to sequence the entire gene in addition to 1.1 kb 5′ of the start of the published CLN3 cDNA and 0.3 kb 3′ of the polyadenylation site. CLN3 is organized into at least 15 exons spanning 15 kb and ranging from 47 to 356 bp. The 14 introns vary from 80 to 4227 bp, and all exon/intron junction sequences conform to the GT-AG rule. Numerous repetitive Alu elements are present within the introns and the 5′- and 3′-untranslated regions. The 5′ region of the CLN3 gene contains several potential transcription regulatory elements, but no consensus TATA-1 box was identified. CLN3 is homologous to 27 deposited human ESTs, and sequence comparisons suggest alternative splicing of the gene and the existence of transcribed sequences upstream of the start of the published CLN3 cDNA. 19 refs., 2 figs., 1 tab.

  11. Characterization of the genomic structure of the mouse APLP1 gene

    SciTech Connect

    Zhong, Sue; Wu, Kuo; Black, I.B.; Schaar, D.G.

    1996-02-15

    This article reports on the organization of the mouse APLP1 gene, which encodes an evolutionarily conserved amyloid precursor-like protein. The amyloid beta protein, important in Alzheimer's disease, is derived from these precursor proteins. By investigating the expression and structure of this murine gene, the authors hope to learn more about the function and regulation of the human homologue. 27 refs., 2 figs.

  12. An analysis by restriction enzymes of the genomic structure of the 3' untranslated region of the human estrogen receptor gene.

    PubMed

    Keaveney, M; Neilan, J; Gannon, F

    1989-04-12

    The estrogen receptor gene has a very long 3' untranslated region. As a first step towards the analysis of this structural feature for any functional role, we have cloned the human genomic estrogen receptor gene. Extensive restriction enzyme analysis of this DNA and comparison of the sizes of the DNA fragments obtained with those predicted from published cDNA sequences indicate that the 3' exon extends for at least 4304 bases from base number 2018 in the cDNA to the end of the cDNA. The data also show that the most 3' intron in this gene occurs between bases 1902 and 2018 of the cDNA. PMID:2930778

  13. From genes to genome biology

    SciTech Connect

    Pennisi, E.

    1996-06-21

    This article describes a change in the approach to mapping genomes, from looking at one gene at a time to analyzing many genes together. Strategies include everything from lab techniques to computer programs designed to analyze whole batches of genes at once. Also included is an update on the work on the human genome.

  14. Mapping of a gene coding for a major late structural polypeptide on the vaccinia virus genome.

    PubMed Central

    Wittek, R; Hänggi, M; Hiller, G

    1984-01-01

    Cell-free translation of total RNA isolated from vaccinia virus-infected cells late in infection results in a complex mixture of polypeptides. A monospecific antibody directed against one of the major structural proteins of the virus particle immunoprecipitated a single polypeptide with a molecular weight of 11,000 (11K) from this mixture. Immunoprecipitation was therefore used to identify the structural polypeptide among the in vitro translation products of RNA purified by hybridization selection to restriction fragments of the vaccinia virus genome. This allowed us to map the mRNA coding for the 11K polypeptide to the extreme left-hand end of the HindIII E fragment. Detailed transcriptional mapping of this region of the genome by nuclease S1 analysis revealed the presence of a late RNA transcribed from the rightward-reading strand. Its 5' end mapped at ca. 130 base pairs to the left of the HindIII site at the junction between the HindIII F and E fragments. The map position of this RNA coincided precisely with the map position of the late message coding for the 11K polypeptide. PMID:6319738

  15. Genomic structure and chromosomal localization of the human deoxycytidine kinase gene

    SciTech Connect

    Song, J.J.; Walker, S.; Gribbin, T.; Chen, E. (Univ. of North Carolina, Chapel Hill); Johnson, E.E.; Spychala, J.; Mitchell, B.S.

    1993-01-15

    Deoxycytidine kinase (NTP:deoxycytidine 5′-phosphotransferase, EC 2.7.1.74) is an enzyme that catalyzes phosphorylation of deoxyribonucleosides and a number of nucleoside analogs that are important in antiviral and cancer chemotherapy. Deficiency of this enzyme activity is associated with resistance to these agents, whereas increased enzyme activity is associated with increased activation of such compounds to cytotoxic nucleoside triphosphate derivatives. To characterize the regulation of expression of this gene, we have isolated genomic clones encompassing its entire coding and 5′ flanking regions and delineated all the exon/intron boundaries. The gene extends over more than 34 kilobases on chromosome 4, and the coding region is composed of 7 exons ranging in size from 90 to 1544 base pairs (bp). The 5′ flanking region is highly G+C-rich and contains four potential Sp1 binding sites. A 697-bp fragment encompassing 386 bp of 5′ upstream region, the 250-bp first exon, and 61 bp of the first intron was demonstrated to promote chloramphenicol acetyltransferase activity in a T-lymphoblast cell line and to have >6-fold greater activity in a Jurkat T-lymphoblast than in a Raji B-lymphoblast cell line. Our data suggest that these 5′ sequences may contain elements that are important for the tissue-specific differences in deoxycytidine kinase expression. 32 refs., 4 figs., 2 tabs.

  16. Genomic structure, gene expression, and promoter analysis of human multidrug resistance-associated protein 7

    SciTech Connect

    Kao, Hsin-Hsin; Chang, Ming-Shi; Cheng, Jan-Fang; Huang, Jin-Ding

    2002-03-15

    Transporters of the multidrug resistance-associated protein (MRP) subfamily mediate anticancer drug efflux and contribute to the multidrug resistance of cancer cells. The genomic organization of the human multidrug resistance-associated protein 7 (MRP7) gene was determined. The human MRP7 gene, consisting of 22 exons and 21 introns, differs greatly in structure from other members of the human MRP subfamily. A splicing variant of human MRP7, MRP7A, expressed in most human tissues, was also characterized. The 1.93-kb promoter region of MRP7 was isolated and shown to support luciferase activity at a level 4- to 5-fold greater than that of the SV40 promoter. Basal MRP7 gene expression was regulated by 2 regions in the 5′-flanking region, at −1,780 to −1,287 bp and at −611 to −208 bp. In Madin-Darby canine kidney (MDCK) cells, MRP7 promoter activity was increased by 226 percent by genotoxic 2-acetylaminofluorene and by 347 percent by the histone deacetylase inhibitor trichostatin A. The protein was expressed in the membrane fraction of transfected MDCK cells.

  17. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions

    PubMed Central

    2013-01-01

    Background The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants, and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss housekeeping and metabolic genes

  18. Genomic structure and chromosomal mapping of the human CD22 gene

    SciTech Connect

    Wilson, G.L.; Kozlow, E.; Kehrl, J.H.; Najfeld, V.; Menniger, J.; Ward, D.

    1993-06-01

    The human CD22 gene is expressed specifically in B lymphocytes and likely has an important function in cell-cell interactions. A nearly full-length human CD22 cDNA clone was used to isolate genomic clones that span the CD22 gene. The CD22 gene is spread over 22 kb of DNA and is composed of 15 exons. The first exon contains the major transcriptional start sites. The translation initiation codon is located in exon 3, which also encodes a portion of the signal peptide. Exons 4 to 10 encode the seven Ig domains of CD22, exon 11 encodes the transmembrane domain, exons 12 to 15 encode the intracytoplasmic domain of CD22, and exon 15 also contains the 3' untranslated region. A minor form of CD22 mRNA likely results from splicing of exon 5 to exon 8, skipping exons 6 and 7. A 4.6-kb XbaI fragment of the CD22 gene was used to map the chromosomal location of CD22 by fluorescence in situ hybridization. The hybridization locus was identified by combining fluorescent images of the probe with the chromosomal banding pattern generated by an Alu probe. The results demonstrate that CD22 is located within band q13.1 of chromosome 19. Two closely clustered major transcription start sites and several minor start sites were mapped by primer extension. As in many other lymphoid-specific genes, the CD22 promoter lacks an obvious TATA box. Approximately 4 kb of DNA 5' of the transcription start sites were sequenced and found to contain multiple Alu elements. Potential binding sites for the transcription factors NF-κB, AP-1, and Oct-2 are located within 300 bp 5' of the major transcription start sites. A 400-bp fragment (bp -339 through +71) of the CD22 promoter region was subcloned into a pGEM-chloramphenicol acetyltransferase vector and, after transfection, was found to be active in both B and T cells. 45 refs., 7 figs., 2 tabs.

  19. Clustering of gene ontology terms in genomes.

    PubMed

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and in which functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed in which a fixed number of genes was analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed, and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species, but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave the way for further studies of the mechanisms that shape genome structure and the evolutionary forces related to them. PMID:24995610

  20. Genome-wide prediction of nucleosome occupancy in maize reveals plant chromatin structural features at genes and other elements at multiple scales.

    PubMed

    Fincher, Justin A; Vera, Daniel L; Hughes, Diana D; McGinnis, Karen M; Dennis, Jonathan H; Bass, Hank W

    2013-06-01

    The nucleosome is a fundamental structural and functional chromatin unit that affects nearly all DNA-templated events in eukaryotic genomes. It is also a biochemical substrate for higher order, cis-acting gene expression codes and the monomeric structural unit for chromatin packaging at multiple scales. To predict the nucleosome landscape of a model plant genome, we used a support vector machine computational algorithm trained on human chromatin to predict the nucleosome occupancy likelihood (NOL) across the maize (Zea mays) genome. Experimentally validated NOL plots provide a novel genomic annotation that highlights gene structures, repetitive elements, and chromosome-scale domains likely to reflect regional gene density. We established a new genome browser (http://www.genomaize.org) for viewing support vector machine-based NOL scores. This annotation provides sequence-based comprehensive coverage across the entire genome, including repetitive genomic regions typically excluded from experimental genomics data. We find that transposable elements often displayed family-specific NOL profiles that included distinct regions, especially near their termini, predicted to have strong affinities for nucleosomes. We examined transcription start site consensus NOL plots for maize gene sets and discovered that most maize genes display a typical +1 nucleosome positioning signal just downstream of the start site but not upstream. This overall lack of a -1 nucleosome positioning signal was also predicted by our method for Arabidopsis (Arabidopsis thaliana) genes and verified by additional analysis of previously published Arabidopsis MNase-Seq data, revealing a general feature of plant promoters. Our study advances plant chromatin research by defining the potential contribution of the DNA sequence to observed nucleosome positioning and provides an invariant baseline annotation against which other genomic data can be compared. PMID:23572549

  1. The mouse lp(A3)/Edg7 lysophosphatidic acid receptor gene: genomic structure, chromosomal localization, and expression pattern.

    PubMed

    Contos, J J; Chun, J

    2001-04-18

    The extracellular signaling molecule, lysophosphatidic acid (LPA), mediates proliferative and morphological effects on cells and has been proposed to be involved in several biological processes including neuronal development, wound healing, and cancer progression. Three mammalian G protein-coupled receptors, encoded by genes designated lp (lysophospholipid) receptor or edg (endothelial differentiation gene), mediate the effects of LPA, activating similar (e.g. Ca(2+) release) as well as distinct (neurite retraction) responses. To understand the evolution and function of LPA receptor genes, we characterized lp(A3)/Edg7 in mouse and human and compared the expression pattern with the other two known LPA receptor genes (lp(A1)/Edg2 and lp(A2)/Edg4). We found mouse and human lp(A3) to have nearly identical three-exon genomic structures, with introns upstream of the coding region for transmembrane domain (TMD) I and within the coding region for TMD VI. This structure is similar to lp(A1) and lp(A2), indicating a common ancestral gene with two introns. We localized mouse lp(A3) to distal Chromosome 3 near the varitint waddler (Va) gene, in a region syntenic with the human lp(A3) chromosomal location (1p22.3-31.1). We found the highest expression levels of each of the three LPA receptor genes in adult mouse testes, relatively high expression levels of lp(A2) and lp(A3) in kidney, and moderate expression of lp(A2) and lp(A3) in lung. All lp(A) transcripts were expressed during brain development, with lp(A1) and lp(A2) transcripts expressed during the embryonic neurogenic period and the lp(A3) transcript during the early postnatal period. Our results indicate both overlapping and distinct functions of lp(A1), lp(A2), and lp(A3). PMID:11313151

  2. Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cycles of whole genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied...

  3. Genomic structure analysis of promoter sequence of a mouse mu opioid receptor gene.

    PubMed Central

    Min, B H; Augustin, L B; Felsheim, R F; Fuchs, J A; Loh, H H

    1994-01-01

    We have isolated mouse mu opioid receptor genomic clones (termed MOR) containing the entire amino acid coding sequence corresponding to rat MOR-1 cDNA, including additional 5' flanking sequence. The mouse MOR gene is > 53 kb long, and the coding sequence is divided by three introns, with exon junctions in codons 95 and 213 and between codons 386 and 387. The first intron is > 26 kb, the second is 0.8 kb, and the third is > 12 kb. Multiple transcription initiation sites were observed, with four major sites confirmed by 5' rapid amplification of cDNA ends and RNase protection located between 291 and 268 bp upstream of the translation start codon. Comparison of the 5' flanking sequence with a transcription factor database revealed putative cis-acting regulatory elements for transcription factors affected by cAMP, as well as those involved in the action of gluco- and mineralocorticoids, cytokines, and immune-cell-specific factors. PMID:8090773

  4. Chloroplast Genome Sequence of the Moss Tortula ruralis: Gene Content and Structural Arrangement Relative to Other Green Plant Chloroplast Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tortula ruralis, a widely distributed moss species in the family Pottiaceae, is increasingly being used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of Tortula ruralis, only the second publishe...

  5. Genomic Survey, Gene Expression Analysis and Structural Modeling Suggest Diverse Roles of DNA Methyltransferases in Legumes

    PubMed Central

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes: chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified into the known MET, CMT, DRM and DNA nucleotide methyltransferase (DNMT2) subfamilies based on their domain organization. The first three subfamilies represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. We also report a structural comparison of all the plant MTases with known MTases from mammalian and plant systems, to assign structural features in the context of the biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of the CMT and MET proteins. In addition, the structural model of DRM suggested that circular permutation of motifs does not affect the overall structure of the DNA methyltransferase domain. These results provide valuable insights into the role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and in the response to various abiotic stresses. Overall, our study should help in establishing the specific functions of DNA MTases in legumes. PMID:24586452

  6. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage

    PubMed Central

    2012-01-01

    Background Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants. PMID:22925495

  7. Evolution of Pulmonate Gastropod Mitochondrial Genomes: Comparisons of Gene Organizations of Euhadra, Cepaea and Albinaria and Implications of Unusual tRNA Secondary Structures

    PubMed Central

    Yamazaki, N.; Ueshima, R.; Terrett, J. A.; Yokobori, S. I.; Kaifu, M.; Segawa, R.; Kobayashi, T.; Numachi, K. I.; Ueda, T.; Nishikawa, K.; Watanabe, K.; Thomas, R. H.

    1997-01-01

    Complete gene organizations of the mitochondrial genomes of three pulmonate gastropods, Euhadra herklotsi, Cepaea nemoralis and Albinaria coerulea, permit comparisons of their gene organizations. Euhadra and Cepaea are classified in the same superfamily, Helicoidea, yet they show several differences in the order of tRNA and protein coding genes. Albinaria is distantly related to the other two genera but shares the same gene order in one part of its mitochondrial genome with Euhadra and in another part with Cepaea. Despite their small size (14.1-14.5 kbp), these snail mtDNAs encode 13 protein genes, two rRNA genes and at least 22 tRNA genes. These genomes exhibit several unusual or unique features compared to other published metazoan mitochondrial genomes, including those of other molluscs. Several tRNAs predicted from the DNA sequences possess bizarre structures lacking either the T stem or the D stem, similar to the situation seen in nematode mt-tRNAs. The acceptor stems of many tRNAs show a considerable number of mismatched basepairs, indicating that the RNA editing process recently demonstrated in Euhadra is widespread in the pulmonate gastropods. Strong selection acting on mitochondrial genomes of these animals would have resulted in frequent occurrence of the mismatched basepairs in regions of overlapping genes. PMID:9055084

  8. Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer’s Disease

    PubMed Central

    Stein, Jason L.; Hua, Xue; Morra, Jonathan H.; Lee, Suh; Hibar, Derrek P.; Ho, April J.; Leow, Alex D.; Toga, Arthur W.; Sul, Jae Hoon; Kang, Hyun Min; Eskin, Eleazar; Saykin, Andrew J.; Shen, Li; Foroud, Tatiana; Pankratz, Nathan; Huentelman, Matthew J.; Craig, David W.; Gerber, Jill D.; Allen, April N.; Corneveaux, Jason J.; Stephan, Dietrich A.; Webster, Jennifer; DeChairo, Bryan M.; Potkin, Steven G.; Jack, Clifford R.; Weiner, Michael W.; Thompson, Paul M.

    2010-01-01

    In a genome-wide association study of structural brain degeneration, we mapped the 3D profile of temporal lobe volume differences in 742 brain MRI scans of Alzheimer’s disease (AD) patients, subjects with mild cognitive impairment (MCI), and healthy elderly subjects. After searching 546,314 genomic markers, 2 single nucleotide polymorphisms (SNPs) were associated with bilateral temporal lobe volume (P < 5×10⁻⁷). One SNP, rs10845840, is located in the GRIN2B gene, which encodes the N-Methyl-D-Aspartate (NMDA) glutamate receptor NR2B subunit. This protein, which is involved in learning and memory and in excitotoxic cell death, has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer’s disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI subjects versus controls (odds ratio = 1.273; P = 0.039) and were associated with the mini-mental state exam (MMSE; t = −2.114; P = 0.035), demonstrating a negative effect on global cognitive function. Voxelwise maps of genetic association of this SNP with regional brain volumes revealed intense temporal lobe effects (FDR correction at q = 0.05; critical P = 0.0257). This study uses large-scale brain mapping for gene discovery, with implications for Alzheimer’s disease. PMID:20197096

  9. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short, non-coding and highly diverse in sequence, and are therefore mostly either not annotated or incorrectly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp of sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies at base-pair level. Solanaceae SINEs are dispersed over all chromosomes, with enrichment in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution, including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. PMID:26996788

  10. Genome-wide Analyses of the Structural Gene Families Involved in the Legume-specific 5-Deoxyisoflavonoid Biosynthesis of Lotus japonicus

    PubMed Central

    Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio

    2007-01-01

    A model legume, Lotus japonicus (Regel) K. Larsen, is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters on the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423

  11. Genomics screens for metastasis genes

    PubMed Central

    Yan, Jinchun; Huang, Qihong

    2014-01-01

    Metastasis is responsible for most cancer mortality. The process of metastasis is complex, requiring the coordinated expression and fine regulation of many genes in multiple pathways in both the tumor and host tissues. Identification and characterization of the genetic programs that regulate metastasis is critical to understanding the metastatic process and discovering molecular targets for the prevention and treatment of metastasis. Genomic approaches and functional genomic analyses can systematically discover metastasis genes. In this review, we summarize the genetic tools and methods that have been used to identify and characterize the genes that play critical roles in metastasis. PMID:22684367

  12. Characterisation of a genomic clone covering the structural mouse MyoD1 gene and its promoter region.

    PubMed Central

    Zingg, J M; Alva, G P; Jost, J P

    1991-01-01

    We have isolated the mouse MyoD1 gene flanked by its promoter region by screening a genomic library with synthetic oligonucleotides. The structural gene is interrupted by two G + C rich introns. Transfection of the cloned gene inserted into an expression vector converts fibroblasts to myoblasts. Sequence analysis of about 650 bp of the 5' upstream region revealed the presence of several potential regulatory elements such as a TATA-box, an AP2-box, two SP1-boxes and a CAAT-box. In addition, there are three half palindromic estrogen response elements, a potential cAMP response element and various muscle specific elements such as a muscle-specific CAAT-box (MCAT) and four potential binding sites for MyoD1. Using S1 protection analysis the major start site of transcription in muscle and myoblast cells was mapped 3 bp upstream of the published cDNA 5' end. Promoter activity of the 650 bp upstream fragment was tested by in vitro transcription and by transfection analysis of myoblasts and fibroblasts. In all promoter test systems used, MyoD1 promoter activity was detected in myoblasts as well as in fibroblasts. Furthermore, DNA methylation was found to turn off MyoD1 promoter activity both in myoblasts and in fibroblasts. PMID:1754380

  13. Genomic structure of the human plasma prekallikrein gene, identification of allelic variants, and analysis in end-stage renal disease.

    PubMed

    Yu, H; Anderson, P J; Freedman, B I; Rich, S S; Bowden, D W

    2000-10-15

    Kallikreins are serine proteases that catalyze the release of kinins and other vasoactive peptides. Previously, we have studied one tissue-specific (H. Yu et al., 1996, J. Am. Soc. Nephrol. 7: 2559-2564) and one plasma-specific (H. Yu et al., 1998, Hypertension 31: 906-911) human kallikrein gene in end-stage renal disease (ESRD). Short sequence repeat polymorphisms for the human plasma kallikrein gene (KLKB1; previously known as KLK3) on chromosome 4 were associated with ESRD in an African American study population. This study of KLKB1 in ESRD has been extended by determining the genomic structure of KLKB1 and searching for allelic variants that may be associated with ESRD. Exon-spanning PCR primer sets were identified by serial testing of primer pairs designed from KLKB1 cDNA sequence and DNA sequencing of PCR products. Like the rat plasma kallikrein gene and the closely related human factor XI gene, the human KLKB1 gene contains 15 exons and 14 introns. The longest intron, F, is almost 12 kb long. The total length of the gene is approximately 30 kb. Sequence of the 5'-proximal promoter region of KLKB1 was obtained by shotgun cloning of genomic fragments from a bacterial artificial clone containing the KLKB1 gene, followed by screening of the clones using exon 1-specific probes. Primers flanking the exons and 5'-proximal promoter region were used to screen for allelic variants in the genomic DNA from ESRD patients and controls using the single-strand conformation polymorphism technique. We identified 12 allelic variants in the 5'-proximal promoter and 7 exons. Of note was a common polymorphism (30% of the population) at position 521 of KLKB1 cDNA, which leads to the replacement of asparagine with serine at position 124 in the heavy chain of the A2 domain of the protein. In addition, an A716C polymorphism in exon 7 resulting in the amino acid change H189P in the A3 domain of the heavy chain was observed in 5 patients belonging to 3 ESRD families. A third...

  14. Cytosines, but not purines, determine recombination activating gene (RAG)-induced breaks on heteroduplex DNA structures: implications for genomic instability.

    PubMed

    Naik, Abani Kanta; Lieber, Michael R; Raghavan, Sathees C

    2010-03-01

    The sequence specificity of the recombination activating gene (RAG) complex during V(D)J recombination has been well studied. RAGs can also act as a structure-specific nuclease; however, little is known about the mechanism of this activity. Here, we show that in addition to DNA structure, sequence dictates the pattern and efficiency of RAG cleavage on altered DNA structures. Cytosine nucleotides are preferentially nicked by RAGs when present at single-stranded regions of heteroduplex DNA. Although unpaired thymine nucleotides are also nicked, the efficiency is many-fold weaker. Induction of single- or double-strand breaks by RAGs depends on the position of cytosines and whether they are present on one or both of the strands. Interestingly, RAGs are unable to induce breaks when adenine or guanine nucleotides are present at single-strand regions. The nucleotide present immediately next to the bubble sequence could also affect RAG cleavage. Hence, we propose "C(d)C(s)C(s)" (d, double-stranded; s, single-stranded) as a consensus sequence for RAG-induced breaks at single-/double-strand DNA transitions. Such a consensus sequence motif is useful for explaining RAG cleavage on other types of DNA structures described in the literature. Therefore, the mechanism of RAG cleavage described here could explain facets of chromosomal rearrangements specific to lymphoid tissues leading to genomic instability. PMID:20051517

  15. Dynamic structures in phytoplasma genomes: sequence variable mosaics (SVMs) of clustered genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Emergence of the phytoplasma clade from an Acholeplasma-like ancestor gave rise to an intriguing group of cell wall-less prokaryotes through a remarkable and continuing evolutionary process. In a ceaseless progression, phytoplasmas have evolved reduced genomes, losing biochemical pathways for synth...

  16. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a winemaking process used to produce specific beverages in several European countries, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large-scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profiles of six flor strains originating from these four countries with those of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  17. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

    PubMed Central

    Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

    2012-01-01

    Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921

  18. The impact of genome-wide supported schizophrenia risk variants in the neurogranin gene on brain structure and function.

    PubMed

    Walton, Esther; Geisler, Daniel; Hass, Johanna; Liu, Jingyu; Turner, Jessica; Yendiki, Anastasia; Smolka, Michael N; Ho, Beng-Choon; Manoach, Dara S; Gollub, Randy L; Roessner, Veit; Calhoun, Vince D; Ehrlich, Stefan

    2013-01-01

    The neural mechanisms underlying genetic risk for schizophrenia, a highly heritable psychiatric condition, are still under investigation. New schizophrenia risk genes discovered through genome-wide association studies (GWAS), such as neurogranin (NRGN), can be used to identify these mechanisms. In this study we examined the association of two common NRGN risk single nucleotide polymorphisms (SNPs) with functional and structural brain-based intermediate phenotypes for schizophrenia. We obtained structural MRI, functional MRI and genotype data of 92 schizophrenia patients and 114 healthy volunteers from the multisite Mind Clinical Imaging Consortium study. Two schizophrenia-associated NRGN SNPs (rs12807809 and rs12541) were tested for association with working memory-elicited dorsolateral prefrontal cortex (DLPFC) activity and surface-wide cortical thickness. NRGN rs12541 risk allele homozygotes (TT) displayed increased working memory-related activity in several brain regions, including the left DLPFC, left insula, left somatosensory cortex and the cingulate cortex, when compared to non-risk allele carriers. NRGN rs12807809 non-risk allele (C) carriers showed reduced cortical gray matter thickness compared to risk allele homozygotes (TT) in an area comprising the right pericalcarine gyrus, the right cuneus, and the right lingual gyrus. Our study highlights the effects of schizophrenia risk variants in the NRGN gene on functional and structural brain-based intermediate phenotypes for schizophrenia. These results support recent GWAS findings and further implicate NRGN in the pathophysiology of schizophrenia by suggesting that genetic NRGN risk variants contribute to subtle changes in neural functioning and anatomy that can be quantified with neuroimaging methods. PMID:24098564

  19. The FAD2 Gene Family of Soybean: Insights into the Structural and Functional Divergence of a Paleopolyploid Genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The omega-6 fatty acid desaturase (FAD2) gene family in soybean consists of at least five members in four regions of the genome. These desaturases are responsible for the conversion of oleic acid to linoleic acid. Bacterial artificial chromosomes (BACs) corresponding to these loci were sequenced to ...

  20. Comparative assessment of the pig, mouse, and human genomes: A structural and functional analysis of genes involved in immunity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A detailed analysis was conducted on portions of the porcine, murine, and human genome associated with the immune response. It was found that non-protein coding RNA/DNA that potentially interact and regulate gene expression, nucleotide similarity, isochore type, and the similarity of 5’ and 3’ UTR ...

  1. Gene expression patterns are correlated with genomic and genic structure in soybean

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Studies have indicated that exon and intron size, and intergenic distance are correlated with gene expression levels and expression breadth. Previous studies on these correlations in plants and animals have been conflicting. In this study next-generation sequence data of the soybean transcriptome wa...

  2. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    PubMed Central

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA-related clusters were identified on chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin-related loci were also detected, with eight candidate homologues on chromosomes 4, 9 and 11. Conclusion This genome-wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggest that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  3. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity

    PubMed Central

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of synonymous codons, which occurs in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule also holds in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found that NNA and NNT (A- and T-ending codons) are preferred in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among plant taxa, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied intron numbers and among exons, indicating that the bias toward NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation–induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that SCUB profiles in the plastid genome are distinctly different from those in the nuclear genome. PMID:25922569

  4. Chromosomal localization, genomic structure, and allelic polymorphism of the human CD79a (Ig-α/mb-1) gene

    SciTech Connect

    Hashimoto, S.; Gregersen, P.K.; Chiorazzi, N.; Mohrenweiser, H.W.

    1994-12-31

    The germline DNA sequence of the human CD79a (Ig-α/mb-1) gene was determined by polymerase chain reaction sequencing of a cosmid clone derived from an arrayed human chromosome 19 library. The CD79a gene was localized to chromosome 19q13.2; this localization places the gene within the CEA-like gene cluster with the following gene order: -CEA-CGM1-CD79a-RPS11-ATP1A3-BGP-CGM9-. The genomic organization of the human CD79a gene resembles the mouse counterpart, with five exons interrupted by four introns. Computer analyses suggest the presence of transcription regulatory elements known to be important in the regulation of mouse CD79a (AP-1, EBF, AP-2, MUF2, and SP-1 sites), as well as elements not found in the mouse gene (an NF-κB binding site and a series of E-box motifs). Similar to the mouse gene, the 5′ flanking region of human CD79a lacks a TATA box; however, unlike mouse CD79a, a classical octamer motif could not be identified in the human gene. Finally, a new Rsa I restriction fragment length polymorphism was defined in the non-coding regions of the human gene. 64 refs., 4 figs., 2 tabs.

  5. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... A Brief Guide to Genomics: DNA, Genes and Genomes. Deoxyribonucleic acid (DNA) is the ... and lead to a disease such as cancer. DNA Sequencing: Sequencing simply means determining the exact order ...

  6. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    PubMed Central

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  7. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides.

    PubMed

    Benítez-Páez, Alfonso; Moreno, F Javier; Sanz, María L; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  8. Characterization of gene rearrangements resulted from genomic structural aberrations in human esophageal squamous cell carcinoma KYSE150 cells.

    PubMed

    Hao, Jia-Jie; Gong, Ting; Zhang, Yu; Shi, Zhi-Zhou; Xu, Xin; Dong, Jin-Tang; Zhan, Qi-Min; Fu, Song-Bin; Wang, Ming-Rong

    2013-01-15

    Chromosomal rearrangements and the genes they involve have been reported to play important roles in the development and progression of human malignancies, but the gene rearrangements in esophageal squamous cell carcinoma (ESCC) remain to be identified. In the present study, array-based comparative genomic hybridization (array-CGH) was performed on the ESCC cell line KYSE150. Eight disrupted genes were detected at clearly distinct unbalanced breakpoints. The splitting of these genes was validated by dual-color fluorescence in-situ hybridization (FISH). By using rapid amplification of cDNA ends (RACE), genome walking and sequencing analysis, we further identified gene disruptions and rearrangements. A fusion transcript DTL-1q42.2 was derived from an intrachromosomal rearrangement of chromosome 1. Highly amplified segments of DTL and PTPRD were self-rearranged. The sequences on either side of the junctions possess micro-homology with each other. FISH results indicated that the split DTL and PTPRD were also involved in comprising parts of the derivative chromosomes resulting from t(1q;9p;12p) and t(9;1;9). Further, we found that regions harboring DTL (1q32.3) and PTPRD (9p23) were also split in ESCC tumors. The data add significant information to the existing genetic background of KYSE150, which may be used as a model for studying these gene rearrangements. PMID:23026210

  9. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    PubMed

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact and truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro- and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into an S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus. PMID:26941188

  10. Tripartite mitochondrial genome of spinach: physical structure, mitochondrial gene mapping, and locations of transposed chloroplast DNA sequences.

    PubMed Central

    Stern, D B; Palmer, J D

    1986-01-01

    A complete physical map of the spinach mitochondrial genome has been established. The entire sequence content of 327 kilobase pairs (kb) is postulated to occur as a single circular molecule. Two directly repeated elements of approximately 6 kb, located on this "master chromosome", are proposed to participate in an intragenomic recombination event that reversibly generates two "subgenomic" circles of 93 kb and 234 kb. The positions of protein and ribosomal RNA-encoding genes, determined by heterologous filter hybridizations, are scattered throughout the genome, with duplicate 26S rRNA genes located partially or entirely within the 6 kb repeat elements. Filter hybridizations between spinach mitochondrial DNA and cloned segments of spinach chloroplast DNA reveal at least twelve dispersed regions of inter-organellar sequence homology. PMID:3016660

  11. Genomic contributions in livestock gene introgression programmes

    PubMed Central

    Wall, Eileen; Visscher, Peter M; Hospital, Frédéric; Woolliams, John A

    2005-01-01

    The composition of the genome after introgression of a marker gene from a donor to a recipient breed was studied using analytical and simulation methods. Theoretical predictions of proportional genomic contributions, including donor linkage drag, from ancestors used at each generation of crossing after an introgression programme agreed closely with simulated results. The obligate drag, the donor genome surrounding the target locus that cannot be removed by subsequent selection, was also studied. The number of backcross generations and the length of the chromosome were shown to affect proportional genomic contributions to the carrier chromosomes. Population structure had no significant effect on ancestral contributions and linkage drag, but it did affect the obligate drag: larger offspring groups resulted in smaller obligate drag. The implications of the number of backcross generations, the population structure and the carrier chromosome length for an introgression programme are discussed. The derived equations describing contributions to the genome from individuals of a given generation provide a framework to predict the genomic composition of a population after the introgression of a favourable donor allele. These ancestral contributions can be assigned a value and therefore allow the prediction of genetic lag. PMID:15823237

  12. Genome-wide identification, structural analysis and new insights into late embryogenesis abundant (LEA) gene family formation pattern in Brassica napus

    PubMed Central

    Liang, Yu; Xiong, Ziyi; Zheng, Jianxiao; Xu, Dongyang; Zhu, Zeyang; Xiang, Jun; Gan, Jianping; Raboanatahiry, Nadia; Yin, Yongtai; Li, Maoteng

    2016-01-01

    Late embryogenesis abundant (LEA) proteins are a diverse and large group of polypeptides that play important roles in desiccation and freezing tolerance in plants. The LEA family has been systematically characterized in some plants but not in Brassica napus. In this study, 108 BnLEA genes were identified in the B. napus genome and classified into eight families based on their conserved domains. Protein sequence alignments revealed an abundance of alanine, lysine and glutamic acid residues in BnLEA proteins. BnLEA genes contain few introns (<3) and are distributed unevenly across all 19 chromosomes of B. napus, occurring as gene clusters on chromosomes A9, C2, C4 and C5. More than two-thirds of the BnLEA genes are associated with segmental duplication. Synteny analysis revealed that most LEA genes are conserved, although gene losses and gains were also identified. These results suggest that segmental duplication and whole-genome duplication played a major role in the expansion of the BnLEA gene family. Expression profile analysis indicated that expression of most BnLEAs was increased in leaves and late-stage seeds. This study presents a comprehensive overview of the LEA gene family in B. napus and provides new insights into the formation of this family. PMID:27072743

  14. Heat Shock Protein 70 and 90 Genes in the Harmful Dinoflagellate Cochlodinium polykrikoides: Genomic Structures and Transcriptional Responses to Environmental Stresses

    PubMed Central

    Guo, Ruoyu; Youn, Seok Hyun; Ki, Jang-Seu

    2015-01-01

    The marine dinoflagellate Cochlodinium polykrikoides is responsible for harmful algal blooms in aquatic environments and has spread into the world's oceans. As a microeukaryote, it seems to have distinct genomic characteristics, such as gene structure and regulation. In the present study, we characterized heat shock protein (HSP) 70/90 of C. polykrikoides and evaluated their transcriptional responses to environmental stresses. Both HSPs contained the conserved motif patterns and showed the highest homology with those of other dinoflagellates. Genomic analysis showed that CpHSP70 had no intron and was encoded in a tandem arrangement, with copies separated by intergenic spacers. In contrast, CpHSP90 had one intron in its coding genomic region, and no intergenic region was found. Phylogenetic analyses of the individual HSPs showed that CpHSP70 was closely related to that of the dinoflagellate Crypthecodinium cohnii, and CpHSP90 to those of other Gymnodiniales. Gene expression analyses showed that both HSP genes were upregulated by treatment with the algicides CuSO4 and NaOCl, but downregulated by PCB treatment. The transcription of CpHSP90 and CpHSP70 showed similar expression patterns under the same toxicant treatment, suggesting that both genes might have cooperative functions in toxicant-induced gene regulation in the dinoflagellate. PMID:26064872

  15. Genes for two calcium-dependent cell adhesion molecules have similar structures and are arranged in tandem in the chicken genome.

    PubMed Central

    Sorkin, B C; Gallin, W J; Edelman, G M; Cunningham, B A

    1991-01-01

    Genomic sequences immediately upstream of the translational start site for the chicken liver cell adhesion molecule (L-CAM) gene contain a second closely related gene, which, because of its location, we have designated the K-CAM gene. Less than 700 base pairs separate the presumed poly(A) site in the K-CAM gene from the translation initiation site for L-CAM. The sizes of exons 4-15 of the K-CAM gene are almost identical to those in the L-CAM gene and the exon/intron junctions occur at exactly equivalent positions in both genes. Exon 16, which includes the 3' untranslated region, is much shorter in the K-CAM gene and intron sizes and sequences are not generally conserved between the two genes. Probes from the K-CAM gene hybridized to a 3-kilobase mRNA that was present at high levels in embryonic skin, at lower levels in kidney, heart, and gizzard, and at still lower levels in brain and liver, as determined by Northern blotting. The sequence of the predicted gene product was nearly identical to that of the chicken B-cadherin cDNA, although the distribution of the K-CAM gene transcript differed from that reported for the cadherin. The proximity and identical overall structure of the K-CAM and L-CAM genes strongly suggest that they arose by gene duplication and raise the possibility that genes for other calcium-dependent CAMs may be located in clusters. Moreover, the tandem arrangement of the genes may have important implications for the regulation of their expression. PMID:1763068

  16. Horizontal gene transfer and the rock record: comparative genomics of phylogenetically distant bacteria that induce wrinkle structure formation in modern sediments.

    PubMed

    Flood, B E; Bailey, J V; Biddle, J F

    2014-03-01

    Wrinkle structures are sedimentary features that are produced primarily through the trapping and binding of siliciclastic sediments by mat-forming micro-organisms. Wrinkle structures and related sedimentary structures in the rock record are commonly interpreted to represent the stabilizing influence of cyanobacteria on sediments because cyanobacteria are known to produce similar textures and structures in modern tidal flat settings. However, other extant bacteria such as filamentous representatives of the family Beggiatoaceae can also interact with sediments to produce sedimentary features that morphologically resemble many of those associated with cyanobacteria-dominated mats. While Beggiatoa spp. and cyanobacteria are metabolically and phylogenetically distant, genomic analyses show that the two groups share hundreds of homologous genes, likely as the result of horizontal gene transfer. The comparative genomics results described here suggest that some horizontally transferred genes may code for phenotypic traits such as filament formation, chemotaxis, and the production of extracellular polymeric substances that potentially underlie the similar biostabilizing influences of these organisms on sediments. We suggest that the ecological utility of certain basic life modes such as the construction of mats and biofilms, coupled with the lateral mobility of genes in the microbial world, introduces an element of uncertainty into the inference of specific phylogenetic origins from gross morphological features preserved in the ancient rock record. PMID:24382125

  17. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype.

    PubMed

    Linz, John E; Wee, Josephine; Roze, Ludmila V

    2014-08-01

    The filamentous fungi Aspergillus parasiticus and Aspergillus flavus produce the carcinogenic secondary metabolite aflatoxin on susceptible crops. These species differ in the quantity of aflatoxins B1, B2, G1, and G2 produced in culture, in the ability to produce the mycotoxin cyclopiazonic acid, and in morphology of mycelia and conidiospores. To understand the genetic basis for differences in biochemistry and morphology, we conducted next-generation sequencing (NGS) analysis of the A. parasiticus strain SU-1 genome and comparative gene expression (RNA sequence analysis [RNA Seq]) analysis of A. parasiticus SU-1 and A. flavus strain NRRL 3357 (3357) grown under aflatoxin-inducing and -noninducing culture conditions. Although A. parasiticus SU-1 and A. flavus 3357 are highly similar in genome structure and gene organization, we observed differences in the presence of specific mycotoxin gene clusters and differential expression of specific mycotoxin genes and gene clusters that help explain differences in the type and quantity of mycotoxins synthesized. Using computer-aided analysis of secondary metabolite clusters (antiSMASH), we demonstrated that A. parasiticus SU-1 and A. flavus 3357 may carry up to 93 secondary metabolite gene clusters, and surprisingly, up to 10% of the genome appears to be dedicated to secondary metabolite synthesis. The data also suggest that fungus-specific zinc binuclear cluster (C6) transcription factors play an important role in regulation of secondary metabolite cluster expression. Finally, we identified uniquely expressed genes in A. parasiticus SU-1 that encode C6 transcription factors and genes involved in secondary metabolism and stress response/cellular defense. Future work will focus on these differentially expressed A. parasiticus SU-1 loci to reveal their role in determining distinct species characteristics. PMID:24951444

  18. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having a broad spectrum of activity. Also described are antimicrobial genes, and their expression products, from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  19. Global efforts in structural genomics.

    PubMed

    Stevens, R C; Yokoyama, S; Wilson, I A

    2001-10-01

    A worldwide initiative in structural genomics aims to capitalize on the recent successes of the genome projects. Substantial new investments in structural genomics in the past 2 years indicate the high level of support for these international efforts. Already, enormous progress has been made on high-throughput methodologies and technologies that will speed up macromolecular structure determinations. Recent international meetings have resulted in the formation of an International Structural Genomics Organization to formulate policy and foster cooperation between the public and private efforts. PMID:11588249

  20. Characterization of the genomic structure and tissue-specific promoter of the human nuclear receptor NR5A2 (hB1F) gene.

    PubMed

    Zhang, C K; Lin, W; Cai, Y N; Xu, P L; Dong, H; Li, M; Kong, Y Y; Fu, G; Xie, Y H; Huang, G M; Wang, Y

    2001-08-01

    The human homologue of the Drosophila melanogaster orphan nuclear receptor fushi tarazu factor 1 (Ftz-F1), NR5A2 (hB1F), was initially identified as a regulatory factor that binds and activates enhancer II of hepatitis B virus. NR5A2 (hB1F) is expressed specifically in pancreas and liver, playing important roles in the regulation of several liver-specific genes. A detailed analysis of the genomic structure and promoter activity will greatly facilitate future studies on the function of the NR5A2 (hB1F) gene. In this report, a bacterial artificial chromosome clone and several phage clones covering the NR5A2 (hB1F) gene were isolated and the complete genomic sequence was obtained. Alignment of different cDNAs of the NR5A2 (hB1F) gene with the genomic sequence facilitated the delineation of its structural organization, which spans over 150 kb and consists of eight exons interrupted by seven introns. RT-PCR and 3'-RACE revealed that utilization of two polyadenylation signals results in the 3.8 and 5.2 kb transcripts that were observed previously. The transcription start site of the NR5A2 (hB1F) gene was mapped downstream of a canonical TATA box. An upstream fragment containing binding sites for several liver-specific and ubiquitous transcription factors exhibits hepatocyte-specific promoter activity. Transient transfections indicated that hepatocyte nuclear factors HNF1 and HNF3beta could activate the NR5A2 (hB1F) promoter. PMID:11595170

  1. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthetic pathway for 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of the CHI-fold protein family, which encompasses subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization confirmed the localization of the two LangCHIL genes on distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in roots during the third week of plant growth; thereafter, expression declined. Expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, the LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be the ancestral gene, whereas LangCHIL1 probably arose by duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and a cross-genera synteny survey, identified representatives of all main subfamilies except CHI1. They are as follows: two copies each of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of a whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  2. Genomic organization of mouse gene zfp162.

    PubMed

    Wrehlke, C; Wiedemeyer, W R; Schmitt-Wrede, H P; Mincheva, A; Lichter, P; Wunderlich, F

    1999-05-01

    We report the cloning and characterization of the alternatively spliced mouse gene zfp162, formerly termed mzfm, the homolog of the human ZFM1 gene encoding the splicing factor SF1 and a putative signal transduction and activation of RNA (STAR) protein. The zfp162 gene is about 14 kb long and consists of 14 exons and 13 introns. Comparison of zfp162 with the genomic sequences of ZFM1/SF1 revealed that the exon-intron structure and exon sequences are well conserved between the genes, whereas the introns differ in length and sequence composition. Using fluorescence in situ hybridization, the zfp162 gene was assigned to chromosome 19, region B. Screening of a genomic library constructed in lambda DASH II resulted in the identification of the 5'-flanking region of zfp162. Sequence analysis of this region showed that zfp162 is a TATA-less gene containing an initiator control element and two CCAAT boxes. The promoter contains the following motifs: AP-2, CRE, Ets, GRE, HNF5, MRE, SP-1, TRE, TCF1, and PU.1. The core promoter, from position -331 to -157, contains the motifs CRE, SP-1, MRE, and AP-2, as determined in transfected CHO-K1 cells and IC-21 cells by reporter gene assay using a secreted form of human placental alkaline phosphatase. The occurrence of PU.1/GRE supports the view that the zfp162 gene encodes a protein involved not only in nuclear RNA metabolism, like the human ZFM1/SF1, but also in as yet unknown macrophage-inherent functions. PMID:10360842

  3. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    PubMed Central

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.; Bogdanove, Adam J.

    2016-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456

  4. Persistence drives gene clustering in bacterial genomes

    PubMed Central

    Fang, Gang; Rocha, Eduardo PC; Danchin, Antoine

    2008-01-01

    Background Gene clustering plays an important role in the organization of the bacterial chromosome, and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering have taken into account neither the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those present in a majority of organisms (persistent genes) and those present in very few organisms (rare genes). Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet genes persistently present in bacterial genomes are also clustered, and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test whether known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that it is the strong selective pressure acting on the function of persistent genes, in bacterial genomes whose gene content is in permanent flux while their size remains fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering. PMID:18179692

  5. Genome Majority Vote Improves Gene Predictions

    PubMed Central

    Wall, Michael E.; Raghavan, Sindhu; Cohn, Judith D.; Dunbar, John

    2011-01-01

    Recent studies have noted extensive inconsistencies in gene start sites among orthologous genes in related microbial genomes. Here we provide the first documented evidence that imposing gene start consistency improves the accuracy of gene start-site prediction. We applied an algorithm using a genome majority vote (GMV) scheme to increase the consistency of gene starts among orthologs. We used a set of validated Escherichia coli genes as a standard to quantify accuracy. Results showed that the GMV algorithm can correct hundreds of gene prediction errors in sets of five or ten genomes while introducing few errors. Using a conservative calculation, we project that GMV would resolve many inconsistencies and errors in publicly available microbial gene maps. Our simple and logical solution provides a notable advance toward accurate gene maps. PMID:22131910

  6. A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

    PubMed

    Perualila-Tan, Nolen; Kasim, Adetayo; Talloen, Willem; Verbist, Bie; Göhlmann, Hinrich W H; Shkedy, Ziv

    2016-08-01

    The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity, taking into account the influence of the chemical structure of the compound on both variables. The model allows detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50, and their associations, can be investigated simultaneously. Two oncology projects are used to illustrate the applicability and usefulness of the joint model to integrate multi-source high-dimensional information to aid drug discovery. PMID:27269248

  7. Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication

    PubMed Central

    2009-01-01

    Background Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a major challenge in structural and comparative crop genomics. Results We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice as many genes as Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. The absence of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago. Conclusions This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution. PMID:19821981

  8. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter.

    PubMed

    Sheppard, Samuel K; Jolley, Keith A; Maiden, Martin C J

    2012-01-01

    Campylobacteriosis remains a major human public health problem worldwide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, particularly the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying those genes or allelic variants responsible for host association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website. PMID:24704917

  9. Genome-wide identification of BURP domain-containing genes in rice reveals a gene family with diverse structures and responses to abiotic stresses.

    PubMed

    Ding, Xipeng; Hou, Xin; Xie, Kabin; Xiong, Lizhong

    2009-06-01

    Increasing evidence suggests that a gene family encoding proteins containing BURP domains has diverse functions in plants, but a systematic characterization of this gene family has not been reported. In this study, 17 BURP family genes (OsBURP01-17) were identified and analyzed in rice (Oryza sativa L.). These genes have diverse exon-intron structures and distinct organizations of putative motifs. Based on phylogenetic analysis of BURP protein sequences from rice and other plant species, the BURP family was classified into seven subfamilies, including two subfamilies (BURP V and BURP VI) with members from rice only and one subfamily (BURP VII) with members from monocotyledons only. Two BURP gene clusters, belonging to BURP V and BURP VI, were located in the duplicated regions of rice chromosomes 5 and 6, respectively. Transcript-level analysis of rice BURP genes in various tissues and organs revealed different spatiotemporal expression patterns, suggesting that these genes may function at different stages of plant growth and development. Interestingly, all the genes of the BURP VII subfamily were predominantly expressed in floral organs. We also investigated the expression patterns of rice BURP genes under different stress conditions. The results suggested that, except for two genes (OsBURP01 and OsBURP13), all other members were induced by at least one of the stresses, including drought, salt, cold, and abscisic acid treatment. Two genes (OsBURP05 and OsBURP16) were responsive to all the stress treatments, and most of the OsBURP genes were responsive to salt stress. Promoter sequence analysis revealed an over-abundance of stress-related cis-elements in the stress-responsive genes. The data presented here provide important clues for elucidating the functions of genes in this family. PMID:19363683

  10. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  11. Evolution of the P-type II ATPase gene family in the fungi and presence of structural genomic changes among isolates of Glomus intraradices

    PubMed Central

    Corradi, Nicolas; Sanders, Ian R

    2006-01-01

    Background The P-type II ATPase gene family encodes proteins with an important role in adaptation of the cell to variation in external K+, Ca2+ and Na+ concentrations. The presence of P-type II gene subfamilies that are specific for certain kingdoms has been reported but was sometimes contradicted by discovery of previously unknown homologous sequences in newly sequenced genomes. Members of this gene family have been sampled in all of the fungal phyla except the arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota), which are known to play a key role in terrestrial ecosystems and to be genetically highly variable within populations. Here we used highly degenerate primers on AMF genomic DNA to increase the sampling of fungal P-type II ATPases and to test previous predictions about their evolution. In parallel, homologous sequences of the P-type II ATPases have been used to determine the nature and amount of polymorphism that is present at these loci among isolates of Glomus intraradices harvested from the same field. Results In this study, four P-type II ATPase sub-families have been isolated from three AMF species. We show that, contrary to previous predictions, P-type IIC ATPases are present in all basal fungal taxa. Additionally, P-type IIE ATPases should no longer be considered as exclusive to the Ascomycota and the Basidiomycota, since we also demonstrate their presence in the Zygomycota. Finally, a comparison of homologous sequences encoding P-type IID ATPases showed unexpectedly that indel mutations among coding regions, as well as specific gene duplications, occur among AMF individuals within the same field. Conclusion On the basis of these results we suggest that the diversification of P-type IIC and E ATPases followed the diversification of the extant fungal phyla with independent events of gene gains and losses. Consistent with recent findings on the human genome, but at a much smaller geographic scale, we provide evidence that structural genomic

  12. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = log2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
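
    To make the role of k concrete, here is a minimal sketch that picks k = log2(n) (rounded to the nearest integer, which is our assumption) and computes the empirical Shannon entropy of the k-mer distribution, normalized by the largest entropy the sequence could reach. The function names are hypothetical, and this is a generic k-mer entropy, not a reimplementation of the authors' informational indexes or laws.

```python
import math
from collections import Counter


def choose_k(n: int) -> int:
    """k = log2(n), rounded to the nearest integer (the rounding rule is an assumption)."""
    return max(1, round(math.log2(n)))


def kmer_entropy(seq: str, k: int) -> float:
    """Empirical Shannon entropy, in bits, of the k-mer frequency distribution of seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    # abs() turns a possible -0.0 (single distinct k-mer) into 0.0
    return abs(-sum((c / total) * math.log2(c / total) for c in counts.values()))


def max_entropy(seq: str, k: int) -> float:
    """Upper bound: log2 of the number of k-mer positions, or 2k bits for DNA, whichever is smaller."""
    positions = len(seq) - k + 1
    return min(math.log2(positions), 2 * k)


genome = "ACGTTGCAAGGCTTAC" * 64  # toy sequence, not real genomic data
k = choose_k(len(genome))
print(k, round(kmer_entropy(genome, k) / max_entropy(genome, k), 3))
```

    A highly repetitive toy sequence like the one above scores far below 1.0, whereas a random-like sequence of the same length would approach it; that contrast with random genomes is the kind of comparison the abstract refers to.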

  13. Informational laws of genome structures.

    PubMed

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

  14. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

  15. Connected gene neighborhoods in prokaryotic genomes

    PubMed Central

    Rogozin, Igor B.; Makarova, Kira S.; Murvai, Janos; Czabarka, Eva; Wolf, Yuri I.; Tatusov, Roman L.; Szekely, Laszlo A.; Koonin, Eugene V.

    2002-01-01

    A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes, which are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon ‘genomic hitchhiking’. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genomic hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages. PMID:12000841
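
    The idea that overlapping, partially conserved arrays can chain into a neighborhood that no single genome contains in full can be illustrated with a small sketch. Assuming each genome is given as a list of COG identifiers in chromosomal order, it keeps adjacent COG pairs seen in at least two genomes and merges pairs that share a COG using union-find. This is a simplified illustration of our own (it ignores strand and transcription direction), not the authors' procedure; the function names and toy data are hypothetical.

```python
from collections import Counter


def conserved_pairs(genomes, min_genomes=2):
    """Unordered adjacent COG pairs found in at least min_genomes genomes."""
    seen = Counter()
    for order in genomes:
        # count each pair at most once per genome
        seen.update({frozenset((a, b)) for a, b in zip(order, order[1:]) if a != b})
    return [tuple(sorted(p)) for p, c in seen.items() if c >= min_genomes]


def neighborhoods(pairs):
    """Merge pairs that share a COG into connected neighborhoods (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy gene orders: B-C-D and B-C-E overlap, so they merge into one neighborhood.
genomes = [
    ["A", "B", "C", "D"],
    ["B", "C", "E"],
    ["D", "C", "B", "F"],
    ["Z", "E", "C"],
]
print(neighborhoods(conserved_pairs(genomes)))
```

    In the toy data, no single genome carries B, C, D, and E contiguously, yet the conserved pairs B-C, C-D, and C-E link them into one connected neighborhood.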

  16. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes, where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and we develop a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes exhibited the expected characteristics (being indigenous and sharing the same history) more strongly than the non-core genes. Conclusion The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470
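
    As a rough intuition for "an order of OGs that maximally retains conserved gene orders", the pairwise case reduces to a longest common subsequence (LCS) of two OG orders. The sketch below assumes linear genomes given as lists of OG identifiers and finds one LCS by dynamic programming. The published method handles many genomes and segment length constraints, so treat this as a simplified two-genome illustration; the names are hypothetical.

```python
def lcs_order(a, b):
    """One longest common subsequence of two OG orders (dynamic programming)."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Trace back one optimal solution from the start of both orders
    i = j = 0
    out = []
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


genome_a = ["og1", "og2", "og3", "og4", "og5"]
genome_b = ["og1", "og3", "og2", "og4", "og5"]  # og2/og3 locally swapped
print(lcs_order(genome_a, genome_b))
```

    Circular chromosomes, inversions, and the requirement that retained segments be "sufficiently long" are all ignored here; those are exactly the complications a real core-structure method must address.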

  17. Identification of genes in genomic and EST sequences

    SciTech Connect

    Fields, C.; Adams, M.D.; Kerlavage, A.R.; Dubnick, M.; McCombie, W.R.; Martin-Gallardo, A.; Venter, J.C.; White, O.

    1993-12-31

    Currently available software tools are capable of predicting the locations of most protein-coding genes in anonymous genomic DNA sequences. The use of predicted exons to select primers for PCR amplification from cDNA libraries allows the complete structures of novel genes to be determined efficiently. As the number of expressed sequence tag (EST) sequences increases, the fraction of genes that can be localized in genomic sequences by searching EST databases will rapidly approach unity. The challenge for automated DNA sequence analysis is now to develop methods for accurately predicting gene structure and alternative splicing patterns. Substantially improving current accuracies in gene structure prediction will require retrospective comparative analysis of sequences from different organisms and gene families.
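
    The simplest building block of protein-coding gene prediction is open reading frame (ORF) scanning. The sketch below searches the three forward-strand frames for an ATG followed by the first in-frame stop codon (TAA, TAG, or TGA). It is far cruder than the exon-level predictors the abstract refers to (no splicing, no reverse strand, no statistical models), and the function name and `min_codons` threshold are illustrative choices.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_orfs(seq, min_codons=3):
    """Forward-strand ORFs as (start, end) pairs, 0-based, end exclusive, stop codon included.

    Each ORF runs from the first ATG after the previous stop to the next in-frame
    stop; internal ATGs inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))
```

    An ATG without a downstream in-frame stop is left open and not reported, which is a conservative choice for sequence fragments.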

  18. Directed self-assembly, genomic assembly complexity and the formation of biological structure, or, what are the genes for nacre?

    PubMed

    Cartwright, Julyan H E

    2016-03-13

    Biology uses dynamical mechanisms of self-organization and self-assembly of materials, but it also choreographs and directs these processes. The difference between abiotic self-assembly and a biological process is rather like the difference between setting up and running an experiment to make a material remotely compared with doing it in one's own laboratory: with a remote experiment (say, on the International Space Station), everything must be set up beforehand to let the experiment run 'hands off', but in the laboratory one can intervene at any point in a 'hands-on' approach. It is clear that the latter process, of directed self-assembly, can allow much more complicated experiments and produce far more complex structures than self-assembly alone. This control over self-assembly in biology is exercised at certain key waypoints along a trajectory and the process may be quantified in terms of the genomic assembly complexity of a biomaterial. PMID:26857670

  19. Ligninolytic peroxidase genes in the oyster mushroom genome: heterologous expression, molecular structure, catalytic and stability properties, and lignin-degrading ability

    PubMed Central

    2014-01-01

    Background The genome of Pleurotus ostreatus, an important edible mushroom and a model ligninolytic organism of interest in lignocellulose biorefineries due to its ability to delignify agricultural wastes, was sequenced with the purpose of identifying and characterizing the enzymes responsible for lignin degradation. Results Heterologous expression of the class II peroxidase genes, followed by kinetic studies, enabled their functional classification. The resulting inventory revealed the absence of lignin peroxidases (LiPs) and the presence of three versatile peroxidases (VPs) and six manganese peroxidases (MnPs); the crystal structures of two of them (VP1 and MnP4) were solved at 1.0 to 1.1 Å resolution, showing significant structural differences. Gene expansion supports the importance of both peroxidase types in the white-rot lifestyle of this fungus. Using a lignin model dimer and synthetic lignin, we showed that VP is able to degrade lignin. Moreover, the dual Mn-mediated and Mn-independent activity of P. ostreatus MnPs justifies their inclusion in a new peroxidase subfamily. The availability of the whole POD repertoire enabled investigation, at a biochemical level, of the existence of duplicated genes. Differences between isoenzymes are not limited to their kinetic constants. Surprising differences in their activity T50 and residual activity at both acidic and alkaline pH were observed. Directed mutagenesis and spectroscopic/structural information were combined to explain the catalytic and stability properties of the most interesting isoenzymes, and their evolutionary history was analyzed in the context of over 200 basidiomycete peroxidase sequences. Conclusions The analysis of the P. ostreatus genome shows a lignin-degrading system where the role generally played by LiP has been assumed by VP. Moreover, it enabled the first characterization of the complete set of peroxidase isoenzymes in a basidiomycete, revealing strong differences in stability properties and providing

  20. JGI Plant Genomics Gene Annotation Pipeline

    SciTech Connect

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with large amounts of repeats, genome duplication, and tandem duplication. Genes encode a wealth of information useful for studying an organism, so high-quality, stable gene annotation is critical. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. Annotating or re-annotating genes from these large amounts of sequence data in a timely fashion requires an automatic pipeline. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, aided by an RNA-seq transcriptome assembly pipeline. It uses several gene predictors based on homologous peptides and transcript ORFs; see Methods for details. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice; the exception is Chlamydomonas (chlamy), which was annotated by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front-page snapshot are shown below.

  1. Gene Insertion Into Genomic Safe Harbors for Human Gene Therapy.

    PubMed

    Papapetrou, Eirini P; Schambach, Axel

    2016-04-01

    Genomic safe harbors (GSHs) are sites in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements: (i) function predictably and (ii) do not cause alterations of the host genome posing a risk to the host cell or organism. GSHs are thus ideal sites for transgene insertion whose use can empower functional genetics studies in basic research and therapeutic applications in human gene therapy. Currently, no fully validated GSHs exist in the human genome. Here, we review our previously proposed GSH criteria and discuss additional considerations on extending these criteria, on strategies for the identification and validation of GSHs, as well as future prospects on GSH targeting for therapeutic applications. In view of recent advances in genome biology, gene targeting technologies, and regenerative medicine, gene insertion into GSHs can potentially catalyze nearly all applications in human gene therapy. PMID:26867951
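
    Criteria-based GSH identification amounts to filtering candidate integration sites against a genome annotation. The sketch below shows that filtering pattern with two distance rules: keep a site only if it lies outside and sufficiently far from every gene, with a larger margin around cancer-related genes. The rules, the `Gene` record, and the default thresholds are hypothetical placeholders chosen for illustration; they are not taken from the abstract, and the review itself should be consulted for the actual criteria.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    cancer_related: bool = False


def distance(pos: int, gene: Gene) -> int:
    """Distance in bp from a position to a gene interval (0 if the position is inside)."""
    if gene.start <= pos < gene.end:
        return 0
    return gene.start - pos if pos < gene.start else pos - (gene.end - 1)


def passes_gsh_filters(pos, genes, gene_flank=50_000, cancer_flank=300_000):
    """True if pos satisfies both (placeholder) distance rules for every annotated gene."""
    for g in genes:
        d = distance(pos, g)
        if d < gene_flank:
            return False
        if g.cancer_related and d < cancer_flank:
            return False
    return True


genes = [Gene("g1", 100_000, 120_000), Gene("onc1", 1_000_000, 1_010_000, True)]
for site in (110_000, 500_000, 800_000):
    print(site, passes_gsh_filters(site, genes))
```

    A real screen would load genome-wide annotations and use an interval index rather than a linear scan, and it would add criteria that are not distance-based.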

  2. Genes but Not Genomes Reveal Bacterial Domestication of Lactococcus Lactis

    PubMed Central

    Passerini, Delphine; Beltramo, Charlotte; Coddeville, Michele; Quentin, Yves; Ritzenthaler, Paul

    2010-01-01

    Background The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, were determined at both the gene and genome levels. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods, and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). Methodology/Principal Findings The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, which suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate with the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. Conclusion/Significance The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between “environmental” strains, the main contributors to the genetic diversity within the subspecies, and “domesticated” strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the “domesticated” strains essentially arose through substantial genomic flux within the dispensable genome
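
    In MLST, each locus sequence is mapped to an allele number, and the ordered tuple of allele numbers (the allelic profile) defines the sequence type (ST). The sketch below shows that lookup. The locus names, toy sequences, and ST table are hypothetical, and real schemes use several housekeeping loci with curated allele databases rather than exact-match dictionaries of short strings.

```python
from typing import Dict, Optional, Tuple

# Hypothetical loci and toy allele/ST tables, for illustration only.
LOCI: Tuple[str, ...] = ("locA", "locB")
ALLELES: Dict[str, Dict[str, int]] = {
    "locA": {"ACGT": 1, "ACGA": 2},
    "locB": {"TTGA": 1, "TTGC": 2},
}
STS: Dict[Tuple[int, ...], int] = {(1, 1): 1, (1, 2): 2, (2, 2): 3}


def allele_profile(isolate: Dict[str, str], allele_db, loci=LOCI) -> Tuple[Optional[int], ...]:
    """Allele number per locus, in a fixed locus order; None marks a novel allele."""
    return tuple(allele_db[locus].get(isolate[locus]) for locus in loci)


def assign_st(isolate: Dict[str, str], allele_db, st_db, loci=LOCI) -> Optional[int]:
    """Return the ST for the isolate's profile, or None for a novel allele or profile."""
    profile = allele_profile(isolate, allele_db, loci)
    if None in profile:
        return None
    return st_db.get(profile)


print(assign_st({"locA": "ACGT", "locB": "TTGC"}, ALLELES, STS))
```

    Because an ST depends only on a handful of core loci, two strains can share an ST while differing substantially elsewhere in the genome, which is consistent with the abstract's observation of up to 230 kb genome size differences within one ST.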

  3. Reconstruction of ancestral gene orders using intermediate genomes

    PubMed Central

    2015-01-01

    Background The problem of reconstructing ancestral genomes in a given phylogenetic tree arises in many different comparative genomics fields. Here, we focus on reconstructing the gene order of ancestral genomes, a problem that has been extensively studied in the past 20 years, especially with the increasing availability of whole genome DNA sequences. There are two main approaches to this problem: event-based methods, which try to find the ancestral genomes that minimize the number of rearrangement events in the tree; and homology-based methods, which look for conserved structures, such as adjacent genes in the extant genomes, to build the ancestral genomes. Results We propose algorithms that use the concept of intermediate genomes, arising in optimal pairwise rearrangement scenarios. We show that intermediate genomes have combinatorial properties that make them easy to reconstruct, and develop fast algorithms with better reconstructed ancestral genomes than current event-based methods. The proposed framework is also designed to accept extra information, such as results from homology-based approaches, giving rise to combined algorithms with better results than the original methods. PMID:26451811
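
    The two views in the abstract can be contrasted on toy gene orders. The sketch below computes the breakpoint distance (adjacencies of one genome missing from another, a simple rearrangement measure) and the set of adjacencies shared by all extant genomes (the raw signal a homology-based method builds on). It assumes linear, unsigned gene orders and does not implement the intermediate-genome algorithms; the names are hypothetical.

```python
def adjacencies(order):
    """Set of unordered adjacencies in a linear, unsigned gene order."""
    return {frozenset(p) for p in zip(order, order[1:])}


def breakpoint_distance(a, b):
    """Number of adjacencies of a that are absent from b."""
    return len(adjacencies(a) - adjacencies(b))


def conserved_adjacencies(genomes):
    """Adjacencies present in every extant genome."""
    common = adjacencies(genomes[0])
    for g in genomes[1:]:
        common &= adjacencies(g)
    return common


g1 = [1, 2, 3, 4, 5]
g2 = [1, 3, 2, 4, 5]  # the segment 2-3 is reversed
print(breakpoint_distance(g1, g2), sorted(tuple(sorted(p)) for p in conserved_adjacencies([g1, g2])))
```

    Reversal-based (event-based) methods use signed gene orders so that orientation is tracked; dropping signs, as here, keeps the example short but loses that information.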

  4. Mobilized retrotransposon Tos17 of rice by alien DNA introgression transposes into genes and causes structural and methylation alterations of a flanking genomic region.

    PubMed

    Han, F P; Liu, Z L; Tan, M; Hao, S; Fedak, G; Liu, B

    2004-01-01

    Tos17 is a copia-like endogenous retrotransposon of rice, which can be activated by various stresses such as tissue culture and alien DNA introgression. To confirm element mobilization by introgression and to study possible structural and epigenetic effects of Tos17 insertion on its target sequences, we isolated all flanking regions of Tos17 in an introgressed rice line (Tong35) that contains a minute amount of genomic DNA from wild rice (Zizania latifolia). We found apparent but limited mobilization of Tos17 in this introgression line, as reflected by an increased but stable copy number of the element in progeny of the line. Three of the five activated copies of the element have transposed into genes. Based on sequence analysis and Southern blot hybridization with several double-enzyme digests, no structural change in Tos17 could be inferred in the introgression line. Cytosine methylation status at all seven CCGG sites within Tos17 was also identical between the introgression line and its rice parent (Matsumae), with all sites heavily methylated. In contrast, changes in structure and cytosine methylation patterns were detected in one of the three low-copy genomic regions that flank newly transposed Tos17, and all changes were stably inherited through selfed generations. PMID:15703040

  5. Genome evolution in maize: from genomes back to genes.

    PubMed

    Schnable, James C

    2015-01-01

    Maize occupies dual roles as both (a) one of the big three grain species (along with rice and wheat) responsible for providing more than half of the calories consumed around the world, and (b) a model system for plant genetics and cytogenetics dating back to the origin of the field of genetics in the early twentieth century. The long history of genetic investigation in this species combined with modern genomic and quantitative genetic data has provided particular insight into the characteristics of genes linked to phenotypes and how these genes differ from many other sequences in plant genomes that are not easily distinguishable based on molecular data alone. These recent results suggest that the number of genes in plants that make significant contributions to phenotype may be lower than the number of genes defined by current molecular criteria, and also indicate that syntenic conservation has been underemphasized as a marker for gene function. PMID:25494463

  6. Genome-wide characterization of maize miRNA genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MicroRNAs (miRNAs) are small non-coding RNAs that play essential roles in plant growth and development. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling ident...

  7. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays

    PubMed Central

    Mak, Angel C. Y.; Lai, Yvonne Y. Y.; Lam, Ernest T.; Kwok, Tsz-Piu; Leung, Alden K. Y.; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R.; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W. C.; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J. K.; Li, Catherine M. L.; Li, Jing-Woei; Yim, Aldrin K. Y.; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y.; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. Because the source cells are diploid and the genome contains numerous repetitive elements, short-read DNA sequencing cannot detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against the map derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. PMID:26510793
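
    The core signal behind map-based size-variant calling is a discrepancy between the spacing of fluorescent labels on the sample map and on the reference map. The sketch below assumes labels have already been aligned and are given as ordered (reference position, sample position) pairs; it flags intervals whose length differs by more than 5 kb as insertions or deletions. Alignment, sizing error models, and inversion or translocation detection are all omitted, and the function name is hypothetical, not the authors' software.

```python
def call_size_svs(aligned_labels, min_size=5_000):
    """Flag insertions/deletions from consecutive aligned labels.

    aligned_labels: ordered (ref_pos, sample_pos) pairs in bp.
    Returns (kind, ref_start, ref_end, size_bp) tuples for discrepancies > min_size.
    """
    calls = []
    for (r1, q1), (r2, q2) in zip(aligned_labels, aligned_labels[1:]):
        delta = (q2 - q1) - (r2 - r1)  # sample interval minus reference interval
        if delta > min_size:
            calls.append(("insertion", r1, r2, delta))
        elif delta < -min_size:
            calls.append(("deletion", r1, r2, -delta))
    return calls


labels = [(0, 0), (10_000, 10_200), (30_000, 38_000), (50_000, 58_100), (70_000, 71_000)]
for call in call_size_svs(labels):
    print(call)
```

    Small discrepancies (here 200 bp and 100 bp) are ignored, reflecting that label-interval measurements carry sizing noise and only larger events are called.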

  8. From gene action to reactive genomes

    PubMed Central

    Keller, Evelyn Fox

    2014-01-01

    We are poised at a critical turning point in the history of genetics: recent work (e.g. in genomics, epigenetics, genomic plasticity) obliges us to critically reexamine many of our most basic concepts. For example, I argue that genomic research supports a radical transformation in our understanding of the genome – a shift from an earlier conception of that entity as an effectively static collection of active genes to that of a dynamic and reactive system dedicated to the context-specific regulation of protein-coding sequences. PMID:24882822

  9. Pichia stipitis genomics, transcriptomics, and gene clusters

    PubMed Central

    Jeffries, Thomas W; Van Vleet, Jennifer R Headman

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the result of several gene products acting together. When coinheritance is necessary for the overall physiological function, recombination and selection favor colocation of these genes in a cluster. These are particularly evident in strongly conserved and idiomatic traits. In some cases, the functional clusters consist of multiple gene families. Phylogenetic analyses of the members in each family show that once formed, functional clusters undergo duplication and differentiation. Genome-wide expression analysis reveals that regulatory patterns of clusters are similar after they have duplicated and that the expression profiles evolve along with functional differentiation of the clusters. Orthologous gene families appear to arise through tandem gene duplication, followed by differentiation in the regulatory and coding regions of the gene. Genome-wide expression analysis combined with cross-species comparisons of functional gene clusters should reveal many more aspects of eukaryotic physiology. PMID:19659741

  10. Recurrent Gene Duplication Diversifies Genome Defense Repertoire in Drosophila.

    PubMed

    Levine, Mia T; Vander Wende, Helen M; Hsieh, Emily; Baker, EmilyClare P; Malik, Harmit S

    2016-07-01

    Transposable elements (TEs) comprise large fractions of many eukaryotic genomes and imperil host genome integrity. The host genome combats these challenges by encoding proteins that silence TE activity. Both the introduction of new TEs via horizontal transfer and TE sequence evolution require constant innovation of host-encoded TE silencing machinery to keep pace with TEs. One form of host innovation is the adaptation of existing, single-copy host genes. Indeed, host suppressors of TE replication often harbor signatures of positive selection. Such signatures are especially evident in genes encoding the piwi-interacting-RNA pathway of gene silencing, for example, the female germline-restricted TE silencer, HP1D/Rhino. Host genomes can also innovate via gene duplication and divergence. However, the importance of gene family expansions, contractions, and gene turnover to host genome defense has been largely unexplored. Here, we functionally characterize Oxpecker, a young, tandem duplicate gene of HP1D/rhino. We demonstrate that Oxpecker supports female fertility in Drosophila melanogaster and silences several TE families that are incompletely silenced by HP1D/Rhino in the female germline. We further show that, like Oxpecker, at least ten additional, structurally diverse, HP1D/rhino-derived daughter and "granddaughter" genes emerged during a short 15-million-year period of Drosophila evolution. These young paralogs are transcribed primarily in germline tissues, where the genetic conflict between host genomes and TEs plays out. Our findings suggest that gene family expansion is an underappreciated yet potent evolutionary mechanism of genome defense diversification. PMID:26979388

  11. Selecting soluble/foldable protein domains through single-gene or genomic ORF filtering: structure of the head domain of Burkholderia pseudomallei antigen BPSL2063.

    PubMed

    Gourlay, Louise J; Peano, Clelia; Deantonio, Cecilia; Perletti, Lucia; Pietrelli, Alessandro; Villa, Riccardo; Matterazzo, Elena; Lassaux, Patricia; Santoro, Claudio; Puccio, Simone; Sblattero, Daniele; Bolognesi, Martino

    2015-11-01

    The 1.8 Å resolution crystal structure of a conserved domain of the potential Burkholderia pseudomallei antigen and trimeric autotransporter BPSL2063 is presented as a structural vaccinology target for melioidosis vaccine development. Since BPSL2063 (1090 amino acids) hosts only one conserved domain, and the expression/purification of the full-length protein proved to be problematic, a domain-filtering library was generated using β-lactamase as a reporter gene to select further BPSL2063 domains. As a result, two domains (D1 and D2) were identified and produced in soluble form in Escherichia coli. Furthermore, as a general tool, a genomic open reading frame-filtering library from the B. pseudomallei genome was also constructed to facilitate the selection of domain boundaries from the entire ORFeome. Such an approach allowed the selection of three potential protein antigens that were also produced in soluble form. The results support further development of ORF-filtering methods as a tool in protein-based research to improve the selection and production of soluble proteins or domains for downstream applications such as X-ray crystallography. PMID:26527140

  12. CiMT-1, an unusual chordate metallothionein gene in Ciona intestinalis genome: structure and expression studies.

    PubMed

    Franchi, Nicola; Boldrin, Francesco; Ballarin, Loriano; Piccinni, Ester

    2011-02-01

    The present article reports on the characterization of the urochordate metallothionein (MT) gene, CiMT-1, from the solitary ascidian Ciona intestinalis. The predicted protein is shorter than other known deuterostome MTs, having only 39 amino acids. The gene has the same tripartite structure as vertebrate MTs, with some features resembling those of echinoderm MTs. The promoter region shows the canonical cis-acting elements recognized by transcription factors that respond to metal, ROS, and cytokines. Unusual sequences, described in fish and echinoderms, are also present. In situ hybridization suggests that only a population of hemocytes involved in immune responses, i.e., granular amebocytes, expresses CiMT-1 mRNA. These observations support the idea that urochordates perform detoxification through hemocytes, and that MTs may play important roles in inflammatory humoral responses in tunicates. The reported data offer new clues for better understanding the evolution of these multivalent proteins from non-vertebrate to vertebrate chordates and reinforce their functions in detoxification and immunity. PMID:21328559

  13. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and genome duplication. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799
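
    The abstract describes PGDD's synteny information as colinearity between chromosomes but does not say how colinear blocks are detected. As a minimal sketch only, assuming homologous gene pairs have already been identified and each is recorded as an anchor (gene rank on chromosome A, gene rank on chromosome B), a colinear block can be approximated as the longest chain of anchors that increase in both coordinates with bounded gaps. This is a simplified illustration of anchor chaining, not PGDD's pipeline; `longest_colinear_chain` and `max_gap` are hypothetical names, and inverted (opposite-orientation) blocks are ignored.

```python
from typing import List, Tuple

# Hypothetical anchor format: (gene rank on chromosome A, gene rank on chromosome B)
Anchor = Tuple[int, int]


def longest_colinear_chain(anchors: List[Anchor], max_gap: int = 5) -> List[Anchor]:
    """Return the longest chain of anchors increasing in both coordinates.

    Consecutive anchors in the chain may be at most max_gap genes apart on
    each chromosome. Only same-orientation colinearity is considered.
    """
    pts = sorted(set(anchors))
    if not pts:
        return []
    best = [1] * len(pts)   # length of the best chain ending at pts[i]
    prev = [-1] * len(pts)  # predecessor index on that chain
    for i, (xi, yi) in enumerate(pts):
        for j in range(i):
            xj, yj = pts[j]
            if 0 < xi - xj <= max_gap and 0 < yi - yj <= max_gap and best[j] + 1 > best[i]:
                best[i] = best[j] + 1
                prev[i] = j
    # Walk back from the end of the longest chain.
    i = max(range(len(pts)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]
```

    For example, with anchors `[(1, 10), (2, 11), (4, 13), (5, 14), (20, 3), (30, 50)]` the four anchors near the diagonal form one chain, while the two scattered anchors are left out.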

  14. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  15. Gene duplication and transfer events in plant mitochondria genome

    SciTech Connect

    Xiong Aisheng; Peng Rihe; Zhuang Jing; Gao Feng; Zhu Bo; Fu Xiaoyan; Xue Yong; Jin Xiaofen; Tian Yongsheng; Zhao Wei; Yao Quanhong

    2008-11-07

    Gene or genome duplication events increase the amount of genetic material available for increasing the genomic, and thereby phenotypic, complexity of organisms during evolution. Gene duplication and transfer events have been important to molecular evolution in all three domains of life, and may be the first step in the emergence of new gene functions. Gene transfer events have been proposed as another accelerator of evolution. Duplicated genes and genomes, mainly nuclear, have been the subject of several recent reviews. In addition to the nuclear genome, organisms have organelle genomes, including the mitochondrial genome. In this review, we briefly summarize gene duplication and transfer events in the plant mitochondrial genome.

  16. Characterization of the Wilson disease gene: Genomic organization; alternative splicing; structure/function predictions; and population frequencies of disease-specific mutations

    SciTech Connect

    Petrukhin, K.; Chernov, I.; Ross, B.M.

    1994-09-01

    The Wilson disease (WD) gene has recently been identified as a putative copper-transporting ATPase with high amino acid similarity to the Menkes disease (MNK) gene. We have further characterized the WD gene by extending the 5′-coding and non-coding DNA sequence and elucidating the intron/exon structure and genomic organization. Analysis of RNA transcripts from liver, brain, kidney and placenta reveals extensive alternative splicing which may provide a mechanism to regulate the quantity of functional protein product. Comparative sequence analysis shows that WD and MNK belong to the sub-family of heavy metal-transporting ATPases with several characterizing features which include unique amino acid motifs and distinct N-terminal and C-terminal transmembrane structure. Our data indicate that the 600-amino-acid metal-binding portion of the WD and MNK proteins was formed by gene duplication events and splicing of the segment containing six metal-binding domains to a common ancestral protein. We have raised a WD-specific anti-peptide antibody to the N-terminal region and are beginning to explore the cellular and intracellular location of the WD protein. The metal-binding segment of the WD protein has been expressed in E. coli and metal binding assays are underway to characterize this aspect of the protein's function. We have identified numerous disease-specific mutations and developed a rapid “reverse dot blot” screening protocol to determine mutation frequencies in different populations. The most common mutation disrupts the characteristic SEHP motif and accounts for more than 40% of WD cases in North American, Russian, and Swedish populations. This mutation has not been observed in our limited Sicilian sample.

  17. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques. PMID:26456468

  18. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    PubMed Central

    Nakashima, Nobutaka; Miyazaki, Kentaro

    2014-01-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genome. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features of, and recent advances in, methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering. PMID:24552876

  19. Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content and rearrangements revealed by next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete plastid genome sequence of the American cranberry was reconstructed using next-generation sequencing data by in silico procedures. We used Roche 454 shotgun sequence data to isolate cranberry plastid-specific sequences of the cultivar ‘HyRed’ via homology comparisons with complete seque...

  20. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase; these sequences could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  1. Genome-wide SNPs and re-sequencing of growth habit and inflorescence genes in barley: implications for association mapping in germplasm arrays varying in size and structure

    PubMed Central

    2010-01-01

    association analyses (with SNP data only) in the larger germplasm arrays. For both vernalization sensitivity and inflorescence type, the most significant associations in the larger data sets were found with SNPs coincident with the synthetic markers used in the CAP Core and with SNPs detected via interaction analysis in the CAP Core. Conclusions: Small and highly structured collections of germplasm, such as the CAP Core, are cost-effectively phenotyped and genotyped with high-throughput markers. They are also useful for characterizing allelic diversity at loci in germplasm of interest. Our results suggest that discovery-oriented exercises in AM in such small arrays may generate a large number of false-positives. However, if haplotypes in candidate genes are available, they may be used as anchors in an analysis of interactions to identify other candidate regions harboring genes determining target traits. Using larger germplasm arrays, genome regions where the principal genes determining vernalization sensitivity and row type are located were identified. PMID:21159198

  2. Unmet Challenges of Structural Genomics

    PubMed Central

    Chruszcz, Maksymilian; Domagalski, Marcin; Osinski, Tomasz; Wlodawer, Alexander; Minor, Wladek

    2010-01-01

    Summary: Over the last decade, structural genomics (SG) programs have developed many novel methodologies for faster and more accurate structure determination. These new tools and approaches led to the determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at the molecular level. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3-D models. This situation has prompted us to review the challenges that remain unmet by structural genomics, as well as the areas in which the potential impact of SG could exceed what has been achieved so far. PMID:20810277

  3. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. PMID:27238013

  4. Structural variations in plant genomes

    PubMed Central

    Edwards, David; Varshney, Rajeev K.

    2014-01-01

    Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received significant attention in human genetics and were found to be responsible for various chronic diseases. However, little effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based techniques, and more recently, high-throughput DNA sequencing. In this review we describe recent reports of SVs in plants and the genomic technologies currently used to measure them. PMID:24907366

  5. Genes, genomes and identity. Projections on matter.

    PubMed

    Hauskeller, Christine

    2004-12-01

    This paper aims to show that references to genes and genomes are counterproductive in legal and political understandings of what it is to be human and a unique individual. To support this claim, I will give a brief overview of the many incompatible meanings the term 'identity' has gathered in reference to genes or genomes in the contexts of biology and family ancestry, personal identity, and species identity. One finds various and incompatible understandings of these expressions. While genetics is usually considered to deliver definitive knowledge about history and the future, genomics seems to work with more complicated relations between DNA, inheritance and phenotype. In genomics, 'identity' is no longer about identification and status markers but about individualization. Regulatory and legal documents project from traits to genomes, implying that individuality is at least represented, if not created, in a unique genome. Boundaries between humans and other animals, between different 'kinds' of humans, and between all individual humans are re-established via reference to the chemical matter of DNA. My analysis will show how this trend is a reactionary response to modern understandings of identities as social products and that it ignores new biomedical understandings of human bodies. PMID:15828152

  6. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background: We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results: We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions: The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
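
    The abstract does not state exactly how the Weibull function was parameterized or fitted. As a minimal sketch under stated assumptions, the two-parameter Weibull reliability function R(t) = exp(-(t/η)^β), with t taken here as a fold-occurrence threshold, can be fitted by least squares on a Weibull plot, a standard reliability-engineering technique. The helper names and the choice of "fraction of folds seen more than t times" as the empirical R(t) are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import List, Sequence, Tuple


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull reliability (survival) function R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))


def empirical_reliability(counts: Sequence[int], thresholds: Sequence[float]) -> List[float]:
    """Fraction of folds whose occurrence count exceeds each threshold t (an assumed empirical R(t))."""
    n = len(counts)
    return [sum(1 for c in counts if c > t) / n for t in thresholds]


def fit_weibull_plot(ts: Sequence[float], rs: Sequence[float]) -> Tuple[float, float]:
    """Estimate (beta, eta) by least squares on a Weibull plot.

    ln(-ln R) = beta * ln t - beta * ln eta, so regressing y = ln(-ln R)
    on x = ln t gives slope beta and intercept -beta * ln eta.
    Points with t <= 0 or R outside (0, 1) are skipped because the
    transform is undefined there.
    """
    pts = [(math.log(t), math.log(-math.log(r)))
           for t, r in zip(ts, rs) if t > 0 and 0 < r < 1]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two usable points")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("need at least two distinct t values")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    beta = sxy / sxx                          # slope of the Weibull plot
    eta = math.exp(-(my - beta * mx) / beta)  # recover eta from the intercept
    return beta, eta
```

    Fitting each genome's empirical R(t) this way yields one β per genome, which is the kind of between-genome comparison of β the abstract reports; points exactly on a Weibull curve recover β and η up to rounding.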

  7. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger, complex multimodular (composite or chimeric) and mostly multifunctional proteins through the fusion of two or more individual genes that have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and cellular functions, and thus may give the host organism advantages in adapting to its environment. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace the evolutionary processes from ancient, smaller, unifunctional proteins to present-day larger and complex multifunctional proteins, and to estimate the minimal number of ancestral proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, such analysis faces technical difficulties owing to the complexity of this biological and evolutionary process. In this study we report a new strategy to computationally identify multimodular proteins using completed genome sequences, with results from a survey of 22 organisms; data from over 40 organisms will be presented during the meeting. Additional information is contained in the original extended abstract.
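
    The abstract does not describe the new strategy itself. One common way to flag candidate fusions, sketched here purely as an illustration, is to look for a protein whose distinct, essentially non-overlapping regions are each matched by a different protein, typically from another genome. The sketch assumes alignment hits have already been computed (for example by a BLAST-style search) and reduced to query coordinates; the `Hit` record, `find_fusion_candidates`, and the 20-residue `max_overlap` tolerance are hypothetical choices, not the authors' method.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One alignment of a single-module protein (subject) onto a candidate fused protein (query)."""
    query: str    # candidate multimodular protein
    subject: str  # putative single-module protein, typically from another genome
    q_start: int  # 1-based inclusive start of the aligned region on the query
    q_end: int    # 1-based inclusive end of the aligned region on the query


def overlap(a: Hit, b: Hit) -> int:
    """Number of query residues covered by both hits (0 if disjoint)."""
    return max(0, min(a.q_end, b.q_end) - max(a.q_start, b.q_start) + 1)


def find_fusion_candidates(hits: List[Hit], max_overlap: int = 20) -> Dict[str, List[Tuple[str, str]]]:
    """Map each query to the subject pairs that align to essentially disjoint query regions."""
    by_query: Dict[str, List[Hit]] = {}
    for h in hits:
        by_query.setdefault(h.query, []).append(h)
    candidates: Dict[str, List[Tuple[str, str]]] = {}
    for query, qhits in by_query.items():
        pairs = {
            tuple(sorted((a.subject, b.subject)))
            for a, b in combinations(qhits, 2)
            if a.subject != b.subject and overlap(a, b) <= max_overlap
        }
        if pairs:
            candidates[query] = sorted(pairs)
    return candidates
```

    A query covered by two different subjects on nearly disjoint segments is reported as a possible two-module fusion; a query whose hits pile onto the same region is not.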

  8. Comparative Genomics in Identifying Aflatoxin Biosynthetic Genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus produces the most toxic and the most carcinogenic mycotoxins, aflatoxin B1 and B2. To address aflatoxin contamination of food commodities, A. flavus genomics tools have been employed to identify genes involved in aflatoxin biosynthesis. A. flavus Expressed Seque...

  9. Chromatin Structure Regulates Gene Conversion

    PubMed Central

    Cummings, W. Jason; Yabuki, Munehisa; Ordinario, Ellen C; Bednarski, David W; Quay, Simon; Maizels, Nancy

    2007-01-01

    Homology-directed repair is a powerful mechanism for maintaining and altering genomic structure. We asked how chromatin structure contributes to the use of homologous sequences as donors for repair using the chicken B cell line DT40 as a model. In DT40, immunoglobulin genes undergo regulated sequence diversification by gene conversion templated by pseudogene donors. We found that the immunoglobulin Vλ pseudogene array is characterized by histone modifications associated with active chromatin. We directly demonstrated the importance of chromatin structure for gene conversion, using a regulatable experimental system in which the heterochromatin protein HP1 (Drosophila melanogaster Su[var]205), expressed as a fusion to Escherichia coli lactose repressor, is tethered to polymerized lactose operators integrated within the pseudo-Vλ donor array. Tethered HP1 diminished histone acetylation within the pseudo-Vλ array, and altered the outcome of Vλ diversification, so that nontemplated mutations rather than templated mutations predominated. Thus, chromatin structure regulates homology-directed repair. These results suggest that histone modifications may contribute to maintaining genomic stability by preventing recombination between repetitive sequences. PMID:17880262

  10. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, spanning target selection, cloning, expression, purification, and ultimately structure determination. At the time of this writing, structural information for more than 93 percent of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  11. Horizontal gene transfer, genome innovation and evolution.

    PubMed

    Gogarten, J Peter; Townsend, Jeffrey P

    2005-09-01

    To what extent is the tree of life the best representation of the evolutionary history of microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which most homologous genes show extremely low sequence divergence, gene content can vary enormously, implying that those genes that are variably present or absent are frequently horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to provide a selective advantage to either the host or the gene itself, but could horizontally transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the boundaries between species are fuzzy, and therefore the principles of population genetics must be broadened so that they can be applied to higher taxonomic categories. PMID:16138096

  12. Complete structure, genomic organization, and expression of channel catfish (Ictalurus punctatus, Rafinesque 1818) matrix metalloproteinase-9 gene

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In the course of studying pathogenesis of enteric septicemia of catfish, we noted that the channel catfish (CC) matrix metalloproteinase-9 (MMP-9) expressed sequence tag (EST) was up-regulated after early Edwardsiella ictaluri infection. In this study, the CC MMP-9 gene was cloned, sequenced and ch...

  13. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D, and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave prediction accuracy similar to that of the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
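
    The TRN20-TST80 strategy (random 20% training, 80% testing) can be illustrated with a small sketch. The assumptions are ours, not the study's: the prediction model is a ridge regression on marker scores (an RR-BLUP-style stand-in; the study's G × E terms and population-structure adjustments are omitted), prediction accuracy is taken as the Pearson correlation between predicted and observed values in the testing set, and accuracy is averaged over random replicates. The function names, `lam`, and `reps` are hypothetical.

```python
import numpy as np


def trn_tst_split(n, train_frac=0.2, rng=None):
    """Randomly split n accessions into training and testing index sets (e.g. TRN20-TST80)."""
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    n_trn = int(round(train_frac * n))
    return perm[:n_trn], perm[n_trn:]


def ridge_marker_predict(X_trn, y_trn, X_tst, lam=1.0):
    """Ridge regression on centred marker scores; an assumed stand-in for the study's models."""
    mu = X_trn.mean(axis=0)
    Xc = X_trn - mu
    y_mean = y_trn.mean()
    p = X_trn.shape[1]
    # Solve (Xc'Xc + lam*I) b = Xc'(y - mean) for the marker effects b.
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_trn - y_mean))
    return y_mean + (X_tst - mu) @ b


def cv_accuracy(X, y, train_frac=0.2, reps=10, lam=1.0, seed=0):
    """Mean Pearson correlation between predicted and observed testing-set values."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(reps):
        trn, tst = trn_tst_split(len(y), train_frac, rng)
        y_hat = ridge_marker_predict(X[trn], y[trn], X[tst], lam)
        accs.append(np.corrcoef(y_hat, y[tst])[0, 1])
    return float(np.mean(accs))
```

    On synthetic data (for example, 500 accessions scored 0/1/2 at 50 markers with a highly heritable simulated trait), `cv_accuracy(X, y)` returns one averaged correlation per trait, analogous in form to the accuracies reported above; real landrace data would require the study's own models.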

  14. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite

  15. Functional Insights from Structural Genomics

    SciTech Connect

    Forouhar, F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights into some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded β-barrel with a novel fold (V. parahaemolyticus VPA1032).

  16. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  17. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  18. Lectin genes in the Frankia alni genome.

    PubMed

    Pujic, Petar; Fournier, Pascale; Alloisio, Nicole; Hay, Anne-Emmanuelle; Maréchal, Joelle; Anchisi, Stéphanie; Normand, Philippe

    2012-01-01

    The genome of Frankia alni strain ACN14a was scanned for determinants involved in interactions with its host plant, Alnus spp. One such type of determinant is the lectins, proteins that bind specifically to sugar motifs. The genome of F. alni was found to contain 7 lectin-coding genes, five of which were of the ricinB type. The proteins encoded by these genes contain either the lectin domain alone or an additional upstream heat shock protein or serine-threonine kinase domain. These lectins were found to have several homologs in Streptomyces spp. and a few in other bacterial genomes, including none in Frankia strains EAN1pec and CcI3 and two in strain EUN1f. One of these F. alni genes, FRAAL0616, was cloned in E. coli and fused with a reporter gene, yielding a fusion protein that was found to bind to both root hairs and bacterial hyphae. This protein was also found to modify the dynamics of nodule formation in A. glutinosa, resulting in a higher number of nodules per root. Its role could thus be to permit binding of microbial cells to root hairs and to help symbiosis occur under conditions of low Frankia cell counts, such as in pioneer situations. PMID:22159868

  19. The d4 gene family in the human genome

    SciTech Connect

    Chestkov, A.V.; Baka, I.D.; Kost, M.V.

    1996-08-15

    The d4 domain, a novel zinc finger-like structural motif, was first revealed in the rat neuro-d4 protein. Here we demonstrate that the d4 domain is conserved in evolution and that three related genes form a d4 family in the human genome. The human neuro-d4 is very similar to rat neuro-d4 at both the amino acid and the nucleotide levels. Moreover, the same splice variants have been detected among rat and human neuro-d4 transcripts. This gene has been localized on chromosome 19, and two other genes, members of the d4 family isolated by screening of the human genomic library at low stringency, have been mapped to chromosomes 11 and 14. The gene on chromosome 11 is the homolog of the ubiquitously expressed mouse gene ubi-d4/requiem, which is required for cell death after deprivation of trophic factors. A gene with a conserved d4 domain has been found in the genome of the nematode Caenorhabditis elegans. The conservation of d4 proteins from nematodes to vertebrates suggests that they have a general importance, but a diversity of d4 proteins expressed in vertebrate nervous systems suggests that some family members have special functions. 11 refs., 2 figs.

  20. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp, comprising one large single-copy region (LSC) of 87,109 bp, one small single-copy region (SSC) of 18,436 bp, and a pair of inverted repeat regions (IRs) totalling 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA genes and 8 rRNA genes. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
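    The region lengths reported above can be checked against the usual chloroplast partition, total = LSC + SSC + 2 × IR. The sketch below is only an arithmetic illustration; it assumes that the 52,196 bp figure is the combined length of both inverted repeat copies, which is what the reported numbers imply.

```python
# Arithmetic check of the reported Ilex plastome partition.
# Assumption: IR_PAIR is the combined length of the two inverted repeats.
TOTAL = 157_741
LSC = 87_109
SSC = 18_436
IR_PAIR = 52_196


def ir_copy_length(total: int, lsc: int, ssc: int) -> int:
    """Return the length of one IR copy, given total = LSC + SSC + 2 * IR."""
    remainder = total - lsc - ssc
    if remainder % 2:
        raise ValueError("two equal-length IR copies require an even remainder")
    return remainder // 2


ir = ir_copy_length(TOTAL, LSC, SSC)
print(ir, 2 * ir == IR_PAIR)  # 26098 True
```

    Under that assumption each inverted repeat copy would be 26,098 bp long.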

  1. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize the large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extent of noncoding segments ("junk DNA") in genomic organization, and the connection between the distribution of coding segments and chromatin condensation in higher eukaryotes. The following sequences taken from GenBank were analyzed: the complete genome of Xanthomonas campestris, the complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around the gene BRCA1. The results are compared with random and periodic sequences and with those generated by simple and generalized fractal Cantor sets. PMID:15858230
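    One of the tools named above, the correlation function, can be illustrated on a toy sequence. The sketch assumes a purine/pyrimidine mapping (A, G to +1; C, T to -1), which is a common convention but not necessarily the encoding used by the authors, and computes C(r) = <x_i x_{i+r}> - <x>^2 at distance r.

```python
from statistics import fmean

# Hypothetical numeric encoding (an assumption, not taken from the abstract):
# purines (A, G) -> +1, pyrimidines (C, T) -> -1.
PURINE_PYRIMIDINE = {"A": 1, "G": 1, "C": -1, "T": -1}


def encode(seq: str) -> list[int]:
    """Map a DNA string to a +/-1 series, skipping ambiguous bases such as N."""
    return [PURINE_PYRIMIDINE[b] for b in seq.upper() if b in PURINE_PYRIMIDINE]


def correlation(x: list[int], r: int) -> float:
    """Two-point correlation C(r) = <x_i * x_{i+r}> - <x>^2 at distance r."""
    if not 0 < r < len(x):
        raise ValueError("distance r must satisfy 0 < r < len(x)")
    products = [x[i] * x[i + r] for i in range(len(x) - r)]
    return fmean(products) - fmean(x) ** 2


# A strictly periodic sequence gives perfectly oscillating correlations.
periodic = encode("AC" * 50)
print(correlation(periodic, 1), correlation(periodic, 2))  # -1.0 1.0
```

    A periodic sequence like this one, or a shuffled (random) sequence whose C(r) stays near zero, gives the kind of baseline against which a genomic sequence can be compared.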

  2. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed three groups of approximately equal size: one group contains the Toll-Interleukin receptor (TIR) domain, and the two other, non-TIR groups consist mostly of proteins containing the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found that most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Moreover, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  3. Kluyveromyces lactis genome harbours a functional linker histone encoding gene.

    PubMed

    Staneva, Dessislava; Georgieva, Milena; Miloshev, George

    2016-06-01

    Linker histones are essential components of chromatin in eukaryotes. Through interactions with linker DNA and nucleosomes they facilitate folding and maintenance of higher-order chromatin structures and thus delicately modulate gene activity. The necessity of linker histones in lower eukaryotes remains controversial. Genomic data have shown that Schizosaccharomyces pombe does not possess genes encoding linker histones, while Kluyveromyces lactis has been reported to have a pseudogene. Addressing this controversy, we have provided the first direct experimental evidence for the existence of a functional linker histone gene, KlLH1, in the K. lactis genome. Sequencing of KlLH1 from both genomic DNA and copy DNA confirmed the presence of an intact open reading frame. Transcription and splicing of the KlLH1 sequence, as well as translation of its mRNA, have been studied. In silico analysis revealed homology of KlLH1p to the histone H1/H5 protein family, with a predicted three-domain structure characteristic of the linker histones of higher eukaryotes. This provides strong evidence that the yeast K. lactis does indeed possess a functional linker histone gene, implying the evolutionary preservation and significance of linker histones. The nucleotide sequences of KlLH1 are deposited in GenBank under accession numbers KT826576, KT826577 and KT826578. PMID:27189369

  4. Genomic Structure and Evolution of Multigene Families: “Flowers” on the Human Genome

    PubMed Central

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures “Flowers” because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions: the maximum identity was often concentrated at 100%, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the extension of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033
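    Flowers were recognized from patterns on dot plots. As a rough illustration of how a dot plot exposes adjacent repeats, the sketch below marks every pair of positions whose k-mers are identical; a duplicated segment then appears as a run of hits on a diagonal offset from the main one. This is a generic exact-match k-mer dot plot on a made-up sequence, not the authors' detection pipeline.

```python
def dot_plot(seq: str, k: int) -> set[tuple[int, int]]:
    """Return all (i, j) pairs whose k-mers starting at i and j are identical."""
    index: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    hits = set()
    for positions in index.values():
        for i in positions:
            for j in positions:
                hits.add((i, j))
    return hits


def off_diagonal(hits: set[tuple[int, int]], min_offset: int) -> list[tuple[int, int]]:
    """Keep hits above the main diagonal; these mark repeated segments."""
    return sorted((i, j) for i, j in hits if j - i >= min_offset)


# Hypothetical toy example: one unit duplicated on either side of a spacer.
unit = "ACGTTGCA"
seq = unit + "GGG" + unit
repeats = off_diagonal(dot_plot(seq, k=4), min_offset=1)
print(repeats)  # [(0, 11), (1, 12), (2, 13), (3, 14), (4, 15)]
```

    All hits share the offset j - i = 11, the distance between the two copies; real genome-scale comparisons use alignment-based identity thresholds rather than exact k-mer matches.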

  5. Genomic structure and evolution of multigene families: "flowers" on the human genome.

    PubMed

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures "Flowers" because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions: the maximum identity was often concentrated at 100%, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the extension of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033

  6. Evolution of galanin receptor genes: insights from the deuterostome genomes.

    PubMed

    Liu, Z; Xu, Y; Wu, L; Zhang, S

    2010-08-01

    Galanin exerts its biological activities through three different G protein-coupled receptors, Galr1, Galr2 and Galr3. To obtain insights into the evolution of Galrs, we searched the genomes of the deuterostomes by extensive BLAST survey and phylogenetic analyses. Galr2 and Galr3 share similar genomic structures, and most of them are composed of 2 exons and 1 intron. However, most Galr1 genes are composed of 3 exons and 2 introns. We did not detect typical Galr genes in the genomic databases of invertebrate deuterostomes, but three Galr1/Alstr homologs and two Galr1/Gpr151 homologs in amphioxus, two Galr1/Gpr151 homologs in sea squirt and one Galr1/Gpr151 homolog in sea urchin were identified. It is likely that the vertebrate Galr genes evolved from the homologous Galr1/Alstr/Gpr151 genes of invertebrate deuterostomes. We also propose that Galr3 genes were the products of Galr2 duplication during evolution, while Galr2 genes may have evolved from Galr1. PMID:20476798

  7. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  8. Mining the genome for lipid genes.

    PubMed

    Kuivenhoven, Jan Albert; Hegele, Robert A

    2014-10-01

    Mining of the genome for lipid genes has, since the early 1970s, helped to shape our understanding of how triglycerides are packaged (in chylomicrons), repackaged (in very low density lipoproteins; VLDL), and hydrolyzed, and also how remnant and low-density lipoproteins (LDL) are cleared from the circulation. Gene discoveries have also provided insights into high-density lipoprotein (HDL) biogenesis and remodeling. Interestingly, at least half of these key molecular genetic studies were initiated with the benefit of prior knowledge of relevant proteins. In addition, multiple important findings originated from studies in mouse, and from other types of non-genetic approaches. Although it appears by now that the main lipid pathways have been uncovered, and that only modulators or adaptor proteins such as those encoded by LDLRAP1, APOA5, ANGPTL3/4, and PCSK9 are currently being discovered, genome-wide association studies (GWAS) in particular have implicated many new loci based on statistical analyses; these may prove to have equally large impacts on lipoprotein traits as gene products that are already known. On the other hand, since 2004, and particularly since 2010 when massively parallel sequencing became de rigueur, no major new insights into genes governing lipid metabolism have been reported. This is probably because the etiologies of true Mendelian lipid disorders with overt clinical complications have been largely resolved. In the meantime, it has become clear that proving the importance of new candidate genes is challenging. This could be due to very low frequencies of large-impact variants in the population. It must further be emphasized that functional genetic studies, while necessary, are often difficult to accomplish, making it hazardous to upgrade a variant from merely associated to definitively causative. Also, it is clear that applying a monogenic approach to dissect complex lipid traits that are mostly of polygenic origin is the wrong way to

  9. Hybrid Vigour? Genes, Genomics, and History

    PubMed Central

    BIVINS, ROBERTA

    2010-01-01

    Is the gene ‘special’ for historians? What effects, if any, has the notion of the ‘gene’ had on our understanding of history? Certainly, there is a widespread public and professional perception that genetics and history are or should be in dialogue with each other in some way. But historians and geneticists view history and genetics very differently – and assume very different relationships between them. And public perceptions of genes, genetics, genomics, and indeed the nature and meanings of ‘history’ differ yet again. Here, in looking at the meaning, and the implications – the significance – of the gene (and its corollary scientific disciplines and approaches) specifically to historians, I will focus on two aspects of the discourse. First, I will examine the ways in which historians have thus far approached genes and genetics, and the impact such studies have had on the field. There is considerable overlap between the subject matter of genetics/genomics and many of the most widely used analytic categories of contemporary historiography – race, gender, sexuality, ethnicity, (dis)ability, among others. Yet the impact of genetics and genomics on society has been studied principally by anthropologists, sociologists and ethicists.2 Only two historical sub-disciplines have engaged with the rise of genetics to any significant degree: the histories of science and of medicine. What does this indicate or suggest? Second, I will explore the impact of the ‘gene’ and genetic understandings (of, for example, the body, health, disease, identity, the family, and evolution) on public conceptions of history itself. PMID:20357894

  10. Genome-level identification, gene expression, and comparative analysis of porcine β-defensin genes

    PubMed Central

    2012-01-01

    Background: Beta-defensins (β-defensins) are innate immune peptides that are evolutionarily conserved across a wide range of species and have been suggested to play important roles in innate immune reactions against pathogens. However, the complete β-defensin repertoire in the pig has not been fully addressed. Results: A BLAST analysis was performed against the available pig genomic sequence in the NCBI database to identify β-defensin-related sequences using previously reported β-defensin sequences of pigs, humans, and cattle. The porcine β-defensin gene clusters were mapped to chromosomes 7, 14, 15 and 17. The gene expression analysis of 17 newly annotated porcine β-defensin genes across 15 tissues using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) showed differences in their tissue distribution, with the kidney and testis having the largest pBD expression repertoire. We also analyzed single nucleotide polymorphisms (SNPs) in the mature peptide region of pBD genes from 35 pigs of 7 breeds. We found 8 cSNPs in 7 pBDs. Conclusion: We identified 29 porcine β-defensin (pBD) gene-like sequences, including 17 unreported pBDs in the porcine genome. Comparative analysis of β-defensin genes in the pig genome with those in human and cattle genomes showed structural conservation of β-defensin syntenic regions among these species. PMID:23150902

  11. Genomic organization of the human skeletal muscle sodium channel gene

    SciTech Connect

    George, A.L. Jr.; Iyer, G.S.; Kleinfield, R.; Kallen, R.G.; Barchi, R.L.

    1993-03-01

    Voltage-dependent sodium channels are essential for normal membrane excitability and contractility in adult skeletal muscle. The gene encoding the principal sodium channel α-subunit isoform in human skeletal muscle (SCN4A) has recently been shown to harbor point mutations in certain hereditary forms of periodic paralysis. The authors have carried out an analysis of the detailed structure of this gene, including delineation of intron-exon boundaries by genomic DNA cloning and sequence analysis. The complete coding region of SCN4A is found in 32.5 kb of genomic DNA and consists of 24 exons (54 bp to >2.2 kb) and 23 introns (97 bp to 4.85 kb). The exon organization of the gene shows no relationship to the predicted functional domains of the channel protein, and splice junctions interrupt many of the transmembrane segments. The genomic organization of sodium channels may have been partially conserved during evolution, as evidenced by the observation that 10 of the 24 splice junctions in SCN4A are positioned in homologous locations in a putative sodium channel gene in Drosophila (para). The information presented here should be extremely useful both for further identifying sodium channel mutations and for gaining a better understanding of sodium channel evolution. 39 refs., 5 figs., 2 tabs.

  12. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics

    PubMed Central

    Fermin, Damian; Allen, Baxter B; Blackwell, Thomas W; Menon, Rajasree; Adamski, Marcin; Xu, Yin; Ulintz, Peter; Omenn, Gilbert S; States, David J

    2006-01-01

    Background: Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. Results: Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. Conclusion: This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures. PMID:16646984
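    The search database above was built from a six-frame translation of the genome: three reading frames on the forward strand and three on the reverse complement. A minimal sketch of that step for a single sequence follows, assuming the standard genetic code (NCBI translation table 1); codons containing ambiguous bases become "X" and stops "*". It illustrates the technique only and is not the authors' database-construction code.

```python
# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq: str) -> dict[str, str]:
    """Translate the three forward frames and the three reverse-complement frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


frames = six_frame("ATGAAATAG")
print(frames["+1"], frames["-1"])  # MK* LFH
```

    In a genome-scale search the translated frames would typically be split at stop codons into open reading frames before peptide matching, which is why the database grows so much larger than a conventional protein database.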

  13. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-01

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered in synthetic biology. PMID:26729648

  14. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation obtained by integrating both data types could yield a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data types available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions, including a subset involving the estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708

  15. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    PubMed Central

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background: The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis, have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results: The C. japonica cp genome is 131,810 bp in length, with 112 single-copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes, giving a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas and Ginkgo; it has additionally lost its trnR-CCG completely, lost its trnT-GGU partially, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion: The observed differences in genomic structure between C. japonica and other land plants, including

  16. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation

    SciTech Connect

    Kas, K.; Meyen, E.; Van De Ven, W.J.M.

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head-to-head configuration, separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to encode 166 amino acids; the predicted protein is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation. 14 refs., 2 figs.
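    The GT/AG rule mentioned above states that an intron starts with GT at its 5' (donor) end and ends with AG at its 3' (acceptor) end. The sketch below checks that rule for introns given as coordinates on a gene sequence; the toy gene and its coordinates are hypothetical, made up for illustration, and are not the NOF sequence.

```python
def follows_gt_ag(intron: str) -> bool:
    """True if an intron begins with GT (donor) and ends with AG (acceptor)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def check_introns(gene: str, introns: list[tuple[int, int]]) -> list[bool]:
    """Check each intron given as 0-based, end-exclusive (start, end) coordinates."""
    return [follows_gt_ag(gene[start:end]) for start, end in introns]


# Hypothetical toy gene: exon 1 + intron + exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GATTAA"
gene = exon1 + intron + exon2
print(check_introns(gene, [(6, 18), (0, 6)]))  # [True, False]
```

    Only canonical boundaries are recognized here; rarer GC-AG and AT-AC introns would be reported as exceptions, which is usually the desired behaviour when screening for unusual splice sites.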

  17. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    SciTech Connect

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present the gene prediction improvement pipeline (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and we demonstrate the applicability of GenePRIMP in improving finishing quality and in comparing different genome-sequencing and annotation technologies.

  18. Genomic architecture and inheritance of human ribosomal RNA gene clusters

    PubMed Central

    Stults, Dawn M.; Killen, Michael W.; Pierce, Heather H.; Pierce, Andrew J.

    2008-01-01

    The finishing of the Human Genome Project largely completed the detailing of human euchromatic sequences; however, the most highly repetitive regions of the genome still could not be assembled. The 10 gene clusters producing the structural RNA components of the ribosome are critically important for cellular viability, yet fall into this unassembled region of the Human Genome Project. To determine the extent of human variation in ribosomal RNA gene content (rDNA) and patterns of rDNA cluster inheritance, we have determined the physical lengths of the rDNA clusters in peripheral blood white cells of healthy human volunteers. The cluster lengths exhibit striking variability between and within human individuals, ranging from 50 kb to >6 Mb, manifest essentially complete heterozygosity, and provide each person with their own unique rDNA electrophoretic karyotype. Analysis of these rDNA fingerprints in multigenerational human families demonstrates that the rDNA clusters are subject to meiotic rearrangement at a frequency >10% per cluster, per meiosis. With this high intrinsic recombinational instability, the rDNA clusters may serve as a unique paradigm of potential human genomic plasticity. PMID:18025267

  19. Transcriptional consequences of genomic structural aberrations in breast cancer

    PubMed Central

    Inaki, Koichiro; Hillmer, Axel M.; Ukil, Leena; Yao, Fei; Woo, Xing Yi; Vardy, Leah A.; Zawack, Kelson Folkvard Braaten; Lee, Charlie Wah Heng; Ariyaratne, Pramila Nuwantha; Chan, Yang Sun; Desai, Kartiki Vasant; Bergh, Jonas; Hall, Per; Putti, Thomas Choudary; Ong, Wai Loon; Shahab, Atif; Cacheux-Rataboul, Valere; Karuturi, Radha Krishna Murthy; Sung, Wing-Kin; Ruan, Xiaoan; Bourque, Guillaume; Ruan, Yijun; Liu, Edison T.

    2011-01-01

    Using a long-span, paired-end deep sequencing strategy, we have comprehensively identified cancer genome rearrangements in eight breast cancer genomes. Herein, we show that 40%–54% of these structural genomic rearrangements result in different forms of fusion transcripts and that 44% are potentially translated. We find that single segmental tandem duplication spanning several genes is a major source of the fusion gene transcripts in both cell lines and primary tumors involving adjacent genes placed in the reverse-order position by the duplication event. Certain other structural mutations, however, tend to attenuate gene expression. From these candidate gene fusions, we have found a fusion transcript (RPS6KB1–VMP1) recurrently expressed in ∼30% of breast cancers associated with potential clinical consequences. This gene fusion is caused by tandem duplication on 17q23 and appears to be an indicator of local genomic instability altering the expression of oncogenic components such as MIR21 and RPS6KB1. PMID:21467264

  20. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integrating proteomics, transcriptomics, and RNA polymerase and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts enabled a highly precise annotation. The structural annotation comprises numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. Comparison with other prokaryotes showed that the number of antisense transcripts is inversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  1. Identification and characterization of essential genes in the human genome

    PubMed Central

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W.; Krupczak, Kevin M.; Post, Yorick; Wei, Jenny J.; Lander, Eric S.; Sabatini, David M.

    2015-01-01

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  2. Identification and characterization of essential genes in the human genome.

    PubMed

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W; Krupczak, Kevin M; Post, Yorick; Wei, Jenny J; Lander, Eric S; Sabatini, David M

    2015-11-27

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated with an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Last, screens in additional cell lines showed a high degree of overlap in gene essentiality but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  3. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice.

    PubMed

    Yano, Kenji; Yamamoto, Eiji; Aya, Koichiro; Takeuchi, Hideyuki; Lo, Pei-Ching; Hu, Li; Yamasaki, Masanori; Yoshida, Shinya; Kitano, Hidemi; Hirano, Ko; Matsuoka, Makoto

    2016-08-01

    A genome-wide association study (GWAS) can be a powerful tool for the identification of genes associated with agronomic traits in crop species, but it is often hindered by population structure and the large extent of linkage disequilibrium. In this study, we identified agronomically important genes in rice using GWAS based on whole-genome sequencing, followed by the screening of candidate genes based on the estimated effect of nucleotide polymorphisms. Using this approach, we identified four new genes associated with agronomic traits. Some genes were undetectable by standard SNP analysis, but we detected them using gene-based association analysis. This study provides fundamental insights relevant to the rapid identification of genes associated with agronomic traits using GWAS and will accelerate future efforts aimed at crop improvement. PMID:27322545

  4. p63 gene structure in the phylum mollusca.

    PubMed

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    The roles of the p53 family ancestor (p63) in organisms' responses to stressful environmental conditions (mainly pollution) have been studied in molluscs, especially in the genus Mytilus, over the last 15 years. Nevertheless, information about the gene structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in a mollusc, the Mediterranean mussel Mytilus galloprovincialis, and confirm its similarity to the vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata) found only one p53 family member, present in a single copy per haploid genome. Comparative analysis of those orthologues additionally confirmed the conserved p63 gene structure. The conserved p63 gene structure can be a helpful tool to complement and/or revise gene annotations of any future p63 genomic sequence records in molluscs, and also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for reliable and complete gene annotation. PMID:25936268

  5. Identifying potential cancer driver genes by genomic data integration

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-12-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these data to identify driver genes from aberration regions of cancer genomes has become an important goal of cancer genome analysis and of investigations into the mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network by combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. Validation showed that MAXDRIVER effectively recalls known associations between genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Comparative analysis found three predicted driver genes (CDKN2A, AKT1, RNF139) common to all three cancers.

  6. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, approximately 2.8% of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  7. Gene organization and characterization of the complete mitochondrial genome of Hainan black goat (Capra hircus).

    PubMed

    Hu, Jiangtao; Zhao, Wei; Niu, Lili; Wang, Linjie; Li, Li; Zhang, Hongping; Zhong, Tao

    2016-05-01

    The complete mitochondrial genome sequence of the Hainan black goat was determined for the first time using a PCR-based method. The total length of the mitogenome was 16,641 bp, with a base composition of 33.54% A, 26.04% C, 27.31% T and 13.11% G. The genome contained 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and 1 control region (D-loop region). These results provide more detailed information on the mitochondrial genome and should be useful for further studies of genetic divergence and phylogenetic resolution among goats worldwide. PMID:25211090
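    The quoted base composition can be sanity-checked with a little arithmetic: the four percentages should sum to 100%, and the GC content is the sum of the C and G fractions. The minimal Python sketch below uses only the figures in the abstract; the helper name `composition_summary` is hypothetical, and the per-base counts it derives are rounded estimates, not counts taken from the deposited sequence.

```python
# Base composition (%) and length of the Hainan black goat mitogenome, as quoted above.
composition = {"A": 33.54, "C": 26.04, "T": 27.31, "G": 13.11}
GENOME_LENGTH_BP = 16_641

def composition_summary(comp, length_bp):
    """Return (total %, GC %, approximate per-base counts)."""
    total = round(sum(comp.values()), 2)
    gc = round(comp["G"] + comp["C"], 2)
    # Rounded estimates only; the true counts come from the sequence itself.
    counts = {base: round(pct / 100 * length_bp) for base, pct in comp.items()}
    return total, gc, counts

total, gc, counts = composition_summary(composition, GENOME_LENGTH_BP)
print(f"sum = {total}%, GC = {gc}%")  # sum = 100.0%, GC = 39.15%
print(counts)
```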

  8. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    PubMed Central

    Lashermes, Philippe; Hueber, Yann; Combes, Marie-Christine; Severac, Dany; Dereeper, Alexis

    2016-01-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were identified that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing. PMID:27440920

  9. Generating Genome-Scale Candidate Gene Lists for Pharmacogenomics

    PubMed Central

    Hansen, NT; Brunak, S; Altman, RB

    2009-01-01

    A critical task in pharmacogenomics is identifying genes that may be important modulators of drug response. High-throughput experimental methods are often plagued by false positives and do not take advantage of existing knowledge. Candidate gene lists can usefully summarize existing knowledge, but they are expensive to generate manually and may therefore have incomplete coverage. We have developed a method that ranks 12,460 genes in the human genome on the basis of their potential relevance to a specific query drug and its putative indications. Our method uses known gene–drug interactions, networks of gene–gene interactions, and available measures of drug–drug similarity. It ranks genes by building a local network of known interactions and assessing the similarity of the query drug (by both structure and indication) with drugs that interact with gene products in the local network. In a comprehensive benchmark, our method achieves an overall area under the curve of 0.82. To showcase our method, we found novel gene candidates for warfarin, gefitinib, carboplatin, and gemcitabine, and we provide the molecular hypotheses for these predictions. PMID:19369935
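    The benchmark result of 0.82 is an area under the ROC curve (AUC). One way to read the metric: AUC is the probability that a randomly chosen true gene-drug association is ranked above a randomly chosen non-association, with ties counting one half. The sketch below computes AUC that way from hypothetical scores; it illustrates the metric only and is not the authors' benchmarking code.

```python
from itertools import product

def roc_auc(positive_scores, negative_scores):
    """AUC as P(positive scores higher than negative); ties count 0.5."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical relevance scores for genes with a known interaction with a
# query drug versus genes with no known interaction.
known = [0.9, 0.8, 0.7, 0.4]
unknown = [0.75, 0.5, 0.3, 0.2, 0.1]
print(roc_auc(known, unknown))  # 0.85 (17 of 20 pairs ranked correctly)
```

    The pairwise form is quadratic in the number of genes; for genome-scale rankings a rank-sum (Mann-Whitney) formulation gives the same value more efficiently.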

  10. Rotavirus gene structure and function.

    PubMed Central

    Estes, M K; Cohen, J

    1989-01-01

    Knowledge of the structure and function of the genes and proteins of the rotaviruses has expanded rapidly. Information obtained in the last 5 years has revealed unexpected and unique molecular properties of rotavirus proteins of general interest to virologists, biochemists, and cell biologists. Rotaviruses share some features of replication with reoviruses, yet antigenic and molecular properties of the outer capsid proteins, VP4 (a protein whose cleavage is required for infectivity, possibly by mediating fusion with the cell membrane) and VP7 (a glycoprotein), show more similarities with those of other viruses such as the orthomyxoviruses, paramyxoviruses, and alphaviruses. Rotavirus morphogenesis is a unique process, during which immature subviral particles bud through the membrane of the endoplasmic reticulum (ER). During this process, transiently enveloped particles form, the outer capsid proteins are assembled onto particles, and mature particles accumulate in the lumen of the ER. Two ER-specific viral glycoproteins are involved in virus maturation, and these glycoproteins have been shown to be useful models for studying protein targeting and retention in the ER and for studying mechanisms of virus budding. New ideas and approaches to understanding how each gene functions to replicate and assemble the segmented viral genome have emerged from knowledge of the primary structure of rotavirus genes and their proteins and from knowledge of the properties of domains on individual proteins. Localization of type-specific and cross-reactive neutralizing epitopes on the outer capsid proteins is becoming increasingly useful in dissecting the protective immune response, including evaluation of vaccine trials, with the practical possibility of enhancing the production of new, more effective vaccines. Finally, future analyses with recently characterized immunologic and gene probes and new animal models can be expected to provide a basic understanding of what regulates the

  11. Comparative and Functional Genomics in Identifying Aflatoxin Biosynthetic Genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identification of genes involved in aflatoxin biosynthesis through Aspergillus flavus genomics has been actively pursued. A. flavus Expressed Sequence Tags (EST’s) and whole genome sequencing have been completed. Groups of genes that are potentially involved in aflatoxin production have been profi...

  12. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most wide...

  13. Genomic scan for genes predisposing to schizophrenia

    SciTech Connect

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping near D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons have not yet passed the age of risk. 90 refs., 3 tabs.
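    A lod score at recombination fraction θ compares the likelihood of linkage at θ with that of free recombination (θ = 0.5): LOD(θ) = log10[L(θ)/L(0.5)], and scores below -2.0 are conventionally taken as evidence against linkage. The minimal sketch below assumes the simplest phase-known case, where each informative meiosis is scored directly as recombinant or non-recombinant; a real pedigree analysis such as this one needs full pedigree likelihoods (unknown phase, penetrance, dominant or recessive models), which this toy function does not attempt. The counts are hypothetical.

```python
import math

def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """Two-point LOD for phase-known meioses: log10[L(theta) / L(0.5)].

    L(theta) = theta**R * (1 - theta)**N for R recombinant and N
    non-recombinant meioses; L(0.5) = 0.5**(R + N).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    n = recombinants + nonrecombinants
    if theta == 0.0:
        # Complete linkage: any observed recombinant makes L(0) zero.
        return float("-inf") if recombinants > 0 else n * math.log10(2)
    return (recombinants * math.log10(theta)
            + nonrecombinants * math.log10(1.0 - theta)
            + n * math.log10(2))

# Hypothetical data: 10 informative meioses, 1 of them recombinant.
thetas = [i / 100 for i in range(51)]  # grid 0.00, 0.01, ..., 0.50
best_theta = max(thetas, key=lambda t: lod_score(1, 9, t))
print(best_theta, round(lod_score(1, 9, best_theta), 2))  # 0.1 1.6
print(lod_score(1, 9, 0.0))  # -inf: one recombinant excludes theta = 0
```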

  14. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    PubMed

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with that of Pinctada maxima. The phylogenetic tree based on amino acid sequences showed that it was closely related to P. maxima. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Binding sites for several transcription factors, such as GATA-1, AP-1, and SP1, were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P<0.001). The expression level at salinity 27‰ was significantly higher than that at other salinities (P<0.05). Expression reached a minimum when the algal food concentration was 16 × 10⁴ cells/mL, significantly lower than the levels observed at 8 × 10⁴ cells/mL and 20 × 10⁴ cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata. PMID:27129943
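    The reported lengths are internally consistent, which a few lines of arithmetic confirm: the two UTRs plus the ORF span the full 1704-bp cDNA, and a 1569-bp ORF is 523 codons, i.e. 522 amino acids plus a stop codon. The sketch also subtracts the 20-aa signal peptide to give a 502-aa predicted mature chain, assuming cleavage immediately after the signal peptide (the abstract does not state the mature length). Helper names are hypothetical.

```python
# Lengths reported for the P. fucata alpha-amylase cDNA.
UTR5_BP = 17
UTR3_BP = 118
ORF_BP = 1569
CDNA_BP = 1704
SIGNAL_PEPTIDE_AA = 20

def check_cdna_layout(utr5, orf, utr3, total):
    """True if 5' UTR + ORF + 3' UTR exactly span the cDNA."""
    return utr5 + orf + utr3 == total

def orf_to_protein_length(orf_bp, includes_stop=True):
    """Amino acids encoded by an ORF of orf_bp nucleotides."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    # The stop codon terminates translation and adds no residue.
    return codons - 1 if includes_stop else codons

assert check_cdna_layout(UTR5_BP, ORF_BP, UTR3_BP, CDNA_BP)
protein_aa = orf_to_protein_length(ORF_BP)
print(protein_aa, protein_aa - SIGNAL_PEPTIDE_AA)  # 522 502
```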

  15. Evidence-based gene predictions in plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  16. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

    PubMed Central

    Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  17. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

    PubMed

    Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  18. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite extensive exploration of avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including the 5’ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3’ETS (343 bp). The 11863-bp rRNA gene cluster sequence was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated by fluorescence in situ hybridization on chicken metaphase chromosomes using computed and synthesized specific probes, and by reference assembly against a de novo assembled rRNA gene cluster sequence obtained from sequenced fragments of a BAC clone containing the chicken NOR (nucleolus organizer region). The results confirmed the validity of the chicken rRNA gene cluster sequence. PMID:27299357
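    The segment lengths listed above add up exactly to the 11863-bp cluster. The sketch below checks that sum and derives 1-based coordinates and each segment's share of the cluster, assuming the segments are contiguous and in the listed 5′→3′ order; that layout is an assumption for illustration, and the GenBank record KT445934 remains the authoritative feature table.

```python
# Segment lengths (bp) of the chicken rRNA gene cluster, in the listed 5' -> 3' order.
segments = [
    ("5'ETS", 1836),
    ("18S", 1823),
    ("ITS1", 2530),
    ("5.8S", 157),
    ("ITS2", 733),
    ("28S", 4441),
    ("3'ETS", 343),
]

def layout(segs):
    """Yield (name, start, end), 1-based inclusive, assuming contiguous segments."""
    start = 1
    for name, length in segs:
        end = start + length - 1
        yield name, start, end
        start = end + 1

total = sum(length for _, length in segments)
print(total)  # 11863
for name, start, end in layout(segments):
    print(f"{name:6s} {start:6d}-{end:6d} ({(end - start + 1) / total:.1%})")
```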

  19. The human TAX1 gene encoding the axon-associated cell adhesion molecule TAG-1/axonin-1: Genomic structure and basic promoter

    SciTech Connect

    Kozlov, S.V.; Giger, R.J.; Hasler, T.; Sonderegger, P.; Korvatska, E.; Schorderet, D.F.

    1995-11-20

    The human TAX-1 gene (HGMW-approved symbol TAX1) is located on chromosome 1 (1q32.1) and encodes the neuronal cell adhesion molecule TAG-1/axonin-1. The gene product, termed TAG-1 in the rat and axonin-1 in the chicken, is composed of six immunoglobulin (Ig)-like and four fibronectin type III (FNIII)-like domains. It is found predominantly on the axons of particular nerve fiber tracts during neural development, and it has been demonstrated to function as a potent substratum for neurite outgrowth in vitro. Here we report the cloning and structural characterization of the TAX-1 gene. The transcribed region of the TAX-1 gene extends over about 40 kb. Like its chicken homologue, the human TAX-1 gene consists of 23 exons. Two GT/CA microsatellites were localized in the first intron; a polymorphism was found for one of them. Reporter gene analysis with serially truncated fragments of the 5′-flanking region indicated that a 164-bp fragment located immediately upstream of the putative transcription initiation site was sufficient to function as a basal promoter. 45 refs., 3 figs., 2 tabs.

  20. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, unlike in animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  1. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatics programs for structural annotation of the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation and compared the results to the reference annotation.

  2. Genomic sequencing reveals gene content, genomic organization, and recombination relationships in barley.

    PubMed

    Rostoks, Nils; Park, Yong-Jin; Ramakrishna, Wusirika; Ma, Jianxin; Druka, Arnis; Shiloff, Bryan A; SanMiguel, Phillip J; Jiang, Zeyu; Brueggeman, Robert; Sandhu, Devinder; Gill, Kulvinder; Bennetzen, Jeffrey L; Kleinhofs, Andris

    2002-05-01

    Barley (Hordeum vulgare L.) is one of the most important large-genome cereals with extensive genetic resources available in the public sector. Studies of genome organization in barley have been limited primarily to genetic markers and sparse sequence data. Here we report sequence analysis of 417.5 kb DNA from four BAC clones from different genomic locations. Sequences were analyzed with respect to gene content, the arrangement of repetitive sequences and the relationship of gene density to recombination frequencies. Gene densities ranged from 1 gene per 12 kb to 1 gene per 103 kb with an average of 1 gene per 21 kb. In general, genes were organized into islands separated by large blocks of nested retrotransposons. Single genes in apparent isolation were also found. Genes occupied 11% of the total sequence, LTR retrotransposons and other repeated elements accounted for 51.9% and the remaining 37.1% could not be annotated. PMID:12021850

  3. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  4. Gene identification in bacterial and organellar genomes using GeneScan.

    PubMed

    Ramakrishna, R; Srinivasan, R

    1999-03-30

    The performance of the GeneScan algorithm for gene identification has been improved by incorporation of a directed iterative scanning procedure. Application is made here to the cases of bacterial and organellar genomes. The sensitivity of gene identification was 100% in the Plasmodium falciparum plastid-like genome (35 kb) and 98% in the Mycoplasma genitalium genome (approximately 580 kb) and the Haemophilus influenzae Rd genome (approximately 1.8 Mb). Sensitivity was found to improve both in the open reading frames (ORFs) that have been identified as genes (by homology or by other methods) and in those that are classified as hypothetical. False positive assignments (at the nucleotide level) were 0.25% in the H. influenzae genome and 0.3% in M. genitalium. There were no false positive assignments in the plastid-like genome. The agreement between the GeneScan predictions and GeneMark predictions of putative ORFs was 97% in the M. genitalium genome and 86% in the H. influenzae genome. In terms of an exact match between predicted genes/ORFs and the annotation in the databank, GeneScan performance was evaluated to be between 72% and 90% in different genomes. We predict five putative ORFs that were not annotated earlier in the GenBank files for both M. genitalium and H. influenzae genomes. Our preliminary analysis of the newly sequenced G + C rich genome of Mycobacterium tuberculosis H37Rv also shows comparable sensitivity (99%). PMID:10353188
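    The two headline measures in this abstract, gene-level sensitivity and nucleotide-level false positives, can be illustrated with a short Python sketch. It assumes predictions and annotations are half-open (start, end) coordinate intervals, counts an annotated gene as identified if any prediction overlaps it, and takes the false-positive rate as the fraction of non-annotated nucleotides predicted as coding. These definitions are assumptions, the paper's exact scoring may differ, and the function names are hypothetical rather than part of GeneScan.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] < b[1] and b[0] < a[1]


def covered_positions(intervals):
    """Expand half-open intervals into the set of nucleotide positions they cover."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def score_predictions(predicted, annotated, genome_length):
    """Return (gene-level sensitivity, nucleotide-level false-positive rate)."""
    # A gene counts as identified if at least one prediction overlaps it.
    found = sum(1 for gene in annotated if any(overlaps(gene, p) for p in predicted))
    sensitivity = found / len(annotated) if annotated else 0.0

    # False positives: predicted coding bases lying outside every annotated gene,
    # expressed as a fraction of all non-annotated bases (an assumed definition).
    coding = covered_positions(annotated)
    false_bases = covered_positions(predicted) - coding
    non_coding = genome_length - len(coding)
    fp_rate = len(false_bases) / non_coding if non_coding else 0.0
    return sensitivity, fp_rate


# Toy 100-bp genome with two annotated genes (hypothetical coordinates).
annotated = [(10, 30), (50, 70)]
predicted = [(12, 30), (60, 75)]
print(score_predictions(predicted, annotated, 100))
```

    Here both toy genes are hit, while the five predicted bases at positions 70 to 74 fall outside the annotation and count against the false-positive rate.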

  5. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae, with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and is thus of significant biological importance. Increased DUS density is expected to enhance DNA uptake, and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome-preserving functions. For example, a transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717
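    The comparison of DUS density between gene groups amounts to counting a short motif in coding sequences and normalizing by length. The Python sketch below is one way to do that, assuming the motif is counted on both strands (by also matching its reverse complement) and that density is pooled per functional category as occurrences per kb. The H. influenzae uptake-sequence core 5′-AAGTGCGGT-3′ is used as the example motif; the gene sequences and function names are hypothetical and do not reproduce the authors' pipeline.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_motif(seq, motif):
    """Count (possibly overlapping) motif occurrences on both strands of seq."""
    seq, motif = seq.upper(), motif.upper()
    # A set avoids double-counting when the motif is its own reverse complement.
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] in targets)


def density_by_category(genes, motif):
    """Motif occurrences per kb of coding sequence, pooled per functional category.

    genes: iterable of (category, coding_sequence) pairs.
    """
    counts, lengths = {}, {}
    for category, seq in genes:
        counts[category] = counts.get(category, 0) + count_motif(seq, motif)
        lengths[category] = lengths.get(category, 0) + len(seq)
    return {c: 1000 * counts[c] / lengths[c] for c in counts if lengths[c]}


# Hypothetical toy sequences grouped by functional category.
genes = [
    ("genome maintenance", "AAGTGCGGTCCACCGCACTT"),
    ("other", "ATGCCCGGGTTTAAACCCGGGTAA"),
]
print(density_by_category(genes, "AAGTGCGGT"))
```

    Pooling counts and lengths per category, rather than averaging per-gene densities, keeps short genes from dominating the comparison; a real analysis would also test whether the difference between categories is significant.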

  6. Genomic organization and 5′-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5′-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over approximately 25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5′-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  7. Hemipteran genomics and psyllid gene expression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research efforts on psyllid genomics, while still in their infancy, are providing information which will aid strategies to suppress...

  8. Base composition and gene distribution: critical patterns in mammalian genome organization.

    PubMed

    Gardiner, K

    1996-12-01

    Recent successes in developing transcriptional maps of large genomic regions provide excellent opportunities for the investigation of mammalian genome organization. Detailed definition of organizational features will, in the short term, aid in prioritizing genomic sequencing efforts and in interpreting sequencing results and, in the long term, will surely provide insights into the structural, functional and evolutionary basis for the mammalian chromosome and chromosomal banding patterns. For such efforts, human chromosome 21 provides an excellent model system because the physical and clone maps are detailed, and several transcriptional mapping projects have provided large numbers of novel genes. It is, therefore, valuable at this point to examine these transcriptional mapping data and to compare them with the isochore model of the mammalian genome, which describes patterns in base composition and predicts gene distributions. Not only do compelling organizational patterns appear, but new questions about additional possible patterns in gene size, structure, conservation and transcription can be asked. PMID:9257535

  9. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses

    PubMed Central

    Ward, John M.; Mäser, Pascal; Schroeder, Julian I.

    2016-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide–gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport. PMID:18842100

  10. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing aberrant driver regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods developed for SNP arrays cannot be directly applied to NGS data because of the different characteristics of NGS, such as higher-resolution data without predefined probes and reads incorrectly mapped to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  11. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data.

    PubMed

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing aberrant driver regions, which might contain candidate target genes for cancer therapies, from passenger regions is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, methods developed for SNP arrays cannot be directly applied to NGS data because of the different characteristics of NGS, such as higher-resolution data without predefined probes and reads incorrectly mapped to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  12. Genome-editing Technologies for Gene and Cell Therapy.

    PubMed

    Maeder, Morgan L; Gersbach, Charles A

    2016-03-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  13. Genome-editing Technologies for Gene and Cell Therapy

    PubMed Central

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  14. Integrating microarray gene expression object model and clinical document architecture for cancer genomics research.

    PubMed

    Park, Yu Rang; Lee, Hye Won; Kim, Ju Han

    2005-01-01

    Systematic integration of genomic-scale expression profiles with clinical information may facilitate cancer genomics research. MAGE-OM (Microarray Gene Expression Object Model) defines standard objects for genomic but not for clinical data. HL7 CDA (Clinical Document Architecture) is a document model for clinical information, describing syntax (generic structure) but not semantics. We designed a document template in XML Schema with additional constraints for CDA to define content semantics, enabling data model-level integration of MAGE-OM and CDA for cancer genomics research. PMID:16779360

  15. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    PubMed Central

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716
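    Checking whether early stop mutations co-localize with microsatellites requires two steps: locating simple sequence repeats and testing each mutation position against them. The Python sketch below does both under stated assumptions: a microsatellite is taken to be any unit of 1 to 6 nt repeated at least 3 times in tandem, and mutation positions are 0-based. The abstract gives no detection thresholds, so these values and the function names are hypothetical, not the authors' method.

```python
import re


def find_microsatellites(seq, max_unit=6, min_copies=3):
    """Return half-open (start, end) spans of tandem repeats of a 1..max_unit nt unit.

    The lazy unit quantifier makes the regex prefer the shortest repeating unit
    at each start position; matches are non-overlapping, scanned left to right.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    return [m.span() for m in pattern.finditer(seq.upper())]


def fraction_colocalized(positions, spans):
    """Fraction of mutation positions that fall inside any microsatellite span."""
    if not positions:
        return 0.0
    inside = sum(1 for p in positions if any(s <= p < e for s, e in spans))
    return inside / len(positions)


# Hypothetical toy sequence: an (A)8 run and a (CG)4 repeat.
seq = "ACGTAAAAAAAATGCACGCGCGCG"
spans = find_microsatellites(seq)
print(spans)
print(fraction_colocalized([5, 14, 20], spans))
```

    A uniform copy-number threshold is a simplification; dedicated repeat finders usually require more copies for mononucleotide runs than for longer units.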

  16. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    PubMed Central

    2012-01-01

    Background Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to human complex diseases. However, individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure; this multiple testing problem causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p, small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results We adopt an up-to-date method for ultrahigh-dimensional variable selection termed sure independence screening (SIS) to appropriately handle the large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods, followed by a variable selection procedure in the SIS method suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–control Consortium) data. Conclusions Based on the machine-learning principle, the proposed method gives a powerful and flexible genome-wide search for various patterns of gene-gene interaction. PMID:22554139
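    Sure independence screening ranks candidate predictors by a marginal utility and keeps only the top d before any joint model is fit. The Python sketch below applies that idea to SNP-SNP pairs under simplifying assumptions: genotypes are coded 0/1/2, each pair is represented by the product of its two codes rather than the dummy coding used in the paper, and the marginal utility is the absolute Pearson correlation with a 0/1 phenotype rather than a logistic-regression statistic. The function names are hypothetical and are not EPISIS code.

```python
from itertools import combinations
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = fmean(x), fmean(y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def sis_interactions(genotypes, phenotype, d):
    """Keep the d SNP pairs whose interaction term is most correlated with phenotype.

    genotypes: list of SNP columns, each a list of 0/1/2 codes (one per sample).
    phenotype: list of 0/1 case-control labels.
    """
    scores = []
    for i, j in combinations(range(len(genotypes)), 2):
        # Multiplicative interaction term for the pair (an assumed coding).
        term = [a * b for a, b in zip(genotypes[i], genotypes[j])]
        scores.append((abs(pearson(term, phenotype)), (i, j)))
    scores.sort(key=lambda s: s[0], reverse=True)
    return [pair for _, pair in scores[:d]]


# Hypothetical toy data: cases carry at least one minor allele at both SNP 0 and SNP 1.
snps = [
    [0, 1, 2, 0, 1, 2, 0, 1, 2],
    [0, 0, 0, 1, 1, 1, 2, 2, 2],
    [1, 0, 1, 0, 1, 0, 1, 0, 1],
]
labels = [1 if a * b > 0 else 0 for a, b in zip(snps[0], snps[1])]
print(sis_interactions(snps, labels, d=1))
```

    The screening step is cheap because each pair is scored independently; in the SIS framework the retained d pairs would then enter a joint model, which is where the paper's logistic regression and variable selection apply.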

  17. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to communicate extensively with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract great interest for therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular, we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  18. The Trypanosoma cruzi genome; conserved core genes and extremely variable surface molecule families.

    PubMed

    Andersson, Björn

    2011-01-01

    The protozoan parasite Trypanosoma cruzi is an important but neglected pathogen that causes Chagas disease, which affects millions of people, mainly in Latin America. The population structure and epidemiology of the parasite are complex, with much variability among strains. The genome sequence of a reference strain, CL Brener, was published in 2005, and the availability of this sequence has both revealed the complexity of the parasite genome and greatly facilitated research into parasite biology and pathogenesis by making the sequences of more than 8000 core genes available. The T. cruzi genome is highly repetitive, which has resulted in inaccuracies in the genome sequence, and attempts have been made to provide a deeper analysis of repeated genes as a complement to the genome sequence. The genome was found to be organized in stable core regions containing housekeeping and other genes, surrounded by highly repetitive, often subtelomeric, highly variable regions containing multiple members of large families of surface molecule genes. Comparative sequencing of T. cruzi strains has been initiated, and the results show that the core gene content of the parasite is highly conserved but that much sequence variability is present, with an average difference of 3-4% at the DNA level between strains in coding regions. The additional genomes will improve the understanding of parasite biology and epidemiology. PMID:21624458

  19. Higher plant mitochondrial DNA: Genomes, genes, mutants, transcription, translation

    SciTech Connect

    Not Available

    1986-01-01

    This volume contains brief summaries of 63 presentations given at the International Workshop on Higher Plant Mitochondrial DNA. The presentations are organized into topical discussions addressing plant genomes, mitochondrial genes, cytoplasmic male sterility, transcription, translation, plasmids and tissue culture. (DT)

  20. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  1. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  2. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    PubMed

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in outcomes such as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to define the pan-genome and core genome and to characterize their gene composition and function. The results showed that the 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with the dispensable genome. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of persistent Brucella infection and provide some clues for further exploring the gene modules involved in intracellular survival in Brucella spp. PMID:26724943
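    Once genes have been grouped into clusters, the core, dispensable and strain-specific categories follow from counting how many genomes contain each cluster. The Python sketch below assumes the usual definitions (core: present in every genome; strain-specific: present in exactly one; dispensable: everything else) and takes a hypothetical mapping from genome identifier to the set of cluster identifiers it contains. The clustering step that produced the 5369 clusters is not shown, and the function name is hypothetical.

```python
from collections import Counter


def classify_clusters(presence):
    """Split gene clusters into (core, dispensable, strain_specific) sets.

    presence: dict mapping genome id -> set of cluster ids found in that genome.
    """
    n = len(presence)
    # Number of genomes in which each cluster occurs.
    counts = Counter(c for clusters in presence.values() for c in clusters)
    core = {c for c, k in counts.items() if k == n}
    # With a single genome everything is core, so require n > 1 here.
    specific = {c for c, k in counts.items() if k == 1 and n > 1}
    dispensable = set(counts) - core - specific
    return core, dispensable, specific


# Hypothetical presence/absence data for three genomes.
presence = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "e"},
}
core, dispensable, specific = classify_clusters(presence)
print(sorted(core), sorted(dispensable), sorted(specific))
```

    Because the three categories partition all clusters, their sizes must sum to the total; the reported counts satisfy this, since 1710 + 1182 + 2477 = 5369.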

  3. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are unlikely to be located in GIs, but a systematic analysis of essential genes in GIs has not been performed before. Here, we analyzed the essential genes in 28 prokaryotes using a statistical method and found that essential genes are significantly fewer in GIs than outside them. The functions of the 362 essential genes found in GIs were explored further by BLAST searches against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of PHAge Search Tool (PHAST). As a result, 64 and 60 of these essential genes were found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. We also found several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of algorithms for predicting essential genes, but also provide clues to the functionality of essential genes in genomic islands. PMID:26223387

  4. A genomic view on epilepsy and autism candidate genes.

    PubMed

    Jabbari, Kamel; Nürnberg, Peter

    2016-07-01

    Epilepsy is a common complex disorder most frequently associated with psychiatric and neurological diseases. Massively parallel sequencing of individual or cohort genomes and exomes has led to the identification of several disease-associated genes. We review here the candidate genes in epilepsy genetics, with a focus on exome and gene panel data. Together with the examination of brain-expressed genes and the postsynaptic proteome, the results show that (1) non-metabolic epilepsy and autism candidate genes tend to be AT-rich and (2) large transcript size and local AT-richness are characteristic features of genes involved in developmental brain disorders and synaptic functions. These results point to the preferential location of core epilepsy and autism candidate genes in late-replicating, GC-poor chromosomal regions (isochores). They indicate that the genomic alterations leading to some brain disorders are confined to responsive chromatin areas harboring brain-critical genes. PMID:26772991

  5. Genomic structure, chromosomal localization and expression profile of a novel melanoma differentiation associated (mda-7) gene with cancer specific growth suppressing and apoptosis inducing properties.

    SciTech Connect

    Huang, E. Y.; Madireddi, M. T.; Gopalkrishnan, R. V.; Leszczyniecka, M.; Su, Z. Z.; Lebedeva, I. V.; Kang, D. C.; Jian, H.; Lin, J. J.; Alexandre, D.; Chen, Y.; Vozhilla, N.; Mei, M. X.; Christiansen, K. A.; Sivo, F.; Goldstein, N. I.; Chada, S.; Huberman, E.; Pestka, S.; Fisher, P. B.; Biochip Technology Center; Columbia Univ.; Introgen Therapeutics Inc.; UMDNJ-Robert Wood Johnson Medical School

    2001-10-25

    Abnormalities in cellular differentiation are frequent occurrences in human cancers. Treatment of human melanoma cells with recombinant fibroblast interferon (IFN-beta) and the protein kinase C activator mezerein (MEZ) results in an irreversible loss in growth potential, suppression of tumorigenic properties and induction of terminal cell differentiation. Subtraction hybridization identified melanoma differentiation associated gene-7 (mda-7), as a gene induced during these physiological changes in human melanoma cells. Ectopic expression of mda-7 by means of a replication defective adenovirus results in growth suppression and induction of apoptosis in a broad spectrum of additional cancers, including melanoma, glioblastoma multiforme, osteosarcoma and carcinomas of the breast, cervix, colon, lung, nasopharynx and prostate. In contrast, no apparent harmful effects occur when mda-7 is expressed in normal epithelial or fibroblast cells. Human clones of mda-7 were isolated and its organization resolved in terms of intron/exon structure and chromosomal localization. Hu-mda-7 encompasses seven exons and six introns and encodes a protein with a predicted size of 23.8 kDa, consisting of 206 amino acids. Hu-mda-7 mRNA is stably expressed in the thymus, spleen and peripheral blood leukocytes. De novo mda-7 mRNA expression is also detected in human melanocytes and expression is inducible in cells of melanocyte/melanoma lineage and in certain normal and cancer cell types following treatment with a combination of IFN-beta plus MEZ. Mda-7 expression is also induced during megakaryocyte differentiation induced in human hematopoietic cells by treatment with TPA (12-O-tetradecanoyl phorbol-13-acetate). In contrast, de novo expression of mda-7 is not detected nor is it inducible by IFN-beta+MEZ in a spectrum of additional normal and cancer cells. No correlation was observed between induction of mda-7 mRNA expression and growth suppression following treatment with IFN-beta+MEZ and

  6. Impact of recurrent gene duplication on adaptation of plant genomes

    PubMed Central

    2014-01-01

    Background Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation, but so far this hypothesis has not been tested at the genome scale. Results We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage-specific expanded (LSE) gene families compared to single-copy genes, using a dN/dS-based test in a phylogenetic framework. We found codons under positive selection in 5.38% of alignments of LSE genes. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detected significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. Conclusions In this first genome-scale study (to our knowledge), we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints, as well as our results, may be of interest for further studies on (plant) gene family evolution. PMID:24884640

  7. Structure of the human hexabrachion (tenascin) gene.

    PubMed Central

    Gulcher, J R; Nies, D E; Alexakos, M J; Ravikant, N A; Sturgill, M E; Marton, L S; Stefansson, K

    1991-01-01

    The structure of the gene encoding human hexabrachion (tenascin) has been determined from overlapping clones isolated from a human genomic bacteriophage library. The genomic inserts were characterized by restriction mapping, Southern blot analysis, PCR, and DNA sequencing. The coding region of the hexabrachion gene spans approximately 80 kilobases of DNA and consists of 27 exons separated by 26 introns. The exon-intron structure supports a hypothesis based on the cDNA sequence that the hexabrachion gene is an assembly of DNA modules that are also found elsewhere in the genome. Single exons may encode a module, a portion of a module, or a group of modules. The 15 type III units similar to those found in fibronectin are each encoded either by a single exon or by two exons interrupted by an intron. All type III units known to be spliced out of the smaller forms of the protein are encoded by one exon. The fibrinogen-like domain of 210 amino acids is encoded by five exons. The 14.5 epidermal growth factor-like repeats are all encoded by a single exon. PMID:1719530

  8. Historical overview of research on the tobacco mosaic virus genome: genome organization, infectivity and gene manipulation.

    PubMed Central

    Okada, Y

    1999-01-01

    Early in the development of molecular biology, TMV RNA was widely used as an mRNA that could be purified easily, and it contributed much to research on protein synthesis. Also, in the early stages of elucidation of the genetic code, artificially produced TMV mutants were widely used and provided the first proof that the genetic code was non-overlapping. In 1982, Goelet et al. determined the complete TMV RNA base sequence of 6395 nucleotides. The four genes (130K, 180K, 30K and coat protein) could then be mapped at precise locations in the TMV genome. Furthermore, it had become clear a little earlier that genes located internally in the genome were expressed via subgenomic mRNAs. The initiation site for assembly of TMV particles was also determined. However, although TMV contributed so much at the beginning of the development of molecular biology, its influence was replaced by that of Escherichia coli and its phages in the next phase. As recombinant DNA technology developed in the 1980s, RNA virus research became more detached from the frontier of molecular biology. To recover from this setback, a gene-manipulation system was needed for RNA viruses. In 1986, two such systems were developed for TMV, using full-length cDNA clones, by Dawson's group and by Okada's group. Thus, reverse genetics could be used to elucidate the basic functions of all proteins encoded by the TMV genome. Identification of the function of the 30K protein was especially important because it was the first evidence that a plant virus possesses a cell-to-cell movement function. Many other plant viruses have since been found to encode comparable 'movement proteins'. TMV thus became the first plant virus for which structures and functions were known for all its genes. At the birth of molecular plant pathology, TMV became a leader again. TMV has also played pioneering roles in many other fields. TMV was the first virus for which the amino acid sequence of the coat protein was determined

  9. Genomic variants of genes associated with three horticultural traits in apple revealed by genome re-sequencing

    PubMed Central

    Zhang, Shijie; Chen, Weiping; Xin, Lu; Gao, Zhihong; Hou, Yingjun; Yu, Xinyi; Zhang, Zhen; Qu, Shenchun

    2014-01-01

    The apple (Malus × domestica Borkh.) cultivar ‘Su Shuai’ exhibits greater disease resistance, shorter internodes and lighter fruit flavor compared with its parents ‘Golden Delicious’ and ‘Indo’. To obtain a comprehensive overview of the sequence variation underlying these three horticultural traits, the genomes of ‘Su Shuai’ and ‘Indo’ were resequenced using next-generation sequencing and compared to the genome of ‘Golden Delicious’. A wide range of genetic variations was detected, including 2 454 406 and 18 749 349 single nucleotide polymorphisms (SNPs) and 59 547 and 50 143 structural variants (SVs) in the ‘Indo’ and ‘Su Shuai’ genomes, respectively. Among the SVs in ‘Su Shuai’, 17 genes related to disease resistance, 10 genes related to gibberellin (GA) and 19 genes associated with fruit flavor were identified. The expression patterns of eight of the SV genes were examined using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The results of this study illustrate the genomic variation in these cultivars and provide evidence for a genetic basis for the horticultural traits of disease resistance, short internodes and lighter flavor exhibited in these cultivars. These results provide a genetic basis for the phenotypic characteristics of ‘Su Shuai’ and, as such, these SVs could serve as gene-specific molecular markers in marker-assisted breeding of apples. PMID:26504548

  10. Identification of chemosensory receptor genes from vertebrate genomes.

    PubMed

    Niimura, Yoshihito

    2013-01-01

    Chemical senses are essential for the survival of animals. In vertebrates, mainly three different types of receptors, olfactory receptors (ORs), vomeronasal receptors type 1 (V1Rs), and vomeronasal receptors type 2 (V2Rs), are responsible for the detection of chemicals in the environment. Mouse and rat genomes contain >1,000 OR genes, forming the largest multigene family in vertebrates, and have >100 V1R and V2R genes as well. Recent advances in genome sequencing have enabled us to computationally identify nearly complete repertoires of OR, V1R, and V2R genes from various organisms, revealing that the numbers of these genes are highly variable among different organisms depending on each species' living environment. Here I explain bioinformatic methods to identify the entire repertoires of OR, V1R, and V2R genes from vertebrate genome sequences. PMID:24014356

  11. The structure of neutrophil defensin genes.

    PubMed

    Linzmeier, R; Michaelson, D; Liu, L; Ganz, T

    1993-04-26

    Defensins are a family of microbicidal peptides abundant in the granules of mammalian neutrophils, in rabbit alveolar macrophages, and in human and murine intestinal Paneth cells. We cloned and sequenced the genes of three neutrophil-specific defensins. Human HNP-1 and HNP-3 are nearly identical and rabbit NP-3a is closely related. The four known neutrophil-specific defensin genes are strikingly similar in the structure and organization of their three exons and two introns, but the three defensin genes expressed in macrophages (MCP-1 and -2) or Paneth cells (HD-5) are organized differently: HD-5 has only two exons, and MCP-1 and -2 have a comparatively short first intron. The diverse genomic organization of defensin genes may contribute to their cell-specific expression. PMID:8477861

  12. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  13. Genome-Wide Detection and Analysis of Multifunctional Genes.

    PubMed

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-10-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms--H. sapiens, D. melanogaster, and S. cerevisiae--and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  14. Pinpointing disease genes through phenomic and genomic data fusion

    PubMed Central

    2015-01-01

    Background Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed based either on the guilt-by-association principle or on disease phenotype similarities, the low coverage of both diseases and genes in existing methods has prevented genome-wide scans for causative genes in a significant proportion of diseases. Results To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions pgFusion not only provided an effective way of prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole-genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology. PMID:25708473

  15. Rapid turnover of antimicrobial-type cysteine-rich protein genes in closely related Oryza genomes.

    PubMed

    Shenton, Matthew R; Ohyanagi, Hajime; Wang, Zi-Xuan; Toyoda, Atsushi; Fujiyama, Asao; Nagata, Toshifumi; Feng, Qi; Han, Bin; Kurata, Nori

    2015-10-01

    Defensive and reproductive protein genes undergo rapid evolution. Small, cysteine-rich secreted peptides (CRPs) act as antimicrobial agents and function in plant intercellular signaling and are over-represented among reproductively expressed proteins. Because of their roles in defense, reproduction and development and their presence in multigene families, CRP variation can have major consequences for plant phenotypic and functional diversification. We surveyed the CRP genes of six closely related Oryza genomes comprising Oryza sativa ssp. japonica and ssp. indica, Oryza glaberrima and three accessions of Oryza rufipogon to observe patterns of evolution in these gene families and the effects of variation on their gene expression. These Oryza genomes, like other plant genomes, have accumulated large reservoirs of CRP sequences, comprising 26 groups totaling between 676 and 843 genes, in contrast to antimicrobial CRPs in animal genomes. Despite the close evolutionary relationships between the genomes, we observed rapid changes in number and structure among CRP gene families. Many CRP sequences are in gene clusters generated by local duplications, have undergone rapid turnover and are more likely to be silent or specifically expressed. By contrast, conserved CRP genes are more likely to be highly and broadly expressed. Variable CRP genes created by repeated duplication, gene modification and inactivation can gain new functions and expression patterns in newly evolved gene copies. For the CRP proteins, the process of gain/loss by deletion or duplication at gene clusters seems to be an important mechanism in evolution of the gene families, which also contributes to their expression evolution. PMID:25842177

  16. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein coding DNAs also exhibit palindromic structures, although they do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  17. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    SciTech Connect

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high-throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives when

  18. The cavefish genome reveals candidate genes for eye loss

    PubMed Central

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human diseases including retinal dysfunction. PMID:25329095

  19. The cavefish genome reveals candidate genes for eye loss.

    PubMed

    McGaugh, Suzanne E; Gross, Joshua B; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O'Quin, Kelly E; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M J; Stahl, Bethany A; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human diseases including retinal dysfunction. PMID:25329095

  20. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    PubMed Central

    2010-01-01

    Background The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next-generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence-associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence-associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides

  1. Genome-scale comparative analysis of gene fusions, gene fissions, and the fungal tree of life

    PubMed Central

    Leonard, Guy; Richards, Thomas A.

    2012-01-01

    During the course of evolution genes undergo both fusion and fission by which ORFs are joined or separated. These processes can amend gene function and represent an important factor in the evolution of protein interaction networks. Gene fusions have been suggested to be useful characters for identifying evolutionary relationships because they constitute synapomorphies or cladistic characters. To investigate the fidelity of gene-fusion characters, we developed an approach for identifying differentially distributed gene fusions among whole-genome datasets: fdfBLAST. Applying this tool to the Fungi, we identified 63 gene fusions present in two or more genomes. Using a combination of phylogenetic and comparative genomic analyses, we then investigated the evolution of these genes across 115 fungal genomes, testing each gene fusion for evidence of homoplasy, including gene fission, convergence, and horizontal gene transfer. These analyses demonstrated 110 gene-fission events. We then identified a minimum of three mechanisms that drive gene fission: separation, degeneration, and duplication. These data suggest that gene fission plays an important and hitherto underestimated role in gene evolution. Gene fusions therefore are highly labile characters, and their use for polarizing evolutionary relationships, without reference to gene and species phylogenies, is limited. Accounting for these considerable sources of homoplasy, we identified fusion characters that provide support for multiple nodes in the phylogeny of the Fungi, including relationships within the deeply derived flagellum-forming fungi (i.e., the chytrids). PMID:23236161

  2. Performing integrative functional genomics analysis in GeneWeaver.org.

    PubMed

    Jay, Jeremy J; Chesler, Elissa J

    2014-01-01

    Functional genomics experiments and analyses give rise to large sets of results, each typically quantifying the relation of molecular entities including genes, gene products, polymorphisms, and other genomic features with biological characteristics or processes. There is tremendous utility and value in using these data in an integrative fashion to find convergent evidence for the role of genes in various processes, to identify functionally similar molecular entities, or to compare processes based on their genomic correlates. However, these gene-centered data are often deposited in diverse and non-interoperable stores. Therefore, integration requires biologists to implement computational algorithms and harmonization of gene identifiers both within and across species. The GeneWeaver web-based software system brings together a large data archive from diverse functional genomics data with a suite of combinatorial tools in an interactive environment. Account management features allow data and results to be shared among user-defined groups. Users can retrieve curated gene set data, upload, store, and share their own experimental results and perform integrative analyses including novel algorithmic approaches for set-set integration of genes and functions. PMID:24233775

  3. Genome-level evolution of resistance genes in Arabidopsis thaliana.

    PubMed Central

    Baumgarten, Andrew; Cannon, Steven; Spangler, Russ; May, Georgiana

    2003-01-01

    Pathogen resistance genes represent some of the most abundant and diverse gene families found within plant genomes. However, evolutionary mechanisms generating resistance gene diversity at the genome level are not well understood. We used the complete Arabidopsis thaliana genome sequence to show that most duplication of individual NBS-LRR sequences occurs at close physical proximity to the parent sequence and generates clusters of closely related NBS-LRR sequences. Deploying the statistical strength of phylogeographic approaches and using chromosomal location as a proxy for spatial location, we show that apparent duplication of NBS-LRR genes to ectopic chromosomal locations is largely the consequence of segmental chromosome duplication and rearrangement, rather than the independent duplication of individual sequences. Although accounting for a smaller fraction of NBS-LRR gene duplications, segmental chromosome duplication and rearrangement events have a large impact on the evolution of this multigene family. Intergenic exchange is dramatically lower between NBS-LRR sequences located in different chromosome regions as compared to exchange between sequences within the same chromosome region. Consequently, once translocated to new chromosome locations, NBS-LRR gene copies have a greater likelihood of escaping intergenic exchange and adopting new functions than do gene copies located within the same chromosomal region. We propose an evolutionary model that relates processes of genome evolution to mechanisms of evolution for the large, diverse, NBS-LRR gene family. PMID:14504238

  4. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein-coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  5. Complexity, Post-genomic Biology and Gene Expression Programs

    NASA Astrophysics Data System (ADS)

    Williams, Rohan B. H.; Luo, Oscar Junhong

    Gene expression represents the fundamental phenomenon by which information encoded in a genome is utilised for the overall biological objectives of the organism. Understanding this level of information transfer is therefore essential for dissecting the mechanistic basis of form and function of organisms. We survey recent developments in the methodology of the life sciences that is relevant for understanding the organisation and function of the genome and review our current understanding of the regulation of gene expression, and finally, outline some new approaches that may be useful in understanding the organisation of gene regulatory systems.

  6. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    SciTech Connect

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  7. The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species

    PubMed Central

    2009-01-01

    The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic, species-neutral manner, with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms with gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases and providing important improvements to the GO's logical structure and biological content. PMID:19578431

  8. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physically mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated two-thirds of all Morex barley genes. Exploration of these sequenced BACs revealed that although the distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We used published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but have low recombination is particularly relevant. PMID:26252423
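A minimal tiling path is the smallest set of overlapping clones that still spans a mapped region. In practice it is chosen from BAC fingerprint overlaps on physical-map contigs, and the abstract does not describe the selection procedure. The sketch below uses a simplified model in which each BAC is an interval with known coordinates on one contig, and picks the fewest intervals covering the region with the classic greedy interval-cover algorithm. The coordinates and the function name `minimal_tiling_path` are hypothetical.

```python
from typing import List, Optional, Tuple

# Simplified model: each BAC is a (start, end) interval on one contig (hypothetical units).
Interval = Tuple[int, int]


def minimal_tiling_path(bacs: List[Interval], start: int, end: int) -> List[Interval]:
    """Greedily pick the fewest intervals covering [start, end]; raise if coverage has a gap."""
    ordered = sorted(bacs)
    path: List[Interval] = []
    covered = start
    i = 0
    while covered < end:
        best: Optional[Interval] = None
        # Among BACs that start at or before the covered frontier, keep the one reaching furthest.
        while i < len(ordered) and ordered[i][0] <= covered:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"gap in coverage at position {covered}")
        path.append(best)
        covered = best[1]
    return path


contig = [(0, 40), (10, 60), (30, 90), (50, 100), (80, 120)]
print(minimal_tiling_path(contig, 0, 120))  # prints [(0, 40), (30, 90), (80, 120)]
```

The greedy choice is optimal for exact intervals; real fingerprint data add uncertainty in overlap size, which is why dedicated physical-mapping software handles this step.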

  9. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome

    PubMed Central

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S.

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remain largely unexplored, and HGT is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT for the success of sponges in diverse ecological niches. PMID:26959231

  10. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome.

    PubMed

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S

    2016-01-01

  11. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all genes. Their diversity is critical to the recognition of, and response to, attack by diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, made worse by a combination of the worldwide spread of pathogens, the exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain, where the valuable cultivars are sterile, parthenocarpic and mostly triploid, so conventional genetic analysis and breeding are impossible. In this paper, we review the nature of R genes and their key motifs, particularly in the Nucleotide Binding Site-Leucine Rich Repeat (NBS-LRR) gene class. We present data on the identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species carrying the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and in various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters: one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably of monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for the identification and selection of resistance genes within the genus, which may be useful for plant selection and breeding and also for defining relationships and genome evolution.
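Whether a sequenced clone has an uninterrupted open reading frame comes down to checking for in-frame stop codons. The sketch below assumes each clone is already in the sense orientation and checks only the three forward frames; a fuller pipeline would also scan the reverse complement and flag frameshifts. The clone sequences are toy examples, not data from the study.

```python
from typing import Iterable

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_is_open(seq: str, frame: int) -> bool:
    """True if reading seq from offset `frame` hits no stop codon (a trailing partial codon is ignored)."""
    for pos in range(frame, len(seq) - 2, 3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return False
    return True


def has_uninterrupted_orf(seq: str) -> bool:
    """True if at least one of the three forward reading frames is free of stop codons."""
    seq = seq.upper()
    return any(frame_is_open(seq, f) for f in range(3))


def fraction_open(clones: Iterable[str]) -> float:
    """Fraction of clone sequences with an uninterrupted forward reading frame."""
    clones = list(clones)
    return sum(has_uninterrupted_orf(s) for s in clones) / len(clones)


toy_clones = ["ATGAAACCC", "TAATTAATTAA"]  # the second has a stop codon in every forward frame
print(fraction_open(toy_clones))  # prints 0.5
```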

  12. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs

    PubMed Central

    2010-01-01

    Background Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era. Results Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies. 
Conclusions We have uncovered major changes in gene

  13. Genome engineering using a synthetic gene circuit in Bacillus subtilis

    PubMed Central

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-01-01

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome together with a target gene containing the desired mutation. The xylR gene and the Pspac-driven chloramphenicol resistance gene (cat) were located on a helper plasmid. In the presence of xylose, inactivation of XylR by xylose induces LacI expression; LacI then represses the Pspac promoter, and the cells become chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, a cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. This recombination introduces the mutation into the target gene. The remaining helper plasmid was easily removed by growth in the absence of chloramphenicol. In this study, we demonstrated base insertion, deletion and point mutation in the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful for constructing designer Bacillus strains for various industrial applications. PMID:25552415
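The counter-selection logic in this abstract can be traced as a small Boolean model. This is a sketch under stated assumptions: xylR and Pspac-cat are carried only on the helper plasmid, XylR represses Pxyl unless xylose is present, and LacI fully represses Pspac. The function name `cat_expressed` is hypothetical, and leaky expression is ignored.

```python
def cat_expressed(xylose: bool, has_pxyl_laci: bool, has_helper: bool) -> bool:
    """Boolean model of the counter-selection circuit (idealised: no leaky promoters).

    Assumes xylR and Pspac-cat sit only on the helper plasmid, XylR represses Pxyl
    unless xylose is present, and LacI represses Pspac.
    """
    xylr_active = has_helper and not xylose        # xylose inactivates the XylR repressor
    laci_made = has_pxyl_laci and not xylr_active  # integrated Pxyl-lacI is on when XylR is inactive
    pspac_on = not laci_made                       # LacI shuts off Pspac
    return has_helper and pspac_on                 # cat is expressed from Pspac on the helper


# Chloramphenicol plus xylose selects for cells that have lost the integrated Pxyl-lacI cassette.
for cassette_present in (True, False):
    resistant = cat_expressed(xylose=True, has_pxyl_laci=cassette_present, has_helper=True)
    print(f"Pxyl-lacI present={cassette_present}: chloramphenicol resistant={resistant}")
```

Only the branch where the cassette has been removed by recombination keeps cat switched on, which is why survivors on chloramphenicol plus xylose carry the edited target gene.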

  14. Genome-wide identification and analysis of the MADS-box gene family in sesame.

    PubMed

    Wei, Xin; Wang, Linhai; Yu, Jingyin; Zhang, Yanxin; Li, Donghua; Zhang, Xiurong

    2015-09-10

    MADS-box genes encode transcription factors that play crucial roles in plant growth and development. Sesame (Sesamum indicum L.) is an oil crop that contributes to the daily oil and protein requirements of almost half of the world's population; therefore, a genome-wide analysis of its MADS-box gene family is needed. Fifty-seven MADS-box genes were identified on 14 linkage groups of the sesame genome. Their phylogenetic relationships with Arabidopsis thaliana, Utricularia gibba and Solanum lycopersicum MADS-box genes were analyzed. Sesame MADS-box genes were clustered into four groups: 28 MIKC(c)-type, 5 MIKC(⁎)-type, 14 Mα-type and 10 Mγ-type. Gene structure analysis revealed that sesame MADS-box genes contain from 1 to 22 exons. Type II MADS-box genes contained far more exons than type I genes. Motif distribution analysis also indicated that type II MADS-box genes contained more motifs than type I genes. These results suggest that type II sesame MADS-box genes have more complex structures. By analyzing the expression profiles of MADS-box genes in seven sesame transcriptomes, we determined that MIKC(c)-type MADS-box genes play significant roles in sesame flower and seed development. Although most MADS-box genes in the same clade showed similar expression features, the functions of some genes have diverged from those of their Arabidopsis orthologs. This research will contribute to uncovering the role of MADS-box genes in sesame development. PMID:25967387

  15. mGene: accurate SVM-based gene finding with an application to nematode genomes.

    PubMed

    Schweikert, Gabriele; Zien, Alexander; Zeller, Georg; Behr, Jonas; Dieterich, Christoph; Ong, Cheng Soon; Philips, Petra; De Bona, Fabio; Hartmann, Lisa; Bohlen, Anja; Krüger, Nina; Sonnenburg, Sören; Rätsch, Gunnar

    2009-11-01

    We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines, in an unprecedented manner, the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was demonstrated in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance at the nucleotide, exon, and transcript levels for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, of which 205 were unconfirmed. RT-PCR testing of 24 of these genes yielded a success rate of only 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and with the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are the most accurate. PMID:19564452
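The ranking criterion mentioned here, the average of sensitivity and specificity, can be computed at the nucleotide level from sets of predicted and annotated coding positions. The sketch assumes the convention common in gene-prediction benchmarks, where sensitivity is TP / (TP + FN) and "specificity" is TP / (TP + FP) (what other fields call precision). The competition's exact scoring code is not described in the abstract, and the coordinates below are toy values.

```python
from typing import Dict, Iterable


def nucleotide_accuracy(predicted: Iterable[int], annotated: Iterable[int]) -> Dict[str, float]:
    """Nucleotide-level sensitivity, specificity and their mean.

    Assumed gene-finding convention: Sn = TP / (TP + FN), Sp = TP / (TP + FP).
    """
    pred, ann = set(predicted), set(annotated)
    tp = len(pred & ann)                        # coding positions both predicted and annotated
    sn = tp / len(ann) if ann else 0.0          # share of annotated coding bases recovered
    sp = tp / len(pred) if pred else 0.0        # share of predicted coding bases that are correct
    return {"Sn": sn, "Sp": sp, "mean": (sn + sp) / 2}


# Toy example: annotation covers bases 0-99, prediction covers bases 0-49.
print(nucleotide_accuracy(range(0, 50), range(0, 100)))  # prints {'Sn': 0.5, 'Sp': 1.0, 'mean': 0.75}
```

Averaging the two keeps a predictor from scoring well by over-calling (high Sn, low Sp) or under-calling (high Sp, low Sn).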

  16. A data management system for structural genomics.

    PubMed

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-06-21

    BACKGROUND: Structural genomics (SG) projects aim to determine thousands of protein structures by developing high-throughput techniques for every step of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. RESULTS: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema covers all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data are entered by users through pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system uses BLAST each week to match protein sequences against entries in the Protein Data Bank and against the targets of other SG centers worldwide. CONCLUSION: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  17. Comparative Genomic Analysis of Drosophila melanogaster and Vector Mosquito Developmental Genes

    PubMed Central

    Behura, Susanta K.; Haugen, Morgan; Flannery, Ellen; Sarro, Joseph; Tessier, Charles R.; Severson, David W.; Duman-Scheel, Molly

    2011-01-01

    Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments. PMID:21754989

  18. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole-genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced solely from short reads, but its annotation lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to 12 giant panda tissues to globally improve genome assembly completeness and to detect novel expressed transcripts, using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-and-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  19. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

    PubMed Central

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

  20. Divergence of the mitochondrial genome structure in the apicomplexan parasites, Babesia and Theileria.

    PubMed

    Hikosaka, Kenji; Watanabe, Yoh-Ichi; Tsuji, Naotoshi; Kita, Kiyoshi; Kishine, Hiroe; Arisue, Nobuko; Palacpac, Nirianne Marie Q; Kawazu, Shin-Ichiro; Sawai, Hiromi; Horii, Toshihiro; Igarashi, Ikuo; Tanabe, Kazuyuki

    2010-05-01

    Mitochondrial (mt) genomes from diverse phylogenetic groups vary considerably in size, structure, and organization. The genus Plasmodium (phylum Apicomplexa), the causative agent of malaria, has the smallest mt genome: a circular and/or tandemly repeated linear element of 6 kb encoding only three protein genes (cox1, cox3, and cob). The closely related genera Babesia and Theileria also have small mt genomes (6.6 kb), which are monomeric and linear, with an organization distinct from that of Plasmodium. To elucidate the structural divergence and evolution of mt genomes between Babesia/Theileria and Plasmodium, we determined five new sequences from Babesia bigemina, B. caballi, B. gibsoni, Theileria orientalis, and T. equi. Together with the previously reported sequences of B. bovis, T. annulata, and T. parva, these show that all eight Babesia and Theileria mt genomes are linear molecules with terminal inverted repeats (TIRs) at both ends, containing three protein-coding genes (cox1, cox3, and cob) and six large subunit (LSU) ribosomal RNA (rRNA) gene fragments. The organization and transcriptional direction of the protein-coding genes and rRNA gene fragments were completely conserved in the four Babesia species. In contrast, notable variation occurred among the four Theileria species. Although the genome structures of T. annulata and T. parva were nearly identical to those of Babesia, an inversion of the 3-kb central region was found in T. orientalis. Moreover, the T. equi mt genome is the largest (8.2 kb) and most divergent, with unusually long TIR sequences in which cox3 and two LSU rRNA gene fragments are located. The T. equi mt genome showed little synteny with those of the other species. These results suggest that the Theileria mt genome is highly diverse, with lineage-specific evolution in two Theileria species: genome inversion in T. orientalis and a gene-embedded long TIR in T. equi. PMID:20034997
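A terminal inverted repeat means that the left end of a linear molecule reads the same as the reverse complement of its right end. The sketch below finds the longest exact TIR in a sequence. Real TIR annotation tolerates mismatches and uses alignment, so this exact-match version and the toy sequence are illustrative assumptions only.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq: str) -> int:
    """Length of the longest exact TIR: left end == reverse complement of right end (no overlap)."""
    best = 0
    for k in range(1, len(seq) // 2 + 1):
        if seq[:k] == reverse_complement(seq[-k:]):
            best = k
        else:
            # Matches are nested (a length-k match implies every shorter one), so stop at the first failure.
            break
    return best


toy = "AACG" + "GGGGG" + "CGTT"  # toy linear molecule with a 4-bp TIR
print(terminal_inverted_repeat_length(toy))  # prints 4
```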

  1. Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome.

    PubMed

    Ahn, Seung-Joon; Dermauw, Wannes; Wybouw, Nicky; Heckel, David G; Van Leeuwen, Thomas

    2014-07-01

    UDP-glycosyltransferases (UGTs) catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, converting them into more water-soluble metabolites. UGTs thereby play an important role in the detoxification of xenobiotics and in the regulation of endobiotics. Recently, the genome sequence was reported for the two-spotted spider mite, Tetranychus urticae, a polyphagous herbivore that damages a number of agricultural crops. Although various gene families implicated in xenobiotic metabolism have been documented in T. urticae, UGTs had not yet been. We identified 80 UGT genes in the T. urticae genome, the largest number of UGT genes reported so far in a metazoan species. Phylogenetic analysis revealed that lineage-specific gene expansions increased the diversity of the T. urticae UGT repertoire. The genomic distribution, intron-exon structure and structural motifs of the T. urticae UGTs were also described. In addition, expression profiling after host-plant shifts and in acaricide-resistant lines supported an important role for UGT genes in xenobiotic metabolism. Expanded searches for UGTs in other arachnid species (Subphylum Chelicerata), including a spider, a scorpion, two ticks and two predatory mites, unexpectedly revealed the complete absence of UGT genes. However, a centipede (Subphylum Myriapoda) and a water flea and a crayfish (Subphylum Crustacea) contain UGT genes similar to insect UGTs in their genomes, suggesting that the UGT gene family might have been lost early in the Chelicerata lineage and subsequently re-gained in the tetranychid mites. The sequence similarity of T. urticae UGTs to bacterial UGTs and their phylogenetic reconstruction suggest that spider mites acquired UGT genes from bacteria by horizontal gene transfer. Our findings show a unique evolutionary history of the T. urticae UGT gene family among arthropods and provide important clues to its functions in relation to detoxification and thereby host

  2. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, which use species tree information in addition to sequence information, often rely on a computationally intensive search of tree space, which precludes their application to large genomic databases. Results: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches and corrects its weakly supported parts by combining information from a species tree and a distance matrix. Its low running time enabled us to apply it to the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible, set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns over the history of 63 eukaryote species, and to predict ancestral gene content and order for all ancestors along the phylogeny. Availability: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also tested on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ, as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families, is also made available. PMID:27513924
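ProfileNJ works from a distance matrix, and its name suggests a neighbor-joining core. The sketch below shows only plain neighbor-joining as background. It is not ProfileNJ, which additionally uses branch supports and a species tree to decide which parts of a gene tree to rebuild. It returns an unrooted topology in Newick form without branch lengths, and the four-taxon distances are a toy additive example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence


def neighbor_joining(labels: List[str], dist: Sequence[Sequence[float]]) -> str:
    """Plain neighbor-joining (needs >= 3 taxa); returns an unrooted Newick topology."""
    d: Dict[FrozenSet[str], float] = {}
    for i, j in combinations(range(len(labels)), 2):
        d[frozenset((labels[i], labels[j]))] = dist[i][j]

    def D(a: str, b: str) -> float:
        return d[frozenset((a, b))]

    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising the Q criterion: (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(p[0], p[1]) - r[p[0]] - r[p[1]])
        new = f"({a},{b})"
        # Distance from the new internal node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((new, k))] = 0.5 * (D(a, k) + D(b, k) - D(a, b))
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    return "(" + ",".join(nodes) + ");"


taxa = ["A", "B", "C", "D"]
matrix = [[0, 2, 3, 3],
          [2, 0, 3, 3],
          [3, 3, 0, 2],
          [3, 3, 2, 0]]
print(neighbor_joining(taxa, matrix))  # prints (C,D,(A,B));
```

The result encodes the split AB|CD; a correction method in the spirit of the abstract would keep well-supported splits like this and only re-resolve the weak ones.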

  3. Genomic Gene Clustering Analysis of Pathways in Eukaryotes

    PubMed Central

    Lee, Jennifer M.; Sonnhammer, Erik L.L.

    2003-01-01

    Genomic clustering of genes in a pathway is commonly found in prokaryotes because of transcriptional operons, but operons are absent from most eukaryotes. Yet pathway members might still be clustered, to a lesser extent, in eukaryotic genomes, which could assist the coregulation of a set of functionally cooperating genes. We analyzed five sequenced eukaryotic genomes for clustering of genes assigned to the same pathway in the KEGG database. Between 30% and 98% of the analyzed pathways in a genome were found to exhibit significantly higher clustering levels than expected by chance. In descending order of clustering, the genomes studied were Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans, Arabidopsis thaliana, and Drosophila melanogaster. Surprisingly, there is little agreement between genomes in terms of which pathways are most clustered. Only seven of the 69 pathways found in all species were significantly clustered in all five of them. This species-specific pattern of pathway clustering may reflect adaptations or evolutionary events unique to a particular lineage. We note that although operons are common in C. elegans, only 58% of its pathways showed significant clustering, which is less than in human. Virtually all pathways in S. cerevisiae showed significant clustering. PMID:12695325
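"Significantly higher clustering than expected by chance" is typically assessed by comparing an observed clustering statistic with its distribution over random gene sets of the same size. The sketch below is a generic permutation test, not the authors' statistic: it models a single chromosome as an ordered list of gene indices and scores a pathway by how many of its genes have a pathway member as their immediate neighbour. The gene indices are toy values.

```python
import random
from typing import List, Sequence


def adjacent_pairs(positions: Sequence[int]) -> int:
    """Count pathway genes whose next gene along the chromosome is also in the pathway."""
    s = set(positions)
    return sum(1 for p in s if p + 1 in s)


def clustering_p_value(pathway: List[int], n_genes: int, n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: how often a random gene set of equal size is at least as clustered."""
    rng = random.Random(seed)
    observed = adjacent_pairs(pathway)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(range(n_genes), len(pathway))
        if adjacent_pairs(sample) >= observed:
            hits += 1
    # The +1 terms count the observed set itself, so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


print(clustering_p_value([10, 11, 12, 13, 14], n_genes=100))  # contiguous run: very small p
print(clustering_p_value([0, 20, 40, 60, 80], n_genes=100))   # scattered genes: prints 1.0
```

Testing many pathways this way would also call for a multiple-testing correction before counting how many are "significantly" clustered.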

  4. Mapping and annotating obesity-related genes in pig and human genomes.

    PubMed

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. It is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty of controlling the genetic background of human cohorts. Animal models are therefore necessary to further dissect the genetics of obesity. The pig has emerged as one of the most attractive models because of its similarity to humans in the mechanisms regulating fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both the human and pig genomes, building a map useful for interpreting comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them onto KEGG pathways and the STRING protein interaction network. Conclusions. The collected set consists of 361 obesity-related genes in the human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only 3 human genes have no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity-related genes are mostly involved in regulation and signaling processes/pathways, and relevant connections emerge between obesity-related genes and diseases such as cancer and infectious diseases. PMID:23855670

  5. From genome to gene: a new epoch for wheat research?

    PubMed

    Wang, Meng; Wang, Shubin; Xia, Guangmin

    2015-06-01

    Genetic research for bread wheat (Triticum aestivum), a staple crop around the world, has been impeded by its complex large hexaploid genome that contains a high proportion of repetitive DNA. Recent advances in sequencing technology have now overcome these challenges and led to genome drafts for bread wheat and its progenitors as well as high-resolution transcriptomes. However, the exploitation of these data for identifying agronomically important genes in wheat is lagging behind. We review recent wheat genome sequencing achievements and focus on four aspects of strategies and future hotspots for wheat improvement: positional cloning, 'omics approaches, combining forward and reverse genetics, and epigenetics. PMID:25887708

  6. Identification of Neural Outgrowth Genes using Genome-Wide RNAi

    PubMed Central

    Sepp, Katharine J.; Hong, Pengyu; Lizarraga, Sofia B.; Liu, Judy S.; Mejia, Luis A.; Walsh, Christopher A.; Perrimon, Norbert

    2008-01-01

    While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that when downregulated by RNAi, have morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new genes that have

  7. Nuclear structure, gene expression and development.

    PubMed

    Brown, K

    1999-01-01

    This article considers the extent to which features of nuclear structure are involved in the regulation of genome function. The recent renaissance in imaging technology has inspired a new determination to assign specific functions to nuclear domains or structures, many of which have been described as "factories" to express the idea that they coordinate nuclear processes in an efficient way. Visual data have been combined with genetic and biochemical information to support the idea that nuclear organization has functional significance. Particular DNA sequences or chromatin structures may nucleate domains that are permissive or restrictive of transcription, to which active or inactive loci could be recruited. Associations within the nucleus, as well as many nuclear structures, are transient and change dynamically during cell cycle progression and development. Despite this complexity, elucidation of the possible structural basis of epigenetic phenomena, such as the inheritance of a "cellular memory" of gene expression status, is an important goal for cell biology. Topics for discussion include the regulatory effect of chromatin structure on gene expression, putative "nuclear addresses" for genes and proteins, the functional significance of nuclear bodies, and the role of the nuclear matrix in nuclear compartmentalization. PMID:10651237

  8. Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies.

    PubMed

    Kim, Yongkang; Park, Taesung

    2015-01-01

    Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on identifying single variants, and the variants discovered so far have been shown to explain only part of disease heritability. This missing heritability is generally attributed to gene-gene (GG) or gene-environment (GE) interactions and to other structural variants. Generalized multifactor dimensionality reduction (GMDR) has proven reasonably powerful in detecting GG and GE interactions; however, its performance declines when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method, based on L-estimators and M-estimators, to reduce the effects of outlying traits. In simulation studies, robust GMDR outperformed the original MDR. The performance of robust GMDR is illustrated with a real GWA example of 8,577 samples from the Korean population, using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as the phenotype. Robust GMDR identified the KCNH1 gene as having strong interaction effects with other genes on the function of insulin secretion. PMID:26267341

  9. Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies

    PubMed Central

    Kim, Yongkang; Park, Taesung

    2015-01-01

    Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on identifying single variants, and the variants discovered so far have been shown to explain only part of disease heritability. This missing heritability is generally attributed to gene-gene (GG) or gene-environment (GE) interactions and to other structural variants. Generalized multifactor dimensionality reduction (GMDR) has proven reasonably powerful in detecting GG and GE interactions; however, its performance declines when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method, based on L-estimators and M-estimators, to reduce the effects of outlying traits. In simulation studies, robust GMDR outperformed the original MDR. The performance of robust GMDR is illustrated with a real GWA example of 8,577 samples from the Korean population, using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as the phenotype. Robust GMDR identified the KCNH1 gene as having strong interaction effects with other genes on the function of insulin secretion. PMID:26267341

  10. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples

    PubMed Central

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S.; Kebebew, Electron

    2015-01-01

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal to adenoma to carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analyses in human adrenocortical tissue samples (normal, adrenocortical adenoma and ACC). Pairwise comparison of the normal, adrenocortical adenoma and ACC gene expression profiles, using a threshold of more than four-fold expression difference and an adjusted P-value < 0.05, revealed no major differences between normal tissue and adrenocortical adenoma, whereas 808 genes were dysregulated in ACC versus adrenocortical adenoma and 1085 in ACC versus normal. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the expression of the dysregulated genes, we found more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways associated with the dysregulated genes and experimentally validated that oncostatin M signaling induces caspase 3-dependent apoptosis and suppresses cell proliferation. Finally, we propose that the number of genomic changes increases from normal to adenoma to carcinoma, and we identify oncostatin M signaling as a plausible druggable pathway for therapeutics. PMID:26446994

  11. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) can be caused by sequencing errors or by indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events that evolved to control the expression of some genes. We previously developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position, and frameshift direction (-1, +1). The fs-genes can be retrieved through a web interface by similarity search with a query sequence, by browsing fs-gene clusters, and by other means. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization or phase variation. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events). PMID:23161689

  12. Reading Genomes and Controlling Gene Expression

    NASA Astrophysics Data System (ADS)

    Libchaber, Albert

    2000-03-01

    Molecular recognition of DNA sequences is achieved by hybridization of complementary sequences. We present various scenarios for optimization, leading to microarrays and global measurement. Gene expression can be controlled using gene constructs immobilized on a template with micron-scale temperature heaters. We also discuss and present results on protein microarrays.

  13. Genomic organization of SLC3A1, a transporter gene mutated in cystinuria

    SciTech Connect

    Pras, E.; Sood, R.; Raben, N.

    1996-08-15

    The SLC3A1 gene encodes a transport protein for cystine and the dibasic amino acids. Mutations in this gene have recently been shown to cause cystinuria. We report the genomic structure and organization of SLC3A1, which is composed of 10 exons and spans nearly 45 kb. Until now, screening for mutations in SLC3A1 has relied on RT-PCR amplification of illegitimate mRNA transcripts from white blood cells. In this report we provide primers for amplifying the exons from genomic DNA, thus simplifying the screening for SLC3A1 mutations in cystinuria. 20 refs., 3 figs., 2 tabs.

  14. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  15. Genome-Wide Identification and Evolution of HECT Genes in Soybean

    PubMed Central

    Meng, Xianwen; Wang, Chen; Rahman, Siddiq Ur; Wang, Yaxu; Wang, Ailan; Tao, Shiheng

    2015-01-01

    Proteins containing domains homologous to the E6-associated protein (E6-AP) carboxyl terminus (HECT) are an important class of E3 ubiquitin ligases involved in the ubiquitin proteasome pathway. HECT-type E3s play crucial roles in plant growth and development. However, current understanding of plant HECT genes and their evolution is very limited. In this study, we performed a genome-wide analysis of the HECT domain-containing genes in soybean. Using high-quality genome sequences, we identified 19 soybean HECT genes, distributed unevenly across 15 of the 20 chromosomes. Nineteen of these genes were inferred to be segmentally duplicated gene pairs, suggesting that in soybean, segmental duplications have made a significant contribution to the expansion of the HECT gene family. Phylogenetic analysis showed that these HECT genes can be divided into seven groups, within which gene structure and domain architecture were relatively well conserved. The Ka/Ks ratios show that the duplicated HECT genes underwent purifying selection after the duplication events. Moreover, expression analysis reveals that 15 of the soybean HECT genes are differentially expressed across 14 tissues and are often highly expressed in flowers and roots. In summary, this work provides useful information on which further functional studies of soybean HECT genes can be based. PMID:25894222

  16. Simple repetitive sequences in the genome: structure and functional significance.

    PubMed

    Brahmachari, S K; Meera, G; Sarkar, P S; Balagurumoorthy, P; Tripathi, J; Raghavan, S; Shaligram, U; Pataskar, S

    1995-09-01

    The current explosion of DNA sequence information has generated increasing evidence for the claim that noncoding repetitive DNA sequences present within and around different genes could play an important role in genetic control processes, although the precise role and mechanism by which these sequences function are poorly understood. Several of the simple repetitive sequences which occur in a large number of loci throughout the human and other eukaryotic genomes satisfy the sequence criteria for forming non-B DNA structures in vitro. We have summarized some of the features of three different types of simple repeats that highlight the importance of repetitive DNA in the control of gene expression and chromatin organization. (i) (TG/CA)n repeats are widespread and conserved in many loci. These sequences are associated with nucleosomes of varying linker length and may play a role in chromatin organization. These Z-potential sequences can help absorb superhelical stress during transcription and aid in recombination. (ii) Human telomeric repeat (TTAGGG)n adopts a novel quadruplex structure and exhibits unusual chromatin organization. This unusual structural motif could explain chromosome pairing and stability. (iii) Intragenic amplification of (CTG)n/(CAG)n trinucleotide repeat, which is now known to be associated with several genetic disorders, could down-regulate gene expression in vivo. The overall implications of these findings vis-à-vis repetitive sequences in the genome are summarized. PMID:8582360

  17. Integrative Genomics Identifies Gene Signature Associated with Melanoma Ulceration

    PubMed Central

    Toth, Reka; Vizkeleti, Laura; Herandez-Vargas, Hector; Lazar, Viktoria; Emri, Gabriella; Szatmari, Istvan; Herceg, Zdenko; Adany, Roza; Balazs, Margit

    2013-01-01

    Background Despite the extensive research approaches applied to characterise malignant melanoma, no specific molecular markers clearly related to the progression of this disease are available. In this study, our aims were to define a gene expression signature associated with the clinical outcome of melanoma patients and to provide an integrative interpretation of the gene expression, copy number alteration (CNA), and promoter methylation patterns that contribute to clinically relevant molecular functional alterations. Methods Gene expression profiles were determined using the Affymetrix U133 Plus2.0 array. The NimbleGen Human CGH Whole-Genome Tiling array was used to define CNAs, and the Illumina GoldenGate Methylation platform was applied to characterise the methylation patterns of overlapping genes. Results We identified two subclasses of primary melanoma: one representing patients with better prognoses and the other characteristic of patients with unfavourable outcomes. We found 1,080 genes significantly correlated with ulceration, of which 987 were downregulated and significantly enriched in the p53, NF-kappaB, and WNT/beta-catenin pathways. Through integrated genome analysis, we defined 150 downregulated genes whose expression correlated with copy number losses in ulcerated samples. These genes were significantly enriched on chromosomes 6q and 10q, which contained a total of 36 genes. Ten of these genes were downregulated and involved in cell-cell and cell-matrix adhesion or apoptosis. The expression and methylation patterns of additional genes were inversely correlated, suggesting that transcriptional silencing of these genes is driven by epigenetic events. Conclusion Using an integrative genomic approach, we were able to identify functionally relevant molecular hotspots characterised by copy number losses and promoter hypermethylation in distinct molecular subtypes of melanoma that contribute to specific transcriptomic silencing

  18. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT). PMID:21721943

  19. Molecular cloning, genomic organization, and chromosomal localization of the human pancreatitis-associated protein (PAP) gene

    SciTech Connect

    Dusetti, N.J.; Frigerio, J.M.; Dagorn, J.C.; Iovanna, J.L.; Fox, M.F.; Swallow, D.M.

    1994-01-01

    Pancreatitis-associated protein (PAP) is a secretory pancreatic protein present in small amounts in normal pancreas and overexpressed during the acute phase of pancreatitis. In this paper, the authors describe the cloning, characterization, and chromosomal mapping of the human PAP gene. The gene spans 2748 bp and contains six exons interrupted by five introns. The gene has a typical promoter containing the sequences TATAAA and CCAAT 28 and 52 bp upstream of the cap site, respectively. They found striking similarities in genomic organization, as well as in the promoter sequences, between the human and rat PAP genes. The human PAP gene was mapped to chromosome 2p12 using rodent-human hybrid cells and in situ chromosomal hybridization. This localization coincides with that of the reg/lithostathine gene, which encodes a pancreatic secretory protein structurally related to PAP, suggesting that both genes arose from a common ancestral gene by duplication. 35 refs., 4 figs., 1 tab.

  20. Genomic analysis reveals extensive gene duplication within the bovine TRB locus

    PubMed Central

    Connelley, Timothy; Aerts, Jan; Law, Andy; Morrison, W Ivan

    2009-01-01

    Background Diverse TR and IG repertoires are generated by V(D)J somatic recombination. Genomic studies have been pivotal in cataloguing the V, D, J and C genes present in the various TR/IG loci and in describing how duplication events have expanded the number of these genes. Such studies have also provided insights into the evolution of these loci and the complex mechanisms that regulate TR/IG expression. In this study we analyze the sequence of the third bovine genome assembly to characterize the germline repertoire of bovine TRB genes and compare the organization, evolution and regulatory structure of the bovine TRB locus with those of humans and mice. Results The TRB locus in the third bovine genome assembly is distributed over 5 scaffolds, extending to ~730 kb. The available sequence contains 134 TRBV genes, assigned to 24 subgroups, and 3 clusters of DJC genes, each comprising a single TRBD gene, 5–7 TRBJ genes and a single TRBC gene. Seventy-nine of the TRBV genes are predicted to be functional. Comparison with the human and murine TRB loci shows that the gene order, as well as the sequences of non-coding elements that regulate TRB expression, are highly conserved in the bovine. Dot-plot analyses demonstrate that expansion of the genomic TRBV repertoire has occurred via a complex and extensive series of duplications, predominantly involving DNA blocks containing multiple genes. These duplication events have resulted in massive expansion of several TRBV subgroups, most notably TRBV6, 9 and 21, which contain 40, 35 and 16 members, respectively. Similarly, duplication has led to the generation of a third DJC cluster. Analyses of cDNA data confirm the diversity of the TRBV genes and, in addition, identify a substantial number of TRBV genes, predominantly from the larger subgroups, that are still absent from the genome assembly. The observed gene duplication within the bovine TRB locus has created a repertoire of phylogenetically diverse functional TRBV genes

  1. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-01

    Understanding the molecular basis of aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Until now, most longevity pathways were discovered in invertebrates because of their short lifespans and the availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing it as a vertebrate system for aging research. PMID:26638067

  2. Evolution of genomic structures on Mammalian sex chromosomes.

    PubMed

    Katsura, Yukako; Iwase, Mineyo; Satta, Yoko

    2012-04-01

    Throughout mammalian evolution, recombination between the two sex chromosomes was suppressed in a stepwise manner. It is thought that the suppression of recombination led to an accumulation of deleterious mutations and frequent genomic rearrangements on the Y chromosome. In this article, we review three evolutionary aspects related to genomic rearrangements and structures, such as inverted repeats (IRs) and palindromes (PDs), on the mammalian sex chromosomes. First, we describe the stepwise manner in which recombination between the X and Y chromosomes was suppressed in placental mammals and discuss a genomic rearrangement that might have led to the formation of present pseudoautosomal boundaries (PAB). Second, we describe ectopic gene conversion between the X and Y chromosomes, and propose possible molecular causes. Third, we focus on the evolutionary mode and timing of PD formation on the X and Y chromosomes. The sequence of the chimpanzee Y chromosome was recently published by two groups. Both groups suggest that rapid evolution of genomic structure occurred on the Y chromosome. Our re-analysis of the sequences confirmed the species-specific mode of human and chimpanzee Y chromosomal evolution. Finally, we present a general outlook regarding the rapid evolution of mammalian sex chromosomes. PMID:23024603

  3. Analyses of the Complete Genome and Gene Expression of Chloroplast of Sweet Potato [Ipomoea batata]

    PubMed Central

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family, with a genome size between 2,200 and 3,000 Mb. Genomic resources for this crop are scarce because of its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, a circular molecule of 161,303 bp with the typical quadripartite structure, in which large (LSC) and small (SSC) single-copy regions are separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes, of which 72 are single-copy and 11 are double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of other Ipomoea (L.) species and of some important basal angiosperms. Some boundary gene-flow and gene gain-and-loss events were identified at the intra- and inter-species levels. In addition, by comparison with the transcriptome sequences of sweet potato, RNA editing events and differential expression of the chloroplast functional genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa, and the result may contribute to a better understanding of the evolution of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression. PMID:25874767

  4. Phase Transition in the Genome Evolution Favors Nonrandom Distribution of Genes on Chromosomes

    NASA Astrophysics Data System (ADS)

    Kowalski, Jakub; Waga, Wojciech; Zawierta, Marta; Cebrat, Stanisław

    We used Monte Carlo-based computer models to show that selection pressure could affect the distribution of recombination hotspots along the chromosome. Close to the critical crossover rate, where genomes may switch between Darwinian purifying selection and complementation of haplotypes, the distribution of recombination events and the force of selection exerted on genes affect the structure of chromosomes. The order of expression of genes and their location on the chromosome may determine the extinction or survival of competing populations.

  5. Analyses of the complete genome and gene expression of chloroplast of sweet potato [Ipomoea batata].

    PubMed

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family, with a genome size between 2,200 and 3,000 Mb. Genomic resources for this crop are scarce because of its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, a circular molecule of 161,303 bp with the typical quadripartite structure, in which large (LSC) and small (SSC) single-copy regions are separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes, of which 72 are single-copy and 11 are double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of other Ipomoea (L.) species and of some important basal angiosperms. Some boundary gene-flow and gene gain-and-loss events were identified at the intra- and inter-species levels. In addition, by comparison with the transcriptome sequences of sweet potato, RNA editing events and differential expression of the chloroplast functional genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa, and the result may contribute to a better understanding of the evolution of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression. PMID:25874767

  6. Genome-editing technologies for gene correction of hemophilia.

    PubMed

    Park, Chul-Yong; Lee, Dongjin R; Sung, Jin Jea; Kim, Dong-Wook

    2016-09-01

    Hemophilia is caused by various mutations in blood coagulation factor genes, including factor VIII (FVIII) and factor IX (FIX), which encode key proteins in the blood clotting pathway. Although the addition of therapeutic genes or infusion of clotting factors may be used to relieve hemophilia's symptoms, no permanent cure for the disease exists. Moreover, patients often develop neutralizing antibodies or experience adverse effects that limit the therapy's benefits. Targeted gene therapy that precisely corrects these mutated genes at the genome level using programmable nucleases is therefore a promising strategy. These nucleases can induce double-strand breaks (DSBs) in genomes, and repair of the induced DSBs by the two cellular repair systems, non-homologous end joining and homology-directed repair, enables targeted gene correction. Going beyond cultured cell systems, we are now entering the age of direct gene correction in vivo using various delivery tools. Here, we describe the current status of in vivo and ex vivo genome-editing technology related to potential hemophilia gene correction and the prominent issues surrounding its application in patients with monogenic diseases. PMID:27357631

  7. Glycoprotein Structural Genomics: Solving the Glycosylation Problem

    PubMed Central

    Chang, Veronica T.; Crispin, Max; Aricescu, A. Radu; Harvey, David J.; Nettleship, Joanne E.; Fennelly, Janet A.; Yu, Chao; Boles, Kent S.; Evans, Edward J.; Stuart, David I.; Dwek, Raymond A.; Jones, E. Yvonne; Owens, Raymond J.; Davis, Simon J.

    2007-01-01

    Summary Glycoproteins present special problems for structural genomic analysis because they often require glycosylation in order to fold correctly, whereas their chemical and conformational heterogeneity generally inhibits crystallization. We show that the “glycosylation problem” can be solved by expressing glycoproteins transiently in mammalian cells in the presence of the N-glycosylation processing inhibitors, kifunensine or swainsonine. This allows the correct folding of the glycoproteins, but leaves them sensitive to enzymes, such as endoglycosidase H, that reduce the N-glycans to single residues, enhancing crystallization. Since the scalability of transient mammalian expression is now comparable to that of bacterial systems, this approach should relieve one of the major bottlenecks in structural genomic analysis. PMID:17355862

  8. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments.

    PubMed Central

    Cotmore, S F; McKie, V C; Anderson, L J; Astell, C R; Tattersall, P

    1986-01-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Restriction endonuclease fragments of this cloned B19 genome were treated with BAL 31 and shotgun cloned into the open reading frame expression vector pJS413. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus. PMID:3021988

  9. The mitochondrial genome of the stramenopile alga Chrysodidymus synuroideus. Complete sequence, gene content and genome organization

    PubMed Central

    Chesnick, Joby M.; Goff, Megan; Graham, James; Ocampo, Christopher; Lang, B. Franz; Seif, Elias; Burger, Gertraud

    2000-01-01

    This is the first report of a complete mitochondrial genome sequence from a photosynthetic member of the stramenopiles, the chrysophyte alga Chrysodidymus synuroideus. The circular-mapping mitochondrial DNA (mtDNA) of 34 119 bp contains 58 densely packed genes (all without introns) and five unique open reading frames (ORFs). Protein genes code for components of respiratory chain complexes, ATP synthase and the mitoribosome, as well as one product of unknown function, encoded in many other protist mtDNAs (YMF16). In addition to small and large subunit ribosomal RNAs, 23 tRNAs are mtDNA-encoded, permitting translation of all codons present in protein-coding genes except ACN (Thr) and CGN (Arg). The missing tRNAs are assumed to be imported from the cytosol. Comparison of the C. synuroideus mtDNA with that of other stramenopiles allowed us to draw conclusions about mitochondrial genome organization, expression and evolution. First, we provide evidence that mitochondrial ORFs code for highly derived, unrecognizable versions of ribosomal or respiratory genes otherwise ‘missing’ in a particular mtDNA. Secondly, the observed constraints in mitochondrial genome rearrangements suggest operon-based, co-ordinated expression of genes functioning in common biological processes. Finally, stramenopile mtDNAs reveal an unexpectedly low variability in genome size and gene complement, testifying to substantial differences in the tempo of mtDNA evolution between major eukaryotic lineages. PMID:10871400

  10. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium (MGSAC). The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but it still contains 187,214 undetermined gap regions, as well as supercontigs and relatively short contigs that are not mapped to chromosomes. We resequenced and assembled the common marmoset genome by deep sequencing with high-throughput sequencing technology. Several sequence runs were executed on Illumina sequencing platforms, yielding 181 Gbp of high-quality bases, including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp; this corresponds to approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs. The improved genome sequence also helped to detect 5,288 new genes that are homologous to human cDNAs, and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576
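    The abstract reports that the contig N50 doubled. As a hedged illustration (not the authors' code), N50 can be computed by sorting contig lengths in descending order and returning the length at which the running total first reaches half of the total assembly length. The function name `n50` is our own, and the toy lengths are made up rather than taken from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted((int(x) for x in contig_lengths), reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    if any(x <= 0 for x in lengths):
        raise ValueError("contig lengths must be positive")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids float rounding: stop at the first contig
        # that brings the cumulative length to at least 50% of the total.
        if 2 * running >= total:
            return length
    return lengths[-1]  # not reached for positive lengths


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

    Some tools use a strict "more than half" cutoff, which can shift the result on ties; the sketch above uses "at least half".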

  11. The Quality and Validation of Structures from Structural Genomics

    PubMed Central

    Domagalski, Marcin J.; Zheng, Heping; Zimmerman, Matthew D.; Dauter, Zbigniew; Wlodawer, Alexander; Minor, Wladek

    2014-01-01

    Quality control of three-dimensional structures of macromolecules is a critical step to ensure the integrity of structural biology data, especially those produced by structural genomics centers. Whereas the Protein Data Bank (PDB) has proven to be a remarkable success overall, the inconsistent quality of structures reveals a lack of universal standards for structure/deposit validation. Here, we review the state-of-the-art methods used in macromolecular structure validation, focusing on validation of structures determined by X-ray crystallography. We describe some general protocols used in the rebuilding and re-refinement of problematic structural models. We also briefly discuss some frontier areas of structure validation, including refinement of protein–ligand complexes, automation of structure redetermination, and the use of NMR structures and computational models to solve X-ray crystal structures by molecular replacement. PMID:24203341

  12. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    PubMed

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-01

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health. PMID:16341006

  13. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth

    PubMed Central

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-01-01

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species have been shown to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to assemble the croaker genome for comparative studies and will provide an important resource for selective breeding of croaker. PMID:26689832

  14. In-silico human genomics with GeneCards

    PubMed Central

    2011-01-01

    Since 1998, the bioinformatics, systems biology, genomics and medical communities have enjoyed a synergistic relationship with the GeneCards database of human genes (http://www.genecards.org). This human gene compendium was created to help introduce order into the increasingly chaotic flow of information. After viewing details and deep links related to specific genes, users have often requested enhanced capabilities, such that, over time, GeneCards has blossomed into a suite of tools (including GeneDecks, GeneALaCart, GeneLoc, GeneNote and GeneAnnot) for a variety of analyses of both single human genes and sets thereof. In this paper, we focus on in-house and external research activities which have been enabled, enhanced, complemented and, in some cases, motivated by GeneCards. In turn, such interactions have often inspired and propelled improvements in GeneCards. We describe here the evolution and architecture of this project, including examples of synergistic applications in diverse areas such as synthetic lethality in cancer, the annotation of genetic variations in disease, omics integration in a systems biology approach to kidney disease, and bioinformatics tools. PMID:22155609

  15. Genome-wide characterization of the ankyrin repeats gene family under salt stress in soybean.

    PubMed

    Zhang, Dayong; Wan, Qun; He, Xiaolan; Ning, Lihua; Huang, Yihong; Xu, Zhaolong; Liu, Jia; Shao, Hongbo

    2016-10-15

    The ankyrin repeat (ANK) gene family is common in diverse organisms, and its members play important roles in cell growth, development and response to environmental stresses. Recently, genome-wide identification and evolutionary analyses of the ANK gene family have been carried out in Arabidopsis, rice and maize. However, little is known about the ANK genes in the whole soybean genome. In this study, we describe the identification and structural characterization of 162 ANK genes in soybean (GmANK). Comprehensive bioinformatics analyses of the GmANK gene family were then performed, including gene locus, phylogenetic and domain composition analyses, chromosomal localization and expression profiling. Domain composition analyses showed that GmANK proteins form eleven subfamilies in soybean. In silico expression analysis showed that GmANK genes have diverse expression patterns, suggesting functional diversification of the GmANK gene family. Based on digital gene expression profile (DGEP) data comparing cultivated soybean with wild type under salt treatment, several GmANKs related to salt/drought response were investigated. Moreover, the expression pattern and subcellular localization of GmANK6 were analyzed. The results provide important clues for exploring ANK gene expression and function in soybean in future studies. PMID:27335162

  16. Exploring laccase genes from plant pathogen genomes: a bioinformatic approach.

    PubMed

    Feng, B Z; Li, P Q; Fu, L; Yu, X M

    2015-01-01

    To date, research on laccases has mostly focused on plant and fungal laccases and their current use in biotechnological applications. In contrast, little is known about laccases from plant pathogens, although rapid recent progress in whole-genome sequencing of an increasing number of organisms has facilitated their identification and the ascertainment of their origins. In this study, a comparative analysis was performed to elucidate the distribution of laccases among bacteria, fungi, and oomycetes and, through comparison of their amino acid sequences, to determine the relationships between them. We retrieved the laccase genes from 20 publicly available plant pathogen genomes. In total, 125 laccase genes were identified: seven in bacterial genomes, 101 in fungal genomes, and 17 in oomycete genomes. Most of the predicted protein models of these genes shared typical fungal laccase characteristics, possessing four conserved domains that together contain one cysteine and ten histidine residues. Phylogenetic analysis showed that laccases from bacteria and oomycetes grouped into two distinct clades, whereas fungal laccases clustered in three main clades. These results provide theoretical groundwork regarding the role of laccases in plant pathogens and might be used to guide future research into these enzymes. PMID:26535716

  17. Genome-wide comparative analysis reveals possible common ancestors of nucleotide-binding sites domain containing genes in hybrid Citrus sinensis genome and original Citrus clementina genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We identified and re-annotated candidate disease resistance (R) genes with nucleotide-binding sites (NBS) domain from a Citrus clementina genome and two complete Citrus sinensis genome sequences (one from the USA and one from China). We found similar numbers of NBS genes from three citrus genomes, r...

  18. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  19. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    SciTech Connect

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  20. Genome-wide gene-based association study.

    PubMed

    Yang, Hsin-Chou; Liang, Yu-Jen; Chung, Chia-Min; Chen, Jia-Wei; Pan, Wen-Harn

    2009-01-01

    Genome-wide association studies, which analyze hundreds of thousands of single-nucleotide polymorphisms to identify disease susceptibility genes, are challenging because the work involves intensive computation and complex modeling. We propose a two-stage genome-wide association scanning procedure, consisting of a single-locus association scan in the first stage and a gene-based association scan in the second stage. Marginal effects of single-nucleotide polymorphisms are examined using the exact Armitage trend test or logistic regression, and gene effects are examined using a p-value combination method. Compared with some existing single-locus and multilocus methods, the proposed method has the following merits: 1) it conveniently defines biologically meaningful regions, 2) it is powerful for detecting minor-effect genes, 3) it helps alleviate the multiple-testing problem, and 4) it simplifies interpretation of results. The method was applied to the Genetic Analysis Workshop 16 Problem 1 rheumatoid arthritis data, and strong association signals were found. The results show that the human major histocompatibility complex region is the most important genomic region associated with rheumatoid arthritis. Moreover, previously reported genes including PTPN22, C5, and IL2RB were confirmed; novel genes including HLA-DRA, BTNL2, C6orf10, NOTCH4, TAP2, and TNXB were identified by our analysis. PMID:20018002
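    The two-stage scan described above can be sketched as follows, under stated assumptions. Stage 1 runs a per-SNP Cochran-Armitage trend test; note this is the asymptotic (normal-approximation) version, not the exact test named in the abstract, and logistic regression is not shown. Stage 2 combines each gene's SNP p-values; the abstract does not name its combination method, so Fisher's method is an assumption here. All function names and the toy counts are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# (cases, controls) genotype counts in the order (AA, Aa, aa).
Counts = Tuple[Sequence[int], Sequence[int]]


def armitage_trend_test(cases: Sequence[int], controls: Sequence[int],
                        weights: Sequence[float] = (0.0, 1.0, 2.0)) -> float:
    """Asymptotic Cochran-Armitage trend test; returns a two-sided p-value."""
    r, s = sum(cases), sum(controls)
    n = r + s
    col = [a + b for a, b in zip(cases, controls)]  # per-genotype totals
    t_stat = sum(w * (a * s - b * r) for w, a, b in zip(weights, cases, controls))
    var = (r * s / n) * (n * sum(w * w * c for w, c in zip(weights, col))
                         - sum(w * c for w, c in zip(weights, col)) ** 2)
    if var <= 0:
        return 1.0  # e.g. a monomorphic SNP carries no trend information
    z = t_stat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # P(|Z| >= |z|) under H0


def fisher_combine(p_values: Sequence[float]) -> float:
    """Fisher's method: -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    half = -sum(math.log(max(p, 1e-300)) for p in p_values)  # statistic / 2
    # Closed-form chi-square survival function for even df = 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return min(1.0, math.exp(-half) * total)


def gene_based_scan(genes: Dict[str, Dict[str, Counts]]) -> Dict[str, float]:
    """Stage 1: trend test per SNP; stage 2: combine SNP p-values per gene."""
    return {
        gene: fisher_combine([armitage_trend_test(ca, co) for ca, co in snps.values()])
        for gene, snps in genes.items()
    }


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the study.
    toy = {"GENE_X": {"snp1": ((10, 20, 30), (30, 20, 10)),
                      "snp2": ((20, 20, 20), (20, 20, 20))}}
    print(gene_based_scan(toy))
```

    Fisher's method assumes the combined p-values are independent, whereas SNPs within one gene are often in linkage disequilibrium; a real analysis would need to account for that (for example, by permutation). The abstract does not say how the authors handled this.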

  1. A genomic signature and the identification of new sporulation genes.

    PubMed

    Abecasis, Ana B; Serrano, Mónica; Alves, Renato; Quintais, Leonor; Pereira-Leal, José B; Henriques, Adriano O

    2013-05-01

    Bacterial endospores are the most resistant cell type known to humans, as they are able to withstand extremes of temperature, pressure, chemical injury, and time. They are also of interest because the endospore is the infective particle in a variety of human and livestock diseases. Endosporulation is characterized by the morphogenesis of an endospore within a mother cell. Based on the genes known to be involved in endosporulation in the model organism Bacillus subtilis, a conserved core of about 100 genes was derived, representing the minimal machinery for endosporulation. The core was used to define a genomic signature of about 50 genes that are able to distinguish endospore-forming organisms, based on complete genome sequences, and we show this 50-gene signature is robust against phylogenetic proximity and other artifacts. This signature includes previously uncharacterized genes that we can now show are important for sporulation in B. subtilis and/or are under developmental control, thus further validating this genomic signature. We also predict that a series of polyextremophilic organisms, as well as several gut bacteria, are able to form endospores, and we identified 3 new loci essential for sporulation in B. subtilis: ytaF, ylmC, and ylzA. In all, the results support the view that endosporulation likely evolved once, at the base of the Firmicutes phylum, and is unrelated to other bacterial cell differentiation programs and that this involved the evolution of new genes and functions, as well as the cooption of ancestral, housekeeping functions. PMID:23396918

  2. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    SciTech Connect

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  3. A Genomic Signature and the Identification of New Sporulation Genes

    PubMed Central

    Abecasis, Ana B.; Serrano, Mónica; Alves, Renato; Quintais, Leonor

    2013-01-01

    Bacterial endospores are the most resistant cell type known to humans, as they are able to withstand extremes of temperature, pressure, chemical injury, and time. They are also of interest because the endospore is the infective particle in a variety of human and livestock diseases. Endosporulation is characterized by the morphogenesis of an endospore within a mother cell. Based on the genes known to be involved in endosporulation in the model organism Bacillus subtilis, a conserved core of about 100 genes was derived, representing the minimal machinery for endosporulation. The core was used to define a genomic signature of about 50 genes that are able to distinguish endospore-forming organisms, based on complete genome sequences, and we show this 50-gene signature is robust against phylogenetic proximity and other artifacts. This signature includes previously uncharacterized genes that we can now show are important for sporulation in B. subtilis and/or are under developmental control, thus further validating this genomic signature. We also predict that a series of polyextremophilic organisms, as well as several gut bacteria, are able to form endospores, and we identified 3 new loci essential for sporulation in B. subtilis: ytaF, ylmC, and ylzA. In all, the results support the view that endosporulation likely evolved once, at the base of the Firmicutes phylum, and is unrelated to other bacterial cell differentiation programs and that this involved the evolution of new genes and functions, as well as the cooption of ancestral, housekeeping functions. PMID:23396918

  4. Genome-wide identification and functional analyses of calmodulin genes in Solanaceous species

    PubMed Central

    2013-01-01

    Background Calmodulin (CaM) is a major calcium sensor in all eukaryotes. It binds calcium and modulates the activity of a wide range of downstream proteins in response to calcium signals. However, little is known about the CaM gene family in Solanaceous species, including the economically important species, tomato (Solanum lycopersicum), and the gene silencing model plant, Nicotiana benthamiana. Moreover, the potential function of CaM in plant disease resistance remains largely unclear. Results We performed genome-wide identification of CaM gene families in Solanaceous species. Employing bioinformatics approaches, multiple full-length CaM genes were identified from tomato, N. benthamiana and potato (S. tuberosum) genomes, with tomato having 6 CaM genes, N. benthamiana having 7 CaM genes, and potato having 4 CaM genes. Sequence comparison analyses showed that three tomato genes, SlCaM3/4/5, two potato genes StCaM2/3, and two sets of N. benthamiana genes, NbCaM1/2/3/4 and NbCaM5/6, encode identical CaM proteins, yet the genes contain different intron/exon organization and are located on different chromosomes. Further sequence comparisons and gene structural and phylogenetic analyses reveal that Solanaceous species gained a new group of CaM genes during evolution. These new CaM genes are unusual in that they contain three introns in contrast to only a single intron typical of known CaM genes in plants. The tomato CaM (SlCaM) genes were found to be expressed in all organs. Prediction of cis-acting elements in 5' upstream sequences and expression analyses demonstrated that SlCaM genes have potential to be highly responsive to a variety of biotic and abiotic stimuli. Additionally, silencing of SlCaM2 and SlCaM6 altered expression of a set of signaling and defense-related genes and resulted in significantly lower resistance to Tobacco rattle virus and the oomycete pathogen, Pythium aphanidermatum. Conclusions The CaM gene families in the Solanaceous species tomato, N

  5. Molecular cloning, genomic structure, and genetic mapping of two Rdl-orthologous genes of GABA receptors in the diamondback moth, Plutella xylostella.

    PubMed

    Yuan, Guorui; Gao, Weiyue; Yang, Yihua; Wu, Yidong

    2010-06-01

    The Resistance to dieldrin (Rdl) gene encodes a subunit of the insect gamma-aminobutyric acid (GABA) receptor. Cyclodiene resistance in many insects is associated with replacement of a single amino acid (alanine at position 302) with either a serine or a glycine in the Rdl gene product. Two Rdl-orthologous genes of GABA receptors (PxGABARalpha1 and PxGABARalpha2) were cloned and sequenced from a susceptible strain (Roth) of Plutella xylostella. At the amino acid level, PxGABARalpha1 and PxGABARalpha2 showed 84% and 77% identity, respectively, with the Rdl gene of Drosophila melanogaster. The coding regions of PxGABARalpha1 and PxGABARalpha2 both comprise ten exons, with two alternative RNA-splicing forms in exon 3 of both genes. At the orthologous position of alanine-302 in D. melanogaster Rdl, PxGABARalpha1 has a conserved alanine at position 282. PxGABARalpha2 has a serine instead of an alanine at the equivalent position. With two informative DNA markers, both PxGABARalpha1 and PxGABARalpha2 were mapped onto the Z chromosome of P. xylostella. PMID:20513056

  6. Gene discovery in the Acanthamoeba castellanii genome

    SciTech Connect

    Anderson, Iain J.; Watkins, Russell F.; Samuelson, John; Spencer, David F.; Majoros, William H.; Gray, Michael W.; Loftus, Brendan J.

    2005-08-01

    Acanthamoeba castellanii is a free-living amoeba found in soil, freshwater, and marine environments and an important predator of bacteria. Acanthamoeba castellanii is also an opportunistic pathogen of clinical interest, responsible for several distinct diseases in humans. In order to provide a genomic platform for the study of this ubiquitous and important protist, we generated a sequence survey of approximately 0.5× coverage of the genome. The data predict that A. castellanii exhibits a greater biosynthetic capacity than the free-living Dictyostelium discoideum and the parasite Entamoeba histolytica, providing an explanation for the ability of A. castellanii to inhabit a diversity of environments. Alginate lyase may provide access to bacteria within biofilms by breaking down the biofilm matrix, and polyhydroxybutyrate depolymerase may facilitate utilization of the bacterial storage compound polyhydroxybutyrate as a food source. Enzymes for the synthesis and breakdown of cellulose were identified, and they likely participate in encystation and excystation as in D. discoideum. Trehalose-6-phosphate synthase is present, suggesting that trehalose plays a role in stress adaptation. Detection and response to a number of stress conditions is likely accomplished with a large set of signal transduction histidine kinases and a set of putative receptor serine/threonine kinases similar to those found in E. histolytica. Serine, cysteine and metalloproteases were identified, some of which are likely involved in pathogenicity.

  7. Databases of homologous gene families for comparative genomics

    PubMed Central

    Penel, Simon; Arigon, Anne-Muriel; Dufayard, Jean-François; Sertier, Anne-Sophie; Daubin, Vincent; Duret, Laurent; Gouy, Manolo; Perrière, Guy

    2009-01-01

    Background Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users high quality homologous families and sequence alignments as well as phylogenetic trees based on state of the art algorithms are becoming indispensable. Methods We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignment computation, and phylogenetic tree construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern-based searches that allow, among other uses, retrieval of sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at . PMID:19534752
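
    The clustering step that follows an all-against-all similarity search can be illustrated with a minimal sketch. The abstract does not state the clustering criterion used for HOVERGEN, HOGENOM or HOMOLENS, so this example assumes plain single-linkage grouping: genes become one family whenever a chain of hits above a score threshold connects them. The function name, gene names and scores are hypothetical, and the hits are assumed to be precomputed.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_families(
    genes: Iterable[str],
    hits: Iterable[Tuple[str, str, float]],
    min_score: float,
) -> List[List[str]]:
    """Group genes into candidate families (hypothetical helper): connected
    components of the graph whose edges are hits scoring >= min_score."""
    parent: Dict[str, str] = {g: g for g in genes}

    def find(x: str) -> str:
        # Walk to the root of x's tree, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two families

    families: Dict[str, List[str]] = {}
    for g in parent:
        families.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in families.values())


genes = ["geneA", "geneB", "geneC", "geneD"]
hits = [("geneA", "geneB", 92.0), ("geneB", "geneC", 55.0), ("geneC", "geneD", 12.0)]
print(cluster_families(genes, hits, min_score=40.0))
# [['geneA', 'geneB', 'geneC'], ['geneD']]
```

    Single linkage is the simplest choice and scales well with a union-find structure, but it can chain unrelated genes through multidomain proteins; production pipelines typically add coverage or domain-architecture filters on top.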

  8. Re-Examining the Gene in Personalized Genomics

    ERIC Educational Resources Information Center

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  9. GMOL: An Interactive Tool for 3D Genome Structure Visualization.

    PubMed

    Nowotny, Jackson; Wells, Avery; Oluwadare, Oluwatosin; Xu, Lingfei; Cao, Renzhi; Trieu, Tuan; He, Chenfeng; Cheng, Jianlin

    2016-01-01

    It has been shown that genome spatial structures largely affect both genome activity and DNA function. Knowing this, many researchers are currently attempting to accurately model genome structures. Despite these increased efforts there still exists a shortage of tools dedicated to visualizing the genome. Creating a tool that can accurately visualize the genome can aid researchers by highlighting structural relationships that may not be obvious when examining the sequence information alone. Here we present a desktop application, known as GMOL, designed to effectively visualize genome structures so that researchers may better analyze genomic data. GMOL was developed based upon our multi-scale approach that allows a user to scale between six separate levels within the genome. With GMOL, a user can choose any unit at any scale and scale it up or down to visualize its structure and retrieve corresponding genome sequences. Users can also interactively manipulate and measure the whole genome structure and extract static images and machine-readable data files in PDB format from the multi-scale structure. By using GMOL researchers will be able to better understand and analyze genome structure models and the impact their structural relations have on genome activity and DNA function. PMID:26868282
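
    Exporting a structure model in PDB format, as GMOL does, amounts to writing fixed-column ATOM records. The sketch below is not GMOL's code: it assumes a coarse-grained model in which each genome unit is a single bead, and the atom name "P", residue name "DNA" and chain "A" are hypothetical labels chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# Fixed-column PDB ATOM record: serial (cols 7-11), atom name (13-16),
# residue name (18-20), chain ID (22), residue number (23-26),
# x/y/z (31-54, 8.3f each), occupancy (55-60) and B-factor (61-66).
PDB_ATOM = ("ATOM  {serial:5d} {name:<4s} {res:>3s} {chain:1s}{seq:4d}    "
            "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def beads_to_pdb(coords: Sequence[Tuple[float, float, float]],
                 chain: str = "A") -> List[str]:
    """Write one ATOM record per bead of a coarse-grained genome model.

    Coordinates must fit the 8.3f fields (roughly -999.999 to 9999.999),
    so models in arbitrary units may need rescaling first."""
    lines = []
    for i, (x, y, z) in enumerate(coords, start=1):
        lines.append(PDB_ATOM.format(serial=i, name="P", res="DNA", chain=chain,
                                     seq=i, x=x, y=y, z=z, occ=1.0, b=0.0))
    lines.append("END")
    return lines


for line in beads_to_pdb([(1.0, 2.0, 3.0), (-4.5, 0.0, 10.25)]):
    print(line)
```

    Because the format is column-based rather than delimited, any standard molecular viewer can read the output, which is presumably part of why a structure-visualization tool would offer it.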

  10. Review: Progress in the Researches on Insect Mitochondrial Genome and Analysis of Gene Order

    NASA Astrophysics Data System (ADS)

    Hu, Li; Jianyu, Gao; Haiyu, Liu; Wanzhi, Cai

    2009-04-01

    Insect mitochondrial genomes are double-stranded circular molecules ranging from 14,503 bp to 19,571 bp in size. Nearly all the sequenced insect mitochondrial genomes encode 37 genes: two for rRNAs, 13 for proteins and 22 for tRNAs. This review compares and summarizes the features of complete mitochondrial genomes from 175 sequenced species of insects in 22 orders. The genomic organization, contents, gene order, and rearrangements of gene order are analyzed.

  11. Genome structure of cottontail rabbit herpesvirus.

    PubMed

    Cebrian, J; Berthelot, N; Laithier, M

    1989-02-01

    The genome structure of a herpesvirus isolated from primary cultures of kidney cells from the cottontail rabbit Sylvilagus floridanus was elucidated by using electron microscopy and restriction enzyme analysis. The genome, which was about 150 kilobase pairs long and which had an average G + C composition of 45%, consisted of two regions with unique base sequences (54 and 47 kilobase pairs) enclosed by reiterations of a 925-base-pair sequence with a variable copy number. The internal repeats were in opposite polarity with respect to the terminal repeats, and both unique regions underwent inversion. The nucleotide sequence of the repeat unit was determined, and virion DNA termini were precisely localized within this sequence. Elements showing homology with the cleavage-packaging signals common to other herpesviruses were detected. The data indicate that this virus is different from the previously described herpesvirus sylvilagus. PMID:2911115

  12. Gene identification and classification in the Synechocystis genomic sequence by recursive gene mark analysis.

    PubMed

    Hirosawa, M; Isono, K; Hayes, W; Borodovsky, M

    1997-01-01

    The GeneMark method has proven to be an efficient gene-finding tool for the analysis of prokaryotic genomic sequence data. We have developed a procedure for deriving and utilizing several GeneMark models in order to improve gene-detection performance. Upon applying this procedure to the 1.0 Mb contiguous DNA sequence of Synechocystis sp. strain PCC6803, we were able to cluster predicted genes into distinct classes and to produce class-specific GeneMark models reflecting the statistical characteristics of each gene class. One gene class apparently includes genes of exogenous origin. Using class-specific models reduces the gene underprediction error rate to 1.7%, compared with 8.1% reported in the previous study, in which only one GeneMark model was used. PMID:9522117
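
    The idea of class-specific models can be sketched with a simplified stand-in: train one Markov chain per gene class and assign each sequence to the class whose model gives it the highest log-likelihood. This is deliberately not GeneMark itself, which uses three-periodic (codon-position-aware) Markov models of coding and non-coding DNA; the homogeneous order-k chain, the pseudocount smoothing and the toy training sequences below are assumptions made to keep the example short.

```python
import math
from collections import defaultdict
from typing import Dict, List

ALPHABET = "ACGT"


class MarkovModel:
    """Homogeneous order-k Markov chain over DNA with pseudocount smoothing
    (a simplification; GeneMark's models are three-periodic)."""

    def __init__(self, order: int, training: List[str], pseudocount: float = 1.0) -> None:
        self.order = order
        self.pseudocount = pseudocount
        self.counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
        for seq in training:
            for i in range(order, len(seq)):
                # Count base seq[i] following the k-mer context seq[i-k:i].
                self.counts[seq[i - order:i]][seq[i]] += 1.0

    def log_likelihood(self, seq: str) -> float:
        total = 0.0
        for i in range(self.order, len(seq)):
            context = self.counts.get(seq[i - self.order:i], {})
            numerator = context.get(seq[i], 0.0) + self.pseudocount
            denominator = sum(context.values()) + self.pseudocount * len(ALPHABET)
            total += math.log(numerator / denominator)
        return total


def classify(seq: str, models: Dict[str, MarkovModel]) -> str:
    """Assign a sequence to the class whose model explains it best."""
    return max(models, key=lambda name: models[name].log_likelihood(seq))


models = {
    "class_1": MarkovModel(1, ["ATATATATAT"]),  # hypothetical training data
    "class_2": MarkovModel(1, ["GCGCGCGCGC"]),
}
print(classify("ATATAT", models))  # class_1
```

    In the recursive procedure the abstract describes, class membership and the class models are refined together; a classifier like this would be one step inside such a loop.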

  13. Comparative genomics and transcriptomics of trait-gene association

    PubMed Central

    2012-01-01

    Background The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis. PMID:23181781

  14. Genome-Wide Scans for Delineation of Candidate Genes Regulating Seed-Protein Content in Chickpea.

    PubMed

    Upadhyaya, Hari D; Bajaj, Deepak; Narnoliya, Laxmi; Das, Shouvik; Kumar, Vinod; Gowda, C L L; Sharma, Shivali; Tyagi, Akhilesh K; Parida, Swarup K

    2016-01-01

    Identification of potential genes/alleles governing complex seed-protein content (SPC) is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in a traditional bi-parental mapping population and differential expression profiling, for the first time, to understand the complex genetic architecture of the quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphism) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150-200 kb LD (linkage disequilibrium) decay] was utilized. This led to the identification of the seven most effective genomic loci (genes) associated [10-20% with 41% combined PVE (phenotypic variation explained)] with the SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with the SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 × ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (>4-fold) of six SPC-associated genes, particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with a high level of contrasting SPC (21-22%), was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating the SPC trait in chickpea. Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with the SPC trait was found to be the most

  15. Genome-Wide Scans for Delineation of Candidate Genes Regulating Seed-Protein Content in Chickpea

    PubMed Central

    Upadhyaya, Hari D.; Bajaj, Deepak; Narnoliya, Laxmi; Das, Shouvik; Kumar, Vinod; Gowda, C. L. L.; Sharma, Shivali; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    Identification of potential genes/alleles governing complex seed-protein content (SPC) is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in a traditional bi-parental mapping population and differential expression profiling, for the first time, to understand the complex genetic architecture of the quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphism) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150–200 kb LD (linkage disequilibrium) decay] was utilized. This led to the identification of the seven most effective genomic loci (genes) associated [10–20% with 41% combined PVE (phenotypic variation explained)] with the SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with the SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 × ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (>4-fold) of six SPC-associated genes, particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with a high level of contrasting SPC (21–22%), was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating the SPC trait in chickpea. Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with the SPC trait was found to be the most

  16. Ab initio gene identification: prokaryote genome annotation with GeneScan and GLIMMER.

    PubMed

    Aggarwal, Gautam; Ramaswamy, Ramakrishna

    2002-02-01

    We compare the annotation of three complete genomes using the ab initio methods of gene identification GeneScan and GLIMMER. The annotation given in GenBank, the standard against which these are compared, has been made using GeneMark. We find a number of novel genes which are predicted by both methods used here, as well as a number of genes that are predicted by GeneMark but are not identified by either of the nonconsensus methods that we have used. The three organisms studied here are all prokaryotic species with fairly compact genomes. The Fourier measure forms the basis for an efficient nonconsensus method for gene prediction, and the algorithm GeneScan exploits this measure. We have benchmarked this program as well as GLIMMER using three complete prokaryotic genomes. An effort has also been made to study the limitations of these techniques for complete genome analysis. GeneScan and GLIMMER are of comparable accuracy insofar as gene identification is concerned, with sensitivities and specificities typically greater than 0.9. The number of false predictions (both positive and negative) is higher for GeneScan than for GLIMMER, but in a significant number of cases, similar results are provided by the two techniques. This suggests that there could be some as-yet unidentified additional genes in these three genomes, and also that some of the putative identifications made hitherto might require re-evaluation. All these cases are discussed in detail. PMID:11927773

  17. Diversity of 5S rRNA genes within individual prokaryotic genomes

    PubMed Central

    Pei, Anna; Li, Hongru; Oberdorf, William E; Alekseyenko, Alexander V.; Parsons, Tamasha; Yang, Liying; Gerz, Erika A.; Lee, Peng; Xiang, Charlie; Nossa, Carlos W.; Pei, Zhiheng

    2012-01-01

    We examined intragenomic variation of paralogous 5S rRNA genes to evaluate the concept of ribosomal constraints. In a dataset containing 1168 genomes from 779 unique species, 96 species exhibited >3% diversity. Twenty-seven species with >10% diversity contained a total of 421 mismatches between all pairs of the most dissimilar copies of 5S rRNA genes. The large majority (401 of 421) of the diversified positions were conserved at the secondary structure level. The high diversity was associated with partial rRNA operons, split operons, or spacer length-related divergence. In total, these findings indicated that there were tight ribosomal constraints on paralogous 5S rRNA genes in a genome despite the high degree of diversity at the primary structure level. There is supplementary material. PMID:22765222
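
    The per-species diversity figure above is based on the most dissimilar pair of paralogous copies. A minimal sketch of that computation, assuming the copies are already aligned to equal length and that diversity is the percentage of mismatched alignment columns; the paper's exact gap handling is not stated in the abstract, and here a gap opposite a base counts as a mismatch while a shared gap does not. The function name and sequences are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def most_dissimilar_pair(copies: List[str]) -> Tuple[int, int, int, float]:
    """Return (i, j, mismatches, percent) for the most dissimilar pair of
    aligned, equal-length gene copies within one genome."""
    if len(copies) < 2:
        raise ValueError("need at least two copies")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must be aligned to the same length")
    best = (0, 1, -1)
    for (i, a), (j, b) in combinations(enumerate(copies), 2):
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches > best[2]:
            best = (i, j, mismatches)
    i, j, mismatches = best
    return i, j, mismatches, 100.0 * mismatches / length


print(most_dissimilar_pair(["ACGT", "ACGA", "TCGA"]))  # (0, 2, 2, 50.0)
```

    Checking whether each mismatched column falls in a paired or unpaired region of the 5S rRNA secondary structure, as the study did, would require a structure annotation on top of this.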

  18. Cohesin: genomic insights into controlling gene transcription and development

    PubMed Central

    Dorsett, Dale

    2011-01-01

    Over the past decade it has emerged that the cohesin protein complex, which functions in sister chromatid cohesion, chromosome segregation and DNA repair, also regulates gene expression and development. Even minor changes in cohesin activity alter several aspects of development. Genome-wide analysis indicates that cohesin directly regulates transcription of genes involved in cell proliferation, pluripotency, and differentiation through multiple mechanisms. These mechanisms are poorly understood, but involve both partial gene repression in concert with Polycomb group proteins, and facilitating long-range looping, both between enhancers and promoters, and between CTCF protein binding sites. PMID:21324671

  19. No genes for intelligence in the fluid genome.

    PubMed

    Ho, Mae-Wan

    2013-01-01

    Revolution is brewing belatedly within the heartlands of the genetic determinist establishment still in denial about the fluid genome that makes identifying genes even for common disease well-nigh impossible. The fruitless hunt for intelligence genes serves to expose the poverty of an obsolete paradigm that is obstructing knowledge and preventing fruitful policies from being widely implemented. Genome-wide scans using state-of-the-art technologies on extensive databases have failed to find a single gene for intelligence; instead, environment and maternal effects may account for most, if not all, correlation among relatives, while identical twins diverge genetically and epigenetically throughout life. Abundant evidence points to the enormous potential for improving intellectual abilities (and health) through simple environmental and social interventions. PMID:23865113

  20. An Integrated Genomic Strategy Delineates Candidate Mediator Genes Regulating Grain Size and Weight in Rice

    PubMed Central

    Malik, Naveen; Dwivedi, Nidhi; Singh, Ashok K.; Parida, Swarup K.; Agarwal, Pinky; Thakur, Jitendra K.; Tyagi, Akhilesh K.

    2016-01-01

    The present study deployed a Mediator (MED) genes-mediated integrated genomic strategy for understanding the complex genetic architecture of grain size/weight quantitative trait in rice. The targeted multiplex amplicon resequencing of 55 MED genes annotated from whole rice genome in 384 accessions discovered 3971 SNPs, which were structurally and functionally annotated in diverse coding and non-coding sequence-components of genes. Association analysis, using the genotyping information of 3971 SNPs in a structured population of 384 accessions (with 50–100 kb linkage disequilibrium decay), detected 10 MED gene-derived SNPs significantly associated (46% combined phenotypic variation explained) with grain length, width and weight in rice. Of these, one strong grain weight-associated non-synonymous SNP (G/A)-carrying OsMED4_2 gene was validated successfully in low- and high-grain weight parental accessions and homozygous individuals of a rice mapping population. The seed-specific expression, including differential up/down-regulation of three grain size/weight-associated MED genes (including OsMED4_2) in six low and high-grain weight rice accessions was evident. Altogether, combinatorial genomic approach involving haplotype-based association analysis delineated diverse functionally relevant natural SNP-allelic variants in 10 MED genes, including three potential novel SNP haplotypes in an OsMED4_2 gene governing grain size/weight differentiation in rice. These molecular tags have potential to accelerate genomics-assisted crop improvement in rice. PMID:27000976
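
    The "phenotypic variation explained" (PVE) figures reported in these association studies can be illustrated for the simplest case: a single SNP coded as allele dosage (0, 1 or 2) and regressed on the trait by ordinary least squares, where PVE is the regression R². This is only a sketch of the definition; the rice and chickpea studies used association models that account for population structure, and their combined PVE over several SNPs is generally not the simple sum of single-SNP values, so none of that is reproduced here. The function name and data are hypothetical.

```python
from typing import Sequence


def single_snp_pve(genotypes: Sequence[float], phenotypes: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares regression of phenotype on genotype
    dosage (0, 1 or 2 copies of an allele), i.e. the squared correlation."""
    n = len(genotypes)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need matched genotype/phenotype vectors of length >= 2")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP or a constant trait explains nothing
    return sxy * sxy / (sxx * syy)


dosage = [0, 0, 1, 1, 2, 2]                    # hypothetical genotypes
grain_weight = [20.1, 20.4, 21.0, 21.3, 22.2, 21.9]  # hypothetical trait values
print(f"PVE = {100 * single_snp_pve(dosage, grain_weight):.1f}%")
```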

  1. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    PubMed Central

    Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators, with members regulating multiple biological processes, especially defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation of pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY genes were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR. Some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768

  2. Genome Copy Numbers and Gene Conversion in Methanogenic Archaea

    PubMed Central

    Hildenbrand, Catherina; Stock, Tilmann; Lange, Christian; Rother, Michael; Soppa, Jörg

    2011-01-01

    Previous studies revealed that one species of methanogenic archaea, Methanocaldococcus jannaschii, is polyploid, while a second species, Methanothermobacter thermoautotrophicus, is diploid. To further investigate the distribution of ploidy in methanogenic archaea, species of two additional genera—Methanosarcina acetivorans and Methanococcus maripaludis—were investigated. M. acetivorans was found to be polyploid during fast growth (doubling time [tD] = 6 h; 17 genome copies) and oligoploid during slow growth (tD = 49 h; 3 genome copies). M. maripaludis has the highest ploidy level found for any archaeal species, with up to 55 genome copies in exponential phase and ca. 30 in stationary phase. A compilation of archaeal species with quantified ploidy levels reveals a clear dichotomy between Euryarchaeota and Crenarchaeota: none of seven euryarchaeal species of six genera is monoploid (haploid), while, in contrast, all six crenarchaeal species of four genera are monoploid, indicating significant genetic differences between these two kingdoms. Polyploidy in asexual species should lead to accumulation of inactivating mutations until the number of intact chromosomes per cell drops to zero (called “Muller's ratchet”). A mechanism to equalize the genome copies, such as gene conversion, would counteract this phenomenon. Making use of a previously constructed heterozygous mutant strain of the polyploid M. maripaludis, we could show that, in the absence of selection, very fast equalization of genomes took place in M. maripaludis, probably via a gene conversion mechanism. In addition, it was shown that the velocity of this phenomenon is inversely correlated with the strength of selection. PMID:21097629

  3. Structure of the human retinoblastoma gene

    SciTech Connect

    Hong, F.D.; Huang, Hueijen S.; To, Hoang; Young, Lihjiuan S.; Oro, A.; Bookstein, R.; Lee, E.Y.H.P.; Lee, Wenhwa

    1989-07-01

    Complete inactivation of the human retinoblastoma gene (RB) is believed to be an essential step in tumorigenesis of several different cancers. To provide a framework for understanding inactivation mechanisms, the structure of RB was delineated. The RB transcript is encoded in 27 exons dispersed over about 200 kilobases (kb) of genomic DNA. The length of individual exons ranges from 31 to 1,889 base pairs (bp). The largest intron spans >60 kb and the smallest one has only 80 bp. Deletion of exons 13-17 is frequently observed in various types of tumors, including retinoblastoma, breast cancer, and osteosarcoma, and the presence of a potential hot spot for recombination in the region is predicted. A putative leucine-zipper motif is exclusively encoded by exon 20. The detailed RB structure presented should prove useful in defining potential functional domains of its encoded protein. Transcription of RB is initiated at multiple positions and the sequences surrounding the initiation sites have a high G+C content. A typical upstream TATA box is not present. Localization of the RB promoter region was accomplished by utilizing a heterologous expression system containing a bacterial chloramphenicol acetyltransferase gene. Deletion analysis revealed that a region as small as 70 bp is sufficient for RB promoter activity, similar to other previously characterized G+C-rich gene promoters. Several direct repeats and possible stem-and-loop structures are found in the promoter region.

  4. Genome-wide identification, phylogeny, and expression of fibroblast growth genes in common carp.

    PubMed

    Jiang, Likun; Zhang, Songhao; Dong, Chuanju; Chen, Baohua; Feng, Jingyan; Peng, Wenzhu; Mahboob, Shahid; Al-Ghanim, Khalid A; Xu, Peng

    2016-03-10

    Fibroblast growth factors (FGFs) are a large family of polypeptide growth factors, which are found in organisms ranging from nematodes to humans. In vertebrates, a number of FGFs have been shown to play important roles in developing embryos and adult organisms. Among the vertebrate species, FGFs are highly conserved in both gene structure and amino-acid sequence. However, studies on teleost FGFs are mainly limited to model species, hence we investigated FGFs in the common carp genome. We identified 35 FGFs in the common carp genome. Phylogenetic analysis revealed that most of the FGFs are highly conserved, though recent gene duplication and gene losses do exist. By examining the copy number of FGFs in several vertebrate genomes, we found that eight FGFs in common carp have undergone gene duplications, including FGF6a, FGF6b, FGF7, FGF8b, FGF10a, FGF11b, FGF13a, and FGF18b. The expression patterns of all FGFs were examined in various tissues, including the blood, brain, gill, heart, intestine, muscle, skin, spleen and kidney, showing that most of the FGFs were ubiquitously expressed, indicating their critical role in common carp. To some extent, examination of gene families with detailed phylogenetic or orthology analysis verified the authenticity and accuracy of assembly and annotation of the recently published common carp whole genome sequences. Gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp FGF gene family provides an important genomic resource for future biochemical, physiological, and phylogenetic studies on FGFs in teleosts. PMID:26691502

  5. Genomic architecture of MHC-linked odorant receptor gene repertoires among 16 vertebrate species.

    PubMed

    Santos, Pablo Sandro Carvalho; Kellermann, Thomas; Uchanska-Ziegler, Barbara; Ziegler, Andreas

    2010-09-01

    The recent sequencing and assembly of the genomes of different organisms have shown that almost all vertebrates studied in detail so far have one or more clusters of genes encoding odorant receptors (OR) in close physical linkage to the major histocompatibility complex (MHC). It has been postulated that MHC-linked OR genes could be involved in MHC-influenced mate choice, comprising both pre- as well as post-copulatory mechanisms. We have therefore carried out a systematic comparison of protein sequences of these receptors from the genomes of man, chimpanzee, gorilla, orangutan, rhesus macaque, mouse, rat, dog, cat, cow, pig, horse, elephant, opossum, frog and zebra fish (amounting to a total of 559 protein sequences) in order to identify OR families exhibiting evolutionarily conserved MHC linkage. In addition, we compared the genomic structure of this region within these 16 species, accounting for presence or absence of OR gene families, gene order, transcriptional orientation and linkage to the MHC or framework genes. The results are presented in the form of gene maps and phylogenetic analyses that reveal largely concordant repertoires of gene families, at least among tetrapods, although each of the eight taxa studied (primates, rodents, ungulates, carnivores, proboscids, marsupials, amphibians and teleosts) exhibits a typical architecture of MHC (or MHC framework loci)-linked OR genes. Furthermore, the comparison of the genomic organization of this region has implications for phylogenetic relationships between closely related taxa, especially in disputed cases such as the evolutionary history of even- and odd-toed ungulates and carnivores. Finally, the largely conserved linkage between distinct OR genes and the MHC supports the concept that particular alleles within a given haplotype function in a concerted fashion during self-/non-self-discrimination processes in reproduction. PMID:20680261

  6. Comparative 3D Genome Structure Analysis of the Fission and the Budding Yeast

    PubMed Central

    Gong, Ke; Tjong, Harianto; Zhou, Xianghong Jasmine; Alber, Frank

    2015-01-01

    We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes, and the extent of nuclear territories for genes and chromosomes. Remarkably, despite those differences, functional similarities are maintained, as is evident when comparing the spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species. PMID:25799503

  7. Population and Functional Genomics of Neisseria Revealed with Gene-by-Gene Approaches

    PubMed Central

    Harrison, Odile B.

    2016-01-01

    Rapid low-cost whole-genome sequencing (WGS) is revolutionizing microbiology; however, complementary advances in accessible, reproducible, and rapid analysis techniques are required to realize the potential of these data. Here, investigations of the genus Neisseria illustrated the gene-by-gene conceptual approach to the organization and analysis of WGS data. Using the gene and its link to phenotype as a starting point, the BIGSdb database, which powers the PubMLST databases, enables the assembly of large open-access collections of annotated genomes that provide insight into the evolution of the Neisseria, the epidemiology of meningococcal and gonococcal disease, and mechanisms of Neisseria pathogenicity. PMID:27098959

  8. Population and Functional Genomics of Neisseria Revealed with Gene-by-Gene Approaches.

    PubMed

    Maiden, Martin C J; Harrison, Odile B

    2016-08-01

    Rapid low-cost whole-genome sequencing (WGS) is revolutionizing microbiology; however, complementary advances in accessible, reproducible, and rapid analysis techniques are required to realize the potential of these data. Here, investigations of the genus Neisseria illustrated the gene-by-gene conceptual approach to the organization and analysis of WGS data. Using the gene and its link to phenotype as a starting point, the BIGSdb database, which powers the PubMLST databases, enables the assembly of large open-access collections of annotated genomes that provide insight into the evolution of the Neisseria, the epidemiology of meningococcal and gonococcal disease, and mechanisms of Neisseria pathogenicity. PMID:27098959
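
    The gene-by-gene idea described above, in which every locus in a genome is catalogued by allele, can be illustrated with a minimal sketch. It assumes exact-match allele calling against a toy in-memory dictionary; the locus names are meningococcal MLST loci, but the sequences and allele numbers are invented, and this is not how BIGSdb is implemented (real pipelines typically also have to deal with incomplete loci and curator review).

```python
from typing import Dict

# Hypothetical allele store: locus -> {nucleotide sequence: allele number}.
AlleleDB = Dict[str, Dict[str, int]]


def call_allele(db: AlleleDB, locus: str, seq: str) -> int:
    """Return the allele number for an exact sequence match at `locus`.

    An unseen sequence is registered as a new allele and given the next
    unused integer, mimicking "define once, reuse everywhere" nomenclature.
    """
    alleles = db.setdefault(locus, {})
    seq = seq.upper()
    if seq not in alleles:
        alleles[seq] = max(alleles.values(), default=0) + 1
    return alleles[seq]


def call_profile(db: AlleleDB, locus_seqs: Dict[str, str]) -> Dict[str, int]:
    """Call alleles for every locus extracted from one genome assembly."""
    return {locus: call_allele(db, locus, seq) for locus, seq in locus_seqs.items()}


if __name__ == "__main__":
    # Toy sequences only; real loci are hundreds of bases long.
    db: AlleleDB = {"abcZ": {"ATGAAA": 1, "ATGAAG": 2}}
    print(call_profile(db, {"abcZ": "atgaag", "adk": "ATGCCC"}))  # {'abcZ': 2, 'adk': 1}
    print(call_allele(db, "abcZ", "ATGAAT"))  # 3 (new allele)
```

Because allele numbers are assigned once and then reused, two genomes sharing an allele number at a locus are known to carry the identical sequence there without re-comparing the sequences.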

  9. A genomic approach to identify hybrid incompatibility genes

    PubMed Central

    Cooper, Jacob C.; Phadnis, Nitin

    2016-01-01

    Uncovering the genetic and molecular basis of barriers to gene flow between populations is key to understanding how new species are born. Intrinsic postzygotic reproductive barriers such as hybrid sterility and hybrid inviability are caused by deleterious genetic interactions known as hybrid incompatibilities. The difficulty in identifying these hybrid incompatibility genes remains a rate-limiting step in our understanding of the molecular basis of speciation. We recently described how whole genome sequencing can be applied to identify hybrid incompatibility genes, even from genetically terminal hybrids. Using this approach, we discovered a new hybrid incompatibility gene, gfzf, between Drosophila melanogaster and Drosophila simulans, and found that it plays an essential role in cell cycle regulation. Here, we discuss the history of the hunt for incompatibility genes between these species, discuss the molecular roles of gfzf in cell cycle regulation, and explore how intragenomic conflict drives the evolution of fundamental cellular mechanisms that lead to the developmental arrest of hybrids. PMID:27230814

  10. Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

    PubMed Central

    Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique; Taconnat, Ludivine; Bitton, Frédérique; Balzergue, Sandrine; Jullien, Pauline E; Ingouff, Mathieu; Thareau, Vincent; Schiex, Thomas; Lecharny, Alain; Renou, Jean-Pierre

    2007-01-01

    Background Since the sequencing of the Arabidopsis thaliana genome was completed, the Arabidopsis community and the annotation centers have been working to improve gene annotation at the structural and functional levels. In this context, we have used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24,576 CATMA v2 GSTs, 677 are in regions considered as intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data-mining to characterize novel genes and improve gene models. Results The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously constituted of two merged CDS. Conclusion This work is a notable step forward in the improvement of the Arabidopsis genome annotation. We increased the number of validated Arabidopsis genes by 465 novel transcribed genes, to which we associated several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs. PMID:17980019
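
    Flagging the 677 GSTs that fall in regions the TAIR annotation considers intergenic is, at its core, an interval-overlap test. The sketch below assumes single-chromosome, 0-based half-open coordinates and uses toy gene and probe positions; it is an illustration of the idea, not the authors' pipeline.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort and merge overlapping or touching gene intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intergenic_probes(probes: Dict[str, Interval], genes: List[Interval]) -> List[str]:
    """Return the names of probes that overlap no annotated gene."""
    merged = merge_intervals(genes)
    starts = [s for s, _ in merged]
    hits = []
    for name, (s, e) in probes.items():
        # Once genes are merged, only the last gene starting at or before s
        # and the first gene starting after s can overlap [s, e).
        i = bisect_right(starts, s) - 1
        overlaps = any(
            0 <= j < len(merged) and merged[j][0] < e and s < merged[j][1]
            for j in (i, i + 1)
        )
        if not overlaps:
            hits.append(name)
    return hits
```

With toy genes `[(100, 200), (150, 300), (500, 600)]` and probes `{"p1": (50, 90), "p2": (250, 260), "p3": (310, 400), "p4": (450, 510)}`, only `p1` and `p3` are reported as intergenic. Merging first keeps the binary search valid when gene models overlap.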

  11. Integrated Genomic and Gene Expression Profiling Identifies Two Major Genomic Circuits in Urothelial Carcinoma

    PubMed Central

    Lindgren, David; Sjödahl, Gottfrid; Lauss, Martin; Staaf, Johan; Chebil, Gunilla; Lövgren, Kristina; Gudjonsson, Sigurdur; Liedberg, Fredrik; Patschan, Oliver; Månsson, Wiking; Fernö, Mårten; Höglund, Mattias

    2012-01-01

    Similar to other malignancies, urothelial carcinoma (UC) is characterized by specific recurrent chromosomal aberrations and gene mutations. However, the interconnection between specific genomic alterations, and how patterns of chromosomal alterations adhere to different molecular subgroups of UC, is less clear. We applied tiling resolution array CGH to 146 cases of UC and identified a number of regions harboring recurrent focal genomic amplifications and deletions. Several potential oncogenes were included in the amplified regions, including known oncogenes like E2F3, CCND1, and CCNE1, as well as new candidate genes, such as SETDB1 (1q21) and BCL2L1 (20q11). We next combined genome profiling with global gene expression, gene mutation, and protein expression data and identified two major genomic circuits operating in urothelial carcinoma. The first circuit was characterized by FGFR3 alterations, overexpression of CCND1, and 9q and CDKN2A deletions. The second circuit was defined by E2F3 amplifications and RB1 deletions, as well as gains of 5p, deletions at PTEN and 2q36, 16q, 20q, and elevated CDKN2A levels. TP53/MDM2 alterations were common in advanced tumors within the two circuits. Our data also suggest a possible RAS/RAF circuit. The tumors with the worst prognosis showed a gene expression profile that indicated a keratinized phenotype. Taken together, our integrative approach revealed at least two separate networks of genomic alterations linked to the molecular diversity seen in UC, and suggests that these circuits may reflect distinct pathways of tumor development. PMID:22685613

  12. Viral genome structures are optimal for capsid assembly

    PubMed Central

    Perlmutter, Jason D; Qiao, Cong; Hagan, Michael F

    2013-01-01

    Understanding how virus capsids assemble around their nucleic acid (NA) genomes could promote efforts to block viral propagation or to reengineer capsids for gene therapy applications. We develop a coarse-grained model of capsid proteins and NAs with which we investigate assembly dynamics and thermodynamics. In contrast to recent theoretical models, we find that capsids spontaneously ‘overcharge’; that is, the negative charge of the NA exceeds the positive charge on the capsid. When applied to specific viruses, the optimal NA lengths closely correspond to the natural genome lengths. Calculations based on linear polyelectrolytes rather than base-paired NAs underpredict the optimal length, demonstrating the importance of NA structure to capsid assembly. These results suggest that electrostatics, excluded volume, and NA tertiary structure are sufficient to predict assembly thermodynamics and that the ability of viruses to selectively encapsidate their genomic NAs can be explained, at least in part, on a thermodynamic basis. DOI: http://dx.doi.org/10.7554/eLife.00632.001 PMID:23795290

  13. Meet me halfway: when genomics meets structural bioinformatics.

    PubMed

    Gong, Sungsam; Worth, Catherine L; Cheng, Tammy M K; Blundell, Tom L

    2011-06-01

    The DNA sequencing technology developed by Frederick Sanger in the 1970s established genomics as the basis of comparative genetics. The recent invention of next-generation sequencing (NGS) platforms has added a new dimension to genome research by generating ultra-fast, high-throughput sequencing data on an unprecedented scale. The advent of NGS technology also provides the opportunity to study genetic diseases, where sequence variants or mutations are sought to establish a causal relationship with disease phenotypes. However, identifying the genetic variants responsible for genetic diseases is not a trivial task, and it is even harder for complex diseases such as diabetes and cancers. In such polygenic diseases, multiple genes and alleles, which can exist in healthy individuals, come together to contribute to common disease phenotypes in a complex manner. Hence, it is desirable to have an approach that integrates omics data with both knowledge of protein structure and function and an understanding of networks/pathways, i.e. functional genomics and systems biology; in this way, genotype-phenotype relationships can be better understood. In this review, we place this 'bottom-up' approach alongside current NGS-driven studies of genetic variation and disease aetiology. We describe experimental and computational techniques for assessing genetic variants and their deleterious effects on protein structure and function. PMID:21350909

  14. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2014-01-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy, and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:24510847

  15. Gene identification in prokaryotic genomes, phages, metagenomes, and EST sequences with GeneMarkS suite.

    PubMed

    Borodovsky, Mark; Lomsadze, Alex

    2011-09-01

    This unit describes how to use several gene-finding programs from the GeneMark line developed for finding protein-coding ORFs in genomic DNA of prokaryotic species, in genomic DNA of eukaryotic species with intronless genes, in genomes of viruses and phages, and in prokaryotic metagenomic sequences, as well as in EST sequences with spliced-out introns. These bioinformatics tools were demonstrated to have state-of-the-art accuracy and have been frequently used for gene annotation in novel nucleotide sequences. An additional advantage of these sequence-analysis tools is that the problem of algorithm parameterization is solved automatically, with parameters estimated by iterative self-training (unsupervised training). PMID:21901741
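
    GeneMarkS itself scores candidate genes with statistical models whose parameters are estimated by self-training; the sketch below does not reproduce that. It only enumerates candidate protein-coding ORFs on both strands, assuming the standard genetic code, ATG-only starts (real prokaryotic genes also use GTG and TTG) and a hypothetical length cutoff standing in for a real coding-potential model.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Find ATG...stop ORFs of at least `min_codons` codons (stop included).

    Returns (strand, start, end) tuples, 0-based and half-open. For the "-"
    strand the coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens the ORF; inner ATGs are ignored
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=2)` returns `[("+", 2, 14)]`. A self-training gene finder would go further and re-score such candidates with models fitted to the genome being annotated.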

  16. MLST revisited: the gene-by-gene approach to bacterial genomics

    PubMed Central

    Maiden, Martin C. J.; Jansen van Rensburg, Melissa J.; Bray, James E.; Earle, Sarah G.; Ford, Suzanne A.; Jolley, Keith A.; McCarthy, Noel D.

    2014-01-01

    Multilocus sequence typing (MLST) was proposed in 1998 as a portable sequence-based method for identifying clonal relationships among bacteria. Today, in the whole-genome era of microbiology, the need for systematic, standardized descriptions of bacterial genotypic variation remains a priority. Here, to meet this need, we draw on the successes of MLST and 16S rRNA gene sequencing to propose a hierarchical gene-by-gene approach that reflects functional and evolutionary relationships and catalogues bacteria ‘from domain to strain’. Our gene-based typing approach using online platforms such as the Bacterial Isolate Genome Sequence Database (BIGSdb) allows the scalable organization and analysis of whole-genome sequence data. PMID:23979428
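
    In MLST, each isolate's combination of allele numbers at a fixed set of loci is mapped to a sequence type (ST); the hierarchical gene-by-gene approach extends the same bookkeeping to many more loci. A minimal sketch of ST lookup and single-locus-variant detection (the relationship often used to group STs into clonal complexes) follows. The locus list is the seven-locus Neisseria meningitidis scheme; the profile table is invented and is not real PubMLST data.

```python
from typing import Dict, List, Optional, Tuple

# The seven loci of the Neisseria meningitidis MLST scheme.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table (allele numbers -> ST). Values are invented
# for illustration and are NOT real PubMLST assignments.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1, 1, 1, 1, 1): 1,
    (1, 1, 2, 1, 1, 1, 1): 2,
}


def profile_key(alleles: Dict[str, int]) -> Tuple[int, ...]:
    """Order an isolate's allele calls by the scheme's locus list."""
    return tuple(alleles[locus] for locus in LOCI)


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for an exact seven-locus match, or None if unassigned."""
    return PROFILES.get(profile_key(alleles))


def single_locus_variants(alleles: Dict[str, int]) -> List[int]:
    """STs whose profiles differ from this isolate at exactly one locus."""
    key = profile_key(alleles)
    return sorted(
        st for prof, st in PROFILES.items()
        if sum(a != b for a, b in zip(prof, key)) == 1
    )
```

An isolate with a previously unseen allele at one locus gets no ST from `sequence_type`, yet `single_locus_variants` can still place it next to its closest known STs.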

  17. (Structure and expression of nuclear genes encoding rubisco activase)

    SciTech Connect

    Zielinski, R.E.

    1990-01-01

    Our activities during the past year have centered around two basic aspects of the project: describing more thoroughly the diurnal and light irradiance effects on activase gene expression in barley; and isolating and structurally characterizing cDNA and genomic DNA sequences encoding activase from barley. Three appendices are included that summarize these activities.

  18. Genomic analyses of bacterial porin-cytochrome gene clusters

    DOE PAGESBeta

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    The porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and in other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane are often associated with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  19. Genome-Wide Architecture of Disease Resistance Genes in Lettuce.

    PubMed

    Christopoulou, Marilena; Wo, Sebastian Reyes-Chin; Kozik, Alex; McHale, Leah K; Truco, Maria-Jose; Wroblewski, Tadeusz; Michelmore, Richard W

    2015-12-01

    Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12, were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied, particularly among the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes. PMID:26449254

  20. Genome-Wide Architecture of Disease Resistance Genes in Lettuce

    PubMed Central

    Christopoulou, Marilena; Wo, Sebastian Reyes-Chin; Kozik, Alex; McHale, Leah K.; Truco, Maria-Jose; Wroblewski, Tadeusz; Michelmore, Richard W.

    2015-01-01

    Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12, were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied, particularly among the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes. PMID:26449254

  1. The banana E2 gene family: Genomic identification, characterization, expression profiling analysis.

    PubMed

    Dong, Chen; Hu, Huigang; Jue, Dengwei; Zhao, Qiufang; Chen, Hongliang; Xie, Jianghui; Jia, Liqiang

    2016-04-01

    The E2 is at the center of a cascade of Ubl transfers, and it links activation of the Ubl by E1 to its eventual E3-catalyzed attachment to substrate. Although genome-wide analysis of this family has been performed in some species, little is known about the E2 genes of banana. In this study, 74 E2 genes of banana were identified and phylogenetically clustered into thirteen subgroups. The predicted banana E2 genes were distributed across all 11 chromosomes at different densities. Additionally, the E2 domain, gene structure and motif compositions were analyzed. The expression of all of the banana E2 genes was analyzed in the root, stem, leaf and flower organs, at five stages of fruit development and under abiotic stresses. With the exception of a few genes in each group, all of the banana E2 genes were expressed in at least one of the organs or fruit developmental stages, which indicated that the E2 genes might be involved in various aspects of the physiological and developmental processes of the banana. Quantitative RT-PCR (qRT-PCR) analysis showed that 45 E2s were induced under drought and 33 under salt. To the best of our knowledge, this report describes the first genome-wide analysis of the banana E2 gene family, and the results should provide valuable information for understanding the classification, cloning and putative functions of this family. PMID:26940488

  2. GtRNAdb 2.0: an expanded database of transfer RNA genes identified in complete and draft genomes

    PubMed Central

    Chan, Patricia P.; Lowe, Todd M.

    2016-01-01

    Transfer RNAs represent the largest, most ubiquitous class of non-protein coding RNA genes found in all living organisms. The tRNAscan-SE search tool has become the de facto standard for annotating tRNA genes in genomes, and the Genomic tRNA Database (GtRNAdb) was created as a portal for interactive exploration of these gene predictions. Since its published description in 2009, the GtRNAdb has steadily grown in content, and remains the most commonly cited web-based source of tRNA gene information. In this update, we describe not only a major increase in the number of tRNA predictions (>367000) and genomes analyzed (>4370), but more importantly, the integration of new analytic and functional data to improve the quality and biological context of tRNA gene predictions. New information drawn from other sources includes tRNA modification data, epigenetic data, single nucleotide polymorphisms, gene expression and evolutionary conservation. A richer set of analytic data is also presented, including better tRNA functional prediction, non-canonical features, predicted structural impacts from sequence variants and minimum free energy structural predictions. Views of tRNA genes in genomic context are provided via direct links to the UCSC genome browsers. The database can be searched by sequence or gene features, and is available at http://gtrnadb.ucsc.edu/. PMID:26673694

  3. An integrated map of structural variation in 2,504 human genomes.

    PubMed

    Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Hsi-Yang Fritz, Markus; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Paolo Casale, Francesco; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Jasmine Mu, Xinmeng; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki; Muzny, Donna M; Nelson, Bradley J; Noor, Amina; Parrish, Nicholas F; Pendleton, Matthew; Quitadamo, Andrew; Raeder, Benjamin; Schadt, Eric E; Romanovitch, Mallory; Schlattl, Andreas; Sebra, Robert; Shabalin, Andrey A; Untergasser, Andreas; Walker, Jerilyn A; Wang, Min; Yu, Fuli; Zhang, Chengsheng; Zhang, Jing; Zheng-Bradley, Xiangqun; Zhou, Wanding; Zichner, Thomas; Sebat, Jonathan; Batzer, Mark A; McCarroll, Steven A; Mills, Ryan E; Gerstein, Mark B; Bashir, Ali; Stegle, Oliver; Devine, Scott E; Lee, Charles; Eichler, Evan E; Korbel, Jan O

    2015-10-01

    Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association. PMID:26432246

  4. Horizontally transferred genes in the genome of Pacific white shrimp, Litopenaeus vannamei

    PubMed Central

    2013-01-01

    Background In recent years, with the development of next-generation sequencing technology, a growing number of genes have been reported as being horizontally transferred from prokaryotes to eukaryotes, most of them involving arthropods. As a member of the phylum Arthropoda, the Pacific white shrimp Litopenaeus vannamei has to adapt to complex water environments with various symbiotic or parasitic microorganisms, which provide a platform for horizontal gene transfer (HGT). Results In this study, we analyzed the genome-wide HGT events in L. vannamei. Through homology search and phylogenetic analysis, followed by experimental PCR confirmation, 14 genes with HGT events were identified: 12 of them were transferred from bacteria and two from fungi. Structure analysis of these genes showed that the introns of the two fungi-originated genes were substituted by shrimp DNA fragments, and that two genes transferred from bacteria had shrimp-specific introns inserted in them. Furthermore, around three other bacteria-originated genes, there were three large DNA segments inserted into the shrimp genome. One segment was a fully transferred transposon, and the other two segments contained only bacterial coding regions. Functional prediction of these 14 genes showed that 6 of them might be related to energy metabolism, and 4 others to defense of the organism. Conclusions HGT events from bacteria or fungi have occurred in the genome of L. vannamei, and these horizontally transferred genes can be transcribed in shrimp. This is the first report of horizontally transferred genes in shrimp. Importantly, most of these genes are exposed to negative selection pressure and appear to be functional. PMID:23914989

  5. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa

    PubMed Central

    Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are key players in plant growth and development, involved in leaf formation, phototropism, and root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) genes are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of the Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding of cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs, including 15 same-species (BrIAA-BrIAA) and 36 cross-species (BrIAA-AtIAA) IAA gene pairs. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house microarray data from mature leaves of Chiifu and RcBr lines. Despite their large morphological differences, tissue-specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single nucleotide polymorphisms between the IAA genes of the reference genome Chiifu and RcBr were identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops, and we believe our study provides a foundation for understanding the Aux/IAA genes in B. rapa. PMID

  6. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa.

    PubMed

    Paul, Parameswari; Dhandapani, Vignesh; Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are key players in plant growth and development, involved in leaf formation, phototropism, and root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) genes are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of the Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding of cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs, including 15 same-species (BrIAA-BrIAA) and 36 cross-species (BrIAA-AtIAA) IAA gene pairs. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house microarray data from mature leaves of Chiifu and RcBr lines. Despite their large morphological differences, tissue-specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single nucleotide polymorphisms between the IAA genes of the reference genome Chiifu and RcBr were identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops, and we believe our study provides a foundation for understanding the Aux/IAA genes in B. rapa. PMID

  7. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10⁻⁵) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Of the 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further study to identify their function. A large number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins was observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog of both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979
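
    The split of the sequenced clones into 1,199 GSSs with database matches and 700 without (1,899 - 1,199 = 700) rests on an expect-value cutoff. A minimal sketch of that filtering step follows, assuming the hits are available as BLAST+ tabular output (`-outfmt 6`, whose 11th column is the e-value in the default column set); the identifiers in the usage example are hypothetical, and the abstract does not describe the authors' own pipeline.

```python
from typing import Dict, Iterable, List, Tuple

EVALUE_COL = 10  # 11th column of BLAST+ -outfmt 6 (default column set)


def best_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float]]:
    """Keep, for each query, its lowest-e-value hit below `max_evalue`."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[EVALUE_COL])
        if evalue >= max_evalue:  # the abstract's criterion: expect value < 10^-5
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def partition(queries: List[str], best: Dict[str, Tuple[str, float]]) -> Tuple[List[str], List[str]]:
    """Split queries into those with a significant hit and those without."""
    matched = [q for q in queries if q in best]
    return matched, [q for q in queries if q not in best]
```

Queries that never appear in the BLAST output, or appear only with weak hits, both end up in the "no significant match" list, which is how a count like the 700 unmatched GSSs arises.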

  8. Stability domains of actin genes and genomic evolution

    NASA Astrophysics Data System (ADS)

    Carlon, E.; Dkhissi, A.; Malki, M. Lejard; Blossey, R.

    2007-11-01

    In eukaryotic genes, the protein coding sequence is split into several fragments, the exons, separated by noncoding DNA stretches, the introns. Prokaryotes do not have introns in their genomes. We report calculations of the stability domains of actin genes for various organisms in the animal, plant, and fungi kingdoms. Actin genes have been chosen because they have been highly conserved during evolution. In these genes, all introns were removed so as to mimic ancient genes at the time of the early eukaryotic development, i.e., before intron insertion. Common stability boundaries are found in evolutionarily distant organisms, which implies that these boundaries date from the early origin of eukaryotes. In general, the boundaries correspond with intron positions in the actins of vertebrates and other animals, but much less so for plants and fungi. The sharpest boundary is found in a locus where fungi, algae, and animals have introns in positions separated by one nucleotide only, which identifies a hot spot for insertion. These results suggest that some introns may have been incorporated into the genomes through a thermodynamically driven mechanism, in agreement with previous observations on human genes. They also suggest a different mechanism for intron insertion in plants and animals.

  9. GeneSV - an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences.

    PubMed

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W C; Cardosa, Jane; Weaver, Scott C; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript, which characterizes real or theoretically possible variations in genomic and protein sequences, can be applied to any organism. PMID:24453480

  10. GeneSV – an Approach to Help Characterize Possible Variations in Genomic and Protein Sequences

    PubMed Central

    Zemla, Adam; Kostova, Tanya; Gorchakov, Rodion; Volkova, Evgeniya; Beasley, David W. C.; Cardosa, Jane; Weaver, Scott C.; Vasilakis, Nikos; Naraghi-Arani, Pejman

    2014-01-01

    A computational approach for identification and assessment of genomic sequence variability (GeneSV) is described. For a given nucleotide sequence, GeneSV collects information about the permissible nucleotide variability (changes that potentially preserve function) observed in corresponding regions in genomic sequences, and combines it with conservation/variability results from protein sequence and structure-based analyses of evaluated protein coding regions. GeneSV was used to predict effects (functional vs. non-functional) of 37 amino acid substitutions on the NS5 polymerase (RdRp) of dengue virus type 2 (DENV-2), 36 of which are not observed in any publicly available DENV-2 sequence. Thirty-two novel mutants with single amino acid substitutions in the RdRp were generated using a DENV-2 reverse genetics system. In 81% (26 of 32) of predictions tested, GeneSV correctly predicted viability of introduced mutations. In 4 of 5 (80%) mutants with double amino acid substitutions proximal in structure to one another, GeneSV was also correct in its predictions. Predictive capabilities of the developed system were illustrated on dengue RNA virus, but the general approach described in the manuscript, which characterizes real or theoretically possible variations in genomic and protein sequences, can be applied to any organism. PMID:24453480

  11. Confluence of genes, environment, development, and behavior in a post Genome-Wide Association Study world.

    PubMed

    Vrieze, Scott I; Iacono, William G; McGue, Matt

    2012-11-01

    This article serves to outline a research paradigm to investigate main effects and interactions of genes, environment, and development on behavior and psychiatric illness. We provide a historical context for candidate gene studies and genome-wide association studies, including benefits, limitations, and expected payoffs. Using substance use and abuse as our driving example, we then turn to the importance of etiological psychological theory in guiding genetic, environmental, and developmental research, as well as the utility of refined phenotypic measures, such as endophenotypes, in the pursuit of etiological understanding and focused tests of genetic and environmental associations. Phenotypic measurement has received considerable attention in the history of psychology and is informed by psychometrics, whereas the environment remains relatively poorly measured and is often confounded with genetic effects (i.e., gene-environment correlation). Genetically informed designs, which are no longer limited to twin and adoption studies thanks to ever-cheaper genotyping, are required to understand environmental influences. Finally, we outline the vast individual differences in structural genomic variation, most of which remain to be leveraged in genetic association tests. Although the genetic data can be massive and burdensome (tens of millions of variants per person), we argue that improved understanding of genomic structure and function will provide investigators with new tools to test specific a priori hypotheses derived from etiological psychological theory, much like current candidate gene research but with less confusion and more payoff than candidate gene research has delivered to date. PMID:23062291

  12. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded on one strand and 13 on the other. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  13. Genomic organization and evolution of ruminant lysozyme c genes

    PubMed Central

    Irwin, David M

    2015-01-01

    Ruminant stomach lysozyme is a long-established model of adaptive gene evolution. Evolution of stomach lysozyme function required changes in the site of expression of the lysozyme c gene and changes in the enzymatic properties of the enzyme. In ruminant mammals, these changes were associated with a change in the size of the lysozyme c gene family. The recent release of near complete genome sequences from several ruminant species allows a more complete examination of the evolution and diversification of the lysozyme c gene family. Here we characterize the size of the lysozyme c gene family in extant ruminants and demonstrate that their pecoran ruminant ancestor had a family of at least 10 lysozyme c genes, which included at least two pseudogenes. Evolutionary analysis of the ruminant lysozyme c gene sequences demonstrates that each of the four exons of the lysozyme c gene has a unique evolutionary history, indicating that they participated independently in concerted evolution. These analyses also show that episodic changes in the evolutionary constraints on the protein sequences occurred, with lysozyme c genes expressed in the abomasum of the stomach of extant ruminant species showing the greatest levels of selective constraints. PMID:25730456

  14. Identification of novel virulence-associated genes via genome analysis of hypothetical genes.

    PubMed

    Garbom, Sara; Forsberg, Ake; Wolf-Watz, Hans; Kihlberg, Britt-Marie

    2004-03-01

    The sequencing of bacterial genomes has opened new perspectives for identification of targets for treatment of infectious diseases. We have identified a set of novel virulence-associated genes (vag genes) by comparing the genome sequences of six human pathogens that are known to cause persistent or chronic infections in humans: Yersinia pestis, Neisseria gonorrhoeae, Helicobacter pylori, Borrelia burgdorferi, Streptococcus pneumoniae, and Treponema pallidum. This comparison was limited to genes annotated as hypothetical in the T. pallidum genome project. Seventeen genes with unknown functions were found to be conserved among these pathogens. Insertional inactivation of 14 of these genes generated nine mutants that were attenuated for virulence in a mouse infection model. Out of these nine genes, five were found to be specifically associated with virulence in mice as demonstrated by infection with Yersinia pseudotuberculosis in-frame deletion mutants. In addition, these five vag genes were essential only in vivo, since all the mutants were able to grow in vitro. These genes are broadly conserved among bacteria. Therefore, we propose that the corresponding vag gene products may constitute novel targets for antimicrobial therapy and that some vag mutants could serve as carrier strains for live vaccines. PMID:14977936

  15. Genomic organization of the human NSP gene, prototype of a novel gene family encoding reticulons

    SciTech Connect

    Roebroek, A.J.M.; Ayoubi, T.A.Y.; Velde, H.J.K. van de; Schoenmakers, E.F.P.M.; Pauli, I.G.L.; Van De Ven, W.J.M.

    1996-03-01

    Recently, cDNA cloning and expression of three mRNA variants of the human NSP gene were described. This neuroendocrine-specific gene encodes three NSP protein isoforms with unique amino-terminal parts, but common carboxy-terminal parts. The proteins, with yet unknown function, are associated with the endoplasmic reticulum and therefore are named NSP reticulons. Potentially, these proteins are neuroendocrine markers of a novel category in human lung cancer diagnosis. Here, the genomic organization of this gene was studied by analysis of genomic clones isolated from lambda phage and YAC libraries. The NSP exons were found to be dispersed over a genomic region of about 275 kb. The present elucidation of the genomic organization of the NSP gene explains the generation of NSP mRNA variants encoding NSP protein isoforms. Multiple promoters rather than alternative splicing of internal exons seem to be involved in this diversity. Furthermore, comparison of NSP genomic and cDNA sequences with databank nucleotide sequences resulted in the discovery of other human members of this novel family of reticulon-encoding genes. 25 refs., 4 figs.

  16. Transport genes and chemotaxis in Laribacter hongkongensis: a genome-wide analysis

    PubMed Central

    2011-01-01

    Background Laribacter hongkongensis is a Gram-negative, sea gull-shaped rod associated with community-acquired gastroenteritis. The bacterium has been found in diverse freshwater environments including fish, frogs and drinking water reservoirs. Using the complete genome sequence data of L. hongkongensis, we performed a comprehensive analysis of putative transport-related genes and genes related to chemotaxis, motility and quorum sensing, which may help the bacterium adapt to the changing environments and combat harmful substances. Results A genome-wide analysis using the Transport Classification Database (TCDB), similarity and keyword searches revealed the presence of a large diversity of transporters (n = 457) and genes related to chemotaxis (n = 52) and flagellar biosynthesis (n = 40) in the L. hongkongensis genome. The transporters included those from all seven major transporter categories, which may allow the uptake of essential nutrients or ions, and extrusion of metabolic end products and hazardous substances. L. hongkongensis is unique among closely related members of the Neisseriaceae family in possessing a higher number of proteins related to transport of ammonium, urea and dicarboxylate, which may reflect the importance of nitrogen and dicarboxylate metabolism in this asaccharolytic bacterium. Structural modeling of two C4-dicarboxylate transporters showed that they possessed similar structures to the determined structures of other DctP-TRAP transporters, with one having an unusual disulfide bond. Diverse mechanisms for iron transport, including hemin transporters for iron acquisition from host proteins, were also identified. In addition to the chemotaxis and flagella-related genes, the L. hongkongensis genome also contained two copies of qseB/qseC homologues of the AI-3 quorum sensing system.
Conclusions The large number of diverse transporters and genes involved in chemotaxis, motility and quorum sensing suggested that the bacterium may utilize a complex system to

  17. Evolution of akirin family in gene and genome levels and coexpressed patterns among family members and rel gene in croaker.

    PubMed

    Liu, Tianxing; Gao, Yunhang; Xu, Tianjun

    2015-09-01

    Akirins, which are highly conserved nuclear proteins, are present throughout the metazoans and regulate innate immunity, embryogenesis, myogenesis, and carcinogenesis. This study reports all akirin genes from miiuy croaker and comprehensively analyzes the akirin gene family together with akirin genes from other species. A second nuclear localization signal (NLS) is observed in akirin2 homologues but is absent from akirin1 homologues in all teleosts and most other vertebrates. Thus, we deduced that the loss of the second NLS in akirin1 homologues in teleosts likely occurred in an ancestor of all Osteichthyes after the split from cartilaginous fish. Significantly, the akirin2(2) gene in the miiuy croaker includes six exons interrupted by five introns, which may result from an intron insertion event and provides new evidence for variation of akirin gene structure in some species. In addition, comparison of the genes neighboring akirin1, akirin2(1), and akirin2(2) demonstrates a strong level of conserved synteny across the teleost classes, which further confirms the inference of Macqueen and Johnston (2009) that the origin of akirin paralogues can be attributed to whole-genome duplications followed by the loss of some akirin paralogues. Furthermore, akirin gene family members and the relish gene are ubiquitously expressed across all tissues, and their expression levels increase in three immune tissues after infection with Vibrio anguillarum. Combined with the expression patterns of LEAP-1 and LEAP-2 from miiuy croaker, these data establish an intricate network of co-regulation among family members. This further confirms that akirins act in concert with the relish protein to induce the expression of a subset of downstream pathway elements in the NF-κB dependent signaling pathway. PMID:25912355

  18. Augmented Annotation of the Schizosaccharomyces pombe Genome Reveals Additional Genes Required for Growth and Viability

    PubMed Central

    Bitton, Danny A.; Wood, Valerie; Scutt, Paul J.; Grallert, Agnes; Yates, Tim; Smith, Duncan L.; Hagan, Iain M.; Miller, Crispin J.

    2011-01-01

    Genome annotation is a synthesis of computational prediction and experimental evidence. Small genes are notoriously difficult to detect because the patterns used to identify them are often indistinguishable from chance occurrences, leading to an arbitrary cutoff threshold for the length of a protein-coding gene identified solely by in silico analysis. We report a systematic reappraisal of the Schizosaccharomyces pombe genome that ignores thresholds. A complete six-frame translation was compared to a proteome data set, the Pfam domain database, and the genomes of six other fungi. Thirty-nine novel loci were identified. RT-PCR and RNA-Seq confirmed transcription at 38 loci; 33 novel gene structures were delineated by 5′ and 3′ RACE. Expression levels of 14 transcripts fluctuated during meiosis. Translational evidence for 10 genes, evolutionary conservation data supporting 35 predictions, and distinct phenotypes upon ORF deletion (one essential, four slow-growth, two delayed-division phenotypes) suggest that all 39 predictions encode functional proteins. The popularity of S. pombe as a model organism suggests that this augmented annotation will be of interest in diverse areas of molecular and cellular biology, while the generality of the approach suggests widespread applicability to other genomes. PMID:21270388

  19. Evolutionary Design of Gene Networks: Forced Evolution by Genomic Parasites

    PubMed Central

    Spirov, A. V.; Zagriychuk, E. A.; Holloway, D. M.

    2014-01-01

    The co-evolution of species with their genomic parasites (transposons) is thought to be one of the primary ways of rewiring gene regulatory networks (GRNs). We develop a framework for conducting evolutionary computations (EC) using the transposon mechanism. We find that the selective pressure of transposons can speed evolutionary searches for solutions and lead to outgrowth of GRNs (through co-option of new genes to acquire insensitivity to the attacking transposons). We test the approach by finding GRNs which can solve a fundamental problem in developmental biology: how GRNs in early embryo development can robustly read maternal signaling gradients, despite continued attacks on the genome by transposons. We observed co-evolutionary oscillations in the abundance of particular GRNs and their transposons, reminiscent of predator-prey or host-parasite dynamics. PMID:25558118

  20. Lateral gene transfers have polished animal genomes: lessons from nematodes

    PubMed Central

    Danchin, Etienne G. J.; Rosso, Marie-Noëlle

    2012-01-01

    It is now accepted that lateral gene transfers (LGT) have significantly contributed to the composition of bacterial genomes. The amplitude of the phenomenon is considered so high in prokaryotes that it challenges the traditional view that a binary hierarchical tree of life can correctly represent the evolutionary history of species. Given the plethora of transfers between prokaryotes, it is currently impossible to infer the last common ancestral gene set for any extant species. For these reasons, it has been proposed that the Darwinian binary tree of life may be inappropriate to correctly reflect the actual relations between species, at least in prokaryotes. In contrast, the contribution of LGT to the composition of animal genomes is less documented. In the light of recent analyses that reported series of LGT events in nematodes, we discuss the importance of this phenomenon in the evolutionary history and in the current composition of an animal genome. Far from being neutral, it appears that besides having contributed to nematode genome contents, LGT have favored the emergence of important traits such as plant-parasitism. PMID:22919619

  1. The genome of Salinibacter ruber: Convergence and gene exchange among hyperhalophilic bacteria and archaea

    PubMed Central

    Mongodin, E. F.; Nelson, K. E.; Daugherty, S.; DeBoy, R. T.; Wister, J.; Khouri, H.; Weidman, J.; Walsh, D. A.; Papke, R. T.; Sanchez Perez, G.; Sharma, A. K.; Nesbø, C. L.; MacLeod, D.; Bapteste, E.; Doolittle, W. F.; Charlebois, R. L.; Legault, B.; Rodriguez-Valera, F.

    2005-01-01

    Saturated thalassic brines are among the most physically demanding habitats on Earth: few microbes survive in them. Salinibacter ruber is among these organisms and has been found repeatedly in significant numbers in climax saltern crystallizer communities. The phenotype of this bacterium is remarkably similar to that of the hyperhalophilic Archaea (Haloarchaea). The genome sequence suggests that this resemblance has arisen through convergence at the physiological level (different genes producing similar overall phenotype) and the molecular level (independent mutations yielding similar sequences or structures). Several genes and gene clusters also derive by lateral transfer from (or may have been laterally transferred to) haloarchaea. S. ruber encodes four rhodopsins. One resembles bacterial proteorhodopsins and three are of the haloarchaeal type, previously uncharacterized in a bacterial genome. The impact of these modular adaptive elements on the cell biology and ecology of S. ruber is substantial, affecting salt adaptation, bioenergetics, and photobiology. PMID:16330755

  2. Genomic aberrations frequently alter chromatin regulatory genes in chordoma.

    PubMed

    Wang, Lu; Zehir, Ahmet; Nafa, Khedoudja; Zhou, Nengyi; Berger, Michael F; Casanova, Jacklyn; Sadowska, Justyna; Lu, Chao; Allis, C David; Gounder, Mrinal; Chandhanayingyong, Chandhanarat; Ladanyi, Marc; Boland, Patrick J; Hameed, Meera

    2016-07-01

    Chordoma is a rare primary bone neoplasm that is resistant to standard chemotherapies. Despite aggressive surgical management, local recurrence and metastasis are not uncommon. To identify the specific genetic aberrations that play key roles in chordoma pathogenesis, we utilized a genome-wide high-resolution SNP-array and next generation sequencing (NGS)-based molecular profiling platform to study 24 patient samples with typical histopathologic features of chordoma. Matching normal tissues were available for 16 samples. SNP-array analysis revealed nonrandom copy number losses across the genome, frequently involving 3, 9p, 1p, 14, 10, and 13. In contrast, copy number gain is uncommon in chordomas. Two minimum deleted regions were observed on 3p within a ∼8 Mb segment at 3p21.1-p21.31, which overlaps SETD2, BAP1 and PBRM1. The minimum deleted region on 9p was mapped to the CDKN2A locus at 9p21.3, and homozygous deletion of CDKN2A was detected in 5/22 chordomas (∼23%). NGS-based molecular profiling demonstrated an extremely low mutation rate in chordomas, with an average of 0.5 mutations per sample for the 16 cases with matched normal. When the mutated genes were grouped based on molecular functions, many of the mutation events (∼40%) were found in chromatin regulatory genes. The combined copy number and mutation profiling revealed that SETD2 is the single gene affected most frequently in chordomas, either by deletion or by mutations. Our study demonstrated that chordoma belongs to the C-class (copy number changes) tumors, whose oncogenic signature is nonrandom, multiple copy number losses across the genome, and that genomic aberrations frequently alter chromatin regulatory genes. © 2016 Wiley Periodicals, Inc. PMID:27072194

  3. Re-examining the Gene in Personalized Genomics

    NASA Astrophysics Data System (ADS)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called 'direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  4. Genome-wide analysis reveals gene expression and metabolic network dynamics during embryo development in Arabidopsis.

    PubMed

    Xiang, Daoquan; Venglat, Prakash; Tibiche, Chabane; Yang, Hui; Risseeuw, Eddy; Cao, Yongguo; Babic, Vivijan; Cloutier, Mathieu; Keller, Wilf; Wang, Edwin; Selvaraj, Gopalan; Datla, Raju

    2011-05-01

    Embryogenesis is central to the life cycle of most plant species. Despite its importance, because of the difficulty associated with embryo isolation, global gene expression programs involved in plant embryogenesis, especially the early events following fertilization, are largely unknown. To address this gap, we have developed methods to isolate whole live Arabidopsis (Arabidopsis thaliana) embryos as young as the zygote stage and performed genome-wide profiling of gene expression. These studies revealed insights into patterns of gene expression relating to: maternal and paternal contributions to zygote development, chromosomal level clustering of temporal expression in embryogenesis, and embryo-specific functions. Functional analysis of some of the modulated transcription factor encoding genes from our data sets confirmed that they are critical for embryogenesis. Furthermore, we constructed stage-specific metabolic networks mapped with differentially regulated genes by combining the microarray data with the available Kyoto Encyclopedia of Genes and Genomes metabolic data sets. Comparative analysis of these networks revealed the network-associated structural and topological features, pathway interactions, and gene expression with reference to the metabolic activities during embryogenesis. Together, these studies have generated comprehensive gene expression data sets for embryo development in Arabidopsis and may serve as an important foundational resource for other seed plants. PMID:21402797

  5. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    PubMed Central

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-01-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. The rpl16, rps4 and atp8 genes are missing from L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts. PMID:26888293

  6. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    NASA Astrophysics Data System (ADS)

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-02-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. The rpl16, rps4 and atp8 genes are missing from L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  7. Inferring gene transcriptional modulatory relations: a genetical genomics approach

    SciTech Connect

    Li, Hongqiang; Lu, Lu; Manly, Kenneth; Chesler, Elissa J; Bao, Lei; Wang, Jintao; Zhou, Mi; Williams, Robert; Cui, Yan

    2005-01-01

    Bayesian network modeling is a promising approach to define and evaluate gene expression circuits in diverse tissues and cell types under different experimental conditions. The power and practicality of this approach can be improved by restricting the number of potential interactions among genes and by defining causal relations before evaluating posterior probabilities for billions of networks. A newly developed genetical genomics method that combines transcriptome profiling with complex trait analysis now provides strong constraints on network architecture. This method detects those chromosomal intervals responsible for differences in mRNA expression using quantitative trait locus (QTL) mapping. We have developed an efficient Bayesian approach that exploits the genetical genomics method to focus computational effort on the most plausible gene modulatory networks. We exploit a dense marker map for a genetic reference population (GRP) that consists of 32 BXD strains of mice made by intercrossing two progenitor strains, C57BL/6J and DBA/2J. These progenitors differ at 1.3 million known single nucleotide polymorphisms (SNPs), all of which can be exploited to estimate the probability that a gene contains functional polymorphisms that segregate within the GRP. We constructed 66 candidate networks that include all the candidate modulator genes located in the 209 statistically significant trans-acting QTL regions. SNPs that distinguish between the two progenitor strains were used to further winnow the list of candidate modulators. Bayesian network modeling was then used to identify the genetic modulatory relations that best explain the microarray data.

  8. Duplex stem-loop-containing quadruplex motifs in the human genome: a combined genomic and structural study

    PubMed Central

    Lim, Kah Wai; Jenjaroenpun, Piroon; Low, Zhen Jie; Khong, Zi Jian; Ng, Yi Siang; Kuznetsov, Vladimir Andreevich; Phan, Anh Tuân

    2015-01-01

    Duplex stem-loops and four-stranded G-quadruplexes have been implicated in (patho)biological processes. Overlap of stem-loop- and quadruplex-forming sequences could give rise to quadruplex–duplex hybrids (QDH), which combine features of both structural forms and could exhibit unique properties. Here, we present a combined genomic and structural study of stem-loop-containing quadruplex sequences (SLQS) in the human genome. Using a maximum loop length of 20 nt, our survey identified 80 307 SLQS, embedded within 60 172 unique clusters. Our analysis suggested that these cover close to half of all SLQS in the entire genome. Among these, 48 508 SLQS were strand-specifically located in genic/promoter regions, with the majority of genes displaying a low number of SLQS. Notably, genes containing abundant SLQS clusters were strongly associated with brain tissues. Enrichment analysis of SLQS-positive genes and mapping of SLQS onto transcriptional/mutagenesis hotspots and cancer-associated genes provided a statistical framework supporting the biological involvement of SLQS. In vitro formation of diverse QDH by selected SLQS hits was successfully verified by nuclear magnetic resonance spectroscopy. Folding topologies of two SLQS were elucidated in detail. We also demonstrated that sequence changes at mutation/single-nucleotide polymorphism loci could affect the structural conformations adopted by SLQS. Thus, our predicted SLQS offer novel insights into the potential involvement of QDH in diverse (patho)biological processes and could represent novel regulatory signals. PMID:25958397

  9. Structure of the human retinoblastoma gene.

    PubMed Central

    Hong, F D; Huang, H J; To, H; Young, L J; Oro, A; Bookstein, R; Lee, E Y; Lee, W H

    1989-01-01

    Complete inactivation of the human retinoblastoma gene (RB) is believed to be an essential step in the tumorigenesis of several different cancers. To provide a framework for understanding inactivation mechanisms, the structure of RB was delineated. The RB transcript is encoded in 27 exons dispersed over about 200 kilobases (kb) of genomic DNA. The length of individual exons ranges from 31 to 1889 base pairs (bp). The largest intron spans more than 60 kb and the smallest only 80 bp. Deletion of exons 13-17 is frequently observed in various types of tumors, including retinoblastoma, breast cancer, and osteosarcoma, and the presence of a potential "hot spot" for recombination in the region is predicted. A putative "leucine-zipper" motif is exclusively encoded by exon 20. The detailed RB structure presented here should prove useful in defining potential functional domains of its encoded protein. Transcription of RB is initiated at multiple positions and the sequences surrounding the initiation sites have a high G + C content. A typical upstream TATA box is not present. The RB promoter region was localized using a heterologous expression system containing a bacterial chloramphenicol acetyltransferase gene. Deletion analysis revealed that a region as small as 70 bp is sufficient for RB promoter activity, similar to other previously characterized G + C-rich gene promoters. Several direct repeats and possible stem-and-loop structures are found in the promoter region. No enhancer element was detected within the 7.3 kb of upstream sequence studied. Several features of the RB promoter are reminiscent of the characteristics associated with many "housekeeping" genes, consistent with its ubiquitous expression pattern. PMID:2748600

  10. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    SciTech Connect

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  11. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    PubMed Central

    Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-01

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence. PMID:23357949

  12. Increased complexity of gene structure and base composition in vertebrates.

    PubMed

    Wu, Ying; Yuan, Huizhong; Tan, Shengjun; Chen, Jian-Qun; Tian, Dacheng; Yang, Haiwang

    2011-07-20

    How the structure and base composition of genes changed during vertebrate evolution remains a puzzling question. Here we analyzed 895 orthologous protein-coding genes in six multicellular animals: human, chicken, zebrafish, sea squirt, fruit fly, and worm. Our analyses reveal that many gene regions, particularly introns and 3' UTRs, gradually expanded throughout the evolution of vertebrates from their invertebrate ancestors, and that the number of exons per gene increased. Analyses based on all protein-coding genes in each genome gave consistent results. We also find that GC content increased in many gene regions (especially 5' UTRs), but not in coding exons, during the evolution of endotherms. Analysis of individual genomes shows that 3' UTRs correlate more strongly with introns in length and GC content than 5' UTRs do, and that genes with large introns show relatively similar GC content across all six species. Our data indicate a great increase in complexity in vertebrate genes, and we propose that the requirement for morphological and functional changes is probably the driving force behind the evolution of structural and base-composition complexity in multicellular animal genes. PMID:21777854

  13. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    SciTech Connect

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. Because the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition, the GeneMark.hmm program uses two Markov models representing distinct classes of protein-coding genes, denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical, and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely composed of genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes, and we have been able to compare these results to other "surrogate" methods of LGT prediction.
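    The idea of scoring a gene under two competing compositional models can be illustrated with a minimal sketch. It assumes simple first-order (dinucleotide) Markov chains and toy GC-rich and AT-rich training sequences; this is not GeneMark.hmm's implementation, and the function names and data are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev) with pseudocounts."""
    counts = {a: Counter() for a in ALPHABET}
    for s in seqs:
        for prev, nxt in zip(s, s[1:]):
            if prev in counts and nxt in ALPHABET:
                counts[prev][nxt] += 1
    model = {}
    for prev in ALPHABET:
        total = sum(counts[prev].values()) + pseudocount * len(ALPHABET)
        model[prev] = {nxt: (counts[prev][nxt] + pseudocount) / total for nxt in ALPHABET}
    return model

def log_likelihood(seq, model):
    """Sum of log transition probabilities; non-ACGT characters are skipped."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:])
               if p in model and n in ALPHABET)

def classify(seq, typical, atypical):
    """Label seq 'atypical' if the atypical model gives it a higher likelihood."""
    score = log_likelihood(seq, atypical) - log_likelihood(seq, typical)
    return "atypical" if score > 0 else "typical"

# Hypothetical toy training sets: GC-rich "typical" vs AT-rich "atypical" genes.
typical = train_markov(["GCGCGGCCGCGCGGCC", "GGCCGCGCCGGCGC"])
atypical = train_markov(["ATATTAAATTTATA", "AATTATATTTAA"])
print(classify("GCGGCCGCGC", typical, atypical))
print(classify("TTATAATATT", typical, atypical))
```

    A log-likelihood ratio is used because it compares the two models on the same sequence directly; a real gene finder would also need to handle codon position, gene boundaries and model order.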

  14. Genomic Heterogeneity and Structural Variation in Soybean Near Isogenic Lines

    PubMed Central

    Stec, Adrian O.; Bhaskar, Pudota B.; Bolon, Yung-Tsi; Nolan, Rebecca; Shoemaker, Randy C.; Vance, Carroll P.; Stupar, Robert M.

    2013-01-01

    Near isogenic lines (NILs) are a critical genetic resource for the soybean research community. The ability to identify and characterize the genes driving the phenotypic differences between NILs is limited by the degree to which differential genetic introgressions can be resolved. Furthermore, the genetic heterogeneity extant among NIL sub-lines is an unaddressed research topic that might have implications for how genomic and phenotypic data from NILs are utilized. In this study, a recently developed high-resolution comparative genomic hybridization (CGH) platform was used to investigate the structure and diversity of genetic introgressions in two classical soybean NIL populations, respectively varying in protein content and iron deficiency chlorosis (IDC) susceptibility. There were three objectives: assess the capacity for CGH to resolve genomic introgressions, identify introgressions that are heterogeneous among NIL sub-lines, and associate heterogeneous introgressions with susceptibility to IDC. Using the CGH approach, introgression boundaries were refined and previously unknown introgressions were revealed. Furthermore, heterogeneous introgressions were identified within seven sub-lines of the IDC NIL “IsoClark.” This included three distinct introgression haplotypes linked to the major iron susceptible locus on chromosome 03. A phenotypic assessment of the seven sub-lines did not reveal any differences in IDC susceptibility, indicating that the genetic heterogeneity among the lines does not have a significant impact on the primary NIL phenotype. PMID:23630538

  15. The complete mitochondrial genome of the grand jackknife clam, Solen grandis (Bivalvia: Solenidae): a novel gene order and unusual non-coding region.

    PubMed

    Yuan, Yang; Li, Qi; Kong, Lingfeng; Yu, Hong

    2012-02-01

    Molluscs in general, and bivalves in particular, exhibit an extraordinary degree of mitochondrial gene order variation compared with other metazoans. The complete mitochondrial genome of Solen grandis (Bivalvia: Solenidae) was determined using long-PCR and genome walking techniques. The entire mitochondrial genome sequence of S. grandis is 16,784 bp in length and contains 36 genes, including 12 protein-coding genes (atp8 is absent), 2 ribosomal RNAs, and 22 tRNAs. All genes are encoded on the same strand. Compared with other species, it bears a novel gene order. In addition, we found a peculiar non-coding region of 435 bp containing a microsatellite-like (TA)(12) element, poly-structures and many hairpin structures. Compared with the heterodont mitochondrial genomes available in GenBank, the complete mtDNA of S. grandis has the shortest cox3 gene and the longest atp6, nad4 and nad5 genes. PMID:21598108

  16. Genomic location of the major ribosomal protein gene locus determines Vibrio cholerae global growth and infectivity.

    PubMed

    Soler-Bistué, Alfonso; Mondotte, Juan A; Bland, Michael Jason; Val, Marie-Eve; Saleh, María-Carla; Mazel, Didier

    2015-04-01

    The effects on cell physiology of gene order within the bacterial chromosome are poorly understood. In silico approaches have shown that genes involved in transcription and translation processes, in particular ribosomal protein (RP) genes, localize near the replication origin (oriC) in fast-growing bacteria, suggesting that such a positional bias is an evolutionarily conserved growth-optimization strategy. Such genomic localization could either provide a higher dosage of these genes during fast growth or facilitate the assembly of ribosomes and transcription foci by keeping the many components of these macromolecular machines physically close together. To explore this, we used novel recombineering tools to create a set of Vibrio cholerae strains in which S10-spec-α (S10), a locus bearing half of the ribosomal protein genes, was systematically relocated to alternative genomic positions. We show that the relative distance of S10 from the origin of replication correlated tightly with a reduction in S10 dosage, mRNA abundance and growth rate within these otherwise isogenic strains. Furthermore, this was accompanied by a significant reduction in host-invasion capacity in Drosophila melanogaster. Both phenotypes were rescued in strains bearing two S10 copies highly distal to oriC, demonstrating that replication-dependent gene dosage reduction is the main mechanism behind these alterations. Hence, S10 positioning connects genome structure to cell physiology in Vibrio cholerae. Our results show experimentally for the first time that the genomic positioning of genes involved in the flux of genetic information influences global growth control and hence bacterial physiology and potentially its evolution. PMID:25875621
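    Why distance from oriC lowers gene dosage can be sketched with the classical Cooper–Helmstetter model of bacterial replication, which the abstract does not itself invoke. Under that model, a locus at relative position m (0 at oriC, 1 at the terminus) is present at about 2^(C(1 - m)/τ) copies relative to the terminus, where C is the time needed to replicate the chromosome and τ is the doubling time. The C and τ values below are hypothetical, not measurements for V. cholerae.

```python
def relative_dosage(position, c_period, doubling_time):
    """Copy number of a locus relative to the terminus (Cooper-Helmstetter model).

    position: relative distance from oriC (0.0 = origin, 1.0 = terminus)
    c_period: time to replicate the chromosome, in minutes (assumed value)
    doubling_time: generation time, in minutes (assumed value)
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 (oriC) and 1 (terminus)")
    return 2 ** (c_period * (1.0 - position) / doubling_time)

# Hypothetical fast growth: C = 40 min, doubling time = 20 min.
for pos in (0.0, 0.5, 1.0):
    print(pos, relative_dosage(pos, 40, 20))
```

    With these assumed values, a locus next to oriC has four times the copies of one at the terminus. The faster the growth (smaller τ), the steeper this gradient, which is consistent with the abstract's point that origin-proximal placement matters most during fast growth.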

  17. Genomic Characterization of Prenatally Detected Chromosomal Structural Abnormalities Using Oligonucleotide Array Comparative Genomic Hybridization

    PubMed Central

    Li, Peining; Pomianowski, Pawel; DiMaio, Miriam S.; Florio, Joanne R.; Rossi, Michael R.; Xiang, Bixia; Xu, Fang; Yang, Hui; Geng, Qian; Xie, Jiansheng; Mahoney, Maurice J.

    2013-01-01

    Detection of chromosomal structural abnormalities using conventional cytogenetic methods poses a challenge for prenatal genetic counseling due to unpredictable clinical outcomes and risk of recurrence. Of 1,726 prenatal cases over a 3-year period, we performed oligonucleotide array comparative genomic hybridization (aCGH) analysis on 11 cases in which various structural chromosomal abnormalities had been detected. In nine cases, genomic aberrations and gene contents involving a 3p distal deletion, a marker chromosome from chromosome 4, a derivative chromosome 5 from a 5p/7q translocation, a de novo distal 6q deletion, a recombinant chromosome 8 composed of an 8p duplication and an 8q deletion, an extra derivative chromosome 9 from an 8p/9q translocation, mosaicism for chromosome 12q with added material of initially unknown origin, an unbalanced 13q/15q rearrangement, and a distal 18q duplication and deletion were delineated. No pathogenic copy number changes were found in one case with a de novo 11q/14q translocation or in another with a familial insertion of 21q into 19q. Genomic characterization of the structural abnormalities aided in the prediction of clinical outcomes. These results demonstrate the value of aCGH analysis in prenatal cases with subtle or complex chromosomal rearrangements. Furthermore, a retrospective analysis of the clinical indications for our prenatal cases showed that approximately 20% had abnormal ultrasound findings; such pregnancies should be considered high risk and referred for combined chromosome and aCGH analysis. PMID:21671377

  18. Population structure and minimum core genome typing of Legionella pneumophila

    PubMed Central

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires’ disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates, and these were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs that were likely the result of recombination, the remaining 9,165 SNPs were designated MCG SNPs. Population structure analysis based on the MCG divided the 53 L. pneumophila isolates into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. The MCG groups were also supported by phylogenetic analysis and may be considered robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results of this study provide significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563
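    The minimum core genome definition above (single-copy orthologous genes shared by all isolates) amounts to a set intersection, shown here as a minimal sketch. It assumes ortholog families have already been assigned to each gene; the input layout, isolate names and family IDs are hypothetical.

```python
from collections import Counter

def minimum_core_genome(isolate_orthologs):
    """Return ortholog families present in exactly one copy in every isolate.

    isolate_orthologs maps an isolate name to a list of ortholog-family IDs,
    one entry per gene, so a family listed twice is multi-copy in that isolate.
    """
    core = None
    for families in isolate_orthologs.values():
        counts = Counter(families)
        single_copy = {fam for fam, n in counts.items() if n == 1}
        core = single_copy if core is None else core & single_copy
    return core or set()

# Hypothetical input: f3 is duplicated in iso1 and absent from iso3.
example = {
    "iso1": ["f1", "f2", "f3", "f3"],
    "iso2": ["f1", "f2", "f3"],
    "iso3": ["f1", "f2", "f4"],
}
core = minimum_core_genome(example)
print(sorted(core))
```

    Requiring a single copy in every isolate avoids paralog ambiguity when SNPs are later called across the shared genes.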

  19. Population structure and minimum core genome typing of Legionella pneumophila.

    PubMed

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires' disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates, and these were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs that were likely the result of recombination, the remaining 9,165 SNPs were designated MCG SNPs. Population structure analysis based on the MCG divided the 53 L. pneumophila isolates into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. The MCG groups were also supported by phylogenetic analysis and may be considered robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results of this study provide significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563

  20. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    SciTech Connect

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
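    The notion of a weighted consensus of evidence can be illustrated with a deliberately simplified sketch: each evidence source votes for a label at each base, weighted by its reliability, and the highest-scoring label wins. EVM itself assembles complete gene structures rather than labelling bases independently, so this per-base vote is a swapped-in simplification; the source names, weights and intervals are hypothetical.

```python
from collections import defaultdict

def weighted_consensus(evidence, weights, length):
    """Per-base weighted vote over evidence tracks.

    evidence: dict mapping source name -> list of (start, end, label) intervals,
              0-based half-open coordinates, label e.g. "exon" or "intron".
    weights: dict mapping source name -> non-negative weight.
    Returns a list of `length` labels; bases with no evidence are "intergenic".
    """
    scores = [defaultdict(float) for _ in range(length)]
    for source, intervals in evidence.items():
        w = weights.get(source, 0.0)
        for start, end, label in intervals:
            for pos in range(max(start, 0), min(end, length)):
                scores[pos][label] += w
    consensus = []
    for base_scores in scores:
        if base_scores:
            # Highest summed weight wins; ties go to the first label seen.
            consensus.append(max(base_scores, key=base_scores.get))
        else:
            consensus.append("intergenic")
    return consensus

# Hypothetical evidence: alignments are trusted more than the ab initio call.
evidence = {
    "ab_initio": [(0, 6, "exon"), (6, 10, "intron")],
    "protein": [(0, 4, "exon"), (4, 10, "intron")],
    "transcript": [(0, 4, "exon")],
}
weights = {"ab_initio": 1.0, "protein": 5.0, "transcript": 10.0}
consensus = weighted_consensus(evidence, weights, 10)
print(consensus)
```

    Here the ab initio prediction's longer exon is overruled at bases 4 and 5 because the more heavily weighted protein alignment calls them intronic.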

  1. Gene duplication, genome duplication, and the functional diversification of vertebrate globins

    PubMed Central

    Storz, Jay F.; Opazo, Juan C.; Hoffmann, Federico G.

    2015-01-01

    The functional diversification of the vertebrate globin gene superfamily provides an especially vivid illustration of the role of gene duplication and whole-genome duplication in promoting evolutionary innovation. For example, key globin proteins that evolved specialized functions in various aspects of oxidative metabolism and oxygen signaling pathways (hemoglobin [Hb], myoglobin [Mb], and cytoglobin [Cygb]) trace their origins to two whole-genome duplication events in the stem lineage of vertebrates. The retention of the proto-Hb and Mb genes in the ancestor of jawed vertebrates permitted a physiological division of labor between the oxygen-carrier function of Hb and the oxygen-storage function of Mb. In the Hb gene lineage, a subsequent tandem gene duplication gave rise to the proto α- and β-globin genes, which permitted the formation of multimeric Hbs composed of unlike subunits (α2β2). The evolution of this heteromeric quaternary structure was central to the emergence of Hb as a specialized oxygen-transport protein because it provided a mechanism for cooperative oxygen-binding and allosteric regulatory control. Subsequent rounds of duplication and divergence have produced diverse repertoires of α- and β-like globin genes that are ontogenetically regulated such that functionally distinct Hb isoforms are expressed during different stages of prenatal development and postnatal life. In the ancestor of jawless fishes, the proto Mb and Hb genes appear to have been secondarily lost, and the Cygb homolog evolved a specialized respiratory function in blood-oxygen transport. Phylogenetic and comparative genomic analyses of the vertebrate globin gene superfamily have revealed numerous instances in which paralogous globins have convergently evolved similar expression patterns and/or similar functional specializations in different organismal lineages. PMID:22846683

  2. Genome size diversity in angiosperms and its influence on gene space.

    PubMed

    Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

    2015-12-01

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C = 5.7 Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously dismissed as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression. PMID:26605684

  3. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes

    PubMed Central

    Biankin, Andrew V.; Waddell, Nicola; Kassahn, Karin S.; Gingras, Marie-Claude; Muthuswamy, Lakshmi B.; Johns, Amber L.; Miller, David K.; Wilson, Peter J.; Patch, Ann-Marie; Wu, Jianmin; Chang, David K.; Cowley, Mark J.; Gardiner, Brooke B.; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J.; Gill, Anthony J.; Pinho, Andreia V.; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J. Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R. Scott; Humphris, Jeremy L.; Kaplan, Warren; Jones, Marc D.; Colvin, Emily K.; Nagrial, Adnan M.; Humphrey, Emily S.; Chou, Angela; Chin, Venessa T.; Chantrill, Lorraine A.; Mawson, Amanda; Samra, Jaswinder S.; Kench, James G.; Lovell, Jessica A.; Daly, Roger J.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M.; Fisher, William E.; Brunicardi, F. Charles; Hodges, Sally E.; Reid, Jeffrey G.; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R.; Dinh, Huyen; Buhay, Christian J.; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E.; Yung, Christina K.; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A.; Petersen, Gloria M.; Gallinger, Steven; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Schulick, Richard D.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A.; Mann, Karen M.; Jenkins, Nancy A.; Perez-Mancera, Pedro A.; Adams, David J.; Largaespada, David A.; Wessels, Lodewyk F. A.; Rust, Alistair G.; Stein, Lincoln D.; Tuveson, David A.; Copeland, Neal G.; Musgrove, Elizabeth A.; Scarpa, Aldo; Eshleman, James R.; Hudson, Thomas J.; Sutherland, Robert L.; Wheeler, David A.; Pearson, John V.; McPherson, John D.; Gibbs, Richard A.; Grimmond, Sean M.

    2012-01-01

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis. PMID:23103869

  4. OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes

    PubMed Central

    Thybert, David; Avner, Stéphane; Lucchetti-Miganeh, Céline; Chéron, Angélique; Barloy-Hubler, Frédérique

    2008-01-01

    Background Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified according to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes, due mainly to erroneous database entries, propagation of mistakes, lack of updating and inconsistent descriptions of function. Description In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated with subfamilies of oxidative stress enzymes, and a new anchor-driven annotator for ab initio detection of ROS/RNS response genes. All complete Bacterial and Archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a Graphical User Interface. Conclusion OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise functions to proteins annotated as hypothetical. OxyGene is freely available at: PMID:19117520

  5. Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes.

    PubMed

    Zhou, T; Wang, Y; Chen, J-Q; Araki, H; Jing, Z; Jiang, K; Shen, J; Tian, D

    2004-05-01

    A complete set of candidate disease resistance (R) genes encoding nucleotide-binding sites (NBSs) was identified in the genome sequence of japonica rice (Oryza sativa L. var. Nipponbare). These putative R genes were characterized with respect to structural diversity, phylogenetic relationships and chromosomal distribution, and compared with those in Arabidopsis thaliana. We found 535 NBS-coding sequences, including 480 non-TIR (Toll/IL-1 receptor) NBS-LRR (Leucine Rich Repeat) genes. TIR NBS-LRR genes, which are common in A. thaliana, have not been identified in the rice genome. The number of non-TIR NBS-LRR genes in rice is 8.7 times higher than that in A. thaliana, and they account for about 1% of all predicted ORFs in the rice genome. Some 76% of the NBS genes were located in 44 gene clusters or in 57 tandem arrays, and 16 apparent gene duplications were detected in these regions. Phylogenetic analyses based on both the NBS and N-terminal regions classified the genes into about 200 groups, but no deep clades were detected, in contrast to the two distinct clusters found in A. thaliana. The structural and genetic diversity that exists among NBS-LRR proteins in rice is remarkable, and suggests that diversifying selection has played an important role in the evolution of R genes in this agronomically important species. (Supplemental material is available online at http://gattaca.nju.edu.cn.) PMID:15014983

  6. Sequence, genomic structure, and chromosomal assignment of human DOC-2

    SciTech Connect

    Albertsen, H.M.; Williams, B.; Smith, S.A.

    1996-04-15

    DOC-2 is a human gene originally identified as a 767-bp cDNA fragment isolated from normal ovarian epithelial cells by differential display against ovarian carcinoma cells. We have now determined the complete cDNA sequence of the 3.2-kb DOC-2 transcript and localized the gene to chromosome 5. A 12.5-kb genomic fragment at the 5′ end of DOC-2 has also been sequenced, revealing the intron-exon structure of the first eight exons (788 bases) of the DOC-2 gene. Translation of the DOC-2 cDNA predicts a hydrophobic protein of 770 amino acid residues with a molecular weight of 82.5 kDa. Comparison of the DNA and amino acid sequences of DOC-2 with publicly accessible sequence databases revealed 83% identity to p96, a murine mitogen-responsive phosphoprotein. In addition, about 45% identity was observed between the first 140 N-terminal residues of DOC-2 and the Caenorhabditis elegans M110.5 and Drosophila melanogaster Dab genes. 14 refs., 3 figs.

  7. Sugarcane Functional Genomics: Gene Discovery for Agronomic Trait Development

    PubMed Central

    Menossi, M.; Silva-Filho, M. C.; Vincentz, M.; Van-Sluys, M.-A.; Souza, G. M.

    2008-01-01

    Sugarcane is a highly productive crop that has been used for centuries as the main source of sugar and, more recently, to produce ethanol, a renewable biofuel. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has contributed significantly to gene discovery and to expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which are important constraints on yield, and to endophytic bacteria, which are highly beneficial. Current means of reducing the effects of drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment, and improved tolerance to these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technologies such as cDNA microarrays have identified genes associated with sugar content that may be used to develop new varieties with improved sucrose content or with traits that restrict the expansion of cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390

  8. Genomic analysis of primordial dwarfism reveals novel disease genes.

    PubMed

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis. PMID:24389050

  9. Gene structure in the sea urchin Strongylocentrotus purpuratus based on transcriptome analysis.

    PubMed

    Tu, Qiang; Cameron, R Andrew; Worley, Kim C; Gibbs, Richard A; Davidson, Eric H

    2012-10-01

    A comprehensive transcriptome analysis has been performed on protein-coding RNAs of Strongylocentrotus purpuratus, including 10 different embryonic stages, six feeding larval and metamorphosed juvenile stages, and six adult tissues. In this study, we pooled the transcriptomes from all of these sources and focused on the insights they provide for gene structure in the genome of this recently sequenced model system. The genome had initially been annotated by use of computational gene model prediction algorithms. A large fraction of these predicted genes were recovered in the transcriptome when the reads were mapped to the genome and appropriately filtered and analyzed. However, in a manually curated subset, we discovered that more than half the computational gene model predictions were imperfect, containing errors such as missing exons, prediction of nonexistent exons, erroneous intron/exon boundaries, fusion of adjacent genes, and prediction of multiple genes from single genes. The transcriptome data have been used to provide a systematic upgrade of the gene model predictions throughout the genome, greatly improving the research usability of the genomic sequence. We have constructed new public databases that incorporate information from the transcriptome analyses. The transcript-based gene model data were used to define average structural parameters for S. purpuratus protein-coding genes. In addition, we constructed a custom sea urchin gene ontology and assigned about 7000 different annotated transcripts to 24 functional classes. Strong correlations became evident between given functional ontology classes and structural properties, including gene size, exon number, and exon and intron size. PMID:22709795

  10. Genome-wide analysis and expression profiling of the phospholipase D gene family in Gossypium arboreum.

    PubMed

    Tang, Kai; Dong, Chunjuan; Liu, Jinyuan

    2016-02-01

    Plant phospholipase D (PLD) plays versatile roles in multiple aspects of plant growth, development, and stress responses. However, until now, our knowledge concerning the PLD gene family members and their expression patterns in cotton has been limited. In this study, we performed the first genome-wide analysis and expression profiling of the PLD gene family in Gossypium arboreum and identified a total of 19 non-redundant PLD genes (GaPLDs). Based on phylogenetic analysis, they were divided into six well-supported clades (α, β/γ, δ, ε, ζ and φ). Most of the GaPLD genes within the same clade showed similar exon-intron organization and highly conserved motif structures. Additionally, the chromosomal distribution pattern revealed that GaPLD genes were unevenly distributed across 10 of the 13 cotton chromosomes. Segmental duplication is the major contributor to the expansion of the GaPLD gene family and is estimated to have occurred 19.61 to 20.44 million years ago, when a recent large-scale genome duplication occurred in cotton. Moreover, the expression profiling reveals the functional divergence of GaPLD genes in cotton and sheds new light on the molecular mechanisms of GaPLDα1 and GaPLDδ2 in fiber development. PMID:26718354

  11. Global deceleration of gene evolution following recent genome hybridizations in fungi.

    PubMed

    Sriswasdi, Sira; Takashima, Masako; Manabe, Ri-Ichiroh; Ohkuma, Moriya; Sugita, Takashi; Iwasaki, Wataru

    2016-08-01

    Polyploidization events such as whole-genome duplication and inter-species hybridization are major evolutionary forces that shape genomes. Although the long-term effects of polyploidization have been well characterized, the early molecular evolutionary consequences of polyploidization remain largely unexplored. Here, we report the discovery of two recent and independent genome hybridizations within a single clade of a fungal genus, Trichosporon. Comparative genomic analyses revealed that redundant genes are experiencing decelerations, not accelerations, of evolutionary rates. We identified a relationship between gene conversion and decelerated evolution, suggesting that gene conversion may improve the genome stability of young hybrids by restricting gene functional divergence. Furthermore, we detected large-scale gene losses from transcriptional and translational machineries that indicate a global compensatory mechanism against increased gene dosages. Overall, our findings illustrate counteracting mechanisms during an early phase of post-genome hybridization and fill a critical gap in existing theories on genome evolution. PMID:27440871

  12. Microcollinearity in an ethylene receptor coding gene region of the Coffea canephora genome is extensively conserved with Vitis vinifera and other distant dicotyledonous sequenced genomes

    PubMed Central

    Guyot, Romain; de la Mare, Marion; Viader, Véronique; Hamon, Perla; Coriton, Olivier; Bustamante-Porras, José; Poncet, Valérie; Campa, Claudine; Hamon, Serge; de Kochko, Alexandre

    2009-01-01

    Background Coffea canephora, also called Robusta, belongs to the Rubiaceae, the fourth largest angiosperm family. This diploid species (2x = 2n = 22) has a fairly small genome size of ≈ 690 Mb, and despite its extreme economic importance, particularly for developing countries, knowledge of its genome composition, structure and evolution remains very limited. Here, we report the 160-kb sequence of the first C. canephora Bacterial Artificial Chromosome (BAC) clone ever sequenced, together with its detailed analysis. Results This clone contains the CcEIN4 gene, encoding an ethylene receptor, and twenty other predicted genes, a high gene density of one gene per 7.8 kb. Most of them display perfect matches with C. canephora expressed sequence tags or show transcriptional activity through PCR amplification on cDNA libraries. Twenty-three transposable elements, mainly Class II transposon derivatives, were identified at this locus. Most of these Class II elements are Miniature Inverted-repeat Transposable Elements (MITEs), which are known to be closely associated with plant genes. This BAC composition gives a pattern similar to those found in gene-rich regions of the Solanum lycopersicum and Medicago truncatula genomes, indicating that the CcEIN4 region may belong to a gene-rich region of the C. canephora genome. Comparative sequence analysis indicated extensive conservation between C. canephora and most of the reference dicotyledonous genomes studied in this work, such as tomato (S. lycopersicum), grapevine (V. vinifera), barrel medic (M. truncatula), black cottonwood (Populus trichocarpa) and Arabidopsis thaliana. The highest degree of microcollinearity was found between C. canephora and V. vinifera, which belong respectively to the Asterids and Rosids, two clades that diverged more than 114 million years ago. Conclusion This study provides a first glimpse of C. canephora genome composition and evolution. Our data revealed a remarkable conservation of the microcollinearity between C. canephora and V

  13. Evaluating high-throughput ab initio gene finders to discover proteins encoded in eukaryotic pathogen genomes missed by laboratory techniques.

    PubMed

    Goodswen, Stephen J; Kennedy, Paul J; Ellis, John T

    2012-01-01

    Next-generation sequencing technology is advancing genome sequencing at an unprecedented rate. By unravelling the code within a pathogen's genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stage and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that laboratory techniques have so far failed to capture. One aim is to identify patterns of prediction inaccuracy shared by gene finders as a whole, irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation, but only four offer high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen, and consequently the evaluation results are inextricably linked to the availability and quality of those data. The pathogen Toxoplasma gondii is used to illustrate the evaluation methods. The results support the current opinion that exons predicted by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders, and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328

  14. The mitochondrial genome of the screamer louse Bothriometopus (Phthiraptera: Ischnocera): effects of extensive gene rearrangements on the evolution of the genome.

    PubMed

    Cameron, Stephen L; Johnson, Kevin P; Whiting, Michael F

    2007-11-01

    Mitochondrial (mt) genome rearrangement has generally been studied with respect to the phenomenon itself, focusing on its phylogenetic distribution and causal mechanisms. Rearrangements have additional significance through their effects on substitution, transcription, and mRNA processing. Lice are an ideal group in which to study the interactions between rearrangements and these factors because of the heightened rearrangement rate within this group. The entire mt genome of the screamer louse Bothriometopus was sequenced and compared with previously sequenced louse genomes. The mt genome is 15,564 bp and circular, and all genes are encoded on the same strand. The gene arrangement differs radically from both other louse species and the ancestral insect. Nucleotide composition is A+T biased, but there is no strand skew, which may be due to a reversal of replication direction or a transcriptional effect. Bothriometopus shows both tRNA duplication and concerted evolution, which have not been observed previously. Eleven of the 13 protein-coding genes have 3' end stem-loop structures, which may allow mRNA processing without flanking tRNAs and so facilitate gene rearrangements. There are five candidate control regions capable of forming stem-loop structures. Two are structurally more similar to the control regions of other insect species than to those of other lice. Analyses of Bothriometopus demonstrate that louse mt genomes, in addition to being extensively rearranged, differ significantly from most insect species in nucleotide composition biases, tRNA evolution, protein-coding gene structures and putative signaling sites such as the control region. These differences may be either a cause or a consequence of gene rearrangements. PMID:17925995

  15. Genomic variation in Salmonella enterica core genes for epidemiological typing

    SciTech Connect

    Leekitcharoenphon, Pimlapas; Hendriksen, Rene S; Le Hello, Simon; Weill, François-Xavier; Baggesen, Dorte Lau; Jun, Se Ran; Ussery, David W; Lund, Ole; Crook, Derrick W; Wilson, Daniel J; Aarestrup, Frank M

    2012-01-01

    It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant S. Typhimurium DT104. Nonetheless, its origin and transmission route have never been revealed. We used whole genome sequencing (WGS) and temporally structured sequence analysis within a Bayesian framework to reconstruct temporal and spatial phylogenetic trees and to estimate the mutation rate and divergence time of 315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from 21 countries on six continents. DT104 was estimated to have emerged initially as antimicrobial-susceptible strains in ~1946 (95% credible interval 1931-1959) and later became multidrug-resistant (MDR) DT104 in ~1974 (95% CI 1966-1981) through horizontal transfer of the 13-kb SGI1 MDR region into already SGI1-containing susceptible strains. This was followed by multiple transmission events, initially from Central Europe and later between European countries. One independent transmission occurred to the United States, and another to Japan and from there to Taiwan and Canada. An independent acquisition of resistance genes took place in Thailand in ~1986 (95% CI 1975-1990). Locally in Denmark, WGS was able to confirm the local epidemiology of transmission between animal herds. Interestingly, the demographic history of Danish MDR DT104 provided evidence for the success of an eradication program across pig herds in Denmark from 1996 to 2000. The results from this study refute several hypotheses on the evolution of DT104 and suggest that WGS may be useful in monitoring emerging clones and making strategies for prevention

  16. Genomic variation in Salmonella enterica core genes for epidemiological typing

    DOE PAGESBeta

    Leekitcharoenphon, Pimlapas; Hendriksen, Rene S; Le Hello, Simon; Weill, François-Xavier; Baggesen, Dorte Lau; Jun, Se Ran; Ussery, David W; Lund, Ole; Crook, Derrick W; Wilson, Daniel J; et al

    2012-01-01

    It has been 30 years since the initial emergence and subsequent rapid global spread of multidrug-resistant S. Typhimurium DT104. Nonetheless, its origin and transmission route have never been revealed. We used whole genome sequencing (WGS) and temporally structured sequence analysis within a Bayesian framework to reconstruct temporal and spatial phylogenetic trees and to estimate the mutation rate and divergence time of 315 S. Typhimurium DT104 isolates sampled from 1969 to 2012 from 21 countries on six continents. DT104 was estimated to have emerged initially as antimicrobial-susceptible strains in ~1946 (95% credible interval 1931-1959) and later became multidrug-resistant (MDR) DT104 in ~1974 (95% CI 1966-1981) through horizontal transfer of the 13-kb SGI1 MDR region into already SGI1-containing susceptible strains. This was followed by multiple transmission events, initially from Central Europe and later between European countries. One independent transmission occurred to the United States, and another to Japan and from there to Taiwan and Canada. An independent acquisition of resistance genes took place in Thailand in ~1986 (95% CI 1975-1990). Locally in Denmark, WGS was able to confirm the local epidemiology of transmission between animal herds. Interestingly, the demographic history of Danish MDR DT104 provided evidence for the success of an eradication program across pig herds in Denmark from 1996 to 2000. The results from this study refute several hypotheses on the evolution of DT104 and suggest that WGS may be useful in monitoring emerging clones and making strategies for prevention

  17. Genome-Wide Profiling of PARP1 Reveals an Interplay with Gene Regulatory Regions and DNA Methylation

    PubMed Central

    Nalabothula, Narasimharao; Al-jumaily, Taha; Eteleeb, Abdallah M.; Flight, Robert M.; Xiaorong, Shao; Moseley, Hunter; Rouchka, Eric C.; Fondufe-Mittendorf, Yvonne N.

    2015-01-01

    Poly (ADP-ribose) polymerase-1 (PARP1) is a nuclear enzyme involved in DNA repair, chromatin remodeling and gene expression. PARP1 interactions with chromatin architectural multi-protein complexes (i.e., nucleosomes) alter chromatin structure, resulting in changes in gene expression. Chromatin structure impacts gene regulatory processes including transcription, splicing, DNA repair, replication and recombination. It is important to determine whether PARP1 randomly associates with nucleosomes or is present at specific nucleosome regions throughout the genome. We performed genome-wide association studies in breast cancer cell lines to address these questions. Our studies show that PARP1 associates genome-wide with epigenetic regulatory elements, such as active histone marks, CTCF and DNase hypersensitive sites. Additionally, genome-wide binding of PARP1 to chromatin is mutually exclusive with DNA methylation patterns, suggesting a functional interplay between PARP1 and DNA methylation. Indeed, inhibition of PARylation results in genome-wide changes in DNA methylation patterns. Our results suggest that PARP1 controls the fidelity of gene transcription and marks actively transcribed gene regions by selectively binding to transcriptionally active chromatin. These studies provide a platform for developing our understanding of PARP1’s role in gene regulation. PMID:26305327

  18. Structure and evolution of the human IKBA gene

    SciTech Connect

    Ito, C.Y.; Bautch, V.L.; Baldwin, A.S. Jr.

    1995-09-20

    IκBα belongs to a gene family whose members are characterized by their 6-7 ankyrin repeats, which allow them to interact with members of the Rel family of transcription factors. We have sequenced a human IκBα genomic clone to determine its gene structure. The human IκBα gene (IKBA) has six exons and five introns that span approximately 3.5 kb. This genomic organization is similar to that of other members of the ankyrin gene family. The human IKBA gene shares similar intron/exon boundaries with the human BCL3 and NFKB2 genes, which is consistent with their conserved ankyrin repeats. To examine further the evolutionary relationship between human IκBα and other members of its gene family, we performed a phylogenetic analysis. Although the resulting phylogenetic tree does not identify a common ancestor of the IκBα gene family, it indicates that this family diverges into two groups based on structure and function. 41 refs., 4 figs., 1 tab.

  19. The Gene-Finder computer tools for analysis of human and model organisms genome sequences.

    PubMed

    Solovyev, V; Salamov, A

    1997-01-01

    We present a set of new programs for identifying promoters, 3'-processing sites, splice sites, coding exons and gene structure in genomic DNA of several model species. The human gene structure prediction program FGENEH, the exon prediction program FEXH and the splice site prediction program HSPL have been modified for sequence analysis of Drosophila (FGENED, FEXD and DSPL), C. elegans (FGENEN, FEXN and NSPL), yeast (FEXY and YSPL) and plant (FGENEA, FEXA and ASPL) genomic sequences. We recomputed all frequency and discriminant function parameters for these organisms and adjusted organism-specific minimal intron lengths. The accuracy of coding region prediction for these programs is similar to the observed accuracy of FEXH and FGENEH. We have developed the FEXHB and FGENEHB programs, which combine pattern recognition features with information about the similarity of predicted exons to known sequences in protein databases. These programs have approximately 10% higher average accuracy of coding region recognition. Two new programs for human promoter site prediction (TSSG and TSSW) have been developed, which use the Ghosh (1993) and Wingender (1994) databases of functional motifs, respectively. The POLYAH program was designed for prediction of 3'-processing regions in human genes, and the CDSB program was developed for bacterial gene prediction. We have developed a new approach to predict multiple genes based on double dynamic programming, which is very important for the analysis of long genomic DNA fragments generated by genome sequencing projects. Analysis of uncharacterized sequences based on our methods is available through the University of Houston and Weizmann Institute of Science email servers and several Web pages at Baylor College of Medicine. PMID:9322052

  20. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates

    PubMed Central

    Spring-Pearson, Senanu M.; Stone, Joshua K.; Doyle, Adina; Allender, Christopher J.; Okinaka, Richard T.; Mayo, Mark; Broomall, Stacey M.; Hill, Jessica M.; Karavis, Mark A.; Hubbard, Kyle S.; Insalaco, Joseph M.; McNew, Lauren A.; Rosenzweig, C. Nicole; Gibbons, Henry S.; Currie, Bart J.; Wagner, David M.; Keim, Paul; Tuanyok, Apichai

    2015-01-01

    The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is ‘open’, with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order. PMID:26484663

  1. Mapping Our Genes: The Genome Projects: How Big, How Fast

    DOE R&D Accomplishments Database

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  2. Mapping our genes: The genome projects: How big, how fast

    SciTech Connect

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  3. Evolutionary genomics and adaptive evolution of the Hedgehog gene family (Shh, Ihh and Dhh) in vertebrates.

    PubMed

    Pereira, Joana; Johnson, Warren E; O'Brien, Stephen J; Jarvis, Erich D; Zhang, Guojie; Gilbert, M Thomas P; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While typically only one Hh gene is found in invertebrate genomes, most vertebrate species have three (Sonic hedgehog, Shh; Indian hedgehog, Ihh; and Desert hedgehog, Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of Hh-coding sequences in publicly available avian databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find evidence suggesting that, following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and at different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights into the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and the location of mutational hotspots. PMID:25549322

  4. Evolutionary Genomics and Adaptive Evolution of the Hedgehog Gene Family (Shh, Ihh and Dhh) in Vertebrates

    PubMed Central

    Pereira, Joana; Johnson, Warren E.; O’Brien, Stephen J.; Jarvis, Erich D.; Zhang, Guojie; Gilbert, M. Thomas P.; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While typically only one Hh gene is found in invertebrate genomes, most vertebrate species have three (Sonic hedgehog – Shh; Indian hedgehog – Ihh; and Desert hedgehog – Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of Hh-coding sequences in publicly available avian databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find evidence suggesting that, following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and at different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights into the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and the location of mutational hotspots. PMID:25549322

  5. The immune gene repertoire encoded in the purple sea urchin genome.

    PubMed

    Hibino, Taku; Loza-Coll, Mariano; Messier, Cynthia; Majeske, Audrey J; Cohen, Avis H; Terwilliger, David P; Buckley, Katherine M; Brockton, Virginia; Nair, Sham V; Berney, Kevin; Fugmann, Sebastian D; Anderson, Michele K; Pancer, Zeev; Cameron, R Andrew; Smith, L Courtney; Rast, Jonathan P

    2006-12-01

    Echinoderms occupy a critical and largely unexplored phylogenetic vantage point from which to infer both the early evolution of bilaterian immunity and the underpinnings of the vertebrate adaptive immune system. Here we present an initial survey of the purple sea urchin genome for genes associated with immunity. An elaborate repertoire of potential immune receptors, regulators and effectors is present, including unprecedented expansions of innate pathogen recognition genes. These include a diverse array of 222 Toll-like receptor (TLR) genes and a coordinate expansion of directly associated signaling adaptors. Notably, a subset of sea urchin TLR genes encodes receptors with structural characteristics previously identified only in protostomes. A similarly expanded set of 203 NOD/NALP-like cytoplasmic recognition proteins is present. These genes have previously been identified only in vertebrates where they are represented in much lower numbers. Genes that mediate the alternative and lectin complement pathways are described, while gene homologues of the terminal pathway are not present. We have also identified several homologues of genes that function in jawed vertebrate adaptive immunity. The most striking of these is a gene cluster with similarity to the jawed vertebrate Recombination Activating Genes 1 and 2 (RAG1/2). Sea urchins are long-lived, complex organisms and these findings reveal an innate immune system of unprecedented complexity. Whether the presumably intense selective processes that molded these gene families also gave rise to novel immune mechanisms akin to adaptive systems remains to be seen. The genome sequence provides immediate opportunities to apply the advantages of the sea urchin model toward problems in developmental and evolutionary immunobiology. PMID:17027739

  6. Genomic analyses of bacterial porin-cytochrome gene clusters

    SciTech Connect

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    The porin-cytochrome (Pcc) protein complex is responsible for trans-outer-membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c-type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the G. sulfurreducens PCA genome (i.e., the pcc gene clusters). A survey of additional microbial genomes identified pcc gene clusters in all sequenced Geobacter spp. and in other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The number of genes in the pcc gene clusters ranges from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane are often associated with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular

  7. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling

    PubMed Central

    Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-01-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or to acquire new functions. However, the rates of gene loss, and thus the temporal process of genome reshaping after WGD, remain unclear. The WGD shared by all teleost fish, which account for about one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to the analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly through events involving the simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810

  8. Genomic variation of Trypanosoma cruzi: involvement of multicopy genes.

    PubMed Central

    Wagner, W; So, M

    1990-01-01

    By using improved pulsed field gel conditions, the karyotypes of several strains of the protozoan parasite Trypanosoma cruzi were analyzed and compared with those of Leishmania major and two other members of the genus Trypanosoma. There was no difference in chromosome migration patterns between different life cycle stages of the T. cruzi strains analyzed. However, the sizes and numbers of chromosomal bands varied considerably among T. cruzi strains. This karyotype variation among T. cruzi strains was analyzed further at the chromosomal level by using multicopy genes as probes in Southern hybridizations. The chromosomal location of the genes encoding alpha- and beta-tubulin, ubiquitin, rRNA, spliced leader RNA, and an 85-kilodalton protein remained stable during developmental conversion of the parasite. The sizes and numbers of chromosomes containing these sequences varied among the different strains analyzed, implying multiple rearrangements of these genes during evolution of the parasites. During continuous in vitro cultivation of T. cruzi Y, the chromosomal location of the spliced leader gene shifted spontaneously. The spliced leader gene encodes a 35-nucleotide RNA that is spliced in trans from a 105-nucleotide donor RNA onto all mRNAs in T. cruzi. The spliced leader sequences changed in their physical location in both the cloned and uncloned Y strains. Associated with the complex changes was an increase in the infectivity of the rearranged variant for tissue culture cells. Our results indicate that the spliced leader gene clusters in T. cruzi undergo high-frequency genomic rearrangements. PMID:2169461

  9. Genome-enabled Discovery of Carbon Sequestration Genes

    SciTech Connect

    Tuskan, Gerald A; Tschaplinski, Timothy J; Kalluri, Udaya C; Yin, Tongming; Yang, Xiaohan; Zhang, Xinye; Engle, Nancy L; Ranjan, Priya; Basu, Manojit M; Gunter, Lee E; Jawdy, Sara; Martin, Madhavi Z; Campbell, Alina S; DiFazio, Stephen P; Davis, John M; Hinchee, Maud; Pinnacchio, Christa; Meilan, R; Busov, V.; Strauss, S

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, the molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on identifying genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance belowground C sequestration by 1) increasing carbon allocation and partitioning to roots through modification of the auxin signaling pathway and of the invertase family, which controls sucrose metabolism; 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT); 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism through a major mQTL-based gene discovery effort; and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach ultimately aims to enhance both root biomass and root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  10. Census of solo LuxR genes in prokaryotic genomes

    PubMed Central

    Hudaiberdiev, Sanjarbek; Choudhary, Kumari S.; Vera Alvarez, Roberto; Gelencsér, Zsolt; Ligeti, Balázs; Lamba, Doriano; Pongor, Sándor

    2015-01-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or respond to exogenous signals. Here we present an HMM-based survey of solo luxR genes in complete and draft bacterial genomes in the NCBI databases. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. In addition, we found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so the solos may conceivably be involved in sensing hitherto unknown signals. Only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of these cysteines may be involved in disulfide formation, which leads us to speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation. PMID:25815274

  11. Profiling of chicken adipose tissue gene expression by genome array

    PubMed Central

    Wang, Hong-Bao; Li, Hui; Wang, Qi-Gui; Zhang, Xin-Yu; Wang, Shou-Zhi; Wang, Yu-Xiang; Wang, Xiu-Ping

    2007-01-01

    Background Excessive accumulation of lipids in the adipose tissue is a major problem in the present-day broiler industry. However, few studies have analyzed the expression of adipose tissue genes that are involved in pathways and mechanisms leading to adiposity in chickens. Gene expression profiling of chicken adipose tissue could provide key information about the ontogenesis of fatness and clarify the molecular mechanisms underlying obesity. In this study, Chicken Genome Arrays were used to construct an adipose tissue gene expression profile of 7-week-old broilers, and to screen adipose tissue genes that are differentially expressed in lean and fat lines divergently selected over eight generations for high and low abdominal fat weight. Results The gene expression profiles detected 13,234–16,858 probe sets in chicken adipose tissue at 7 weeks, and genes involved in lipid metabolism and immunity, such as fatty acid binding protein (FABP), thyroid hormone-responsive protein (Spot14), lipoprotein lipase (LPL), insulin-like growth factor binding protein 7 (IGFBP7) and major histocompatibility complex (MHC), were highly expressed. In contrast, some genes related to lipogenesis, such as leptin receptor, sterol regulatory element-binding protein 1 (SREBP1), apolipoprotein B (ApoB) and insulin-like growth factor 2 (IGF2), were not detected. Moreover, 230 genes that were differentially expressed between the two lines were screened out; these were mainly involved in lipid metabolism, signal transduction, energy metabolism, tumorigenesis and immunity. Subsequently, real-time RT-PCR was performed to validate fifteen differentially expressed genes screened out by the microarray approach, and high consistency was observed between the two methods. Conclusion Our results establish the groundwork for further studies of the basic genetic control of growth and development of chicken adipose tissue, and will be beneficial in clarifying the molecular mechanism of obesity in chickens. PMID

  12. Orthopoxvirus Genome Evolution: The Role of Gene Loss

    PubMed Central

    Hendrickson, Robert Curtis; Wang, Chunlin; Hatcher, Eneida L.; Lefkowitz, Elliot J.

    2010-01-01

    Poxviruses are highly successful pathogens, known to infect a variety of hosts. The family Poxviridae includes Variola virus, the causative agent of smallpox, which has been eradicated as a public health threat but could potentially reemerge as a bioterrorist threat. The risk scenario also includes other animal poxviruses and genetically engineered poxviruses. Studies of orthologous gene sets have established the evolutionary relationships of members within the Poxviridae family. It is not clear, however, how the variations between family members arose, an important issue for understanding how these viruses may vary and possibly produce future threats. Using a newly developed poxvirus-specific tool, we predicted accurate gene sets for viruses with completely sequenced genomes in the genus Orthopoxvirus. Employing sensitive sequence comparison techniques together with comparison of syntenic gene maps, we established the relationships between all viral gene sets. These techniques allowed us to unambiguously identify the gene loss/gain events that have occurred over the course of orthopoxvirus evolution. It is clear that no existing Orthopoxvirus species has acquired protein-coding genes unique to that species. All existing species contain only genes that are also present in members of the species Cowpox virus, and cowpox virus strains contain every gene present in any other orthopoxvirus strain. These results support a theory of reductive evolution in which the reduction in size of the core gene set of a putative ancestral virus played a critical role in speciation and in confining any newly emerging virus species to a particular environmental (host or tissue) niche. PMID:21994715

  13. Child Development and Structural Variation in the Human Genome

    ERIC Educational Resources Information Center

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  14. The complete mitochondrial genome of the house dust mite Dermatophagoides pteronyssinus (Trouessart): a novel gene arrangement among arthropods

    PubMed Central

    Dermauw, Wannes; Van Leeuwen, Thomas; Vanholme, Bartel; Tirry, Luc

    2009-01-01

    Background The apparent scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). This subclass encompasses over 48,000 species and forms the largest group within the Arachnida. Although mitochondrial genomes are widely utilised for phylogenetic and population genetic studies, only 20 mitochondrial genomes of Acari have been determined, of which only one belongs to the diverse order of the Sarcoptiformes. In this study, we describe the mitochondrial genome of the European house dust mite Dermatophagoides pteronyssinus, the most important member of this largely neglected group. Results The mitochondrial genome of D. pteronyssinus is a circular DNA molecule of 14,203 bp. It contains the complete set of 37 genes (13 protein-coding genes, 2 rRNA genes and 22 tRNA genes) usually present in metazoan mitochondrial genomes. The mitochondrial gene order differs considerably from that of other Acari mitochondrial genomes. Compared to the mitochondrial genome of Limulus polyphemus, considered to represent the ancestral arthropod pattern, only 11 of the 38 gene boundaries are conserved. The majority strand has a 72.6% AT-content but a GC-skew of 0.194. This skew is the reverse of that normally observed for typical animal mitochondrial genomes. A microsatellite was detected in a large non-coding region (286 bp), which probably functions as the control region. Almost all tRNA genes lack a T-arm, resulting in non-canonical cloverleaf tRNA structures, and both rRNA genes are considerably reduced in size. Finally, the genomic sequence was used to perform a phylogenetic study. Both maximum likelihood and Bayesian inference analyses clustered D. pteronyssinus with Steganacarus magnus, forming a sister group of the Trombidiformes. Conclusion Although the mitochondrial genome of D. pteronyssinus shares different features with previously characterised Acari mitochondrial genomes, it is unique in many ways. Gene order is extremely rearranged

  15. Methods and strategies for gene structure curation in WormBase

    PubMed Central

    Williams, G.W.; Davis, P.A.; Rogers, A.S.; Bieri, T.; Ozersky, P.; Spieth, J.

    2011-01-01

    The Caenorhabditis elegans genome sequence was published over a decade ago; this was the first published genome of a multicellular organism, and the WormBase project has now had a decade of experience in curating this genome's sequence and gene structures. In one of its roles as a central repository for nematode biology, WormBase continues to refine the gene structure annotations using sequence similarity and other computational methods, as well as information from the literature and community-submitted annotations. We describe the various methods of gene structure curation that have been tried by WormBase and the problems associated with each of them. We also describe the current strategy for gene structure curation, and introduce the WormBase ‘curation tool’, which integrates different data sources in order to identify new and correct gene structures. Database URL: http://www.wormbase.org/ PMID:21543339

  16. Draft Genome of the Wheat Rust Pathogen (Puccinia triticina) Unravels Genome-Wide Structural Variations during Evolution.

    PubMed

    Kiran, Kanti; Rawal, Hukam C; Dubey, Himanshu; Jaswal, Rajdeep; Devanna, B N; Gupta, Deepak Kumar; Bhardwaj, Subhash C; Prasad, P; Pal, Dharam; Chhuneja, Parveen; Balasubramanian, P; Kumar, J; Swami, M; Solanke, Amolkumar U; Gaikwad, Kishor; Singh, Nagendra K; Sharma, Tilak Raj

    2016-01-01

    Leaf rust is one of the most important diseases of wheat and is caused by Puccinia triticina, a highly variable rust pathogen prevalent worldwide. Decoding the genome of this pathogen will help in unraveling the molecular basis of its evolution and in identifying the genes responsible for its various biological functions. We generated high-quality draft genome sequences (approximately 100–106 Mb) of two races of P. triticina: the variable and virulent Race77 and the old, avirulent Race106. The genomes of races 77 and 106 had 33X and 27X coverage, respectively. We predicted 27,678 and 26,384 genes, with average lengths of 1,129 and 1,086 bases in races 77 and 106, respectively, and found that the genomes consisted of 37.49% and 39.99% repetitive sequences. Genome-wide comparative analysis revealed that Race77 differs substantially from Race106 with regard to segmental duplication (SD), repeat element, and SNP/InDel characteristics. Comparative analyses showed that Race77 is a recent, highly variable and adapted race compared with Race106. Further sequence analyses of 13 additional pathotypes of Race77 clearly differentiated the recent, active and virulent pathotypes from the older ones. Average densities of 2.4 SNPs and 0.32 InDels per kb were obtained for all P. triticina pathotypes. Secretome analysis demonstrated that Race77 has more virulence factors than Race106, which may be responsible for the greater degree of adaptation of this pathogen. We also found that genes under greater selection pressure were conserved in the genomes of both races, and may affect functions crucial for the higher levels of virulence factors in Race77. This study provides insights into the genome structure, genome organization, molecular basis of variation, and pathogenicity of P. triticina. The genome sequence data generated in this study have been submitted to public domain databases and will be an important resource for comparative genomics studies of the more than 4000 existing

  17. Genome structures and transcriptomes signify niche adaptation for the multiple-ion-tolerant extremophyte Schrenkiella parvula.

    PubMed

    Oh, Dong-Ha; Hong, Hyewon; Lee, Sang Yeol; Yun, Dae-Jin; Bohnert, Hans J; Dassanayake, Maheshi

    2014-04-01

    Schrenkiella parvula (formerly Thellungiella parvula), a close relative of Arabidopsis (Arabidopsis thaliana) and Brassica crop species, thrives on the shores of Lake Tuz, Turkey, where soils accumulate high concentrations of multiple-ion salts. Despite the stark differences in adaptations to extreme salt stresses, the genomes of S. parvula and Arabidopsis show extensive synteny. S. parvula completes its life cycle in the presence of Na⁺, K⁺, Mg²⁺, Li⁺, and borate at soil concentrations lethal to Arabidopsis. Genome structural variations, including tandem duplications and translocations of genes, interrupt the colinearity observed throughout the S. parvula and Arabidopsis genomes. Structural variations distinguish homologous gene pairs characterized by divergent promoter sequences and basal-level expression strengths. Comparative RNA sequencing reveals the enrichment of ion-transport functions among genes with higher expression in S. parvula, while pathogen defense-related genes show higher expression in Arabidopsis. Key stress-related ion transporter genes in S. parvula showed increased copy number, higher transcript dosage, and evidence for subfunctionalization. This extremophyte offers a framework to identify the requisite adjustments of genomic architecture and expression control for a set of genes found in most plants in a way to support distinct niche adaptation and lifestyles. PMID:24563282

  18. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    SciTech Connect

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  19. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    NASA Technical Reports Server (NTRS)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  20. The first complete mitochondrial genome sequences of Amblypygi (Chelicerata: Arachnida) reveal conservation of the ancestral arthropod gene order.

    PubMed

    Fahrein, Kathrin; Masta, Susan E; Podsiadlowski, Lars

    2009-05-01

    Amblypygi (whip spiders) are terrestrial chelicerates inhabiting the subtropics and tropics. In morphological and rRNA-based phylogenetic analyses, Amblypygi cluster with Uropygi (whip scorpions) and Araneae (spiders) to form the taxon Tetrapulmonata, but there is controversy regarding the interrelationship of these three taxa. Mitochondrial genomes provide an additional large data set of phylogenetic information (sequences, gene order, RNA secondary structure), but in arachnids, mitochondrial genome data are missing for some of the major orders. In the course of an ongoing project concerning arachnid mitochondrial genomics, we present the first two complete mitochondrial genomes from Amblypygi. Both genomes were found to be typical circular duplex DNA molecules with all 37 genes usually present in bilaterian mitochondrial genomes. In both species, gene order is identical to that of Limulus polyphemus (Xiphosura), which is assumed to reflect the putative arthropod ground pattern. All tRNA gene sequences have the potential to fold into structures that are typical of metazoan mitochondrial tRNAs, except for tRNA-Ala, which lacks the D arm in both amblypygids, suggesting the loss of this feature early in amblypygid evolution. Phylogenetic analysis resulted in weak support for Uropygi being the sister group of Amblypygi. PMID:19448726

  1. TreeGenes: A Forest Tree Genome Database

    PubMed Central

    Wegrzyn, Jill L.; Lee, Jennifer M.; Tearse, Brandon R.; Neale, David B.

    2008-01-01

    The Dendrome Project and associated TreeGenes database serve the forest genetics research community through a curated and integrated web-based relational database. The research community is composed of approximately 2,000 members representing over 730 organizations worldwide. The database itself is composed of a wide range of genetic data from many forest trees, with focused efforts on commercially important members of the Pinaceae family. The primary data types curated include species, publications, tree and DNA extraction information, genetic maps, molecular markers, ESTs, genotypic, and phenotypic data. There are currently ten main search modules or user access points within this PostgreSQL database. These access points allow users to navigate logically through the related data types. The goals of the Dendrome Project are to (1) provide a comprehensive resource for forest tree genomics data to facilitate gene discovery in related species, (2) develop interfaces that encourage the submission and integration of all genomic data, and (3) centralize and distribute existing and novel online tools for the research community that both support and ease analysis. Recent developments have focused on increasing data content, functional annotations, data retrieval, and visualization tools. TreeGenes was developed to provide a centralized web resource with analysis and visualization tools to support data storage and exchange. PMID:18725987

  2. Genomic convergence: identifying candidate genes for Parkinson's disease by combining serial analysis of gene expression and genetic linkage.

    PubMed

    Hauser, Michael A; Li, Yi-Ju; Takeuchi, Satoshi; Walters, Robert; Noureddine, Maher; Maready, Melinda; Darden, Tiffany; Hulette, Christine; Martin, Eden; Hauser, Elizabeth; Xu, Hong; Schmechel, Don; Stenger, Judith E; Dietrich, Fred; Vance, Jeffery

    2003-03-15

    We present a multifactorial, multistep approach called genomic convergence that combines gene expression with genomic linkage analysis to identify and prioritize candidate susceptibility genes for Parkinson's disease (PD). To initiate this process, we used serial analysis of gene expression (SAGE) to identify genes expressed in two normal substantia nigras (SN) and adjacent midbrain tissue. This identified over 3700 transcripts, including the three most abundant SAGE tags, which did not correspond to any known genes or ESTs. We developed high-throughput bioinformatics methods to map the genes corresponding to these tags and identified 402 SN genes that lay within five large genomic linkage regions, previously identified in 174 multiplex PD families. These genes represent excellent candidates for PD susceptibility alleles and further genomic convergence and analyses. PMID:12620972

  3. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631
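
    The quadripartite layout described here counts the inverted repeat twice, once as IRa and once as IRb on either side of the SSC. A minimal sketch of that bookkeeping, using round illustrative lengths rather than the values reported for either Actinidia genome:

```python
def chloroplast_genome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + IRb + SSC + IRa (bp)."""
    # The inverted repeat occurs twice, flanking the small single copy region.
    return lsc + ssc + 2 * ir


# Illustrative round numbers only (not the reported Actinidia values).
print(chloroplast_genome_length(lsc=88_000, ssc=20_000, ir=24_000))
```

    Because the reported ranges span two different genomes, the smallest or largest component values need not add up to the smallest or largest total.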

  4. Highly efficient CRISPR/Cas9-mediated TAR cloning of genes and chromosomal loci from complex genomes in yeast

    PubMed Central

    Lee, Nicholas C.O.; Larionov, Vladimir; Kouprina, Natalay

    2015-01-01

    A transformation-associated recombination (TAR) protocol that allows the selective isolation, in the yeast S. cerevisiae, of full-length genes complete with their distal enhancer regions and of entire genomic loci up to 250 kb from complex genomes was developed more than a decade ago. However, its widespread use has been impeded by the low efficiency (0.5–2%) of chromosomal region capture during yeast transformation, which in turn requires a time-consuming screen of hundreds of colonies. Here, we demonstrate that pre-treatment of genomic DNA with CRISPR-Cas9 nucleases to generate double-strand breaks near the targeted genomic region results in a dramatic increase in the fraction of gene-positive colonies (up to 32%). As only a dozen or fewer yeast transformants need to be screened to obtain a clone with the desired chromosomal region, extensive experience with yeast is no longer required. A TAR-CRISPR protocol may help to create a bank of human genes, each represented by a genomic copy containing its native regulatory elements; such a bank would be a significant advance for functional, structural and comparative genomics, diagnostics, gene replacement and the generation of animal models of human diseases, and has potential for gene therapy. PMID:25690893
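
    The jump from 0.5–2% to up to 32% gene-positive colonies translates directly into screening effort. Assuming each colony is independently positive with probability p, the chance of finding at least one positive among n colonies is 1 - (1 - p)^n; the sketch below solves for the smallest n that reaches 95% (the 95% threshold and the independence assumption are illustrative choices, not from the study).

```python
import math


def colonies_to_screen(p_positive: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one positive among n colonies) >= confidence.

    Assumes colonies are independent and each is positive with probability p_positive.
    """
    if not 0 < p_positive < 1:
        raise ValueError("p_positive must be strictly between 0 and 1")
    # 1 - (1 - p)^n >= c  <=>  n >= log(1 - c) / log(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_positive))


for p in (0.005, 0.02, 0.32):
    print(f"p = {p:.3f}: screen {colonies_to_screen(p)} colonies")
```

    Under these assumptions, a 0.5% capture rate needs roughly 600 colonies and 2% about 150, whereas 32% needs only 8, consistent with the contrast between "hundreds" and "a dozen or fewer" drawn in the abstract.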

  5. CASSIS and SMIPS: promoter-based prediction of secondary metabolite gene clusters in eukaryotic genomes

    PubMed Central

    Wolf, Thomas; Shelest, Vladimir; Nath, Neetika; Shelest, Ekaterina

    2016-01-01

    Motivation: Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic mainly due to the high variability of the clusters’ content and lack of other distinguishing sequence features. Results: We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes (‘anchor’ genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for ‘islands’ of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity. Availability and implementation: CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis. Contact: thomas.wolf@leibniz-hki.de or ekaterina.shelest@leibniz-hki.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656005
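
    The "island" idea can be pictured as growing a window outward from an anchor gene for as long as neighbouring promoters carry the candidate cluster-specific motif. The sketch below is a deliberately simplified, hypothetical illustration: it assumes motif calls per promoter are already available as booleans and tolerates a small gap of motif-less genes, whereas CASSIS itself derives and scores motifs statistically.

```python
from typing import Sequence, Tuple


def motif_island(has_motif: Sequence[bool], anchor: int, max_gap: int = 1) -> Tuple[int, int]:
    """Return (first, last) gene indices of the motif island around an anchor gene.

    has_motif[i] is True when the promoter of the i-th gene (in genomic order)
    contains the candidate motif. The island grows outward from the anchor and
    stops after more than max_gap consecutive genes without the motif.
    """
    if not 0 <= anchor < len(has_motif):
        raise IndexError("anchor index out of range")

    def extend(step: int) -> int:
        edge, gap, i = anchor, 0, anchor + step
        while 0 <= i < len(has_motif):
            if has_motif[i]:
                edge, gap = i, 0  # motif found: island reaches this gene
            else:
                gap += 1
                if gap > max_gap:
                    break
            i += step
        return edge

    return extend(-1), extend(+1)


# Hypothetical motif calls for ten consecutive genes; the anchor gene is index 3.
calls = [False, False, True, True, False, True, True, False, False, True]
print(motif_island(calls, anchor=3))
```

    Here the gene at index 9 is excluded because two motif-less genes separate it from the island; the gap tolerance is an assumed parameter.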

  6. Genome structure and metabolic features in the red seaweed Chondrus crispus shed light on evolution of the Archaeplastida

    PubMed Central

    Collén, Jonas; Porcel, Betina; Carré, Wilfrid; Ball, Steven G.; Chaparro, Cristian; Tonon, Thierry; Barbeyron, Tristan; Michel, Gurvan; Noel, Benjamin; Valentin, Klaus; Elias, Marek; Artiguenave, François; Arun, Alok; Aury, Jean-Marc; Barbosa-Neto, José F.; Bothwell, John H.; Bouget, François-Yves; Brillet, Loraine; Cabello-Hurtado, Francisco; Capella-Gutiérrez, Salvador; Charrier, Bénédicte; Cladière, Lionel; Cock, J. Mark; Coelho, Susana M.; Colleoni, Christophe; Czjzek, Mirjam; Da Silva, Corinne; Delage, Ludovic; Denoeud, France; Deschamps, Philippe; Dittami, Simon M.; Gabaldón, Toni; Gachon, Claire M. M.; Groisillier, Agnès; Hervé, Cécile; Jabbari, Kamel; Katinka, Michael; Kloareg, Bernard; Kowalczyk, Nathalie; Labadie, Karine; Leblanc, Catherine; Lopez, Pascal J.; McLachlan, Deirdre H.; Meslet-Cladiere, Laurence; Moustafa, Ahmed; Nehr, Zofia; Nyvall Collén, Pi; Panaud, Olivier; Partensky, Frédéric; Poulain, Julie; Rensing, Stefan A.; Rousvoal, Sylvie; Samson, Gaelle; Symeonidi, Aikaterini; Weissenbach, Jean; Zambounis, Antonios; Wincker, Patrick; Boyen, Catherine

    2013-01-01

    Red seaweeds are key components of coastal ecosystems and are economically important as food and as a source of gelling agents, but their genes and genomes have received little attention. Here we report the sequencing of the 105-Mbp genome of the florideophyte Chondrus crispus (Irish moss) and the annotation of the 9,606 genes. The genome features an unusual structure characterized by gene-dense regions surrounded by repeat-rich regions dominated by transposable elements. Despite its fairly large size, this genome shows features typical of compact genomes, e.g., on average only 0.3 introns per gene, short introns, low median distance between genes, small gene families, and no indication of large-scale genome duplication. The genome also gives insights into the metabolism of marine red algae and adaptations to the marine environment, including genes related to halogen metabolism, oxylipins, and multicellularity (microRNA processing and transcription factors). Particularly interesting are features related to carbohydrate metabolism, which include a minimalistic gene set for starch biosynthesis, the presence of cellulose synthases acquired before the primary endosymbiosis showing the polyphyly of cellulose synthesis in Archaeplastida, and cellulases absent in terrestrial plants as well as the occurrence of a mannosylglycerate synthase potentially originating from a marine bacterium. To explain the observations on genome structure and gene content, we propose an evolutionary scenario involving an ancestral red alga that was driven by early ecological forces to lose genes, introns, and intergenic DNA; this loss was followed by an expansion of genome size as a consequence of activity of transposable elements. PMID:23503846

  7. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions.

    PubMed

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  8. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    PubMed Central

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriales genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  9. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Cancer.gov

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  10. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and Their Products from Salinispora Species

    PubMed Central

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-01-01

    Summary: Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
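
    One way to read "pattern-based" pairing is as matching presence/absence patterns across strains: a gene cluster and a molecular family that occur in the same set of strains are candidate partners. The sketch below scores that match with Jaccard similarity; the scoring function, the 0.8 threshold and all cluster, family and strain names are hypothetical choices for illustration, not the study's actual procedure.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of strain IDs (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pair_clusters_with_families(
    cluster_strains: Dict[str, Set[str]],
    family_strains: Dict[str, Set[str]],
    min_similarity: float = 0.8,
) -> List[Tuple[str, str, float]]:
    """Pair gene clusters with molecular families whose strain distributions match."""
    pairs = []
    for cluster, c_strains in cluster_strains.items():
        for family, f_strains in family_strains.items():
            score = jaccard(c_strains, f_strains)
            if score >= min_similarity:
                pairs.append((cluster, family, round(score, 2)))
    return sorted(pairs, key=lambda pair: -pair[2])  # best matches first


# Hypothetical presence/absence data across five strains S1..S5.
clusters = {"cluster_x": {"S1", "S2", "S3"}, "cluster_y": {"S1", "S4"}}
families = {"family_1": {"S1", "S2", "S3"}, "family_2": {"S2", "S5"}}
print(pair_clusters_with_families(clusters, families))
```

    With many strains, identical distribution patterns become much less likely to arise by chance, which is why sampling dozens of strains helps such pairings.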

  11. Genomic Structure of an Economically Important Cyanobacterium, Arthrospira (Spirulina) platensis NIES-39

    PubMed Central

    Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

    2010-01-01

    A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. The nearly complete genome sequence of A. platensis NIES-39 was determined in this study. Based on optical mapping, the A. platensis genome is estimated to consist of a single circular chromosome of 6.8 Mb. Annotation of the 6.7 Mb of sequence obtained yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome consists of group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were identified. Unique features in the gene composition were noted, particularly a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and for chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057

  12. Genomic organization of the human SCN5A gene encoding the cardiac sodium channel

    SciTech Connect

    Wang, Qing; Li, Zhizhong; Shen, Jiaxiang; Keating, M.T.

    1996-05-15

    The voltage-gated cardiac sodium channel, SCN5A, is responsible for the initial upstroke of the action potential. Mutations in the human SCN5A gene cause susceptibility to cardiac arrhythmias and sudden death in the long QT syndrome (LQT). In this report we characterize the genomic structure of SCN5A. SCN5A consists of 28 exons spanning approximately 80 kb on chromosome 3p21. We describe the sequences of all intron/exon boundaries and a dinucleotide repeat polymorphism in intron 16. Oligonucleotide primers based on exon-flanking sequences amplify all SCN5A exons by PCR. This work establishes the complete genomic organization of SCN5A and will enable high-resolution analyses of this locus for mutations associated with LQT and other phenotypes for which SCN5A may be a candidate gene. 40 refs., 4 figs., 2 tabs.

  13. A genome-wide MeSH-based literature mining system predicts implicit gene-to-gene relationships and networks

    PubMed Central

    2013-01-01

    Background: The large amount of literature in the post-genomics era enables the study of gene interactions and networks using all available articles published for a specific organism. MeSH is a controlled vocabulary of medical and scientific terms that is used by biomedical scientists to manually index articles in the PubMed literature database. We hypothesized that genome-wide gene-MeSH term associations from the PubMed literature database could be used to predict implicit gene-to-gene relationships and networks. While the gene-MeSH associations have been used to detect gene-gene interactions in some studies, different methods have not been well compared, and such a strategy has not been evaluated for a genome-wide literature analysis. Genome-wide literature mining of gene-to-gene interactions allows ranking of the best gene interactions and investigation of comprehensive biological networks at a genome level. Results: The genome-wide GenoMesh literature mining algorithm was developed by sequentially generating a gene-article matrix, a normalized gene-MeSH term matrix, and a gene-gene matrix. The gene-gene matrix relies on the calculation of pairwise gene dissimilarities based on gene-MeSH relationships. An optimized dissimilarity score was identified from six well-studied functions based on a receiver operating characteristic (ROC) analysis. Based on the studies with well-studied Escherichia coli and less-studied Brucella spp., GenoMesh was found to accurately identify gene functions using weighted MeSH terms, predict gene-gene interactions not reported in the literature, and cluster all the genes studied from an organism using the MeSH-based gene-gene matrix. A web-based GenoMesh literature mining program is also available at: http://genomesh.hegroup.org. GenoMesh also predicts gene interactions and networks among genes associated with specific MeSH terms or user-selected gene lists. Conclusions: The GenoMesh algorithm and web program provide the first genome
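
    The three-matrix pipeline described above (gene-article, normalized gene-MeSH, gene-gene) can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions: normalization is taken to be each gene's share of MeSH annotations, and cosine dissimilarity stands in for the unnamed function GenoMesh selected from its six candidates; the gene names and PMIDs are hypothetical placeholders.

```python
import math
from typing import Dict, List, Tuple

SparseVec = Dict[str, float]


def gene_mesh_matrix(gene_articles: Dict[str, List[str]],
                     article_mesh: Dict[str, List[str]]) -> Dict[str, SparseVec]:
    """Row-normalized gene x MeSH matrix built from gene-article links (assumed scheme)."""
    matrix = {}
    for gene, articles in gene_articles.items():
        counts: Dict[str, int] = {}
        for article in articles:
            for term in article_mesh.get(article, ()):
                counts[term] = counts.get(term, 0) + 1
        total = sum(counts.values())
        matrix[gene] = {t: c / total for t, c in counts.items()} if total else {}
    return matrix


def cosine_dissimilarity(u: SparseVec, v: SparseVec) -> float:
    """1 - cosine similarity between two sparse vectors (1.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def gene_gene_matrix(matrix: Dict[str, SparseVec]) -> Dict[Tuple[str, str], float]:
    """Pairwise gene dissimilarities over all gene pairs."""
    genes = sorted(matrix)
    return {(a, b): cosine_dissimilarity(matrix[a], matrix[b]) for a in genes for b in genes}


# Hypothetical gene-article links and MeSH indexing.
gene_articles = {"geneA": ["pmid1", "pmid2"], "geneB": ["pmid2"], "geneC": ["pmid3"]}
article_mesh = {"pmid1": ["Flagella", "Chemotaxis"], "pmid2": ["Chemotaxis"], "pmid3": ["Iron"]}
dissim = gene_gene_matrix(gene_mesh_matrix(gene_articles, article_mesh))
print(round(dissim[("geneA", "geneB")], 3), dissim[("geneA", "geneC")])
```

    In this toy example geneA and geneB share the Chemotaxis term and come out close (about 0.106), while geneA and geneC share no terms and sit at the maximum dissimilarity of 1.0; the resulting matrix can then feed any standard clustering routine.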

  14. Genome sequence surveys of Brachiola algerae and Edhazardia aedis reveal microsporidia with low gene densities

    PubMed Central

    Williams, Bryony AP; Lee, Renny CH; Becnel, James J; Weiss, Louis M; Fast, Naomi M; Keeling, Patrick J

    2008-01-01

    Background: Microsporidia are well known models of extreme nuclear genome reduction and compaction. The smallest microsporidian genomes have received the most attention, but genomes of different species range in size from 2.3 Mb to 19.5 Mb and the nature of the larger genomes remains unknown. Results: Here we have undertaken genome sequence surveys of two diverse microsporidia, Brachiola algerae and Edhazardia aedis. In both species we find very large intergenic regions, many transposable elements, and a low gene density, all in contrast to the small, model microsporidian genomes. We also find no recognizable genes that are not also found in other surveyed or sequenced microsporidian genomes. Conclusion: Our results demonstrate that microsporidian genome architecture varies greatly between microsporidia. Much of the genome size difference could be accounted for by non-coding material, such as intergenic spaces and retrotransposons, and this suggests that the forces dictating genome size may vary across the phylum. PMID:18445287

  15. Identification and distribution of the NBS-LRR gene family in the cassava genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant resistance genes (R genes) exist in large families and usually contain both a nucleotide-binding site domain and a leucine-rich repeat domain, denoted NBS-LRR. The genome sequence of cassava (Manihot esculenta) is a valuable resource for analyzing the genomic organization of resistance genes i...

  16. A 4-gigabase physical map unlocks the structure and evolution of the complex genome of Aegilops tauschii, the wheat D-genome progenitor

    PubMed Central

    Luo, Ming-Cheng; Gu, Yong Q.; You, Frank M.; Deal, Karin R.; Ma, Yaqin; Hu, Yuqin; Huo, Naxin; Wang, Yi; Wang, Jirui; Chen, Shiyong; Jorgensen, Chad M.; Zhang, Yong; McGuire, Patrick E.; Pasternak, Shiran; Stein, Joshua C.; Ware, Doreen; Kramer, Melissa; McCombie, W. Richard; Kianian, Shahryar F.; Martis, Mihaela M.; Mayer, Klaus F. X.; Sehgal, Sunish K.; Li, Wanlong; Gill, Bikram S.; Bevan, Michael W.; Šimková, Hana; Doležel, Jaroslav; Weining, Song; Lazo, Gerard R.; Anderson, Olin D.; Dvorak, Jan

    2013-01-01

    The current limitations in genome sequencing technology require the construction of physical maps for high-quality draft sequences of large plant genomes, such as that of Aegilops tauschii, the wheat D-genome progenitor. To construct a physical map of the Ae. tauschii genome, we fingerprinted 461,706 bacterial artificial chromosome clones, assembled contigs, designed a 10K Ae. tauschii Infinium SNP array, constructed a 7,185-marker genetic map, and anchored on the map contigs totaling 4.03 Gb. Using whole genome shotgun reads, we extended the SNP marker sequences and found 17,093 genes and gene fragments. We showed that collinearity of the Ae. tauschii genes with Brachypodium distachyon, rice, and sorghum decreased with phylogenetic distance and that structural genome evolution rates have been high across all investigated lineages in subfamily Pooideae, including that of Brachypodieae. We obtained additional information about the evolution of the seven Triticeae chromosomes from 12 ancestral chromosomes and uncovered a pattern of centromere inactivation accompanying nested chromosome insertions in grasses. We showed that the density of noncollinear genes along the Ae. tauschii chromosomes positively correlates with recombination rates, suggested a cause, and showed that new genes, exemplified by disease resistance genes, are preferentially located in high-recombination chromosome regions. PMID:23610408

  17. Chromosome Mapping of Dragline Silk Genes in the Genomes of Widow Spiders (Araneae, Theridiidae)

    PubMed Central

    Zhao, Yonghui; Ayoub, Nadia A.; Hayashi, Cheryl Y.

    2010-01-01

    With its incredible strength and toughness, spider dragline silk is widely lauded for its impressive material properties. Dragline silk is composed of two structural proteins, MaSp1 and MaSp2, which are encoded by members of the spidroin gene family. While previous studies have characterized the genes that encode the constituent proteins of spider silks, nothing is known about the physical location of these genes. We determined karyotypes and sex chromosome organization for the widow spiders, Latrodectus hesperus and L. geometricus (Araneae, Theridiidae). We then used fluorescence in situ hybridization to map the genomic locations of the genes for the silk proteins that compose the remarkable spider dragline. These genes included three loci for the MaSp1 protein and the single locus for the MaSp2 protein. In addition, we mapped a MaSp1 pseudogene. All the MaSp1 gene copies and pseudogene localized to a single chromosomal region while MaSp2 was located on a different chromosome of L. hesperus. Using probes derived from L. hesperus, we comparatively mapped all three MaSp1 loci to a single region of a L. geometricus chromosome. As with L. hesperus, MaSp2 was found on a separate L. geometricus chromosome, thus again unlinked to the MaSp1 loci. These results indicate orthology of the corresponding chromosomal regions in the two widow genomes. Moreover, the occurrence of multiple MaSp1 loci in a conserved gene cluster across species suggests that MaSp1 proliferated by tandem duplication in a common ancestor of L. geometricus and L. hesperus. Unequal crossover events during recombination could have given rise to the gene copies and could also maintain sequence similarity among gene copies over time. Further comparative mapping with taxa of increasing divergence from Latrodectus will pinpoint when the MaSp1 duplication events occurred and the phylogenetic distribution of silk gene linkage patterns. PMID:20877726

  18. Genome-wide identification and evolutionary analysis of algal LPAT genes involved in TAG biosynthesis using bioinformatic approaches.

    PubMed

    Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar

    2014-12-01

    Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed that LPAT1 and solLPAT genes are present in nearly all genomes examined, whereas LPAT2 genes appear to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at the gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding an LPAT is not duplicated. Mapping of the exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation. PMID:25280541

  19. Cell-of-Origin-Specific 3D Genome Structure Acquired during Somatic Cell Reprogramming

    PubMed Central

    Krijger, Peter Hugo Lodewijk; Di Stefano, Bruno; de Wit, Elzo; Limone, Francesco; van Oevelen, Chris; de Laat, Wouter; Graf, Thomas

    2016-01-01

    Summary: Forced expression of reprogramming factors can convert somatic cells into induced pluripotent stem cells (iPSCs). Here we studied genome topology dynamics during reprogramming of different somatic cell types with highly distinct genome conformations. We find large-scale topologically associated domain (TAD) repositioning and alterations of tissue-restricted genomic neighborhoods and chromatin loops, effectively erasing the somatic-cell-specific genome structures while establishing an embryonic stem-cell-like 3D genome. Yet, early passage iPSCs carry topological hallmarks that enable recognition of their cell of origin. These hallmarks are not remnants of somatic chromosome topologies. Instead, the distinguishing topological features are acquired during reprogramming, as we also find for cell-of-origin-dependent gene expression patterns. PMID:26971819

  20. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure.

    PubMed

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J; Links, Matthew G; Clarke, Carling; Higgins, Erin E; Huebert, Terry; Sharpe, Andrew G; Parkin, Isobel A P

    2014-01-01

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for C. sativa and annotated 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa presents significant consequences for breeding and genetic manipulation of this industrial oil crop. PMID:24759634
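
    Fractionation bias means that one subgenome of a polyploid has lost more genes than the others. A simple way to probe it, not necessarily the test used by the authors, is a chi-square goodness-of-fit test of retained gene counts against equal retention across the three subgenomes. The counts below are hypothetical. With three subgenomes the test has 2 degrees of freedom, for which the chi-square survival function is exactly exp(-x/2), so no statistics library is needed.

```python
import math
from typing import Sequence, Tuple


def fractionation_bias_test(retained_counts: Sequence[int]) -> Tuple[float, float]:
    """Chi-square test of equal gene retention across three subgenomes.

    Returns (chi-square statistic, p-value). With 2 degrees of freedom the
    chi-square survival function is exactly exp(-x / 2).
    """
    if len(retained_counts) != 3:
        raise ValueError("expected counts for exactly three subgenomes")
    expected = sum(retained_counts) / 3
    if expected == 0:
        raise ValueError("counts must not all be zero")
    chi2 = sum((obs - expected) ** 2 / expected for obs in retained_counts)
    return chi2, math.exp(-chi2 / 2)


# Hypothetical retained-gene counts: nearly balanced vs. clearly biased.
for counts in [(10_000, 10_100, 9_950), (12_000, 10_000, 8_000)]:
    chi2, p = fractionation_bias_test(counts)
    print(counts, round(chi2, 2), f"{p:.3g}")
```

    A large statistic with a small p-value would point to biased fractionation; "no evidence of fractionation bias" corresponds to counts like the first, nearly balanced set.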

  1. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure

    PubMed Central

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E.; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J.; Links, Matthew G.; Clarke, Carling; Higgins, Erin E.; Huebert, Terry; Sharpe, Andrew G.; Parkin, Isobel A. P.

    2014-01-01

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for C. sativa and annotated 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa presents significant consequences for breeding and genetic manipulation of this industrial oil crop. PMID:24759634

  2. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome

    PubMed Central

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A.

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314
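
    Since a TTS is described as a polypurine stretch, candidate sites can be located by scanning for purine-only runs on either strand: a run of A/G on the given strand is a polypurine tract on that strand, and a run of C/T is a polypurine tract on the complementary strand. The sketch below assumes a strict, mismatch-free definition and a hypothetical minimum length of 15 nt; the actual TTSMI criteria for length, stability and specificity are not given in this abstract.

```python
import re
from typing import List, Tuple


def find_polypurine_tracts(seq: str, min_len: int = 15) -> List[Tuple[int, int, str]]:
    """Return (start, end, strand) for purine-only runs of at least min_len nt.

    Coordinates are 0-based and end-exclusive on the given sequence. '+' marks an
    A/G run on the given strand; '-' marks a C/T run, i.e. a polypurine tract on
    the complementary strand. min_len=15 is an assumed threshold.
    """
    seq = seq.upper()
    hits = []
    for pattern, strand in ((rf"[AG]{{{min_len},}}", "+"), (rf"[CT]{{{min_len},}}", "-")):
        for match in re.finditer(pattern, seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


# Hypothetical sequence: a 16-nt purine run, then a 15-nt pyrimidine run.
example = "GATC" + "AGGAGGAAGAGGGAAG" + "TGCA" + "CTTCCTCTTCCCTTC" + "GA"
print(find_polypurine_tracts(example))
```

    A production scan would typically also allow a few pyrimidine interruptions and filter hits by genomic uniqueness, as the abstract's "stability and specificity" criteria suggest.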

  3. The TTSMI database: a catalog of triplex target DNA sites associated with genes and regulatory elements in the human genome.

    PubMed

    Jenjaroenpun, Piroon; Chew, Chee Siang; Yong, Tai Pang; Choowongkomon, Kiattawee; Thammasorn, Wimada; Kuznetsov, Vladimir A

    2015-01-01

    A triplex target DNA site (TTS), a stretch of DNA that is composed of polypurines, is able to form a triple-helix (triplex) structure with triplex-forming oligonucleotides (TFOs) and is able to influence the site-specific modulation of gene expression and/or the modification of genomic DNA. The co-localization of a genomic TTS with gene regulatory signals and functional genome structures suggests that TFOs could potentially be exploited in antigene strategies for the therapy of cancers and other genetic diseases. Here, we present the TTS Mapping and Integration (TTSMI; http://ttsmi.bii.a-star.edu.sg) database, which provides a catalog of unique TTS locations in the human genome and tools for analyzing the co-localization of TTSs with genomic regulatory sequences and signals that were identified using next-generation sequencing techniques and/or predicted by computational models. TTSMI was designed as a user-friendly tool that facilitates (i) fast searching/filtering of TTSs using several search terms and criteria associated with sequence stability and specificity, (ii) interactive filtering of TTSs that co-localize with gene regulatory signals and non-B DNA structures, (iii) exploration of dynamic combinations of the biological signals of specific TTSs and (iv) visualization of a TTS simultaneously with diverse annotation tracks via the UCSC genome browser. PMID:25324314

  4. Distal-less homeobox genes of insects and spiders: genomic organization, function, regulation and evolution.

    PubMed

    Chen, Bin; Piel, William H; Monteiro, Antónia

    2016-06-01

    The Distal-less (Dll) genes are homeodomain transcription factors that are present in most Metazoa and in representatives of all investigated arthropod groups. In Drosophila, the best studied insect, Dll plays an essential role in forming the proximodistal axis of the legs, antennae and analia, and in specifying antennal identity. The initiation of Dll expression in clusters of cells in mid-lateral regions of the Drosophila embryo represents the earliest genetic marker of limbs. Dll genes are involved in the development of the peripheral nervous system and sensory organs, and they also function as master regulators of black pigmentation in some insect lineages. Here we analyze the complete genomes of six insects, the nematode Caenorhabditis elegans and Homo sapiens, as well as multiple Dll sequences available in databases in order to examine the structure and protein features of these genes. We also review the function, expression, regulation and evolution of arthropod Dll genes with emphasis on insects and spiders. PMID:26898323

  5. Identification and mapping of paralogous genes on a known genomic DNA sequence.

    PubMed

    Bina, Minou

    2006-01-01

    The completion of whole genome sequencing projects offers the opportunity to examine the organization of genes and to discover evolutionarily related genes in a given species. For beginners in the field, this chapter uses a specific example to provide a step-by-step procedure for identifying paralogous genes with the genome browser at UCSC (http://genome.ucsc.edu/). The example describes the identification and mapping of the paralogs of TCF12/HTF4 in the human genome and identifies TCF3 and TCF4 as paralogs of the TCF12/HTF4 gene. It also identifies a related sequence, corresponding to a pseudogene, in one of the introns of the JAK2 gene. The procedure described should be applicable to the discovery and mapping of paralogous genes in the genomic DNA sequences that are available at the genome browser at UCSC. PMID:16888348

  6. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee.

    PubMed

    Ventura, Mario; Catacchio, Claudia R; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H; Chen, Lin; Kidd, Jeffrey M; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K; Eichler, Evan E

    2011-10-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes. PMID:21685127

  7. Genome-Wide Analysis of the NAC Gene Family in Physic Nut (Jatropha curcas L.)

    PubMed Central

    Wu, Zhenying; Xu, Xueqin; Xiong, Wangdan; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Wu, Guojiang; Jiang, Huawu

    2015-01-01

    The NAC proteins (NAM, ATAF1/2 and CUC2) are plant-specific transcriptional regulators that have a conserved NAM domain in the N-terminus. They are involved in various biological processes, including both biotic and abiotic stress responses. In the present study, a total of 100 NAC genes (JcNAC) were identified in physic nut (Jatropha curcas L.). Based on phylogenetic analysis and gene structures, 83 JcNAC genes were classified as members of, or proposed to be diverged from, 39 previously predicted orthologous groups (OGs) of NAC sequences. Physic nut has a single intron-containing NAC gene subfamily that has been lost in many plants. The JcNAC genes are non-randomly distributed across the 11 linkage groups of the physic nut genome, and appear to be preferentially retained duplicates that arose from both ancient and recent duplication events. Digital gene expression analysis indicates that some of the JcNAC genes have tissue-specific expression profiles (e.g. in leaves, roots, stem cortex or seeds), and 29 genes differentially respond to abiotic stresses (drought, salinity, phosphorus deficiency and nitrogen deficiency). Our results will be helpful for further functional analysis of the NAC genes in physic nut. PMID:26125188

  8. Genome-Wide Analysis of the NAC Gene Family in Physic Nut (Jatropha curcas L.).

    PubMed

    Wu, Zhenying; Xu, Xueqin; Xiong, Wangdan; Wu, Pingzhi; Chen, Yaping; Li, Meiru; Wu, Guojiang; Jiang, Huawu

    2015-01-01

    The NAC proteins (NAM, ATAF1/2 and CUC2) are plant-specific transcriptional regulators that have a conserved NAM domain in the N-terminus. They are involved in various biological processes, including both biotic and abiotic stress responses. In the present study, a total of 100 NAC genes (JcNAC) were identified in physic nut (Jatropha curcas L.). Based on phylogenetic analysis and gene structures, 83 JcNAC genes were classified as members of, or proposed to be diverged from, 39 previously predicted orthologous groups (OGs) of NAC sequences. Physic nut has a single intron-containing NAC gene subfamily that has been lost in many plants. The JcNAC genes are non-randomly distributed across the 11 linkage groups of the physic nut genome, and appear to be preferentially retained duplicates that arose from both ancient and recent duplication events. Digital gene expression analysis indicates that some of the JcNAC genes have tissue-specific expression profiles (e.g. in leaves, roots, stem cortex or seeds), and 29 genes differentially respond to abiotic stresses (drought, salinity, phosphorus deficiency and nitrogen deficiency). Our results will be helpful for further functional analysis of the NAC genes in physic nut. PMID:26125188

  9. High-Throughput Computational and Experimental Techniques in Structural Genomics

    PubMed Central

    Chance, Mark R.; Fiser, Andras; Sali, Andrej; Pieper, Ursula; Eswar, Narayanan; Xu, Guiping; Fajardo, J. Eduardo; Radhakannan, Thirumuruhan; Marinkovic, Nebojsa

    2004-01-01

    Structural genomics has as its goal the provision of structural information for all possible ORF sequences through a combination of experimental and computational approaches. The access to genome sequences and cloning resources from an ever-widening array of organisms is driving high-throughput structural studies by the New York Structural Genomics Research Consortium. In this report, we outline the progress of the Consortium in establishing its pipeline for structural genomics, and some of the experimental and bioinformatics efforts leading to structural annotation of proteins. The Consortium has established a pipeline for structural biology studies, automated modeling of ORF sequences using solved (template) structures, and a novel high-throughput approach (metallomics) to examining the metal binding to purified protein targets. The Consortium has so far produced 493 purified proteins from >1077 expression vectors. A total of 95 have resulted in crystal structures, and 81 are deposited in the Protein Data Bank (PDB). Comparative modeling of these structures has generated >40,000 structural models. We also initiated a high-throughput metal analysis of the purified proteins; this has determined that 10%-15% of the targets contain a stoichiometric structural or catalytic transition metal atom. The progress of the structural genomics centers in the U.S. and around the world suggests that the goal of providing useful structural information on almost all ORF domains will be realized. This projected resource will provide structural biology information important to understanding the function of most proteins of the cell. PMID:15489337

  10. The Infinitely Many Genes Model for the Distributed Genome of Bacteria

    PubMed Central

    Baumdicker, Franz; Hess, Wolfgang R.; Pfaffelhuber, Peter

    2012-01-01

    The distributed genome hypothesis states that the gene pool of a bacterial taxon is much more complex than that found in a single individual genome. However, the possible fitness advantage, why such genomic diversity is maintained, whether this variation is largely adaptive or neutral, and why these distinct individuals can coexist remain poorly understood. Here, we present the infinitely many genes (IMG) model, which is a quantitative, evolutionary model for the distributed genome. It is based on a genealogy of individual genomes and the possibility of gene gain (from an unbounded reservoir of novel genes, e.g., by horizontal gene transfer from distant taxa) and gene loss, for example, by pseudogenization and deletion of genes, during reproduction. By implementing these mechanisms, the IMG model differs from existing concepts for the distributed genome, which cannot differentiate between neutral evolution and adaptation as drivers of the observed genomic diversity. Using the IMG model, we tested whether the distributed genome of 22 full genomes of picocyanobacteria (Prochlorococcus and Synechococcus) shows signs of adaptation or neutrality. We calculated the effective population size of Prochlorococcus at 1.01 × 10¹¹ and predicted 18 distinct clades for this population, only six of which have been isolated and cultured thus far. We predicted that the Prochlorococcus pangenome contains 57,792 genes and found that the evolution of the distributed genome of Prochlorococcus was possibly neutral, whereas that of Synechococcus and the combined sample shows a clear deviation from neutrality. PMID:22357598
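    The gain-and-loss mechanism described above can be illustrated, in a heavily simplified form, by following a single lineage instead of a full genealogy. In the Python sketch below (an illustration under stated assumptions, not the IMG model's published inference procedure), novel genes arrive at a constant `gain_rate`, each as if drawn from an unbounded reservoir, and every gene already present is lost independently at `loss_rate`. This is an immigration-death process, whose long-run mean gene count is `gain_rate / loss_rate`; the function name and parameter values are hypothetical.

```python
import random


def simulate_gain_loss(gain_rate: float, loss_rate: float, t_max: float, seed: int = 0) -> float:
    """Time-averaged dispensable-gene count along one lineage (Gillespie simulation).

    Simplifying assumption: a single lineage, no genealogy or coalescence.
    Gains occur at constant rate gain_rate; each present gene is lost at loss_rate.
    """
    if gain_rate <= 0 or loss_rate <= 0 or t_max <= 0:
        raise ValueError("rates and t_max must be positive")
    rng = random.Random(seed)
    t, n, area = 0.0, 0, 0.0  # current time, current gene count, integral of n dt
    while True:
        total = gain_rate + loss_rate * n  # total event rate in the current state
        dt = rng.expovariate(total)        # waiting time to the next event
        if t + dt >= t_max:
            area += n * (t_max - t)        # close out the final interval
            break
        area += n * dt
        t += dt
        # Choose the event: gain with probability gain_rate / total, else one loss.
        if rng.random() < gain_rate / total:
            n += 1
        else:
            n -= 1
    return area / t_max
```

    With `gain_rate=10.0` and `loss_rate=1.0`, a long run such as `t_max=5000.0` gives a time-averaged count close to 10; the exact value depends on the seed. The full IMG model additionally places these events on a random genealogy of sampled genomes, which is what allows gene frequency spectra to be compared between neutral and adaptive scenarios.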

  11. The genome of Chelonid herpesvirus 5 harbors atypical genes

    USGS Publications Warehouse

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of "atypical" DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.

  12. The Genome of Chelonid Herpesvirus 5 Harbors Atypical Genes

    PubMed Central

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis. PMID:23056373

  13. The genome of Chelonid herpesvirus 5 harbors atypical genes.

    PubMed

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J; Lewis, Teresa D; Schetle, Nelli; Work, Thierry M; Dagenais, Julie; Balazs, George H; Leong, Jo-Ann C

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of "atypical" DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis. PMID:23056373

  14. The genome of Chelonid herpesvirus 5 harbors atypical genes

    USGS Publications Warehouse

    Ackermann, Mathias; Koriabine, Maxim; Hartmann-Fritsch, Fabienne; de Jong, Pieter J.; Lewis, Teresa D.; Schetle, Nelli; Work, Thierry M.; Dagenais, Julie; Balazs, George H.; Leong, Jo-Ann C.

    2012-01-01

    The Chelonid fibropapilloma-associated herpesvirus (CFPHV; ChHV5) is believed to be the causative agent of fibropapillomatosis (FP), a neoplastic disease of marine turtles. While clinical signs and pathology of FP are well known, research on ChHV5 has been impeded because no cell culture system for its propagation exists. We have cloned a BAC containing ChHV5 in pTARBAC2.1 and determined its nucleotide sequence. Accordingly, ChHV5 has a type D genome and its predominant gene order is typical for the varicellovirus genus within the alphaherpesvirinae. However, at least four genes that are atypical for an alphaherpesvirus genome were also detected, i.e. two members of the C-type lectin-like domain superfamily (F-lec1, F-lec2), an orthologue to the mouse cytomegalovirus M04 (F-M04) and a viral sialyltransferase (F-sial). Four lines of evidence suggest that these atypical genes are truly part of the ChHV5 genome: (1) the pTARBAC insertion interrupted the UL52 ORF, leaving parts of the gene to either side of the insertion and suggesting that an intact molecule had been cloned. (2) Using FP-associated UL52 (F-UL52) as an anchor and the BAC-derived sequences as a means to generate primers, overlapping PCR was performed with tumor-derived DNA as template, which confirmed the presence of the same stretch of “atypical” DNA in independent FP cases. (3) Pyrosequencing of DNA from independent tumors did not reveal previously undetected viral sequences, suggesting that no apparent loss of viral sequence had happened due to the cloning strategy. (4) The simultaneous presence of previously known ChHV5 sequences and F-sial as well as F-M04 sequences was also confirmed in geographically distinct Australian cases of FP. Finally, transcripts of F-sial and F-M04 but not transcripts of lytic viral genes were detected in tumors from Hawaiian FP-cases. Therefore, we suggest that F-sial and F-M04 may play a role in FP pathogenesis.

  15. Connectivity of vertebrate genomes: Paired-related homeobox (Prrx) genes in spotted gar, basal teleosts, and tetrapods

    PubMed Central

    Braasch, Ingo; Guiguen, Yann; Loker, Ryan; Letaw, John H.; Ferrara, Allyse; Bobe, Julien; Postlethwait, John H.

    2014-01-01

    Teleost fish are important models for human biology, health, and disease. Because genome duplication in a teleost ancestor (TGD) impacts the evolution of teleost genome structure and gene repertoires, we must discriminate gene functions that are shared and ancestral from those that are lineage-specific in teleosts or tetrapods to accurately apply inferences from teleost disease models to human health. Generalizations must account both for the TGD and for divergent evolution between teleosts and tetrapods after the likely two rounds of genome duplication shared by all vertebrates. Progress in sequencing techniques provides new opportunities to generate genomic and transcriptomic information from a broad range of phylogenetically informative taxa that facilitate detailed understanding of gene family and gene function evolution. We illustrate here the use of new sequence resources from spotted gar (Lepisosteus oculatus), a rayfin fish that diverged from teleosts before the TGD, as well as RNA-Seq data from gar and multiple teleost lineages to reconstruct the evolution of the Paired-related homeobox (Prrx) transcription factor gene family, which is involved in the development of mesoderm and neural crest-derived mesenchyme. We show that for Prrx genes, the spotted gar genome and gene expression patterns mimic mammals better than teleosts do. Analyses force the seemingly paradoxical conclusion that regulatory mechanisms for the limb expression domains of Prrx genes existed before the evolution of paired appendages. Detailed evolutionary analyses like those reported here are required to identify fish species most similar to the human genome to optimally connect fish models to human gene functions in health and disease. PMID:24486528

  16. A Novel Cyanophage with a Cyanobacterial Nonbleaching Protein A Gene in the Genome

    PubMed Central

    Gao, E-Bin; Gui, Jian-Fang

    2012-01-01

    A cyanophage, PaV-LD, has been isolated from the harmful filamentous cyanobacterium Planktothrix agardhii in Lake Donghu, a shallow freshwater lake in China. Here, we present the cyanophage's genomic organization and major structural proteins. The genome is a 95,299-bp-long, linear double-stranded DNA and contains 142 potential genes. BLAST searches revealed 29 proteins of known function in cyanophages, cyanobacteria, or bacteria. Thirteen major structural proteins ranging in size from 27 kDa to 172 kDa were identified by SDS-PAGE and mass-spectrometric analysis. The genome lacks major genes that are necessary for the tail structure, and the tailless morphology of PaV-LD has been confirmed by an electron microscopy comparison with other tailed cyanophages and phages. Phylogenetic analysis of the major capsid proteins also reveals an independent branch of PaV-LD that is quite different from other known tailed cyanophages and phages. Moreover, the unique genome carries a nonbleaching protein A (NblA) gene (open reading frame [ORF] 022L), which is present in all phycobilisome-containing organisms and mediates phycobilisome degradation. Western blot detection confirmed that 022L was expressed after PaV-LD infection in the host filamentous cyanobacterium. In addition, its appearance was accompanied by a significant decline of phycocyanobilin content and a color change of the cyanobacterial cells from blue-green to yellow-green. The biological function of PaV-LD nblA was further confirmed by expression in a model cyanobacterium via an integration platform, by spectroscopic analysis and electron microscopy observation. The data indicate that PaV-LD is an exceptional cyanophage of filamentous cyanobacteria, and this novel cyanophage will also provide us with a new vision of the cyanophage-host interactions. PMID:22031930

  17. Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

    SciTech Connect

    Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping; Strauss, Steve

    2007-02-22

    The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root-predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from the Arabidopsis root-specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. The ethanol-inducible system proved the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post-transcriptional silencing in the model Arabidopsis system; these include the floral meristem identity gene (APETALA1 or

  18. Gene targeting, genome editing: from Dolly to editors.

    PubMed

    Tan, Wenfang; Proudfoot, Chris; Lillico, Simon G; Whitelaw, C Bruce A

    2016-06-01

    One of the most powerful strategies we have as scientists to investigate biology is the ability to transfer genetic material in a controlled and deliberate manner between organisms. When applied to livestock, applications worthy of commercial venture can be devised. Although initial methods used to generate transgenic livestock resulted in random transgene insertion, the development of SCNT technology enabled homologous recombination gene targeting strategies to be used in livestock. Much has been accomplished using this approach. However, we now have the ability to change a specific base in the genome without leaving any other DNA mark, with no need for a transgene. With the advent of the genome editors this is now possible and, like other significant technological leaps, the result is an even greater diversity of possible applications. Indeed, in merely 5 years, these 'molecular scissors' have enabled the production of more than 300 differently edited pigs, cattle, sheep and goats. The advent of genome editors has brought genetic engineering of livestock to a position where industry, the public and politicians are all eager to see real use of genetically engineered livestock to address societal needs. Since the first transgenic livestock were reported just over three decades ago, the field of livestock biotechnology has come a long way, but the most exciting period is just starting. PMID:26847670

  19. Molecular structure of the Menkes disease gene (ATP7A)

    SciTech Connect

    Dierick, H.A.; Glover, T.W.; Ambrosini, L.

    1995-08-10

    We report a detailed molecular analysis of the genomic structure of the Menkes disease gene (MNK; ATP7A). There are 23 exons in ATP7A covering a genomic region of approximately 140 kb. The size of the individual coding exons varies between 77 and 726 bp, and introns vary in size between 196 bp and approximately 60 kb. All of the splice sites obey the consensus GT-AG rule except the splice donor of intron 9, which is GC instead of GT. The exon following this rare splice donor variant is alternatively spliced. A PGAM pseudogene and two highly polymorphic CA repeats map to introns within the gene. The structure is very similar to that of the closely related Wilson disease gene (WND; ATP7B). From exon 5 (exon 3 in ATP7B) to the end, all of the splice sites occur at exactly the same nucleotide positions as in the WND gene, except for the boundary between exons 17 and 18 (exons 15 and 16 in ATP7B) and a single codon difference at the boundary between exons 4 and 5 of the MNK gene (exons 2 and 3 in ATP7B). In contrast to the WND gene, in which the first four of six metal binding domains are contained in 1 exon, metal binding domains 1 to 4 are divided over 3 exons. The striking similarity of the MNK and WND genes at the genomic level is consistent with their relatively recent divergence from a common ancestral gene. 39 refs., 4 figs., 1 tab.
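    The GT-AG rule mentioned above refers to the first two and last two bases of an intron on the sense strand. A minimal Python sketch, assuming the intron is supplied as a DNA string on the sense strand, classifies an intron by these terminal dinucleotides and flags the GC donor variant of the kind reported for intron 9. The function name and the example sequences below are hypothetical, made up for illustration, and are not taken from ATP7A.

```python
def classify_splice_sites(intron: str) -> str:
    """Classify an intron by its donor (first 2 nt) and acceptor (last 2 nt).

    Assumes a sense-strand DNA sequence; only the terminal dinucleotides are
    checked, not the wider consensus context around each splice site.
    """
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to contain donor and acceptor sites")
    donor, acceptor = intron[:2], intron[-2:]
    if acceptor != "AG":
        return "non-AG acceptor"
    if donor == "GT":
        return "GT-AG (canonical)"
    if donor == "GC":
        return "GC-AG (non-canonical donor)"
    return "other"
```

    For instance, `classify_splice_sites("GTAAGTCCCTTTTCAG")` returns `"GT-AG (canonical)"`, while `classify_splice_sites("gcaagtttcag")` returns `"GC-AG (non-canonical donor)"`. Actual splice-site annotation also weighs the surrounding consensus bases and the branch point, which this check ignores.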

  20. Genomic organization and evolution of immunoglobulin kappa gene enhancers and kappa deleting element in mammals

    PubMed Central

    Das, Sabyasachi; Nikolaidis, Nikolas; Nei, Masatoshi

    2009-01-01

    We have studied the genomic structure and evolutionary pattern of the immunoglobulin kappa deleting element (KDE) and three kappa enhancers (KE5′, KE3′P, and KE3′D) in eleven mammalian genomic sequences. Our results show that the relative positions and the genomic organization of the KDE and the kappa enhancers are conserved in all mammals studied and have not been affected by the local rearrangements in the immunoglobulin kappa (IGK) light chain locus over a long evolutionary time (∼120 million years of mammalian evolution). Our observations suggest that the sequence motifs in these regulatory elements have been conserved by purifying selection to achieve proper regulation of the expression of the IGK light chain genes. The conservation of the three enhancers in all mammals indicates that these species may use similar mechanisms to regulate IGK gene expression. However, some activities of the IGK enhancers might have evolved in the eutherian lineage. The presence of the three IGK enhancers, KDE, and other recombining elements (REs) in all mammals (including platypus) suggests that these genomic elements were in place before the mammalian radiation. PMID:19560204

  1. Full-genome identification and characterization of NBS-encoding disease resistance genes in wheat.

    PubMed

    Bouktila, Dhia; Khalfallah, Yosra; Habachi-Houimli, Yosra; Mezghani-Khemakhem, Maha; Makni, Mohamed; Makni, Hanem

    2015-02-01

    on the three wheat sub-genomes. In contrast, at chromosome scale, 50% of the members of this gene family were localized on 6 of the 21 wheat chromosomes and ~22% of them were localized on homeologous group 7. The results of this study provide a detailed analysis of the largest family of plant disease resistance genes in allohexaploid wheat. Some of the structural traits reported had not been identified previously, and the genome-derived data were compared with those stored in databases outlining the functional specialization of members of this family. The large reservoir of NBS-type genes presented and discussed here will, first, form an important framework for marker-assisted improvement of resistance in wheat and, second, open up new perspectives for a better understanding of the evolutionary dynamics of this gene family in grass species and in polyploid systems. PMID:25231182

  2. Genome analysis of DNA repair genes in the alpha proteobacterium Caulobacter crescentus

    PubMed Central

    Martins-Pinheiro, Marinalva; Marques, Regina CP; Menck, Carlos FM

    2007-01-01

    Background The integrity of DNA molecules is fundamental for maintaining life. DNA repair proteins protect organisms against genetic damage by removing DNA lesions or helping to tolerate them. DNA repair genes are best known from the gamma-proteobacterium Escherichia coli, the best-understood bacterial model. However, genome sequencing raises questions regarding the uniformity and ubiquity of these DNA repair genes and pathways, reinforcing the need to identify genes and proteins that may respond to DNA damage in other bacteria. Results In this study, we employed a bioinformatic approach to analyse and describe the open reading frames potentially related to DNA repair in the genome of the alpha-proteobacterium Caulobacter crescentus. This was done by comparison with known DNA repair-related genes found in public databases. As expected, although C. crescentus and E. coli belong to separate phylogenetic groups, many of their DNA repair genes are very similar. However, some important DNA repair genes are absent from the C. crescentus genome, and other interesting, functionally related gene duplications are present that do not occur in E. coli. These include DNA ligases, exonuclease III (xthA), endonuclease III (nth), O6-methylguanine-DNA methyltransferase (ada gene), photolyase-like genes, and uracil-DNA glycosylases. On the other hand, the genes imuA and imuB, which are involved in DNA damage-induced mutagenesis, have recently been described in C. crescentus but are absent in E. coli. Particularly interesting are the potentially atypical phylogeny of one of the photolyase genes in alpha-proteobacteria, indicating an origin by horizontal transfer, and the duplication of the Ada orthologs, which have diverse structural configurations, including one that is still unique to C. crescentus. Conclusion The absence and the presence of certain genes are discussed and predictions are made considering the particular aspects of the C. crescentus

  3. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    SciTech Connect

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-Å-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a ~1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a ~200-Å-long α-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  4. Genome-Wide Identification, Characterization and Expression Analysis of the TCP Gene Family in Prunus mume

    PubMed Central

    Zhou, Yuzhen; Xu, Zongda; Zhao, Kai; Yang, Weiru; Cheng, Tangren; Wang, Jia; Zhang, Qixiang

    2016-01-01

    TCP proteins, which belong to a plant-specific family of transcription factors, are known to play important roles in plant development, especially in flower and leaf development. However, there is little information about this gene family in Prunus mume, which is widely cultivated in China as an ornamental and fruit tree. Here a genome-wide analysis of TCP genes was performed to explore their evolution in P. mume. Nineteen PmTCPs were identified, and three of them contained putative miR319 target sites. Phylogenetic and comprehensive bioinformatics analyses of these genes revealed that different types of TCP genes had undergone different evolutionary processes and that genes in the same clade had similar chromosomal locations, gene structures, and conserved domains. Expression analysis of these PmTCPs indicated diverse expression patterns among different clades. Most TCP genes were predominantly expressed in flowers, leaves, and stems, and showed high expression levels at different stages of flower bud differentiation, especially in the petal formation stage and during gametophyte development. Genes in the TCP-P subfamily had major roles in both flower development and gametophyte development. The CIN genes in double-petal cultivars might have key roles in petal formation, whereas they were correlated with gametophyte development in the single-petal cultivar. The CYC/TB1-type genes were highly expressed during the formation of petals and pistils. The less complex flower types of P. mume might result from the presence of only two CYC-type genes in P. mume and the lack of CYC2 genes to control the identity of flower types. These results lay the foundation for further study of the functions of TCP genes during flower development.

  5. Construction of Genetic Linkage Maps and Comparative Genome Analysis of Catfish Using Gene-Associated Markers

    PubMed Central

    Kucuktas, Huseyin; Wang, Shaolin; Li, Ping; He, Chongbo; Xu, Peng; Sha, Zhenxia; Liu, Hong; Jiang, Yanliang; Baoprasertkul, Puttharat; Somridhivej, Benjaporn; Wang, Yaping; Abernathy, Jason; Guo, Ximing; Liu, Lei; Muir, William; Liu, Zhanjiang

    2009-01-01

    A genetic linkage map of the channel catfish genome (N = 29) was constructed using EST-based microsatellite and single nucleotide polymorphism (SNP) markers in an interspecific reference family. A total of 413 microsatellites and 125 SNP markers were polymorphic in the reference family. Linkage analysis using JoinMap 4.0 allowed mapping of 331 markers (259 microsatellites and 72 SNPs) to 29 linkage groups. Each linkage group contained 3–18 markers. The longest linkage group contained 18 markers and spanned 131.2 cM, while the shortest linkage group contained 14 markers and spanned only 7.9 cM. The linkage map covered a genetic distance of 1811 cM with an average marker interval of 6.0 cM. Sex-specific maps were also constructed; the recombination rate for females was 1.6 times higher than that for males. Putative conserved syntenies between catfish and zebrafish, medaka, and Tetraodon were established, but the overall levels of genome rearrangements were high am