Science.gov

Sample records for gene genomic structure

  1. Genomic structure of the human caldesmon gene.

    PubMed Central

    Hayashi, K; Yano, H; Hashida, T; Takeuchi, R; Takeda, O; Asada, K; Takahashi, E; Kato, I; Sobue, K

    1992-01-01

    The high molecular weight caldesmon (h-CaD) is predominantly expressed in smooth muscles, whereas the low molecular weight caldesmon (l-CaD) is widely distributed in nonmuscle tissues and cells. The changes in CaD isoform expression are closely correlated with the phenotypic modulation of smooth muscle cells. During a search for isoform diversity of human CaDs, l-CaD cDNAs were cloned from HeLa S3 cells. HeLa l-CaD I is composed of 558 amino acids, whereas 26 amino acids (residues 202-227 for HeLa l-CaD I) are deleted in HeLa l-CaD II. The short amino-terminal sequence of HeLa l-CaDs is different from that of fibroblast (WI-38) l-CaD II and human aorta h-CaD. We have also identified WI-38 l-CaD I, which contains a 26-amino acid insertion relative to WI-38 l-CaD II. To reveal the molecular events of the expressional regulation of the CaD isoforms, the genomic structure of the human CaD gene was determined. The human CaD gene is composed of 14 exons and was mapped to a single locus, 7q33-q34. The 26-amino acid insertion is encoded in exon 4 and is specifically spliced in the mRNAs for both h-CaD and l-CaDs I. Exon 3 is the exon that encodes the central repeating domain specific to h-CaD (residues 208-436) together with the common domain in all CaD (residues 73-207 for h-CaD and WI-38 l-CaDs, and residues 68-201 for HeLa l-CaDs). The regulation of h- and l-CaD expression is thought to depend on selection of the two 5' splice sites within exon 3. Thus, the change in expression between l-CaD and h-CaD might be caused by this splicing pathway. PMID:1465449

  2. Genome Editing of Structural Variations: Modeling and Gene Correction.

    PubMed

    Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook

    2016-07-01

    The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. PMID:27016031

  3. Recognizing genes and other components of genomic structure

    SciTech Connect

    Burks, C.; Myers, E. (Dept. of Computer Science); Stormo, G.D. (Dept. of Molecular, Cellular and Developmental Biology)

    1991-01-01

    The Aspen Center for Physics (ACP) sponsored a three-week workshop, with 26 scientists participating, from 28 May to 15 June, 1990. The workshop, entitled Recognizing Genes and Other Components of Genomic Structure, focussed on discussion of current needs and future strategies for developing the ability to identify and predict the presence of complex functional units on sequenced, but otherwise uncharacterized, genomic DNA. We addressed the need for computationally-based, automatic tools for synthesizing available data about individual consensus sequences and local compositional patterns into the composite objects (e.g., genes) that are -- as composite entities -- the true object of interest when scanning DNA sequences. The workshop was structured to promote sustained informal contact and exchange of expertise between molecular biologists, computer scientists, and mathematicians. No participant stayed for less than one week, and most attended for two or three weeks. Computers, software, and databases were available for use as "electronic blackboards" and as the basis for collaborative exploration of ideas being discussed and developed at the workshop. 23 refs., 2 tabs.

  4. Rapid evolution and complex structural organization in genomic regions harboring multiple prolamin genes in the polyploid wheat genome

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genes encoding wheat prolamins belong to complicated multi-gene families in the wheat genome. To understand the structural complexity of storage protein loci, we sequenced and analyzed orthologous regions containing both gliadin and LMW-glutenin genes from the A and B genomes of a tetraploid wheat ...

  5. The Mitochondrial Genome of Soybean Reveals Complex Genome Structures and Gene Evolution at Intercellular and Phylogenetic Levels

    PubMed Central

    Chang, Shengxin; Wang, Yankun; Lu, Jiangjie; Gai, Junyi; Li, Jijie; Chu, Pu; Guan, Rongzhan; Zhao, Tuanjie

    2013-01-01

    Determining mitochondrial genomes is important for elucidating vital activities of seed plants. Mitochondrial genomes are specific to each plant species because of their variable size, complex structures and patterns of gene losses and gains during evolution. This complexity has made research on the soybean mitochondrial genome difficult compared with its nuclear and chloroplast genomes. The present study helps to solve a 30-year mystery regarding the most complex mitochondrial genome structure, showing that pairwise rearrangements among the many large repeats may produce an enriched molecular pool of 760 circles in seed plants. The soybean mitochondrial genome harbors 58 genes of known function in addition to 52 predicted open reading frames of unknown function. The genome contains sequences of multiple identifiable origins, including 6.8 kb and 7.1 kb DNA fragments that have been transferred from the nuclear and chloroplast genomes, respectively, and some horizontal DNA transfers. The soybean mitochondrial genome has lost 16 genes, including nine protein-coding genes and seven tRNA genes; however, it has acquired five chloroplast-derived genes during evolution. Four tRNA genes, common among the three genomes, are derived from the chloroplast. Sizeable DNA transfers to the nucleus, with pericentromeric regions as hotspots, are observed, including DNA transfers of 125.0 kb and 151.6 kb identified unambiguously from the soybean mitochondrial and chloroplast genomes, respectively. The soybean nuclear genome has acquired five genes from its mitochondrial genome. These results provide biological insights into the mitochondrial genome of seed plants, and are especially helpful for deciphering vital activities in soybean. PMID:23431381

  6. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  7. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation.

    PubMed

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-06-20

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733
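    The abstract above notes that plain genome alignments contain spurious frameshifts inside exons that are in fact conserved. As a rough illustration of what "considering reading frame" means, the sketch below flags indels in a pairwise exon alignment whose length is not a multiple of three. It is not CESAR's algorithm: the function names are hypothetical, and the per-indel check is a deliberate simplification that ignores compensating frameshifts and splice sites.

```python
def gap_runs(aligned: str) -> list[int]:
    """Return the lengths of consecutive '-' runs in one aligned row."""
    runs, current = [], 0
    for ch in aligned:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def frameshifting_indels(ref_aln: str, query_aln: str) -> list[int]:
    """Lengths of indels (in either row) that are not a multiple of 3."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned rows must have equal length")
    all_runs = gap_runs(ref_aln) + gap_runs(query_aln)
    return [n for n in all_runs if n % 3 != 0]


def preserves_reading_frame(ref_aln: str, query_aln: str) -> bool:
    """True if no single indel in the alignment shifts the reading frame."""
    return not frameshifting_indels(ref_aln, query_aln)
```

    For example, aligning `ATGGCCAAATGA` against `ATGG-CAAATGA` reports one frameshifting indel of length 1, whereas `ATG---AAATGA` (a whole-codon deletion) preserves the frame.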

  8. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation

    PubMed Central

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-01-01

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733

  9. Structural features of conopeptide genes inferred from partial sequences of the Conus tribblei genome.

    PubMed

    Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O

    2016-02-01

    The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60 % of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides. PMID:26423067

  10. Dinoflagellate Gene Structure and Intron Splice Sites in a Genomic Tandem Array.

    PubMed

    Mendez, Gregory S; Delwiche, Charles F; Apt, Kirk E; Lippmeier, J Casey

    2015-01-01

    Dinoflagellates are one of the last major lineages of eukaryotes for which little is known about genome structure and organization. We report here the sequence and gene structure of a clone isolated from a cosmid library which, to our knowledge, represents the largest contiguously sequenced, dinoflagellate genomic, tandem gene array. These data, combined with information from a large transcriptomic library, allowed a high level of confidence of every base pair call. This degree of confidence is not possible with PCR-based contigs. The sequence contains an intron-rich set of five highly expressed gene repeats arranged in tandem. One of the tandem repeat gene members contains an intron 26,372 bp long. This study characterizes a splice site consensus sequence for dinoflagellate introns. Two to nine base pairs around the 3' splice site are repeated by an identical two to nine base pairs around the 5' splice site. The 5' and 3' splice sites are in the same locations within each repeat so that the repeat is found only once in the mature mRNA. This identically repeated intron boundary sequence might be useful in gene modeling and annotation of genomes. PMID:25963315
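    The boundary repeat described in this abstract can be made concrete with a small sketch. Given a genomic sequence and one annotated intron (0-based, half-open coordinates), it measures how far the intron can be slid left or right without changing the spliced product; that window is the sequence repeated identically at both splice sites. The function names and the toy sequence in the note below are illustrative assumptions, not data or code from the study.

```python
def splice(genomic: str, start: int, end: int) -> str:
    """Remove the intron genomic[start:end] and join the flanking exons."""
    return genomic[:start] + genomic[end:]


def boundary_repeat(genomic: str, start: int, end: int) -> str:
    """Return the sequence repeated identically around both splice sites.

    The intron is genomic[start:end]. Sliding both boundaries by the same
    offset leaves the spliced product unchanged exactly when the bases
    entering and leaving the intron match, so the repeat is the window over
    which such sliding is possible.
    """
    intron_len = end - start
    left = 0
    # Slide left: the base just before each boundary must match.
    while (left < intron_len and start - left - 1 >= 0
           and genomic[start - left - 1] == genomic[end - left - 1]):
        left += 1
    right = 0
    # Slide right: the base just after each boundary must match.
    while (left + right < intron_len and end + right < len(genomic)
           and genomic[start + right] == genomic[end + right]):
        right += 1
    return genomic[start - left:start + right]
```

    With the toy sequence `"ATGCC" + "AGG" + "CAAGTCTCTT" + "AGG" + "TTTGA"` and an intron annotated at `(6, 19)`, `boundary_repeat` returns `AGG`, and splicing at `(5, 18)`, `(6, 19)`, or `(8, 21)` yields the same product in which `AGG` occurs once, which is why the exact splice position inside the repeat cannot be told from the mRNA alone.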

  11. Physical mapping and genomic structure of the human TNFR2 gene

    SciTech Connect

    Beltinger, C.P.; White, P.S.; Maris, J.M.

    1996-07-01

    The tumor necrosis factor receptor 2 (TNFR2) gene localizes to 1p36.2, a genomic region characteristically deleted in neuroblastomas and other malignancies. In addition, TNFR2 is the principal mediator of the effects of TNF on cellular immunity, and it may cooperate with TNFR1 in the killing of nonlymphoid cells. Therefore, we undertook an analysis of the genomic structure and precise physical mapping of this gene. The TNFR2 gene is contained on 10 exons that span 26 kb. Most of the functional domains of TNFR2 are encoded by separate exons, and each of the repeats of the extracellular cysteine-rich domain is interrupted by an intron. The genomic structure reveals a close relationship to TNFR1, another member of the TNFR superfamily. Based on electrophoretic analysis of yeast artificial chromosomes, TNFR2 maps within 400 kb of the genetic marker D1S434. In addition, we have identified a new polymorphic dinucleotide repeat within intron 4 of TNFR2. The genetic sequence information and exon-intron boundaries we have determined will facilitate mutational analysis of this gene to determine its potential role in neuroblastoma, as well as in other cancers with characteristic deletions or rearrangements of 1p36. 52 refs., 3 figs., 1 tab.

  12. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A. (SLAC, SSRL)

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  13. The genomic structure of human BTK, the defective gene in X-linked agammaglobulinemia

    SciTech Connect

    Rohrer, J.; Parolini, O.; Conley, M.E.; Belmont, J.W.

    1994-12-31

    It has recently been demonstrated that mutations in the gene for Bruton's tyrosine kinase (BTK) are responsible for X-linked agammaglobulinemia. Southern blot analysis and sequencing of cDNA were used to document deletions, insertions, and single base pair substitutions. To facilitate analysis of BTK regulation and to permit the development of assays that could be used to screen genomic DNA for mutations in BTK, the authors determined the genomic organization of this gene. Subcloning of a cosmid and a yeast artificial chromosome showed that BTK is divided into 19 exons spanning 37 kilobases of genomic DNA. Analysis of the region 5' to the first untranslated exon revealed no consensus TATAA or CAAT boxes; however, three retinoic acid binding sites were identified in this region. Comparison of the structure of BTK with that of other nonreceptor tyrosine kinases, including SRC, FES, and CSK, demonstrated a lack of conservation of exon borders. Information obtained in this study will contribute to understanding of the evolution of nonreceptor tyrosine kinases. It will also be useful in diagnostic studies, including carrier detection, and in studies directed towards gene therapy or gene replacement. 29 refs., 2 figs., 2 tabs.

  14. BIOINFORMATIC INTEGRATION OF STRUCTURAL AND FUNCTIONAL GENOMICS DATA ACROSS SPECIES TO DEVELOP PORCINE INFLAMMATORY GENE REGULATORY PATHWAY INFORMATION

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip data from RNA ...

  15. Computational Integration Of Structural And Functional Genomics Data Across Species To Develop Porcine Inflammatory Gene Regulatory Pathway Information

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  16. Computational Integration of Structural and Functional Genomics Data Across Species to Develop Information on Porcine Inflammatory Gene Regulatory Pathway

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  17. Genomic structure of the human D-site binding protein (DBP) gene

    SciTech Connect

    Shutler, G.; Glassco, T.; Kang, Xiaolin

    1996-06-15

    The human gene for the D-Site Binding Protein (DBP) has been sequenced and characterized. This gene is a member of the b/ZIP family of transcription factors and is one of three genes forming the PAR sub-family. DBP has been implicated in the diurnal regulation of a variety of liver-specific genes. Examination of the genomic structure of DBP reveals that the gene is divided into four exons and is contained within a relatively compact region of approximately 6 kb. These exons appear to correspond to functional divisions of the DBP protein. Exon 1 contains a long 5' UTR, and conservation between the rat and the human genes of the presence of small open reading frames within this region suggests that it may play a role in translational control. Exon 2 contains a limited region of similarity to the other PAR domain genes, which may be part of a potential activation domain. Exon 3 contains the PAR domain and differs by only 1 of 71 amino acids between rat and human. Exon 4, containing both the basic and the leucine zipper domains, is likewise highly conserved. The overall degree of homology between the rat and the human cDNA sequences is 82% for the nucleic acid sequence and 92% for the protein sequence. Comparison of the rat and human proximal promoters reveals extensive sequence conservation, with two previously characterized DNA binding sites being conserved at the functional and sequence levels. 31 refs., 4 figs.

  18. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    SciTech Connect

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5'-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promoter for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.

  19. The population genomics of begomoviruses: global scale population structure and gene flow

    PubMed Central

    2010-01-01

    Background: The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results: We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions: Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could facilitate population genetics studies

  20. The human BARX2 gene: genomic structure, chromosomal localization, and single nucleotide polymorphisms.

    PubMed

    Hjalt, T A; Murray, J C

    1999-12-15

    The BARX genes 1 and 2 are Bar class homeobox genes expressed in craniofacial structures during development. In this report, we present the genomic structure, chromosomal localization, and polymorphic markers in BARX2. The gene has four exons, ranging in size from 85 to 1099 bp. BARX2 is localized on human chromosome 11q25, as determined by radiation hybrid mapping. In the mouse, Barx2 is coexpressed with Pitx2 in several tissues. Based on the coexpression, BARX2 was assumed to be a candidate gene for those cases of Rieger syndrome that cannot be associated with mutations of PITX2. Mutations in PITX2 cause some cases of Rieger syndrome, an autosomal dominant disorder affecting eyes, teeth, and umbilicus. DNA from Rieger patients was subjected to single-strand conformation polymorphism screening of the BARX2 coding region. Three single nucleotide polymorphisms were found in a normal population, although no etiologic mutations were detectable in over 100 cases of Rieger syndrome or in individuals with related ocular disorders. PMID:10644443

  1. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    PubMed Central

    2012-01-01

    Background: Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results: We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13

  2. Analysis of the murine Dtk gene identifies conservation of genomic structure within a new receptor tyrosine kinase subfamily

    SciTech Connect

    Lewis, P.M.; Crosier, K.E.; Crosier, P.S.

    1996-01-01

    The receptor tyrosine kinase Dtk/Tyro 3/Sky/rse/brt/tif is a member of a new subfamily of receptors that also includes Axl/Ufo/Ark and Eyk/Mer. These receptors are characterized by the presence of two immunoglobulin-like loops and two fibronectin type III repeats in their extracellular domains. The structure of the murine Dtk gene has been determined. The gene consists of 21 exons that are distributed over 21 kb of genomic DNA. An isoform of Dtk is generated by differential splicing of exons from the 5' region of the gene. The overall genomic structure of Dtk is virtually identical to that determined for the human UFO gene. This particular genomic organization is likely to have been duplicated and closely maintained throughout evolution. 38 refs., 3 figs., 1 tab.

  3. The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

    SciTech Connect

    Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

    1997-03-01

    The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.

  4. The mouse formin (Fmn) gene: Genomic structure, novel exons, and genetic mapping

    SciTech Connect

    Wang, C.C.; Chan, D.C.; Leder, P.

    1997-02-01

    Mutations in the mouse formin (Fmn) gene, formerly known as the limb deformity (ld) gene, give rise to recessively inherited limb deformities and renal malformations or aplasia. The Fmn gene encodes many differentially processed transcripts that are expressed in both adult and embryonic tissues. To study the genomic organization of the Fmn locus, we have used Fmn probes to isolate and characterize genomic clones spanning 500 kb. Our analysis of these clones shows that the Fmn gene is composed of at least 24 exons and spans 400 kb. We have identified two novel exons that are expressed in the developing embryonic limb bud as well as adult tissues such as brain and kidney. We have also used a microsatellite polymorphism from within the Fmn gene to map it genetically to a 2.2-cM interval between D2Mit58 and D2Mit103. 36 refs., 6 figs., 1 tab.

  5. Structural Variants in the Soybean Genome Localize to Clusters of Biotic Stress-Response Genes

    PubMed Central

    McHale, Leah K.; Haun, William J.; Xu, Wayne W.; Bhaskar, Pudota B.; Anderson, Justin E.; Hyten, David L.; Gerhardt, Daniel J.; Jeddeloh, Jeffrey A.; Stupar, Robert M.

    2012-01-01

    Genome-wide structural and gene content variations are hypothesized to drive important phenotypic variation within a species. Structural and gene content variations were assessed among four soybean (Glycine max) genotypes using array hybridization and targeted resequencing. Many chromosomes exhibited relatively low rates of structural variation (SV) among genotypes. However, several regions exhibited both copy number and presence-absence variation, the most prominent found on chromosomes 3, 6, 7, 16, and 18. Interestingly, the regions most enriched for SV were specifically localized to gene-rich regions that harbor clustered multigene families. The most abundant classes of gene families associated with these regions were the nucleotide-binding and receptor-like protein classes, both of which are important for plant biotic defense. The colocalization of SV with plant defense response signal transduction pathways provides insight into the mechanisms of soybean resistance gene evolution and may inform the development of new approaches to resistance gene cloning. PMID:22696021

  6. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    PubMed

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903

  7. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea)

    PubMed Central

    Gao, Feng; Song, Weibo; Katz, Laura A.

    2014-01-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that: 1) alternative processing is extensive among gene families; and 2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family -- a protein kinase domain containing protein (PKc) -- from two C. uncinata strains. Analysis of the PKc sequences reveals: 1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and 2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903

  8. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    SciTech Connect

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A; Lindroth, Richard L; Yuan, Yinan

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  9. Human tissue factor pathway inhibitor (TFPI) gene: Complete genomic structure and localization on the genetic map of chromosome 2q

    SciTech Connect

    Enjyoji, Kei-ichi; Emi, Mitsuru; Mukai, Tsunehiro; Imada, Motohiro; Kato, Hisao; Leppert, M.L.; Lalouel, J.M. (Univ. of Utah Medical School, Salt Lake City, UT)

    1993-08-01

    Tissue factor pathway inhibitor (TFPI), a protease inhibitor that circulates in association with plasma lipoproteins (VLDL, LDL and HDL), helps to regulate the extrinsic blood coagulation cascade. The authors have cloned a 125-kb genomic region containing the entire human TFPI gene on six overlapping cosmids and prepared a restriction map of this contig to clarify gene structure. More than half (45 kb) of the 85-kb gene is occupied with 5' noncoding elements: coding begins at exon 3. A HindIII RFLP identified with one cosmid was genotyped in the CEPH panel of 559 reference families. Linkage analysis using markers on human chromosome 2 located the TFPI gene on 2q, 36 cM proximal to D2S43(pYNZ15) and 13 cM distal to the crystallin γ-polypeptide locus CRYGP1(p5G1). 31 refs., 3 figs., 3 tabs.

  10. Genomic structure and complete nucleotide sequence of the Batten disease gene, CLN3

    SciTech Connect

    Mitchison, H.M.; Munroe, P.B.; O'Rawe, A.M.

    1997-03-01

    We recently cloned a cDNA for CLN3, the gene for juvenile-onset neuronal ceroid lipofuscinosis or Batten disease. To resolve the genomic organization we used a cosmid clone containing CLN3 to sequence the entire gene in addition to 1.1 kb 5' of the start of the published CLN3 cDNA and 0.3 kb 3' to the polyadenylation site. CLN3 is organized into at least 15 exons spanning 15 kb and ranging from 47 to 356 bp. The 14 introns vary from 80 to 4227 bp, and all exon/intron junction sequences conform to the GT-AG rule. Numerous repetitive Alu elements are present within the introns and 5'- and 3'-untranslated regions. The 5' region of the CLN3 gene contains several potential transcription regulatory elements but no consensus TATA-1 box was identified. CLN3 is homologous to 27 deposited human ESTs, and sequence comparisons suggest alternative splicing of the gene and the existence of transcribed sequences upstream to the start of the published CLN3 cDNA. 19 refs., 2 figs., 1 tab.
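
    The splice-junction rule cited in this abstract (introns begin with the dinucleotide GT and end with AG) can be illustrated with a minimal sketch. The function name and the toy sequences are hypothetical illustrations and are not taken from the CLN3 sequence.

```python
def conforms_to_gt_ag(intron: str) -> bool:
    """Return True if an intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    # Require at least four bases so the two dinucleotides do not overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

# Made-up intron sequences for illustration only (not real CLN3 data).
toy_introns = ["GTAAGTCCTTAG", "gtgagtttcag", "ATAAGTCCTTAC"]
results = [conforms_to_gt_ag(s) for s in toy_introns]
print(results)
```

    In practice such a check would be run on intron sequences extracted from annotated exon coordinates; that extraction step is omitted here.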

  11. Characterization of the genomic structure of the mouse APLP1 gene

    SciTech Connect

    Zhong, Sue; Wu, Kuo; Black, I.B.; Schaar, D.G.

    1996-02-15

    This article reports on the organization of the mouse APLP1 gene, an evolutionarily conserved amyloid precursor-like protein. The amyloid beta protein, important in Alzheimer's disease, is derived from these precursor proteins. By investigating the expression and structure of this murine gene, it is hoped that more will be learned about the function and regulation of the human homologue. 27 refs., 2 figs.

  12. An analysis by restriction enzymes of the genomic structure of the 3' untranslated region of the human estrogen receptor gene.

    PubMed

    Keaveney, M; Neilan, J; Gannon, F

    1989-04-12

    The estrogen receptor gene has a very long 3' untranslated region. As a first step towards the analysis of this structural feature for any functional role, we have cloned the human genomic estrogen receptor gene. Extensive restriction enzyme analysis of this DNA and comparison of the sizes of the DNA fragments obtained with those predicted from published cDNA sequences indicate that the 3' exon extends for at least 4304 bases from base number 2018 in the cDNA to the end of the cDNA. The data also show that the most 3' intron in this gene occurs between bases 1902 and 2018 of the cDNA. PMID:2930778

  13. From genes to genome biology

    SciTech Connect

    Pennisi, E.

    1996-06-21

    This article describes a change in the approach to mapping genomes, from looking at one gene at a time, to other approaches. Strategies include everything from lab techniques to computer programs designed to analyze whole batches of genes at once. Also included is an update on the work on the human genome.

  14. Mapping of a gene coding for a major late structural polypeptide on the vaccinia virus genome.

    PubMed Central

    Wittek, R; Hänggi, M; Hiller, G

    1984-01-01

    Cell-free translation of total RNA isolated from vaccinia virus-infected cells late in infection results in a complex mixture of polypeptides. A monospecific antibody directed against one of the major structural proteins of the virus particle immunoprecipitated a single polypeptide with a molecular weight of 11,000 (11K) from this mixture. Immunoprecipitation was therefore used to identify the structural polypeptide among the in vitro translation products of RNA purified by hybridization selection to restriction fragments of the vaccinia virus genome. This allowed us to map the mRNA coding for the 11K polypeptide to the extreme left-hand end of the HindIII E fragment. Detailed transcriptional mapping of this region of the genome by nuclease S1 analysis revealed the presence of a late RNA transcribed from the rightward-reading strand. Its 5' end mapped at ca. 130 base pairs to the left of the HindIII site at the junction between the HindIII F and E fragments. The map position of this RNA coincided precisely with the map position of the late message coding for the 11K polypeptide. Images PMID:6319738

  15. Genomic structure and chromosomal localization of the human deoxycytidine kinase gene

    SciTech Connect

    Song, J.J.; Walker, S.; Gribbin, T.; Chen, E. (Univ. of North Carolina, Chapel Hill); Johnson, E.E.; Spychala, J.; Mitchell, B.S.

    1993-01-15

    Deoxycytidine kinase (NTP:deoxycytidine 5'-phosphotransferase, EC 2.7.1.74) is an enzyme that catalyzes phosphorylation of deoxyribonucleosides and a number of nucleoside analogs that are important in antiviral and cancer chemotherapy. Deficiency of this enzyme activity is associated with resistance to these agents, whereas increased enzyme activity is associated with increased activation of such compounds to cytotoxic nucleoside triphosphate derivatives. To characterize the regulation of expression of this gene, we have isolated genomic clones encompassing its entire coding and 5' flanking regions and delineated all the exon/intron boundaries. The gene extends over more than 34 kilobases on chromosome 4 and the coding region is composed of 7 exons ranging in size from 90 to 1544 base pairs (bp). The 5' flanking region is highly G+C-rich and contains four regions that are potential Sp1 binding sites. A 697-bp fragment encompassing 386 bp of 5' upstream region, the 250-bp first exon, and 61 bp of the first intron was demonstrated to promote chloramphenicol acetyltransferase activity in a T-lymphoblast cell line and to have >6-fold greater activity in a Jurkat T-lymphoblast than in a Raji B-lymphoblast cell line. Our data suggest that these 5' sequences may contain elements that are important for the tissue-specific differences in deoxycytidine kinase expression. 32 refs., 4 figs., 2 tabs.

  16. Genomic structure, gene expression, and promoter analysis of human multidrug resistance-associated protein 7

    SciTech Connect

    Kao, Hsin-Hsin; Chang, Ming-Shi; Cheng, Jan-Fang; Huang, Jin-Ding

    2002-03-15

    Multidrug resistance-associated protein (MRP) subfamily transporters, which are associated with anticancer drug efflux, contribute to the multidrug resistance of cancer cells. The genomic organization of human multidrug resistance-associated protein 7 (MRP7) was identified. The human MRP7 gene, consisting of 22 exons and 21 introns, greatly differs from other members of the human MRP subfamily. A splicing variant of human MRP7, MRP7A, expressed in most human tissues, was also characterized. The 1.93-kb promoter region of MRP7 was isolated and shown to support luciferase activity at a level 4- to 5-fold greater than that of the SV40 promoter. Basal MRP7 gene expression was regulated by 2 regions in the 5'-flanking region at 1,780 to 1,287 bp, and at 611 to 208 bp. In Madin-Darby canine kidney (MDCK) cells, MRP7 promoter activity was increased by 226 percent by genotoxic 2-acetylaminofluorene and 347 percent by the histone deacetylase inhibitor, trichostatin A. The protein was expressed in the membrane fraction of transfected MDCK cells.

  17. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions

    PubMed Central

    2013-01-01

    Background: The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results: Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions: Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes

  18. Genomic structure and chromosomal mapping of the human CD22 gene

    SciTech Connect

    Wilson, G.L.; Kozlow, E.; Kehrl, J.H.; Najfeld, V.; Menniger, J.; Ward, D.

    1993-06-01

    The human CD22 gene is expressed specifically in B lymphocytes and likely has an important function in cell-cell interactions. A nearly full length human CD22 cDNA clone was used to isolate genomic clones that span the CD22 gene. The CD22 gene is spread over 22 kb of DNA and is composed of 15 exons. The first exon contains the major transcriptional start sites. The translation initiation codon is located in exon 3, which also encodes a portion of the signal peptide. Exons 4 to 10 encode the seven Ig domains of CD22, exon 11 encodes the transmembrane domain, exons 12 to 15 encode the intracytoplasmic domain of CD22, and exon 15 also contains the 3' untranslated region. A minor form of CD22 mRNA likely results from splicing of exon 5 to exon 8, skipping exons 6 and 7. A 4.6-kb XbaI fragment of the CD22 gene was used to map the chromosomal location of CD22 by fluorescence in situ hybridization. The hybridization locus was identified by combining fluorescent images of the probe with the chromosomal banding pattern generated by an Alu probe. The results demonstrate that CD22 is located within the band region q13.1 of chromosome 19. Two closely clustered major transcription start sites and several minor start sites were mapped by primer extension. Similarly to many other lymphoid-specific genes, the CD22 promoter lacks an obvious TATA box. Approximately 4 kb of DNA 5' of the transcription start sites were sequenced and found to contain multiple Alu elements. Potential binding sites for the transcriptional factors NF-kB, AP-1, and Oct-2 are located within 300 bp 5' of the major transcription start sites. A 400-bp fragment (bp -339 through +71) of the CD22 promoter region was subcloned into a pGEM-chloramphenicol acetyltransferase vector and after transfection into B and T cells was found to be active in both B and T cells. 45 refs., 7 figs., 2 tabs.

  19. Clustering of gene ontology terms in genomes.

    PubMed

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and in what kinds of functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. PMID:24995610
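
    The idea of scanning a fixed number of consecutive genes, rather than a fixed genomic span, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the statistical test used, so the hypergeometric tail probability here is a stand-in, and the function names, the placeholder term labels ("GO:A", "GO:B"), and the toy gene list are all hypothetical.

```python
from math import comb

def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def window_scores(gene_terms, term, window):
    """Score every run of `window` consecutive genes for enrichment of one term.

    gene_terms: list of sets of term labels, in chromosomal gene order.
    Returns a list of (start_index, annotated_count, tail_probability).
    """
    flags = [term in terms for terms in gene_terms]
    N, K = len(flags), sum(flags)
    scores = []
    for start in range(N - window + 1):
        k = sum(flags[start:start + window])
        scores.append((start, k, hypergeom_tail(k, N, K, window)))
    return scores

# Toy chromosome of 10 genes: the first three share a placeholder term.
gene_terms = [{"GO:A"}, {"GO:A"}, {"GO:A"}] + [{"GO:B"} for _ in range(7)]
scores = window_scores(gene_terms, "GO:A", window=3)
best = min(scores, key=lambda s: s[2])  # window with the smallest tail probability
print(best)
```

    Because windows are counted in genes, regions of differing gene length or density are treated alike, which is the motivation the abstract gives for the fixed-gene-count design. A real analysis would also need multiple-testing correction across windows and terms, which this sketch omits.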

  20. Genome-wide prediction of nucleosome occupancy in maize reveals plant chromatin structural features at genes and other elements at multiple scales.

    PubMed

    Fincher, Justin A; Vera, Daniel L; Hughes, Diana D; McGinnis, Karen M; Dennis, Jonathan H; Bass, Hank W

    2013-06-01

    The nucleosome is a fundamental structural and functional chromatin unit that affects nearly all DNA-templated events in eukaryotic genomes. It is also a biochemical substrate for higher order, cis-acting gene expression codes and the monomeric structural unit for chromatin packaging at multiple scales. To predict the nucleosome landscape of a model plant genome, we used a support vector machine computational algorithm trained on human chromatin to predict the nucleosome occupancy likelihood (NOL) across the maize (Zea mays) genome. Experimentally validated NOL plots provide a novel genomic annotation that highlights gene structures, repetitive elements, and chromosome-scale domains likely to reflect regional gene density. We established a new genome browser (http://www.genomaize.org) for viewing support vector machine-based NOL scores. This annotation provides sequence-based comprehensive coverage across the entire genome, including repetitive genomic regions typically excluded from experimental genomics data. We find that transposable elements often displayed family-specific NOL profiles that included distinct regions, especially near their termini, predicted to have strong affinities for nucleosomes. We examined transcription start site consensus NOL plots for maize gene sets and discovered that most maize genes display a typical +1 nucleosome positioning signal just downstream of the start site but not upstream. This overall lack of a -1 nucleosome positioning signal was also predicted by our method for Arabidopsis (Arabidopsis thaliana) genes and verified by additional analysis of previously published Arabidopsis MNase-Seq data, revealing a general feature of plant promoters. Our study advances plant chromatin research by defining the potential contribution of the DNA sequence to observed nucleosome positioning and provides an invariant baseline annotation against which other genomic data can be compared. PMID:23572549

  1. The mouse lp(A3)/Edg7 lysophosphatidic acid receptor gene: genomic structure, chromosomal localization, and expression pattern.

    PubMed

    Contos, J J; Chun, J

    2001-04-18

    The extracellular signaling molecule, lysophosphatidic acid (LPA), mediates proliferative and morphological effects on cells and has been proposed to be involved in several biological processes including neuronal development, wound healing, and cancer progression. Three mammalian G protein-coupled receptors, encoded by genes designated lp (lysophospholipid) receptor or edg (endothelial differentiation gene), mediate the effects of LPA, activating similar (e.g. Ca(2+) release) as well as distinct (neurite retraction) responses. To understand the evolution and function of LPA receptor genes, we characterized lp(A3)/Edg7 in mouse and human and compared the expression pattern with the other two known LPA receptor genes (lp(A1)/Edg2 and lp(A2)/Edg4non-mutant). We found mouse and human lp(A3) to have nearly identical three-exon genomic structures, with introns upstream of the coding region for transmembrane domain (TMD) I and within the coding region for TMD VI. This structure is similar to lp(A1) and lp(A2), indicating a common ancestral gene with two introns. We localized mouse lp(A3) to distal Chromosome 3 near the varitint waddler (Va) gene, in a region syntenic with the human lp(A3) chromosomal location (1p22.3-31.1). We found highest expression levels of each of the three LPA receptor genes in adult mouse testes, relatively high expression levels of lp(A2) and lp(A3) in kidney, and moderate expression of lp(A2) and lp(A3) in lung. All lp(A) transcripts were expressed during brain development, with lp(A1) and lp(A2) transcripts expressed during the embryonic neurogenic period, and lp(A3) transcript during the early postnatal period. Our results indicate both overlapping as well as distinct functions of lp(A1), lp(A2), and lp(A3). PMID:11313151

  2. Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cycles of whole genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied...

  3. Genomic structure analysis of promoter sequence of a mouse mu opioid receptor gene.

    PubMed Central

    Min, B H; Augustin, L B; Felsheim, R F; Fuchs, J A; Loh, H H

    1994-01-01

    We have isolated mouse mu opioid receptor genomic clones (termed MOR) containing the entire amino acid coding sequence corresponding to rat MOR-1 cDNA, including additional 5' flanking sequence. The mouse MOR gene is > 53 kb long, and the coding sequence is divided by three introns, with exon junctions in codons 95 and 213 and between codons 386 and 387. The first intron is > 26 kb, the second is 0.8 kb, and the third is > 12 kb. Multiple transcription initiation sites were observed, with four major sites confirmed by 5' rapid amplification of cDNA ends and RNase protection located between 291 and 268 bp upstream of the translation start codon. Comparison of the 5' flanking sequence with a transcription factor database revealed putative cis-acting regulatory elements for transcription factors affected by cAMP, as well as those involved in the action of gluco- and mineralocorticoids, cytokines, and immune-cell-specific factors. Images PMID:8090773

  4. Chloroplast Genome Sequence of the Moss Tortula ruralis: Gene Content and Structural Arrangement Relative to Other Green Plant Chloroplast Genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Tortula ruralis, a widely distributed moss species in the family Pottiaceae, is increasingly being used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of Tortula ruralis, only the second publishe...

  5. Genomic Survey, Gene Expression Analysis and Structural Modeling Suggest Diverse Roles of DNA Methyltransferases in Legumes

    PubMed Central

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified into the known MET, CMT, DRM and DNA nucleotide methyltransferase (DNMT2) subfamilies based on their domain organization. The first three represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems has been reported to assign structural features in the context of the biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, the structural model of DRM suggested that circular permutation of motifs does not have any effect on the overall structure of the DNA methyltransferase domain. These results provide valuable insights into the role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and in response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes. PMID:24586452

  6. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage

    PubMed Central

    2012-01-01

    Background: Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research: Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion: The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants. PMID:22925495

  7. Evolution of Pulmonate Gastropod Mitochondrial Genomes: Comparisons of Gene Organizations of Euhadra, Cepaea and Albinaria and Implications of Unusual tRNA Secondary Structures

    PubMed Central

    Yamazaki, N.; Ueshima, R.; Terrett, J. A.; Yokobori, S. I.; Kaifu, M.; Segawa, R.; Kobayashi, T.; Numachi, K. I.; Ueda, T.; Nishikawa, K.; Watanabe, K.; Thomas, R. H.

    1997-01-01

    Complete gene organizations of the mitochondrial genomes of three pulmonate gastropods, Euhadra herklotsi, Cepaea nemoralis and Albinaria coerulea, permit comparisons of their gene organizations. Euhadra and Cepaea are classified in the same superfamily, Helicoidea, yet they show several differences in the order of tRNA and protein coding genes. Albinaria is distantly related to the other two genera but shares the same gene order in one part of its mitochondrial genome with Euhadra and in another part with Cepaea. Despite their small size (14.1-14.5 kbp), these snail mtDNAs encode 13 protein genes, two rRNA genes and at least 22 tRNA genes. These genomes exhibit several unusual or unique features compared to other published metazoan mitochondrial genomes, including those of other molluscs. Several tRNAs predicted from the DNA sequences possess bizarre structures lacking either the T stem or the D stem, similar to the situation seen in nematode mt-tRNAs. The acceptor stems of many tRNAs show a considerable number of mismatched basepairs, indicating that the RNA editing process recently demonstrated in Euhadra is widespread in the pulmonate gastropods. Strong selection acting on mitochondrial genomes of these animals would have resulted in frequent occurrence of the mismatched basepairs in regions of overlapping genes. PMID:9055084

  8. Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer’s Disease

    PubMed Central

    Stein, Jason L.; Hua, Xue; Morra, Jonathan H.; Lee, Suh; Hibar, Derrek P.; Ho, April J.; Leow, Alex D.; Toga, Arthur W.; Sul, Jae Hoon; Kang, Hyun Min; Eskin, Eleazar; Saykin, Andrew J.; Shen, Li; Foroud, Tatiana; Pankratz, Nathan; Huentelman, Matthew J.; Craig, David W.; Gerber, Jill D.; Allen, April N.; Corneveaux, Jason J.; Stephan, Dietrich A.; Webster, Jennifer; DeChairo, Bryan M.; Potkin, Steven G.; Jack, Clifford R.; Weiner, Michael W.; Thompson, Paul M.

    2010-01-01

    In a genome-wide association study of structural brain degeneration, we mapped the 3D profile of temporal lobe volume differences in 742 brain MRI scans of Alzheimer’s disease patients, mildly impaired, and healthy elderly subjects. After searching 546,314 genomic markers, 2 single nucleotide polymorphisms (SNPs) were associated with bilateral temporal lobe volume (P < 5×10^−7). One SNP, rs10845840, is located in the GRIN2B gene which encodes the N-Methyl-D-Aspartate (NMDA) glutamate receptor NR2B subunit. This protein - involved in learning and memory, and excitotoxic cell death - has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer’s disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI subjects versus controls (odds ratio = 1.273; P = 0.039) and were associated with the mini-mental state exam (MMSE; t = −2.114; P = 0.035) demonstrating a negative effect on global cognitive function. Voxelwise maps of genetic association of this SNP with regional brain volumes revealed intense temporal lobe effects (FDR correction at q = 0.05; critical P = 0.0257). This study uses large-scale brain mapping for gene discovery with implications for Alzheimer’s disease. PMID:20197096
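
    The abstract above reports an FDR correction at q = 0.05 with a critical P value, but does not say which procedure was used. As an illustration only, the sketch below assumes the common Benjamini-Hochberg step-up rule; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def bh_critical_p(p_values, q=0.05):
    """Return the largest p-value that passes the Benjamini-Hochberg
    step-up rule at level q, or None if no test passes.

    Assumption: this is the standard BH procedure, not necessarily
    the FDR method used in the study.
    """
    m = len(p_values)
    critical = None
    # Rank p-values from smallest to largest; keep the last one that
    # falls at or below its rank-scaled threshold q * rank / m.
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            critical = p
    return critical


# Hypothetical p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(bh_critical_p(example, q=0.05))
```

    Every test whose p-value is at or below the returned critical value would be declared significant at the chosen q.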

  9. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. PMID:26996788

  10. Genome-wide Analyses of the Structural Gene Families Involved in the Legume-specific 5-Deoxyisoflavonoid Biosynthesis of Lotus japonicus

    PubMed Central

    Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio

    2007-01-01

    A model legume Lotus japonicus (Regel) K. Larsen is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to the legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters in the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423

  11. Genomics screens for metastasis genes

    PubMed Central

    Yan, Jinchun; Huang, Qihong

    2014-01-01

    Metastasis is responsible for most cancer mortality. The process of metastasis is complex, requiring the coordinated expression and fine regulation of many genes in multiple pathways in both the tumor and host tissues. Identification and characterization of the genetic programs that regulate metastasis is critical to understanding the metastatic process and discovering molecular targets for the prevention and treatment of metastasis. Genomic approaches and functional genomic analyses can systemically discover metastasis genes. In this review, we summarize the genetic tools and methods that have been used to identify and characterize the genes that play critical roles in metastasis. PMID:22684367

  12. Characterisation of a genomic clone covering the structural mouse MyoD1 gene and its promoter region.

    PubMed Central

    Zingg, J M; Alva, G P; Jost, J P

    1991-01-01

    We have isolated the mouse MyoD1 gene flanked by its promoter region by screening a genomic library with synthetic oligonucleotides. The structural gene is interrupted by two G + C rich introns. Transfection of the cloned gene inserted into an expression vector converts fibroblasts to myoblasts. Sequence analysis of about 650 bp of the 5' upstream region revealed the presence of several potential regulatory elements such as a TATA-box, an AP2-box, two SP1-boxes and a CAAT-box. In addition, there are three half palindromic estrogen response elements, a potential cAMP response element and various muscle specific elements such as a muscle-specific CAAT-box (MCAT) and four potential binding sites for MyoD1. Using S1 protection analysis the major start site of transcription in muscle and myoblast cells was mapped 3 bp upstream of the published cDNA 5' end. Promoter activity of the 650 bp upstream fragment was tested by in vitro transcription and by transfection analysis of myoblasts and fibroblasts. In all promoter test systems used, MyoD1 promoter activity was detected in myoblasts as well as in fibroblasts. Furthermore, DNA methylation was found to turn off MyoD1 promoter activity both in myoblasts and in fibroblasts. PMID:1754380

  13. Genomic structure of the human plasma prekallikrein gene, identification of allelic variants, and analysis in end-stage renal disease.

    PubMed

    Yu, H; Anderson, P J; Freedman, B I; Rich, S S; Bowden, D W

    2000-10-15

    Kallikreins are serine proteases that catalyze the release of kinins and other vasoactive peptides. Previously, we have studied one tissue-specific (H. Yu et al., 1996, J. Am. Soc. Nephrol. 7: 2559-2564) and one plasma-specific (H. Yu et al., 1998, Hypertension 31: 906-911) human kallikrein gene in end-stage renal disease (ESRD). Short sequence repeat polymorphisms for the human plasma kallikrein gene (KLKB1; previously known as KLK3) on chromosome 4 were associated with ESRD in an African American study population. This study of KLKB1 in ESRD has been extended by determining the genomic structure of KLKB1 and searching for allelic variants that may be associated with ESRD. Exon-spanning PCR primer sets were identified by serial testing of primer pairs designed from KLKB1 cDNA sequence and DNA sequencing of PCR products. Like the rat plasma kallikrein gene and the closely related human factor XI gene, the human KLKB1 gene contains 15 exons and 14 introns. The longest intron, F, is almost 12 kb long. The total length of the gene is approximately 30 kb. Sequence of the 5'-proximal promoter region of KLKB1 was obtained by shotgun cloning of genomic fragments from a bacterial artificial clone containing the KLKB1 gene, followed by screening of the clones using exon 1-specific probes. Primers flanking the exons and 5'-proximal promoter region were used to screen for allelic variants in the genomic DNA from ESRD patients and controls using the single-strand conformation polymorphism technique. We identified 12 allelic variants in the 5'-proximal promoter and 7 exons. Of note was a common polymorphism (30% of the population) at position 521 of KLKB1 cDNA, which leads to the replacement of asparagine with a serine at position 124 in the heavy chain of the A2 domain of the protein. In addition, an A716C polymorphism in exon 7 resulting in the amino acid change H189P in the A3 domain of the heavy chain was observed in 5 patients belonging to 3 ESRD families. A third

  14. Cytosines, but not purines, determine recombination activating gene (RAG)-induced breaks on heteroduplex DNA structures: implications for genomic instability.

    PubMed

    Naik, Abani Kanta; Lieber, Michael R; Raghavan, Sathees C

    2010-03-01

    The sequence specificity of the recombination activating gene (RAG) complex during V(D)J recombination has been well studied. RAGs can also act as a structure-specific nuclease; however, little is known about the mechanism of its action. Here, we show that in addition to DNA structure, sequence dictates the pattern and efficiency of RAG cleavage on altered DNA structures. Cytosine nucleotides are preferentially nicked by RAGs when present at single-stranded regions of heteroduplex DNA. Although unpaired thymine nucleotides are also nicked, the efficiency is many fold weaker. Induction of single- or double-strand breaks by RAGs depends on the position of cytosines and whether it is present on one or both of the strands. Interestingly, RAGs are unable to induce breaks when adenine or guanine nucleotides are present at single-strand regions. The nucleotide present immediately next to the bubble sequence could also affect RAG cleavage. Hence, we propose "C(d)C(s)C(s)" (d, double-stranded; s, single-stranded) as a consensus sequence for RAG-induced breaks at single-/double-strand DNA transitions. Such a consensus sequence motif is useful for explaining RAG cleavage on other types of DNA structures described in the literature. Therefore, the mechanism of RAG cleavage described here could explain facets of chromosomal rearrangements specific to lymphoid tissues leading to genomic instability. PMID:20051517

  15. Dynamic structures in phytoplasma genomes: sequence variable mosaics (SVMs) of clustered genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Emergence of the phytoplasma clade from an Acholeplasma-like ancestor gave rise to an intriguing group of cell wall-less prokaryotes through a remarkable and continuing evolutionary process. In a ceaseless progression, phytoplasmas have evolved reduced genomes, losing biochemical pathways for synth...

  16. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associate the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  17. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

    PubMed Central

    Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

    2012-01-01

    Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921

  18. The impact of genome-wide supported schizophrenia risk variants in the neurogranin gene on brain structure and function.

    PubMed

    Walton, Esther; Geisler, Daniel; Hass, Johanna; Liu, Jingyu; Turner, Jessica; Yendiki, Anastasia; Smolka, Michael N; Ho, Beng-Choon; Manoach, Dara S; Gollub, Randy L; Roessner, Veit; Calhoun, Vince D; Ehrlich, Stefan

    2013-01-01

    The neural mechanisms underlying genetic risk for schizophrenia, a highly heritable psychiatric condition, are still under investigation. New schizophrenia risk genes discovered through genome-wide association studies (GWAS), such as neurogranin (NRGN), can be used to identify these mechanisms. In this study we examined the association of two common NRGN risk single nucleotide polymorphisms (SNPs) with functional and structural brain-based intermediate phenotypes for schizophrenia. We obtained structural, functional MRI and genotype data of 92 schizophrenia patients and 114 healthy volunteers from the multisite Mind Clinical Imaging Consortium study. Two schizophrenia-associated NRGN SNPs (rs12807809 and rs12541) were tested for association with working memory-elicited dorsolateral prefrontal cortex (DLPFC) activity and surface-wide cortical thickness. NRGN rs12541 risk allele homozygotes (TT) displayed increased working memory-related activity in several brain regions, including the left DLPFC, left insula, left somatosensory cortex and the cingulate cortex, when compared to non-risk allele carriers. NRGN rs12807809 non-risk allele (C) carriers showed reduced cortical gray matter thickness compared to risk allele homozygotes (TT) in an area comprising the right pericalcarine gyrus, the right cuneus, and the right lingual gyrus. Our study highlights the effects of schizophrenia risk variants in the NRGN gene on functional and structural brain-based intermediate phenotypes for schizophrenia. These results support recent GWAS findings and further implicate NRGN in the pathophysiology of schizophrenia by suggesting that genetic NRGN risk variants contribute to subtle changes in neural functioning and anatomy that can be quantified with neuroimaging methods. PMID:24098564

  19. THE FAD2 GENE FAMILY OF SOYBEAN: INSIGHTS INTO THE STRUCTURAL AND FUNCTIONAL DIVERGENCE OF A PALEOPOLYPLOID GENOME

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The omega-6 fatty acid desaturase (FAD2) gene family in soybean consists of at least five members in four regions of the genome. These desaturases are responsible for the conversion of oleic acid to linoleic acid. Bacterial artificial chromosomes (BACs) corresponding to these loci were sequenced to ...

  20. Comparative assessment of the pig, mouse, and human genomes: A structural and functional analysis of genes involved in immunity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A detailed analysis was conducted on portions of the porcine, murine, and human genome associated with the immune response. It was found that non-protein coding RNA/DNA that potentially interact and regulate gene expression, nucleotide similarity, isochore type, and the similarity of 5’ and 3’ UTR ...

  1. Gene expression patterns are correlated with genomic and genic structure in soybean

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Studies have indicated that exon and intron size, and intergenic distance are correlated with gene expression levels and expression breadth. Previous studies on these correlations in plants and animals have been conflicting. In this study next-generation sequence data of the soybean transcriptome wa...

  2. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    PubMed Central

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  3. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity

    PubMed Central

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation–induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome. PMID:25922569
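
    The preference for A- and T-ending codons (NNA and NNT) described above can be illustrated by tallying the third base of each codon in a coding sequence. The sketch below is a minimal illustration, not the authors' analysis pipeline; the function name and the toy sequence are hypothetical, and published SCUB studies typically use more refined measures.

```python
from collections import Counter


def third_base_fractions(cds):
    """Fraction of codons ending in each base for an in-frame coding sequence.

    Assumption: `cds` starts at codon position 1; trailing bases that do
    not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    # Take the third base of every complete codon.
    counts = Counter(cds[i + 2] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no complete codon")
    return {base: counts.get(base, 0) / total for base in "ACGT"}


# Hypothetical 7-codon sequence, for illustration only.
toy_cds = "ATGGCTAAATTTGGACGTTAA"
fractions = third_base_fractions(toy_cds)
print(fractions["A"] + fractions["T"])  # share of NNA + NNT codons
```

    Comparing this share across genes with different intron numbers, or across exons, is one simple way to check whether the bias depends on gene structure.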

  4. Chromosomal localization, genomic structure, and allelic polymorphism of the human CD79a (Ig-α/mb-1) gene

    SciTech Connect

    Hashimoto, S.; Gregersen, P.K.; Chiorazzi, N.; Mohrenweiser, H.W.

    1994-12-31

    The germline DNA sequence of the human CD79a (Ig-α/mb-1) gene was determined by polymerase chain reaction sequencing of a cosmid clone derived from an arrayed human chromosome 19 library. The CD79a gene was localized to chromosome 19q13.2; this localization places the gene within the CEA-like gene cluster with the following gene order: -CEA-CGM1-CD79a-RPS11-ATP1A3-BGP-CGM9-. The genomic organization of the human CD79a gene resembles the mouse counterpart with five exons interrupted by four introns. Computer analyses suggest the presence of transcription regulatory elements known to be important in the regulation of mouse CD79a (AP-1, EBF, AP-2, MUF2, and SP-1 sites), as well as elements not found in the mouse gene (an NF-κB binding site and a series of E-box motifs). Similar to the mouse gene, the 5′ flanking region of human CD79a lacks a TATA box; however, unlike mouse CD79a, a classical octamer motif could not be identified in the human gene. Finally, a new Rsa I restriction fragment length polymorphism was defined in the non-coding regions of the human gene. 64 refs., 4 figs., 2 tabs.

  5. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... guía de genómica A Brief Guide to Genomics DNA, Genes and Genomes Deoxyribonucleic acid (DNA) is the ... and lead to a disease such as cancer. DNA Sequencing Sequencing simply means determining the exact order ...

  6. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    PubMed Central

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  8. Characterization of gene rearrangements resulted from genomic structural aberrations in human esophageal squamous cell carcinoma KYSE150 cells.

    PubMed

    Hao, Jia-Jie; Gong, Ting; Zhang, Yu; Shi, Zhi-Zhou; Xu, Xin; Dong, Jin-Tang; Zhan, Qi-Min; Fu, Song-Bin; Wang, Ming-Rong

    2013-01-15

    Chromosomal rearrangements and the genes involved have been reported to play important roles in the development and progression of human malignancies, but the gene rearrangements in esophageal squamous cell carcinoma (ESCC) remain to be identified. In the present study, array-based comparative genomic hybridization (array-CGH) was performed on the ESCC cell line KYSE150. Eight disrupted genes were detected according to the obviously distinct unbalanced breakpoints. The splitting of these genes was validated by dual-color fluorescence in-situ hybridization (FISH). By using rapid amplification of cDNA ends (RACE), genome walking and sequencing analysis, we further identified gene disruptions and rearrangements. A fusion transcript DTL-1q42.2 was derived from an intrachromosomal rearrangement of chromosome 1. Highly amplified segments of DTL and PTPRD were self-rearranged. The sequences on either side of the junctions possess micro-homology with each other. FISH results indicated that the split DTL and PTPRD were also involved in comprising parts of the derivative chromosomes resulting from t(1q;9p;12p) and t(9;1;9). Further, we found that regions harboring DTL (1q32.3) and PTPRD (9p23) were also split in ESCC tumors. The data supplement significant information on the existing genetic background of KYSE150, which may be used as a model for studying these gene rearrangements. PMID:23026210

  9. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    PubMed

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus. PMID:26941188

  10. Tripartite mitochondrial genome of spinach: physical structure, mitochondrial gene mapping, and locations of transposed chloroplast DNA sequences.

    PubMed Central

    Stern, D B; Palmer, J D

    1986-01-01

    A complete physical map of the spinach mitochondrial genome has been established. The entire sequence content of 327 kilobase pairs (kb) is postulated to occur as a single circular molecule. Two directly repeated elements of approximately 6 kb, located on this "master chromosome", are proposed to participate in an intragenomic recombination event that reversibly generates two "subgenomic" circles of 93 kb and 234 kb. The positions of protein and ribosomal RNA-encoding genes, determined by heterologous filter hybridizations, are scattered throughout the genome, with duplicate 26S rRNA genes located partially or entirely within the 6 kb repeat elements. Filter hybridizations between spinach mitochondrial DNA and cloned segments of spinach chloroplast DNA reveal at least twelve dispersed regions of inter-organellar sequence homology. PMID:3016660
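
    The size relationship in the abstract above (93 kb + 234 kb = 327 kb) follows from how recombination between two direct repeats resolves a circle: the segment running from one repeat copy to the other is excised as one circle, and the remainder closes into a second. The sketch below illustrates only that arithmetic. The function name and the repeat coordinates are hypothetical, chosen to reproduce the reported sizes; the abstract does not give map positions.

```python
def subgenomic_circles(master_kb, repeat_a_start_kb, repeat_b_start_kb):
    """Sizes (kb) of the two circles formed when a circular molecule
    recombines across two direct repeats.

    Coordinates are the start positions of the two repeat copies on the
    master circle. Each product circle keeps exactly one repeat copy.
    """
    if not (0 <= repeat_a_start_kb < repeat_b_start_kb < master_kb):
        raise ValueError("expected 0 <= repeat A start < repeat B start < master size")
    # One circle spans from a point in repeat A to the same point in repeat B.
    first = repeat_b_start_kb - repeat_a_start_kb
    # The other circle is everything that is left of the master circle.
    second = master_kb - first
    return first, second


# Hypothetical repeat positions (0 kb and 93 kb) on the 327 kb master circle.
small, large = subgenomic_circles(327, 0, 93)
print(small, large)  # 93 234
```

    Because the two products always sum to the master size, the same function also describes the reverse (fusion) reaction proposed in the abstract.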

  11. Genome-wide identification, structural analysis and new insights into late embryogenesis abundant (LEA) gene family formation pattern in Brassica napus

    PubMed Central

    Liang, Yu; Xiong, Ziyi; Zheng, Jianxiao; Xu, Dongyang; Zhu, Zeyang; Xiang, Jun; Gan, Jianping; Raboanatahiry, Nadia; Yin, Yongtai; Li, Maoteng

    2016-01-01

    Late embryogenesis abundant (LEA) proteins are a diverse and large group of polypeptides that play important roles in desiccation and freezing tolerance in plants. The LEA family has been systematically characterized in some plants but not Brassica napus. In this study, 108 BnLEA genes were identified in the B. napus genome and classified into eight families based on their conserved domains. Protein sequence alignments revealed an abundance of alanine, lysine and glutamic acid residues in BnLEA proteins. The BnLEA gene structure has few introns (<3), and they are distributed unevenly across all 19 chromosomes in B. napus, occurring as gene clusters in chromosomes A9, C2, C4 and C5. More than two-thirds of the BnLEA genes are associated with segmental duplication. Synteny analysis revealed that most LEA genes are conserved, although gene losses or gains were also identified. These results suggest that segmental duplication and whole-genome duplication played a major role in the expansion of the BnLEA gene family. Expression profiles analysis indicated that expression of most BnLEAs was increased in leaves and late stage seeds. This study presents a comprehensive overview of the LEA gene family in B. napus and provides new insights into the formation of this family. PMID:27072743

  13. Genomic contributions in livestock gene introgression programmes

    PubMed Central

    Wall, Eileen; Visscher, Peter M; Hospital, Frédéric; Woolliams, John A

    2005-01-01

    The composition of the genome after introgression of a marker gene from a donor to a recipient breed was studied using analytical and simulation methods. Theoretical predictions of proportional genomic contributions, including donor linkage drag, from ancestors used at each generation of crossing after an introgression programme agreed closely with simulated results. The obligate drag, the donor genome surrounding the target locus that cannot be removed by subsequent selection, was also studied. It was shown that the number of backcross generations and the length of the chromosome affected proportional genomic contributions to the carrier chromosomes. Population structure had no significant effect on ancestral contributions and linkage drag but it did have an effect on the obligate drag whereby larger offspring groups resulted in smaller obligate drag. The implications for an introgression programme of the number of backcross generations, the population structure and the carrier chromosome length are discussed. The equations derived describing contributions to the genome from individuals from a given generation provide a framework to predict the genomic composition of a population after the introgression of a favourable donor allele. These ancestral contributions can be assigned a value and therefore allow the prediction of genetic lag. PMID:15823237
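
    As a rough illustration of the ancestral contributions discussed above, the sketch below computes the textbook genome-wide expectation for a plain backcross scheme: every cross to the recipient breed halves the contribution of each earlier ancestor. This is a simplifying assumption and not the equations derived in the paper; it ignores selection on the marker, linkage drag and obligate drag around the target locus, and the function and key names are hypothetical.

```python
from fractions import Fraction


def ancestral_contributions(backcross_generations):
    """Expected genome-wide contribution of each ancestor after t backcrosses.

    Generation 0 is the F1 (donor x recipient). In backcross generation k
    a new recipient-breed parent is used. Assumes unlinked loci and no
    selection, so each contribution halves with every further cross.
    """
    t = backcross_generations
    if t < 0:
        raise ValueError("number of backcross generations must be >= 0")
    half = Fraction(1, 2)
    contributions = {
        "donor": half ** (t + 1),
        "recipient_F1_parent": half ** (t + 1),
    }
    for k in range(1, t + 1):
        # The parent used in backcross k is diluted by the t - k later crosses.
        contributions[f"recipient_BC{k}_parent"] = half ** (t - k + 1)
    return contributions


for name, share in ancestral_contributions(3).items():
    print(name, share)
```

    Under these assumptions the shares always sum to 1, and the donor share after three backcrosses is 1/16; the paper's point is that the carrier chromosome departs from this expectation because of drag around the introgressed allele.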

  14. Heat Shock Protein 70 and 90 Genes in the Harmful Dinoflagellate Cochlodinium polykrikoides: Genomic Structures and Transcriptional Responses to Environmental Stresses

    PubMed Central

    Guo, Ruoyu; Youn, Seok Hyun; Ki, Jang-Seu

    2015-01-01

    The marine dinoflagellate Cochlodinium polykrikoides is responsible for harmful algal blooms in aquatic environments and has spread into the world's oceans. As a microeukaryote, it seems to have distinct genomic characteristics, like gene structure and regulation. In the present study, we characterized heat shock protein (HSP) 70/90 of C. polykrikoides and evaluated their transcriptional responses to environmental stresses. Both HSPs contained the conserved motif patterns, showing the highest homology with those of other dinoflagellates. Genomic analysis showed that CpHSP70 had no intron but was encoded in a tandem arrangement, with copies separated by intergenic spacers. However, CpHSP90 had one intron in the coding genomic regions, and no intergenic region was found. Phylogenetic analyses of separate HSPs showed that CpHSP70 was closely related to that of the dinoflagellate Crypthecodinium cohnii and CpHSP90 to those of other Gymnodiniales in dinoflagellates. Gene expression analyses showed that both HSP genes were upregulated by the treatments of separate algicides CuSO4 and NaOCl; however, they displayed a downregulation pattern with PCB treatment. The transcription of CpHSP90 and CpHSP70 showed similar expression patterns under the same toxicant treatment, suggesting that both genes might have cooperative functions for the toxicant-induced gene regulation in the dinoflagellate. PMID:26064872

  15. Genes for two calcium-dependent cell adhesion molecules have similar structures and are arranged in tandem in the chicken genome.

    PubMed Central

    Sorkin, B C; Gallin, W J; Edelman, G M; Cunningham, B A

    1991-01-01

    Genomic sequences immediately upstream of the translational start site for the chicken liver cell adhesion molecule (L-CAM) gene contain a second closely related gene, which, because of its location, we have designated the K-CAM gene. Less than 700 base pairs separate the presumed poly(A) site in the K-CAM gene from the translation initiation site for L-CAM. The sizes of exons 4-15 of the K-CAM gene are almost identical to those in the L-CAM gene and the exon/intron junctions occur at exactly equivalent positions in both genes. Exon 16, which includes the 3' untranslated region, is much shorter in the K-CAM gene and intron sizes and sequences are not generally conserved between the two genes. Probes from the K-CAM gene hybridized to a 3-kilobase mRNA that was present at high levels in embryonic skin, at lower levels in kidney, heart, and gizzard, and at still lower levels in brain and liver, as determined by Northern blotting. The sequence of the predicted gene product was nearly identical to that of the chicken B-cadherin cDNA, although the distribution of the K-CAM gene transcript differed from that reported for the cadherin. The proximity and identical overall structure of the K-CAM and L-CAM genes strongly suggest that they arose by gene duplication and raise the possibility that genes for other calcium-dependent CAMs may be located in clusters. Moreover, the tandem arrangement of the genes may have important implications for the regulation of their expression. PMID:1763068

  16. Horizontal gene transfer and the rock record: comparative genomics of phylogenetically distant bacteria that induce wrinkle structure formation in modern sediments.

    PubMed

    Flood, B E; Bailey, J V; Biddle, J F

    2014-03-01

    Wrinkle structures are sedimentary features that are produced primarily through the trapping and binding of siliciclastic sediments by mat-forming micro-organisms. Wrinkle structures and related sedimentary structures in the rock record are commonly interpreted to represent the stabilizing influence of cyanobacteria on sediments because cyanobacteria are known to produce similar textures and structures in modern tidal flat settings. However, other extant bacteria such as filamentous representatives of the family Beggiatoaceae can also interact with sediments to produce sedimentary features that morphologically resemble many of those associated with cyanobacteria-dominated mats. While Beggiatoa spp. and cyanobacteria are metabolically and phylogenetically distant, genomic analyses show that the two groups share hundreds of homologous genes, likely as the result of horizontal gene transfer. The comparative genomics results described here suggest that some horizontally transferred genes may code for phenotypic traits such as filament formation, chemotaxis, and the production of extracellular polymeric substances that potentially underlie the similar biostabilizing influences of these organisms on sediments. We suggest that the ecological utility of certain basic life modes such as the construction of mats and biofilms, coupled with the lateral mobility of genes in the microbial world, introduces an element of uncertainty into the inference of specific phylogenetic origins from gross morphological features preserved in the ancient rock record. PMID:24382125

  17. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype.

    PubMed

    Linz, John E; Wee, Josephine; Roze, Ludmila V

    2014-08-01

    The filamentous fungi Aspergillus parasiticus and Aspergillus flavus produce the carcinogenic secondary metabolite aflatoxin on susceptible crops. These species differ in the quantity of aflatoxins B1, B2, G1, and G2 produced in culture, in the ability to produce the mycotoxin cyclopiazonic acid, and in morphology of mycelia and conidiospores. To understand the genetic basis for differences in biochemistry and morphology, we conducted next-generation sequence (NGS) analysis of the A. parasiticus strain SU-1 genome and comparative gene expression (RNA sequence analysis [RNA Seq]) analysis of A. parasiticus SU-1 and A. flavus strain NRRL 3357 (3357) grown under aflatoxin-inducing and -noninducing culture conditions. Although A. parasiticus SU-1 and A. flavus 3357 are highly similar in genome structure and gene organization, we observed differences in the presence of specific mycotoxin gene clusters and differential expression of specific mycotoxin genes and gene clusters that help explain differences in the type and quantity of mycotoxins synthesized. Using computer-aided analysis of secondary metabolite clusters (antiSMASH), we demonstrated that A. parasiticus SU-1 and A. flavus 3357 may carry up to 93 secondary metabolite gene clusters, and surprisingly, up to 10% of the genome appears to be dedicated to secondary metabolite synthesis. The data also suggest that fungus-specific zinc binuclear cluster (C6) transcription factors play an important role in regulation of secondary metabolite cluster expression. Finally, we identified uniquely expressed genes in A. parasiticus SU-1 that encode C6 transcription factors and genes involved in secondary metabolism and stress response/cellular defense. Future work will focus on these differentially expressed A. parasiticus SU-1 loci to reveal their role in determining distinct species characteristics. PMID:24951444

  18. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having a broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  19. Global efforts in structural genomics.

    PubMed

    Stevens, R C; Yokoyama, S; Wilson, I A

    2001-10-01

    A worldwide initiative in structural genomics aims to capitalize on the recent successes of the genome projects. Substantial new investments in structural genomics in the past 2 years indicate the high level of support for these international efforts. Already, enormous progress has been made on high-throughput methodologies and technologies that will speed up macromolecular structure determinations. Recent international meetings have resulted in the formation of an International Structural Genomics Organization to formulate policy and foster cooperation between the public and private efforts. PMID:11588249

  20. Characterization of the genomic structure and tissue-specific promoter of the human nuclear receptor NR5A2 (hB1F) gene.

    PubMed

    Zhang, C K; Lin, W; Cai, Y N; Xu, P L; Dong, H; Li, M; Kong, Y Y; Fu, G; Xie, Y H; Huang, G M; Wang, Y

    2001-08-01

    The human homologue of the Drosophila melanogaster orphan nuclear receptor fushi tarazu factor 1 (Ftz-F1), NR5A2 (hB1F), was initially identified as a regulatory factor that binds and activates enhancer II of hepatitis B virus. NR5A2 (hB1F) is expressed specifically in pancreas and liver, playing important roles in the regulation of several liver-specific genes. A detailed analysis of the genomic structure and promoter activity will greatly promote future studies on the function of the NR5A2 (hB1F) gene. In this report, a bacterial artificial chromosome clone and several phage clones covering the NR5A2 (hB1F) gene were isolated and the complete genomic sequence was obtained. Alignment of different cDNAs of the NR5A2 (hB1F) gene with the genomic sequence facilitated the delineation of its structural organization, which spans over 150 kb and consists of eight exons interrupted by seven introns. RT-PCR and 3'-RACE revealed that utilization of two polyadenylation signals results in the 3.8 and 5.2 kb transcripts that were observed previously. The transcription start site of the NR5A2 (hB1F) gene was mapped downstream of a canonical TATA box. An upstream fragment containing binding sites for several liver-specific and ubiquitous transcription factors exhibits hepatocyte-specific promoter activity. Transient transfections indicated that hepatocyte nuclear factors HNF1 and HNF3beta could activate the NR5A2 (hB1F) promoter. PMID:11595170

  1. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of the CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. A bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of the two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots in the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, the LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) of the main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  2. Genomic organization of mouse gene zfp162.

    PubMed

    Wrehlke, C; Wiedemeyer, W R; Schmitt-Wrede, H P; Mincheva, A; Lichter, P; Wunderlich, F

    1999-05-01

    We report the cloning and characterization of the alternatively spliced mouse gene zfp162, formerly termed mzfm, the homolog of the human ZFM1 gene encoding the splicing factor SF1 and a putative signal transduction and activation of RNA (STAR) protein. The zfp162 gene is about 14 kb long and consists of 14 exons and 13 introns. Comparison of zfp162 with the genomic sequences of ZFM1/SF1 revealed that the exon-intron structure and exon sequences are well conserved between the genes, whereas the introns differ in length and sequence composition. Using fluorescent in situ hybridization, the zfp162 gene was assigned to chromosome 19, region B. Screening of a genomic library integrated in lambda DASH II resulted in the identification of the 5'-flanking region of zfp162. Sequence analysis of this region showed that zfp162 is a TATA-less gene containing an initiator control element and two CCAAT boxes. The promoter exhibits the following motifs: AP-2, CRE, Ets, GRE, HNF5, MRE, SP-1, TRE, TCF1, and PU.1. The core promoter, from position -331 to -157, contains the motifs CRE, SP-1, MRE, and AP-2, as determined in transfected CHO-K1 cells and IC-21 cells by reporter gene assay using a secreted form of human placental alkaline phosphatase. The occurrence of PU.1/GRE supports the view that the zfp162 gene encodes a protein involved not only in nuclear RNA metabolism, as the human ZFM1/SF1, but also in as yet unknown macrophage-inherent functions. PMID:10360842

  3. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    PubMed Central

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.; Bogdanove, Adam J.

    2016-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456

  4. Persistence drives gene clustering in bacterial genomes

    PubMed Central

    Fang, Gang; Rocha, Eduardo PC; Danchin, Antoine

    2008-01-01

    Background: Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results: We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion: We show that it is the strong selective pressure acting on the function of persistent genes, in a permanent state of flux of genes in bacterial genomes, maintaining their size fairly constant, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering. PMID:18179692

  5. Genome Majority Vote Improves Gene Predictions

    PubMed Central

    Wall, Michael E.; Raghavan, Sindhu; Cohn, Judith D.; Dunbar, John

    2011-01-01

    Recent studies have noted extensive inconsistencies in gene start sites among orthologous genes in related microbial genomes. Here we provide the first documented evidence that imposing gene start consistency improves the accuracy of gene start-site prediction. We applied an algorithm using a genome majority vote (GMV) scheme to increase the consistency of gene starts among orthologs. We used a set of validated Escherichia coli genes as a standard to quantify accuracy. Results showed that the GMV algorithm can correct hundreds of gene prediction errors in sets of five or ten genomes while introducing few errors. Using a conservative calculation, we project that GMV would resolve many inconsistencies and errors in publicly available microbial gene maps. Our simple and logical solution provides a notable advance toward accurate gene maps. PMID:22131910
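
    The idea of voting on gene starts across orthologs can be sketched as follows. This is a minimal illustration of a majority-vote rule, not the published GMV algorithm: it assumes the start sites of one ortholog group have already been mapped to a shared alignment coordinate, requires a strict majority before proposing any change, and uses hypothetical genome labels and a hypothetical function name.

```python
from collections import Counter


def majority_vote_starts(ortholog_starts):
    """Propose a consensus start site for one group of orthologous genes.

    ortholog_starts maps a genome label to the predicted start position,
    expressed in a shared alignment coordinate (an assumption of this sketch).
    Returns (consensus_start, corrections), where corrections maps each
    dissenting genome to the consensus start. If no start is supported by
    a strict majority, returns (None, {}) and proposes nothing.
    """
    if not ortholog_starts:
        raise ValueError("need at least one predicted start")
    counts = Counter(ortholog_starts.values())
    top_start, top_count = counts.most_common(1)[0]
    if top_count * 2 <= len(ortholog_starts):
        return None, {}  # no strict majority: leave predictions untouched
    corrections = {
        genome: top_start
        for genome, start in ortholog_starts.items()
        if start != top_start
    }
    return top_start, corrections


# Five hypothetical genomes; four agree on alignment column 12.
starts = {"g1": 12, "g2": 12, "g3": 12, "g4": 30, "g5": 12}
print(majority_vote_starts(starts))  # (12, {'g4': 12})
```

    Requiring a strict majority is one conservative design choice; it keeps the rule from overwriting predictions when the orthologs genuinely disagree.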

  6. A joint modeling approach for uncovering associations between gene expression, bioactivity and chemical structure in early drug discovery to guide lead selection and genomic biomarker development.

    PubMed

    Perualila-Tan, Nolen; Kasim, Adetayo; Talloen, Willem; Verbist, Bie; Göhlmann, Hinrich W H; Shkedy, Ziv

    2016-08-01

    The modern drug discovery process involves multiple sources of high-dimensional data. This imposes the challenge of data integration. A typical example is the integration of chemical structure (fingerprint features), phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data in early drug discovery to better understand the chemical and biological mechanisms of candidate drugs, and to facilitate early detection of safety issues prior to later and expensive phases of drug development cycles. In this paper, we discuss a joint model for the transcriptomic and the phenotypic variables conditioned on the chemical structure. This modeling approach can be used to uncover, for a given set of compounds, the association between gene expression and biological activity taking into account the influence of the chemical structure of the compound on both variables. The model allows the detection of genes that are associated with the bioactivity data, facilitating the identification of potential genomic biomarkers for compound efficacy. In addition, the effect of every structural feature on both genes and pIC50 and their associations can be simultaneously investigated. Two oncology projects are used to illustrate the applicability and usefulness of the joint model to integrate multi-source high-dimensional information to aid drug discovery. PMID:27269248

  7. Genome-wide comparative analysis of the Brassica rapa gene space reveals genome shrinkage and differential loss of duplicated genes after whole genome triplication

    PubMed Central

    2009-01-01

    Background: Brassica rapa is one of the most economically important vegetable crops worldwide. Owing to its agronomic importance and phylogenetic position, B. rapa provides a crucial reference to understand polyploidy-related crop genome evolution. The high degree of sequence identity and remarkably conserved genome structure between Arabidopsis and Brassica genomes enables comparative tiling sequencing using Arabidopsis sequences as references to select the counterpart regions in B. rapa, which is a strong challenge of structural and comparative crop genomics. Results: We assembled 65.8 megabase-pairs of non-redundant euchromatic sequence of B. rapa and compared this sequence to the Arabidopsis genome to investigate chromosomal relationships, macrosynteny blocks, and microsynteny within blocks. The triplicated B. rapa genome contains only approximately twice the number of genes as in Arabidopsis because of genome shrinkage. Genome comparisons suggest that B. rapa has a distinct organization of ancestral genome blocks as a result of recent whole genome triplication followed by a unique diploidization process. A lack of the most recent whole genome duplication (3R) event in the B. rapa genome, atypical of other Brassica genomes, may account for the emergence of B. rapa from the Brassica progenitor around 8 million years ago. Conclusions: This work demonstrates the potential of using comparative tiling sequencing for genome analysis of crop species. Based on a comparative analysis of the B. rapa sequences and the Arabidopsis genome, it appears that polyploidy and chromosomal diploidization are ongoing processes that collectively stabilize the B. rapa genome and facilitate its evolution. PMID:19821981

  8. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter.

    PubMed

    Sheppard, Samuel K; Jolley, Keith A; Maiden, Martin C J

    2012-01-01

    Campylobacteriosis remains a major human public health problem world-wide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, particularly the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying those genes or allelic variants responsible for host-association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website. PMID:24704917
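
    A gene-by-gene comparison of the kind described above can be pictured as reducing each genome to a profile of allele numbers, one per locus, and then counting the loci at which two profiles differ. The sketch below is a generic illustration of that idea under simple assumptions (exact sequence matching, alleles numbered in order of first appearance). It is not the BIGSdb implementation, and the locus names, sequences and function names are hypothetical.

```python
def allele_profile(genome_loci, allele_tables):
    """Convert one genome's locus sequences into allele numbers.

    genome_loci maps locus name -> nucleotide sequence for this isolate.
    allele_tables maps locus name -> {sequence: allele number} and is
    updated in place: an unseen sequence receives the next free number.
    """
    profile = {}
    for locus, sequence in genome_loci.items():
        table = allele_tables.setdefault(locus, {})
        if sequence not in table:
            table[sequence] = len(table) + 1
        profile[locus] = table[sequence]
    return profile


def profile_distance(profile_a, profile_b):
    """Number of shared loci at which two isolates carry different alleles."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(profile_a[locus] != profile_b[locus] for locus in shared)


# Two hypothetical isolates typed at two hypothetical loci.
tables = {}
isolate_1 = allele_profile({"locusA": "ATGAAA", "locusB": "ATGCCC"}, tables)
isolate_2 = allele_profile({"locusA": "ATGAAA", "locusB": "ATGCCT"}, tables)
print(isolate_1, isolate_2, profile_distance(isolate_1, isolate_2))
```

    Treating every distinct sequence as one allele, regardless of how many nucleotides differ, is what lets the same scheme scale from a handful of loci to a whole genome.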

  9. Genome-wide identification of BURP domain-containing genes in rice reveals a gene family with diverse structures and responses to abiotic stresses.

    PubMed

    Ding, Xipeng; Hou, Xin; Xie, Kabin; Xiong, Lizhong

    2009-06-01

    Increasing evidence suggests that a gene family encoding proteins containing BURP domains has diverse functions in plants, but systematic characterization of this gene family has not been reported. In this study, 17 BURP family genes (OsBURP01-17) were identified and analyzed in rice (Oryza sativa L.). These genes have diverse exon-intron structures and distinct organization of putative motifs. Based on the phylogenetic analysis of BURP protein sequences from rice and other plant species, the BURP family was classified into seven subfamilies, including two subfamilies (BURP V and BURP VI) with members from rice only and one subfamily (BURP VII) with members from monocotyledons only. Two BURP gene clusters, belonging to BURP V and BURP VI, were located in the duplicated region on chromosomes 5 and 6 of rice, respectively. Transcript level analysis of BURP genes of rice in various tissues and organs revealed different tempo-spatial expression patterns, suggesting that these genes may function at different stages of plant growth and development. Interestingly, all the genes of the BURP VII subfamily were predominantly expressed in flower organs. We also investigated the expression patterns of BURP genes of rice under different stress conditions. The results suggested that, except for two genes (OsBURP01 and OsBURP13), all other members were induced by at least one of the stresses including drought, salt, cold, and abscisic acid treatment. Two genes (OsBURP05 and OsBURP16) were responsive to all the stress treatments and most of the OsBURP genes were responsive to salt stress. Promoter sequence analysis revealed an over-abundance of stress-related cis-elements in the stress-responsive genes. The data presented here provide important clues for elucidating the functions of genes of this family. PMID:19363683

  10. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  11. Evolution of the P-type II ATPase gene family in the fungi and presence of structural genomic changes among isolates of Glomus intraradices

    PubMed Central

    Corradi, Nicolas; Sanders, Ian R

    2006-01-01

    Background The P-type II ATPase gene family encodes proteins with an important role in adaptation of the cell to variation in external K+, Ca2+ and Na+ concentrations. The presence of P-type II gene subfamilies that are specific for certain kingdoms has been reported but was sometimes contradicted by discovery of previously unknown homologous sequences in newly sequenced genomes. Members of this gene family have been sampled in all of the fungal phyla except the arbuscular mycorrhizal fungi (AMF; phylum Glomeromycota), which are known to play a key role in terrestrial ecosystems and to be genetically highly variable within populations. Here we used highly degenerate primers on AMF genomic DNA to increase the sampling of fungal P-type II ATPases and to test previous predictions about their evolution. In parallel, homologous sequences of the P-type II ATPases have been used to determine the nature and amount of polymorphism that is present at these loci among isolates of Glomus intraradices harvested from the same field. Results In this study, four P-type II ATPase subfamilies have been isolated from three AMF species. We show that, contrary to previous predictions, P-type IIC ATPases are present in all basal fungal taxa. Additionally, P-type IIE ATPases should no longer be considered as exclusive to the Ascomycota and the Basidiomycota, since we also demonstrate their presence in the Zygomycota. Finally, a comparison of homologous sequences encoding P-type IID ATPases showed unexpectedly that indel mutations among coding regions, as well as specific gene duplications, occur among AMF individuals within the same field. Conclusion On the basis of these results we suggest that the diversification of P-type IIC and E ATPases followed the diversification of the extant fungal phyla with independent events of gene gains and losses. Consistent with recent findings on the human genome, but at a much smaller geographic scale, we provided evidence that structural genomic

  12. Connected gene neighborhoods in prokaryotic genomes

    PubMed Central

    Rogozin, Igor B.; Makarova, Kira S.; Murvai, Janos; Czabarka, Eva; Wolf, Yuri I.; Tatusov, Roman L.; Szekely, Laszlo A.; Koonin, Eugene V.

    2002-01-01

    A computational method was developed for delineating connected gene neighborhoods in bacterial and archaeal genomes. These gene neighborhoods are not typically present, in their entirety, in any single genome, but are held together by overlapping, partially conserved gene arrays. The procedure was applied to comparing the orders of orthologous genes, which were extracted from the database of Clusters of Orthologous Groups of proteins (COGs), in 31 prokaryotic genomes and resulted in the identification of 188 clusters of gene arrays, which included 1001 of 2890 COGs. These clusters were projected onto actual genomes to produce extended neighborhoods including additional genes, which are adjacent to the genes from the clusters and are transcribed in the same direction, which resulted in a total of 2387 COGs being included in the neighborhoods. Most of the neighborhoods consist predominantly of genes united by a coherent functional theme, but also include a minority of genes without an obvious functional connection to the main theme. We hypothesize that although some of the latter genes might have unsuspected roles, others are maintained within gene arrays because of the advantage of expression at a level that is typical of the given neighborhood. We designate this phenomenon ‘genomic hitchhiking’. The largest neighborhood includes 79 genes (COGs) and consists of overlapping, rearranged ribosomal protein superoperons; apparent genome hitchhiking is particularly typical of this neighborhood and other neighborhoods that consist of genes coding for translation machinery components. Several neighborhoods involve previously undetected connections between genes, allowing new functional predictions. Gene neighborhoods appear to evolve via complex rearrangement, with different combinations of genes from a neighborhood fixed in different lineages. PMID:12000841
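
    The idea of neighborhoods "held together by overlapping, partially conserved gene arrays" can be illustrated with a small sketch. This is not the authors' procedure: it assumes a much simpler rule (two genes are linked if they are adjacent on the same strand in at least `min_genomes` genomes) and then takes connected components of the resulting graph. The genome names, gene labels, and the `min_genomes` threshold are all hypothetical.

```python
from collections import defaultdict


def conserved_pairs(genomes, min_genomes=2):
    """Return gene pairs adjacent on the same strand in >= min_genomes genomes.

    genomes: dict mapping a genome name to an ordered list of (gene_id, strand).
    Assumption: simple adjacency stands in for the paper's conserved gene arrays.
    """
    support = defaultdict(set)
    for name, genes in genomes.items():
        for (a, strand_a), (b, strand_b) in zip(genes, genes[1:]):
            if strand_a == strand_b and a != b:
                support[frozenset((a, b))].add(name)
    return {pair for pair, names in support.items() if len(names) >= min_genomes}


def connected_neighborhoods(pairs):
    """Group genes into connected components of the conserved-pair graph."""
    graph = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical toy data: no single genome contains A, B, C and D together.
toy_genomes = {
    "g1": [("A", "+"), ("B", "+"), ("C", "+")],
    "g2": [("B", "+"), ("C", "+"), ("D", "+")],
    "g3": [("A", "+"), ("B", "+"), ("X", "+")],
    "g4": [("C", "-"), ("D", "-")],
}
neighborhoods = connected_neighborhoods(conserved_pairs(toy_genomes))
```

    In this toy input the pairs A-B, B-C and C-D are each supported by two genomes, so the four genes end up in one neighborhood even though no genome carries all of them, which is the property the abstract describes.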

  13. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
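
    As a rough illustration of the k-mer approach described above, the sketch below computes the empirical Shannon entropy of the k-mer distribution of a sequence, defaulting to k = log2(n). It makes two assumptions that the abstract does not state: that lg2 denotes the base-2 logarithm, and that k is rounded to the nearest integer. The function name `kmer_entropy` is hypothetical, and the paper's actual informational indexes are not reproduced here.

```python
import math
from collections import Counter
from typing import Optional, Tuple


def kmer_entropy(genome: str, k: Optional[int] = None) -> Tuple[int, float]:
    """Return (k, Shannon entropy in bits of the empirical k-mer distribution)."""
    n = len(genome)
    if k is None:
        # Assumption: lg2 is read as log base 2 and rounded to the nearest integer.
        k = max(1, round(math.log2(n)))
    # Count every overlapping window of length k.
    counts = Counter(genome[i:i + k] for i in range(n - k + 1))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return k, entropy


# Toy sequence of length 16, so the default k is log2(16) = 4.
k_used, h = kmer_entropy("ACGT" * 4)
```

    Comparing such an entropy with the value obtained from a shuffled sequence of the same length is one simple way to approximate the "comparison with random genomes" that the abstract mentions.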

  14. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  15. Informational laws of genome structures.

    PubMed

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155

  16. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470

  17. Identification of genes in genomic and EST sequences

    SciTech Connect

    Fields, C.; Adams, M.D.; Kerlavage, A.R.; Dubnick, M.; McCombie, W.R.; Martin-Gallardo, A.; Venter, J.C.; White, O.

    1993-12-31

    Currently available software tools are capable of predicting the locations of most protein-coding genes in anonymous genomic DNA sequences. The use of predicted exons to select primers for PCR amplification from cDNA libraries allows the complete structures of novel genes to be determined efficiently. As the number of expressed sequence tag (EST) sequences increases, the fraction of genes that can be localized in genomic sequences by searching EST databases will rapidly approach unity. The challenge for automated DNA sequence analysis is now to develop methods for accurately predicting gene structure and alternative splicing patterns. Substantially improving current accuracies in gene structure prediction will require retrospective comparative analysis of sequences from different organisms and gene families.

  18. Directed self-assembly, genomic assembly complexity and the formation of biological structure, or, what are the genes for nacre?

    PubMed

    Cartwright, Julyan H E

    2016-03-13

    Biology uses dynamical mechanisms of self-organization and self-assembly of materials, but it also choreographs and directs these processes. The difference between abiotic self-assembly and a biological process is rather like the difference between setting up and running an experiment to make a material remotely compared with doing it in one's own laboratory: with a remote experiment, say on the International Space Station, everything must be set up beforehand to let the experiment run 'hands off', but in the laboratory one can intervene at any point in a 'hands-on' approach. It is clear that the latter process, of directed self-assembly, can allow much more complicated experiments and produce far more complex structures than self-assembly alone. This control over self-assembly in biology is exercised at certain key waypoints along a trajectory and the process may be quantified in terms of the genomic assembly complexity of a biomaterial. PMID:26857670

  19. Ligninolytic peroxidase genes in the oyster mushroom genome: heterologous expression, molecular structure, catalytic and stability properties, and lignin-degrading ability

    PubMed Central

    2014-01-01

    Background The genome of Pleurotus ostreatus, an important edible mushroom and a model ligninolytic organism of interest in lignocellulose biorefineries due to its ability to delignify agricultural wastes, was sequenced with the purpose of identifying and characterizing the enzymes responsible for lignin degradation. Results Heterologous expression of the class II peroxidase genes, followed by kinetic studies, enabled their functional classification. The resulting inventory revealed the absence of lignin peroxidases (LiPs) and the presence of three versatile peroxidases (VPs) and six manganese peroxidases (MnPs); the crystal structures of two of them (VP1 and MnP4) were solved at 1.0 to 1.1 Å, showing significant structural differences. Gene expansion supports the importance of both peroxidase types in the white-rot lifestyle of this fungus. Using a lignin model dimer and synthetic lignin, we showed that VP is able to degrade lignin. Moreover, the dual Mn-mediated and Mn-independent activity of P. ostreatus MnPs justifies their inclusion in a new peroxidase subfamily. The availability of the whole POD repertoire enabled investigation, at a biochemical level, of the existence of duplicated genes. Differences between isoenzymes are not limited to their kinetic constants. Surprising differences in their activity T50 and residual activity at both acidic and alkaline pH were observed. Directed mutagenesis and spectroscopic/structural information were combined to explain the catalytic and stability properties of the most interesting isoenzymes, and their evolutionary history was analyzed in the context of over 200 basidiomycete peroxidase sequences. Conclusions The analysis of the P. ostreatus genome shows a lignin-degrading system where the role generally played by LiP has been assumed by VP. Moreover, it enabled the first characterization of the complete set of peroxidase isoenzymes in a basidiomycete, revealing strong differences in stability properties and providing

  20. JGI Plant Genomics Gene Annotation Pipeline

    SciTech Connect

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these vast amounts of sequence data to annotate or re-annotate genes in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homologous peptides and transcript ORFs. See Methods for details. Here we present the genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which was done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  1. Gene Insertion Into Genomic Safe Harbors for Human Gene Therapy.

    PubMed

    Papapetrou, Eirini P; Schambach, Axel

    2016-04-01

    Genomic safe harbors (GSHs) are sites in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements: (i) function predictably and (ii) do not cause alterations of the host genome posing a risk to the host cell or organism. GSHs are thus ideal sites for transgene insertion whose use can empower functional genetics studies in basic research and therapeutic applications in human gene therapy. Currently, no fully validated GSHs exist in the human genome. Here, we review our formerly proposed GSH criteria and discuss additional considerations on extending these criteria, on strategies for the identification and validation of GSHs, as well as future prospects on GSH targeting for therapeutic applications. In view of recent advances in genome biology, gene targeting technologies, and regenerative medicine, gene insertion into GSHs can potentially catalyze nearly all applications in human gene therapy. PMID:26867951

  2. Genes but Not Genomes Reveal Bacterial Domestication of Lactococcus Lactis

    PubMed Central

    Passerini, Delphine; Beltramo, Charlotte; Coddeville, Michele; Quentin, Yves; Ritzenthaler, Paul

    2010-01-01

    Background The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). Methodology/Principal Findings The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. Conclusion/Significance The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between “environmental” strains, the main contributors to the genetic diversity within the subspecies, and “domesticated” strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the “domesticated” strains essentially arose through substantial genomic flux within the dispensable genome

  3. Reconstruction of ancestral gene orders using intermediate genomes

    PubMed Central

    2015-01-01

    Background The problem of reconstructing ancestral genomes in a given phylogenetic tree arises in many different comparative genomics fields. Here, we focus on reconstructing the gene order of ancestral genomes, a problem that has been widely studied in the past 20 years, especially with the increasing availability of whole genome DNA sequences. There are two main approaches to this problem: event-based methods, which try to find the ancestral genomes that minimize the number of rearrangement events in the tree; and homology-based methods, which look for conserved structures, such as adjacent genes in the extant genomes, to build the ancestral genomes. Results We propose algorithms that use the concept of intermediate genomes, arising in optimal pairwise rearrangement scenarios. We show that intermediate genomes have combinatorial properties that make them easy to reconstruct, and develop fast algorithms with better reconstructed ancestral genomes than current event-based methods. The proposed framework is also designed to accept extra information, such as results from homology-based approaches, giving rise to combined algorithms with better results than the original methods. PMID:26451811
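
    The homology-based idea mentioned above (conserved adjacent genes in the extant genomes) can be sketched as follows. This is only an illustration of that general idea, not the intermediate-genome algorithms proposed in the paper. It assumes linear chromosomes written as lists of signed integers, represents each gene by a tail ("t") and a head ("h") extremity, and keeps adjacencies seen in at least `min_support` genomes; all names and the threshold are hypothetical.

```python
from collections import Counter


def extremities(gene):
    """Return (entry, exit) extremities of a signed gene.

    Example: 3 -> ('3t', '3h'); -3 (reversed gene) -> ('3h', '3t').
    """
    tail, head = f"{abs(gene)}t", f"{abs(gene)}h"
    return (tail, head) if gene > 0 else (head, tail)


def adjacencies(genome):
    """Set of adjacencies (unordered extremity pairs) of one linear chromosome."""
    result = set()
    for left, right in zip(genome, genome[1:]):
        # The exit of the left gene touches the entry of the right gene.
        result.add(frozenset((extremities(left)[1], extremities(right)[0])))
    return result


def candidate_ancestral_adjacencies(genomes, min_support=2):
    """Adjacencies found in at least min_support of the given genomes."""
    counts = Counter(adj for genome in genomes for adj in adjacencies(genome))
    return {adj for adj, count in counts.items() if count >= min_support}


# Hypothetical extant gene orders; a negative sign marks a reversed gene.
extant = [
    [1, 2, 3, 4],
    [1, 2, -3, 4],
    [1, -3, -2, 4],
]
candidates = candidate_ancestral_adjacencies(extant)
```

    Working with extremities rather than gene names means that an inverted segment such as -3, -2 still supports the adjacency between the head of gene 2 and the tail of gene 3, so the toy data yield two candidate adjacencies.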

  4. Mobilized retrotransposon Tos17 of rice by alien DNA introgression transposes into genes and causes structural and methylation alterations of a flanking genomic region.

    PubMed

    Han, F P; Liu, Z L; Tan, M; Hao, S; Fedak, G; Liu, B

    2004-01-01

    Tos17 is a copia-like endogenous retrotransposon of rice, which can be activated by various stresses such as tissue culture and alien DNA introgression. To confirm element mobilization by introgression and to study possible structural and epigenetic effects of Tos17 insertion on its target sequences, we isolated all flanking regions of Tos17 in an introgressed rice line (Tong35) that contains a minute amount of genomic DNA from wild rice (Zizania latifolia). It was found that there has been apparent but limited mobilization of Tos17 in this introgression line, as reflected by an increased but stable copy number of the element in progeny of the line. Three of the five activated copies of the element have transposed into genes. Based on sequence analysis and Southern blot hybridization with several double-enzyme digests, no structural change in Tos17 could be inferred in the introgression line. Cytosine methylation status at all seven CCGG sites within Tos17 was also identical between the introgression line and its rice parent (Matsumae), with all sites being heavily methylated. In contrast, changes in structure and cytosine methylation patterns were detected in one of the three low-copy genomic regions that flank newly transposed Tos17, and all changes are stably inherited through selfed generations. PMID:15703040

  5. Genome evolution in maize: from genomes back to genes.

    PubMed

    Schnable, James C

    2015-01-01

    Maize occupies dual roles as both (a) one of the big-three grain species (along with rice and wheat) responsible for providing more than half of the calories consumed around the world, and (b) a model system for plant genetics and cytogenetics dating back to the origin of the field of genetics in the early twentieth century. The long history of genetic investigation in this species combined with modern genomic and quantitative genetic data has provided particular insight into the characteristics of genes linked to phenotypes and how these genes differ from many other sequences in plant genomes that are not easily distinguishable based on molecular data alone. These recent results suggest that the number of genes in plants that make significant contributions to phenotype may be lower than the number of genes defined by current molecular criteria, and also indicate that syntenic conservation has been underemphasized as a marker for gene function. PMID:25494463

  6. Genome-wide characterization of maize miRNA genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    MicroRNAs (miRNAs) are small non-coding RNAs that play essential roles in plant growth and development. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling ident...

  7. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays

    PubMed Central

    Mak, Angel C. Y.; Lai, Yvonne Y. Y.; Lam, Ernest T.; Kwok, Tsz-Piu; Leung, Alden K. Y.; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R.; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W. C.; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J. K.; Li, Catherine M. L.; Li, Jing-Woei; Yim, Aldrin K. Y.; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y.; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. PMID:26510793

  8. From gene action to reactive genomes

    PubMed Central

    Keller, Evelyn Fox

    2014-01-01

    Poised at a critical turning point in the history of genetics, recent work (e.g. in genomics, epigenetics, genomic plasticity) obliges us to critically reexamine many of our most basic concepts. For example, I argue that genomic research supports a radical transformation in our understanding of the genome: a shift from an earlier conception of that entity as an effectively static collection of active genes to that of a dynamic and reactive system dedicated to the context-specific regulation of protein-coding sequences. PMID:24882822

  9. Pichia stipitis genomics, transcriptomics, and gene clusters

    PubMed Central

    Jeffries, Thomas W; Van Vleet, Jennifer R Headman

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the result of several gene products acting together. When coinheritance is necessary for the overall physiological function, recombination and selection favor colocation of these genes in a cluster. These are particularly evident in strongly conserved and idiomatic traits. In some cases, the functional clusters consist of multiple gene families. Phylogenetic analyses of the members in each family show that once formed, functional clusters undergo duplication and differentiation. Genome-wide expression analysis reveals that regulatory patterns of clusters are similar after they have duplicated and that the expression profiles evolve along with functional differentiation of the clusters. Orthologous gene families appear to arise through tandem gene duplication, followed by differentiation in the regulatory and coding regions of the gene. Genome-wide expression analysis combined with cross-species comparisons of functional gene clusters should reveal many more aspects of eukaryotic physiology. PMID:19659741

  10. Recurrent Gene Duplication Diversifies Genome Defense Repertoire in Drosophila.

    PubMed

    Levine, Mia T; Vander Wende, Helen M; Hsieh, Emily; Baker, EmilyClare P; Malik, Harmit S

    2016-07-01

    Transposable elements (TEs) comprise large fractions of many eukaryotic genomes and imperil host genome integrity. The host genome combats these challenges by encoding proteins that silence TE activity. Both the introduction of new TEs via horizontal transfer and TE sequence evolution require constant innovation of host-encoded TE silencing machinery to keep pace with TEs. One form of host innovation is the adaptation of existing, single-copy host genes. Indeed, host suppressors of TE replication often harbor signatures of positive selection. Such signatures are especially evident in genes encoding the piwi-interacting-RNA pathway of gene silencing, for example, the female germline-restricted TE silencer, HP1D/Rhino. Host genomes can also innovate via gene duplication and divergence. However, the importance of gene family expansions, contractions, and gene turnover to host genome defense has been largely unexplored. Here, we functionally characterize Oxpecker, a young, tandem duplicate gene of HP1D/rhino. We demonstrate that Oxpecker supports female fertility in Drosophila melanogaster and silences several TE families that are incompletely silenced by HP1D/Rhino in the female germline. We further show that, like Oxpecker, at least ten additional, structurally diverse, HP1D/rhino-derived daughter and "granddaughter" genes emerged during a short 15-million-year period of Drosophila evolution. These young paralogs are transcribed primarily in germline tissues, where the genetic conflict between host genomes and TEs plays out. Our findings suggest that gene family expansion is an underappreciated yet potent evolutionary mechanism of genome defense diversification. PMID:26979388

  11. Selecting soluble/foldable protein domains through single-gene or genomic ORF filtering: structure of the head domain of Burkholderia pseudomallei antigen BPSL2063.

    PubMed

    Gourlay, Louise J; Peano, Clelia; Deantonio, Cecilia; Perletti, Lucia; Pietrelli, Alessandro; Villa, Riccardo; Matterazzo, Elena; Lassaux, Patricia; Santoro, Claudio; Puccio, Simone; Sblattero, Daniele; Bolognesi, Martino

    2015-11-01

    The 1.8 Å resolution crystal structure of a conserved domain of the potential Burkholderia pseudomallei antigen and trimeric autotransporter BPSL2063 is presented as a structural vaccinology target for melioidosis vaccine development. Since BPSL2063 (1090 amino acids) hosts only one conserved domain, and the expression/purification of the full-length protein proved to be problematic, a domain-filtering library was generated using β-lactamase as a reporter gene to select further BPSL2063 domains. As a result, two domains (D1 and D2) were identified and produced in soluble form in Escherichia coli. Furthermore, as a general tool, a genomic open reading frame-filtering library from the B. pseudomallei genome was also constructed to facilitate the selection of domain boundaries from the entire ORFeome. Such an approach allowed the selection of three potential protein antigens that were also produced in soluble form. The results imply the further development of ORF-filtering methods as a tool in protein-based research to improve the selection and production of soluble proteins or domains for downstream applications such as X-ray crystallography. PMID:26527140

  12. CiMT-1, an unusual chordate metallothionein gene in Ciona intestinalis genome: structure and expression studies.

    PubMed

    Franchi, Nicola; Boldrin, Francesco; Ballarin, Loriano; Piccinni, Ester

    2011-02-01

    The present article reports on the characterization of the urochordate metallothionein (MT) gene, CiMT-1, from the solitary ascidian Ciona intestinalis. The predicted protein is shorter than other known deuterostome MTs, having only 39 amino acids. The gene has the same tripartite structure as vertebrate MTs, with some features resembling those of echinoderm MTs. The promoter region shows the canonical cis-acting elements recognized by transcription factors that respond to metal, ROS, and cytokines. Unusual sequences, described in fish and echinoderms, are also present. In situ hybridization suggests that only a population of hemocytes involved in immune responses, i.e. granular amebocytes, express CiMT-1 mRNA. These observations support the idea that urochordates perform detoxification through hemocytes, and that MTs may play important roles in inflammatory humoral responses in tunicates. The reported data offer new clues for better understanding the evolution of these multivalent proteins from non-vertebrate to vertebrate chordates and reinforce their functions in detoxification and immunity. PMID:21328559

  13. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  14. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  15. Gene duplication and transfer events in plant mitochondria genome

    SciTech Connect

    Xiong Aisheng; Peng Rihe; Zhuang Jing; Gao Feng; Zhu Bo; Fu Xiaoyan; Xue Yong; Jin Xiaofen; Tian Yongsheng; Zhao Wei; Yao Quanhong

    2008-11-07

    Gene or genome duplication events increase the amount of genetic material available to increase the genomic, and thereby phenotypic, complexity of organisms during evolution. Gene duplication and transfer events have been important to molecular evolution in all three domains of life, and may be the first step in the emergence of new gene functions. Gene transfer events have been proposed as another accelerator of evolution. The duplicated gene or genome, mainly nuclear, has been the subject of several recent reviews. In addition to the nuclear genome, organisms have organelle genomes, including mitochondrial genome. In this review, we briefly summarize gene duplication and transfer events in the plant mitochondrial genome.

  16. Characterization of the Wilson disease gene: Genomic organization; alternative splicing; structure/function predictions; and population frequencies of disease-specific mutations

    SciTech Connect

    Petrukhin, K.; Chernov, I.; Ross, B.M.

    1994-09-01

    The Wilson disease (WD) gene has recently been identified as a putative copper-transporting ATPase with high amino acid similarity with the Menkes disease (MNK) gene. We have further characterized the WD gene by extending the 5′-coding and non-coding DNA sequence and elucidating the intron/exon structure and genomic organization. Analysis of RNA transcripts from liver, brain, kidney and placenta reveals extensive alternative splicing which may provide a mechanism to regulate the quantity of functional protein product. Comparative sequence analysis shows that WD and MNK belong to the sub-family of heavy metal-transporting ATPases with several characterizing features which include unique amino acid motifs and distinct N-terminal and C-terminal transmembrane structure. Our data indicate that the 600 amino acid metal binding portion of the WD and MNK proteins was formed by gene duplication events and splicing of the 6 metal binding domain segment to a common ancestral protein. We have raised a WD-specific anti-peptide antibody to the N-terminal region and are beginning to explore the cellular and intracellular location of the WD protein. The metal-binding segment of the WD protein has been expressed in E. coli and metal binding assays are underway to characterize this aspect of the protein's function. We have identified numerous disease-specific mutations and developed a rapid "reverse dot blot" screening protocol to determine mutation frequencies in different populations. The most common mutation disrupts the characteristic SEHP motif and accounts for more than 40% of WD cases in North American, Russian, and Swedish populations. This mutation has not been observed in our limited Sicilian sample.

  17. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques. PMID:26456468

  18. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    PubMed Central

    Nakashima, Nobutaka; Miyazaki, Kentaro

    2014-01-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features and recent advances on methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering. PMID:24552876

  19. Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content and rearrangements revealed by next generation sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete plastid genome sequence of the American cranberry was reconstructed using next-generation sequencing data by in silico procedures. We used Roche 454 shotgun sequence data to isolate cranberry plastid-specific sequences of the cultivar ‘HyRed’ via homology comparisons with complete seque...

  20. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, that encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  1. Genome-wide SNPs and re-sequencing of growth habit and inflorescence genes in barley: implications for association mapping in germplasm arrays varying in size and structure

    PubMed Central

    2010-01-01

    association analyses - with SNP data only - in the larger germplasm arrays. For both vernalization sensitivity and inflorescence type, the most significant associations in the larger data sets were found with SNPs coincident with the synthetic markers used in the CAP Core and with SNPs detected via interaction analysis in the CAP Core. Conclusions Small and highly structured collections of germplasm, such as the CAP Core, are cost-effectively phenotyped and genotyped with high-throughput markers. They are also useful for characterizing allelic diversity at loci in germplasm of interest. Our results suggest that discovery-oriented exercises in AM in such small arrays may generate a large number of false-positives. However, if haplotypes in candidate genes are available, they may be used as anchors in an analysis of interactions to identify other candidate regions harboring genes determining target traits. Using larger germplasm arrays, genome regions where the principal genes determining vernalization sensitivity and row type are located were identified. PMID:21159198

  2. Unmet Challenges of Structural Genomics

    PubMed Central

    Chruszcz, Maksymilian; Domagalski, Marcin; Osinski, Tomasz; Wlodawer, Alexander; Minor, Wladek

    2010-01-01

    Summary Structural genomics (SG) programs have developed during the last decade many novel methodologies for faster and more accurate structure determination. These new tools and approaches led to determination of thousands of protein structures. The generation of enormous amounts of experimental data resulted in significant improvements in the understanding of many biological processes at molecular levels. However, the amount of data collected so far is so large that traditional analysis methods are limiting the rate of extraction of biological and biochemical information from 3-D models. This situation has prompted us to review the challenges that remain unmet by structural genomics, as well as the areas in which the potential impact of SG could exceed what has been achieved so far. PMID:20810277

  3. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. PMID:27238013

  4. Structural variations in plant genomes

    PubMed Central

    Edwards, David; Varshney, Rajeev K.

    2014-01-01

    Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received significant attention in human genetics and were found to be responsible for various chronic diseases. However, little effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based techniques, and more recently, high-throughput DNA sequencing. In this review we describe recent reports of SV in plants and describe the genomic technologies currently used to measure these SVs. PMID:24907366

  5. Genes, genomes and identity. Projections on matter.

    PubMed

    Hauskeller, Christine

    2004-12-01

    This paper aims to show that references to genes and genomes are counterproductive in legal and political understandings of what it is to be human and a unique individual. To support this claim, I will give a brief overview of the many incompatible meanings the term 'identity' has gathered in reference to genes or genome in the contexts of biology and family ancestry, personal identity, species identity. One finds various and incompatible understandings of these expressions. While genetics is usually considered to deliver definitive knowledge about history and the future, genomics seems to work with more complicated relations between DNA, inheritance and phenotype. In genomics, 'identity' is no longer about identification and status markers but about individualization. Regulatory and legal documents project from traits to genomes, implying that individuality is at least represented, if not created, in a unique genome. Boundaries between humans and other animals, between different 'kinds' of humans, and between all individual humans are re-established via reference to the chemical matter of DNA. My analysis will show how this trend is a reactionary response to modern understandings of identities as social products and that it ignores new biomedical understandings of human bodies. PMID:15828152

  6. Structural characterization of genomes by large scale sequence-structure threading: application of reliability analysis in structural genomics

    PubMed Central

    Cherkasov, Artem; Ho Sui, Shannan J; Brunham, Robert C; Jones, Steven JM

    2004-01-01

    Background We establish that the occurrence of protein folds among genomes can be accurately described with a Weibull function. Systems which exhibit Weibull character can be interpreted with reliability theory commonly used in engineering analysis. For instance, Weibull distributions are widely used in reliability, maintainability and safety work to model time-to-failure of mechanical devices, mechanisms, building constructions and equipment. Results We have found that the Weibull function describes protein fold distribution within and among genomes more accurately than conventional power functions which have been used in a number of structural genomic studies reported to date. It has also been found that the Weibull reliability parameter β for protein fold distributions varies between genomes and may reflect differences in rates of gene duplication in evolutionary history of organisms. Conclusions The results of this work demonstrate that reliability analysis can provide useful insights and testable predictions in the fields of comparative and structural genomics. PMID:15274750
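
    The abstract above does not say how the Weibull function was parameterised or fitted. As a minimal sketch only, assuming the standard two-parameter reliability form R(x) = exp(-(x/η)^β) from reliability theory, the shape parameter β and scale η can be estimated by least squares on the linearised relation ln(-ln R) = β ln x - β ln η. The function names, the meaning of x, and the synthetic data below are all hypothetical and are not taken from the paper.

```python
import math


def weibull_reliability(x, beta, eta):
    """Two-parameter Weibull reliability (survival) function R(x)."""
    return math.exp(-((x / eta) ** beta))


def fit_weibull(xs, reliabilities):
    """Estimate (beta, eta) by ordinary least squares on the linearised form
    ln(-ln R) = beta * ln(x) - beta * ln(eta).
    Requires every x > 0 and every R strictly between 0 and 1."""
    u = [math.log(x) for x in xs]
    v = [math.log(-math.log(r)) for r in reliabilities]
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    sxx = sum((a - mean_u) ** 2 for a in u)
    sxy = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    beta = sxy / sxx                      # slope of the fitted line
    intercept = mean_v - beta * mean_u    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Synthetic, noise-free data generated from assumed parameters (illustration only).
xs = [1, 2, 5, 10, 20, 50]
rs = [weibull_reliability(x, beta=0.7, eta=8.0) for x in xs]
beta_hat, eta_hat = fit_weibull(xs, rs)
```

    On noise-free input the fit recovers the generating parameters; with real fold-occurrence data a maximum-likelihood fit would usually be preferred over this linearised regression.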

  7. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger and more complex multimodular (composite or chimeric) and mostly multi-functional proteins through gene fusion of two or more individual genes which have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adaptation to environments. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in context with cellular functions, to trace back the evolutionary processes from the ancient smaller and uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor for all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis also exist due to the complexity of this biological and evolutionary process. We report a new strategy to computationally identify multimodular proteins using completed genome sequences, together with results surveyed from 22 organisms; data from over 40 organisms will be presented during the meeting. Additional information is contained in the original extended abstract.

  8. Chromatin Structure Regulates Gene Conversion

    PubMed Central

    Cummings, W. Jason; Yabuki, Munehisa; Ordinario, Ellen C; Bednarski, David W; Quay, Simon; Maizels, Nancy

    2007-01-01

    Homology-directed repair is a powerful mechanism for maintaining and altering genomic structure. We asked how chromatin structure contributes to the use of homologous sequences as donors for repair using the chicken B cell line DT40 as a model. In DT40, immunoglobulin genes undergo regulated sequence diversification by gene conversion templated by pseudogene donors. We found that the immunoglobulin Vλ pseudogene array is characterized by histone modifications associated with active chromatin. We directly demonstrated the importance of chromatin structure for gene conversion, using a regulatable experimental system in which the heterochromatin protein HP1 (Drosophila melanogaster Su[var]205), expressed as a fusion to Escherichia coli lactose repressor, is tethered to polymerized lactose operators integrated within the pseudo-Vλ donor array. Tethered HP1 diminished histone acetylation within the pseudo-Vλ array, and altered the outcome of Vλ diversification, so that nontemplated mutations rather than templated mutations predominated. Thus, chromatin structure regulates homology-directed repair. These results suggest that histone modifications may contribute to maintaining genomic stability by preventing recombination between repetitive sequences. PMID:17880262

  9. Comparative Genomics in Identifying Aflatoxin Biosynthetic Genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus produces the most toxic and the most carcinogenic mycotoxins, aflatoxin B1 and B2. In order to solve aflatoxin contamination of food commodities, A. flavus genomics tools for identification of genes involved in aflatoxin biosynthesis have been employed. A. flavus Expressed Seque...

  10. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up covering target selection, cloning, expression, purification, and ultimately structural determination. At the time of this writing, structural information for more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  11. Horizontal gene transfer, genome innovation and evolution.

    PubMed

    Gogarten, J Peter; Townsend, Jeffrey P

    2005-09-01

    To what extent is the tree of life the best representation of the evolutionary history of microorganisms? Recent work has shown that, among sets of prokaryotic genomes in which most homologous genes show extremely low sequence divergence, gene content can vary enormously, implying that those genes that are variably present or absent are frequently horizontally transferred. Traditionally, successful horizontal gene transfer was assumed to provide a selective advantage to either the host or the gene itself, but could horizontally transferred genes be neutral or nearly neutral? We suggest that for many prokaryotes, the boundaries between species are fuzzy, and therefore the principles of population genetics must be broadened so that they can be applied to higher taxonomic categories. PMID:16138096

  12. Complete structure, genomic organization, and expression of channel catfish (Ictalurus punctatus, Rafinesque 1818) matrix metalloproteinase-9 gene

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In the course of studying pathogenesis of enteric septicemia of catfish, we noted that the channel catfish (CC) matrix metalloproteinase-9 (MMP-9) expressed sequence tag (EST) was up-regulated after early Edwardsiella ictaluri infection. In this study, the CC MMP-9 gene was cloned, sequenced and ch...

  13. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials
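
    The TRN20-TST80 scheme described above can be illustrated with a minimal sketch: accessions are shuffled at random, 20% form the training set and the remaining 80% form the testing set. The abstract does not state how prediction accuracy was computed; the sketch assumes the Pearson correlation between predicted and observed values, which is a common choice. The function names and the seed are hypothetical, and the prediction model itself is deliberately omitted.

```python
import random


def split_trn_tst(n, trn_fraction=0.2, seed=0):
    """Randomly partition indices 0..n-1 into training and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # reproducible shuffle
    n_trn = round(n * trn_fraction)
    return idx[:n_trn], idx[n_trn:]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# 100 hypothetical accessions: 20 for training, 80 for testing.
trn, tst = split_trn_tst(100)
```

    In use, a model would be fitted on the `trn` accessions and `pearson` applied to its predictions and the observed phenotypes of the `tst` accessions, repeating the split with different seeds to average the accuracy.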

  14. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite

  15. Functional Insights from Structural Genomics

    SciTech Connect

    Forouhar,F.; Kuzin, A.; Seetharaman, J.; Lee, I.; Zhou, W.; Abashidze, M.; Chen, Y.; Montelione, G.; Tong, L.; et al

    2007-01-01

    Structural genomics efforts have produced structural information, either directly or by modeling, for thousands of proteins over the past few years. While many of these proteins have known functions, a large percentage of them have not been characterized at the functional level. The structural information has provided valuable functional insights on some of these proteins, through careful structural analyses, serendipity, and structure-guided functional screening. Some of the success stories based on structures solved at the Northeast Structural Genomics Consortium (NESG) are reported here. These include a novel methyl salicylate esterase with an important role in plant innate immunity, a novel RNA methyltransferase (H. influenzae yggJ (HI0303)), a novel spermidine/spermine N-acetyltransferase (B. subtilis PaiA), a novel methyltransferase or AdoMet binding protein (A. fulgidus AF_0241), an ATP:cob(I)alamin adenosyltransferase (B. subtilis YvqK), a novel carboxysome pore (E. coli EutN), a proline racemase homolog with a disrupted active site (B. melitensis BME11586), an FMN-dependent enzyme (S. pneumoniae SP_1951), and a 12-stranded β-barrel with a novel fold (V. parahaemolyticus VPA1032).

  16. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  17. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  18. Lectin genes in the Frankia alni genome.

    PubMed

    Pujic, Petar; Fournier, Pascale; Alloisio, Nicole; Hay, Anne-Emmanuelle; Maréchal, Joelle; Anchisi, Stéphanie; Normand, Philippe

    2012-01-01

    Frankia alni strain ACN14a's genome was scanned for the presence of determinants involved in interactions with its host plant, Alnus spp. One such determinant type is lectin, proteins that bind specifically to sugar motifs. The genome of F. alni was found to contain 7 such lectin-coding genes, five of which were of the ricinB-type. The proteins coded by these genes contain either only the lectin domain, or also a heat shock protein or a serine-threonine kinase domain upstream. These lectins were found to have several homologs in Streptomyces spp., and a few in other bacterial genomes among which none in Frankia EAN1pec and CcI3 and two in strain EUN1f. One of these F. alni genes, FRAAL0616, was cloned in E. coli, fused with a reporter gene yielding a fusion protein that was found to bind to both root hairs and to bacterial hyphae. This protein was also found to modify the dynamics of nodule formation in A. glutinosa, resulting in a higher number of nodules per root. Its role could thus be to permit binding of microbial cells to root hairs and help symbiosis to occur under conditions of low Frankia cell counts such as in pioneer situations. PMID:22159868

  19. The d4 gene family in the human genome

    SciTech Connect

    Chestkov, A.V.; Baka, I.D.; Kost, M.V.

    1996-08-15

    The d4 domain, a novel zinc finger-like structural motif, was first revealed in the rat neuro-d4 protein. Here we demonstrate that the d4 domain is conserved in evolution and that three related genes form a d4 family in the human genome. The human neuro-d4 is very similar to rat neuro-d4 at both the amino acid and the nucleotide levels. Moreover, the same splice variants have been detected among rat and human neuro-d4 transcripts. This gene has been localized on chromosome 19, and two other genes, members of the d4 family isolated by screening of the human genomic library at low stringency, have been mapped to chromosomes 11 and 14. The gene on chromosome 11 is the homolog of the ubiquitously expressed mouse gene ubi-d4/requiem, which is required for cell death after deprivation of trophic factors. A gene with a conserved d4 domain has been found in the genome of the nematode Caenorhabditis elegans. The conservation of d4 proteins from nematodes to vertebrates suggests that they have a general importance, but a diversity of d4 proteins expressed in vertebrate nervous systems suggests that some family members have special functions. 11 refs., 2 figs.

  20. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA and 8 rRNA. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
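
    The region lengths quoted above can be checked against the stated total. The following is a minimal sketch in Python, assuming that the 52,196 bp figure is the combined length of both inverted repeat copies; the variable names are illustrative and do not come from the paper.

```python
# Region lengths in bp, as quoted in the abstract above.
lsc = 87_109        # large single-copy region
ssc = 18_436        # small single-copy region
ir_pair = 52_196    # assumed: combined length of the two inverted repeats

total = lsc + ssc + ir_pair
ir_each = ir_pair // 2  # length of one IR copy under that assumption

print(total)    # 157741, matching the reported genome length
print(ir_each)  # 26098
```

    Under that assumption the three figures add up exactly to the reported 157,741 bp.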

  1. On the analysis of large-scale genomic structures.

    PubMed

    Oiwa, Nestor Norio; Goldman, Carla

    2005-01-01

    We apply methods from statistical physics (histograms, correlation functions, fractal dimensions, and singularity spectra) to characterize large-scale structure of the distribution of nucleotides along genomic sequences. We discuss the role of the extension of noncoding segments ("junk DNA") for the genomic organization, and the connection between the coding segment distribution and the high-eukaryotic chromatin condensation. The following sequences taken from GenBank were analyzed: complete genome of Xanthomonas campestris, complete genome of yeast, chromosome V of Caenorhabditis elegans, and human chromosome XVII around gene BRCA1. The results are compared with the random and periodic sequences and those generated by simple and generalized fractal Cantor sets. PMID:15858230
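
    One of the simplest statistics of this kind is a base-specific correlation function. The Python sketch below is an illustration only and is not the authors' code: it assumes an indicator variable x_i that equals 1 where the chosen base occurs, and estimates C(k) = <x_i x_(i+k)> - <x>^2. The function name and the toy sequence are hypothetical.

```python
def base_autocorrelation(seq: str, base: str, lag: int) -> float:
    """Estimate C(lag) = <x_i * x_(i+lag)> - <x>^2 for a base indicator x."""
    if not 0 < lag < len(seq):
        raise ValueError("lag must be between 1 and len(seq) - 1")
    x = [1 if c == base else 0 for c in seq.upper()]
    mean = sum(x) / len(x)
    pairs = len(x) - lag
    joint = sum(x[i] * x[i + lag] for i in range(pairs)) / pairs
    return joint - mean * mean


# Toy periodic sequence (not GenBank data): 'A' recurs every 4 bases.
periodic = "ACGT" * 10
print(base_autocorrelation(periodic, "A", 4))  # 0.1875
print(base_autocorrelation(periodic, "A", 1))  # -0.0625
```

    A periodic sequence gives a positive value at multiples of its period and a negative value elsewhere, which is the kind of reference behavior a genomic sequence could be compared against.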

  2. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of the proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  3. Kluyveromyces lactis genome harbours a functional linker histone encoding gene.

    PubMed

    Staneva, Dessislava; Georgieva, Milena; Miloshev, George

    2016-06-01

    Linker histones are essential components of chromatin in eukaryotes. Through interactions with linker DNA and nucleosomes they facilitate folding and maintenance of higher-order chromatin structures and thus delicately modulate gene activity. The necessity of linker histones in lower eukaryotes appears controversial and dubious. Genomic data have shown that Schizosaccharomyces pombe does not possess genes encoding linker histones while Kluyveromyces lactis has been reported to have a pseudogene. Regarding this controversy, we have provided the first direct experimental evidence for the existence of a functional linker histone gene, KlLH1, in K. lactis genome. Sequencing of KlLH1 from both genomic DNA and copy DNA confirmed the presence of an intact open reading frame. Transcription and splicing of the KlLH1 sequence as well as translation of its mRNA have been studied. In silico analysis revealed homology of KlLH1p to the histone H1/H5 protein family with predicted three domain structure characteristic for the linker histones of higher eukaryotes. This strongly proves that the yeast K. lactis does indeed possess a functional linker histone gene thus entailing the evolutionary preservation and significance of linker histones. The nucleotide sequences of KlLH1 are deposited in the GenBank under accession numbers KT826576, KT826577 and KT826578. PMID:27189369

  4. Genomic structure and evolution of multigene families: "flowers" on the human genome.

    PubMed

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures "Flowers" because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated to 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033

  5. Genomic Structure and Evolution of Multigene Families: “Flowers” on the Human Genome

    PubMed Central

    Kim, Hie Lim; Iwase, Mineyo; Igawa, Takeshi; Nishioka, Tasuku; Kaneko, Satoko; Katsura, Yukako; Takahata, Naoyuki; Satta, Yoko

    2012-01-01

    We report the results of an extensive investigation of genomic structures in the human genome, with a particular focus on relatively large repeats (>50 kb) in adjacent chromosomal regions. We named such structures “Flowers” because the pattern observed on dot plots resembles a flower. We detected a total of 291 Flowers in the human genome. They were predominantly located in euchromatic regions. Flowers are gene-rich compared to the average gene density of the genome. Genes involved in systems receiving environmental information, such as immunity and detoxification, were overrepresented in Flowers. Within a Flower, the mean number of duplication units was approximately four. The maximum and minimum identities between homologs in a Flower showed different distributions; the maximum identity was often concentrated to 100% identity, while the minimum identity was evenly distributed in the range of 78% to 100%. Using a gene conversion detection test, we found frequent and/or recent gene conversion events within the tested Flowers. Interestingly, many of those converted regions contained protein-coding genes. Computer simulation studies suggest that one role of such frequent gene conversions is the elongation of the life span of gene families in a Flower by the resurrection of pseudogenes. PMID:22779033

  6. Evolution of galanin receptor genes: insights from the deuterostome genomes.

    PubMed

    Liu, Z; Xu, Y; Wu, L; Zhang, S

    2010-08-01

    Galanin exerts its biological activities through three different G protein-coupled receptors, Galr1, Galr2 and Galr3. To obtain insights into the evolution of Galrs, we searched the genomes of the deuterostomes by extensive BLAST survey and phylogenetic analyses. The Galr2 and Galr3 share similar genomic structures, and most of them are composed of 2 exons and 1 intron. However, most of Galr1 are composed of 3 exons and 2 introns. We did not detect the typical Galr genes in the genomic databases of invertebrate deuterostomes, but three Galr1/Alstr homologs and two Galr1/Gpr151 homologs in amphioxus, two Galr1/Gpr151 homologs in sea squirt and one Galr1/Gpr151 homolog in sea urchin were identified. It is highly possible that the Galr genes in vertebrates may evolve from the homologous genes of Galr1/Alstr/Gpr151 in invertebrate deuterostomes. We also proposed that Galr3 genes were the products of Galr2 duplication during evolution, while Galr2 genes may evolve from Galr1. PMID:20476798

  7. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  8. Mining the genome for lipid genes.

    PubMed

    Kuivenhoven, Jan Albert; Hegele, Robert A

    2014-10-01

    Mining of the genome for lipid genes has since the early 1970s helped to shape our understanding of how triglycerides are packaged (in chylomicrons), repackaged (in very low density lipoproteins; VLDL), and hydrolyzed, and also how remnant and low-density lipoproteins (LDL) are cleared from the circulation. Gene discoveries have also provided insights into high-density lipoprotein (HDL) biogenesis and remodeling. Interestingly, at least half of these key molecular genetic studies were initiated with the benefit of prior knowledge of relevant proteins. In addition, multiple important findings originated from studies in mouse, and from other types of non-genetic approaches. Although it appears by now that the main lipid pathways have been uncovered, and that only modulators or adaptor proteins such as those encoded by LDLRAP1, APOA5, ANGPTL3/4, and PCSK9 are currently being discovered, genome wide association studies (GWAS) in particular have implicated many new loci based on statistical analyses; these may prove to have equally large impacts on lipoprotein traits as gene products that are already known. On the other hand, since 2004 - and particularly since 2010 when massively parallel sequencing has become de rigueur - no major new insights into genes governing lipid metabolism have been reported. This is probably because the etiologies of true Mendelian lipid disorders with overt clinical complications have been largely resolved. In the meantime, it has become clear that proving the importance of new candidate genes is challenging. This could be due to very low frequencies of large impact variants in the population. It must further be emphasized that functional genetic studies, while necessary, are often difficult to accomplish, making it hazardous to upgrade a variant that is simply associated to being definitively causative. Also, it is clear that applying a monogenic approach to dissect complex lipid traits that are mostly of polygenic origin is the wrong way to

  9. Hybrid Vigour? Genes, Genomics, and History

    PubMed Central

    BIVINS, ROBERTA

    2010-01-01

    Is the gene ‘special’ for historians? What effects, if any, has the notion of the ‘gene’ had on our understanding of history? Certainly, there is a widespread public and professional perception that genetics and history are or should be in dialogue with each other in some way. But historians and geneticists view history and genetics very differently – and assume very different relationships between them. And public perceptions of genes, genetics, genomics, and indeed the nature and meanings of ‘history’ differ yet again. Here, in looking at the meaning, and the implications – the significance – of the gene (and its corollary scientific disciplines and approaches) specifically to historians, I will focus on two aspects of the discourse. First, I will examine the ways in which historians have thus far approached genes and genetics, and the impact such studies have had on the field. There is considerable overlap between the subject matter of genetics/genomics and many of the most widely used analytic categories of contemporary historiography – race, gender, sexuality, ethnicity, (dis)ability, among others. Yet the impact of genetics and genomics on society has been studied principally by anthropologists, sociologists and ethicists. Only two historical sub-disciplines have engaged with the rise of genetics to any significant degree: the histories of science and of medicine. What does this indicate or suggest? Second, I will explore the impact of the ‘gene’ and genetic understandings (of, for example, the body, health, disease, identity, the family, and evolution) on public conceptions of history itself. PMID:20357894

  10. Genome-level identification, gene expression, and comparative analysis of porcine β-defensin genes

    PubMed Central

    2012-01-01

    Background Beta-defensins (β-defensins) are innate immune peptides with evolutionary conservation across a wide range of species and have been suggested to play important roles in innate immune reactions against pathogens. However, the complete β-defensin repertoire in the pig has not been fully addressed. Result A BLAST analysis was performed against the available pig genomic sequence in the NCBI database to identify β-defensin-related sequences using previously reported β-defensin sequences of pigs, humans, and cattle. The porcine β-defensin gene clusters were mapped to chromosomes 7, 14, 15 and 17. The gene expression analysis of 17 newly annotated porcine β-defensin genes across 15 tissues using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) showed differences in their tissue distribution, with the kidney and testis having the largest pBD expression repertoire. We also analyzed single nucleotide polymorphisms (SNPs) in the mature peptide region of pBD genes from 35 pigs of 7 breeds. We found 8 cSNPs in 7 pBDs. Conclusion We identified 29 porcine β-defensin (pBD) gene-like sequences, including 17 unreported pBDs in the porcine genome. Comparative analysis of β-defensin genes in the pig genome with those in human and cattle genomes showed structural conservation of β-defensin syntenic regions among these species. PMID:23150902

  11. Genomic organization of the human skeletal muscle sodium channel gene

    SciTech Connect

    George, A.L. Jr.; Iyer, G.S.; Kleinfield, R.; Kallen, R.G.; Barchi, R.L.

    1993-03-01

    Voltage-dependent sodium channels are essential for normal membrane excitability and contractility in adult skeletal muscle. The gene encoding the principal sodium channel α-subunit isoform in human skeletal muscle (SCN4A) has recently been shown to harbor point mutations in certain hereditary forms of periodic paralysis. The authors have carried out an analysis of the detailed structure of this gene including delineation of intron-exon boundaries by genomic DNA cloning and sequence analysis. The complete coding region of SCN4A is found in 32.5 kb of genomic DNA and consists of 24 exons (54 bp to >2.2 kb) and 23 introns (97 bp-4.85 kb). The exon organization of the gene shows no relationship to the predicted functional domains of the channel protein and splice junctions interrupt many of the transmembrane segments. The genomic organization of sodium channels may have been partially conserved during evolution as evidenced by the observation that 10 of the 24 splice junctions in SCN4A are positioned in homologous locations in a putative sodium channel gene in Drosophila (para). The information presented here should be extremely useful both for further identifying sodium channel mutations and for gaining a better understanding of sodium channel evolution. 39 refs., 5 figs., 2 tabs.

  12. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics

    PubMed Central

    Fermin, Damian; Allen, Baxter B; Blackwell, Thomas W; Menon, Rajasree; Adamski, Marcin; Xu, Yin; Ulintz, Peter; Omenn, Gilbert S; States, David J

    2006-01-01

    Background Defining the location of genes and the precise nature of gene products remains a fundamental challenge in genome annotation. Interrogating tandem mass spectrometry data using genomic sequence provides an unbiased method to identify novel translation products. A six-frame translation of the entire human genome was used as the query database to search for novel blood proteins in the data from the Human Proteome Organization Plasma Proteome Project. Because this target database is orders of magnitude larger than the databases traditionally employed in tandem mass spectra analysis, careful attention to significance testing is required. Confidence of identification is assessed using our previously described Poisson statistic, which estimates the significance of multi-peptide identifications incorporating the length of the matching sequence, number of spectra searched and size of the target sequence database. Results Applying a false discovery rate threshold of 0.05, we identified 282 significant open reading frames, each containing two or more peptide matches. There were 627 novel peptides associated with these open reading frames that mapped to a unique genomic coordinate placed within the start/stop points of previously annotated genes. These peptides matched 1,110 distinct tandem MS spectra. Peptides fell into four categories based upon where their genomic coordinates placed them relative to annotated exons within the parent gene. Conclusion This work provides evidence for novel alternative splice variants in many previously annotated genes. These findings suggest that annotation of the genome is not yet complete and that proteomics has the potential to further add to our understanding of gene structures. PMID:16646984
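
    The six-frame translation used to build the search database can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the standard genetic code and plain A/C/G/T input, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order, '*' marks a stop.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; codons with unknown bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate three offsets on each strand, keyed '+1'..'+3' and '-1'..'-3'."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
# {'+1': 'MA*', '+2': 'WP', '+3': 'GL', '-1': 'LGH', '-2': '*A', '-3': 'RP'}
```

    Because every stretch of sequence yields six candidate protein strings, a database built this way is far larger than a curated protein set, which is why the abstract stresses careful significance testing.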

  13. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-01

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered with in synthetic biology. PMID:26729648

  14. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708
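
    The counts in the abstract imply a sensitivity figure that is not stated explicitly. The short Python calculation below derives it from the quoted numbers only; the variable names are illustrative, and the result is a derived figure rather than one reported by the authors.

```python
novel = 131               # novel fusions confirmed by targeted validation
previously_reported = 7   # fusions already known for HCC1395
validated = novel + previously_reported
missed = 6                # validated fusions INTEGRATE did not call

detected = validated - missed
sensitivity = detected / validated

print(validated)                        # 138
print(detected, round(sensitivity, 3))  # 132 0.957
```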

  15. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    PubMed Central

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion The observed differences in genomic structure between C. japonica and other land plants, including

  16. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation

    SciTech Connect

    Kas, K.; Meyen, E.; Van De Ven, W.J.M.

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head to head configuration separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to contain 166 amino acids and is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation. 14 refs., 2 figs.
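
    The GT/AG rule mentioned above says that an intron begins with the dinucleotide GT at its 5' (donor) end and finishes with AG at its 3' (acceptor) end. A minimal Python check might look like the sketch below; the function name and the example sequences are made up for illustration and are not taken from the NOF gene.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


# Made-up intron sequences for illustration only.
print(follows_gt_ag_rule("GTAAGTCTTTCAG"))  # True
print(follows_gt_ag_rule("GCAAGTCTTTCAG"))  # False
```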

  17. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    SciTech Connect

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  18. Genomic architecture and inheritance of human ribosomal RNA gene clusters

    PubMed Central

    Stults, Dawn M.; Killen, Michael W.; Pierce, Heather H.; Pierce, Andrew J.

    2008-01-01

    The finishing of the Human Genome Project largely completed the detailing of human euchromatic sequences; however, the most highly repetitive regions of the genome still could not be assembled. The 12 gene clusters producing the structural RNA components of the ribosome are critically important for cellular viability, yet fall into this unassembled region of the Human Genome Project. To determine the extent of human variation in ribosomal RNA gene content (rDNA) and patterns of rDNA cluster inheritance, we have determined the physical lengths of the rDNA clusters in peripheral blood white cells of healthy human volunteers. The cluster lengths exhibit striking variability between and within human individuals, ranging from 50 kb to >6 Mb, manifest essentially complete heterozygosity, and provide each person with their own unique rDNA electrophoretic karyotype. Analysis of these rDNA fingerprints in multigenerational human families demonstrates that the rDNA clusters are subject to meiotic rearrangement at a frequency >10% per cluster, per meiosis. With this high intrinsic recombinational instability, the rDNA clusters may serve as a unique paradigm of potential human genomic plasticity. PMID:18025267

  19. Transcriptional consequences of genomic structural aberrations in breast cancer

    PubMed Central

    Inaki, Koichiro; Hillmer, Axel M.; Ukil, Leena; Yao, Fei; Woo, Xing Yi; Vardy, Leah A.; Zawack, Kelson Folkvard Braaten; Lee, Charlie Wah Heng; Ariyaratne, Pramila Nuwantha; Chan, Yang Sun; Desai, Kartiki Vasant; Bergh, Jonas; Hall, Per; Putti, Thomas Choudary; Ong, Wai Loon; Shahab, Atif; Cacheux-Rataboul, Valere; Karuturi, Radha Krishna Murthy; Sung, Wing-Kin; Ruan, Xiaoan; Bourque, Guillaume; Ruan, Yijun; Liu, Edison T.

    2011-01-01

    Using a long-span, paired-end deep sequencing strategy, we have comprehensively identified cancer genome rearrangements in eight breast cancer genomes. Herein, we show that 40%–54% of these structural genomic rearrangements result in different forms of fusion transcripts and that 44% are potentially translated. We find that single segmental tandem duplication spanning several genes is a major source of the fusion gene transcripts in both cell lines and primary tumors involving adjacent genes placed in the reverse-order position by the duplication event. Certain other structural mutations, however, tend to attenuate gene expression. From these candidate gene fusions, we have found a fusion transcript (RPS6KB1–VMP1) recurrently expressed in ∼30% of breast cancers associated with potential clinical consequences. This gene fusion is caused by tandem duplication on 17q23 and appears to be an indicator of local genomic instability altering the expression of oncogenic components such as MIR21 and RPS6KB1. PMID:21467264

  20. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  1. Identification and characterization of essential genes in the human genome.

    PubMed

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W; Krupczak, Kevin M; Post, Yorick; Wei, Jenny J; Lander, Eric S; Sabatini, David M

    2015-11-27

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated with an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Last, screens in additional cell lines showed a high degree of overlap in gene essentiality but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  2. Identification and characterization of essential genes in the human genome

    PubMed Central

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W.; Krupczak, Kevin M.; Post, Yorick; Wei, Jenny J.; Lander, Eric S.; Sabatini, David M.

    2015-01-01

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  3. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice.

    PubMed

    Yano, Kenji; Yamamoto, Eiji; Aya, Koichiro; Takeuchi, Hideyuki; Lo, Pei-Ching; Hu, Li; Yamasaki, Masanori; Yoshida, Shinya; Kitano, Hidemi; Hirano, Ko; Matsuoka, Makoto

    2016-08-01

    A genome-wide association study (GWAS) can be a powerful tool for the identification of genes associated with agronomic traits in crop species, but it is often hindered by population structure and the large extent of linkage disequilibrium. In this study, we identified agronomically important genes in rice using GWAS based on whole-genome sequencing, followed by the screening of candidate genes based on the estimated effect of nucleotide polymorphisms. Using this approach, we identified four new genes associated with agronomic traits. Some genes were undetectable by standard SNP analysis, but we detected them using gene-based association analysis. This study provides fundamental insights relevant to the rapid identification of genes associated with agronomic traits using GWAS and will accelerate future efforts aimed at crop improvement. PMID:27322545

  4. p63 gene structure in the phylum mollusca.

    PubMed

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    Roles of p53 family ancestor (p63) in the organisms' response to stressful environmental conditions (mainly pollution) have been studied among molluscs, especially in the genus Mytilus, within the last 15 years. Nevertheless, information about gene structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in the mollusc Mediterranean mussel Mytilus galloprovincialis and confirm its similarity to the vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata) found only one p53 family member present in a single copy per haploid genome. Comparative analysis of those orthologues additionally confirmed the conserved p63 gene structure. Conserved p63 gene structure can be a helpful tool to complement and/or revise gene annotations of any future p63 genomic sequence records in molluscs, but also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for a reliable and complete gene annotation. PMID:25936268

  5. Identifying potential cancer driver genes by genomic data integration

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-12-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis.

  6. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  7. Gene organization and characterization of the complete mitochondrial genome of Hainan black goat (Capra hircus).

    PubMed

    Hu, Jiangtao; Zhao, Wei; Niu, Lili; Wang, Linjie; Li, Li; Zhang, Hongping; Zhong, Tao

    2016-05-01

    The complete mitochondrial genome sequence of Hainan black goat was determined for the first time by the PCR-based method. The total length of the mitogenome was 16,641 bp, including 33.54% A, 26.04% C, 27.31% T, and 13.11% G. The genome structure contained 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and 1 control region (D-loop region). These results provide more detailed information on the mitochondrial genome and should be useful for further study of the genetic divergence and phylogenetic resolution of goats worldwide. PMID:25211090
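
    Base composition figures like those quoted in this abstract are simple per-base percentages of the sequence length. The sketch below is one way to compute them, assuming a plain DNA string as input; the function name and the toy sequence are hypothetical, and only the four reported percentages come from the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100 * counts[base] / total for base in "ACGT"}


# Toy sequence for illustration only: 4 A, 3 C, 2 T, 1 G.
toy_composition = base_composition("AAAACCCTTG")
print(toy_composition)

# Percentages reported in the abstract; they should sum to 100.
reported = {"A": 33.54, "C": 26.04, "T": 27.31, "G": 13.11}
reported_total = sum(reported.values())
print(round(reported_total, 2))
```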

  8. Generating Genome-Scale Candidate Gene Lists for Pharmacogenomics

    PubMed Central

    Hansen, NT; Brunak, S; Altman, RB

    2009-01-01

    A critical task in pharmacogenomics is identifying genes that may be important modulators of drug response. High-throughput experimental methods are often plagued by false positives and do not take advantage of existing knowledge. Candidate gene lists can usefully summarize existing knowledge, but they are expensive to generate manually and may therefore have incomplete coverage. We have developed a method that ranks 12,460 genes in the human genome on the basis of their potential relevance to a specific query drug and its putative indications. Our method uses known gene–drug interactions, networks of gene–gene interactions, and available measures of drug–drug similarity. It ranks genes by building a local network of known interactions and assessing the similarity of the query drug (by both structure and indication) with drugs that interact with gene products in the local network. In a comprehensive benchmark, our method achieves an overall area under the curve of 0.82. To showcase our method, we found novel gene candidates for warfarin, gefitinib, carboplatin, and gemcitabine, and we provide the molecular hypotheses for these predictions. PMID:19369935

  9. Inter-genomic DNA Exchanges and Homeologous Gene Silencing Shaped the Nascent Allopolyploid Coffee Genome (Coffea arabica L.)

    PubMed Central

    Lashermes, Philippe; Hueber, Yann; Combes, Marie-Christine; Severac, Dany; Dereeper, Alexis

    2016-01-01

    Allopolyploidization is a biological process that has played a major role in plant speciation and evolution. Genomic changes are common consequences of polyploidization, but their dynamics over time are still poorly understood. Coffea arabica, a recently formed allotetraploid, was chosen to study genetic changes that accompany allopolyploid formation. Both RNA-seq and DNA-seq data were generated from two genetically distant C. arabica accessions. Genomic structural variation was investigated using C. canephora, one of its diploid progenitors, as reference genome. The fate of 9047 duplicate homeologous genes was inferred and compared between the accessions. The pattern of SNP density along the reference genome was consistent with the allopolyploid structure. Large genomic duplications or deletions were not detected. Two homeologous copies were retained and expressed in 96% of the genes analyzed. Nevertheless, duplicated genes were found to be affected by various genomic changes leading to homeolog loss or silencing. Genetic and epigenetic changes were evidenced that could have played a major role in the stabilization of the unique ancestral allotetraploid and its subsequent diversification. While the early evolution of C. arabica mainly involved homeologous crossover exchanges, the later stage appears to have relied on more gradual evolution involving gene conversion and homeolog silencing. PMID:27440920

  10. Rotavirus gene structure and function.

    PubMed Central

    Estes, M K; Cohen, J

    1989-01-01

    Knowledge of the structure and function of the genes and proteins of the rotaviruses has expanded rapidly. Information obtained in the last 5 years has revealed unexpected and unique molecular properties of rotavirus proteins of general interest to virologists, biochemists, and cell biologists. Rotaviruses share some features of replication with reoviruses, yet antigenic and molecular properties of the outer capsid proteins, VP4 (a protein whose cleavage is required for infectivity, possibly by mediating fusion with the cell membrane) and VP7 (a glycoprotein), show more similarities with those of other viruses such as the orthomyxoviruses, paramyxoviruses, and alphaviruses. Rotavirus morphogenesis is a unique process, during which immature subviral particles bud through the membrane of the endoplasmic reticulum (ER). During this process, transiently enveloped particles form, the outer capsid proteins are assembled onto particles, and mature particles accumulate in the lumen of the ER. Two ER-specific viral glycoproteins are involved in virus maturation, and these glycoproteins have been shown to be useful models for studying protein targeting and retention in the ER and for studying mechanisms of virus budding. New ideas and approaches to understanding how each gene functions to replicate and assemble the segmented viral genome have emerged from knowledge of the primary structure of rotavirus genes and their proteins and from knowledge of the properties of domains on individual proteins. Localization of type-specific and cross-reactive neutralizing epitopes on the outer capsid proteins is becoming increasingly useful in dissecting the protective immune response, including evaluation of vaccine trials, with the practical possibility of enhancing the production of new, more effective vaccines. Finally, future analyses with recently characterized immunologic and gene probes and new animal models can be expected to provide a basic understanding of what regulates the

  11. Comparative and Functional Genomics in Identifying Aflatoxin Biosynthetic Genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Identification of genes involved in aflatoxin biosynthesis through Aspergillus flavus genomics has been actively pursued. A. flavus Expressed Sequence Tags (ESTs) and whole genome sequencing have been completed. Groups of genes that are potentially involved in aflatoxin production have been profi...

  12. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most wide...

  13. Genomic scan for genes predisposing to schizophrenia

    SciTech Connect

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping near D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.
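
    To make the lod-score thresholds in this abstract concrete, the sketch below evaluates the textbook two-point lod score for the simplest case: phase-known, fully informative meioses with a counted number of recombinants. This is an assumption for illustration; the abstract does not describe its likelihood model, and pedigree analyses of this kind normally rely on dedicated linkage software. The function name and the counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score: log10 of L(theta) / L(0.5) for counted meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    nonrecombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible under complete linkage.
        if recombinants > 0:
            return -math.inf
        log_likelihood = 0.0
    else:
        log_likelihood = (recombinants * math.log10(theta)
                          + nonrecombinants * math.log10(1.0 - theta))
    # The null hypothesis (no linkage) assigns probability 0.5 to each meiosis.
    return log_likelihood - meioses * math.log10(0.5)


# Hypothetical counts: 10 informative meioses.
print(lod_score(0, 10, 0.0))   # no recombinants: supports linkage
print(lod_score(1, 10, 0.0))   # one recombinant excludes theta = 0
print(lod_score(2, 10, 0.05))
```

    Under this convention a score below -2 at a given θ is conventionally read as excluding linkage at that recombination fraction, which is how the abstract uses the threshold.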

  14. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    PubMed

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned by using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with Pinctada maxima. The phylogenetic tree showed that it was closely related to P. maxima, based on the amino acid sequences. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Several transcriptional factors such as GATA-1, AP-1, and SP1 were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P<0.001). The expression level at salinity 27‰ was significantly higher than that at other salinities (P<0.05). Expression reached a minimum when the algal food concentration was 16×10⁴ cells/mL, which was significantly lower than the level observed at 8×10⁴ cells/mL and 20×10⁴ cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata. PMID:27129943

  15. Evidence-based gene predictions in plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  16. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu

    PubMed Central

    Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  17. Genome-, Transcriptome- and Proteome-Wide Analyses of the Gliadin Gene Families in Triticum urartu.

    PubMed

    Zhang, Yanlin; Luo, Guangbin; Liu, Dongcheng; Wang, Dongzhi; Yang, Wenlong; Sun, Jiazhu; Zhang, Aimin; Zhan, Kehui

    2015-01-01

    Gliadins are the major components of storage proteins in wheat grains, and they play an essential role in the dough extensibility and nutritional quality of flour. Because of the large number of the gliadin family members, the high level of sequence identity, and the lack of abundant genomic data for Triticum species, identifying the full complement of gliadin family genes in hexaploid wheat remains challenging. Triticum urartu is a wild diploid wheat species and considered the A-genome donor of polyploid wheat species. The accession PI428198 (G1812) was chosen to determine the complete composition of the gliadin gene families in the wheat A-genome using the available draft genome. Using a PCR-based cloning strategy for genomic DNA and mRNA as well as a bioinformatics analysis of genomic sequence data, 28 gliadin genes were characterized. Of these genes, 23 were α-gliadin genes, three were γ-gliadin genes and two were ω-gliadin genes. An RNA sequencing (RNA-Seq) survey of the dynamic expression patterns of gliadin genes revealed that their synthesis in immature grains began prior to 10 days post-anthesis (DPA), peaked at 15 DPA and gradually decreased at 20 DPA. The accumulation of proteins encoded by 16 of the expressed gliadin genes was further verified and quantified using proteomic methods. The phylogenetic analysis demonstrated that the homologs of these α-gliadin genes were present in tetraploid and hexaploid wheat, which was consistent with T. urartu being the A-genome progenitor species. This study presents a systematic investigation of the gliadin gene families in T. urartu that spans the genome, transcriptome and proteome, and it provides new information to better understand the molecular structure, expression profiles and evolution of the gliadin genes in T. urartu and common wheat. PMID:26132381

  18. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5′ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3′ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity. PMID:27299357
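
    The segment lengths listed in this abstract can be checked against the stated cluster length and turned into coordinates. The sketch below assumes the segments are contiguous, appear in the listed order, and are numbered from position 1; those layout assumptions are illustrative, and the feature coordinates in the GenBank record may differ.

```python
from itertools import accumulate

# Segment lengths in bp, as given in the abstract.
segments = [
    ("5'ETS", 1836),
    ("18S rRNA", 1823),
    ("ITS1", 2530),
    ("5.8S rRNA", 157),
    ("ITS2", 733),
    ("28S rRNA", 4441),
    ("3'ETS", 343),
]

# Running totals give each segment's end; the next segment starts one base later.
ends = list(accumulate(length for _, length in segments))
starts = [1] + [end + 1 for end in ends[:-1]]
coords = {name: (start, end)
          for (name, _), start, end in zip(segments, starts, ends)}
total_length = ends[-1]

for name, (start, end) in coords.items():
    print(f"{name}\t{start}\t{end}")
print("total", total_length)
```

    The lengths sum to 11863 bp, which matches the cluster length reported in the abstract.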

  19. The human TAX1 gene encoding the axon-associated cell adhesion molecule TAG-1/axonin-1: Genomic structure and basic promoter

    SciTech Connect

    Kozlov, S.V.; Giger, R.J.; Hasler, T.; Sonderegger, P.; Korvatska, E.; Schorderet, D.F.

    1995-11-20

    The human TAX-1 gene (HGMW-approved symbol TAX1) is located on chromosome 1 (1q32.1) and encodes the neuronal cell adhesion molecule TAG-1/axonin-1. The gene product, termed TAG-1 in the rat and axonin-1 in the chicken, is composed of six immunoglobulin (Ig)-like and four fibronectin type III (FNIII)-like domains. It is found predominantly on the axons of particular nerve fiber tracts during neural development, and it has been demonstrated to function as a potent substratum for neurite outgrowth in vitro. Here we report the cloning and structural characterization of the TAX-1 gene. The transcribed region of the TAX-1 gene extends over about 40 kb. Like its chicken homologue, the human TAX-1 gene consists of 23 exons. Two GT/CA microsatellites were localized in the first intron; a polymorphism was found for one of them. Reporter gene analysis with serially truncated fragments of the 5′-flanking region indicated that a 164-bp fragment located immediately upstream of the putative transcription initiation site was sufficient to function as a basal promoter. 45 refs., 3 figs., 2 tabs.

  20. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  1. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is the identification of genes in genomes. However, it is first necessary to define the variety of genes and the criteria for identifying them. In this work we present the discrepancies and dependencies that arise from applying different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation and compared the results with the reference annotation.

  2. Genomic sequencing reveals gene content, genomic organization, and recombination relationships in barley.

    PubMed

    Rostoks, Nils; Park, Yong-Jin; Ramakrishna, Wusirika; Ma, Jianxin; Druka, Arnis; Shiloff, Bryan A; SanMiguel, Phillip J; Jiang, Zeyu; Brueggeman, Robert; Sandhu, Devinder; Gill, Kulvinder; Bennetzen, Jeffrey L; Kleinhofs, Andris

    2002-05-01

    Barley (Hordeum vulgare L.) is one of the most important large-genome cereals with extensive genetic resources available in the public sector. Studies of genome organization in barley have been limited primarily to genetic markers and sparse sequence data. Here we report sequence analysis of 417.5 kb DNA from four BAC clones from different genomic locations. Sequences were analyzed with respect to gene content, the arrangement of repetitive sequences and the relationship of gene density to recombination frequencies. Gene densities ranged from 1 gene per 12 kb to 1 gene per 103 kb with an average of 1 gene per 21 kb. In general, genes were organized into islands separated by large blocks of nested retrotransposons. Single genes in apparent isolation were also found. Genes occupied 11% of the total sequence, LTR retrotransposons and other repeated elements accounted for 51.9% and the remaining 37.1% could not be annotated. PMID:12021850

  3. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  4. Gene identification in bacterial and organellar genomes using GeneScan.

    PubMed

    Ramakrishna, R; Srinivasan, R

    1999-03-30

    The performance of the GeneScan algorithm for gene identification has been improved by incorporation of a directed iterative scanning procedure. Application is made here to the cases of bacterial and organellar genomes. The sensitivity of gene identification was 100% in the Plasmodium falciparum plastid-like genome (35 kb) and 98% in the Mycoplasma genitalium genome (approximately 580 kb) and the Haemophilus influenzae Rd genome (approximately 1.8 Mb). Sensitivity was found to improve both for the Open Reading Frames (ORFs) that have been identified as genes (by homology or by other methods) and for those that are classified as hypothetical. False positive assignments (at the nucleotide level) were 0.25% in the H. influenzae genome and 0.3% in the M. genitalium genome. There were no false positive assignments in the plastid-like genome. The agreement between the GeneScan predictions and GeneMark predictions of putative ORFs was 97% in the M. genitalium genome and 86% in the H. influenzae genome. In terms of an exact match between predicted genes/ORFs and the annotation in the databank, GeneScan performance was evaluated to be between 72% and 90% in different genomes. We predict five putative ORFs that were not annotated earlier in the GenBank files for both the M. genitalium and H. influenzae genomes. Our preliminary analysis of the newly sequenced G + C rich genome of Mycobacterium tuberculosis H37Rv also shows comparable sensitivity (99%). PMID:10353188
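
    The abstract above reports sensitivity and nucleotide-level false positive rates without giving the formulas. The following is a minimal sketch of one common way such figures can be computed from annotated and predicted coding intervals. It is not the GeneScan procedure itself; the function names, the inclusive 1-based interval convention, and the choice of denominators are assumptions made for illustration.

```python
def covered_positions(intervals):
    """Return the set of nucleotide positions covered by inclusive (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end + 1))
    return positions


def nucleotide_level_metrics(annotated, predicted):
    """Compare predicted coding intervals with annotated ones, base by base.

    Sensitivity is the fraction of annotated coding bases that were predicted.
    The false positive fraction is the share of predicted coding bases that
    fall outside the annotation (one possible definition, assumed here).
    """
    truth = covered_positions(annotated)
    calls = covered_positions(predicted)
    sensitivity = len(truth & calls) / len(truth) if truth else 0.0
    fp_fraction = len(calls - truth) / len(calls) if calls else 0.0
    return sensitivity, fp_fraction


# Hypothetical toy genome: two annotated genes, the second predicted 10 bases off.
annotated = [(1, 100), (201, 300)]
predicted = [(1, 100), (211, 310)]
sens, fp = nucleotide_level_metrics(annotated, predicted)
print(f"sensitivity={sens:.2f}, false positive fraction={fp:.2f}")
```

    Using position sets keeps the sketch short; for genomes in the megabase range an interval-based overlap computation would be more memory-efficient.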

  5. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and is thus of significant biological importance. Increased DUS density is expected to enhance DNA uptake, and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, a transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717

  6. Genomic organization and 5′-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5′-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over approximately 25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5′-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  7. Hemipteran genomics and psyllid gene expression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research efforts on psyllid genomics, while still in their infancy, are providing information which will aid strategies to suppress...

  8. Base composition and gene distribution: critical patterns in mammalian genome organization.

    PubMed

    Gardiner, K

    1996-12-01

    Recent successes in developing transcriptional maps of large genomic regions provide excellent opportunities for the investigation of mammalian genome organization. Detailed definition of organizational features will, in the short term, aid in prioritizing genomic sequencing efforts and in interpreting sequencing results and, in the long term, will surely provide insights into the structural, functional and evolutionary basis for the mammalian chromosome and chromosomal banding patterns. For such efforts, human chromosome 21 provides an excellent model system because the physical and clone maps are detailed, and several transcriptional mapping projects have provided large numbers of novel genes. It is, therefore, valuable at this point to examine these transcriptional mapping data and to compare them with the isochore model of the mammalian genome, which describes patterns in base composition and predicts gene distributions. Not only do compelling organizational patterns appear, but new questions about additional possible patterns in gene size, structure, conservation and transcription can be asked. PMID:9257535

  9. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses

    PubMed Central

    Ward, John M.; Mäser, Pascal; Schroeder, Julian I.

    2016-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide–gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport. PMID:18842100

  10. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data.

    PubMed

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  11. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  12. Genome-editing Technologies for Gene and Cell Therapy.

    PubMed

    Maeder, Morgan L; Gersbach, Charles A

    2016-03-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  13. Genome-editing Technologies for Gene and Cell Therapy

    PubMed Central

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  14. Integrating microarray gene expression object model and clinical document architecture for cancer genomics research.

    PubMed

    Park, Yu Rang; Lee, Hye Won; Kim, Ju Han

    2005-01-01

    Systematic integration of genomic-scale expression profiles with clinical information may facilitate cancer genomics research. MAGE-OM (Microarray Gene Expression Object Model) defines standard objects for genomic but not for clinical data. HL7 CDA (Clinical Document Architecture) is a document model for clinical information, describing syntax (generic structure) but not semantics. We designed a document template in XML Schema with additional constraints for CDA to define content semantics, enabling data model-level integration of MAGE-OM and CDA for cancer genomics research. PMID:16779360

  15. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    PubMed Central

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716

  16. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    PubMed Central

    2012-01-01

    Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, individual hypothesis testing for SNP-SNP pairs, as in a common genome-wide association study (GWAS), makes it difficult to set an overall p-value because of the complicated correlation structure, namely, the multiple testing problem that causes unacceptable false negative results. A number of SNP-SNP pairs larger than the sample size, the so-called large p small n problem, precludes simultaneous analysis using multiple regression. A method that overcomes these issues is thus needed. Results: We adopt an up-to-date method for ultrahigh-dimensional variable selection, termed sure independence screening (SIS), to handle the large number of SNP-SNP interactions appropriately by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy coding methods and a subsequent variable selection procedure in the SIS method, suitably modified for gene-gene interaction analysis. We also implemented the procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method works successfully in simulation experiments and in application to real WTCCC (Wellcome Trust Case–Control Consortium) data. Conclusions: Based on the machine-learning principle, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction. PMID:22554139
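
    The screening idea described above can be illustrated with a deliberately simplified sketch. It is not the EPISIS implementation: it builds pairwise interaction terms as plain products of 0/1/2 genotype codes (rather than the dummy codings the abstract mentions) and ranks them by absolute marginal correlation with case/control status (rather than by a logistic-regression fit). All names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def interaction_terms(genotypes):
    """Build one product feature per unordered SNP pair (assumed coding, for illustration)."""
    names = sorted(genotypes)
    terms = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            terms[f"{first}x{second}"] = [
                u * v for u, v in zip(genotypes[first], genotypes[second])
            ]
    return terms


def screen(features, response, keep):
    """Rank features by absolute marginal correlation and keep the top `keep` names."""
    scores = {name: abs(pearson(values, response)) for name, values in features.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:keep]


# Hypothetical toy data: 8 individuals, 3 SNPs, status driven by snp1 and snp2 jointly.
genotypes = {
    "snp1": [0, 1, 2, 0, 1, 2, 1, 0],
    "snp2": [1, 1, 0, 2, 2, 1, 0, 0],
    "snp3": [2, 0, 1, 1, 0, 2, 1, 0],
}
status = [0, 1, 0, 0, 1, 1, 0, 0]

selected = screen(interaction_terms(genotypes), status, keep=1)
print(selected)
```

    The point of the screening step is that each candidate pair is scored independently, so the ranking can be computed even when the number of pairs far exceeds the number of samples; a joint model is then fitted only on the retained pairs.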

  17. Genome-wide Membrane Protein Structure Prediction

    PubMed Central

    Piccoli, Stefano; Suku, Eda; Garonzi, Marianna; Giorgetti, Alejandro

    2013-01-01

    Transmembrane proteins allow cells to extensively communicate with the external world in a very accurate and specific way. They form principal nodes in several signaling pathways and attract large interest in therapeutic intervention, as the majority of pharmaceutical compounds target membrane proteins. Thus, according to the current genome annotation methods, a detailed structural/functional characterization at the protein level of each of the elements codified in the genome is also required. The extreme difficulty in obtaining high-resolution three-dimensional structures calls for computational approaches. Here we review to what extent the efforts made in the last few years, combining the structural characterization of membrane proteins with protein bioinformatics techniques, could help describe membrane proteins at a genome-wide scale. In particular we analyze the use of comparative modeling techniques as a way of overcoming the lack of high-resolution three-dimensional structures in the human membrane proteome. PMID:24403851

  18. The Trypanosoma cruzi genome; conserved core genes and extremely variable surface molecule families.

    PubMed

    Andersson, Björn

    2011-01-01

    The protozoan parasite Trypanosoma cruzi is an important but neglected pathogen that causes Chagas disease, which affects millions of people, mainly in Latin America. The population structure and epidemiology of the parasite are complex, with much variability among strains. The genome sequence of a reference strain, CL Brener, was published in 2005, and the availability of this sequence has both revealed the complexity of the parasite genome and greatly facilitated research into parasite biology and pathogenesis, by making the sequences of more than 8000 core genes available. The T. cruzi genome is highly repetitive, which has resulted in inaccuracies in the genome sequence, and attempts have been made to provide a deeper analysis of repeated genes as a complement to the genome sequence. The genome was found to be organized in stable core regions containing housekeeping and other genes, surrounded by highly repetitive, often sub-telomeric highly variable regions containing multiple members of large families of surface molecule genes. Comparative sequencing of T. cruzi strains has been initiated and the results show that the core gene content of the parasite is highly conserved, but that much sequence variability, 3-4% difference at the DNA level on average between strains in coding regions, is present. The additional genomes will improve the understanding of parasite biology and epidemiology. PMID:21624458

  19. Higher plant mitochondrial DNA: Genomes, genes, mutants, transcription, translation

    SciTech Connect

    Not Available

    1986-01-01

    This volume contains brief summaries of 63 presentations given at the International Workshop on Higher Plant Mitochondrial DNA. The presentations are organized into topical discussions addressing plant genomes, mitochondrial genes, cytoplasmic male sterility, transcription, translation, plasmids and tissue culture. (DT)

  20. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics.

    PubMed

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-03-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  1. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  2. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    PubMed

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in such outcomes as abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed for the pan-genome and core genome by identifying the composition and function of these genomes. The results showed that the total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with dispensable genomes. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the process of growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp. PMID:26724943
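
    The three-way split of gene clusters reported above can be sketched as a simple presence count across genomes. This is an illustrative assumption about how such a classification is commonly made (present in every genome = core, present in exactly one = strain-specific, anything in between = dispensable), not the pipeline used in the study; the function name and toy data are hypothetical. The final check only confirms that the cluster counts quoted in the abstract add up to the stated total.

```python
def classify_clusters(cluster_to_genomes, n_genomes):
    """Split gene clusters into core, dispensable, and strain-specific groups.

    cluster_to_genomes maps a cluster name to the genomes in which at least
    one member gene of that cluster was found.
    """
    classes = {"core": [], "dispensable": [], "strain_specific": []}
    for cluster, genomes in cluster_to_genomes.items():
        present_in = len(set(genomes))
        if present_in == n_genomes:
            classes["core"].append(cluster)
        elif present_in == 1:
            classes["strain_specific"].append(cluster)
        else:
            classes["dispensable"].append(cluster)
    return classes


# Hypothetical toy pan-genome with three genomes.
toy = {
    "clusterA": ["g1", "g2", "g3"],  # found everywhere
    "clusterB": ["g2"],              # found in a single genome
    "clusterC": ["g1", "g3"],        # found in some but not all genomes
}
result = classify_clusters(toy, n_genomes=3)
print(result)

# Cluster counts as quoted in the abstract: they sum to the reported 5369.
reported = {"core": 1710, "strain_specific": 1182, "dispensable": 2477}
total_reported = sum(reported.values())
print(total_reported)
```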

  3. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions to sustain a cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but a systematic analysis of essential genes in GIs has not been carried out before. Here, we have analyzed the essential genes in 28 prokaryotes by statistical methods and reached the conclusion that essential genes in GIs are significantly fewer than those outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with the virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give a clue to the functionality of essential genes in genomic islands. PMID:26223387

  4. A genomic view on epilepsy and autism candidate genes.

    PubMed

    Jabbari, Kamel; Nürnberg, Peter

    2016-07-01

    Epilepsy is a common complex disorder most frequently associated with psychiatric and neurological diseases. Massive parallel sequencing of individual or cohort genomes and exomes led to the identification of several disease-associated genes. We review here the candidate genes in epilepsy genetics with a focus on exome and gene panel data. Together with the examination of brain-expressed genes and the postsynaptic proteome, the results show that: (1) non-metabolic epilepsy and autism candidate genes tend to be AT-rich and (2) large transcript size and local AT-richness are characteristic features of genes involved in developmental brain disorders and synaptic functions. These results point to the preferential location of core epilepsy and autism candidate genes in late replicating, GC-poor chromosomal regions (isochores). These results indicate that the genomic alterations leading to some brain disorders are confined to responsive chromatin areas harboring brain critical genes. PMID:26772991

  5. Genomic structure, chromosomal localization and expression profile of a novel melanoma differentiation associated (mda-7) gene with cancer specific growth suppressing and apoptosis inducing properties.

    SciTech Connect

    Huang, E. Y.; Madireddi, M. T.; Gopalkrishnan, R. V.; Leszczyniecka, M.; Su, Z. Z.; Lebedeva, I. V.; Kang, D. C.; Jian, H.; Lin, J. J.; Alexandre, D.; Chen, Y.; Vozhilla, N.; Mei, M. X.; Christiansen, K. A.; Sivo, F.; Goldstein, N. I.; Chada, S.; Huberman, E.; Pestka, S.; Fisher, P. B.; Biochip Technology Center; Columbia Univ.; Introgen Therapeutics Inc.; UMDNJ-Robert Wood Johnson Medical School

    2001-10-25

    Abnormalities in cellular differentiation are frequent occurrences in human cancers. Treatment of human melanoma cells with recombinant fibroblast interferon (IFN-beta) and the protein kinase C activator mezerein (MEZ) results in an irreversible loss in growth potential, suppression of tumorigenic properties and induction of terminal cell differentiation. Subtraction hybridization identified melanoma differentiation associated gene-7 (mda-7), as a gene induced during these physiological changes in human melanoma cells. Ectopic expression of mda-7 by means of a replication defective adenovirus results in growth suppression and induction of apoptosis in a broad spectrum of additional cancers, including melanoma, glioblastoma multiforme, osteosarcoma and carcinomas of the breast, cervix, colon, lung, nasopharynx and prostate. In contrast, no apparent harmful effects occur when mda-7 is expressed in normal epithelial or fibroblast cells. Human clones of mda-7 were isolated and its organization resolved in terms of intron/exon structure and chromosomal localization. Hu-mda-7 encompasses seven exons and six introns and encodes a protein with a predicted size of 23.8 kDa, consisting of 206 amino acids. Hu-mda-7 mRNA is stably expressed in the thymus, spleen and peripheral blood leukocytes. De novo mda-7 mRNA expression is also detected in human melanocytes and expression is inducible in cells of melanocyte/melanoma lineage and in certain normal and cancer cell types following treatment with a combination of IFN-beta plus MEZ. Mda-7 expression is also induced during megakaryocyte differentiation induced in human hematopoietic cells by treatment with TPA (12-O-tetradecanoyl phorbol-13-acetate). In contrast, de novo expression of mda-7 is not detected nor is it inducible by IFN-beta+MEZ in a spectrum of additional normal and cancer cells. No correlation was observed between induction of mda-7 mRNA expression and growth suppression following treatment with IFN-beta+MEZ and

  6. Impact of recurrent gene duplication on adaptation of plant genomes

    PubMed Central

    2014-01-01

    Background Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation but so far this hypothesis has not been tested at the genome scale. Results We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage specific expanded (LSE) gene families compared to single-copy genes using a dN/dS-based test in a phylogenetic framework. We found 5.38% of alignments in LSE genes with codons under positive selection. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detect significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. Conclusions In this – to our knowledge – first genome-scale study we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints as well as our results can be of interest for further studies on (plant) gene family evolution. PMID:24884640

  7. Structure of the human hexabrachion (tenascin) gene.

    PubMed Central

    Gulcher, J R; Nies, D E; Alexakos, M J; Ravikant, N A; Sturgill, M E; Marton, L S; Stefansson, K

    1991-01-01

    The structure of the gene encoding human hexabrachion (tenascin) has been determined from overlapping clones isolated from a human genomic bacteriophage library. The genomic inserts were characterized by restriction mapping, Southern blot analysis, PCR, and DNA sequencing. The coding region of the hexabrachion gene spans approximately 80 kilobases of DNA and consists of 27 exons separated by 26 introns. The exon-intron structure supports a hypothesis based on the cDNA sequence that the hexabrachion gene is an assembly of DNA modules that are also found elsewhere in the genome. Single exons may encode a module, a portion of a module, or a group of modules. The 15 type III units similar to those found in fibronectin are each encoded either by a single exon or by two exons interrupted by an intron. All type III units known to be spliced out of the smaller forms of the protein are encoded by one exon. The fibrinogen-like domain of 210 amino acids is encoded by five exons. The 14.5 epidermal growth factor-like repeats are all encoded by a single exon. Images PMID:1719530

  8. Historical overview of research on the tobacco mosaic virus genome: genome organization, infectivity and gene manipulation.

    PubMed Central

    Okada, Y

    1999-01-01

    Early in the development of molecular biology, TMV RNA was widely used as a mRNA [corrected] that could be purified easily, and it contributed much to research on protein synthesis. Also, in the early stages of elucidation of the genetic code, artificially produced TMV mutants were widely used and provided the first proof that the genetic code was non-overlapping. In 1982, Goelet et al. determined the complete TMV RNA base sequence of 6395 nucleotides. The four genes (130K, 180K, 30K and coat protein) could then be mapped at precise locations in the TMV genome. Furthermore it had become clear, a little earlier, that genes located internally in the genome were expressed via subgenomic mRNAs. The initiation site for assembly of TMV particles was also determined. However, although TMV contributed so much at the beginning of the development of molecular biology, its influence was replaced by that of Escherichia coli and its phages in the next phase. As recombinant DNA technology developed in the 1980s, RNA virus research became more detached from the frontier of molecular biology. To recover from this setback, a gene-manipulation system was needed for RNA viruses. In 1986, two such systems were developed for TMV, using full-length cDNA clones, by Dawson's group and by Okada's group. Thus, reverse genetics could be used to elucidate the basic functions of all proteins encoded by the TMV genome. Identification of the function of the 30K protein was especially important because it was the first evidence that a plant virus possesses a cell-to-cell movement function. Many other plant viruses have since been found to encode comparable 'movement proteins'. TMV thus became the first plant virus for which structures and functions were known for all its genes. At the birth of molecular plant pathology, TMV became a leader again. TMV has also played pioneering roles in many other fields. 
    TMV was the first virus for which the amino acid sequence of the coat protein was determined

  9. Genomic variants of genes associated with three horticultural traits in apple revealed by genome re-sequencing

    PubMed Central

    Zhang, Shijie; Chen, Weiping; Xin, Lu; Gao, Zhihong; Hou, Yingjun; Yu, Xinyi; Zhang, Zhen; Qu, Shenchun

    2014-01-01

    The apple (Malus × domestica Borkh.) cultivar ‘Su Shuai’ exhibits greater disease resistance, shorter internodes and lighter fruit flavor compared with its parents ‘Golden Delicious’ and ‘Indo’. To obtain a comprehensive overview of the sequence variation in these three horticultural traits, the genomes of ‘Su Shuai’ and ‘Indo’ were resequenced using next-generation sequencing and compared to the genome of ‘Golden Delicious’. A wide range of genetic variations were detected, including 2 454 406 and 18 749 349 single nucleotide polymorphisms (SNPs) and 59 547 and 50 143 structural variants (SVs) in the ‘Indo’ and ‘Su Shuai’ genomes, respectively. Among the SVs in ‘Su Shuai’, 17 genes related to disease resistance, 10 genes related to Gibberellin (GA) and 19 genes associated with fruit flavor were identified. The expression patterns of eight of the SV genes were examined using reverse transcription-quantitative polymerase chain reaction (RT-qPCR). The results of this study illustrate the genomic variation in these cultivars and provide evidence for a genetic basis for the horticultural traits of disease resistance, short internodes and lighter flavor exhibited in these cultivars. These results provide a genetic basis for the phenotypic characteristics of ‘Su Shuai’ and, as such, these SVs could serve as gene-specific molecular markers in marker-assisted breeding of apples. PMID:26504548

  10. The structure of neutrophil defensin genes.

    PubMed

    Linzmeier, R; Michaelson, D; Liu, L; Ganz, T

    1993-04-26

    Defensins are a family of microbicidal peptides abundant in the granules of mammalian neutrophils, in rabbit alveolar macrophages, and in human and murine intestinal Paneth cells. We cloned and sequenced the genes of three neutrophil-specific defensins. Human HNP-1 and HNP-3 are nearly identical and rabbit NP-3a is closely related. The four known neutrophil-specific defensin genes are strikingly similar in the structure and organization of their three exons and two introns, but the three defensin genes expressed in macrophages (MCP-1 and -2) or Paneth cells (HD-5) are organized differently: HD-5 has only two exons, and MCP-1 and -2 have a comparatively short first intron. The diverse genomic organization of defensin genes may contribute to their cell-specific expression. PMID:8477861

  11. Identification of chemosensory receptor genes from vertebrate genomes.

    PubMed

    Niimura, Yoshihito

    2013-01-01

    Chemical senses are essential for the survival of animals. In vertebrates, mainly three different types of receptors, olfactory receptors (ORs), vomeronasal receptors type 1 (V1Rs), and vomeronasal receptors type 2 (V2Rs), are responsible for the detection of chemicals in the environment. Mouse or rat genomes contain >1,000 OR genes, forming the largest multigene family in vertebrates, and have >100 V1R and V2R genes as well. Recent advancement in genome sequencing enabled us to computationally identify nearly complete repertoires of OR, V1R, and V2R genes from various organisms, revealing that the numbers of these genes are highly variable among different organisms depending on each species' living environment. Here I explain bioinformatic methods to identify the entire repertoires of OR, V1R, and V2R genes from vertebrate genome sequences. PMID:24014356

  12. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  13. Genome-Wide Detection and Analysis of Multifunctional Genes.

    PubMed

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-10-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms--H. sapiens, D. melanogaster, and S. cerevisiae--and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  14. Rapid turnover of antimicrobial-type cysteine-rich protein genes in closely related Oryza genomes.

    PubMed

    Shenton, Matthew R; Ohyanagi, Hajime; Wang, Zi-Xuan; Toyoda, Atsushi; Fujiyama, Asao; Nagata, Toshifumi; Feng, Qi; Han, Bin; Kurata, Nori

    2015-10-01

    Defensive and reproductive protein genes undergo rapid evolution. Small, cysteine-rich secreted peptides (CRPs) act as antimicrobial agents and function in plant intercellular signaling and are over-represented among reproductively expressed proteins. Because of their roles in defense, reproduction and development and their presence in multigene families, CRP variation can have major consequences for plant phenotypic and functional diversification. We surveyed the CRP genes of six closely related Oryza genomes comprising Oryza sativa ssp. japonica and ssp. indica, Oryza glaberrima and three accessions of Oryza rufipogon to observe patterns of evolution in these gene families and the effects of variation on their gene expression. These Oryza genomes, like other plant genomes, have accumulated large reservoirs of CRP sequences, comprising 26 groups totaling between 676 and 843 genes, in contrast to antimicrobial CRPs in animal genomes. Despite the close evolutionary relationships between the genomes, we observed rapid changes in number and structure among CRP gene families. Many CRP sequences are in gene clusters generated by local duplications, have undergone rapid turnover and are more likely to be silent or specifically expressed. By contrast, conserved CRP genes are more likely to be highly and broadly expressed. Variable CRP genes created by repeated duplication, gene modification and inactivation can gain new functions and expression patterns in newly evolved gene copies. For the CRP proteins, the process of gain/loss by deletion or duplication at gene clusters seems to be an important mechanism in evolution of the gene families, which also contributes to their expression evolution. PMID:25842177

  15. Pinpointing disease genes through phenomic and genomic data fusion

    PubMed Central

    2015-01-01

    Background Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. Results To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology. PMID:25708473

  16. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein coding DNAs also exhibit palindromic structures, although they do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  17. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    SciTech Connect

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives when

  18. The cavefish genome reveals candidate genes for eye loss.

    PubMed

    McGaugh, Suzanne E; Gross, Joshua B; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O'Quin, Kelly E; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M J; Stahl, Bethany A; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human disease, including retinal dysfunction. PMID:25329095

  19. The cavefish genome reveals candidate genes for eye loss

    PubMed Central

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human disease, including retinal dysfunction. PMID:25329095

  20. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    PubMed Central

    2010-01-01

    Background The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. 
    The genus Listeria thus provides

  1. Genome-scale comparative analysis of gene fusions, gene fissions, and the fungal tree of life

    PubMed Central

    Leonard, Guy; Richards, Thomas A.

    2012-01-01

    During the course of evolution genes undergo both fusion and fission by which ORFs are joined or separated. These processes can amend gene function and represent an important factor in the evolution of protein interaction networks. Gene fusions have been suggested to be useful characters for identifying evolutionary relationships because they constitute synapomorphies or cladistic characters. To investigate the fidelity of gene-fusion characters, we developed an approach for identifying differentially distributed gene fusions among whole-genome datasets: fdfBLAST. Applying this tool to the Fungi, we identified 63 gene fusions present in two or more genomes. Using a combination of phylogenetic and comparative genomic analyses, we then investigated the evolution of these genes across 115 fungal genomes, testing each gene fusion for evidence of homoplasy, including gene fission, convergence, and horizontal gene transfer. These analyses demonstrated 110 gene-fission events. We then identified a minimum of three mechanisms that drive gene fission: separation, degeneration, and duplication. These data suggest that gene fission plays an important and hitherto underestimated role in gene evolution. Gene fusions therefore are highly labile characters, and their use for polarizing evolutionary relationships, without reference to gene and species phylogenies, is limited. Accounting for these considerable sources of homoplasy, we identified fusion characters that provide support for multiple nodes in the phylogeny of the Fungi, including relationships within the deeply derived flagellum-forming fungi (i.e., the chytrids). PMID:23236161

  2. Genome-level evolution of resistance genes in Arabidopsis thaliana.

    PubMed Central

    Baumgarten, Andrew; Cannon, Steven; Spangler, Russ; May, Georgiana

    2003-01-01

    Pathogen resistance genes represent some of the most abundant and diverse gene families found within plant genomes. However, evolutionary mechanisms generating resistance gene diversity at the genome level are not well understood. We used the complete Arabidopsis thaliana genome sequence to show that most duplication of individual NBS-LRR sequences occurs at close physical proximity to the parent sequence and generates clusters of closely related NBS-LRR sequences. Deploying the statistical strength of phylogeographic approaches and using chromosomal location as a proxy for spatial location, we show that apparent duplication of NBS-LRR genes to ectopic chromosomal locations is largely the consequence of segmental chromosome duplication and rearrangement, rather than the independent duplication of individual sequences. Although accounting for a smaller fraction of NBS-LRR gene duplications, segmental chromosome duplication and rearrangement events have a large impact on the evolution of this multigene family. Intergenic exchange is dramatically lower between NBS-LRR sequences located in different chromosome regions as compared to exchange between sequences within the same chromosome region. Consequently, once translocated to new chromosome locations, NBS-LRR gene copies have a greater likelihood of escaping intergenic exchange and adopting new functions than do gene copies located within the same chromosomal region. We propose an evolutionary model that relates processes of genome evolution to mechanisms of evolution for the large, diverse, NBS-LRR gene family. PMID:14504238

  3. Performing integrative functional genomics analysis in GeneWeaver.org.

    PubMed

    Jay, Jeremy J; Chesler, Elissa J

    2014-01-01

    Functional genomics experiments and analyses give rise to large sets of results, each typically quantifying the relation of molecular entities including genes, gene products, polymorphisms, and other genomic features with biological characteristics or processes. There is tremendous utility and value in using these data in an integrative fashion to find convergent evidence for the role of genes in various processes, to identify functionally similar molecular entities, or to compare processes based on their genomic correlates. However, these gene-centered data are often deposited in diverse and non-interoperable stores. Therefore, integration requires biologists to implement computational algorithms and harmonization of gene identifiers both within and across species. The GeneWeaver web-based software system brings together a large data archive from diverse functional genomics data with a suite of combinatorial tools in an interactive environment. Account management features allow data and results to be shared among user-defined groups. Users can retrieve curated gene set data, upload, store, and share their own experimental results and perform integrative analyses including novel algorithmic approaches for set-set integration of genes and functions. PMID:24233775

  4. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  5. Complexity, Post-genomic Biology and Gene Expression Programs

    NASA Astrophysics Data System (ADS)

    Williams, Rohan B. H.; Luo, Oscar Junhong

    Gene expression represents the fundamental phenomenon by which information encoded in a genome is utilised for the overall biological objectives of the organism. Understanding this level of information transfer is therefore essential for dissecting the mechanistic basis of form and function of organisms. We survey recent developments in the methodology of the life sciences that are relevant for understanding the organisation and function of the genome, review our current understanding of the regulation of gene expression, and finally outline some new approaches that may be useful in understanding the organisation of gene regulatory systems.

  6. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    SciTech Connect

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  7. The Gene Ontology's Reference Genome Project: A Unified Framework for Functional Annotation across Species

    PubMed Central

    2009-01-01

    The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content. PMID:19578431

  8. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant. PMID:26252423

  9. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome

    PubMed Central

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S.

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  10. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome.

    PubMed

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  11. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all the genes. Their diversity is critical to the recognition and response to attack from diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, increased by a combination of worldwide spread of pathogens, exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain where the valuable cultivars are sterile, parthenocarpic and mostly triploid so conventional genetic analysis and breeding is impossible. In this paper, we review the nature of R genes and the key motifs, particularly in the Nucleotide Binding Sites (NBS), Leucine Rich Repeat (LRR) gene class. We present data about identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species with the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and from various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters, one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for identification and selection of resistance genes within the genus which may be useful for plant selection and breeding and also for defining relationships and genome evolution

  12. Genome-wide identification and analysis of the MADS-box gene family in sesame.

    PubMed

    Wei, Xin; Wang, Linhai; Yu, Jingyin; Zhang, Yanxin; Li, Donghua; Zhang, Xiurong

    2015-09-10

    MADS-box genes encode transcription factors that play crucial roles in plant growth and development. Sesame (Sesamum indicum L.) is an oil crop that contributes to the daily oil and protein requirements of almost half of the world's population; therefore, a genome-wide analysis of the MADS-box gene family is needed. Fifty-seven MADS-box genes were identified from 14 linkage groups of the sesame genome. Analysis of phylogenetic relationships with Arabidopsis thaliana, Utricularia gibba and Solanum lycopersicum MADS-box genes was performed. Sesame MADS-box genes were clustered into four groups: 28 MIKC(c)-type, 5 MIKC(⁎)-type, 14 Mα-type and 10 Mγ-type. Gene structure analysis revealed that sesame MADS-box genes contain from 1 to 22 exons. The number of exons in type II MADS-box genes greatly exceeded the number in type I genes. Motif distribution analysis of sesame MADS-box genes also indicated that type II MADS-box genes contained more motifs than type I genes. These results suggested that type II sesame MADS-box genes had more complex structures. By analyzing expression profiles of MADS-box genes in seven sesame transcriptomes, we determined that MIKC(c)-type MADS-box genes played significant roles in sesame flower and seed development. Although most MADS-box genes in the same clade showed similar expression features, some gene functions were diversified from the orthologous Arabidopsis genes. This research will contribute to uncovering the role of MADS-box genes in sesame development. PMID:25967387

  13. Genome engineering using a synthetic gene circuit in Bacillus subtilis

    PubMed Central

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-01-01

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac–chloramphenicol resistance (cat) genes were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, LacI repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. PMID:25552415
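
    The selection logic described in this abstract can be read as a small Boolean circuit. The sketch below is only an illustration of that reading, not the authors' model: the function name `survives` and its arguments are hypothetical, and repression is assumed to be all-or-nothing, which the abstract does not state.

```python
def survives(xylose: bool, chloramphenicol: bool,
             has_pxyl_laci: bool, has_helper_plasmid: bool) -> bool:
    """Toy Boolean reading of the counter-selection circuit (assumed, simplified)."""
    # XylR is encoded on the helper plasmid; xylose relieves its repression of Pxyl.
    xylr_represses_pxyl = has_helper_plasmid and not xylose
    # LacI is made only if Pxyl-lacI is still in the genome and Pxyl is not repressed.
    laci_expressed = has_pxyl_laci and not xylr_represses_pxyl
    # cat (chloramphenicol resistance) sits behind Pspac, which LacI represses.
    cat_expressed = has_helper_plasmid and not laci_expressed
    # Without the antibiotic every cell survives; with it, cat is required.
    return (not chloramphenicol) or cat_expressed


# Under xylose + chloramphenicol, only cells that lost Pxyl-lacI are expected to grow.
for has_cassette in (True, False):
    print(has_cassette, survives(True, True, has_cassette, True))
```

    Under these assumptions, cells that retain Pxyl-lacI die on xylose plus chloramphenicol, so survival selects for the recombination event that removes the cassette.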

  14. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs

    PubMed Central

    2010-01-01

    Background Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era. Results Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies. 
    Conclusions We have uncovered major changes in gene

  15. mGene: accurate SVM-based gene finding with an application to nematode genomes.

    PubMed

    Schweikert, Gabriele; Zien, Alexander; Zeller, Georg; Behr, Jonas; Dieterich, Christoph; Ong, Cheng Soon; Philips, Petra; De Bona, Fabio; Hartmann, Lisa; Bohlen, Anja; Krüger, Nina; Sonnenburg, Sören; Rätsch, Gunnar

    2009-11-01

    We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate. PMID:19564452
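
    The evaluation measure mentioned above (the average of sensitivity and specificity) and the RT-PCR confirmation rate can be reproduced with a few lines of arithmetic. This is a hedged sketch: it assumes the convention common in gene-finding evaluations, where "specificity" means TP / (TP + FP), and the function names and the example counts in the last line are hypothetical. Only 24 and 57 come from the abstract.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of annotated features that were predicted."""
    return tp / (tp + fn)


def specificity(tp: int, fp: int) -> float:
    """Gene-finding convention (assumed here): fraction of predictions that are correct."""
    return tp / (tp + fp)


def average_sn_sp(tp: int, fp: int, fn: int) -> float:
    """Average of sensitivity and specificity for one evaluation level."""
    return (sensitivity(tp, fn) + specificity(tp, fp)) / 2


# Confirmation rate reported in the abstract: 24 of 57 tested novel genes.
confirmed_fraction = 24 / 57
print(f"{confirmed_fraction:.0%}")

# Hypothetical counts, purely for illustration of the averaged measure.
print(average_sn_sp(tp=90, fp=10, fn=30))
```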

  16. Comparative Genomic Analysis of Drosophila melanogaster and Vector Mosquito Developmental Genes

    PubMed Central

    Behura, Susanta K.; Haugen, Morgan; Flannery, Ellen; Sarro, Joseph; Tessier, Charles R.; Severson, David W.; Duman-Scheel, Molly

    2011-01-01

    Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments. PMID:21754989

  17. A data management system for structural genomics.

    PubMed

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-06-21

    BACKGROUND: Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. RESULTS: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. CONCLUSION: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  18. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  19. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

    PubMed Central

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  20. Divergence of the mitochondrial genome structure in the apicomplexan parasites, Babesia and Theileria.

    PubMed

    Hikosaka, Kenji; Watanabe, Yoh-Ichi; Tsuji, Naotoshi; Kita, Kiyoshi; Kishine, Hiroe; Arisue, Nobuko; Palacpac, Nirianne Marie Q; Kawazu, Shin-Ichiro; Sawai, Hiromi; Horii, Toshihiro; Igarashi, Ikuo; Tanabe, Kazuyuki

    2010-05-01

    Mitochondrial (mt) genomes from diverse phylogenetic groups vary considerably in size, structure, and organization. The genus Plasmodium, causative agent of malaria, of the phylum Apicomplexa, has the smallest mt genome in the form of a circular and/or tandemly repeated linear element of 6 kb, encoding only three protein genes (cox1, cox3, and cob). The closely related genera Babesia and Theileria also have small mt genomes (6.6 kb) that are monomeric linear with an organization distinct from Plasmodium. To elucidate the structural divergence and evolution of mt genomes between Babesia/Theileria and Plasmodium, we determined five new sequences from Babesia bigemina, B. caballi, B. gibsoni, Theileria orientalis, and T. equi. Together with previously reported sequences of B. bovis, T. annulata, and T. parva, all eight Babesia and Theileria mt genomes are linear molecules with terminal inverted repeats (TIRs) on both ends containing three protein-coding genes (cox1, cox3, and cob) and six large subunit (LSU) ribosomal RNA (rRNA) gene fragments. The organization and transcriptional direction of protein-coding genes and the rRNA gene fragments were completely conserved in the four Babesia species. In contrast, notable variation occurred in the four Theileria species. Although the genome structures of T. annulata and T. parva were nearly identical to those of Babesia, an inversion in the 3-kb central region was found in T. orientalis. Moreover, the T. equi mt genome is the largest (8.2 kb) and most divergent with unusually long TIR sequences, in which cox3 and two LSU rRNA gene fragments are located. The T. equi mt genome showed little synteny to the other species. These results suggest that the Theileria mt genome is highly diverse with lineage-specific evolution in two Theileria species: genome inversion in T. orientalis and gene-embedded long TIR in T. equi. PMID:20034997

  1. Bacterial origin of a diverse family of UDP-glycosyltransferase genes in the Tetranychus urticae genome.

    PubMed

    Ahn, Seung-Joon; Dermauw, Wannes; Wybouw, Nicky; Heckel, David G; Van Leeuwen, Thomas

    2014-07-01

    UDP-glycosyltransferases (UGTs) catalyze the conjugation of a variety of small lipophilic molecules with uridine diphosphate (UDP) sugars, altering them into more water-soluble metabolites. Thereby, UGTs play an important role in the detoxification of xenobiotics and in the regulation of endobiotics. Recently, the genome sequence was reported for the two-spotted spider mite, Tetranychus urticae, a polyphagous herbivore damaging a number of agricultural crops. Although various gene families implicated in xenobiotic metabolism have been documented in T. urticae, UGTs so far have not. We identified 80 UGT genes in the T. urticae genome, the largest number of UGT genes in a metazoan species reported so far. Phylogenetic analysis revealed that lineage-specific gene expansions increased the diversity of the T. urticae UGT repertoire. Genomic distribution, intron-exon structure and structural motifs in the T. urticae UGTs were also described. In addition, expression profiling after host-plant shifts and in acaricide resistant lines supported an important role for UGT genes in xenobiotic metabolism. Expanded searches of UGTs in other arachnid species (Subphylum Chelicerata), including a spider, a scorpion, two ticks and two predatory mites, unexpectedly revealed the complete absence of UGT genes. However, a centipede (Subphylum Myriapoda) and a water flea and a crayfish (Subphylum Crustacea) contain UGT genes in their genomes similar to insect UGTs, suggesting that the UGT gene family might have been lost early in the Chelicerata lineage and subsequently re-gained in the tetranychid mites. Sequence similarity of T. urticae UGTs and bacterial UGTs and their phylogenetic reconstruction suggest that spider mites acquired UGT genes from bacteria by horizontal gene transfer. Our findings show a unique evolutionary history of the T. urticae UGT gene family among other arthropods and provide important clues to its functions in relation to detoxification and thereby host

  2. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. Availability A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924

  3. Mapping and annotating obesity-related genes in pig and human genomes.

    PubMed

    Martelli, Pier Luigi; Fontanesi, Luca; Piovesan, Damiano; Fariselli, Piero; Casadio, Rita

    2014-01-01

    Background. Obesity is a major health problem in both developed and emerging countries. Obesity is a complex disease whose etiology involves genetic factors in strong interplay with environmental determinants and lifestyle. The discovery of genetic factors and biological pathways underlying human obesity is hampered by the difficulty in controlling the genetic background of human cohorts. Animal models are therefore necessary to further dissect the genetics of obesity. The pig has emerged as one of the most attractive models because of its similarity to humans in the mechanisms regulating fat deposition. Results. We collected the genes related to obesity in humans and to fat deposition traits in pig. We localized them on both human and pig genomes, building a map useful to interpret comparative studies on obesity. We characterized the collected genes structurally and functionally with BAR+ and mapped them on KEGG pathways and on the STRING protein interaction network. Conclusions. The collected set consists of 361 obesity-related genes in human and pig genomes. All genes were mapped on the human genome, and 54 could not be localized on the pig genome (release 2012). Only 3 human genes have no counterpart in pig, confirming that this animal is a good model for human obesity studies. Obesity-related genes are mostly involved in regulation and signaling processes/pathways, and relevant connections emerge between obesity-related genes and diseases such as cancer and infectious diseases. PMID:23855670

  4. Genomic Gene Clustering Analysis of Pathways in Eukaryotes

    PubMed Central

    Lee, Jennifer M.; Sonnhammer, Erik L.L.

    2003-01-01

    Genomic clustering of genes in a pathway is commonly found in prokaryotes due to transcriptional operons, but these are not present in most eukaryotes. Yet, there might be clustering to a lesser extent of pathway members in eukaryotic genomes that assists coregulation of a set of functionally cooperating genes. We analyzed five sequenced eukaryotic genomes for clustering of genes assigned to the same pathway in the KEGG database. Between 98% and 30% of the analyzed pathways in a genome were found to exhibit significantly higher clustering levels than expected by chance. In descending order by the level of clustering, the genomes studied were Saccharomyces cerevisiae, Homo sapiens, Caenorhabditis elegans, Arabidopsis thaliana, and Drosophila melanogaster. Surprisingly, there is not much agreement between genomes in terms of which pathways are most clustered. Only seven of 69 pathways found in all species were significantly clustered in all five of them. This species-specific pattern of pathway clustering may reflect adaptations or evolutionary events unique to a particular lineage. We note that although operons are common in C. elegans, only 58% of the pathways showed significant clustering, which is less than in human. Virtually all pathways in S. cerevisiae showed significant clustering. PMID:12695325
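
    One way to ask whether a pathway's genes are "more clustered than expected by chance" is a permutation test. The sketch below is an assumption for illustration, not the statistic used in the study: it compares the mean gap between neighbouring pathway genes on a single chromosome with the same quantity for random gene sets of equal size. The function names and the toy coordinates are hypothetical.

```python
import random
from statistics import mean


def mean_neighbor_gap(positions):
    """Mean distance between adjacent genes after sorting by coordinate."""
    ordered = sorted(positions)
    return mean(b - a for a, b in zip(ordered, ordered[1:]))


def clustering_p_value(pathway_positions, all_positions, n_perm=2000, seed=0):
    """Empirical p-value: how often a random gene set is at least as tightly packed."""
    rng = random.Random(seed)
    observed = mean_neighbor_gap(pathway_positions)
    k = len(pathway_positions)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_positions, k)  # random genes from the same chromosome
        if mean_neighbor_gap(sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Toy chromosome: 100 genes spaced 10 units apart (hypothetical coordinates).
genes = list(range(0, 1000, 10))
tight_pathway = [100, 110, 120, 130, 140]
spread_pathway = [0, 250, 500, 750, 990]
print(clustering_p_value(tight_pathway, genes))
print(clustering_p_value(spread_pathway, genes))
```

    A real analysis would need to handle multiple chromosomes and correct for testing many pathways at once, neither of which is covered here.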

  5. From genome to gene: a new epoch for wheat research?

    PubMed

    Wang, Meng; Wang, Shubin; Xia, Guangmin

    2015-06-01

    Genetic research for bread wheat (Triticum aestivum), a staple crop around the world, has been impeded by its complex large hexaploid genome that contains a high proportion of repetitive DNA. Recent advances in sequencing technology have now overcome these challenges and led to genome drafts for bread wheat and its progenitors as well as high-resolution transcriptomes. However, the exploitation of these data for identifying agronomically important genes in wheat is lagging behind. We review recent wheat genome sequencing achievements and focus on four aspects of strategies and future hotspots for wheat improvement: positional cloning, 'omics approaches, combining forward and reverse genetics, and epigenetics. PMID:25887708

  6. Nuclear structure, gene expression and development.

    PubMed

    Brown, K

    1999-01-01

    This article considers the extent to which features of nuclear structure are involved in the regulation of genome function. The recent renaissance in imaging technology has inspired a new determination to assign specific functions to nuclear domains or structures, many of which have been described as "factories" to express the idea that they coordinate nuclear processes in an efficient way. Visual data have been combined with genetic and biochemical information to support the idea that nuclear organization has functional significance. Particular DNA sequences or chromatin structures may nucleate domains that are permissive or restrictive of transcription, to which active or inactive loci could be recruited. Associations within the nucleus, as well as many nuclear structures, are transient and change dynamically during cell cycle progression and development. Despite this complexity, elucidation of the possible structural basis of epigenetic phenomena, such as the inheritance of a "cellular memory" of gene expression status, is an important goal for cell biology. Topics for discussion include the regulatory effect of chromatin structure on gene expression, putative "nuclear addresses" for genes and proteins, the functional significance of nuclear bodies, and the role of the nuclear matrix in nuclear compartmentalization. PMID:10651237

  7. Identification of Neural Outgrowth Genes using Genome-Wide RNAi

    PubMed Central

    Sepp, Katharine J.; Hong, Pengyu; Lizarraga, Sofia B.; Liu, Judy S.; Mejia, Luis A.; Walsh, Christopher A.; Perrimon, Norbert

    2008-01-01

    While genetic screens have identified many genes essential for neurite outgrowth, they have been limited in their ability to identify neural genes that also have earlier critical roles in the gastrula, or neural genes for which maternally contributed RNA compensates for gene mutations in the zygote. To address this, we developed methods to screen the Drosophila genome using RNA-interference (RNAi) on primary neural cells and present the results of the first full-genome RNAi screen in neurons. We used live-cell imaging and quantitative image analysis to characterize the morphological phenotypes of fluorescently labelled primary neurons and glia in response to RNAi-mediated gene knockdown. From the full genome screen, we focused our analysis on 104 evolutionarily conserved genes that, when downregulated by RNAi, produce morphological defects such as reduced axon extension, excessive branching, loss of fasciculation, and blebbing. To assist in the phenotypic analysis of the large data sets, we generated image analysis algorithms that could assess the statistical significance of the mutant phenotypes. The algorithms were essential for the analysis of the thousands of images generated by the screening process and will become a valuable tool for future genome-wide screens in primary neurons. Our analysis revealed unexpected, essential roles in neurite outgrowth for genes representing a wide range of functional categories including signalling molecules, enzymes, channels, receptors, and cytoskeletal proteins. We also found that genes known to be involved in protein and vesicle trafficking showed similar RNAi phenotypes. We confirmed phenotypes of the protein trafficking genes Sec61alpha and Ran GTPase using Drosophila embryo and mouse embryonic cerebral cortical neurons, respectively. Collectively, our results showed that RNAi phenotypes in primary neural culture can parallel in vivo phenotypes, and the screening technique can be used to identify many new genes that have

  8. Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies.

    PubMed

    Kim, Yongkang; Park, Taesung

    2015-01-01

    Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on the identification of single variants. It has been shown that most of the variants that were discovered by GWAS could only partially explain disease heritability. The explanation for this missing heritability is generally believed to be gene-gene (GG) or gene-environment (GE) interactions and other structural variants. Generalized multifactor dimensionality reduction (GMDR) has been proven to be reasonably powerful in detecting GG and GE interactions; however, its performance has been found to decline when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method (based on the L-estimator and M-estimator estimation methods) in an attempt to reduce the effects caused by outlying traits. A comparison of robust GMDR with the original MDR based on simulation studies showed the former method to outperform the latter. The performance of robust GMDR is illustrated through a real GWA example consisting of 8,577 samples from the Korean population using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as a phenotype. Robust GMDR identified the KCNH1 gene to have strong interaction effects with other genes on the function of insulin secretion. PMID:26267341
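
    Why an M-estimator helps with outlying quantitative traits can be seen in a small example. The sketch below is not the robust GMDR method of this paper; it only computes a Huber M-estimate of location by iterative reweighting. The Huber loss, the tuning constant 1.345, the MAD-based scale, and the function name are all assumptions chosen for illustration.

```python
import statistics


def huber_location(values, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.
    Observations further than c * scale from the current estimate are
    down-weighted instead of contributing in full."""
    mu = statistics.median(values)
    mad = statistics.median([abs(v - mu) for v in values])
    # 1.4826 * MAD is consistent with the standard deviation under
    # normality; fall back to 1.0 if the MAD is zero.
    scale = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(max_iter):
        weights = []
        for v in values:
            r = abs(v - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Toy trait values (invented) with one gross outlier.
trait = [1.0, 1.2, 0.9, 1.1, 1.0, 25.0]
plain_mean = statistics.mean(trait)
robust_estimate = huber_location(trait)
```

    On this toy sample the ordinary mean is pulled toward the outlier, whereas the Huber estimate stays close to the bulk of the data, which is the property a robust score statistic relies on.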

  9. Robust Gene-Gene Interaction Analysis in Genome Wide Association Studies

    PubMed Central

    Kim, Yongkang; Park, Taesung

    2015-01-01

    Genome-wide association studies (GWAS) have successfully discovered hundreds of associations between genetic variants and complex traits. Most GWAS have focused on the identification of single variants. It has been shown that most of the variants that were discovered by GWAS could only partially explain disease heritability. The explanation for this missing heritability is generally believed to be gene-gene (GG) or gene-environment (GE) interactions and other structural variants. Generalized multifactor dimensionality reduction (GMDR) has been proven to be reasonably powerful in detecting GG and GE interactions; however, its performance has been found to decline when outlying quantitative traits are present. This paper proposes a robust GMDR estimation method (based on the L-estimator and M-estimator estimation methods) in an attempt to reduce the effects caused by outlying traits. A comparison of robust GMDR with the original MDR based on simulation studies showed the former method to outperform the latter. The performance of robust GMDR is illustrated through a real GWA example consisting of 8,577 samples from the Korean population using the Homeostasis Model Assessment of Insulin Resistance (HOMA-IR) level as a phenotype. Robust GMDR identified the KCNH1 gene to have strong interaction effects with other genes on the function of insulin secretion. PMID:26267341

  10. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we developed an algorithm and software program, GeneTack, for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events). PMID:23161689
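
    What "conceptually translating" an fs-gene across a frameshift means can be shown with a toy example. The sketch below is not GeneTack and does not predict frameshifts; it only translates a sequence given an already known frameshift position and direction. The function names, the 0-based position convention, and the modelling of +1 as skipping one nucleotide and -1 as re-reading one nucleotide are assumptions made for illustration.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from the first nucleotide onward."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def translate_with_frameshift(seq, position, direction):
    """Translate `seq`, changing reading frame at `position` (0-based).
    direction=+1 skips the nucleotide at `position`;
    direction=-1 reads the nucleotide before `position` a second time."""
    if direction == 1:
        joined = seq[:position] + seq[position + 1:]
    elif direction == -1:
        joined = seq[:position] + seq[position - 1] + seq[position:]
    else:
        raise ValueError("direction must be +1 or -1")
    return translate(joined)


# Toy sequences (invented): an intact ORF plus insertion and deletion variants.
intact = "ATGGCCAAATTTTGA"
with_insertion = "ATGGCCCAAATTTTGA"   # extra C at index 6
with_deletion = "ATGGCCAATTTTGA"      # one A removed

naive = translate(with_insertion)
fixed_plus = translate_with_frameshift(with_insertion, 6, +1)
fixed_minus = translate_with_frameshift(with_deletion, 7, -1)
```

    In this toy case both corrected translations recover the protein of the intact ORF, while the naive in-frame reading of the insertion variant diverges after the frameshift site.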