Science.gov

Sample records for gene genomic structure

  1. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  2. Cloning, characterization, and genomic structure of the mouse Ikbkap gene.

    PubMed

    Cuajungco, M P; Leyne, M; Mull, J; Gill, S P; Gusella, J F; Slaugenhaupt, S A

    2001-09-01

    Our laboratory recently reported that mutations in the human I-kappaB kinase-associated protein (IKBKAP) gene are responsible for familial dysautonomia (FD). Interestingly, amino acid substitutions in the IKAP correlate with increased risk for childhood bronchial asthma. Here, we report the cloning and genomic characterization of the mouse Ikbkap gene, the homolog of human IKBKAP. Like its human counterpart, Ikbkap encodes a protein of 1332 amino acids with a molecular weight of approximately 150 kDa. The Ikbkap gene, which encodes Ikap, contains 37 exons that span approximately 51 kb. The protein shows 80% amino acid identity with human IKAP. It shows very high conservation across species and is homologous to the yeast Elp1/Iki3p protein, which is a member of the Elongator complex. The Ikbkap gene maps to chromosome 4 in a region that is syntenic to human chromosome 9q31.3. Because no animal model of FD currently exists, cloning of the mouse Ikbkap gene is an important first step toward creating a mouse model for FD. In addition, cloning of Ikbkap is crucial to the characterization of the putative mammalian Elongator complex.

  3. The Complete Chloroplast Genome Sequence of Podocarpus lambertii: Genome Structure, Evolutionary Aspects, Gene Content and SSR Detection

    PubMed Central

    Vieira, Leila do Nascimento; Faoro, Helisson; Rogalski, Marcelo; Fraga, Hugo Pacheco de Freitas; Cardoso, Rodrigo Luis Alves; de Souza, Emanuel Maltempi; de Oliveira Pedrosa, Fábio; Nodari, Rubens Onofre; Guerra, Miguel Pedro

    2014-01-01

    Background: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. Methodology/Principal Findings: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, an absence previously reported for the plastid genomes of another Podocarpaceae (Nageia nagi) and an Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. Conclusion: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in Podocarpaceae cp genomes. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of this genus. PMID
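
    The SSR counts reported in the record above depend on how a simple sequence repeat is defined. As a rough illustration only, the sketch below finds perfect mono-, di- and trinucleotide repeats with a regular expression. The function name `find_ssrs` and the minimum repeat counts are assumptions made for this example; the abstract does not state which tool or thresholds the authors used.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    # Hypothetical thresholds (motif length -> minimum number of repeats);
    # these are illustrative and not taken from the cited study.
    if min_repeats is None:
        min_repeats = {1: 8, 2: 4, 3: 3}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)

if __name__ == "__main__":
    example = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GCC"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

    Start positions are 0-based. Imperfect or compound repeats, which dedicated SSR tools usually handle, are ignored here.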

  4. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  5. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation

    PubMed Central

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-01-01

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733
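
    The record above turns on two exon properties: reading frame and splice sites. The sketch below is not CESAR's algorithm, only a minimal check of those two properties on a gapped pairwise alignment. It flags gap runs whose length is not a multiple of three and tests for the canonical AG/GT dinucleotides. The function names and the alignment representation (two equal-length strings with `-` for gaps) are assumptions made for illustration.

```python
import re

def frameshifting_gaps(ref_aln, qry_aln):
    """Return sorted (column, length, kind) for gap runs not divisible by 3."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # A gap run in the reference row is an insertion in the query;
    # a gap run in the query row is a deletion in the query.
    for kind, row in (("insertion", ref_aln), ("deletion", qry_aln)):
        for match in re.finditer(r"-+", row):
            length = len(match.group(0))
            if length % 3 != 0:
                events.append((match.start(), length, kind))
    return sorted(events)

def has_canonical_splice_sites(upstream2, downstream2):
    """True if an internal exon is flanked by AG (acceptor) and GT (donor)."""
    return upstream2.upper() == "AG" and downstream2.upper() == "GT"

if __name__ == "__main__":
    ref = "ATGGCC---AAATTT"
    qry = "ATGG-CGGGAAATTT"
    print(frameshifting_gaps(ref, qry))
    print(has_canonical_splice_sites("ag", "gt"))
```

    This simple check does not distinguish a real frameshift from an alignment artifact, and it treats nearby compensating indels as separate events. The abstract describes realignment as the way to resolve such spurious cases.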

  6. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome.

    PubMed

    Katju, Vaishali; Lynch, Michael

    2003-12-01

    The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms.

  7. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome.

    PubMed Central

    Katju, Vaishali; Lynch, Michael

    2003-01-01

    The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms. PMID:14704166

  8. Evidence-based gene models for structural and functional annotations of the oil palm genome.

    PubMed

    Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

    2017-09-08

    Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years) has led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-, especially fatty acid (FA)-related genes are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures. We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA
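
    The record above defines GC3 as the fraction of cytosine and guanine in the third position of a codon and uses GC3 ≥ 0.75286 as the cutoff for GC3-rich genes. A minimal sketch of that calculation follows. The function name `gc3` and the decision to ignore a trailing partial codon are assumptions made for this example, not details from the paper.

```python
GC3_RICH_THRESHOLD = 0.75286  # cutoff quoted in the abstract

def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop any trailing partial codon (assumption)
    third_bases = cds[2:usable:3]      # bases at 0-based indices 2, 5, 8, ...
    if not third_bases:
        raise ValueError("sequence must contain at least one complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)

if __name__ == "__main__":
    seq = "ATGGCCAAATTC"  # codons: ATG GCC AAA TTC
    value = gc3(seq)
    print(value, value >= GC3_RICH_THRESHOLD)
```

    The sequence is assumed to start in frame at the first base. Ambiguous bases such as N count as non-GC in this sketch.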

  9. Genomic structure and nucleotide sequence of the p55 gene of the puffer fish Fugu rubripes

    SciTech Connect

    Elgar, G.; Rattray, F.; Greystrong, J.; Brenner, S.

    1995-06-10

    The p55 gene, which codes for a 55-kDa erythrocyte membrane protein, has been cloned and sequenced from the genome of the Japanese puffer fish Fugu rubripes (Fugu). This organism has the smallest recorded vertebrate genome and therefore provides an efficient way to sequence genes at the genomic level. The gene encoding p55 covers 5.5 kb from the beginning to the end of the coding sequence, four to six times smaller than the estimated size of the human gene, and is encoded by 12 exons. The structure of this gene has not been previously elucidated, but from this and other data we would predict a similar or identical structure in mammals. The predicted amino acid sequence of this gene in Fugu, coding for a polypeptide of 467 amino acids, is very similar to that of the human gene with the exception of the first two exons, which differ considerably. The predicted Fugu protein has a molecular weight (52.6 kDa compared with 52.3 kDa) and an isoelectric point very similar to those of human p55. In human, the p55 gene lies in the gene-dense Xq28 region, just 30 kb 3′ to the Factor VIII gene, and is estimated to cover 20-30 kb. Its 5′ end is associated with a CpG island, although there is no evidence that this is the case in Fugu. The small size of genes in Fugu and the high coding homology that they share with their mammalian equivalents, both in structure and sequence, make this compact vertebrate genome an ideal model for genomic studies. 23 refs., 3 figs.

  10. Alpha tubulin genes from Leishmania braziliensis: genomic organization, gene structure and insights on their expression

    PubMed Central

    2013-01-01

    Background: Alpha tubulin is a fundamental component of the cytoskeleton which is responsible for cell shape and is involved in cell division, ciliary and flagellar motility and intracellular transport. Alpha tubulin gene expression varies according to the morphological changes undergone by Leishmania in its life cycle. However, studying the mechanisms responsible for the differential expression has proved difficult due to the complex genome organization of tubulin genes and to the non-conventional mechanisms of gene regulation operating in Leishmania. Results: We started this work by analyzing the genomic organization of α-tubulin genes in the Leishmania braziliensis genome database. The genomic organization of L. braziliensis α-tubulin genes differs from that existing in the L. major and L. infantum genomes. Two loci containing α-tubulin genes were found on chromosomes 13 and 29, although sequence gaps prevent determining the exact number of genes at each locus. Southern blot assays showed that the α-tubulin locus on chromosome 13 contains at least 8 gene copies, which are tandemly organized with a 2.08-kb repetition unit; the locus on chromosome 29 seems to contain a sole α-tubulin gene. In addition, it was found that the L. braziliensis α-tubulin locus on chromosome 13 contains two types of α-tubulin genes differing in their 3′ UTR, each one presumably containing different regulatory motifs. It was also determined that the mRNA expression levels of these genes are controlled by post-transcriptional mechanisms tightly linked to the growth temperature. Moreover, the decrease in the α-tubulin mRNA abundance observed when promastigotes were cultured at 35°C was accompanied by parasite morphology alterations, similar to those occurring during the promastigote to amastigote differentiation. Conclusions: Information found in the genome databases indicates that α-tubulin genes have been reorganized in a drastic

  11. Alpha tubulin genes from Leishmania braziliensis: genomic organization, gene structure and insights on their expression.

    PubMed

    Ramírez, César A; Requena, José M; Puerta, Concepción J

    2013-07-06

    Alpha tubulin is a fundamental component of the cytoskeleton which is responsible for cell shape and is involved in cell division, ciliary and flagellar motility and intracellular transport. Alpha tubulin gene expression varies according to the morphological changes undergone by Leishmania in its life cycle. However, studying the mechanisms responsible for the differential expression has proved difficult due to the complex genome organization of tubulin genes and to the non-conventional mechanisms of gene regulation operating in Leishmania. We started this work by analyzing the genomic organization of α-tubulin genes in the Leishmania braziliensis genome database. The genomic organization of L. braziliensis α-tubulin genes differs from that existing in the L. major and L. infantum genomes. Two loci containing α-tubulin genes were found on chromosomes 13 and 29, although sequence gaps prevent determining the exact number of genes at each locus. Southern blot assays showed that the α-tubulin locus on chromosome 13 contains at least 8 gene copies, which are tandemly organized with a 2.08-kb repetition unit; the locus on chromosome 29 seems to contain a sole α-tubulin gene. In addition, it was found that the L. braziliensis α-tubulin locus on chromosome 13 contains two types of α-tubulin genes differing in their 3' UTR, each one presumably containing different regulatory motifs. It was also determined that the mRNA expression levels of these genes are controlled by post-transcriptional mechanisms tightly linked to the growth temperature. Moreover, the decrease in the α-tubulin mRNA abundance observed when promastigotes were cultured at 35°C was accompanied by parasite morphology alterations, similar to those occurring during the promastigote to amastigote differentiation. Information found in the genome databases indicates that α-tubulin genes have been reorganized in a drastic manner along Leishmania

  12. Structural Relationships between Highly Conserved Elements and Genes in Vertebrate Genomes

    PubMed Central

    Sun, Hong; Skogerbø, Geir; Wang, Zhen; Liu, Wei; Li, Yixue

    2008-01-01

    Large numbers of sequence elements have been identified to be highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and additional computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, and also cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings about several individual genes. We found that conserved associations between HCEs and genes are not restricted by the absolute distance between them on the genome. Notably, long-range associations were identified and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes. PMID:19008958

  13. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags

    SciTech Connect

    Xu, Y.; Mural, R.; Uberbacher, E.

    1997-02-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  14. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1997-01-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.
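
    The two records above describe using ESTs to decide which predicted exons belong to the same gene. As a heavily simplified illustration of that one idea, and not the authors' algorithm, the sketch below groups exons that share at least one matching EST, using a union-find structure. The function name `group_exons_by_est`, the input mapping and the identifiers are hypothetical. The biological rules, boundary inference and model construction described in the abstract are not modeled.

```python
def group_exons_by_est(exon_to_ests):
    """Group exon ids into candidate genes: exons sharing an EST are merged."""
    parent = {exon: exon for exon in exon_to_ests}

    def find(x):
        # Find the representative of x's group, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_exon_for_est = {}
    for exon, ests in exon_to_ests.items():
        for est in ests:
            if est in first_exon_for_est:
                # This EST already matched another exon: merge the two groups.
                parent[find(exon)] = find(first_exon_for_est[est])
            else:
                first_exon_for_est[est] = exon

    groups = {}
    for exon in exon_to_ests:
        groups.setdefault(find(exon), []).append(exon)
    return sorted(sorted(members) for members in groups.values())

if __name__ == "__main__":
    # Hypothetical exon -> matching EST ids.
    matches = {
        "e1": {"estA"},
        "e2": {"estA", "estB"},
        "e3": {"estB"},
        "e4": {"estC"},
        "e5": set(),
    }
    print(group_exons_by_est(matches))
```

    Exons with no matching EST stay in groups of their own. This matches the abstract's caveat that accurate models can be expected only when related ESTs exist for the predicted exons.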

  15. Structure of the murine MPTP-PEST gene: Genomic organization and chromosomal mapping

    SciTech Connect

    Charest, A.; Wagner, J.; Muise, E.S.

    1995-08-10

    Protein tyrosine phosphatases comprise a large family of enzymes that are involved in the control of cellular tyrosine phosphorylation. We have used λ phage analysis to elucidate the complete genomic structure of an intracellular member of this family, the murine MPTP-PEST gene. Eight overlapping λ phage clones representing the MPTP-PEST locus were isolated from a 129/sv mouse genomic library. The gene spans over 90 kb of the mouse genome and is composed of 18 exons, 10 of which constitute the catalytic phosphatase domain. Detailed comparison of the position of intron/exon boundaries of the phosphatase domain of MPTP-PEST to those of several other protein tyrosine phosphatases indicates that the MPTP-PEST catalytic domain contains additional exons as a consequence of the insertion of novel introns. In addition, this analysis reveals a strong conservation of the genomic organization within the catalytic domain of the protein tyrosine phosphatase gene family. Finally, fluorescence in situ hybridization with MPTP-PEST genomic DNA refines the map position of MPTP-PEST to mouse chromosome 5A3 to B. This result is in agreement with the previous mapping of the human PEST gene to chromosome 7q11.23, a region of synteny with the centromeric portion of mouse chromosome 5. 33 refs., 3 figs., 1 tab.

  16. Genomic structure of the human BCCIP gene and its expression in cancer.

    PubMed

    Meng, Xiangbing; Liu, Jingmei; Shen, Zhiyuan

    2003-01-02

    Human BCCIPalpha (Tok-1alpha) is a BRCA2 and CDKN1A (Cip1, p21) interacting protein. Our previous studies have shown that overexpression of BCCIPalpha inhibits the growth of certain tumor cells [Oncogene 20 (2001) 336]. In this study, we report the genomic structure of the human BCCIP gene, which contains nine exons. Alternative splicing of the 3'-terminal exons produces two isoforms of BCCIP transcripts, BCCIPalpha and BCCIPbeta. The BCCIP gene is flanked by two genes that are transcribed in the opposite orientation of the BCCIP gene. It lies head-to-head and shares a bi-directional promoter with the uroporphyrinogen III synthase (UROS) gene. The last three exons of the BCCIP gene overlap the 3'-terminal seven exons of a DEAD/H helicase-like gene (DDX32). Using a matched normal/tumor cDNA array, we identified a reduced expression of BCCIP in kidney tumor, suggesting a role of BCCIP in cancer etiology.

  17. Structural genomics of highly conserved microbial genes of unknown function in search of new antibacterial targets.

    PubMed

    Abergel, Chantal; Coutard, Bruno; Byrne, Deborah; Chenivesse, Sabine; Claude, Jean-Baptiste; Deregnaucourt, Céline; Fricaux, Thierry; Gianesini-Boutreux, Celine; Jeudy, Sandra; Lebrun, Régine; Maza, Caroline; Notredame, Cédric; Poirot, Olivier; Suhre, Karsten; Varagnol, Majorie; Claverie, Jean-Michel

    2003-01-01

    With more than 100 antibacterial drugs at our disposal in the 1980s, the problem of bacterial infection was considered solved. Today, however, most hospital infections are insensitive to several classes of antibacterial drugs, and deadly strains of Staphylococcus aureus resistant to vancomycin--the last resort antibiotic--have recently begun to appear. Other life-threatening microbes, such as Enterococcus faecalis and Mycobacterium tuberculosis, are already able to resist every available antibiotic. There is thus an urgent and continuous need for new, preferably large-spectrum, antibacterial molecules, ideally targeting new biochemical pathways. Here we report on the progress of our structural genomics program aiming at the discovery of new antibacterial gene targets among evolutionarily conserved genes of uncharacterized function. A series of bioinformatic and comparative genomics analyses were used to identify a set of 221 candidate genes common to Gram-positive and Gram-negative bacteria. These genes were split between two laboratories. They are now submitted to a systematic 3-D structure determination protocol including cloning, protein expression and purification, crystallization, X-ray diffraction, structure interpretation, and function prediction. We describe here our strategies for the 111 genes processed in our laboratory. Bioinformatics is used at most stages of the production process and out of 111 genes processed--and 17 months into the project--108 have been successfully cloned, 103 have exhibited detectable expression, 84 have led to the production of soluble protein, 46 have been purified, 12 have led to usable crystals, and 7 structures have been determined.
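
    The stage counts in the record above (111, 108, 103, 84, 46, 12, 7) are easier to compare as success rates. The sketch below computes, for each stage, the fraction retained from the previous stage and the cumulative fraction of the 111 processed genes. The counts come from the abstract. The stage labels, the `STAGES` list and the `attrition` function are names chosen for this example.

```python
# Counts quoted in the abstract; the short labels are this example's own.
STAGES = [
    ("processed", 111),
    ("cloned", 108),
    ("expressed", 103),
    ("soluble", 84),
    ("purified", 46),
    ("crystals", 12),
    ("structures", 7),
]

def attrition(stages):
    """Return (stage, count, fraction_of_previous, fraction_of_first) per stage."""
    total = stages[0][1]
    rows = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count, count / previous, count / total))
    return rows

if __name__ == "__main__":
    for name, count, step, cumulative in attrition(STAGES):
        print(f"{name:<11}{count:>4}  step {step:6.1%}  cumulative {cumulative:6.1%}")
```

    In these figures the largest proportional losses fall at the purification and crystallization steps. About 6% of the processed genes (7 of 111) had reached a determined structure at the time of the report.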

  18. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  19. Gene network inference via structural equation modeling in genetical genomics experiments.

    PubMed

    Liu, Bing; de la Fuente, Alberto; Hoeschele, Ina

    2008-03-01

    Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.

  20. Genomic structure and expression of uncoupling protein 2 genes in rainbow trout (Oncorhynchus mykiss).

    PubMed

    Coulibaly, Issa; Gahr, Scott A; Palti, Yniv; Yao, Jianbo; Rexroad, Caird E

    2006-08-09

    Uncoupling protein 2 (UCP2) belongs to the superfamily of mitochondrial anion carriers that dissociate the respiratory chain from ATP synthesis. It has been determined that UCP2 plays a role in several physiological processes such as energy expenditure, body weight control and fatty acid metabolism in several vertebrate species. We report the first characterization of UCP2s in rainbow trout (Oncorhynchus mykiss). Two UCP2 genes were identified in the rainbow trout genome, UCP2A and UCP2B. These genes are 93% similar in their predicted amino acid sequences and display the same genomic structure as other vertebrates (8 exons and 7 introns) spanning 4.2 kb and 3.2 kb, respectively. UCP2A and UCP2B were widely expressed in all tissues of the study with a predominant level in macrophage-rich tissues and reproductive organs. In fry muscle we observed an increase in UCP2B expression in response to fasting and a decrease after refeeding in agreement with previous studies in human, mouse, rat, and marsupials. The converse expression pattern was observed for UCP2A mRNA which decreased during fasting, suggesting different metabolic roles for UCP2A and UCP2B in rainbow trout muscle. Phylogenetic analysis including other genes from the UCP core family located rainbow trout UCP2A and UCP2B with their orthologs and suggested an early divergence of vertebrate UCPs from a common ancestor gene. We characterized two UCP2 genes in rainbow trout with similar genomic structures, amino acid sequences and distribution profiles. These genes appeared to be differentially regulated in response to fasting and refeeding in fry muscle. The genomic organization and phylogeny analysis support the hypothesis of a common ancestry between the vertebrate UCPs.

  1. Genomic structure of two ras family genes in the slime mold Physarum polycephalum.

    PubMed

    Trzcińska-Danielewicz, Joanna; Kozlowski, Piotr; Gierdal, Katarzyna; Wiejak, Jolanta; Jagielski, Adam; Toczko, Kazimierz; Fronk, Jan

    2002-08-01

    Genomic structure of two Physarum polycephalum ras family genes, Ppras2 and Pprap1, has been determined, including the upstream region of the latter. The genes are interrupted by three and four introns, respectively. The first intron of Ppras2 has the same location within the coding sequence as the first intron in another ras homolog from this organism, Ppras1 [Trzcińska-Danielewicz, J., Kozlowski, P., and Toczko, K. (1996). "Cloning and genomic sequence of the Physarum polycephalum Ppras1 gene, a homologue of the ras protooncogene", Gene 169, pp. 143-144]. All introns, ranging from 53 to ca. 460 base pairs, have the canonical 5' and 3' ends, are greatly enriched in pyrimidines in the coding strand and have frequent pyrimidines-only tracts. These latter features seem to be responsible for the difficulties in cloning and sequencing of parts of these genes. Short sequences shared with P. polycephalum transposon-like repeats are common in the introns, indicating a possible role of transposition in intron evolution. In all three ras family genes phase zero introns are located mostly between sequences coding for regular protein secondary structure elements.

  2. Global transcript structure resolution of high gene density genomes through multi-platform data integration

    PubMed Central

    O'Grady, Tina; Wang, Xia; Höner zu Bentrup, Kerstin; Baddoo, Melody; Concha, Monica; Flemington, Erik K.

    2016-01-01

    Annotation of herpesvirus genomes has traditionally been undertaken through the detection of open reading frames and other genomic motifs, supplemented with sequencing of individual cDNAs. Second generation sequencing and high-density microarray studies have revealed vastly greater herpesvirus transcriptome complexity than is captured by existing annotation. The pervasive nature of overlapping transcription throughout herpesvirus genomes, however, poses substantial problems in resolving transcript structures using these methods alone. We present an approach that combines the unique attributes of Pacific Biosciences Iso-Seq long-read, Illumina short-read and deepCAGE (Cap Analysis of Gene Expression) sequencing to globally resolve polyadenylated isoform structures in replicating Epstein-Barr virus (EBV). Our method, Transcriptome Resolution through Integration of Multi-platform Data (TRIMD), identifies nearly 300 novel EBV transcripts, quadrupling the size of the annotated viral transcriptome. These findings illustrate an array of mechanisms through which EBV achieves functional diversity in its relatively small, compact genome including programmed alternative splicing (e.g. across the IR1 repeats), alternative promoter usage by LMP2 and other latency-associated transcripts, intergenic splicing at the BZLF2 locus, and antisense transcription and pervasive readthrough transcription throughout the genome. PMID:27407110

  3. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  4. Retrotransposition of gene transcripts leads to structural variation in mammalian genomes

    PubMed Central

    2013-01-01

    Background Retroposed processed gene transcripts are an important source of material for new gene formation on evolutionary timescales. Most prior work on gene retrocopy discovery compared copies in reference genome assemblies to their source genes. Here, we explore gene retrocopy insertion polymorphisms (GRIPs) that are present in the germlines of individual humans, mice, and chimpanzees, and we identify novel gene retrocopy insertions in cancerous somatic tissues that are absent from patient-matched non-cancer genomes. Results Through analysis of whole-genome sequence data, we found evidence for 48 GRIPs in the genomes of one or more humans sequenced as part of the 1,000 Genomes Project and The Cancer Genome Atlas, but which were not in the human reference assembly. Similarly, we found evidence for 755 GRIPs at distinct locations in one or more of 17 inbred mouse strains but which were not in the mouse reference assembly, and 19 GRIPs across a cohort of 10 chimpanzee genomes, which were not in the chimpanzee reference genome assembly. Many of these insertions are new members of existing gene families whose source genes are highly and widely expressed, and the majority have detectable hallmarks of processed gene retrocopy formation. We estimate the rate of novel gene retrocopy insertions in humans and chimps at roughly one new gene retrocopy insertion for every 6,000 individuals. Conclusions We find that gene retrocopy polymorphisms are a widespread phenomenon, present a multi-species analysis of these events, and provide a method for their ascertainment. PMID:23497673

  5. Computational Integration of Structural and Functional Genomics Data Across Species to Develop Information on Porcine Inflammatory Gene Regulatory Pathway

    USDA-ARS's Scientific Manuscript database

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  6. The genomic structure of the gene encoding the human transforming growth factor β type II receptor (TGF-β RII)

    SciTech Connect

    Takenoshita, Seiichi; Hagiwara, Koichi; Nagashima, Makoto; Gemma, Akihiko

    1996-09-01

    The genomic structure of the human transforming growth factor-β type II receptor gene (TGF-β RII) was determined by two PCR-based methods, the "long distance sequencer" method and the "promoter finder" method. Genomic fragments containing exons and adjacent introns were amplified by PCR, and the nucleotide sequences were determined by direct sequencing and subcloning sequencing. The TGF-β RII protein is encoded by 567 codons in 7 exons. This is the first report about the genomic structure of a gene that belongs to the serine/threonine kinase type II receptor subfamily. Knowledge of the genomic structure of the TGF-β RII gene will facilitate investigation of the TGF-β signaling pathway in normal human cells and of the aberrations occurring during carcinogenesis. 18 refs., 2 figs., 1 tab.

  7. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    SciTech Connect

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5′-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promoter for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.

  8. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes.

    PubMed

    Garcia, Juan A L; Fernández-Guerra, Antoni; Casamayor, Emilio O

    2011-12-01

    Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large, good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view of the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself, by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality, by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacteria and archaea species. The DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between the scaling exponent α and COGs distribution, and we consistently observed that the larger α, the more heterogeneous was the gene distribution within each functional category, suggesting a close relationship between primary nucleotide sequence structure and functional gene composition.
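
    To make the method named in this abstract more concrete, the following is a minimal sketch of first-order detrended fluctuation analysis applied to a purine/pyrimidine walk. It is an assumption-laden illustration, not the authors' pipeline: the function names, the window sizes, and the toy sequence are all hypothetical, and the abstract does not state which DFA order or window range was used.

```python
import math


def purine_walk(seq):
    """Map DNA to steps: +1 for purines (A, G), -1 for pyrimidines (C, T)."""
    steps = []
    for base in seq.upper():
        if base in "AG":
            steps.append(1)
        elif base in "CT":
            steps.append(-1)
        # any other symbol (for example N) is skipped
    return steps


def _residual_ss(window):
    """Sum of squared residuals around a least-squares line through the window."""
    n = len(window)
    mean_x = (n - 1) / 2
    mean_y = sum(window) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(window))
    slope = sxy / sxx
    return sum((y - mean_y - slope * (i - mean_x)) ** 2 for i, y in enumerate(window))


def dfa_exponent(steps, window_sizes=(8, 16, 32, 64)):
    """Estimate the DFA-1 scaling exponent as the slope of log F(n) versus log n."""
    mean = sum(steps) / len(steps)
    profile, running = [], 0.0
    for s in steps:  # integrated, mean-centred series (the "DNA walk")
        running += s - mean
        profile.append(running)
    points = []
    for n in window_sizes:
        count = len(profile) // n  # non-overlapping windows of length n
        if count < 2:
            continue
        ss = sum(_residual_ss(profile[k * n:(k + 1) * n]) for k in range(count))
        if ss > 0:
            # log F(n), where F(n) = sqrt(ss / (count * n))
            points.append((math.log(n), 0.5 * math.log(ss / (count * n))))
    if len(points) < 2:
        raise ValueError("sequence too short for the chosen window sizes")
    mx = sum(x for x, _ in points) / len(points)
    my = sum(y for _, y in points) / len(points)
    return (sum((x - mx) * (y - my) for x, y in points)
            / sum((x - mx) ** 2 for x, _ in points))


# Toy sequence for illustration only; real genomes are millions of bases long.
toy_sequence = "ACGTTGCAAGCTAGGCTTAACCGGATTCGA" * 20
alpha = dfa_exponent(purine_walk(toy_sequence))
print(alpha)
```

    An exponent near 0.5 corresponds to an uncorrelated sequence, while values above 0.5 indicate the persistent long-range correlations the abstract reports. The keto/amino and strong/weak codings could be compared by swapping the two base sets in purine_walk.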

  9. Domain organization, genomic structure, evolution, and regulation of expression of the aggrecan gene family.

    PubMed

    Schwartz, N B; Pirok, E W; Mensch, J R; Domowicz, M S

    1999-01-01

    Proteoglycans are complex macromolecules, consisting of a polypeptide backbone to which are covalently attached one or more glycosaminoglycan chains. Molecular cloning has allowed identification of the genes encoding the core proteins of various proteoglycans, leading to a better understanding of the diversity of proteoglycan structure and function, as well as to the evolution of a classification of proteoglycans on the basis of emerging gene families that encode the different core proteins. One such family includes several proteoglycans that have been grouped with aggrecan, the large aggregating chondroitin sulfate proteoglycan of cartilage, based on a high number of sequence similarities within the N- and C-terminal domains. Thus far these proteoglycans include versican, neurocan, and brevican. It is now apparent that these proteins, as a group, are truly a gene family with shared structural motifs on the protein and nucleotide (mRNA) levels, and with nearly identical genomic organizations. Clearly a common ancestral origin is indicated for the members of the aggrecan family of proteoglycans. However, differing patterns of amplification and divergence have also occurred within certain exons across species and family members, leading to the class-characteristic protein motifs in the central carbohydrate-rich region exclusively. Thus the overall domain organization strongly suggests that sequence conservation in the terminal globular domains underlies common functions, whereas differences in the central portions of the genes account for functional specialization among the members of this gene family.

  10. The population genomics of begomoviruses: global scale population structure and gene flow

    PubMed Central

    2010-01-01

    Background The rapidly growing availability of diverse full genome sequences from across the world is increasing the feasibility of studying the large-scale population processes that underlie observable patterns of virus diversity. In particular, characterizing the genetic structure of virus populations could potentially reveal much about how factors such as geographical distributions, host ranges and gene flow between populations combine to produce the discontinuous patterns of genetic diversity that we perceive as distinct virus species. Among the richest and most diverse full genome datasets that are available is that for the dicotyledonous plant-infecting genus, Begomovirus, in the Family Geminiviridae. The begomoviruses all share the same whitefly vector, are highly recombinogenic and are distributed throughout tropical and subtropical regions where they seriously threaten the food security of the world's poorest people. Results We focus here on using a model-based population genetic approach to identify the genetically distinct sub-populations within the global begomovirus meta-population. We demonstrate the existence of at least seven major sub-populations that can further be sub-divided into as many as thirty-four significantly differentiated and genetically cohesive minor sub-populations. Using the population structure framework revealed in the present study, we further explored the extent of gene flow and recombination between genetic populations. Conclusions Although geographical barriers are apparently the most significant underlying cause of the seven major population sub-divisions, within the framework of these sub-divisions, we explore patterns of gene flow to reveal that both host range differences and genetic barriers to recombination have probably been major contributors to the minor population sub-divisions that we have identified. We believe that the global Begomovirus population structure revealed here could facilitate population genetics studies

  11. Genomic structure of the human RBP56/hTAFII68 and FUS/TLS genes.

    PubMed

    Morohoshi, F; Ootsuka, Y; Arai, K; Ichikawa, H; Mitani, S; Munakata, N; Ohki, M

    1998-10-23

    We previously isolated RBP56 cDNA by PCR using mixed primers designed from the conserved sequences of the RNA binding domain of FUS/TLS and EWS proteins. RBP56 protein turned out to be hTAFII68 which was isolated as a TATA-binding protein associated factor (TAF) from a sub-population of TFIID complexes (Bertolotti A., Lutz, Y., Heard, D.J., Chambon, P., Tora, L., 1996. hTAFII68, a novel RNA/ssDNA-binding protein with homology to the proto-oncoproteins TLS/FUS and EWS is associated with both TFIID and RNA polymerase II. EMBO J. 15, 5022-5031). The RBP56/hTAFII68, FUS/TLS and EWS proteins comprise a sub-family of RNA binding proteins, which consist of an N-terminal Ser, Gly, Gln and Tyr-rich region, an RNA binding domain, a Cys2/Cys2 zinc finger motif and a C-terminal RGG-containing region. Rearrangement of the FUS/TLS gene and the EWS gene has been found in several types of malignant tumors, and the resultant fusion proteins play an important role in the pathogenesis of these tumors. In the present study, we determined the genomic structure of the RBP56/hTAFII68 gene. The RBP56/hTAFII68 gene spans about 37 kb and consists of 16 exons from 33 bp to 562 bp. The longest exon, exon 15, encodes the C-terminal region containing 19 repeats of a degenerate DR(S)GG(G)YGG sequence. While the structure of the FUS/TLS gene has been reported previously, we determined the total DNA sequence of the FUS/TLS gene, consisting of 12 kb. The RBP56/hTAFII68, FUS/TLS and EWS genes consist of similar numbers of exons. Comparison of the structures of these three genes showed that the organization of exons in the central part encoding a homologous RNA binding domain and a cysteine finger motif is highly conserved, and other exon boundaries are also located at similar sites, indicating that these three genes most likely originate from the same ancestor gene.

  12. Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

    PubMed Central

    2012-01-01

    Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13

  13. Analysis of the murine Dtk gene identifies conservation of genomic structure within a new receptor tyrosine kinase subfamily

    SciTech Connect

    Lewis, P.M.; Crosier, K.E.; Crosier, P.S.

    1996-01-01

    The receptor tyrosine kinase Dtk/Tyro 3/Sky/rse/brt/tif is a member of a new subfamily of receptors that also includes Axl/Ufo/Ark and Eyk/Mer. These receptors are characterized by the presence of two immunoglobulin-like loops and two fibronectin type III repeats in their extracellular domains. The structure of the murine Dtk gene has been determined. The gene consists of 21 exons that are distributed over 21 kb of genomic DNA. An isoform of Dtk is generated by differential splicing of exons from the 5′ region of the gene. The overall genomic structure of Dtk is virtually identical to that determined for the human UFO gene. This particular genomic organization is likely to have been duplicated and closely maintained throughout evolution. 38 refs., 3 figs., 1 tab.

  14. Genomic structure and characterization of the promoter region of the human NAK gene.

    PubMed

    Li, Sheng Fan; Fujita, Fumitaka; Hirai, Momoki; Lu, Rui; Niida, Hiroyuki; Nakanishi, Makoto

    2003-01-30

    NAK has been identified as an IkappaB-kinase activating-kinase that plays an important role in NF-kappaB activation in response to several pro-inflammatory cytokines such as TNF-alpha. We describe here the genomic structure of the human NAK gene and analysis of the promoter. The gene spanned 40.5 kb and contained 21 exons with lengths ranging from 39 to 196 bp. Comparison of the phase and position of intron insertions within the human NAK gene with those within IKKalpha, IKKbeta and IKK epsilon indicated that the exon/intron organization of IKK epsilon is more highly conserved than that of IKKalpha or IKKbeta. The transcriptional start site was mapped at a position about 98 bp upstream from the translation start site by means of both an RNase protection assay and a primer extension method. Fluorescence in situ hybridization using full-length human NAK cDNA as a probe showed that the human NAK gene is localized to human chromosome 13q14.2-3, a region in which the loss of heterozygosity is associated with squamous cell carcinoma and leukemia. By using a series of deletion constructs in performing a reporter assay, a minimal 77 bp upstream of the transcriptional initiation site was shown to contribute to the major promoter activity.

  15. Comparative Analysis of Syntenic Genes in Grass Genomes Reveals Accelerated Rates of Gene Structure and Coding Sequence Evolution in Polyploid Wheat

    PubMed Central

    Akhunov, Eduard D.; Sehgal, Sunish; Liang, Hanquan; Wang, Shichen; Akhunova, Alina R.; Kaur, Gaganpreet; Li, Wanlong; Forrest, Kerrie L.; See, Deven; Šimková, Hana; Ma, Yaqin; Hayden, Matthew J.; Luo, Mingcheng; Faris, Justin D.; Doležel, Jaroslav; Gill, Bikram S.

    2013-01-01

    Cycles of whole-genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied by comparing the patterns of gene structure changes, alternative splicing (AS), and codon substitution rates among wheat and model grass genomes. In orthologous gene sets, significantly more acquired and lost exonic sequences were detected in wheat than in model grasses. In wheat, 35% of these gene structure rearrangements resulted in frame-shift mutations and premature termination codons. An increased codon mutation rate in the wheat lineage compared with Brachypodium distachyon was found for 17% of orthologs. The discovery of premature termination codons in 38% of expressed genes was consistent with ongoing pseudogenization of the wheat genome. The rates of AS within the individual wheat subgenomes (21%–25%) were similar to diploid plants. However, we uncovered a high level of AS pattern divergence between the duplicated homeologous copies of genes. Our results are consistent with the accelerated accumulation of AS isoforms, nonsynonymous mutations, and gene structure rearrangements in the wheat lineage, likely due to genetic redundancy created by WGDs. Whereas these processes mostly contribute to the degeneration of a duplicated genome and its diploidization, they have the potential to facilitate the origin of new functional variations, which, upon selection in the evolutionary lineage, may play an important role in the origin of novel traits. PMID:23124323

  16. A highly conserved gene island of three genes on chromosome 3B of hexaploid wheat: diverse gene function and genomic structure maintained in a tightly linked block

    PubMed Central

    2010-01-01

    Background The complexity of the wheat genome has resulted from waves of retrotransposable element insertions. Gene deletions and disruptions generated by the fast replacement of repetitive elements in wheat have resulted in disruption of colinearity at a micro (sub-megabase) level among the cereals. In view of genomic changes that are possible within a given time span, conservation of genes between species tends to imply an important functional or regional constraint that does not permit a change in genomic structure. The ctg1034 contig completed in this paper was initially studied because it was assigned to the Sr2 resistance locus region, but detailed mapping studies subsequently assigned it to the long arm of 3B and revealed its unusual features. Results BAC shotgun sequencing of the hexaploid wheat (Triticum aestivum cv. Chinese Spring) genome has been used to assemble a group of 15 wheat BACs from the chromosome 3B physical map FPC contig ctg1034 into a 783,553 bp genomic sequence. This ctg1034 sequence was annotated for biological features such as genes and transposable elements. A three-gene island was identified among >80% repetitive DNA sequence. Bioinformatics analysis revealed no observable similarity in the functions of the three genes. The ctg1034 gene island also displayed complete conservation of gene order and orientation with syntenic gene islands found in publicly available genome sequences of Brachypodium distachyon, Oryza sativa, Sorghum bicolor and Zea mays, even though the intergenic space and introns were divergent. Conclusion We propose that ctg1034 is located within the heterochromatic C-band region of deletion bin 3BL7 based on the identification of heterochromatic tandem repeats and presence of significant matches to chromodomain-containing gypsy LTR retrotransposable elements. We also speculate that this location, among other highly repetitive sequences, may account for the relative stability in gene order and orientation within the gene

  17. Comparative mapping, genomic structure, and expression analysis of eight pseudo-response regulator genes in Brassica rapa.

    PubMed

    Kim, Jin A; Kim, Jung Sun; Hong, Joon Ki; Lee, Yeon-Hee; Choi, Beom-Soon; Seol, Young-Joo; Jeon, Chang Hoo

    2012-05-01

    Circadian clocks regulate plant growth and development in response to environmental factors. In this function, clocks influence the adaptation of species to changes in location or climate. Circadian-clock genes have been the subject of intense study in models such as Arabidopsis thaliana, but the results may not necessarily reflect clock functions in species with polyploid genomes, such as Brassica species, that include multiple copies of clock-related genes. The triplicate genome of Brassica rapa retains high sequence-level co-linearity with Arabidopsis genomes. In B. rapa we had previously identified five orthologs of the five known Arabidopsis pseudo-response regulator (PRR) genes that are key regulators of the circadian clock in this species. Three of these B. rapa genes, BrPRR1, BrPRR5, and BrPRR7, are present in two copies each in the B. rapa genome, for a total of eight B. rapa PRR (BrPRR) orthologs. We have now determined sequences and expression characteristics of the eight BrPRR genes and mapped their positions in the B. rapa genome. Although both members of each paralogous pair exhibited the same expression pattern, some variation in their gene structures was apparent. The BrPRR genes are tightly linked to several flowering genes. The knowledge about genome location, copy number variation and structural diversity of these B. rapa clock genes will improve our understanding of clock-related functions in this important crop. This will facilitate the development of Brassica crops for optimal growth in new environments and under changing conditions.

  18. The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

    SciTech Connect

    Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

    1997-03-01

    The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.

  19. Genes, genome and Gestalt.

    PubMed

    Grisolia, Cesar Koppe

    2005-03-31

    According to Gestalt thinking, biological systems cannot be viewed as the sum of their elements, but as processes of the whole. To understand organisms we must start from the whole, observing how the various parts are related. In genetics, we must observe the genome over and above the sum of its genes. Either loss or addition of one gene in a genome can change the function of the organism. Genomes are organized in networks of genes, which need to be well integrated. In the case of genetically modified organisms (GMOs), for example, soybeans, rats, Anopheles mosquitoes, and pigs, the insertion of an exogenous gene into a receptive organism generally causes disturbance in the networks, resulting in the breakdown of gene interactions. In these cases, genetic modification increased the genetic load of the GMO and consequently decreased its adaptability (fitness). Therefore, it is hard to claim that the production of such organisms with an increased genetic load does not have ethical implications.

  20. Ultra High-Resolution Gene Centric Genomic Structural Analysis of a Non-Syndromic Congenital Heart Defect, Tetralogy of Fallot

    PubMed Central

    Bittel, Douglas C.; Zhou, Xin-Gang; Kibiryeva, Nataliya; Fiedler, Stephanie; O’Brien, James E.; Marshall, Jennifer; Yu, Shihui; Liu, Hong-Yu

    2014-01-01

    Tetralogy of Fallot (TOF) is one of the most common severe congenital heart malformations. Great progress has been made in identifying key genes that regulate heart development, yet approximately 70% of TOF cases are sporadic and nonsyndromic with no known genetic cause. We created an ultra high-resolution gene centric comparative genomic hybridization (gcCGH) microarray based on 591 genes with a validated association with cardiovascular development or function. We used our gcCGH array to analyze the genomic structure of 34 infants with sporadic TOF without a deletion on chromosome 22q11.2 (n male = 20; n female = 14; age range of 2 to 10 months). Using our custom-made gcCGH microarray platform, we identified a total of 613 copy number variations (CNVs) ranging in size from 78 base pairs to 19.5 Mb. We identified 16 subjects with 33 CNVs that contained 13 different genes which are known to be directly associated with heart development. Additionally, there were 79 genes from the broader list of genes that were partially or completely contained in a CNV. All 34 individuals examined had at least one CNV involving these 79 genes. Furthermore, we had available whole genome exon arrays from right ventricular tissue in 13 of our subjects. We analyzed these for correlations between copy number and gene expression level. Surprisingly, we could detect only one clear association between CNVs and expression (GSTT1) for any of the 591 focal genes on the gcCGH array. The expression levels of GSTT1 were correlated with copy number in all cases examined (r = 0.95, p = 0.001). We identified a large number of small CNVs in genes with varying associations with heart development. Our results illustrate the complexity of human genome structural variation and underscore the need for multifactorial assessment of potential genetic/genomic factors that contribute to congenital heart defects. PMID:24498113
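
    The copy-number versus expression comparison reported in this abstract rests on a correlation coefficient. The sketch below shows how a Pearson r of that kind could be computed; the function name and the example values are hypothetical and are not data from the study, and the abstract does not say which correlation statistic or software the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration: gene copies per subject and a matching
# expression measurement from the same subjects.
copy_number = [0, 1, 1, 2, 2, 2]
expression = [0.2, 1.1, 0.9, 2.3, 1.8, 2.1]
print(pearson_r(copy_number, expression))
```

    With only 13 subjects, as in the expression subset described above, a coefficient like this should be read alongside its p-value, which this sketch does not compute.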

  1. Promoter-Specific Expression and Genomic Structure of IgLON Family Genes in Mouse

    PubMed Central

    Vanaveski, Taavi; Singh, Katyayani; Narvik, Jane; Eskla, Kattri-Liis; Visnapuu, Tanel; Heinla, Indrek; Jayaram, Mohan; Innos, Jürgen; Lilleväli, Kersti; Philips, Mari-Anne; Vasar, Eero

    2017-01-01

    The IgLON family is composed of five genes: Lsamp, Ntm, Opcml, Negr1, and Iglon5, encoding five highly homologous neural adhesion proteins that regulate neurite outgrowth and synapse formation. In the current study we performed in silico analysis revealing that Ntm and Opcml display a genomic structure similar to that previously reported for Lsamp, characterized by two alternative promoters, 1a and 1b. Negr1 and Iglon5 transcripts have a uniform 5′ region, suggesting a single promoter. Iglon5, the recently characterized family member, shares a high level of conservation and the structural qualities characteristic of the IgLON family, such as an N-terminal signal peptide, three Ig domains, and a GPI anchor binding site. By using a custom 5′-isoform-specific TaqMan gene-expression assay, we demonstrated heterogeneous expression of IgLON transcripts in different areas of mouse brain and several-fold lower expression in selected tissues outside the central nervous system. As an example, the expression of IgLON transcripts in the urogenital and reproductive system is in line with repeated reports of urogenital tumors accompanied by mutations in IgLON genes. Considering the high levels of intra-family homology shared by IgLONs, we investigated potential compensatory effects at the level of IgLON isoforms in the brains of mice deficient in one or two family members. We found that the lack of IgLONs is not compensated by a systematic quantitative increase of the other family members. On the contrary, the expression of Ntm 1a transcript and NEGR1 protein was significantly reduced in the frontal cortex of Lsamp-deficient mice, suggesting that the expression patterns within the IgLON family are balanced coherently. The actions of individual IgLONs, however, can be antagonistic, as demonstrated by differential expression of Syp in deletion mutants of IgLONs. In conclusion, we show that the genomic twin-promoter structure has an impact on both anatomical distribution and intra-family interactions of IgLON family members.

  2. The mouse formin (Fmn) gene: Genomic structure, novel exons, and genetic mapping

    SciTech Connect

    Wang, C.C.; Chan, D.C.; Leder, P.

    1997-02-01

    Mutations in the mouse formin (Fmn) gene, formerly known as the limb deformity (ld) gene, give rise to recessively inherited limb deformities and renal malformations or aplasia. The Fmn gene encodes many differentially processed transcripts that are expressed in both adult and embryonic tissues. To study the genomic organization of the Fmn locus, we have used Fmn probes to isolate and characterize genomic clones spanning 500 kb. Our analysis of these clones shows that the Fmn gene is composed of at least 24 exons and spans 400 kb. We have identified two novel exons that are expressed in the developing embryonic limb bud as well as adult tissues such as brain and kidney. We have also used a microsatellite polymorphism from within the Fmn gene to map it genetically to a 2.2-cM interval between D2Mit58 and D2Mit103. 36 refs., 6 figs., 1 tab.

  3. Genomic structure, organisation, and promoter analysis of the bovine (Bos taurus) Mx1 gene.

    PubMed

    Gérardin, Joël A; Baise, Etienne A; Pire, Grégory A; Leroy, Michaël P-P; Desmecht, Daniel J-M

    2004-02-04

    Some MX proteins are known to confer a specific resistance against a panel of single-stranded RNA viruses. Many diseases due to such viruses are known to affect cattle worldwide, raising the possibility that the identification of an antiviral isoform of a bovine MX protein would allow the implementation of genetic selection programs aimed at improving innate resistance of cattle. With this potential application in mind, the present study was designed to isolate the bovine Mx1 gene including its promoter region and to investigate its genomic organisation and promoter reactivity. The bovine Mx1 gene is made up of 15 exons. All exon-intron boundaries conformed to the consensus sequences. A PCR product that contained an approximately 1-kb 5'-flanking region upstream from the putative transcription start site was sequenced. Unexpectedly, this DNA region did not contain TATA or CCAAT motifs. A computer scan of the region disclosed a series of putative binding sites for known cytokines and transcription factors. There was a GAAAN(1-2)GAAA(C/G) motif, typical of an interferon-sensitive responsive element, between -118 and -107 from the putative transcription start site. There were also an NF-kappaB site, two interleukin-6 binding sites, two Sp1 sites and five GC-rich boxes. The region also contained 12 stretches of the GAAA type, as described in all IFN-inducible genes. Bovine Mx1 expression was assessed by Northern blotting and immunofluorescence in the Madin Darby bovine kidney (MDBK) cell line treated with several stimuli. In conclusion, the bovine Mx1 gene and promoter region share the major structural and functional characteristics displayed by their homologs described in the rainbow trout, chicken, mouse and man.
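
    The GAAAN(1-2)GAAA(C/G) motif quoted in this abstract maps directly onto a regular expression. The following is a minimal sketch of such a scan, not the software used in the study; the function name `find_isre_like` and the toy sequence are hypothetical, and N is assumed to mean any of A, C, G or T.

```python
import re

# Motif as written in the abstract: GAAA, then 1-2 of any base, then GAAA, then C or G.
# Wrapping the motif in a lookahead lets overlapping occurrences be reported as well.
ISRE_PATTERN = re.compile(r"(?=(GAAA[ACGT]{1,2}GAAA[CG]))")

def find_isre_like(seq):
    """Return (0-based start, matched text) for each motif occurrence in seq."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in ISRE_PATTERN.finditer(seq)]

# Hypothetical toy sequence for illustration only; it is not the bovine Mx1 promoter.
toy = "ttcGAAACTGAAACtagGAAAGGAAAGcc"
for start, motif in find_isre_like(toy):
    print(start, motif)
```

    The offsets returned are 0-based positions in the input string; converting them to coordinates relative to a transcription start site, as in the abstract, would require knowing where that site falls in the sequence.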

  4. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations

    PubMed Central

    Cerqueira, Gustavo C.; Arnaud, Martha B.; Inglis, Diane O.; Skrzypek, Marek S.; Binkley, Gail; Simison, Matt; Miyasato, Stuart R.; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R.

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequenced tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome. PMID:24194595

  5. Genome-Wide Analysis of the Expansin Gene Superfamily Reveals Grapevine-Specific Structural and Functional Characteristics

    PubMed Central

    Tornielli, Giovanni Battista; Fasoli, Marianna; Venturini, Luca; Pezzotti, Mario; Zenoni, Sara

    2013-01-01

    Background Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP) comprises four distinct families: expansin A (EXPA), expansin B (EXPB), expansin-like A (EXLA) and expansin-like B (EXLB). There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera) genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. Methodology/Principal Findings We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon–intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa), compared to those from Arabidopsis thaliana and rice (Oryza sativa). We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. Conclusion Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the functional

  6. The Gene Ontology (GO) project: structured vocabularies for molecular biology and their application to genome and expression analysis.

    PubMed

    Blake, Judith A; Harris, Midori A

    2008-09-01

    Scientists wishing to utilize genomic data have quickly come to realize the benefit of standardizing descriptions of experimental procedures and results for computer-driven information retrieval systems. The focus of the Gene Ontology project is three-fold. First, the project goal is to compile the Gene Ontologies: structured vocabularies describing domains of molecular biology. Second, the project supports the use of these structured vocabularies in the annotation of gene products. Third, the gene product-to-GO annotation sets are provided by participating groups to the public through open access to the GO database and Web resource. This unit describes the current ontologies and what is beyond the scope of the Gene Ontology project. It addresses the issue of how GO vocabularies are constructed and related to genes and gene products. It concludes with a discussion of how researchers can access, browse, and utilize the GO project in the course of their own research. Copyright 2008 by John Wiley & Sons, Inc.

  7. Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression.

    PubMed

    Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang

    2007-04-10

    Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1-27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome.

  8. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea)

    PubMed Central

    Gao, Feng; Song, Weibo; Katz, Laura A.

    2014-01-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that: 1) alternative processing is extensive among gene families; and 2) such gene families are likely to be C. uncinata-specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family -- a protein kinase domain containing protein (PKc) -- from two C. uncinata strains. Analysis of the PKc sequences reveals: 1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and 2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. PMID:24749903

  9. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    PubMed

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family-a protein kinase domain containing protein (PKc)-from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.

  10. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection

    PubMed Central

    Saina, Josphat K.; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W.

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) of 25,971 bp each, separated by two single-copy regions: a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, of which 78 code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family. PMID:28097059
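
    The region lengths and gene counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the figures as they appear in this record; the variable names are illustrative.

```python
# Figures exactly as quoted in the abstract (base pairs).
TOTAL_BP = 154_961
IR_BP = 25_971    # length of each of the two inverted repeats
LSC_BP = 84_320   # large single-copy region
SSC_BP = 18_696   # small single-copy region

# A quadripartite chloroplast genome is LSC + SSC + two IR copies.
region_sum = 2 * IR_BP + LSC_BP + SSC_BP
discrepancy = TOTAL_BP - region_sum

# Gene categories as quoted in the abstract.
gene_counts = {"protein_coding": 78, "tRNA": 30, "rRNA": 4}
gene_total = sum(gene_counts.values())

print(region_sum, discrepancy, gene_total)  # prints: 154958 3 112
```

    The gene categories add up to the 112 unique genes reported. The four region lengths add up to 154,958 bp, 3 bp short of the quoted total, so one of the figures in this record may be slightly imprecise; the full article would be the place to confirm the exact values.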

  11. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    PubMed

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) of 25,971 bp each, separated by two single-copy regions: a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, of which 78 code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.
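
    Microsatellite detection of the kind mentioned in this abstract is commonly done by scanning a sequence for short tandem repeats. The abstract does not name the tool or thresholds used, so the following is only an illustrative sketch: the function `find_ssrs`, the minimum repeat counts and the toy sequence are all assumptions.

```python
import re

# Hypothetical thresholds: minimum number of tandem copies per motif length.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (0-based start, repeat unit, copy number) tuples."""
    seq = seq.upper()
    hits = []
    for unit_len, n in sorted(min_repeats.items()):
        # One captured unit followed by at least n-1 further copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # For unit lengths 2 and 3, a unit made of a single base (e.g. "AA")
            # is really a mononucleotide repeat, so it is skipped here.
            if unit_len > 1 and len(set(unit)) == 1:
                continue
            hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return sorted(hits)

# Hypothetical toy sequence for illustration; not chloroplast data.
toy = "GG" + "A" * 10 + "CC" + "AT" * 5 + "G" + "TTC" * 4 + "G"
print(find_ssrs(toy))
```

    Dedicated tools apply more careful rules for compound and imperfect repeats; this sketch reports only perfect repeats of one to three bases.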

  12. Genome-Wide Analysis of the Sucrose Synthase Gene Family in Grape (Vitis vinifera): Structure, Evolution, and Expression Profiles.

    PubMed

    Zhu, Xudong; Wang, Mengqi; Li, Xiaopeng; Jiu, Songtao; Wang, Chen; Fang, Jinggui

    2017-03-28

    Sucrose synthase (SS) is widely considered as the key enzyme involved in the plant sugar metabolism that is critical to plant growth and development, especially quality of the fruit. The members of SS gene family have been identified and characterized in multiple plant genomes. However, detailed information about this gene family is lacking in grapevine (Vitis vinifera L.). In this study, we performed a systematic analysis of the grape (V. vinifera) genome and reported that there are five SS genes (VvSS1-5) in the grape genome. Comparison of the structures of grape SS genes showed high structural conservation of grape SS genes, resulting from the selection pressures during the evolutionary process. The segmental duplication of grape SS genes contributed to this gene family expansion. The syntenic analyses between grape and soybean (Glycine max) demonstrated that these genes located in corresponding syntenic blocks arose before the divergence of grape and soybean. Phylogenetic analysis revealed distinct evolutionary paths for the grape SS genes. VvSS1/VvSS5, VvSS2/VvSS3 and VvSS4 originated from three ancient SS genes, which were generated by duplication events before the split of monocots and eudicots. Bioinformatics analysis of publicly available microarray data, which was validated by quantitative real-time reverse transcription PCR (qRT-PCR), revealed distinct temporal and spatial expression patterns of VvSS genes in various tissues, organs and developmental stages, as well as in response to biotic and abiotic stresses. Taken together, our results will be beneficial for further investigations into the functions of SS gene in the processes of grape resistance to environmental stresses.

  13. Genome-Wide Analysis of the Sucrose Synthase Gene Family in Grape (Vitis vinifera): Structure, Evolution, and Expression Profiles

    PubMed Central

    Zhu, Xudong; Wang, Mengqi; Li, Xiaopeng; Jiu, Songtao; Wang, Chen; Fang, Jinggui

    2017-01-01

    Sucrose synthase (SS) is widely considered as the key enzyme involved in the plant sugar metabolism that is critical to plant growth and development, especially quality of the fruit. The members of SS gene family have been identified and characterized in multiple plant genomes. However, detailed information about this gene family is lacking in grapevine (Vitis vinifera L.). In this study, we performed a systematic analysis of the grape (V. vinifera) genome and reported that there are five SS genes (VvSS1–5) in the grape genome. Comparison of the structures of grape SS genes showed high structural conservation of grape SS genes, resulting from the selection pressures during the evolutionary process. The segmental duplication of grape SS genes contributed to this gene family expansion. The syntenic analyses between grape and soybean (Glycine max) demonstrated that these genes located in corresponding syntenic blocks arose before the divergence of grape and soybean. Phylogenetic analysis revealed distinct evolutionary paths for the grape SS genes. VvSS1/VvSS5, VvSS2/VvSS3 and VvSS4 originated from three ancient SS genes, which were generated by duplication events before the split of monocots and eudicots. Bioinformatics analysis of publicly available microarray data, which was validated by quantitative real-time reverse transcription PCR (qRT-PCR), revealed distinct temporal and spatial expression patterns of VvSS genes in various tissues, organs and developmental stages, as well as in response to biotic and abiotic stresses. Taken together, our results will be beneficial for further investigations into the functions of SS gene in the processes of grape resistance to environmental stresses. PMID:28350372

  14. Improved structural annotation of protein-coding genes in the Meloidogyne hapla genome using RNA-Seq.

    PubMed

    Guo, Yuelong; Bird, David McK; Nielsen, Dahlia M

    2014-01-01

    As high-throughput cDNA sequencing (RNA-Seq) is increasingly applied to hypothesis-driven biological studies, the prediction of protein-coding genes based on these data is usurping strictly in silico approaches. Compared with computationally derived gene predictions, structural annotation is more accurate when based on biological evidence, particularly RNA-Seq data. Here, we refine the current genome annotation for the Meloidogyne hapla genome utilizing RNA-Seq data. Published structural annotation defines 14 420 protein-coding genes in the M. hapla genome. Of these, 25% (3751) were found to exhibit some incongruence with RNA-Seq data. Manual annotation enabled these discrepancies to be resolved. Our analysis revealed 544 new gene models that were missing from the prior annotation. Additionally, 1457 transcribed regions were newly identified on the ends of as-yet-unjoined contigs. We also searched for trans-spliced leaders, and based on RNA-Seq data, identified genes that appear to be trans-spliced. Four 22-bp trans-spliced leaders were identified using our pipeline, including the known trans-spliced leader, which is the M. hapla ortholog of SL1. In silico predictions of trans-splicing were validated by comparison with earlier results derived from an independent cDNA library constructed to capture trans-spliced transcripts. The new annotation, which we term HapPep5, is publicly available at www.hapla.org.

  15. Global analysis of somatic structural genomic alterations and their impact on gene expression in diverse human cancers

    PubMed Central

    Alaei-Mahabadi, Babak; Karlsson, Joakim W.; Nilsson, Jonas A.; Larsson, Erik

    2016-01-01

    Tumor genomes are mosaics of somatic structural variants (SVs) that may contribute to the activation of oncogenes or inactivation of tumor suppressors, for example, by altering gene copy number amplitude. However, there are multiple other ways in which SVs can modulate transcription, but the general impact of such events on tumor transcriptional output has not been systematically determined. Here we use whole-genome sequencing data to map SVs across 600 tumors and 18 cancers, and investigate the relationship between SVs, copy number alterations (CNAs), and mRNA expression. We find that 34% of CNA breakpoints can be clarified structurally and that most amplifications are due to tandem duplications. We observe frequent swapping of strong and weak promoters in the context of gene fusions, and find that this has a measurable global impact on mRNA levels. Interestingly, several long noncoding RNAs were strongly activated by this mechanism. Additionally, SVs were confirmed in telomere reverse transcriptase (TERT) upstream regions in several cancers, associated with elevated TERT mRNA levels. We also highlight high-confidence gene fusions supported by both genomic and transcriptomic evidence, including a previously undescribed paired box 8 (PAX8)–nuclear factor, erythroid 2 like 2 (NFE2L2) fusion in thyroid carcinoma. In summary, we combine SV, CNA, and expression data to provide insights into the structural basis of CNAs as well as the impact of SVs on gene expression in tumors. PMID:27856756

  16. Structure of the cloned Locusta migratoria mitochondrial genome: restriction mapping and sequence of its ND-1 (URF-1) gene.

    PubMed

    McCracken, A; Uhlenbusch, I; Gellissen, G

    1987-01-01

    We have cloned the entire mitochondrial genome of Locusta migratoria in four fragments and characterised it by restriction mapping. In addition, we have sequenced a 1,095 bp region containing the ND-1 (URF-1) gene. The inferred primary structure of the protein is highly homologous to its Drosophila counterpart (68%). The gene is flanked at the 5' end by the tRNA(CUNleu) gene, interrupted by the sequence TTG. The 3' end is flanked by the tRNA(UCNser) gene, followed by a sequence homologous to the 3' end of D. yakuba cytochrome b. The relative position of the genes is conserved between Locusta and Drosophila, thus indicating conservation of mitochondrial gene order in insects.

  17. The Eucalyptus Tonoplast Intrinsic Protein (TIP) Gene Subfamily: Genomic Organization, Structural Features, and Expression Profiles

    PubMed Central

    Rodrigues, Marcela I.; Takeda, Agnes A. S.; Bravo, Juliana P.; Maia, Ivan G.

    2016-01-01

    Plant aquaporins are water channels implicated in various physiological processes, including growth, development and adaptation to stress. In this study, the Tonoplast Intrinsic Protein (TIP) gene subfamily of Eucalyptus, an economically important woody species, was investigated and characterized. A genome-wide survey of the Eucalyptus grandis genome revealed the presence of eleven putative TIP genes (referred to as EgTIP), which were individually assigned by phylogeny to each of the classical TIP1–5 groups. Homology modeling confirmed the presence of the two highly conserved NPA (Asn-Pro-Ala) motifs in the identified EgTIPs. Residue variations in the corresponding selectivity filters, which might reflect differences in EgTIP substrate specificity, were observed. All EgTIP genes, except EgTIP5.1, were transcribed and the majority of them showed organ/tissue-enriched expression. Inspection of the EgTIP promoters revealed the presence of common cis-regulatory elements implicated in abiotic stress and hormone responses, pointing to an involvement of the identified genes in abiotic stress responses. In line with these observations, additional gene expression profiling demonstrated increased expression under polyethylene glycol-imposed osmotic stress. Overall, the results obtained suggest that these novel EgTIPs might be functionally implicated in eucalyptus adaptation to stress. PMID:27965702

  18. Structural characterization of helitrons and their stepwise capturing of gene fragments in the maize genome

    PubMed Central

    2011-01-01

    Background As a newly identified category of DNA transposon, helitrons have been found in a large number of eukaryotic genomes. Helitrons have contributed significantly to the intra-specific genome diversity in maize. Although many characteristics of helitrons in the maize genome have been well documented, the sequence of an intact autonomous helitron has not been identified in maize. In addition, the process of gene fragment capturing during the transposition of helitrons has not been characterized. Results The whole genome sequences of maize inbred line B73 were analyzed, and 1,649 helitron-like transposons, including 1,515 helAs and 134 helBs, were identified. ZmhelA1, ZmhelB1 and ZmhelB2 all encode an open reading frame (ORF) with an intact replication initiator (Rep) motif and a DNA helicase (Hel) domain, which are similar to previously reported autonomous helitrons in other organisms. The putative autonomous ZmhelB1 and ZmhelB2 contain an extra replication factor-a protein1 (RPA1) transposase (RPA-TPase) including three single strand DNA-binding domains (DBD)-A/-B/-C in the ORF. Over ninety percent of maize helitrons identified have captured gene fragments. HelAs and helBs carry 4,645 and 249 gene fragments, which yield 2,507 and 187 different genes respectively. Many helitrons contain multiple terminal sequences, but only one 3'-terminal sequence had an intact "CTAG" motif. There were no significant differences in the 5'-termini sequence between the veritas terminal sequence and the pseudo sequence. Helitrons not only can capture fragments, but were also shown to lose internal sequences during the course of transposing. Conclusions Three putative autonomous elements were identified, which encoded an intact Rep motif and a DNA helicase domain, suggesting that autonomous helitrons may exist in modern maize. The results indicate that gene fragments captured during the transposition of many helitrons happen in a stepwise way, with multiple gene fragments within one

  19. The human glia maturation factor-gamma gene: genomic structure and mutation analysis in gliomas with chromosome 19q loss.

    PubMed

    Peters, N; Smith, J S; Tachibana, I; Lee, H K; Pohl, U; Portier, B P; Louis, D N; Jenkins, R B

    1999-09-01

    Human glia maturation factor-gamma (hGMF-gamma) is a recently identified gene that may be involved in glial differentiation, neural regeneration, and inhibition of tumor cell proliferation. The gene maps to the long arm of chromosome 19 at band q13.2, a region that is frequently deleted in human malignant gliomas and is thus suspected to harbor a glioma tumor suppressor gene. Given the putative role of hGMF-gamma in cell differentiation and proliferation and its localization to chromosome 19q13, this gene is an interesting candidate for the chromosome 19q glioma tumor suppressor gene. To evaluate this possibility, we determined the genomic structure of human hGMF-gamma and performed mutation screening in a series of 41 gliomas with and without allelic loss of chromosome 19q. Mutations were not detected, which suggests that hGMF-gamma is not the chromosome 19q glioma suppressor gene. However, the elucidation of the genomic structure of hGMF-gamma may prove useful in future investigations of hGMF-gamma in the normal adult and developing human nervous system.

  20. Genomic cloning, structure, expression pattern, and chromosomal location of the human SIX3 gene.

    PubMed

    Granadino, B; Gallardo, M E; López-Ríos, J; Sanz, R; Ramos, C; Ayuso, C; Bovolenta, P; Rodríguez de Córdoba, S

    1999-01-01

    The Drosophila gene sine oculis (so) is a nuclear homeoprotein that is required for eye development. Homologous genes to so, denoted SIX genes, have been found in vertebrates. Among the SIX genes, SIX3 is considered to be the functional homologue of so. To provide insight into the potential implications of SIX3 in human ocular malformations, we have cloned and characterized the human SIX3 gene. In human eye, SIX3 produces a 3-kb transcript that codes for a 332-amino-acid polypeptide that is virtually identical to its mouse and chick homologues. Expression of SIX3 was detected in human embryos as early as 5-7 weeks of gestation and found to be maintained in the eye throughout the entire period of fetal development. At 20 weeks of gestation, expression of SIX3 in the human retina was detected in the ganglion cells and in cells of the inner nuclear layer. The human SIX3 gene spans 4.4 kb of genomic DNA and is split in two exons separated by a 1659-bp intron. SIX3 was mapped to human chromosome 2p16-p21, between the genetic markers D2S119 and D2S288. Interestingly, the map position of human SIX3 overlaps the locations of two dominant disorders with ocular phenotypes that have been assigned to this chromosomal region, holoprosencephaly type 2 and Malattia Leventinese.

  1. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    SciTech Connect

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A; Lindroth, Richard L; Yuan, Yinan

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  2. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    PubMed

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.

  3. Genomic structure and mapping of precerebellin and a precerebellin-related gene.

    PubMed

    Kavety, B; Jenkins, N A; Fletcher, C F; Copeland, N G; Morgan, J I

    1994-11-01

    The cerebellum-specific hexadecapeptide, cerebellin, is derived from a larger precursor, precerebellin, that has sequence homology to the complement component C1qB. We report the cloning of the murine homolog of precerebellin, Cbln1, and a closely related gene, Cbln2. Amino acid comparison of Cbln1 with Cbln2 revealed that Cbln2 is 88% identical to the carboxy terminal region of Cbln1. That these are independent genes was confirmed by Southern analysis and genome mapping. Cbln1 was positioned to the central region of mouse chromosome 8, 2.3 cM distal of JunB and 6.0 cM proximal of Mt1, while Cbln2 mapped to the distal end of mouse chromosome 18, 1.7 cM telomeric of Mbp.

  4. Structure and chromosomal localization of the genomic locus encoding the Kiz1 LIM-kinase gene

    SciTech Connect

    Bernard, O.; Burkitt, V.; Webb, G.C.

    1996-08-01

    We have cloned and characterized the mouse gene encoding Kiz1/Limk1, a new member of the zinc-finger LIM family that also has a kinase domain. The gene encompasses 25 kb of the mouse genome, and the organization of its 16 exons does not correlate with its functional domains. The promoter region of Kiz1/Limk1 was identified by cloning a 1.06-kb genomic fragment upstream from the first ATG in a promoterless CAT vector. This construct was demonstrated to drive CAT expression in Jurkat cells. The promoter sequence lacks conventional TATA and CAAT motifs but contains consensus binding sequences for several transcriptional regulators implicated in control of transcription in many different cell types, including Sp1, Ets, and E2A. Analysis of the chromosomal localization of KIZ1/LIMK1 indicates that it lies on human chromosome 17 in the region 17q25 and on mouse Chromosome 5, band G2. 15 refs., 3 figs., 1 tab.

  5. Characterization of the genomic structure of the mouse APLP1 gene

    SciTech Connect

    Zhong, Sue; Wu, Kuo; Black, I.B.; Schaar, D.G.

    1996-02-15

    This article reports on the organization of the mouse APLP1 gene, an evolutionarily conserved amyloid precursor-like protein. The amyloid beta protein, important in Alzheimer diseases, is derived from these precursor proteins. By investigating the expression and structure of this murine gene, it is hoped that more will be learned about the function and regulation of the human homologue. 27 refs., 2 figs.

  6. Genomic structure of the luciferase gene and phylogenetic analysis in the Hotaria-group fireflies.

    PubMed

    Choi, Yong Soo; Bae, Jin Sik; Lee, Kwang Sik; Kim, Seong Ryul; Kim, Iksoo; Kim, Jong Gill; Kim, Keun Young; Kim, Sam Eun; Suzuki, Hirobumi; Lee, Sang Mong; Sohn, Hung Dae; Jin, Byung Rae

    2003-02-01

    The luminescent fireflies have species specific flash patterns, being recognized as sexual communication. The luciferase gene is the sole enzyme responsible for bioluminescence. We describe here the complete nucleotide sequence and the exon-intron structure of the luciferase gene of the Hotaria-group fireflies, H. unmunsana, H. papariensis and H. tsushimana. The luciferase gene of the Hotaria-group firefly including the known H. parvula spans 1950 bp and consisted of six introns and seven exons coding for 548 amino acid residues, suggesting highly conserved structure among the Hotaria-group fireflies. Although only one luciferase gene was cloned from H. papariensis, each of the two sequences of the gene was found in H. unmunsana (U1 and Uc) and H. tsushimana (T1 and T2). The amino acid sequence divergence among H. unmunsana, H. papariensis, and H. tsushimana only ranged from zero to three amino acid residues, but H. parvula differed by 10-11 amino acid residues from the other Hotaria-group fireflies, suggesting a divergent relationship of this species. Phylogenetic analysis using the deduced amino acid sequences of the luciferase gene resulted in a monophyletic group in the Hotaria excluding H. parvula, suggesting a close relationship among H. unmunsana, H. papariensis and H. tsushimana. Additionally, we also analyzed the mitochondrial cytochrome oxidase I (COI) gene of the Hotaria-group fireflies. The deduced amino acid sequence of the COI gene of H. unmunsana was identical to that of H. papariensis and H. tsushimana, but different by three positions from H. parvula. In terms of nucleotide sequences of the COI gene, intraspecific sequence divergence was sometimes larger than interspecies level, and phylogenetic analysis placed the three species into monophyletic groups unresolved among them, but excluded H. parvula. In conclusion, our results suggest that H. unmunsana, H. papariensis and H. tsushimana are very closely related or might be an identical species, at

  7. The chicken transforming growth factor-beta 3 gene: genomic structure, transcriptional analysis, and chromosomal location.

    PubMed

    Burt, D W; Dey, B R; Paton, I R; Morrice, D R; Law, A S

    1995-02-01

    In this paper, we report the isolation, characterization, and mapping of the chicken transforming growth factor-beta 3 (TGF-beta 3) gene. The gene contains seven exons and six introns spanning 16-kb of the chicken genome. A comparison of the 5'-flanking regions of human and chicken TGF-beta 3 genes reveals two regions of sequence conservation. The first contains ATF/CRE and TBP/TATA sequence motifs within an 87-bp region. The second is a 162-bp region with no known sequence motifs. Identification of transcription start sites using chicken RNA isolated from various embryonic and adult tissues reveals two sites of initiation, P1 and P2, which map to these two conserved regions. Comparison of 3'-flanking regions of chicken and mammalian TGF-beta 3 genes also revealed conserved sequences. The most significant homologies were found in the 3'-most end of the transcribed region. DNA sequence analysis of chicken TGF-beta 3 cDNAs isolated by 3'-RACE revealed multiple polyadenylation sites unusually distant from a poly(A) signal motif. A Msc I restriction fragment length polymorphism (RFLP) marker was used to map the TGFB3 locus to linkage group E7 on the East Lansing reference backcross. Linkage to the TH locus showed that the TGFB3 locus was physically located on chicken chromosome 5.

  8. Genomic structure, gene expression, and promoter analysis of human multidrug resistance-associated protein 7

    SciTech Connect

    Kao, Hsin-Hsin; Chang, Ming-Shi; Cheng, Jan-Fang; Huang, Jin-Ding

    2002-03-15

    The multidrug resistance-associated protein (MRP) subfamily transporters associated with anticancer drug efflux are attributed to the multidrug-resistance of cancer cells. The genomic organization of human multidrug resistance-associated protein 7 (MRP7) was identified. The human MRP7 gene, consisting of 22 exons and 21 introns, greatly differs from other members of the human MRP subfamily. A splicing variant of human MRP7, MRP7A, expressed in most human tissues, was also characterized. The 1.93-kb promoter region of MRP7 was isolated and shown to support luciferase activity at a level 4- to 5-fold greater than that of the SV40 promoter. Basal MRP7 gene expression was regulated by 2 regions in the 5'-flanking region at 1,780 to 1,287 bp, and at 611 to 208 bp. In Madin-Darby canine kidney (MDCK) cells, MRP7 promoter activity was increased by 226 percent by genotoxic 2-acetylaminofluorene and 347 percent by the histone deacetylase inhibitor, trichostatin A. The protein was expressed in the membrane fraction of transfected MDCK cells.

  9. The genomic structure of the chicken ICSBP gene and its transcriptional regulation by chicken interferon.

    PubMed

    Dosch, E; Zöller, B; Redmann-Müller, I; Nanda, I; Schmid, M; Viciano-Gofferge, A; Jungwirth, C

    1998-04-14

    The chicken interferon consensus sequence binding protein (ChICSBP) gene spans over 9 kb of DNA and consists, as its murine homolog, of nine exons. The first untranslated exon was identified by 5'-RACE technology. The second exon contains the translation initiation codon. Canonical consensus splice sites are found on every exon/intron junction. The introns are generally smaller than their mammalian counterparts. The ChICSBP and ChIRF-1 genes have been mapped by fluorescence in situ hybridization to different microchromosomes. The transcription start site has been mapped by primer extension. Inspection of the DNA sequence of a genomic clone containing the first exon and the region 1700-bp upstream revealed several potential cis-regulatory elements of transcription. The ChICSBP mRNA is induced by recombinant ChIFN type I and ChIFN-gamma. A palindromic IFN regulatory element (pIRE) with high sequence homology to gamma activation site (GAS) sequences was functionally required in transient transfection assays for the induction of transcription by ChIFN-gamma.

  10. Genomic structure and chromosomal localization of the human deoxycytidine kinase gene

    SciTech Connect

    Song, J.J.; Walker, S.; Gribbin, T.; Chen, E. (Univ. of North Carolina, Chapel Hill); Johnson, E.E.; Spychala, J.; Mitchell, B.S.

    1993-01-15

    Deoxycytidine kinase (NTP:deoxycytidine 5'-phosphotransferase, EC 2.7.1.74) is an enzyme that catalyzes phosphorylation of deoxyribonucleosides and a number of nucleoside analogs that are important in antiviral and cancer chemotherapy. Deficiency of this enzyme activity is associated with resistance to these agents, whereas increased enzyme activity is associated with increased activation of such compounds to cytotoxic nucleoside triphosphate derivatives. To characterize the regulation of expression of this gene, we have isolated genomic clones encompassing its entire coding and 5' flanking regions and delineated all the exon/intron boundaries. The gene extends over more than 34 kilobases on chromosome 4 and the coding region is composed of 7 exons ranging in size from 90 to 1544 base pairs (bp). The 5' flanking region is highly G+C-rich and contains four regions that are potential Sp1 binding sites. A 697-bp fragment encompassing 386 bp of 5' upstream region, the 250-bp first exon, and 61 bp of the first intron was demonstrated to promote chloramphenicol acetyltransferase activity in a T-lymphoblast cell line and to have >6-fold greater activity in a Jurkat T-lymphoblast than in a Raji B-lymphoblast cell line. Our data suggest that these 5' sequences may contain elements that are important for the tissue-specific differences in deoxycytidine kinase expression. 32 refs., 4 figs., 2 tabs.

  11. Evolutionary genomics: transdomain gene transfers.

    PubMed

    Bordenstein, Seth R

    2007-11-06

    Biologists have until now conceded that bacterial gene transfer to multicellular animals is relatively uncommon in Nature. A new study showing promiscuous insertions of bacterial endosymbiont genes into invertebrate genomes ushers in a shift in this paradigm.

  12. Genomic structure and marker-derived gene networks for growth and meat quality traits of Brazilian Nelore beef cattle.

    PubMed

    Mudadu, Maurício A; Porto-Neto, Laercio R; Mokry, Fabiana B; Tizioto, Polyana C; Oliveira, Priscila S N; Tullio, Rymer R; Nassu, Renata T; Niciura, Simone C M; Tholon, Patrícia; Alencar, Maurício M; Higa, Roberto H; Rosa, Antônio N; Feijó, Gélson L D; Ferraz, André L J; Silva, Luiz O C; Medeiros, Sérgio R; Lanna, Dante P; Nascimento, Michele L; Chaves, Amália S; Souza, Andrea R D L; Packer, Irineu U; Torres, Roberto A A; Siqueira, Fabiane; Mourão, Gerson B; Coutinho, Luiz L; Reverter, Antonio; Regitano, Luciana C A

    2016-03-15

    Nelore is the major beef cattle breed in Brazil with more than 130 million heads. Genome-wide association studies (GWAS) are often used to associate markers and genomic regions to growth and meat quality traits that can be used to assist selection programs. An alternative methodology to traditional GWAS that involves the construction of gene network interactions, derived from results of several GWAS is the AWM (Association Weight Matrices)/PCIT (Partial Correlation and Information Theory). With the aim of evaluating the genetic architecture of Brazilian Nelore cattle, we used high-density SNP genotyping data (~770,000 SNP) from 780 Nelore animals comprising 34 half-sibling families derived from highly disseminated and unrelated sires from across Brazil. The AWM/PCIT methodology was employed to evaluate the genes that participate in a series of eight phenotypes related to growth and meat quality obtained from this Nelore sample. Our results indicate a lack of structuring between the individuals studied since principal component analyses were not able to differentiate families by its sires or by its ancestral lineages. The application of the AWM/PCIT methodology revealed a trio of transcription factors (comprising VDR, LHX9 and ZEB1) which in combination connected 66 genes through 359 edges and whose biological functions were inspected, some revealing to participate in biological growth processes in literature searches. The diversity of the Nelore sample studied is not high enough to differentiate among families neither by sires nor by using the available ancestral lineage information. The gene networks constructed from the AWM/PCIT methodology were a useful alternative in characterizing genes and gene networks that were allegedly influential in growth and meat quality traits in Nelore cattle.

  13. A high-resolution reference genetic map positioning 8.8 K genes for the conifer white spruce: structural genomics implications and correspondence with physical distance.

    PubMed

    Pavy, Nathalie; Lamothe, Manuel; Pelgas, Betty; Gagnon, France; Birol, Inanç; Bohlmann, Joerg; Mackay, John; Isabel, Nathalie; Bousquet, Jean

    2017-04-01

    Over the last decade, extensive genetic and genomic resources have been developed for the conifer white spruce (Picea glauca, Pinaceae), which has one of the largest plant genomes (20 Gbp). Draft genome sequences of white spruce and other conifers have recently been produced, but dense genetic maps are needed to comprehend genome macrostructure, delineate regions involved in quantitative traits, complement functional genomic investigations, and assist the assembly of fragmented genomic sequences. A greatly expanded P. glauca composite linkage map was generated from a set of 1976 full-sib progeny, with the positioning of 8793 expressed genes. Regions with significant low or high gene density were identified. Gene family members tended to be mapped on the same chromosomes, with tandemly arrayed genes significantly biased towards specific functional classes. The map was integrated with transcriptome data surveyed across eight tissues. In total, 69 clusters of co-expressed and co-localising genes were identified. A high level of synteny was found with pine genetic maps, which should facilitate the transfer of structural information in the Pinaceae. Although the current white spruce genome sequence remains highly fragmented, dozens of scaffolds encompassing more than one mapped gene were identified. From these, the relationship between genetic and physical distances was examined and the genome-wide recombination rate was found to be much smaller than most estimates reported for angiosperm genomes. This gene linkage map shall assist the large-scale assembly of the next-generation white spruce genome sequence and provide a reference resource for the conifer genomics community.

  14. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions

    PubMed Central

    2013-01-01

    Background The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes

  15. Alternative splicing and genomic structure of the Wilms tumor gene WT1

    SciTech Connect

    Haber, D.A. (Massachusetts General Hospital Cancer Center, Charlestown); Sohn, R.L.; Buckler, A.J.; Pelletier, J.; Call, K.M.; Housman, D.E.

    1991-11-01

    The chromosome 11p13 Wilms tumor susceptibility gene WT1 appears to play a crucial role in regulating the proliferation and differentiation of nephroblasts and gonadal tissue. The WT1 gene consists of 10 exons, encoding a complex pattern of mRNA species: four distinct transcripts are expressed, reflecting the presence or absence of two alternative splices. Splice I consists of a separate exon, encoding 17 amino acids, which is inserted between the proline-rich amino terminus and the zinc finger domains. Splice II arises from the use of an alternative 5' splice junction and results in the insertion of 3 amino acids between zinc fingers 3 and 4. RNase protection analysis demonstrates that the most prevalent splice variant in both human and mouse is that which contains both alternative splices, whereas the least common is the transcript missing both splices. The relative distribution of splice variants is highly conserved between normal fetal kidney tissue and Wilms tumors that have intact WT1 transcripts. The ratio of these different WT1 mRNA species is also maintained as a function of development in the mouse kidney and in various mouse tissues expressing WT1. The conservation in structure and relative levels of each of the four WT1 mRNA species suggest that each encoded polypeptide makes a significant contribution to normal gene function. The control of cellular proliferation and differentiation exerted by the WT1 gene products may involve interactions between four polypeptides with distinct targets and functions.

  16. Chloroplast Genome Sequence of the Moss Tortula ruralis: Gene Content and Structural Arrangement Relative to Other Green Plant Chloroplast Genomes

    USDA-ARS's Scientific Manuscript database

    Tortula ruralis, a widely distributed moss species in the family Pottiaceae, is increasingly being used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of Tortula ruralis, only the second publishe...

  17. Mitochondrial Genome Structure of Photosynthetic Eukaryotes.

    PubMed

    Yurina, N P; Odintsova, M S

    2016-02-01

    Current ideas of plant mitochondrial genome organization are presented. Data on the size and structural organization of mtDNA, gene content, and peculiarities are summarized. Special emphasis is given to characteristic features of the mitochondrial genomes of land plants and photosynthetic algae that distinguish them from the mitochondrial genomes of other eukaryotes. The data published before the end of 2014 are reviewed.

  18. Genomic survey, gene expression analysis and structural modeling suggest diverse roles of DNA methyltransferases in legumes.

    PubMed

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2) subfamilies based on their domain organization. First three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems have been reported to assign structural features in context of biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, structural model of DRM suggested that circular permutation of motifs does not have any effect on overall structure of DNA methyltransferase domain. These results provide valuable insights into role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes.

  19. Genomic organization of the structural genes controlling the astaxanthin biosynthesis pathway of Xanthophyllomyces dendrorhous.

    PubMed

    Niklitschek, Mauricio; Alcaíno, Jennifer; Barahona, Salvador; Sepúlveda, Dionisia; Lozano, Carla; Carmona, Marisela; Marcoleta, Andrés; Martínez, Claudio; Lodato, Patricia; Baeza, Marcelo; Cifuentes, Víctor

    2008-01-01

    The cloning and nucleotide sequence of the genes (idi, crtE, crtYB, crtI and crtS) controlling the astaxanthin biosynthesis pathway of the wild-type ATCC 24230 strain of Xanthophyllomyces dendrorhous in their genomic and cDNA version were obtained. The idi, crtE, crtYB, crtI and crtS genes were cloned, as fragments of 10.9, 11.5, 15.8, 5.9 and 4 kb respectively. The nucleotide sequence data analysis indicates that the idi, crtE, crtYB, crtI and crtS genes have 4, 8, 4, 11, and 17 introns and 5, 9, 5, 12 and 18 exons respectively. In addition, a highly efficient site-directed mutagenesis system was developed by transformation by integration, followed by mitotic recombination (the double recombinant method). Heterozygote idi (idi+/idi-::hph), crtE (crtE+/crtE-::hph), crtYB (crtYB+/crtYB-::hph), crtI (crtI+/crtI-::hph) and crtS (crtS+/crtS-::hph) and homozygote mutants crtYB (crtYB-::hph/crtYB-::hph), crtI (crtI-::hph/crtI-::hph) and crtS (crtS-::hph/crtS-::hph) were constructed. All the heterozygote mutants have a pale phenotype and produce less carotenoids than the wild-type strain. The genetic analysis of the crtYB, crtI and crtS loci in the wild-type, heterozygote, and homozygote give evidence of the diploid constitution of ATCC 24230 strains. In addition, the cloning of a truncated form of the crtYB that lacks 153 amino acids of the N-terminal region derived from alternatively spliced mRNA was obtained. Their heterologous expression in Escherichia coli carrying the carotenogenic cluster of Erwinia uredovora result in trans-complementation and give evidence of its functionality in this bacterium, maintaining its phytoene synthase activity but not the lycopene cyclase activity.

  20. The structure of HIV-1 genomic RNA in the gp120 gene determines a recombination hot spot in vivo.

    PubMed

    Galetto, Román; Moumen, Abdeladim; Giacomoni, Véronique; Véron, Michel; Charneau, Pierre; Negroni, Matteo

    2004-08-27

    By frequently rearranging large regions of the genome, genetic recombination is a major determinant in the plasticity of the human immunodeficiency virus type I (HIV-1) population. In retroviruses, recombination mostly occurs by template switching during reverse transcription. The generation of retroviral vectors provides a means to study this process after a single cycle of infection of cells in culture. Using HIV-1-derived vectors, we present here the first characterization and estimate of the strength of a recombination hot spot in HIV-1 in vivo. In the hot spot region, located within the C2 portion of the gp120 envelope gene, the rate of recombination is up to ten times higher than in the surrounding regions. The hot region corresponds to a previously identified RNA hairpin structure. Although recombination breakpoints in vivo cluster in the top portion of the hairpin, the bias for template switching in this same region appears less marked in a cell-free system. By modulating the stability of this hairpin we were able to affect the local recombination rate both in vitro and in infected cells, indicating that the local folding of the genomic RNA is a major parameter in the recombination process. This characterization of reverse transcription products generated after a single cycle of infection provides insights in the understanding of the mechanism of recombination in vivo and suggests that specific regions of the genome might be prompted to yield different rates of evolution due to the presence of circumscribed recombination hot spots.

  1. Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat

    USDA-ARS's Scientific Manuscript database

    Cycles of whole genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied...

  2. Gene functionalities and genome structure in Bathycoccus prasinos reflect cellular specializations at the base of the green lineage

    PubMed Central

    2012-01-01

    Background Bathycoccus prasinos is an extremely small cosmopolitan marine green alga whose cells are covered with intricate spider's web patterned scales that develop within the Golgi cisternae before their transport to the cell surface. The objective of this work is to sequence and analyze its genome, and to present a comparative analysis with other known genomes of the green lineage. Research Its small genome of 15 Mb consists of 19 chromosomes and lacks transposons. Although 70% of all B. prasinos genes share similarities with other Viridiplantae genes, up to 428 genes were probably acquired by horizontal gene transfer, mainly from other eukaryotes. Two chromosomes, one big and one small, are atypical, an unusual synapomorphic feature within the Mamiellales. Genes on these atypical outlier chromosomes show lower GC content and a significant fraction of putative horizontal gene transfer genes. Whereas the small outlier chromosome lacks colinearity with other Mamiellales and contains many unknown genes without homologs in other species, the big outlier shows a higher intron content, increased expression levels and a unique clustering pattern of housekeeping functionalities. Four gene families are highly expanded in B. prasinos, including sialyltransferases, sialidases, ankyrin repeats and zinc ion-binding genes, and we hypothesize that these genes are associated with the process of scale biogenesis. Conclusion The minimal genomes of the Mamiellophyceae provide a baseline for evolutionary and functional analyses of metabolic processes in green plants. PMID:22925495

  3. Genome-Wide Analysis Reveals Novel Genes Influencing Temporal Lobe Structure with Relevance to Neurodegeneration in Alzheimer’s Disease

    PubMed Central

    Stein, Jason L.; Hua, Xue; Morra, Jonathan H.; Lee, Suh; Hibar, Derrek P.; Ho, April J.; Leow, Alex D.; Toga, Arthur W.; Sul, Jae Hoon; Kang, Hyun Min; Eskin, Eleazar; Saykin, Andrew J.; Shen, Li; Foroud, Tatiana; Pankratz, Nathan; Huentelman, Matthew J.; Craig, David W.; Gerber, Jill D.; Allen, April N.; Corneveaux, Jason J.; Stephan, Dietrich A.; Webster, Jennifer; DeChairo, Bryan M.; Potkin, Steven G.; Jack, Clifford R.; Weiner, Michael W.; Thompson, Paul M.

    2010-01-01

    In a genome-wide association study of structural brain degeneration, we mapped the 3D profile of temporal lobe volume differences in 742 brain MRI scans of Alzheimer’s disease patients, mildly impaired, and healthy elderly subjects. After searching 546,314 genomic markers, 2 single nucleotide polymorphisms (SNPs) were associated with bilateral temporal lobe volume (P < 5×10−7). One SNP, rs10845840, is located in the GRIN2B gene which encodes the N-Methyl-D-Aspartate (NMDA) glutamate receptor NR2B subunit. This protein - involved in learning and memory, and excitotoxic cell death - has age-dependent prevalence in the synapse and is already a therapeutic target in Alzheimer’s disease. Risk alleles for lower temporal lobe volume at this SNP were significantly over-represented in AD and MCI subjects versus controls (odds ratio = 1.273; P = 0.039) and were associated with the mini-mental state exam (MMSE; t = −2.114; P = 0.035) demonstrating a negative effect on global cognitive function. Voxelwise maps of genetic association of this SNP with regional brain volumes, revealed intense temporal lobe effects (FDR correction at q = 0.05; critical P = 0.0257). This study uses large-scale brain mapping for gene discovery with implications for Alzheimer’s disease. PMID:20197096
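
    The abstract above reports an FDR correction at q = 0.05 together with a critical P value, but it does not say which procedure was used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure, the critical P is the largest sorted P value that still falls under its rank-scaled threshold. The function name `bh_critical_p` and the toy P values are hypothetical and are not taken from the study.

```python
def bh_critical_p(p_values, q=0.05):
    """Return the Benjamini-Hochberg critical P value, or None if nothing passes.

    Assumption: the BH step-up rule. The k-th smallest P value (1-based rank k)
    passes if p <= q * k / m, and the critical P is the largest P value that
    passes. Every test with P at or below it is declared significant.
    """
    m = len(p_values)
    critical = None
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= q * rank / m:
            critical = p  # keep the largest passing P value
    return critical


# Toy P values, invented purely for illustration.
toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
critical = bh_critical_p(toy, q=0.05)
significant = [p for p in toy if critical is not None and p <= critical]
```

    In a voxelwise map, the list would hold one P value per voxel, and voxels with P at or below the critical value would be the ones displayed.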

  4. Characterization of the porcine sperm adhesion molecule gene SPAM1- expression analysis, genomic structure, and chromosomal mapping.

    PubMed

    Day, A E; Quilter, C R; Sargent, C A; Mileham, A J

    2002-06-01

    Sequence analysis of cDNA products, derived from adult porcine testis mRNA, gave overlapping nucleotide sequence correlating to 1952 bp of the sperm adhesion molecule 1 (SPAM1) gene. This sequence was shown to be homologous to SPAM1 genes known in other mammalian species and contained an open reading frame encoding a 493-amino acid protein. Fluorescence in situ hybridization (FISH), using a bacterial artificial chromosome (BAC) clone from the PigE BAC library, was used to map SPAM1 to chromosome 18 of the pig. This finding is consistent with comparative mapping experiments performed between pig and human chromosomes. Polymerase chain reaction (PCR) analysis of genomic DNA has shown that the 1952 bp of cDNA sequence spans approximately 9 kb of genomic DNA and comprises at least four exons, with its size and structure being relatively conserved between mouse, human and pig. Reverse transcriptase (RT)-PCR analysis of mRNA from nine porcine tissues has also suggested that expression of SPAM1 is limited to the testis.

  5. Clustering of gene ontology terms in genomes.

    PubMed

    Tiirikka, Timo; Siermala, Markku; Vihinen, Mauno

    2014-10-25

    Although protein coding genes occupy only a small fraction of genomes in higher species, they are not randomly distributed within or between chromosomes. Clustering of genes with related function(s) and/or characteristics has been evident at several different levels. To study how common the clustering of functionally related genes is and in what kinds of functions the end products of these genes are involved, we collected gene ontology (GO) terms for complete genomes and developed a method to detect previously undefined gene clustering. Exhaustive analysis was performed for seven widely studied species ranging from human to Escherichia coli. To overcome problems related to varying gene lengths and densities, a novel method was developed and a fixed number of genes were analyzed irrespective of the genome span covered. Statistically very significant GO term clustering was apparent in all the investigated genomes. The analysis window, which ranged from 5 to 50 consecutive genes, revealed extensive GO term clusters for genes with widely varying functions. Here, the most interesting and significant results are discussed and the complete dataset for each analyzed species is available at the GOme database at http://bioinf.uta.fi/GOme. The results indicated that clusters of genes with related functions are very common, not only in bacteria, in which operons are frequent, but also in all the studied species irrespective of how complex they are. There are some differences between species but in all of them GO term clusters are common and of widely differing sizes. The presented method can be applied to analyze any genome or part of a genome for which descriptive features are available, and thus is not restricted to ontology terms. This method can also be applied to investigate gene and protein expression patterns. The results pave a way for further studies of mechanisms that shape genome structure and evolutionary forces related to them. Copyright © 2014 Elsevier B.V.

  6. Genome position and gene amplification.

    PubMed

    Gajduskova, Pavla; Snijders, Antoine M; Kwek, Serena; Roydasgupta, Ritu; Fridlyand, Jane; Tokuyasu, Taku; Pinkel, Daniel; Albertson, Donna G

    2007-01-01

    Amplifications, regions of focal high-level copy number change, lead to overexpression of oncogenes or drug resistance genes in tumors. Their presence is often associated with poor prognosis; however, the use of amplification as a mechanism for overexpression of a particular gene in tumors varies. To investigate the influence of genome position on propensity to amplify, we integrated a mutant form of the gene encoding dihydrofolate reductase into different positions in the human genome, challenged cells with methotrexate and then studied the genomic alterations arising in drug resistant cells. We observed site-specific differences in methotrexate sensitivity, amplicon organization and amplification frequency. One site was uniquely associated with a significantly enhanced propensity to amplify and recurrent amplicon boundaries, possibly implicating a rare folate-sensitive fragile site in initiating amplification. Hierarchical clustering of gene expression patterns and subsequent gene enrichment analysis revealed two clusters differing significantly in expression of MYC target genes independent of integration site. These studies suggest that genome context together with the particular challenges to genome stability experienced during the progression to cancer contribute to the propensity to amplify a specific oncogene or drug resistance gene, whereas the overall functional response to drug (or other) challenge may be independent of the genomic location of an oncogene.

  7. Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters.

    PubMed

    Dallery, Jean-Félix; Lapalu, Nicolas; Zampounis, Antonios; Pigné, Sandrine; Luyten, Isabelle; Amselem, Joëlle; Wittenberg, Alexander H J; Zhou, Shiguo; de Queiroz, Marisa V; Robin, Guillaume P; Auger, Annie; Hainaut, Matthieu; Henrissat, Bernard; Kim, Ki-Tae; Lee, Yong-Hwan; Lespinet, Olivier; Schwartz, David C; Thon, Michael R; O'Connell, Richard J

    2017-08-29

    The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.

  8. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.

  9. Chloroplast genome sequence of the moss Tortula ruralis: gene content, polymorphism, and structural arrangement relative to other green plant chloroplast genomes

    PubMed Central

    2010-01-01

    Background Tortula ruralis, a widely distributed species in the moss family Pottiaceae, is increasingly used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of T. ruralis, only the second published chloroplast genome for a moss, and the first for a vegetatively desiccation-tolerant plant. Results The Tortula chloroplast genome is ~123,500 bp, and differs in a number of ways from that of Physcomitrella patens, the first published moss chloroplast genome. For example, Tortula lacks the ~71 kb inversion found in the large single copy region of the Physcomitrella genome and other members of the Funariales. Also, the Tortula chloroplast genome lacks petN, a gene found in all known land plant plastid genomes. In addition, an unusual case of nucleotide polymorphism was discovered. Conclusions Although the chloroplast genome of Tortula ruralis differs from that of the only other sequenced moss, Physcomitrella patens, we have yet to determine the biological significance of the differences. The polymorphisms we have uncovered in the sequencing of the genome offer a rare possibility (for mosses) of the generation of DNA markers for fine-level phylogenetic studies, or to investigate individual variation within populations. PMID:20187961

  10. Genome-wide Analyses of the Structural Gene Families Involved in the Legume-specific 5-Deoxyisoflavonoid Biosynthesis of Lotus japonicus

    PubMed Central

    Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio

    2007-01-01

    Abstract A model legume Lotus japonicus (Regel) K. Larsen is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to the legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters in the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423

  11. Genomic organization of the ATM gene

    SciTech Connect

    Uziel, T.; Savitsky, K.; Platzer, M.; Rosenthal, A.

    1996-04-15

    The ATM gene was recently identified and found to be responsible for the human genetic disorder ataxia-telangiectasia. The major ATM transcript is 13 kb. Using long-distance PCR, we determined the genomic structure of this gene and identified all of its exon-intron boundaries. The ATM gene spans approximately 150 kb of genomic DNA and consists of 66 exons. The initiation codon falls within exon 4. The last exon is 3.8 kb and contains the stop codon and a 3'-untranslated region of about 3600 nucleotides. 19 refs., 2 figs., 1 tab.

  12. Complete structure, genomic organization, and expression of channel catfish (Ictalurus punctatus, Rafinesque 1818) matrix metalloproteinase-9 gene.

    PubMed

    Yeh, Hung-Yueh; Klesius, Phillip H

    2008-03-01

    In this study, the channel catfish (CC) matrix metalloproteinase-9 (MMP-9) gene was cloned, sequenced, and characterized at both the cDNA and the genomic DNA levels. The complete sequence of the CC MMP-9 cDNA consisted of 2,551 nucleotides, including one open reading frame and 5'- and 3'-end untranslated regions. The open reading frame potentially encoded a 686-amino-acid peptide with a calculated molecular mass (without glycosylation) of approximately 77.4 kDa, which included a signal peptide and potentially heavy O-glycosylation sites. CC MMP-9 did not have the tripeptide Arg-Gly-Asp motif. The degree of conservation of the CC MMP-9 amino acid sequence to human and mouse counterparts was 55%, while to those of other fish species was 67-74%. The full-length CC MMP-9 genomic DNA comprised 5,663 nucleotides, much shorter than human or mouse counterparts. The exon-intron structure followed the splice acceptor/donor consensus rule, and the sequence contained 13 exons. The MMP-9 transcript was constitutively expressed in restrictive CC tissues. This result should provide fundamental information for further exploration of the role of MMP-9 in fish pathophysiology.

  13. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.

    PubMed

    Martínez-García, Pedro J; Crepeau, Marc W; Puiu, Daniela; Gonzalez-Ibeas, Daniel; Whalen, Jeanne; Stevens, Kristian A; Paul, Robin; Butterfield, Timothy S; Britton, Monica T; Reagan, Russell L; Chakraborty, Sandeep; Walawage, Sriema L; Vasquez-Gross, Hans A; Cardeno, Charis; Famula, Randi A; Pratt, Kevin; Kuruganti, Sowmya; Aradhya, Mallikarjuna K; Leslie, Charles A; Dandekar, Abhaya M; Salzberg, Steven L; Wegrzyn, Jill L; Langley, Charles H; Neale, David B

    2016-09-01

    The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
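
    The abstract above quotes an N50 scaffold size computed against an assumed genome size rather than against the assembly total. As a minimal sketch of that calculation (the function name `n50` and the toy scaffold lengths are hypothetical, and the authors' exact tooling is not stated), scaffolds are sorted from longest to shortest and accumulated until half of the reference length is covered:

```python
def n50(lengths, genome_size=None):
    """Length of the scaffold at which the running total reaches half the target.

    If genome_size is None, the target is the sum of the scaffold lengths
    (the usual N50). If genome_size is given, the target is that estimate
    instead (often called NG50). Returns None when the scaffolds never
    reach half of the target.
    """
    total = genome_size if genome_size is not None else sum(lengths)
    half = total / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return None


# Toy scaffold lengths, invented for illustration only.
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
assembly_n50 = n50(toy_scaffolds)                   # target is half of 32
genome_n50 = n50(toy_scaffolds, genome_size=40)     # target is half of 40
```

    Because the target changes with the assumed genome size, the same assembly can report different values, which is why the abstract states the genome size alongside the figure.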

  14. Dynamic structures in phytoplasma genomes: sequence variable mosaics (SVMs) of clustered genes

    USDA-ARS's Scientific Manuscript database

    Emergence of the phytoplasma clade from an Acholeplasma-like ancestor gave rise to an intriguing group of cell wall-less prokaryotes through a remarkable and continuing evolutionary process. In a ceaseless progression, phytoplasmas have evolved reduced genomes, losing biochemical pathways for synth...

  15. Genomics screens for metastasis genes

    PubMed Central

    Yan, Jinchun; Huang, Qihong

    2014-01-01

    Metastasis is responsible for most cancer mortality. The process of metastasis is complex, requiring the coordinated expression and fine regulation of many genes in multiple pathways in both the tumor and host tissues. Identification and characterization of the genetic programs that regulate metastasis is critical to understanding the metastatic process and discovering molecular targets for the prevention and treatment of metastasis. Genomic approaches and functional genomic analyses can systemically discover metastasis genes. In this review, we summarize the genetic tools and methods that have been used to identify and characterize the genes that play critical roles in metastasis. PMID:22684367

  16. Genomic structure and promoter functional analysis of GnRH3 gene in large yellow croaker (Larimichthys crocea).

    PubMed

    Huang, Wei; Zhang, Jianshe; Liao, Zhi; Lv, Zhenming; Wu, Huifei; Zhu, Aiyi; Wu, Changwen

    2016-01-15

    Gonadotropin-releasing hormone III (GnRH3) is considered to be a key neurohormone in fish reproduction control. In the present study, the cDNA and genomic sequences of GnRH3 were cloned and characterized from large yellow croaker Larimichthys crocea. The cDNA encoded a protein of 99 amino acids with four functional motifs. The full-length genome sequence was composed of 3797 nucleotides, including four exons and three introns. Higher identities of amino acid sequences and conserved exon-intron organizations were found between LcGnRH3 and other GnRH3 genes. In addition, some special features of the sequences were detected in partial species. For example, two specific residues (V and A) were found in the family Sciaenidae, and the unique 75-72 bp type of the open reading frame 2 and 3 existed in the family Cyprinidae. Analysis of the 2576 bp promoter fragment of LcGnRH3 showed a number of transcription factor binding sites, such as AP1, CREB, GATA-1, HSF, FOXA2, and FOXL1. Promoter functional analysis using an EGFP reporter fusion in zebrafish larvae presented positive signals in the brain, including the olfactory region, the terminal nerve ganglion, the telencephalon, and the hypothalamus. The expression pattern was generally consistent with the endogenous GnRH3 GFP-expressing transgenic zebrafish lines, but the details were different. These results indicate that the structure and function of LcGnRH3 are generally similar to the other teleost GnRH3 genes, but there exist some distinctions among them.

  17. Genomic structure of PIR-B, the inhibitory member of the paired immunoglobulin-like receptor genes in mice.

    PubMed

    Alley, T L; Cooper, M D; Chen, M; Kubagawa, H

    1998-03-01

    The genes encoding the murine paired immunoglobulin-like receptors PIR-A and PIR-B are members of a novel gene family which encode cell-surface receptors bearing immunoreceptor tyrosine-based inhibitory motifs (ITIMs) and their non-inhibitory/activatory counterparts. PIR-A and PIR-B have highly homologous extracellular domains but distinct transmembrane and cytoplasmic regions. A charged arginine in the transmembrane region of PIR-A suggests its potential association with other transmembrane proteins to form a signal transducing unit. PIR-B, in contrast, has an uncharged transmembrane region and several ITIMs in its cytoplasmic tail. These characteristics suggest that PIR-A and PIR-B which are coordinately expressed by B cells and myeloid cells, serve counter-regulatory roles in humoral and inflammatory responses. In the present study we have determined the genomic structure of the single copy PIR-B gene. The gene consists of 15 exons and spans approximately 8 kilobases. The first exon contains the 5' untranslated region, the ATG translation start site, and approximately half of the leader peptide sequence. The remainder of the leader peptide sequence is encoded by exon 2. Exons 3-8 encode the six extracellular immunoglobulin-like domains and exons 9 and 10 code for the extracellular membrane proximal and transmembrane regions. The final five exons (exons 11-15) encode for the ITIM-bearing cytoplasmic tail and the 3' untranslated region. The intron/exon boundaries of PIR-B obey the GT-AG rule and are in phase I, with the notable exception of the three boundaries determined for ITIM-containing exons. A microsatellite composed of the trinucleotide repeat AAG in the intron between exons 9 and 10 provides a useful marker for studying population genetics.
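
    Two checks mentioned in the abstract above, the GT-AG rule and intron phase, are easy to express in code. The sketch below assumes the standard definitions: an intron obeys the GT-AG rule if it begins with GT and ends with AG, and its phase is the number of coding nucleotides before it, modulo 3 (phase 1 means the intron falls after the first base of a codon). The function names, the example intron sequence, and the exon lengths are hypothetical and are not taken from the PIR-B gene.

```python
def obeys_gt_ag(intron):
    """True if the intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def intron_phases(coding_exon_lengths):
    """Phase (0, 1 or 2) of each intron, given the coding length of each exon.

    Lengths must count coding nucleotides only (untranslated bases excluded).
    The intron after exon i has phase equal to the cumulative coding length
    through exon i, modulo 3. The last exon has no following intron.
    """
    phases = []
    coding_so_far = 0
    for length in coding_exon_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


# Invented example: three coding exons separated by two introns.
example_phases = intron_phases([4, 300, 9])
example_rule = obeys_gt_ag("gtaagtttcag")
```

    Exons whose coding length is a multiple of 3 preserve the phase of the preceding intron, which is how a run of domain-encoding exons can all sit between phase 1 boundaries.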

  18. Genomic structure of the human beta-PIX gene and its alteration in gastric cancer.

    PubMed

    Li, Zhong you; Wang, You jie; Song, Jian ping; Kataoka, Hideki; Yoshii, Shigeto; Gao, Chang ming; Wang, Ya ping; Zhou, Jian nong; Ota, Satoshi; Tanaka, Masamitsu; Sugimura, Haruhiko

    2002-03-28

    beta-PIX, a newly identified p21-activated kinase (PAK)-interacting exchange factor (PIX), encodes a guanine nucleotide exchange factor for Rho guanosine triphosphatases. Characterization of the beta-PIX gene was performed using the BAC Library method. The beta-PIX gene has 17 exons and an A/T polymorphism at the 32nd base upstream of the intron/exon junction of exon 7. The frequencies of genotypes A/T, A/A and T/T were 23.6% (13/55), 72.7% (40/55) and 3.6% (2/55), respectively; these frequencies are in Hardy-Weinberg equilibrium. Two out of 14 informative tumors (14.3%) were shown to have lost their heterozygosity at this locus, but no mutations in the remaining alleles were detected. In addition, we examined the gene-expression profile in another set of 30 gastric samples, but no significant over-expression of either the beta-PIX gene or the alpha-PIX gene was found. Though the beta-PIX gene has been speculated to potentially have tumor-related biological characteristics, the findings of the present study suggest that the involvement of the beta-PIX gene in human gastric carcinogenesis is minimal.
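
    The Hardy-Weinberg statement in the abstract above can be checked from the reported genotype counts (40 A/A, 13 A/T, 2 T/T). The sketch below assumes a one-degree-of-freedom chi-square goodness-of-fit test, which is a common textbook choice; the abstract does not say which test the authors used, and with an expected T/T count below 5 an exact test would usually be preferred. The function name `hwe_chi_square` is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_at, n_tt):
    """Chi-square test of Hardy-Weinberg proportions for a biallelic locus.

    Returns (frequency of allele A, chi-square statistic, P value).
    Assumption: 1 degree of freedom (3 genotype classes, minus 1, minus 1
    estimated allele frequency).
    """
    n = n_aa + n_at + n_tt
    p = (2 * n_aa + n_at) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele T
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_at, n_tt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return p, chi2, p_value


# Genotype counts as reported in the abstract (55 individuals).
freq_a, chi2, p_value = hwe_chi_square(n_aa=40, n_at=13, n_tt=2)
```

    A P value well above 0.05 here means the observed counts do not depart detectably from Hardy-Weinberg proportions, in line with the abstract's statement.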

  19. Population structure and comparative genome hybridization of European flor yeast reveal a unique group of Saccharomyces cerevisiae strains with few gene duplications in their genome.

    PubMed

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation.

  20. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  1. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

    PubMed Central

    Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

    2012-01-01

    Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921

  2. Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium : Evidence for gene duplications and extensive gene clustering

    PubMed Central

    Doddapaneni, Harshavardhan; Chakraborty, Ranajit; Yadav, Jagjit S

    2005-01-01

    Background Phanerochaete chrysosporium, the model white rot basidiomycetous fungus, has the extraordinary ability to mineralize (to CO2) lignin and detoxify a variety of chemical pollutants. Its cytochrome P450 monooxygenases have recently been implicated in several of these biotransformations. Our initial P450 cloning efforts in P. chrysosporium and its subsequent whole genome sequencing have revealed an extraordinary P450 repertoire ("P450ome") containing at least 150 P450 genes with as-yet unknown function. In order to understand the functional diversity and the evolutionary mechanisms and significance of these hemeproteins, here we report a genome-wide structural and evolutionary analysis of the P450ome of this fungus. Results Our analysis showed that P. chrysosporium P450ome could be classified into 12 families and 23 sub-families and is characterized by the presence of multigene families. A genome-level structural analysis revealed 16 organizationally homogeneous and heterogeneous clusters of tandem P450 genes. Analysis of our cloned cDNAs revealed structurally conserved characteristics (intron numbers and locations, and functional domains) among members of the two representative multigene P450 families CYP63 and CYP505 (P450foxy). Considering the unusually complex structural features of the P450 genes in this genome, including microexons (2–10 aa) and frequent small introns (45–55 bp), alternative splicing, as experimentally observed for CYP63, may be a more widespread event in the P450ome of this fungus. Clan-level phylogenetic comparison revealed that P. chrysosporium P450 families fall under 11 fungal clans and the majority of these multigene families appear to have evolved locally in this genome from their respective progenitor genes, as a result of extensive gene duplications and rearrangements. Conclusion P. chrysosporium P450ome, the largest known to date among fungi, is characterized by tandem gene clusters and multigene families. This enormous P450

  3. Comparative assessment of the pig, mouse, and human genomes: A structural and functional analysis of genes involved in immunity

    USDA-ARS's Scientific Manuscript database

    A detailed analysis was conducted on portions of the porcine, murine, and human genome associated with the immune response. It was found that non-protein coding RNA/DNA that potentially interact and regulate gene expression, nucleotide similarity, isochore type, and the similarity of 5’ and 3’ UTR ...

  4. Structure and partial genomic sequence of the human E2F1 gene.

    PubMed

    Neuman, E; Sellers, W R; McNeil, J A; Lawrence, J B; Kaelin, W G

    1996-09-16

    The E2F family of transcription factors appears to play a critical role in the transcription of certain genes required for cell cycle progression. E2F1, the first cloned member of this family, is regulated during the cell cycle at the mRNA level by changes in transcription of the E2F1 gene and at the protein level by complex formation with proteins such as the retinoblastoma gene product (pRB), cyclin A and DP1. E2F1 can override a pRB-induced G1/S block and can behave as an oncogene in certain cells. E2F1 was cloned and was found to contain seven exons. The dinucleotides at the 5' and 3' splice sites of intron 4 do not agree with consensus splice site sequences. Fluorescence in situ hybridization localized E2F1 to chromosome 20q11. Knowledge of the organization of E2F1 may facilitate identification of additional E2F family members, as well as detection of E2F1 abnormalities in human tumors.

  5. Insights into structural variations and genome rearrangements in prokaryotic genomes.

    PubMed

    Periwal, Vinita; Scaria, Vinod

    2015-01-01

    Structural variations (SVs) are genomic rearrangements that affect fairly large fragments of DNA. Most of the SVs such as inversions, deletions and translocations have been largely studied in context of genetic diseases in eukaryotes. However, recent studies demonstrate that genome rearrangements can also have profound impact on prokaryotic genomes, leading to altered cell phenotype. In contrast to single-nucleotide variations, SVs provide a much deeper insight into organization of bacterial genomes at a much better resolution. SVs can confer change in gene copy number, creation of new genes, altered gene expression and many other functional consequences. High-throughput technologies have now made it possible to explore SVs at a much refined resolution in bacterial genomes. Through this review, we aim to highlight the importance of the less explored field of SVs in prokaryotic genomes and their impact. We also discuss its potential applicability in the emerging fields of synthetic biology and genome engineering where targeted SVs could serve to create sophisticated and accurate genome editing.

  6. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity

    PubMed Central

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation–induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome. PMID:25922569

  7. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes.

    PubMed

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-07-22

    The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions.

  8. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    PubMed Central

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggests that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  9. Chromosomal localization, genomic structure, and allelic polymorphism of the human CD79a (Ig-α/mb-1) gene

    SciTech Connect

    Hashimoto, S.; Gregersen, P.K.; Chiorazzi, N.; Mohrenweiser, H.W.

    1994-12-31

    The germline DNA sequence of the human CD79a (Ig-α/mb-1) gene was determined by polymerase chain reaction sequencing of a cosmid clone derived from an arrayed human chromosome 19 library. The CD79a gene was localized to chromosome 19q13.2; this localization places the gene within the CEA-like gene cluster with the following gene order: -CEA-CGM1-CD79a-RPS11-ATP1A3-BGP-CGM9-. The genomic organization of the human CD79a gene resembles the mouse counterpart with five exons interrupted by four introns. Computer analyses suggest the presence of transcription regulatory elements known to be important in the regulation of mouse CD79a (AP-1, EBF, AP-2, MUF2, and SP-1 sites), as well as elements not found in the mouse gene (an NF-κB binding site and a series of E-box motifs). Similar to the mouse gene, the 5′ flanking region of human CD79a lacks a TATA box; however, unlike mouse CD79a, a classical octamer motif could not be identified in the human gene. Finally, a new Rsa I restriction fragment length polymorphism was defined in the non-coding regions of the human gene. 64 refs., 4 figs., 2 tabs.

  10. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides.

    PubMed

    Benítez-Páez, Alfonso; Moreno, F Javier; Sanz, María L; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome.

  11. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    PubMed Central

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  12. The Structural Genomics Consortium

    PubMed Central

    Jones, Molly Morgan; Castle-Clarke, Sophie; Brooker, Daniel; Nason, Edward; Huzair, Farah; Chataway, Joanna

    2014-01-01

    Abstract The Structural Genomics Consortium (SGC) supports drug discovery efforts through a unique, open access model of public-private collaboration. This study presents the results of an independent evaluation of the Structural Genomics Consortium, conducted by RAND Europe with the Institute on Governance. The evaluation aimed to establish the role of the SGC within the wider drug discovery and PPP landscape, assessing the merits of the SGC open access model relative to alternative models of funding R&D in this space, as well as the key trends and opportunities in the external environment that may impact on the future of the SGC. It also established the incentives and disincentives for investment, strengths and weaknesses of the SGC's model, and the opportunities and threats the SGC will face in the future. This enabled us to assess the most convincing arguments for funding the SGC at present; important trade-offs or limitations that should be addressed in moving towards the next funding phase; and whether funders are anticipating changes either to the SGC or the wider PPP landscape. Finally, we undertook a quantitative analysis to ascertain what judgements can be made about the SGC's past and current performance track record, before unpacking the role of the external environment and particular actors within the SGC in developing scenarios for the future. PMID:28560088

  13. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    PubMed

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus.

  14. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... Deoxyribonucleic acid (DNA) is the ... and lead to a disease such as cancer. DNA Sequencing: Sequencing simply means determining the exact order ...

  15. Weeding out the genes: the Arabidopsis genome project.

    PubMed

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. In this review, the lessons learned for future genome projects are reviewed as well as a summary of the initial findings in Arabidopsis.

  16. Heat Shock Protein 70 and 90 Genes in the Harmful Dinoflagellate Cochlodinium polykrikoides: Genomic Structures and Transcriptional Responses to Environmental Stresses

    PubMed Central

    Guo, Ruoyu; Youn, Seok Hyun; Ki, Jang-Seu

    2015-01-01

    The marine dinoflagellate Cochlodinium polykrikoides is responsible for harmful algal blooms in aquatic environments and has spread into the world's oceans. As a microeukaryote, it seems to have distinct genomic characteristics, like gene structure and regulation. In the present study, we characterized heat shock protein (HSP) 70/90 of C. polykrikoides and evaluated their transcriptional responses to environmental stresses. Both HSPs contained the conserved motif patterns, showing the highest homology with those of other dinoflagellates. Genomic analysis showed that CpHSP70 had no intron but was encoded in a tandem arrangement, with copies separated by intergenic spacers. However, CpHSP90 had one intron in the coding genomic regions, and no intergenic region was found. Phylogenetic analyses of separate HSPs showed that CpHSP70 was closely related to that of the dinoflagellate Crypthecodinium cohnii and CpHSP90 to those of other Gymnodiniales in dinoflagellates. Gene expression analyses showed that both HSP genes were upregulated by treatment with the separate algicides CuSO4 and NaOCl; however, they displayed a downregulation pattern with PCB treatment. The transcription of CpHSP90 and CpHSP70 showed similar expression patterns under the same toxicant treatment, suggesting that both genes might have cooperative functions for the toxicant induced gene regulation in the dinoflagellate. PMID:26064872

  17. Genomic contributions in livestock gene introgression programmes

    PubMed Central

    Wall, Eileen; Visscher, Peter M; Hospital, Frédéric; Woolliams, John A

    2005-01-01

    The composition of the genome after introgression of a marker gene from a donor to a recipient breed was studied using analytical and simulation methods. Theoretical predictions of proportional genomic contributions, including donor linkage drag, from ancestors used at each generation of crossing after an introgression programme agreed closely with simulated results. The obligate drag, the donor genome surrounding the target locus that cannot be removed by subsequent selection, was also studied. It was shown that the number of backcross generations and the length of the chromosome affected proportional genomic contributions to the carrier chromosomes. Population structure had no significant effect on ancestral contributions and linkage drag but it did have an effect on the obligate drag whereby larger offspring groups resulted in smaller obligate drag. The implications for an introgression programme of the number of backcross generations, the population structure and the carrier chromosome length are discussed. The equations derived describing contributions to the genome from individuals from a given generation provide a framework to predict the genomic composition of a population after the introgression of a favourable donor allele. These ancestral contributions can be assigned a value and therefore allow the prediction of genetic lag. PMID:15823237

  18. Gene Chips and Functional Genomics

    NASA Astrophysics Data System (ADS)

    Hamadeh, Hisham; Afshari, Cynthia

    2000-11-01

    These past few years of scientific discovery will undoubtedly be remembered as the "genomics era," the period in which biologists succeeded in enumerating the sequence of nucleotides making up all, or at least most, of human DNA. And while this achievement has been heralded as a technological feat equal to the moon landing, it is only the first of many advances in DNA technology. Scientists are now faced with the task of understanding the meaning of the DNA sequence. Specifically, they want to learn how the DNA code relates to protein function. An important tool in the study of "functional genomics," is the cDNA microarray—also known as the gene chip. Inspired by computer microchips, gene chips allow scientists to monitor the expression of hundreds, even thousands, of genes in a fraction of the time it used to take to monitor the expression of a single one. By altering the conditions under which a particular tissue expresses genes—say, by exposing it to toxins or growth factors—scientists can determine the suite of genes expressed in different situations and hence start to get a handle on the function of these genes. The authors discuss this important new technology and some of its practical applications.

  19. Horizontal gene transfer and the rock record: comparative genomics of phylogenetically distant bacteria that induce wrinkle structure formation in modern sediments.

    PubMed

    Flood, B E; Bailey, J V; Biddle, J F

    2014-03-01

    Wrinkle structures are sedimentary features that are produced primarily through the trapping and binding of siliciclastic sediments by mat-forming micro-organisms. Wrinkle structures and related sedimentary structures in the rock record are commonly interpreted to represent the stabilizing influence of cyanobacteria on sediments because cyanobacteria are known to produce similar textures and structures in modern tidal flat settings. However, other extant bacteria such as filamentous representatives of the family Beggiatoaceae can also interact with sediments to produce sedimentary features that morphologically resemble many of those associated with cyanobacteria-dominated mats. While Beggiatoa spp. and cyanobacteria are metabolically and phylogenetically distant, genomic analyses show that the two groups share hundreds of homologous genes, likely as the result of horizontal gene transfer. The comparative genomics results described here suggest that some horizontally transferred genes may code for phenotypic traits such as filament formation, chemotaxis, and the production of extracellular polymeric substances that potentially underlie the similar biostabilizing influences of these organisms on sediments. We suggest that the ecological utility of certain basic life modes such as the construction of mats and biofilms, coupled with the lateral mobility of genes in the microbial world, introduces an element of uncertainty into the inference of specific phylogenetic origins from gross morphological features preserved in the ancient rock record.

  20. Whole-genome DNA methylation patterns and complex associations with gene structure and expression during flower development in Arabidopsis.

    PubMed

    Yang, Hongxing; Chang, Fang; You, Chenjiang; Cui, Jie; Zhu, Genfeng; Wang, Lei; Zheng, Yu; Qi, Ji; Ma, Hong

    2015-01-01

    Flower development is a complex process requiring proper spatiotemporal expression of numerous genes. Accumulating evidence indicates that epigenetic mechanisms, including DNA methylation, play essential roles in modulating gene expression. However, few studies have examined the relationship between DNA methylation and floral gene expression on a genomic scale. Here we present detailed analyses of DNA methylomes at single-base resolution for three Arabidopsis floral periods: meristems, early flowers and late flowers. We detected 1.5 million methylcytosines, and estimated the methylation levels for 24 035 genes. We found that many cytosine sites were methylated de novo from the meristem to the early flower stage, and many sites were demethylated from early to late flowers. A comparison of the transcriptome data of the same three periods revealed that the methylation and demethylation processes were correlated with expression changes of >3000 genes, many of which are important for normal flower development. We also found different methylation patterns for three sequence contexts (mCG, mCHG and mCHH) and in different genic regions, potentially with different roles in gene expression.

  1. Synaptotagmin gene content of the sequenced genomes.

    PubMed

    Craxton, Molly

    2004-07-06

    Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences make such studies worthwhile. I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their great diversification in animals. 
Synaptotagmins occur in

  2. Genomic structure and expression analysis of the RNase kappa family ortholog gene in the insect Ceratitis capitata.

    PubMed

    Rampias, Theodoros N; Fragoulis, Emmanuel G; Sideris, Diamantis C

    2008-12-01

    Cc RNase is the founding member of the recently identified RNase kappa family, which is represented by a single ortholog in a wide range of animal taxonomic groups. Although the precise biological role of this protein is still unknown, it has been shown that the recombinant proteins isolated so far from the insect Ceratitis capitata and from human exhibit ribonucleolytic activity. In this work, we report the genomic organization and molecular evolution of the RNase kappa gene from various animal species, as well as expression analysis of the ortholog gene in C. capitata. The high degree of amino acid sequence similarity, in combination with the fact that exon sizes and intronic positions are extremely conserved among RNase kappa orthologs in 15 diverse genomes from sea anemone to human, imply a very significant biological function for this enzyme. In C. capitata, two forms of RNase kappa mRNA (0.9 and 1.5 kb) with various lengths of 3' UTR were identified as alternative products of a single gene, resulting from the use of different polyadenylation signals. Both transcripts are expressed in all insect tissues and developmental stages. Sequence analysis of the extended region of the longer transcript revealed the existence of three mRNA instability motifs (AUUUA) and five poly(U) tracts, whose functional importance in RNase kappa mRNA decay remains to be explored.

  3. Genome-wide genotyping-by-sequencing data provide a high-resolution view of wild Helianthus diversity, genetic structure, and interspecies gene flow.

    PubMed

    Baute, Gregory J; Owens, Gregory L; Bock, Dan G; Rieseberg, Loren H

    2016-12-01

    Wild sunflowers harbor considerable genetic diversity and are a major resource for improvement of the cultivated sunflower, Helianthus annuus. The Helianthus genus is also well known for its propensity for gene flow between taxa. We surveyed genomic diversity of 292 samples of wild Helianthus from 22 taxa that are cross-compatible with the cultivar using genotyping by sequencing. With these data, we derived a high-resolution phylogeny of the taxa, interrogated genome-wide levels of diversity, explored H. annuus population structure, and identified localized gene flow between H. annuus and its close relatives. Our phylogenomic analyses confirmed a number of previously established interspecific relationships and indicated for the first time that a newly described annual sunflower, H. winteri, is nested within H. annuus. Principal component analyses showed that H. annuus has geographic population structure with most notable subpopulations occurring in California and Texas. While gene flow was identified between H. annuus and H. bolanderi in California and between H. annuus and H. argophyllus in Texas, this genetic exchange does not appear to drive observed patterns of H. annuus population structure. Wild H. annuus remains an excellent resource for cultivated sunflower breeding efforts because of its diversity and the ease with which it can be crossed with cultivated H. annuus. Cases of interspecific gene flow such as those documented here also indicate wild H. annuus can act as a bridge to capture alleles from other wild taxa; continued breeding efforts with it may therefore reap the largest rewards. © 2016 Botanical Society of America.

  4. The mouse p97 (CDC48) gene. Genomic structure, definition of transcriptional regulatory sequences, gene expression, and characterization of a pseudogene.

    PubMed

    Müller, J M; Meyer, H H; Ruhrberg, C; Stamp, G W; Warren, G; Shima, D T

    1999-04-09

    Here we present the first description of the genomic organization, transcriptional regulatory sequences, and adult and embryonic gene expression for the mouse p97(CDC48) AAA ATPase. Clones representing two distinct p97 genes were isolated in a genomic library screen, one of them likely representing a non-functional processed pseudogene. The coding region of the gene encoding the functional mRNA is interrupted by 16 introns and encompasses 20.4 kilobase pairs. Definition of the transcriptional initiation site and sequence analysis showed that the gene contains a TATA-less, GC-rich promoter region with an initiator element spanning the transcription start site. Cis-acting elements necessary for basal transcription activity reside within 410 base pairs of the flanking region as determined by transient transfection assays. In immunohistological analyses, p97 was widely expressed in embryos and adults, but protein levels were tightly controlled in a cell type- and cell differentiation-dependent manner. A remarkable heterogeneity in p97 immunostaining was found on a cellular level within a given tissue, and protein amounts in the cytoplasm and nucleus varied widely, suggesting a highly regulated and intermittent function for p97. This study provides the basis for a detailed analysis of the complex regulation of p97 and the reagents required for assessing its functional significance using targeted gene manipulation in the mouse.

  5. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  6. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. Bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots of the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  7. Genome Structures and Halophyte-Specific Gene Expression of the Extremophile Thellungiella parvula in Comparison with Thellungiella salsuginea (Thellungiella halophila) and Arabidopsis

    PubMed Central

    Oh, Dong-Ha; Dassanayake, Maheshi; Haas, Jeffrey S.; Kropornika, Anna; Wright, Chris; d’Urzo, Matilde Paino; Hong, Hyewon; Ali, Shahjahan; Hernandez, Alvaro; Lambert, Georgina M.; Inan, Gunsu; Galbraith, David W.; Bressan, Ray A.; Yun, Dae-Jin; Zhu, Jian-Kang; Cheeseman, John M.; Bohnert, Hans J.

    2010-01-01

    The genome of Thellungiella parvula, a halophytic relative of Arabidopsis (Arabidopsis thaliana), is being assembled using Roche-454 sequencing. Analyses of a 10-Mb scaffold revealed synteny with Arabidopsis, with recombination and inversion and an uneven distribution of repeat sequences. T. parvula genome structure and DNA sequences were compared with orthologous regions from Arabidopsis and publicly available bacterial artificial chromosome sequences from Thellungiella salsuginea (previously Thellungiella halophila). The three-way comparison of sequences, from one abiotic stress-sensitive species and two tolerant species, revealed extensive sequence conservation and microcolinearity, but grouping Thellungiella species separately from Arabidopsis. However, the T. parvula segments are distinguished from their T. salsuginea counterparts by a pronounced paucity of repeat sequences, resulting in a 30% shorter DNA segment with essentially the same gene content in T. parvula. Among the genes is SALT OVERLY SENSITIVE1 (SOS1), a sodium/proton antiporter, which represents an essential component of plant salinity stress tolerance. Although the SOS1 coding region is highly conserved among all three species, the promoter regions show conservation only between the two Thellungiella species. Comparative transcript analyses revealed higher levels of basal as well as salt-induced SOS1 expression in both Thellungiella species as compared with Arabidopsis. The Thellungiella species and other halophytes share conserved pyrimidine-rich 5′ untranslated region proximal regions of SOS1 that are missing in Arabidopsis. Completion of the genome structure of T. parvula is expected to highlight distinctive genetic elements underlying the extremophile lifestyle of this species. PMID:20833729

  8. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
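
    As a rough illustration of the quantities this abstract refers to, the sketch below picks k from the genome length and computes the Shannon entropy of the empirical k-mer distribution. It assumes that lg2(n) denotes the base-2 logarithm and rounds it to an integer; the function names and the toy sequence are hypothetical, and the paper's own informational indexes are not reproduced here.

```python
import math
from collections import Counter

def suggested_k(genome_length: int) -> int:
    """k = log2(n), rounded to the nearest integer (rounding is an assumption)."""
    return max(1, round(math.log2(genome_length)))

def kmer_entropy(genome: str, k: int) -> float:
    """Shannon entropy, in bits, of the empirical distribution of overlapping k-mers."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy sequence, far shorter than any real genome.
toy = "ACGT" * 4
k = suggested_k(len(toy))            # log2(16) = 4
entropy_1mer = kmer_entropy(toy, 1)  # four equally frequent letters
```

    For a uniform distribution over the four nucleotides the 1-mer entropy is 2 bits, which is the upper bound for k = 1.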

  9. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = lg2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  10. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  11. Structural Genomics: Correlation Blocks, Population Structure, and Genome Architecture

    PubMed Central

    Hu, Xin-Sheng; Yeh, Francis C.; Wang, Zhiquan

    2011-01-01

    An integration of the pattern of genome-wide inter-site associations with evolutionary forces is important for gaining insights into the genomic evolution in natural or artificial populations. Here, we assess the inter-site correlation blocks and their distributions along chromosomes. A correlation block is broadly defined as a DNA segment within which strong correlations exist between genetic diversities at any two sites. We bring together the population genetic structure and the genomic diversity structure that have been independently built on different scales and synthesize the existing theories and methods for characterizing genomic structure at the population level. We discuss how population structure could shape correlation blocks and their patterns within and between populations. Effects of evolutionary forces (selection, migration, genetic drift, and mutation) on the pattern of genome-wide correlation blocks are discussed. We briefly discuss the associations between the pattern of correlation blocks and genome assembly features in eukaryote organisms, including the impacts of multigene family, the perturbation of transposable elements, and the repetitive nongenic sequences and GC-rich isochores. Our reviews suggest that the observable pattern of correlation blocks can refine our understanding of the ecological and evolutionary processes underlying the genomic evolution at the population level. PMID:21886455

  12. Genomic Structure and Identification of Novel Mutations in Usherin, the Gene Responsible for Usher Syndrome Type IIa

    PubMed Central

    Weston, M. D.; Eudy, J. D.; Fujita, S.; Yao, S.-F.; Usami, S.; Cremers, C.; Greenburg, J.; Ramesar, R.; Martini, A.; Moller, C.; Smith, R. J.; Sumegi, J.; Kimberling, William J.

    2000-01-01

    Usher syndrome type IIa (USHIIa) is an autosomal recessive disorder characterized by moderate to severe sensorineural hearing loss and progressive retinitis pigmentosa. This disorder maps to human chromosome 1q41. Recently, mutations in USHIIa patients were identified in a novel gene isolated from this chromosomal region. The USH2A gene encodes a protein with a predicted molecular weight of 171.5 kD and possesses laminin epidermal growth factor as well as fibronectin type III domains. These domains are observed in other protein components of the basal lamina and extracellular matrixes; they may also be observed in cell-adhesion molecules. The intron/exon organization of the gene whose protein we name “Usherin” was determined by direct sequencing of PCR products and cloned genomic DNA with cDNA-specific primers. The gene is encoded by 21 exons and spans a minimum of 105 kb. A mutation search of 57 independent USHIIa probands was performed with a combination of direct sequencing and heteroduplex analysis of PCR-amplified exons. Fifteen new mutations were found. Of 114 independent USH2A alleles, 58 harbored probable pathologic mutations. Ten cases of USHIIa were true homozygotes and 10 were compound heterozygotes; 18 heterozygotes with only one identifiable mutation were observed. Sixty-five percent (38/58) of cases had at least one mutation, and 51% (58/114) of the total number of possible mutations were identified. The allele 2299delG (previously reported as 2314delG) was the most frequent mutant allele observed (16%; 31/192). Three new missense mutations (C319Y, N346H, and C419F) were discovered; all were restricted to the previously unreported laminin domain VI region of Usherin. The possible significance of this domain, known to be necessary for laminin network assembly, is discussed in the context of domain VI mutations from other proteins. PMID:10729113

  13. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    PubMed Central

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.; Bogdanove, Adam J.

    2016-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456

  14. Comparative genomic analysis of prion genes

    PubMed Central

    Premzl, Marko; Gamulin, Vera

    2007-01-01

    Background The homologues of human disease genes are expected to contribute to better understanding of physiological and pathogenic processes. We made use of the present availability of vertebrate genomic sequences, and we have conducted the most comprehensive comparative genomic analysis of the prion protein gene PRNP and its homologues, shadow of prion protein gene SPRN and doppel gene PRND, and prion testis-specific gene PRNT so far. Results While the SPRN and PRNP homologues are present in all vertebrates, PRND is known in tetrapods, and PRNT is present in primates. PRNT could be viewed as a TE-associated gene. Using human as the base sequence for genomic sequence comparisons (VISTA), we annotated numerous potential cis-elements. The conserved regions in SPRNs harbour the potential Sp1 sites in promoters (mammals, birds), C-rich intron splicing enhancers and PTB intron splicing silencers in introns (mammals, birds), and hsa-miR-34a sites in 3'-UTRs (eutherians). We showed the conserved PRNP upstream regions, which may be potential enhancers or silencers (primates, dog). In the PRNP 3'-UTRs, there are conserved cytoplasmic polyadenylation element sites (mammals, birds). The PRND core promoters include highly conserved CCAAT, CArG and TATA boxes (mammals). We deduced 42 new protein primary structures, and performed the first phylogenetic analysis of all vertebrate prion genes. Using the protein alignment which included 122 sequences, we constructed the neighbour-joining tree which showed four major clusters, including shadoos, shadoo2s and prion protein-likes (cluster 1), fish prion proteins (cluster 2), tetrapode prion proteins (cluster 3) and doppels (cluster 4). We showed that the entire prion protein conformationally plastic region is well conserved between eutherian prion proteins and shadoos (18–25% identity and 28–34% similarity), and there could be a potential structural compatibility between shadoos and the left-handed parallel beta-helical fold

  15. [Integration of different T-DNA structures of ACC oxidase gene into carnation genome extended cut flower vase-life differently].

    PubMed

    Yu, Yi-Xun; Bao, Man-Zhu

    2004-09-01

    The carnation (Dianthus caryophyllus L.) cultivar 'Master' was transformed, via Agrobacterium tumefaciens, with four T-DNA structures containing sense, antisense, sense direct repeat and antisense direct repeat ACC oxidase genes. Southern blotting showed that the foreign gene was integrated into the carnation genome, and 14 transgenic lines were obtained. The transgenic plants were transplanted to soil and grew normally in the greenhouse. Of the 12 transgenic lines screened, the cut flower vase life of 8 transgenic lines is up to 11 days and the longest one is 12.8 days, while the vase life of the control is 5.8 days at 25 °C. The vase life of 2 lines out of 3 with a single sense ACO gene is the same as that of the control, while the vase life of 3 lines out of 4 with a single antisense ACO gene is prolonged. The vase life of cut flowers of all 5 lines with direct repeat ACO genes is prolonged by about 6 days, while the vase life of 3 out of 7 lines with a single ACO gene is the same as that of the control. During the senescence of cut flowers, the ethylene production of most of the transgenic lines decreased significantly, and ethylene production is not detectable in lines T456, T556 and T575. The results demonstrate that an antisense foreign gene inhibits expression of the endogenous gene more strongly than a sense one. Both sense direct repeat and antisense direct repeat foreign genes suppress endogenous gene expression more strongly than single foreign genes. The transgenic lines obtained in this research are useful for minimizing carnation cut flower transportation and storage expenses.

  16. Genome-Wide Association Study of Cardiac Structure and Systolic Function in African Americans: The Candidate Gene Association Resource (CARe) Study

    PubMed Central

    Fox, Ervin R.; Musani, Solomon K.; Barbalic, Maja; Lin, Honghuang; Yu, Bing; Ogunyankin, Kofo O.; Smith, Nicholas L.; Kutlar, Abdullah; Glazer, Nicole L.; Post, Wendy S.; Paltoo, Dina N.; Dries, Daniel L.; Farlow, Deborah N.; Duarte, Christine W.; Kardia, Sharon L.; Meyers, Kristin J.; Sun, Yan V.; Arnett, Donna K.; Patki, Amit A.; Sha, Jin; Cui, Xiangqui; Samdarshi, Tandaw E.; Penman, Alan D.; Bibbins-Domingo, Kirsten; Bůžková, Petra; Benjamin, Emelia J.; Bluemke, David A.; Morrison, Alanna C.; Heiss, Gerardo; Carr, J. Jeffrey; Tracy, Russell P.; Mosley, Thomas H.; Taylor, Herman A.; Psaty, Bruce M.; Heckbert, Susan R.; Cappola, Thomas P.; Vasan, Ramachandran S.

    2013-01-01

    Background Using data from four community-based cohorts of African Americans (AA), we tested the association between genome-wide markers (SNPs) and cardiac phenotypes in the Candidate-gene Association REsource (CARe) study. Methods and Results Among 6,765 AA, we related age, sex, height and weight-adjusted residuals for nine cardiac phenotypes (assessed by echocardiogram or MRI) to 2.5 million SNPs genotyped using Genome-Wide Affymetrix Human SNP Array 6.0 (Affy6.0) and the remainder imputed. Within-cohort genome-wide association analysis was conducted, followed by meta-analysis across cohorts using inverse variance weights (genome-wide significance threshold = 4.0 × 10⁻⁷). Supplementary pathway analysis was performed. We attempted replication in 3 smaller cohorts of African ancestry and tested look-ups in one consortium of European ancestry (EchoGEN). Across the 9 phenotypes, variants in 4 genetic loci reached genome-wide significance: rs4552931 in UBE2V2 (p = 1.43 × 10⁻⁷) for left ventricular mass (LVM); rs7213314 in WIPI1 (p = 1.68 × 10⁻⁷) for LV internal diastolic diameter (LVIDD); rs1571099 in PPAPDC1A (p = 2.57 × 10⁻⁸) for interventricular septal wall thickness (IVST); and rs9530176 in KLF5 (p = 4.02 × 10⁻⁷) for ejection fraction (EF). Associated variants were enriched in three signaling pathways involved in cardiac remodeling. None of the 4 loci replicated in cohorts of African ancestry were confirmed in look-ups in EchoGEN. Conclusions In the largest GWAS of cardiac structure and function to date in AA, we identified 4 genetic loci related to LVM, IVST, LVIDD and EF that reached genome-wide significance. Replication results suggest that these loci may be unique to individuals of African ancestry. Additional large-scale studies are warranted for these complex phenotypes. PMID:23275298
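
    The meta-analysis step named in this abstract, combining per-cohort estimates with inverse variance weights, can be sketched as follows. This is a generic fixed-effect illustration and not the study's actual pipeline; the function name and the per-cohort numbers are hypothetical.

```python
import math

def inverse_variance_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effect meta-analysis of per-cohort effect estimates.

    Each cohort is weighted by 1 / SE^2, so more precise cohorts count more.
    Returns the pooled effect, its standard error, and the z statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se

# Hypothetical effect sizes and standard errors for one SNP in two cohorts.
beta, se, z = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
```

    With equal standard errors the pooled effect is the plain average of the cohort effects, and the pooled standard error shrinks by a factor of the square root of the number of cohorts.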

  17. Molecular characterization of two lactate dehydrogenase genes with a novel structural organization on the genome of Lactobacillus sp. strain MONT4.

    PubMed

    Weekes, Jennifer; Yüksel, Gülhan U

    2004-10-01

    Two lactate dehydrogenase (ldh) genes from Lactobacillus sp. strain MONT4 were cloned by complementation in Escherichia coli DC1368 (ldh pfl) and were sequenced. The sequence analysis revealed a novel genomic organization of the ldh genes. Subcloning of the individual ldh genes and their Northern blot analyses indicated that the genes are monocistronic.

  18. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  19. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470

  20. Pichia stipitis genomics, transcriptomics, and gene clusters

    Treesearch

    Thomas W. Jeffries; Jennifer R. Headman Van Vleet

    2009-01-01

    Genome sequencing and subsequent global gene expression studies have advanced our understanding of the lignocellulose-fermenting yeast Pichia stipitis. These studies have provided an insight into its central carbon metabolism, and analysis of its genome has revealed numerous functional gene clusters and tandem repeats. Specialized physiological traits are often the...

  1. Persistence drives gene clustering in bacterial genomes

    PubMed Central

    Fang, Gang; Rocha, Eduardo PC; Danchin, Antoine

    2008-01-01

    Background Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those genes present in a majority of organisms – persistent genes – and those present in very few organisms – rare genes. Results We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet, genes persistently present in bacterial genomes are also clustered and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test if known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion We show that it is the strong selective pressure acting on the function of persistent genes, in bacterial genomes that are in a permanent state of gene flux while maintaining a fairly constant size, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering. PMID:18179692

  2. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.
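
    The segmentation idea in this abstract can be illustrated with a much simpler problem: given a fixed dictionary with word probabilities, find the single most probable way to split a sequence into words. The sketch below is not the author's algorithm (which segments probabilistically and builds the dictionary at the same time); it is a standard dynamic-programming illustration, and the function name, toy dictionary and probabilities are all hypothetical.

```python
import math

def best_segmentation(seq, word_probs):
    """Return the most probable split of seq into dictionary words.

    word_probs maps each word to a probability; the values used below
    are invented for illustration and do not come from the abstract.
    """
    max_len = max(len(w) for w in word_probs)
    # best[i] = (log-probability, start index of the last word) for seq[:i]
    best = [(-math.inf, -1)] * (len(seq) + 1)
    best[0] = (0.0, -1)
    for end in range(1, len(seq) + 1):
        for start in range(max(0, end - max_len), end):
            word = seq[start:end]
            if word in word_probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_probs[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[-1][0] == -math.inf:
        return None  # the dictionary cannot cover this sequence
    # Follow the back-pointers from the end of the sequence to recover the words.
    words, end = [], len(seq)
    while end > 0:
        start = best[end][1]
        words.append(seq[start:end])
        end = start
    return words[::-1]

# Hypothetical dictionary: single bases plus two longer "words".
toy_dictionary = {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.2, "TATA": 0.1, "GATA": 0.1}
segments = best_segmentation("CGATATATAG", toy_dictionary)
print(segments)
```

    Longer words win here because one word of probability 0.1 is more likely than four single bases of probability 0.2 each; in the approach the abstract describes, those probabilities would be estimated from the data rather than fixed in advance.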

  3. KEGG: kyoto encyclopedia of genes and genomes.

    PubMed

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/).

  4. Functional coverage of the human genome by existing structures, structural genomics targets, and homology models.

    PubMed

    Xie, Lei; Bourne, Philip E

    2005-08-01

    The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  5. Genomic organization of the adrenoleukodystrophy gene

    SciTech Connect

    Sarde, C.O.; Mosser, J.; Kretz, C.

    1994-07-01

    Adrenoleukodystrophy (ALD), the most frequent peroxisomal disorder, is a severe neurodegenerative disease associated with an impairment of very long chain fatty acids β-oxidation. The authors have recently identified by positional cloning the gene responsible for ALD, located in Xq28. It encodes a new member of the "ABC" superfamily of membrane-associated transporters that shows, in particular, significant homology to the 70-kDa peroxisomal membrane protein (PMP70). They report here a detailed characterization of the ALD gene structure. It extends over 21 kb and consists of 10 exons. To facilitate the detection of mutations in ALD patients, they have determined the intronic sequences flanking the exons as well as the sequence of the 3' untranslated region and of the immediate 5' promoter region. Sequences present in distal exons cross-hybridize strongly to additional sequences in the human genome. The ALD gene has been positioned on a pulsed-field map between DXS15 and the L1CAM gene, about 650 kb upstream from the color pigment genes. The frequent occurrence of color vision anomalies observed in patients with adrenomyeloneuropathy (the adult onset form of ALD) thus does not represent a contiguous gene syndrome but a secondary manifestation of ALD. 37 refs., 6 figs.

  6. Honeybee (Apis mellifera L.) mrjp gene family: computational analysis of putative promoters and genomic structure of mrjp1, the gene coding for the most abundant protein of larval food.

    PubMed

    Malecová, Barbora; Ramser, Juliane; O'Brien, John K; Janitz, Michal; Júdová, Jana; Lehrach, Hans; Simúth, Jozef

    2003-01-16

    The mrjp1 gene belongs to the honeybee mrjp gene family encoding the major royal jelly proteins (MRJPs), secreted by nurse bees into the royal jelly. In this study, we have isolated the genomic clone containing the entire mrjp1 gene and determined its sequence. The mrjp1 gene sequence spans over 3038 bp and contains six exons separated by five introns. Seven mismatches between the mrjp1 gene sequence and two previously independently published cDNA sequences were found, but these differences do not lead to any change in the deduced amino acid sequence of MRJP1. With the aid of inverse polymerase chain reaction we obtained sequences flanking the 5' ends of other mrjp genes (mrjp2, mrjp3, mrjp4 and mrjp5). Putative promoters were predicted upstream of all mrjp genes (including mrjp1). The predicted promoters contain the TATA motif (TATATATT), highly conserved both in sequence and position. Ultraspiracle (USP) transcription factor (TF) binding sites in putative promoter regions and clusters of dead ringer TF binding sites upstream of these promoters were predicted computationally. We propose that USP, as a juvenile hormone (JH) binding TF, might act as a mediator of mrjp expression in response to JH. The mrjp1 genomic locus is predicted to encode an antisense transcript, partially overlapping with five mrjp1 exons and entirely overlapping with the putative promoter and predicted transcriptional start point of mrjp1. This finding may shed light on the mechanisms of regulation of mrjp expression. Southern blot analysis of genomic DNA revealed that all known members of the mrjp gene family (mrjp1, mrjp2, mrjp3, mrjp4 and mrjp5) are present as single-copy genes per haploid honeybee genome. Although MRJPs and the yellow protein of Drosophila melanogaster share a certain degree of similarity in aa sequence and although it has been shown that they share a common evolutionary origin, neither structural similarities in the gene organization, nor significant similarities

  7. Structural genomics in North America.

    PubMed

    Terwilliger, T C

    2000-11-01

    Structural genomics in North America has moved remarkably quickly from ideas to pilot projects. Just three years ago, the field was only a concept, independently being discussed by its many inventors. Now it is already a well-organized, increasingly-funded, consortium-based effort to determine protein structures on a large scale.

  8. A unified gene catalog for the laboratory mouse reference genome.

    PubMed

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI, and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
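
    The six overlap categories named in this abstract can be made concrete with a small sketch. The code below is not fjoin and does not reproduce its efficient overlap detection; it uses a plain all-pairs comparison, and it assumes that a category is assigned per connected group of overlapping features (one feature from set A touching two from set B gives 1:n, and so on). The function names, the inclusive-coordinate convention and the example features are hypothetical.

```python
from collections import defaultdict

def overlaps(a, b):
    """True if two (chrom, start, end) features share a base (inclusive coordinates)."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]

def label(n_a, n_b):
    """Map the feature counts of one group onto 1:1, 1:0, 0:1, 1:n, n:1 or n:m."""
    if n_a > 1 and n_b > 1:
        return "n:m"
    left = "n" if n_a > 1 else str(n_a)
    right = "n" if n_b > 1 else str(n_b)
    return f"{left}:{right}"

def categorize(set_a, set_b):
    """Group overlapping features from two annotation sets and label each group."""
    # All-pairs comparison: O(len(set_a) * len(set_b)), fine for a toy example only.
    edges = defaultdict(set)
    for i, fa in enumerate(set_a):
        for j, fb in enumerate(set_b):
            if overlaps(fa, fb):
                edges[("A", i)].add(("B", j))
                edges[("B", j)].add(("A", i))
    nodes = [("A", i) for i in range(len(set_a))] + [("B", j) for j in range(len(set_b))]
    seen, groups = set(), []
    for node in nodes:
        if node in seen:
            continue
        # Depth-first walk collects one connected group of overlapping features.
        stack, component = [node], []
        seen.add(node)
        while stack:
            cur = stack.pop()
            component.append(cur)
            for nxt in edges[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        n_a = sum(1 for side, _ in component if side == "A")
        groups.append((label(n_a, len(component) - n_a), sorted(component)))
    return groups

# Invented coordinates, used only to exercise several categories.
set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr1", 900, 1000), ("chr2", 10, 50)]
set_b = [("chr1", 150, 250), ("chr1", 480, 520), ("chr1", 580, 650), ("chr2", 100, 200)]
groups = categorize(set_a, set_b)
for category, members in groups:
    print(category, members)
```

    Under these assumptions the 1:1 groups are candidates for automatic equivalence, while 1:n, n:1 and n:m groups correspond to the cases the abstract says need closer inspection.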

  9. Ligninolytic peroxidase genes in the oyster mushroom genome: heterologous expression, molecular structure, catalytic and stability properties, and lignin-degrading ability

    PubMed Central

    2014-01-01

    Background The genome of Pleurotus ostreatus, an important edible mushroom and a model ligninolytic organism of interest in lignocellulose biorefineries due to its ability to delignify agricultural wastes, was sequenced with the purpose of identifying and characterizing the enzymes responsible for lignin degradation. Results Heterologous expression of the class II peroxidase genes, followed by kinetic studies, enabled their functional classification. The resulting inventory revealed the absence of lignin peroxidases (LiPs) and the presence of three versatile peroxidases (VPs) and six manganese peroxidases (MnPs); the crystal structures of two of them (VP1 and MnP4) were solved at 1.0 to 1.1 Å, showing significant structural differences. Gene expansion supports the importance of both peroxidase types in the white-rot lifestyle of this fungus. Using a lignin model dimer and synthetic lignin, we showed that VP is able to degrade lignin. Moreover, the dual Mn-mediated and Mn-independent activity of P. ostreatus MnPs justifies their inclusion in a new peroxidase subfamily. The availability of the whole POD repertoire enabled investigation, at a biochemical level, of the existence of duplicated genes. Differences between isoenzymes are not limited to their kinetic constants. Surprising differences in their activity T50 and residual activity at both acidic and alkaline pH were observed. Directed mutagenesis and spectroscopic/structural information were combined to explain the catalytic and stability properties of the most interesting isoenzymes, and their evolutionary history was analyzed in the context of over 200 basidiomycete peroxidase sequences. Conclusions The analysis of the P. ostreatus genome shows a lignin-degrading system where the role generally played by LiP has been assumed by VP. Moreover, it enabled the first characterization of the complete set of peroxidase isoenzymes in a basidiomycete, revealing strong differences in stability properties and providing

  10. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  11. KEGG: Kyoto Encyclopedia of Genes and Genomes.

    PubMed

    Ogata, H; Goto, S; Sato, K; Fujibuchi, W; Bono, H; Kanehisa, M

    1999-01-01

    Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/).

  12. Genome-Wide Structural Variation Detection by Genome Mapping on Nanochannel Arrays.

    PubMed

    Mak, Angel C Y; Lai, Yvonne Y Y; Lam, Ernest T; Kwok, Tsz-Piu; Leung, Alden K Y; Poon, Annie; Mostovoy, Yulia; Hastie, Alex R; Stedman, William; Anantharaman, Thomas; Andrews, Warren; Zhou, Xiang; Pang, Andy W C; Dai, Heng; Chu, Catherine; Lin, Chin; Wu, Jacob J K; Li, Catherine M L; Li, Jing-Woei; Yim, Aldrin K Y; Chan, Saki; Sibert, Justin; Džakula, Željko; Cao, Han; Yiu, Siu-Ming; Chan, Ting-Fung; Yip, Kevin Y; Xiao, Ming; Kwok, Pui-Yan

    2016-01-01

    Comprehensive whole-genome structural variation detection is challenging with current approaches. With diploid cells as DNA source and the presence of numerous repetitive elements, short-read DNA sequencing cannot be used to detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against that derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation. Copyright © 2016 by the Genetics Society of America.

  13. Gene enrichment in plant genomic shotgun libraries.

    PubMed

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced; whereas a draft sequence of the rice genome (about 430 Mbp) is now available and the sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  14. Maximum likelihood for genome phylogeny on gene content.

    PubMed

    Zhang, Hongmei; Gu, Xun

    2004-01-01

    With the rapid growth of entire genome data, reconstructing the phylogenetic relationship among different genomes has become a hot topic in comparative genomics. The maximum likelihood approach is one of several approaches and has been very successful. However, no study has reported its application to genome tree-making, mainly because of the lack of an analytical form of a probability model and/or the heavy computational burden. In this paper we studied the mathematical structure of the stochastic model of genome evolution, and then developed a simplified likelihood function for observing a specific phylogenetic pattern in the four-genome case using gene content information. We use the maximum likelihood approach to identify phylogenetic trees. Simulation results indicate that the proposed method works well and can identify trees with a high rate of correct identification. Application to real data gives satisfactory results. The approach developed in this paper can serve as the basis for reconstructing phylogenies of more than four genomes.

  15. Directed self-assembly, genomic assembly complexity and the formation of biological structure, or, what are the genes for nacre?

    PubMed

    Cartwright, Julyan H E

    2016-03-13

    Biology uses dynamical mechanisms of self-organization and self-assembly of materials, but it also choreographs and directs these processes. The difference between abiotic self-assembly and a biological process is rather like the difference between setting up and running an experiment to make a material remotely compared with doing it in one's own laboratory: with a remote experiment, say on the International Space Station, everything must be set up beforehand to let the experiment run 'hands off', but in the laboratory one can intervene at any point in a 'hands-on' approach. It is clear that the latter process, of directed self-assembly, can allow much more complicated experiments and produce far more complex structures than self-assembly alone. This control over self-assembly in biology is exercised at certain key waypoints along a trajectory and the process may be quantified in terms of the genomic assembly complexity of a biomaterial. © 2016 The Author(s).

  16. Genome-Wide Views of Chromatin Structure

    PubMed Central

    Rando, Oliver J.; Chang, Howard Y.

    2010-01-01

    Eukaryotic genomes are packaged into a nucleoprotein complex known as chromatin, which affects most processes that occur on DNA. Along with genetic and biochemical studies of resident chromatin proteins and their modifying enzymes, mapping of chromatin structure in vivo is one of the main pillars in our understanding of how chromatin relates to cellular processes. In this review, we discuss the use of genomic technologies to characterize chromatin structure in vivo, with a focus on data from budding yeast and humans. The picture emerging from these studies is the detailed chromatin structure of a typical gene, where the typical behavior gives insight into the mechanisms and deep rules that establish chromatin structure. Important deviation from the archetype is also observed, usually as a consequence of unique regulatory mechanisms at special genomic loci. Chromatin structure shows substantial conservation from yeast to humans, but mammalian chromatin has additional layers of complexity that likely relate to the requirements of multicellularity such as the need to establish faithful gene regulatory mechanisms for cell differentiation. PMID:19317649

  17. JGI Plant Genomics Gene Annotation Pipeline

    SciTech Connect

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, and their transcriptomes have been sequenced as well. To use these very large amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotation of JGI flagship green plants produced by this pipeline plus Arabidopsis and rice except for chlamy which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and accessible via JGI Phytozome portal whose URL and front page snapshot are shown below.

  18. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  19. Analysis of simple sequence repeat (SSR) structure and sequence within Epichloë endophyte genomes reveals impacts on gene structure and insights into ancestral hybridization events.

    PubMed

    Clayton, William; Eaton, Carla Jane; Dupont, Pierre-Yves; Gillanders, Tim; Cameron, Nick; Saikia, Sanjay; Scott, Barry

    2017-01-01

    Epichloë grass endophytes comprise a group of filamentous fungi of both sexual and asexual species. Known for the beneficial characteristics they endow upon their grass hosts, the identification of these endophyte species has been of great interest agronomically and scientifically. The use of simple sequence repeat loci and the variation in repeat elements has been used to rapidly identify endophyte species and strains, however, little is known of how the structure of repeat elements changes between species and strains, and where these repeat elements are located in the fungal genome. We report on an in-depth analysis of the structure and genomic location of the simple sequence repeat locus B10, commonly used for Epichloë endophyte species identification. The B10 repeat was found to be located within an exon of a putative bZIP transcription factor, suggesting possible impacts on polypeptide sequence and thus protein function. Analysis of this repeat in the asexual endophyte hybrid Epichloë uncinata revealed that the structure of B10 alleles reflects the ancestral species that hybridized to give rise to this species. Understanding the structure and sequence of these simple sequence repeats provides a useful set of tools for readily distinguishing strains and for gaining insights into the ancestral species that have undergone hybridization events.
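
    As a rough illustration of what locating a simple sequence repeat involves, the sketch below scans a DNA string for perfect tandem repeats with a regular expression. It is not the method used in the study; the motif length range (1 to 6 bp), the minimum of four copies, the function name and the example sequence are all assumptions chosen for the example, and the sequence is not the B10 locus.

```python
import re

# A motif of 1-6 bases followed by at least three more copies of itself.
# The lazy quantifier makes the shortest repeating unit match first.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{3,}")

def find_ssrs(sequence):
    """Return (start, motif, copy_number) for each perfect tandem repeat."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits

# Invented sequence: five copies of CAG and five copies of AC in unique flanks.
example = "GGCTA" + "CAG" * 5 + "TTG" + "AC" * 5 + "GT"
ssr_hits = find_ssrs(example)
print(ssr_hits)
```

    A trinucleotide repeat such as the CAG run here changes length in whole codons, which is one reason a repeat located inside an exon, as reported for B10, can alter the polypeptide without shifting the reading frame.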

  20. Selecting soluble/foldable protein domains through single-gene or genomic ORF filtering: structure of the head domain of Burkholderia pseudomallei antigen BPSL2063.

    PubMed

    Gourlay, Louise J; Peano, Clelia; Deantonio, Cecilia; Perletti, Lucia; Pietrelli, Alessandro; Villa, Riccardo; Matterazzo, Elena; Lassaux, Patricia; Santoro, Claudio; Puccio, Simone; Sblattero, Daniele; Bolognesi, Martino

    2015-11-01

    The 1.8 Å resolution crystal structure of a conserved domain of the potential Burkholderia pseudomallei antigen and trimeric autotransporter BPSL2063 is presented as a structural vaccinology target for melioidosis vaccine development. Since BPSL2063 (1090 amino acids) hosts only one conserved domain, and the expression/purification of the full-length protein proved to be problematic, a domain-filtering library was generated using β-lactamase as a reporter gene to select further BPSL2063 domains. As a result, two domains (D1 and D2) were identified and produced in soluble form in Escherichia coli. Furthermore, as a general tool, a genomic open reading frame-filtering library from the B. pseudomallei genome was also constructed to facilitate the selection of domain boundaries from the entire ORFeome. Such an approach allowed the selection of three potential protein antigens that were also produced in soluble form. The results imply the further development of ORF-filtering methods as a tool in protein-based research to improve the selection and production of soluble proteins or domains for downstream applications such as X-ray crystallography.

  1. An integrated approach to structural genomics.

    PubMed

    Heinemann, U; Frevert, J; Hofmann, K; Illing, G; Maurer, C; Oschkinat, H; Saenger, W

    2000-01-01

    Structural genomics aims at determining a set of protein structures that will represent all domain folds present in the biosphere. These structures can be used as the basis for the homology modelling of the majority of all remaining protein domains or, indeed, proteins. Structural genomics therefore promises to provide a comprehensive structural description of the protein universe. To achieve this, a broad scientific effort is required. The Berlin-based "Protein Structure Factory" (PSF) plans to contribute to this effort by setting up a local infrastructure for the low-cost, high-throughput analysis of soluble human proteins. In close collaboration with the German Human Genome Project (DHGP) protein-coding genes will be expressed in Escherichia coli or yeast. Affinity-tagged proteins will be purified semi-automatically for biophysical characterization and structure analysis by X-ray diffraction methods and NMR spectroscopy. In all steps of the structure analysis process, possibilities for automation, parallelization and standardization will be explored. Major new facilities that are created for the PSF include a robotic station for large-scale protein crystallization, an NMR center and an experimental station for protein crystallography at the synchrotron storage ring BESSY II in Berlin.

  2. Genes but Not Genomes Reveal Bacterial Domestication of Lactococcus Lactis

    PubMed Central

    Passerini, Delphine; Beltramo, Charlotte; Coddeville, Michele; Quentin, Yves; Ritzenthaler, Paul

    2010-01-01

    Background The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). Methodology/Principal Findings The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. Conclusion/Significance The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between “environmental” strains, the main contributors to the genetic diversity within the subspecies, and “domesticated” strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the “domesticated” strains essentially arose through substantial genomic flux within the dispensable genome

  3. Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication.

    PubMed

    Qian, Ziliang; Yin, Yanbin; Zhang, Yong; Lu, Lingyi; Li, Yixue; Jiang, Ying

    2006-04-05

    Staphylococcus aureus, or MRSA (methicillin-resistant S. aureus), is an acquired pathogen and the primary cause of nosocomial infections worldwide. In S. aureus, teichoic acid is an essential component of the cell wall, and its biosynthesis is not yet well characterized. Studies in Bacillus subtilis have discovered two different pathways of teichoic acid biosynthesis, in two strains W23 and 168 respectively, namely teichoic acid ribitol (tar) and teichoic acid glycerol (tag). The genes involved in these two pathways are also characterized: tarA, tarB, tarD, tarI, tarJ, tarK, tarL for the tar pathway, and tagA, tagB, tagD, tagE, tagF for the tag pathway. With the genome sequences of several MRSA strains (Mu50, MW2, N315, MRSA252, COL) as well as the methicillin-susceptible strain MSSA476 available, a comparative genomic analysis was performed to characterize teichoic acid biosynthesis in these S. aureus strains. We identified all S. aureus tar and tag gene orthologs in the selected S. aureus strains which would contribute to teichoic acid synthesis. Based on our identification of genes orthologous to tarI, tarJ, tarL, which are specific to the tar pathway in B. subtilis W23, we also concluded that tar is the major teichoic acid biogenesis pathway in S. aureus. Further analyses indicated that the S. aureus tar genes, different from the divergon organization in B. subtilis, are organized into several clusters in cis. Most interestingly, compared with genes in the B. subtilis tar pathway, the S. aureus tar-specific genes (tarI, tarJ, tarL) are duplicated in all six S. aureus genomes. In the S. aureus strains we analyzed, tar (teichoic acid ribitol) is the main teichoic acid biogenesis pathway. The tar genes are organized into several genomic groups in cis and the genes specific to tar (relative to tag), tarI, tarJ and tarL, are duplicated. The genomic organization of the S. aureus tar pathway suggests their regulations are different when compared to B. subtilis tar or tag pathway, which are

  4. BreakTrans: uncovering the genomic architecture of gene fusions.

    PubMed

    Chen, Ken; Navin, Nicholas E; Wang, Yong; Schmidt, Heather K; Wallis, John W; Niu, Beifang; Fan, Xian; Zhao, Hao; McLellan, Michael D; Hoadley, Katherine A; Mardis, Elaine R; Ley, Timothy J; Perou, Charles M; Wilson, Richard K; Ding, Li

    2013-08-23

    Producing gene fusions through genomic structural rearrangements is a major mechanism for tumor evolution. Therefore, accurately detecting gene fusions and the originating rearrangements is of great importance for personalized cancer diagnosis and targeted therapy. We present a tool, BreakTrans, that systematically maps predicted gene fusions to structural rearrangements. Thus, BreakTrans not only validates both types of predictions, but also provides mechanistic interpretations. BreakTrans effectively validates known fusions and discovers novel events in a breast cancer cell line. Applying BreakTrans to 43 breast cancer samples in The Cancer Genome Atlas identifies 90 genomically validated gene fusions. BreakTrans is available at http://bioinformatics.mdanderson.org/main/BreakTrans.
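
    As a rough illustration of what mapping a predicted gene fusion to a structural rearrangement can mean, the sketch below checks whether both breakpoints of a rearrangement fall near the two genomic junction positions of a fusion. This is not BreakTrans's algorithm, which the abstract does not describe; the class names, the 100 kb window and all coordinates are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Breakpoint:
    chrom: str
    pos: int


@dataclass(frozen=True)
class Fusion:
    name: str
    left: Breakpoint   # predicted genomic junction in the first partner gene
    right: Breakpoint  # predicted genomic junction in the second partner gene


@dataclass(frozen=True)
class Rearrangement:
    sv_id: str
    a: Breakpoint
    b: Breakpoint


def near(p: Breakpoint, q: Breakpoint, window: int) -> bool:
    """True if both positions are on the same chromosome within `window` bp."""
    return p.chrom == q.chrom and abs(p.pos - q.pos) <= window


def supporting_rearrangements(
    fusion: Fusion, svs: List[Rearrangement], window: int = 100_000
) -> List[str]:
    """IDs of rearrangements whose two ends match the fusion's two junctions.

    The two ends of a rearrangement are unordered, so both pairings are tried.
    """
    hits = []
    for sv in svs:
        direct = near(fusion.left, sv.a, window) and near(fusion.right, sv.b, window)
        swapped = near(fusion.left, sv.b, window) and near(fusion.right, sv.a, window)
        if direct or swapped:
            hits.append(sv.sv_id)
    return hits


# Hypothetical example data (illustration only).
fusion = Fusion("GENE1-GENE2", Breakpoint("chr1", 1_000_000), Breakpoint("chr5", 2_000_000))
svs = [
    Rearrangement("sv1", Breakpoint("chr1", 1_020_000), Breakpoint("chr5", 1_990_000)),
    Rearrangement("sv2", Breakpoint("chr1", 1_020_000), Breakpoint("chr7", 500)),
    Rearrangement("sv3", Breakpoint("chr5", 2_050_000), Breakpoint("chr1", 950_000)),
]
hits = supporting_rearrangements(fusion, svs)
```

    A fusion with at least one supporting rearrangement would count as genomically supported in this simplified picture; fusions that arise from several chained rearrangements would need a more elaborate search than this single-event check.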

  5. Gene Insertion Into Genomic Safe Harbors for Human Gene Therapy

    PubMed Central

    Papapetrou, Eirini P; Schambach, Axel

    2016-01-01

    Genomic safe harbors (GSHs) are sites in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements: (i) function predictably and (ii) do not cause alterations of the host genome posing a risk to the host cell or organism. GSHs are thus ideal sites for transgene insertion whose use can empower functional genetics studies in basic research and therapeutic applications in human gene therapy. Currently, no fully validated GSHs exist in the human genome. Here, we review our formerly proposed GSH criteria and discuss additional considerations on extending these criteria, on strategies for the identification and validation of GSHs, as well as future prospects on GSH targeting for therapeutic applications. In view of recent advances in genome biology, gene targeting technologies, and regenerative medicine, gene insertion into GSHs can potentially catalyze nearly all applications in human gene therapy. PMID:26867951

  6. Floral gene resources from basal angiosperms for comparative genomics research.

    PubMed

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; DePamphilis, Claude W; Leebens-Mack, James H

    2005-03-30

    The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and

  7. Genomic evidence for adaptation by gene duplication.

    PubMed

    Qian, Wenfeng; Zhang, Jianzhi

    2014-08-01

    Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.

  8. hSmad5 gene, a human hSmad family member: its full length cDNA, genomic structure, promoter region and mutation analysis in human tumors.

    PubMed

    Gemma, A; Hagiwara, K; Vincent, F; Ke, Y; Hancock, A R; Nagashima, M; Bennett, W P; Harris, C C

    1998-02-19

    hSmad (mothers against decapentaplegic)-related proteins are important messengers within the Transforming Growth Factor-beta1 (TGF-beta1) superfamily signal transduction pathways. To further characterize a member of this family, we obtained a full length cDNA of the human hSmad5 (hSmad5) gene by rapid amplification of cDNA ends (RACE) and then determined the genomic structure of the gene. There are eight exons and two alternative transcripts; the shorter transcript lacks exon 2. We identified the hSmad5 promoter region from a human genomic YAC clone by obtaining the nucleotide sequence extending 1235 base pairs upstream of the 5' end of the cDNA. We found a CpG island consistent with a promoter region, and we demonstrated promoter activity in a 1232 bp fragment located upstream of the transcription initiation site. To investigate the frequency of somatic hSmad5 mutations in human cancers, we designed intron-based primers to examine coding regions by polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) analysis. Neither homozygous deletions nor point mutations were found in 40 primary gastric tumors and 51 cell lines derived from diverse types of human cancer including 20 cell lines resistant to the growth inhibitory effects of TGF-beta1. These results suggest that the hSmad5 gene is not commonly mutated and that other genetic alterations mediate the loss of TGF-beta1 responsiveness in human cancers.

  9. Genome-wide characterization of maize miRNA genes

    USDA-ARS's Scientific Manuscript database

    MicroRNAs (miRNAs) are small non-coding RNAs that play essential roles in plant growth and development. We conducted a genome-wide survey of maize miRNA genes, characterizing their structure, expression, and evolution. Computational approaches based on homology and secondary structure modeling ident...

  10. Structural variations in plant genomes

    PubMed Central

    Edwards, David; Varshney, Rajeev K.

    2014-01-01

    Differences between plant genomes range from single nucleotide polymorphisms to large-scale duplications, deletions and rearrangements. The large polymorphisms are termed structural variants (SVs). SVs have received significant attention in human genetics and were found to be responsible for various chronic diseases. However, little effort has been directed towards understanding the role of SVs in plants. Many recent advances in plant genetics have resulted from improvements in high-resolution technologies for measuring SVs, including microarray-based techniques, and more recently, high-throughput DNA sequencing. In this review we describe recent reports of SV in plants and describe the genomic technologies currently used to measure these SVs. PMID:24907366

  11. From gene action to reactive genomes

    PubMed Central

    Keller, Evelyn Fox

    2014-01-01

    Poised at a critical turning point in the history of genetics, recent work (e.g. in genomics, epigenetics, genomic plasticity) obliges us to critically reexamine many of our most basic concepts. For example, I argue that genomic research supports a radical transformation in our understanding of the genome – a shift from an earlier conception of that entity as an effectively static collection of active genes to that of a dynamic and reactive system dedicated to the context specific regulation of protein-coding sequences. PMID:24882822

  12. Novel recombinant papillomavirus genomes expressing selectable genes

    PubMed Central

    Van Doorslaer, Koenraad; Porter, Samuel; McKinney, Caleb; Stepp, Wesley H.; McBride, Alison A.

    2016-01-01

    Papillomaviruses infect and replicate in keratinocytes, but viral proteins are initially expressed at low levels and there is no effective and quantitative method to determine the efficiency of infection on a cell-to-cell basis. Here we describe human papillomavirus (HPV) genomes that express marker proteins (antibiotic resistance genes and Green Fluorescent Protein), and can be used to elucidate early stages in HPV infection of primary keratinocytes. To generate these recombinant genomes, the late region of the oncogenic HPV18 genome was replaced by CpG free marker genes. Insertion of these exogenous genes did not affect early replication, and had only minimal effects on early viral transcription. When introduced into primary keratinocytes, the recombinant marker genomes gave rise to drug-resistant keratinocyte colonies and cell lines, which maintained the extrachromosomal recombinant genome long-term. Furthermore, the HPV18 “marker” genomes could be packaged into viral particles (quasivirions) and used to infect primary human keratinocytes in culture. This resulted in the outgrowth of drug-resistant keratinocyte colonies containing replicating HPV18 genomes. In summary, we describe HPV18 marker genomes that can be used to quantitatively investigate many aspects of the viral life cycle. PMID:27892937

  13. Molecular cloning, partial genomic structure and functional characterization of succinic semialdehyde dehydrogenase genes from the parasitic insects Lucilia cuprina and Ctenocephalides felis.

    PubMed

    Rothacker, B; Werr, M; Ilg, T

    2008-06-01

    The enzyme succinic semialdehyde dehydrogenase (SSADH; EC1.2.1.24) is a component of the gamma-aminobutyric acid degradation pathway in mammals and is essential for development and function of the nervous system. Here we report the identification, cDNA cloning and functional expression of SSADH from the parasitic insects Lucilia cuprina and Ctenocephalides felis. The recombinant proteins possess potent NAD+-dependent SSADH activity, while their catalytic efficiency for other aldehyde substrates is lower. A genomic copy of the L. cuprina SSADH gene contains two introns, while a genomic gene version of C. felis is devoid of introns. In contrast to the single copy SSADH genes in Drosophila melanogaster and mammals, in L. cuprina and C. felis, multiple SSADH gene copies are present in the genome.

  14. Characterization of the Wilson disease gene: Genomic organization; alternative splicing; structure/function predictions; and population frequencies of disease-specific mutations

    SciTech Connect

    Petrukhin, K.; Chernov, I.; Ross, B.M.

    1994-09-01

    The Wilson disease (WD) gene has recently been identified as a putative copper-transporting ATPase with high amino acid similarity with the Menkes disease (MNK) gene. We have further characterized the WD gene by extending the 5′-coding and non-coding DNA sequence and elucidating the intron/exon structure and genomic organization. Analysis of RNA transcripts from liver, brain, kidney and placenta reveals extensive alternative splicing which may provide a mechanism to regulate the quantity of functional protein product. Comparative sequence analysis shows that WD and MNK belong to the sub-family of heavy metal-transporting ATPases with several characterizing features which include unique amino acid motifs and distinct N-terminal and C-terminal transmembrane structure. Our data indicate that the 600 amino acid metal binding portion of the WD and MNK proteins was formed by gene duplication events and splicing of the 6 metal binding domain segment to a common ancestral protein. We have raised a WD-specific anti-peptide antibody to the N-terminal region and are beginning to explore the cellular and intracellular location of the WD protein. The metal-binding segment of the WD protein has been expressed in E. coli and metal binding assays are underway to characterize this aspect of the protein's function. We have identified numerous disease-specific mutations and developed a rapid “reverse dot blot” screening protocol to determine mutation frequencies in different populations. The most common mutation disrupts the characteristic SEHP motif and accounts for more than 40% of WD cases in North American, Russian, and Swedish populations. This mutation has not been observed in our limited Sicilian sample.

  15. Comparative genomic identification and validation of β-defensin genes in the Ovis aries genome.

    PubMed

    Hall, T J; McQuillan, C; Finlay, E K; O'Farrelly, C; Fair, S; Meade, K G

    2017-04-04

    β-defensins are small, cationic, antimicrobial peptides found in species across the plant and animal kingdoms. In addition to microbiocidal activity, roles in immunity as well as reproduction have more recently been documented. β-defensin genes in Ovis aries (domestic sheep) have been poorly annotated, having been identified only by automatic gene prediction algorithms. The objective of this study was to use a comparative genomics approach to identify and characterise the β-defensin gene repertoire in sheep using the bovine genome as the primary reference. All 57 currently predicted bovine β-defensin genes were used to find orthologous sequences in the most recent version of the sheep genome (OAR v4.0). Forty three genes were found to have close genomic matches (>70% similarity) between sheep and cattle. The orthologous genes were located in four clusters across the genome, with 4 genes on chromosome 2, 19 genes on chromosome 13, 5 genes on chromosome 20 and 15 genes on chromosome 26. Conserved gene order for the β-defensin genes was apparent in the two smaller clusters, although gene order was reversed on chromosome 2, suggesting an inversion between sheep and cattle. Complete conservation of gene order was also observed for chromosome 13 β-defensin orthologs. More structural differences were apparent between chromosome 26 genes and the orthologous region in the bovine reference genome, which is known to be copy-number variable. In this cluster, the Defensin-beta 1 (DEFB1) gene matched to eleven Bovine Neutrophil beta-Defensin (BNBD) genes on chromosome 27 with almost uniform similarity, as well as to tracheal, enteric and lingual anti-microbial peptides (TAP, EAP and LAP), suggesting that annotation of the bovine reference sequence is still incomplete. qPCR was used to profile the expression of 34 β-defensin genes, representing each of the four clusters, in the ram reproductive tract. Distinct site-specific and differential expression profiles were

  16. Recurrent Gene Duplication Diversifies Genome Defense Repertoire in Drosophila.

    PubMed

    Levine, Mia T; Vander Wende, Helen M; Hsieh, Emily; Baker, EmilyClare P; Malik, Harmit S

    2016-07-01

    Transposable elements (TEs) comprise large fractions of many eukaryotic genomes and imperil host genome integrity. The host genome combats these challenges by encoding proteins that silence TE activity. Both the introduction of new TEs via horizontal transfer and TE sequence evolution require constant innovation of host-encoded TE silencing machinery to keep pace with TEs. One form of host innovation is the adaptation of existing, single-copy host genes. Indeed, host suppressors of TE replication often harbor signatures of positive selection. Such signatures are especially evident in genes encoding the piwi-interacting-RNA pathway of gene silencing, for example, the female germline-restricted TE silencer, HP1D/Rhino. Host genomes can also innovate via gene duplication and divergence. However, the importance of gene family expansions, contractions, and gene turnover to host genome defense has been largely unexplored. Here, we functionally characterize Oxpecker, a young, tandem duplicate gene of HP1D/rhino. We demonstrate that Oxpecker supports female fertility in Drosophila melanogaster and silences several TE families that are incompletely silenced by HP1D/Rhino in the female germline. We further show that, like Oxpecker, at least ten additional, structurally diverse, HP1D/rhino-derived daughter and "granddaughter" genes emerged during a short 15-million year period of Drosophila evolution. These young paralogs are transcribed primarily in germline tissues, where the genetic conflict between host genomes and TEs plays out. Our findings suggest that gene family expansion is an underappreciated yet potent evolutionary mechanism of genome defense diversification. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  17. Genomic disorders: A window into human gene and genome evolution

    PubMed Central

    Carvalho, Claudia M. B.; Zhang, Feng; Lupski, James R.

    2010-01-01

    Gene duplications alter the genetic constitution of organisms and can be a driving force of molecular evolution in humans and the great apes. In this context, the study of genomic disorders has uncovered the essential role played by the genomic architecture, especially low copy repeats (LCRs) or segmental duplications (SDs). In fact, regardless of the mechanism, LCRs can mediate or stimulate rearrangements, inciting genomic instability and generating dynamic and unstable regions prone to rapid molecular evolution. In humans, copy-number variation (CNV) has been implicated in common traits such as neuropathy, hypertension, color blindness, infertility, and behavioral traits including autism and schizophrenia, as well as disease susceptibility to HIV, lupus nephritis, and psoriasis among many other clinical phenotypes. The same mechanisms implicated in the origin of genomic disorders may also play a role in the emergence of segmental duplications and the evolution of new genes by means of genomic and gene duplication and triplication, exon shuffling, exon accretion, and fusion/fission events. PMID:20080665

  18. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Complete plastid genome sequence of Vaccinium macrocarpon: structure, gene content and rearrangements revealed by next generation sequencing

    USDA-ARS's Scientific Manuscript database

    The complete plastid genome sequence of the American cranberry was reconstructed using next-generation sequencing data by in silico procedures. We used Roche 454 shotgun sequence data to isolate cranberry plastid-specific sequences of the cultivar ‘HyRed’ via homology comparisons with complete seque...

  20. Reproduction-related genes in the pearl oyster genome.

    PubMed

    Matsumoto, Toshie; Masaoka, Tetsuji; Fujiwara, Atsushi; Nakamura, Yoji; Satoh, Nori; Awaji, Masahiko

    2013-10-01

    Molluscan reproduction has been a target of biological research because of the various reproductive strategies that have evolved in this phylum. It has also been studied for the development of fisheries technologies, particularly aquaculture. Although fundamental processes of reproduction in other phyla, such as vertebrates and arthropods, have been well studied, information on the molecular mechanisms of molluscan reproduction remains limited. The recently released draft genome of the pearl oyster Pinctada fucata provides a novel and powerful platform for obtaining structural information on the genes and proteins involved in bivalve reproduction. In the present study, we analyzed the pearl oyster draft genome to screen reproduction-related genes. Analysis was mainly conducted for genes reported from other molluscs for encoding orthologs of reproduction-related proteins in other phyla. The gene searches in the P. fucata gene models (version 1.1) and genome assembly (version 1.0) were performed using Genome Browser and BLAST software. The obtained gene models were then BLASTP searched against a public database to confirm the best-hit sequences. As a result, more than 40 gene models were identified with high accuracy to encode reproduction-related genes reported for P. fucata and other molluscs. These include vasa, nanos, doublesex- and mab-3-related transcription factor, 5-hydroxytryptamine (5-HT) receptors, vitellogenin, estrogen receptor, and others. The set of reproduction-related genes of P. fucata identified in the present study constitutes a new tool for research on bivalve reproduction at the molecular level.

  1. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  2. Analysis of 90 Mb of the potato genome reveals conservation of gene structures and order with tomato but divergence in repetitive sequence composition

    PubMed Central

    Zhu, Wei; Ouyang, Shu; Iovene, Marina; O'Brien, Kimberly; Vuong, Hue; Jiang, Jiming; Buell, C Robin

    2008-01-01

    Background The Solanaceae family contains a number of important crop species including potato (Solanum tuberosum) which is grown for its underground storage organ known as a tuber. Albeit the 4th most important food crop in the world, other than a collection of ~220,000 Expressed Sequence Tags, limited genomic sequence information is currently available for potato and advances in potato yield and nutrition content would be greatly assisted through access to a complete genome sequence. While morphologically diverse, Solanaceae species such as potato, tomato, pepper, and eggplant share not only genes but also gene order thereby permitting highly informative comparative genomic analyses. Results In this study, we report on analysis of 89.9 Mb of potato genomic sequence representing 10.2% of the genome generated through end sequencing of a potato bacterial artificial chromosome (BAC) clone library (87 Mb) and sequencing of 22 potato BAC clones (2.9 Mb). The GC content of potato is very similar to Solanum lycopersicon (tomato) and other dicotyledonous species yet distinct from the monocotyledonous grass species, Oryza sativa. Parallel analyses of repetitive sequences in potato and tomato revealed substantial differences in their abundance, 34.2% in potato versus 46.3% in tomato, which is consistent with the increased genome size per haploid genome of these two Solanum species. Specific classes and types of repetitive sequences were also differentially represented between these two species including a telomeric-related repetitive sequence, ribosomal DNA, and a number of unclassified repetitive sequences. Comparative analyses between tomato and potato at the gene level revealed a high level of conservation of gene content, genic feature, and gene order although discordances in synteny were observed. Conclusion Genomic level analyses of potato and tomato confirm that gene sequence and gene order are conserved between these solanaceous species and that this conservation can be

  3. Structural Genomics on the Web

    PubMed Central

    Wixon, Jo

    2001-01-01

    In this review we provide a brief guide to some of the resources and databases that can be used to locate information and aid research in the growing field of structural genomics. The review will provide examples, for less experienced users, of what can be achieved using a selection of the available sites. We hope that this will encourage you to use these sites to their full potential and whet your appetite to search for other related sites. PMID:18628900

  4. RNA-Seq improves annotation of protein-coding genes in the cucumber genome

    PubMed Central

    2011-01-01

    Background As more and more genomes are sequenced, genome annotation becomes increasingly important in bridging the gap between sequence and biology. Gene prediction, which is at the center of genome annotation, usually integrates various resources to compute consensus gene structures. However, many newly sequenced genomes have limited resources for gene predictions. In an effort to create high-quality gene models of the cucumber genome (Cucumis sativus var. sativus), based on the EVidenceModeler gene prediction pipeline, we incorporated the massively parallel complementary DNA sequencing (RNA-Seq) reads of 10 cucumber tissues into EVidenceModeler. We applied the new pipeline to the reassembled cucumber genome and included a comparison between our predicted protein-coding gene sets and a published set. Results The reassembled cucumber genome, annotated with RNA-Seq reads from 10 tissues, has 23,248 identified protein-coding genes. Compared with the published prediction in 2009, approximately 8,700 genes reveal structural modifications and 5,285 genes only appear in the reassembled cucumber genome. All the related results, including genome sequence and annotations, are available at http://cmb.bnu.edu.cn/Cucumis_sativus_v20/. Conclusions We conclude that RNA-Seq greatly improves the accuracy of prediction of protein-coding genes in the reassembled cucumber genome. The comparison between the two gene sets also suggests that it is feasible to use RNA-Seq reads to annotate newly sequenced or less-studied genomes. PMID:22047402
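
    The comparison of two gene sets described above can be sketched as a three-way classification: genes whose span is unchanged, genes that overlap a published gene but differ in their boundaries, and genes found only in the new set. The abstract does not state the criteria the authors used, so the overlap rule here is an assumption. The sketch also assumes both sets share one coordinate system and compares whole-gene spans rather than exon structures. Gene IDs and coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two genes lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def compare_gene_sets(new: Dict[str, Gene], old: Dict[str, Gene]) -> Dict[str, List[str]]:
    """Classify each gene in `new` relative to the genes in `old`."""
    old_genes = list(old.values())
    old_exact = set(old_genes)
    result: Dict[str, List[str]] = {"identical": [], "modified": [], "new_only": []}
    for gene_id, gene in new.items():
        if gene in old_exact:
            result["identical"].append(gene_id)   # same chromosome, start and end
        elif any(overlaps(gene, o) for o in old_genes):
            result["modified"].append(gene_id)    # overlaps, but boundaries differ
        else:
            result["new_only"].append(gene_id)    # no counterpart in the old set
    return result


# Hypothetical example data (illustration only).
new_set = {
    "n1": ("chr1", 100, 500),
    "n2": ("chr1", 1000, 1800),
    "n3": ("chr2", 50, 300),
}
old_set = {
    "o1": ("chr1", 100, 500),
    "o2": ("chr1", 1200, 1800),
}
classes = compare_gene_sets(new_set, old_set)
```

    Counting the entries in each class gives the kind of summary the abstract reports (genes with structural modifications versus genes that appear only in the new annotation). A real comparison across a reassembled genome would first need the two annotations mapped onto common coordinates.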

  5. Structural analysis of the CD11b gene and phylogenetic analysis of the α-integrin gene family demonstrate remarkable conservation of genomic organization and suggest early diversification during evolution

    SciTech Connect

    Fleming, J.C.; Gonzalez, D.A.; Tenen, D.G.; Pahl, H.L. (Harvard Medical School, Boston, MA); Smith, T.F.

    1993-01-15

    CD11b is a member of the β2 subfamily of the human leukocyte integrins. Its expression is limited to mature myeloid and NK cells and is up-regulated during the course of granulocytic and monocytic differentiation. The CD11b/CD18 (Mo1) heterodimer promotes adhesion of granulocytes and monocytes to C3bi-coated bacteria and endothelial cells. In an attempt to relate the exon structure to the known functional domains, as well as to identify and study cis-acting elements that are involved in its tissue-specific expression, the authors have isolated genomic clones encoding CD11b, deduced the exon/intron organization, and determined the transcriptional start site. The CD11b gene spans 55 kb and is encoded by 30 exons. Its structure closely resembles that of CD11c, another of the three leukocyte integrin α-chains, and suggests that these two genes arose by a gene duplication event. Furthermore, comparison of the CD11b gene structure with that of platelet glycoprotein IIb and Drosophila PS2 suggests how the human leukocyte integrins evolved and dispersed during the course of evolution. 67 refs., 5 figs., 2 tabs.

  6. Single nucleotide polymorphisms reveal genetic structuring of the Carpathian newt and provide evidence of interspecific gene flow in the nuclear genome.

    PubMed

    Zieliński, Piotr; Dudek, Katarzyna; Stuglik, Michał Tadeusz; Liana, Marcin; Babik, Wiesław

    2014-01-01

    Genetic variation within species is commonly structured in a hierarchical manner which may result from superimposition of processes acting at different spatial and temporal scales. In organisms of limited dispersal ability, signatures of past subdivision are detectable for a long time. Studies of contemporary genetic structure in such taxa inform about the history of isolation, range changes and local admixture resulting from geographically restricted hybridization with related species. Here we use a set of 139 transcriptome-derived, unlinked nuclear single nucleotide polymorphisms (SNP) to assess the genetic structure of the Carpathian newt (Lissotriton montandoni, Lm) and introgression from its congener, the smooth newt (L. vulgaris, Lv). Two substantially differentiated groups of Lm populations likely originated from separate refugia, both located in the Eastern Carpathians. The colonization of the present range in north-western and south-western directions was accompanied by a modest loss of variation; admixture between the two groups has occurred in the middle of the Eastern Carpathians. Local, apparently recent introgression of Lv alleles into several Lm populations was detected, demonstrating increased power for admixture detection in comparison to a previous study based on a limited number of microsatellite markers. The level of introgression was higher in Lm populations classified as admixed than in syntopic populations. We discuss the possible causes and propose further tests to distinguish between alternatives. Several outlier loci were identified in tests of interspecific differentiation, suggesting genomic heterogeneity of gene flow between species.

  7. Genome-wide SNPs and re-sequencing of growth habit and inflorescence genes in barley: implications for association mapping in germplasm arrays varying in size and structure.

    PubMed

    Cuesta-Marcos, Alfonso; Szucs, Péter; Close, Timothy J; Filichkin, Tanya; Muehlbauer, Gary J; Smith, Kevin P; Hayes, Patrick M

    2010-12-15

    --with SNP data only--in the larger germplasm arrays. For both vernalization sensitivity and inflorescence type, the most significant associations in the larger data sets were found with SNPs coincident with the synthetic markers used in the CAP Core and with SNPs detected via interaction analysis in the CAP Core. Small and highly structured collections of germplasm, such as the CAP Core, are cost-effectively phenotyped and genotyped with high-throughput markers. They are also useful for characterizing allelic diversity at loci in germplasm of interest. Our results suggest that discovery-oriented exercises in AM in such small arrays may generate a large number of false-positives. However, if haplotypes in candidate genes are available, they may be used as anchors in an analysis of interactions to identify other candidate regions harboring genes determining target traits. Using larger germplasm arrays, genome regions where the principal genes determining vernalization sensitivity and row type are located were identified.

  8. Gene duplication and transfer events in plant mitochondria genome

    SciTech Connect

    Xiong Aisheng; Peng Rihe; Zhuang Jing; Gao Feng; Zhu Bo; Fu Xiaoyan; Xue Yong; Jin Xiaofen; Tian Yongsheng; Zhao Wei; Yao Quanhong

    2008-11-07

    Gene or genome duplication events increase the amount of genetic material available to increase the genomic, and thereby phenotypic, complexity of organisms during evolution. Gene duplication and transfer events have been important to molecular evolution in all three domains of life, and may be the first step in the emergence of new gene functions. Gene transfer events have been proposed as another accelerator of evolution. The duplicated gene or genome, mainly nuclear, has been the subject of several recent reviews. In addition to the nuclear genome, organisms have organelle genomes, including the mitochondrial genome. In this review, we briefly summarize gene duplication and transfer events in the plant mitochondrial genome.

  9. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information for all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, running from target selection through cloning, expression, and purification to structural determination. At the time of this writing, structural information for more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  10. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  11. Comparative genomics of the Hlx homeobox gene and protein: conservation of structure and expression from fish to mammals.

    PubMed

    Bates, Michael D; Wells, James M; Venkatesh, Byrappa

    2005-06-06

    Hlx is a homeobox transcription factor gene that is expressed in intestinal and hepatic mesenchyme of the developing mouse embryo and is essential for normal intestinal and hepatic development. Because of the morphological and molecular similarities in the development of the digestive system across species, we hypothesized that the Hlx gene and protein sequences and expression patterns would be conserved among vertebrates. Comparison of the Hlx gene orthologues of human, chimpanzee, mouse, rat, pufferfish (Fugu) and zebrafish demonstrates that these six genes share an identical organization with four exons and three introns. Comparison of the inferred Hlx protein sequences from these and three additional species (chick, Spanish ribbed newt and rainbow trout) reveals significant sequence identity, with identical homeodomains. The expression of Hlx in the mesenchyme of developing chick embryos is highly similar to that of mouse. Fugu Hlx is expressed in a tissue-specific manner that is similar though not identical to that of mouse, suggesting a conservation of Hlx function between mammals and birds. The mammalian and fish Hlx genes share a putative 5' upstream enhancer as well as an inverted repeat containing CCAAT boxes on opposite strands that we have previously shown to be important for mouse Hlx gene expression. These results suggest that the function of Hlx and the mechanisms regulating its expression are highly conserved in mammals, birds, amphibians and fish.

  12. From trees to the forest: genes to genomics.

    PubMed

    Mullighan, Charles; Petersdorf, Effie; Davies, Stella M; DiPersio, John

    2011-01-01

    Crick, Watson, and colleagues revealed the genetic code in 1953, and since that time, remarkable progress has been made in understanding what makes each of us who we are. Identification of single genes important in disease, and the development of a mechanistic understanding of genetic elements that regulate gene function, have cast light on the pathophysiology of many heritable and acquired disorders. In 1990, the human genome project commenced, with the goal of sequencing the entire human genome, and a "first draft" was published with astonishing speed in 2001. The first draft, although an extraordinary achievement, reported essentially an imaginary haploid mix of alleles rather than a true diploid genome. In the years since 2001, technology has further improved, and efforts have been focused on filling in the gaps in the initial genome and starting the huge task of looking at normal variation in the human genome. This work is the beginning of understanding human genetics in the context of the structure of the genome as a complete entity, and as more than simply the sum of a series of genes. We present 3 studies in this review that apply genomic approaches to leukemia and to transplantation to improve and extend therapies.

  13. Domains of α- and β-globin genes in the context of the structural-functional organization of the eukaryotic genome.

    PubMed

    Razin, S V; Ulianov, S V; Ioudinkova, E S; Gushchanskaya, E S; Gavrilov, A A; Iarovaia, O V

    2012-12-01

    The eukaryotic cell genome has a multilevel regulatory system of gene expression that includes stages of preliminary activation of genes or of extended genomic regions (switching them to potentially active states) and stages of final activation of promoters and maintaining their active status in cells of a certain lineage. Current views on the regulatory systems of transcription in eukaryotes have been formed based on results of systematic studies on a limited number of model systems, in particular, on the α- and β-globin gene domains of vertebrates. Unexpectedly, these genomic domains harboring genes responsible for the synthesis of different subunits of the same protein were found to have a fundamentally different organization inside chromatin. In this review, we analyze specific features of the organization of the α- and β-globin gene domains in vertebrates, as well as principles of activities of the regulatory systems in these domains. In the final part of the review, we attempt to answer the question how the evolution of α- and β-globin genes has led to segregation of these genes into two distinct types of chromatin domains situated on different chromosomes.

  14. Genome editing for human gene therapy.

    PubMed

    Meissner, Torsten B; Mandal, Pankaj K; Ferreira, Leonardo M R; Rossi, Derrick J; Cowan, Chad A

    2014-01-01

    The rapid advancement of genome-editing techniques holds much promise for the field of human gene therapy. From bacteria to model organisms and human cells, genome editing tools such as zinc-finger nucleases (ZFNs), TALENs, and CRISPR/Cas9 have been successfully used to manipulate the respective genomes with unprecedented precision. With regard to human gene therapy, it is of great interest to test the feasibility of genome editing in primary human hematopoietic cells that could potentially be used to treat a variety of human genetic disorders such as hemoglobinopathies, primary immunodeficiencies, and cancer. In this chapter, we explore the use of the CRISPR/Cas9 system for the efficient ablation of genes in two clinically relevant primary human cell types, CD4+ T cells and CD34+ hematopoietic stem and progenitor cells. By using two guide RNAs directed at a single locus, we achieve highly efficient and predictable deletions that ablate gene function. The use of a Cas9-2A-GFP fusion protein allows FACS-based enrichment of the transfected cells. The ease of designing, constructing, and testing guide RNAs makes this dual guide strategy an attractive approach for the efficient deletion of clinically relevant genes in primary human hematopoietic stem and effector cells and enables the use of CRISPR/Cas9 for gene therapy.

  15. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    PubMed Central

    Nakashima, Nobutaka; Miyazaki, Kentaro

    2014-01-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features and recent advances on methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering. PMID:24552876

  16. Genomic organization of the AODEF gene in Asparagus officinalis L.

    PubMed

    Ito, Takuro; Suzuki, Go; Ochiai, Toshinori; Nakada, Mutsumi; Kameya, Toshiaki; Kanno, Akira

    2005-04-01

    The perianths of Liliaceae plants, such as lily and tulip, have two whorls of almost identical petaloid organs, which are called tepals. According to the modified ABC model proposed in tulip, the class B genes are expressed in whorl 1 as well as whorls 2 and 3, so that the organs of whorls 1 and 2 have the same petaloid structure. The floral structure of asparagus (Asparagus officinalis L.) is similar to that of Liliaceae plants, however, the expression of B-class genes (AODEF, AOGLOA, AOGLOB) was not found in whorl 1, but was confined to whorls 2 and 3. This result does not support the modified ABC model in asparagus. In order to gain a better understanding of asparagus flower development, we have characterized a genomic clone of the AODEF gene. We compared the genomic organization and promoter sequence of AODEF with three well-studied DEF-like genes, DEFICIENS (Antirrhinum), APETALA3 (Arabidopsis), and OSMADS16 (rice). Exon-intron structures of these genes are well-conserved except for the large fifth intron in the AODEF gene and the OSMADS16 gene. Putative cis-elements including CArG-boxes were found in the promoter region and forty-two microsatellites were found in the AODEF genomic sequence.

  17. Gene discovery in the Entamoeba invadens genome.

    PubMed

    Wang, Zheng; Samuelson, John; Clark, C Graham; Eichinger, Daniel; Paul, Jaishree; Van Dellen, Katrina; Hall, Neil; Anderson, Iain; Loftus, Brendan

    2003-06-01

    Entamoeba invadens, a parasite of reptiles, is a model for the study of encystation by the human enteric pathogen Entamoeba histolytica, because E. invadens form cysts in axenic culture. With approximately 0.5-fold sequence coverage of the genome, we were able to get insights into E. invadens gene and genome features. Overall, the E. invadens genome displays many of the features that are emerging from ongoing genome sequencing efforts in E. histolytica. At the nucleotide level the E. invadens genome has on average 60% sequence identity with that of E. histolytica. The presence of introns in E. invadens was predicted with similar consensus (GTTTGT...A/TAG) sequences to those identified in E. histolytica and Entamoeba dispar. Sequences highly repeated in the genome of E. histolytica (rRNAs, tRNAs, CXXC-rich proteins, and Leu-rich repeat proteins) were found to be highly repeated in the E. invadens genome. Numerous proteins homologous to those implicated in amoebic virulence (Gal/GalNAc lectins, amoebapores, and cysteine proteinases) and drug resistance (p-glycoproteins) were identified. Homologs of proteins involved in cell cycle, vesicular trafficking and signal transduction were identified, which may be involved in en/excystation and cell growth of E. invadens. Finally, multiple copies of a number of E. invadens genes coding for predicted enzymes involved in core metabolism and the targets of anti-amoebic drugs were identified.

  18. Regulatory genes in the ancestral chordate genomes.

    PubMed

    Satou, Yutaka; Wada, Shuichi; Sasakura, Yasunori; Satoh, Nori

    2008-12-01

    Changes or innovations in gene regulatory networks for the developmental program in the ancestral chordate genome appear to be a major component in the evolutionary process in which tadpole-type larvae, a unique characteristic of chordates, arose. These alterations may include new genetic interactions as well as the acquisition of new regulatory genes. Previous analyses of the Ciona genome revealed that many genes may have emerged after the divergence of the tunicate and vertebrate lineages. In this paper, we examined this possibility by examining a second non-vertebrate chordate genome. We conclude from this analysis that the ancient chordate included almost the same repertory of regulatory genes, but less redundancy than extant vertebrates, and that approximately 10% of vertebrate regulatory genes were innovated after the emergence of vertebrates. Thus, refined regulatory networks arose during vertebrate evolution mainly as preexisting regulatory genes multiplied rather than by generating new regulatory genes. The inferred regulatory gene sets of the ancestral chordate would be an important foundation for understanding how tadpole-type larvae, a unique characteristic of chordates, evolved.

  19. Genomic platform for efficient identification of fungal secondary metabolism genes

    USDA-ARS's Scientific Manuscript database

    Fungal secondary metabolites (SMs) are structurally diverse natural compounds, which are thought to have great potential not only for medical industry but also for chemical and environmental industries. Since expansion of sequencing microbial genomes in 1990’s, it has been known that SM genes are ex...

  20. Genomic structure of the EWS gene and its relationship to EWSR1, a site of tumor-associated chromosome translocation

    SciTech Connect

    Plougastel, B.; Zucman, J.; Peter, M.; Thomas, G.; Delattre, O.

    1993-12-01

    The EWS gene has been identified based on its location at the chromosome 22 breakpoint of the t(11;22)(q24;q12) translocation that characterizes Ewing sarcoma and related neuroectodermal tumors. The EWS gene spans about 40 kb of DNA and is encoded by 17 exons. The nucleotide sequence of the exons is identical to that of the previously described cDNA. The first 7 exons encode the N-terminal domain of EWS, which consists of a repeated degenerated polypeptide of 7 to 12 residues rich in tyrosine, serine, threonine, glycine, and glutamine. Exons 11, 12, and 13 encode the putative RNA binding domain. The three glycine- and arginine-rich motifs of the gene are mainly encoded by exons 8-9, 14, and 16. The DNA sequence in the 5′ region of the gene has features of a CpG-rich island and lacks canonical promoter elements, such as TATA and CCAAT consensus sequences. Positions of the chromosome 22 breakpoints were determined for 19 Ewing tumors. They were localized in introns 7 or 8 in 18 cases and in intron 10 in 1 case. 26 refs., 5 figs.

  1. Stem-loop structures in prokaryotic genomes

    PubMed Central

    Petrillo, Mauro; Silvestro, Giustina; Di Nocera, Pier Paolo; Boccia, Angelo; Paolella, Giovanni

    2006-01-01

    Background Prediction of secondary structures in the expressed sequences of bacterial genomes allows investigation of the spontaneous folding of the corresponding RNA. This is particularly relevant in untranslated mRNA regions, where base pairing is less affected by interactions with the translation machinery. Relatively large stem-loops significantly contribute to the formation of more complex secondary structures, often important for the activity of sequence elements controlling gene expression. Results Systematic analysis of the distribution of stem-loop structures (SLSs) in 40 wholly-sequenced bacterial genomes is presented. SLSs were searched as stems measuring at least 12 bp, bordering loops 5 to 100 nt in length. G-U pairing in the stems was allowed. SLSs found in natural genomes are consistently more numerous and stable than those expected to randomly form in sequences of comparable size and composition. The large majority of SLSs fall within protein-coding regions, but enrichment of specific, non-random SLS sub-populations of higher stability was observed within the intergenic regions of the chromosomes of several species. In low-GC firmicutes, most higher stability intergenic SLSs resemble canonical rho-independent transcriptional terminators, but very frequently feature at the 5'-end an additional A-rich stretch complementary to the 3' uridines. In all species, a clearly biased SLS distribution was observed within the intergenic space, with most concentrating at the 3'-end side of flanking CDSs. Some intergenic SLS regions are members of novel repeated sequence families. Conclusion In-depth analysis of SLS features and distribution in 40 different bacterial genomes showed the presence of non-random populations of such structures in all species. Many of these structures are plausibly transcribed, and might be involved in the control of transcription termination, or might serve as RNA elements which can enhance either the stability or the turnover of cotranscribed
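
    The search criteria quoted in this abstract (stems of at least 12 bp, loops of 5 to 100 nt, G-U pairing allowed) can be illustrated with a small brute-force scanner. This is only a sketch under stated assumptions, not the authors' software: it reports perfect, mismatch-free stems only, it does no stability (free energy) scoring, and the names can_pair and find_stem_loops are hypothetical.

```python
# Allowed stem pairs: Watson-Crick plus the G-U wobble pair (T is read as U).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """True if bases a and b may pair inside a stem."""
    return (a, b) in PAIRS


def find_stem_loops(seq, min_stem=12, min_loop=5, max_loop=100):
    """Return (start, stem_len, loop_len) for each perfect stem-loop.

    Assumption of this sketch: stems contain no mismatches or bulges.
    start is the 0-based index of the first base of the 5' arm.
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    hits = []
    for left in range(n):  # innermost base of the 5' arm
        for loop_len in range(min_loop, max_loop + 1):
            right = left + loop_len + 1  # innermost base of the 3' arm
            if right >= n:
                break
            # If the stem could grow inward, the same structure is reported
            # once with the shorter loop, so skip it here.
            if loop_len - 2 >= min_loop and can_pair(s[left + 1], s[right - 1]):
                continue
            # Extend the stem outward while the bases keep pairing.
            k = 0
            while left - k >= 0 and right + k < n and can_pair(s[left - k], s[right + k]):
                k += 1
            if k >= min_stem:
                hits.append((left - k + 1, k, loop_len))
    return hits
```

    Running time grows with sequence length times the loop range times the stem length, which is acceptable for short regions but would need a smarter index for whole genomes.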

  2. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  3. iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes.

    PubMed

    Dong, Chengliang; Guo, Yunfei; Yang, Hui; He, Zeyu; Liu, Xiaoming; Wang, Kai

    2016-12-22

    Cancer results from the acquisition of somatic driver mutations. Several computational tools can predict driver genes from population-scale genomic data, but tools for analyzing personal cancer genomes are underdeveloped. Here we developed iCAGES, a novel statistical framework that infers driver variants by integrating contributions from coding, non-coding, and structural variants, identifies driver genes by combining genomic information and prior biological knowledge, then generates prioritized drug treatment. Analysis on The Cancer Genome Atlas (TCGA) data showed that iCAGES predicts whether patients respond to drug treatment (P = 0.006 by Fisher's exact test) and long-term survival (P = 0.003 from Cox regression). iCAGES is available at http://icages.wglab.org.

  4. Novel mechanism of conjoined gene formation in the human genome.

    PubMed

    Kim, Ryong Nam; Kim, Aeri; Choi, Sang-Haeng; Kim, Dae-Soo; Nam, Seong-Hyeuk; Kim, Dae-Won; Kim, Dong-Wook; Kang, Aram; Kim, Min-Young; Park, Kun-Hyang; Yoon, Byoung-Ha; Lee, Kang Seon; Park, Hong-Seog

    2012-03-01

    Recently, conjoined genes (CGs) have emerged as important genetic factors necessary for understanding the human genome. However, their formation mechanism and precise structures have remained mysterious. Based on a detailed structural analysis of 57 human CG transcript variants (CGTVs, discovered in this study) and all (833) known CGs in the human genome, we discovered that the poly(A) signal site from the upstream parent gene region is completely removed via the skipping or truncation of the final exon; consequently, CG transcription is terminated at the poly(A) signal site of the downstream parent gene. This result led us to propose a novel mechanism of CG formation: the complete removal of the poly(A) signal site from the upstream parent gene is a prerequisite for the CG transcriptional machinery to continue transcribing uninterrupted into the intergenic region and downstream parent gene. The removal of the poly(A) signal sequence from the upstream gene region appears to be caused by a deletion or truncation mutation in the human genome rather than post-transcriptional trans-splicing events. With respect to the characteristics of CG sequence structures, we found that intergenic regions are hot spots for novel exon creation during CGTV formation and that exons farther from the intergenic regions are more highly conserved in the CGTVs. Interestingly, many novel exons newly created within the intergenic and intragenic regions originated from transposable element sequences. Additionally, the CGTVs showed tumor tissue-biased expression. In conclusion, our study provides novel insights into the CG formation mechanism and expands the present concepts of the genetic structural landscape, gene regulation, and gene formation mechanisms in the human genome.

  5. Expression characterization, genomic structure and function analysis of fish ubiquitin-specific protease 18 (USP18) genes.

    PubMed

    Chen, Chen; Zhang, Yi-Bing; Gui, Jian-Fang

    2015-10-01

    In mammals, USP18 (ubiquitin-specific protease 18) is an interferon (IFN) inducible protein and plays a role in regulation of IFN response upon viral infection. In this study, we first cloned a USP18 homologous gene from virally-infected crucian carp (Carassius auratus) blastula embryonic (CAB) cells, and later found it in other fish species including zebrafish. All fish USP18 genes have 10 exons and 9 introns, compared with 11 exons and 10 introns in non-fish vertebrates. Expression analysis revealed that fish USP18 was significantly induced in vitro and in vivo by IFN and IFN stimuli. Using a promoter-driven luciferase reporter assay system to explore the molecular mechanism underlying fish USP18 expression, fish USP18 was identified as a typical interferon (IFN)-stimulated gene (ISG). Intracellular poly(I:C)-triggered zebrafish USP18 expression was regulated through the RLR-IFN pathway, which was consistent with the fact that the fish USP18 gene promoter contained two typical IFN-stimulated response elements (ISREs). Further mutation assays revealed that the distant ISRE motif primarily contributed to the induction of zebrafish USP18 by fish IFN and IFN stimuli. Functionally, fish USP18 inhibited poly(I:C)- and IFN-triggered activation of a common ISRE-containing promoter, and attenuated transcriptional expression of some ISGs including Stat1 and PKZ by recombinant IFN. Finally, we found that fish USP18 protein was expressed in cytoplasm and exhibited an ability to interact with ISG15. These results indicate that fish USP18 likely exerts its function similar to mammalian homologs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  6. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome.

    PubMed

    Li, Jian; Harris, R Alan; Cheung, Sau Wai; Coarfa, Cristian; Jeong, Mira; Goodell, Margaret A; White, Lisa D; Patel, Ankita; Kang, Sung-Hae; Shaw, Chad; Chinault, A Craig; Gambin, Tomasz; Gambin, Anna; Lupski, James R; Milosavljevic, Aleksandar

    2012-01-01

    The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

  7. Genomic Hypomethylation in the Human Germline Associates with Selective Structural Mutability in the Human Genome

    PubMed Central

    Li, Jian; Harris, R. Alan; Cheung, Sau Wai; Coarfa, Cristian; Jeong, Mira; Goodell, Margaret A.; White, Lisa D.; Patel, Ankita; Kang, Sung-Hae; Shaw, Chad; Chinault, A. Craig; Gambin, Tomasz; Gambin, Anna; Lupski, James R.; Milosavljevic, Aleksandar

    2012-01-01

    The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ∼1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR–mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease. PMID:22615578

  8. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy region (SSC) with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA genes and 8 rRNA genes. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, along with 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
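The region lengths and gene counts quoted in this abstract can be checked for internal consistency with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the 52,196 bp figure covers both inverted repeat copies combined, which is the reading under which the parts sum to the stated total.

```python
# Figures as quoted in the abstract (bp); all names here are illustrative.
TOTAL_BP = 157_741
regions_bp = {
    "large single-copy": 87_109,
    "small single-copy": 18_436,
    "inverted repeats (pair, assumed combined)": 52_196,
}
gene_counts = {"protein-coding": 96, "tRNA": 40, "rRNA": 8}
TOTAL_GENES = 144


def fractions(parts, total):
    """Return each part's share of the total as a float in [0, 1]."""
    return {name: size / total for name, size in parts.items()}


# The three regions should account for the whole genome length.
regions_sum = sum(regions_bp.values())
# The gene categories should account for all annotated genes.
genes_sum = sum(gene_counts.values())

region_share = fractions(regions_bp, TOTAL_BP)
for name, share in region_share.items():
    print(f"{name}: {share:.1%} of the genome")
print("regions consistent:", regions_sum == TOTAL_BP)
print("gene counts consistent:", genes_sum == TOTAL_GENES)
```

Under that assumption both totals agree with the abstract, which suggests the reported figures refer to a single genome rather than an average over the seven species.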

  9. Chloroplast genome structure in Ilex (Aquifoliaceae).

    PubMed

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T

    2016-07-05

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy region (SSC) with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA genes and 8 rRNA genes. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, along with 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species.

  10. An introduction to genes, genomes and disease.

    PubMed

    Hall, Peter A; Reis-Filho, Jorge S; Tomlinson, Ian Pm; Poulsom, Richard

    2010-01-01

    The human and other genome projects and subsequent resequencing programmes have provided new perspectives on the nature of the gene and how genes function. Understanding the complexity of the eukaryotic nucleus and the diversity of genetic regulatory mechanisms, including the role of non-coding RNAs, translational control mechanisms and the extraordinary prevalence of splicing, will be central to understanding how genes function, as will the recognition of gene dosage issues. This introduction to the 2010 Annual Review Issue, Genes, Genomes and Disease, provides overviews of these areas and then considers their relevance to a range of human diseases, including cardiovascular and renal disease, neural tube defects and cancer. The p53 gene is considered as an example of a massively regulated gene and the genetic perturbations in cancer are considered in a historical perspective. High-throughput genomic and transcriptomic methods have led to a paradigm shift in the way cancers are perceived and have changed the way translational research is performed. The progress in our understanding of chromosomal rearrangements in cancer, once believed to be incredibly rare events in epithelial malignancies, is discussed. The identification of low-penetrance cancer susceptibility genes through genome-wide association studies and their implications are reviewed. The contribution and limitations of expression profiling are discussed. In the last series of reviews, future challenges are addressed: the promise of synthetic lethality strategies in cancer therapy, a case for 'systems' approaches to genetic networks and the potential of single molecule genetic technologies. Finally, the question 'Does massively parallel DNA resequencing signify the end of histopathology as we know it?' is posed. Readers should find that the 2010 Annual Review Issue is an invaluable resource on contemporary genetics and its applications to understanding disease.

  11. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

    PubMed

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-11-28

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

  12. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae.

    PubMed

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-11-07

    Numerous gene rearrangements and transfer RNA gene absences exist in mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene, trnI, has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf-shaped secondary structure and two (trnS₁ and trnS₂) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most absent genes exist in the genomes and have not been identified, due to a lack of technology and inference sequences. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains gather together. These results are identical to the relationships inferred from gene order. We infer that gene rearrangement plays an important role in the evolution of whitefly mt genomes.

  13. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae

    PubMed Central

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-01-01

    Numerous gene rearrangements and transfer RNA gene absences exist in mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene, trnI, has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf-shaped secondary structure and two (trnS1 and trnS2) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most absent genes exist in the genomes and have not been identified, due to a lack of technology and inference sequences. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains gather together. These results are identical to the relationships inferred from gene order. We infer that gene rearrangement plays an important role in the evolution of whitefly mt genomes. PMID:27827992

  14. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger, more complex multimodular (composite or chimeric) and mostly multi-functional proteins through fusion of two or more individual genes that have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and cellular functions, and thus may give the host organism advantages in adapting to its environment. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace the evolutionary processes from the ancient smaller, uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestral proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins are known experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, such analysis is technically difficult because of the complexity of this biological and evolutionary process. We report a new strategy to computationally identify multimodular proteins using completed genome sequences, together with results from a survey of 22 organisms; data from over 40 organisms are to be presented during the meeting. Additional information is contained in the original extended abstract.

  15. Genomic organization of the CC chemokine mip-3alpha/CCL20/larc/exodus/SCYA20, showing gene structure, splice variants, and chromosome localization.

    PubMed

    Nelson, R T; Boyd, J; Gladue, R P; Paradis, T; Thomas, R; Cunningham, A C; Lira, P; Brissette, W H; Hayes, L; Hames, L M; Neote, K S; McColl, S R

    2001-04-01

    We describe the genomic organization of a recently identified CC chemokine, MIP-3alpha/CCL20 (HGMW-approved symbol SCYA20). The MIP-3alpha/CCL20 gene was cloned and sequenced, revealing a four-exon, three-intron structure, and was localized by FISH analysis to 2q35-q36. Two distinct cDNAs were identified, encoding two forms of MIP-3alpha/CCL20, Ala MIP-3alpha/CCL20 and Ser MIP-3alpha/CCL20, that differ by one amino acid at the predicted signal peptide cleavage site. Examination of the sequence around the boundary of intron 1 and exon 2 showed that use of alternative splice acceptor sites could give rise to Ala MIP-3alpha/CCL20 or Ser MIP-3alpha/CCL20. Both forms of MIP-3alpha/CCL20 were chemically synthesized and tested for biological activity. Both flu antigen plus IL-2-activated CD4(+) and CD8(+) T lymphoblasts and cord blood-derived dendritic cells responded to Ser and Ala MIP-3alpha/CCL20. T lymphocytes exposed only to IL-2 responded inconsistently, while no response was detected in naive T lymphocytes, monocytes, or neutrophils. The biological activity of Ser MIP-3alpha/CCL20 and Ala MIP-3alpha/CCL20 and the tissue-specific preference of different splice acceptor sites are not yet known.

  16. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite

  17. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-07-07

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
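The TRN20-TST80 scheme described above, a random partition with 20% of accessions used for training and 80% for testing, can be sketched in a few lines. This is a minimal illustration and not the authors' procedure: the accession identifiers, the function name `trn_tst_split`, the seed, and the rounding rule are all assumptions, and the abstract does not state how many random partitions were drawn.

```python
import random


def trn_tst_split(accession_ids, train_fraction=0.20, seed=0):
    """Randomly partition accessions into training (TRN) and testing (TST) sets.

    The default train_fraction of 0.20 mirrors the TRN20-TST80 label;
    the seed only makes this illustrative partition reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(accession_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 8416 Mexican accessions.
mexican = [f"MEX{i:04d}" for i in range(8416)]
trn, tst = trn_tst_split(mexican)
print(len(trn), "training accessions;", len(tst), "testing accessions")
```

In a cross-validation setting the split would typically be repeated with different seeds and the prediction accuracy averaged over partitions; the number of repetitions here would be a further assumption.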

  18. Evolutionary genomics of LysM genes in land plants.

    PubMed

    Zhang, Xue-Cheng; Cannon, Steven B; Stacey, Gary

    2009-08-03

    The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in the plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes.

  19. Evolutionary genomics of LysM genes in land plants

    PubMed Central

    Zhang, Xue-Cheng; Cannon, Steven B; Stacey, Gary

    2009-01-01

    Background: The ubiquitous LysM motif recognizes peptidoglycan, chitooligosaccharides (chitin) and, presumably, other structurally-related oligosaccharides. LysM-containing proteins were first shown to be involved in bacterial cell wall degradation and, more recently, were implicated in perceiving chitin (one of the established pathogen-associated molecular patterns) and lipo-chitin (nodulation factors) in flowering plants. However, the majority of LysM genes in plants remain functionally uncharacterized and the evolutionary history of complex LysM genes remains elusive. Results: We show that LysM-containing proteins display a wide range of complex domain architectures. However, only a simple core architecture is conserved across kingdoms. Each individual kingdom appears to have evolved a distinct array of domain architectures. We show that early plant lineages acquired four characteristic architectures and progressively lost several primitive architectures. We report plant LysM phylogenies and associated gene, protein and genomic features, and infer the relative timing of duplications of LYK genes. Conclusion: We report a domain architecture catalogue of LysM proteins across all kingdoms. The unique pattern of LysM protein domain architectures indicates the presence of distinctive evolutionary paths in individual kingdoms. We describe a comparative and evolutionary genomics study of LysM genes in the plant kingdom. One of the two groups of tandemly arrayed plant LYK genes likely resulted from an ancient genome duplication followed by local genomic rearrangement, while the origin of the other groups of tandemly arrayed LYK genes remains obscure. Given the fact that no animal LysM motif-containing genes have been functionally characterized, this study provides clues to functional characterization of plant LysM genes and is also informative with regard to evolutionary and functional studies of animal LysM genes. PMID:19650916

  20. Whole genome DNA methylation: beyond genes silencing

    PubMed Central

    Tirado-Magallanes, Roberto; Rebbani, Khadija; Lim, Ricky; Pradhan, Sriharsa; Benoukraf, Touati

    2017-01-01

    The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation at near base pair level resolution, far beyond that of the kilobase-long canonical CpG islands that initially revealed the biological relevance of this covalent DNA modification. The latest high-resolution studies have revealed a role for very punctual DNA methylation in chromatin plasticity, gene regulation and splicing. Here, we aim to outline the major biological consequences of DNA methylation recently discovered. We also discuss the necessity of tuning DNA methylation resolution into an adequate scale to ease the integration of the methylome information with other chromatin features and transcription events such as gene expression, nucleosome positioning, transcription factors binding dynamic, gene splicing and genomic imprinting. Finally, our review sheds light on DNA methylation heterogeneity in cell population and the different approaches used for its assessment, including the contribution of single cell DNA analysis technology. PMID:27895318

  1. Genes after the human genome project.

    PubMed

    Baetu, Tudor M

    2012-03-01

    While the Human Genome Nomenclature Committee (HGNC) concept of the gene can accommodate a wide variety of genomic sequences contributing to phenotypic outcomes, it fails to specify how sequences should be grouped when dealing with complex loci consisting of adjacent/overlapping sequences contributing to the same phenotype, distant sequences shown to contribute to the same gene product, and partially overlapping sequences identified by different techniques. The purpose of this paper is to review recently proposed concepts of the gene and critically assess how well they succeed in addressing the above problems while preserving the degree of generality achieved by the HGNC concept. I conclude that a dynamic interplay between mapping and syntax-based concepts is required in order to satisfy these desiderata.

  2. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  3. Early evolutionary history and genomic features of gene duplicates in the human genome.

    PubMed

    Bu, Lijing; Katju, Vaishali

    2015-08-20

    Human gene duplicates have been the focus of intense research since the development of array-based and targeted next-generation sequencing approaches in the last decade. These studies have primarily concentrated on determining the extant copy-number variation from a population-genomic perspective but lack a robust evolutionary framework to elucidate the early structural and genomic characteristics of gene duplicates at emergence and their subsequent evolution with increasing age. We analyzed 184 gene duplicate pairs comprising small gene families in the draft human genome with 10% or less synonymous sequence divergence. Human gene duplicates primarily originate from DNA-mediated events, taking up genomic residence as intrachromosomal copies in direct or inverse orientation. The distribution of paralogs on autosomes follows random expectations in contrast to their significant enrichment on the sex chromosomes. Furthermore, human gene duplicates exhibit a skewed gradient of distribution along the chromosomal length with significant clustering in pericentromeric regions. Surprisingly, despite the large average length of human genes, the majority of extant duplicates (83%) are complete duplicates, wherein the entire ORF of the ancestral copy was duplicated. The preponderance of complete duplicates is in accord with an extremely large median duplication span of 36 kb, which enhances the probability of capturing ancestral ORFs in their entirety. With increasing evolutionary age, human paralogs exhibit declines in (i) the frequency of intrachromosomal paralogs, and (ii) the proportion of complete duplicates. These changes may reflect lower survival rates of certain classes of duplicates and/or the role of purifying selection. Duplications arising from RNA-mediated events comprise a small fraction (11.4%) of all human paralogs and are more numerous in older evolutionary cohorts of duplicates. The degree of structural resemblance, genomic location and duplication span

  4. A gene map of the human genome.

    PubMed

    Schuler, G D; Boguski, M S; Stewart, E A; Stein, L D; Gyapay, G; Rice, K; White, R E; Rodriguez-Tomé, P; Aggarwal, A; Bajorek, E; Bentolila, S; Birren, B B; Butler, A; Castle, A B; Chiannilkulchai, N; Chu, A; Clee, C; Cowles, S; Day, P J; Dibling, T; Drouot, N; Dunham, I; Duprat, S; East, C; Edwards, C; Fan, J B; Fang, N; Fizames, C; Garrett, C; Green, L; Hadley, D; Harris, M; Harrison, P; Brady, S; Hicks, A; Holloway, E; Hui, L; Hussain, S; Louis-Dit-Sully, C; Ma, J; MacGilvery, A; Mader, C; Maratukulam, A; Matise, T C; McKusick, K B; Morissette, J; Mungall, A; Muselet, D; Nusbaum, H C; Page, D C; Peck, A; Perkins, S; Piercy, M; Qin, F; Quackenbush, J; Ranby, S; Reif, T; Rozen, S; Sanders, C; She, X; Silva, J; Slonim, D K; Soderlund, C; Sun, W L; Tabar, P; Thangarajah, T; Vega-Czarny, N; Vollrath, D; Voyticky, S; Wilmer, T; Wu, X; Adams, M D; Auffray, C; Walter, N A; Brandon, R; Dehejia, A; Goodfellow, P N; Houlgatte, R; Hudson, J R; Ide, S E; Iorio, K R; Lee, W Y; Seki, N; Nagase, T; Ishikawa, K; Nomura, N; Phillips, C; Polymeropoulos, M H; Sandusky, M; Schmitt, K; Berry, R; Swanson, K; Torres, R; Venter, J C; Sikela, J M; Beckmann, J S; Weissenbach, J; Myers, R M; Cox, D R; James, M R; Bentley, D; Deloukas, P; Lander, E S; Hudson, T J

    1996-10-25

    The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.

  5. Genomic organization of the neurofibromatosis 1 gene (NF1)

    SciTech Connect

    Li, Y.; O'Connell, P.; Huntsman Breidenbach, H.

    1995-01-01

    Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 locus has been partially characterized. Even though the full-length NF1 cDNA has been sequenced, the complete genomic structure of the NF1 gene has not been elucidated. The 5′ end of NF1 is embedded in a CpG island containing a NotI restriction site, and the remainder of the gene lies in the adjacent 350-kb NotI fragment. In our efforts to develop a comprehensive screen for NF1 mutations, we have isolated genomic DNA clones that together harbor the entire NF1 cDNA sequence. We have identified all intron-exon boundaries of the coding region and established that it is composed of 59 exons. Furthermore, we have defined the 3′-untranslated region (3′-UTR) of the NF1 gene; it spans approximately 3.5 kb of genomic DNA sequence and is continuous with the stop codon. Oligonucleotide primer pairs synthesized from exon-flanking DNA sequences were used in the polymerase chain reaction with cloned, chromosome 17-specific genomic DNA as template to amplify NF1 exons 1 through 27b and the exon containing the 3′-UTR separately. This information should be useful for implementing a comprehensive NF1 mutation screen using genomic DNA as template. 41 refs., 3 figs., 2 tabs.

  6. Gene organization inside replication domains in mammalian genomes

    NASA Astrophysics Data System (ADS)

    Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain

    2012-11-01

    We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes into two categories depending on their CpG enrichment at the promoter, which can be considered a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders, suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression, so that the promoters of these transcriptionally active genes are located in the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.

  7. 3D genome structure modeling by Lorentzian objective function.

    PubMed

    Trieu, Tuan; Cheng, Jianlin

    2017-02-17

    The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and existing knowledge of the human chromosome and genome, such as chromosome territories and the cluster of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.
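
    The abstract above does not state the objective function itself. As a rough illustration of why a Lorentzian-shaped objective can tolerate noisy restraints, the sketch below scores a candidate structure with one bounded term per restraint. The formula c^2 / (c^2 + (d - target)^2), the function name lorentzian_score, the parameter c and the (i, j, target distance) restraint layout are all assumptions made for this example, not details taken from the paper or its tool.

```python
import math


def lorentzian_score(coords, restraints, c=1.0):
    """Score a candidate 3D structure against distance restraints.

    coords     : list of (x, y, z) points, one per genomic bin (hypothetical layout)
    restraints : list of (i, j, target_distance) tuples (hypothetical layout)
    c          : width of the Lorentzian term (assumed parameter)

    Each restraint contributes c^2 / (c^2 + (d_ij - target)^2), which equals 1
    when the restraint is met exactly and decays towards 0 as the violation grows.
    Because every term is bounded, a single badly violated (noisy) restraint
    cannot dominate the total score.
    """
    total = 0.0
    for i, j, target in restraints:
        d = math.dist(coords[i], coords[j])
        total += (c * c) / (c * c + (d - target) ** 2)
    return total


# Usage: three points and two restraints that are satisfied exactly.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
satisfied = [(0, 1, 1.0), (0, 2, 2.0)]
print(lorentzian_score(points, satisfied))
```

    A reconstruction method of this kind would maximize such a score over the coordinates; the optimizer is omitted here.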

  8. 3D genome structure modeling by Lorentzian objective function.

    PubMed

    Trieu, Tuan; Cheng, Jianlin

    2016-11-29

    The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and existing knowledge of the human chromosome and genome, such as chromosome territories and the cluster of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.

  9. The d4 gene family in the human genome

    SciTech Connect

    Chestkov, A.V.; Baka, I.D.; Kost, M.V.

    1996-08-15

    The d4 domain, a novel zinc finger-like structural motif, was first revealed in the rat neuro-d4 protein. Here we demonstrate that the d4 domain is conserved in evolution and that three related genes form a d4 family in the human genome. The human neuro-d4 is very similar to rat neuro-d4 at both the amino acid and the nucleotide levels. Moreover, the same splice variants have been detected among rat and human neuro-d4 transcripts. This gene has been localized on chromosome 19, and two other genes, members of the d4 family isolated by screening of the human genomic library at low stringency, have been mapped to chromosomes 11 and 14. The gene on chromosome 11 is the homolog of the ubiquitously expressed mouse gene ubi-d4/requiem, which is required for cell death after deprivation of trophic factors. A gene with a conserved d4 domain has been found in the genome of the nematode Caenorhabditis elegans. The conservation of d4 proteins from nematodes to vertebrates suggests that they have a general importance, but a diversity of d4 proteins expressed in vertebrate nervous systems suggests that some family members have special functions. 11 refs., 2 figs.

  10. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  11. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

    PubMed Central

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-01-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073

  12. Vive la différence: naming structural variants in the human reference genome.

    PubMed

    Seal, Ruth L; Wright, Mathew W; Gray, Kristian A; Bruford, Elspeth A

    2013-05-01

    The HUGO Gene Nomenclature Committee has approved gene symbols for the majority of protein-coding genes on the human reference genome. To adequately represent regions of complex structural variation, the Genome Reference Consortium now includes alternative representations of some of these regions as part of the reference genome. Here, we describe examples of how we name novel genes in these regions and how this nomenclature is displayed on our website, http://genenames.org.

  13. Exon structure of the human dystrophin gene

    SciTech Connect

    Roberts, R.G.; Coffey, A.J.; Bobrow, M.; Bentley, D.R.

    1993-05-01

    Application of a novel vectorette PCR approach to defining intron-exon boundaries has permitted completion of analysis of the exon structure of the largest and most complex known human gene. The authors present here a summary of the exon structure of the entire human dystrophin gene, together with the sizes of genomic HindIII fragments recognized by each exon, and (where available) GenBank accession numbers for adjacent intron sequences. 20 refs., 1 tab.

  14. Cancer genomics identifies disrupted epigenetic genes.

    PubMed

    Simó-Riudalbas, Laia; Esteller, Manel

    2014-06-01

    Latest advances in genome technologies have greatly advanced the discovery of epigenetic genes altered in cancer. The initial single candidate gene approaches have been coupled with newly developed epigenomic platforms to hasten the convergence of scientific discoveries and translational applications. Here, we present an overview of the evolution of cancer epigenomics and an updated catalog of disruptions in epigenetic pathways, whose misregulation can culminate in cancer. The creation of these basic mutational catalogs in cell lines and primary tumors will provide us with enough knowledge to move diagnostics and therapy from the laboratory bench to the bedside.

  15. Gene Conversion Shapes Linear Mitochondrial Genome Architecture

    PubMed Central

    Smith, David Roy; Keeling, Patrick J.

    2013-01-01

    Recently, it was shown that gene conversion between the ends of linear mitochondrial chromosomes can cause telomere expansion and the duplication of subtelomeric loci. However, it is not yet known how widespread this phenomenon is and how significantly it has impacted organelle genome architecture. Using linear mitochondrial DNAs and mitochondrial plasmids from diverse eukaryotes, we argue that telomeric recombination has played a major role in fashioning linear organelle chromosomes. We find that mitochondrial telomeres frequently expand into subtelomeric regions, resulting in gene duplications, homogenizations, and/or fragmentations. We suggest that these features are a product of subtelomeric gene conversion, provide a hypothetical model for this process, and employ genetic diversity data to support the idea that the greater the effective population size the greater the potential for gene conversion between subtelomeric loci. PMID:23572386

  16. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  17. Fast ancestral gene order reconstruction of genomes with unequal gene content.

    PubMed

    Feijão, Pedro; Araujo, Eloi

    2016-11-11

    During evolution, genomes are modified by large-scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that this particular structure is present in each ancestral genome, and then finding a set of non-conflicting adjacencies that optimizes some given function, usually trying to maximize total weight and minimize character changes in the tree. We call this type of method homology-based. In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, which arise in DCJ rearrangement scenarios. This method showed a better rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content
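
    The homology-based idea described in this abstract (collect gene adjacencies from the input genomes and weight them by how well they are supported) can be sketched as follows. This is only a minimal illustration, not the authors' method: the function names, the signed-gene string encoding ("-b" for a reversed gene), the "_t"/"_h" tail and head labels, and the use of a plain occurrence count as the weight are all assumptions made for the example. It handles single linear chromosomes only and does not model the intermediate-genome step.

```python
from collections import Counter


def extremities(gene):
    """Return (left, right) extremities of a signed gene as read left to right.

    A forward gene 'a' is read tail-to-head; a reversed gene '-a' head-to-tail.
    """
    name = gene.lstrip("+-")
    if gene.startswith("-"):
        return (name + "_h", name + "_t")
    return (name + "_t", name + "_h")


def adjacencies(chromosome):
    """Set of adjacencies (unordered extremity pairs) of a linear chromosome."""
    adj = set()
    for left, right in zip(chromosome, chromosome[1:]):
        adj.add(frozenset((extremities(left)[1], extremities(right)[0])))
    return adj


def adjacency_support(genomes):
    """Count in how many input genomes each adjacency occurs (a crude weight)."""
    counts = Counter()
    for chromosome in genomes:
        counts.update(adjacencies(chromosome))
    return counts


# Usage: the third genome is the first with the block 'a b' inverted,
# so the a-b adjacency is still conserved.
genomes = [["a", "b", "c"], ["a", "b", "-c"], ["-b", "-a", "c"]]
for adjacency, n in adjacency_support(genomes).most_common():
    print(sorted(adjacency), n)
```

    Representing an adjacency as an unordered pair of gene extremities makes it independent of reading direction, which is why the inverted block in the third genome still supports the same a-b adjacency.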

  18. Plant noncoding RNA gene discovery by "single-genome comparative genomics".

    PubMed

    Chen, Chong-Jian; Zhou, Hui; Chen, Yue-Qin; Qu, Liang-Hu; Gautheret, Daniel

    2011-03-01

    Plant genomes have undergone multiple rounds of duplications that contributed massively to the growth of gene families. The structure of resulting families has been studied in depth for protein-coding genes. However, little is known about the impact of duplications on noncoding RNA (ncRNA) genes. Here we perform a systematic analysis of duplicated regions in the rice genome in search of such ncRNA repeats. We observe that, just like their protein counterparts, most ncRNA genes have undergone multiple duplications that left visible sequence conservation footprints. The extent of ncRNA gene duplication in plants is such that these sequence footprints can be exploited for the discovery of novel ncRNA gene families on a large scale. We developed an SVM model that is able to retrieve likely ncRNA candidates among the 100,000+ repeat families in the rice genome, with a reasonably low false-positive discovery rate. Among the nearly 4000 ncRNA families predicted by this means, only 90 correspond to putative snoRNA or miRNA families. About half of the remaining families are classified as structured RNAs. New candidate ncRNAs are particularly enriched in UTR and intronic regions. Interestingly, 89% of the putative ncRNA families do not produce a detectable signal when their sequences are compared to another grass genome such as maize. Our results show that a large fraction of rice ncRNA genes are present in multiple copies and are species-specific or of recent origin. Intragenome comparison is a unique and potent source for the computational annotation of this major class of ncRNA.

  19. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-04

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered in synthetic biology.

  20. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of the proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  1. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    PubMed

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of the proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  2. Structural genomics of pathogenic protozoa: an overview.

    PubMed

    Fan, Erkang; Baker, David; Fields, Stanley; Gelb, Michael H; Buckner, Frederick S; Van Voorhis, Wesley C; Phizicky, Eric; Dumont, Mark; Mehlin, Christopher; Grayhack, Elizabeth; Sullivan, Mark; Verlinde, Christophe; Detitta, George; Meldrum, Deirdre R; Merritt, Ethan A; Earnest, Thomas; Soltis, Michael; Zucker, Frank; Myler, Peter J; Schoenfeld, Lori; Kim, David; Worthey, Liz; Lacount, Doug; Vignali, Marissa; Li, Jizhen; Mondal, Somnath; Massey, Archna; Carroll, Brian; Gulde, Stacey; Luft, Joseph; Desoto, Larry; Holl, Mark; Caruthers, Jonathan; Bosch, Jürgen; Robien, Mark; Arakaki, Tracy; Holmes, Margaret; Le Trong, Isolde; Hol, Wim G J

    2008-01-01

    The Structural Genomics of Pathogenic Protozoa (SGPP) Consortium aimed to determine crystal structures of proteins from trypanosomatid and malaria parasites in a high throughput manner. The pipeline of target selection, protein production, crystallization, and structure determination, is sketched. Special emphasis is given to a number of technology developments including domain prediction, the use of "co-crystallants," and capillary crystallization. "Fragment cocktail crystallography" for medical structural genomics is also described.

  3. Genomic signatures of germline gene expression.

    PubMed

    McVicker, Graham; Green, Phil

    2010-11-01

    Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.

  4. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species

    PubMed Central

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-01-01

    Background: The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. Results: The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. Conclusion: The observed differences in genomic structure between C. japonica and other land plants, including

  5. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

    PubMed

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-06-23

    The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the

  6. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  7. Hybrid Vigour? Genes, Genomics, and History

    PubMed Central

    BIVINS, ROBERTA

    2010-01-01

    Is the gene ‘special’ for historians? What effects, if any, has the notion of the ‘gene’ had on our understanding of history? Certainly, there is a widespread public and professional perception that genetics and history are or should be in dialogue with each other in some way. But historians and geneticists view history and genetics very differently – and assume very different relationships between them. And public perceptions of genes, genetics, genomics, and indeed the nature and meanings of ‘history’ differ yet again. Here, in looking at the meaning, and the implications – the significance – of the gene (and its corollary scientific disciplines and approaches) specifically to historians, I will focus on two aspects of the discourse. First, I will examine the ways in which historians have thus far approached genes and genetics, and the impact such studies have had on the field. There is considerable overlap between the subject matter of genetics/genomics and many of the most widely used analytic categories of contemporary historiography – race, gender, sexuality, ethnicity, (dis)ability, among others. Yet the impact of genetics and genomics on society has been studied principally by anthropologists, sociologists and ethicists. Only two historical sub-disciplines have engaged with the rise of genetics to any significant degree: the histories of science and of medicine. What does this indicate or suggest? Second, I will explore the impact of the ‘gene’ and genetic understandings (of, for example, the body, health, disease, identity, the family, and evolution) on public conceptions of history itself. PMID:20357894

  8. Structure and organization of a 25 kbp region of the genome of the photosynthetic green sulfur bacterium Chlorobium vibrioforme containing Mg-chelatase encoding genes.

    PubMed

    Petersen, B L; Møller, M G; Stummann, B M; Henningsen, K W

    1998-01-01

    A region comprising approximately 25 kbp of the genome of the strictly anaerobic and obligate photosynthetic green sulfur bacterium Chlorobium vibrioforme has been mapped, subcloned and partly sequenced. Approximately 15 kbp have been sequenced in its entirety and three genes with significant homology and feature similarity to the bchI, -D and -H genes and the chlI, -D and -H genes of Rhodobacter and Synechocystis strain PCC6803, respectively, which encode magnesium chelatase subunits, have been identified. Magnesium chelatase catalyzes the insertion of Mg2+ into protoporphyrin IX, and is the first enzyme unique to the (bacterio)chlorophyll specific branch of the porphyrin biosynthetic pathway. The organization of the three Mg-chelatase encoding genes is unique to Chlorobium and suggests that the magnesium chelatase of C. vibrioforme is encoded by a single operon. The analyzed 25 kbp region contains five additional open reading frames, two of which display significant homology and feature similarity to genes encoding lipoamide dehydrogenase and genes with function in purine synthesis, and another three display significant homology to open reading frames with unknown function in distantly related bacteria. Putative E. coli sigma 70-like promoter sequences, ribosome binding sequences and rho-independent transcriptional stop signals within the sequenced 15 kbp region are related to the identified genes and orfs. Southern analysis, restriction mapping and partial sequencing of the remaining ca. 10 kbp of the analyzed 25 kbp region have shown that this part includes the hemA, -C, -D and -B genes (MOBERG and AVISSAR 1994), which encode enzymes with function in the early part of the biosynthetic pathway of porphyrins.

  9. Structural and operational complexity of the Geobacter sulfurreducens genome

    PubMed Central

    Qiu, Yu; Cho, Byung-Kwan; Park, Young Seoub; Lovley, Derek; Palsson, Bernhard Ø.; Zengler, Karsten

    2010-01-01

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts inversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes. PMID:20592237

  10. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  11. Afrobatrachian mitochondrial genomes: genome reorganization, gene rearrangement mechanisms, and evolutionary trends of duplicated and rearranged genes

    PubMed Central

    2013-01-01

    Background: Mitochondrial genomic (mitogenomic) reorganizations are rarely found in closely-related animals, yet drastic reorganizations have been found in the Ranoides frogs. The phylogenetic relationships of the three major ranoid taxa (Natatanura, Microhylidae, and Afrobatrachia) have been problematic, and mitogenomic information for afrobatrachians has not been available. Several molecular models for mitochondrial (mt) gene rearrangements have been proposed, but observational evidence has been insufficient to evaluate them. Furthermore, evolutionary trends in rearranged mt genes have not been well understood. To gain molecular and phylogenetic insights into these issues, we analyzed the mt genomes of four afrobatrachian species (Breviceps adspersus, Hemisus marmoratus, Hyperolius marmoratus, and Trichobatrachus robustus) and performed molecular phylogenetic analyses. Furthermore we searched for two evolutionary patterns expected in the rearranged mt genes of ranoids. Results: Extensively reorganized mt genomes having many duplicated and rearranged genes were found in three of the four afrobatrachians analyzed. In fact, Breviceps has the largest known mt genome among vertebrates. Although the kinds of duplicated and rearranged genes differed among these species, a remarkable gene rearrangement pattern of non-tandemly copied genes situated within tandemly-copied regions was commonly found. Furthermore, the existence of concerted evolution was observed between non-neighboring copies of triplicated 12S and 16S ribosomal RNA regions. Conclusions: Phylogenetic analyses based on mitogenomic data support a close relationship between Afrobatrachia and Microhylidae, with their estimated divergence 100 million years ago consistent with present-day endemism of afrobatrachians on the African continent. The afrobatrachian mt data supported the first tandem and second non-tandem duplication model for mt gene rearrangements and the recombination-based model for concerted

  12. Structure and expression of the gene coding for the alpha-subunit of DNA-dependent RNA polymerase from the chloroplast genome of Zea mays.

    PubMed Central

    Ruf, M; Kössel, H

    1988-01-01

    The rpoA gene coding for the alpha-subunit of DNA-dependent RNA polymerase located on the DNA of Zea mays chloroplasts has been characterized with respect to its position on the chloroplast genome and its nucleotide sequence. The amino acid sequence derived for a 39 kDa polypeptide shows strong homology with sequences derived from the rpoA genes of other chloroplast species and with the amino acid sequence of the alpha-subunit from E. coli RNA polymerase. Transcripts of the rpoA gene were identified by Northern hybridization and characterized by S1 mapping using total RNA isolated from maize chloroplasts. Antibodies raised against a synthetic C-terminal heptapeptide show cross reactivity with a 39 kDa polypeptide contained in the stroma fraction of maize chloroplasts. It is concluded that the rpoA gene is a functional gene and that, therefore, at least the alpha-subunit of plastidic RNA polymerase is expressed in chloroplasts. PMID:3399379

  13. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation

    SciTech Connect

    Kas, K.; Meyen, E.; Van De Ven, W.J.M.

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head to head configuration separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to contain 166 amino acids and is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation. 14 refs., 2 figs.
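
    The GT/AG rule mentioned in this record says that an intron begins with the dinucleotide GT and ends with AG. A minimal sketch of such a check follows; the function name and the three intron sequences are hypothetical and invented for illustration, and they are not NOF sequences.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intron sequence starts with GT and ends with AG."""
    # Normalise case and accept RNA input by mapping U to T.
    seq = intron.upper().replace("U", "T")
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical intron sequences, made up for this example only.
example_introns = {
    "intron_1": "GTAAGTCTTTCCTTCAG",
    "intron_2": "gtgagtttctcccacag",
    "intron_3": "ATAAGTCCTTTTCTCAC",
}

results = {name: follows_gt_ag_rule(seq) for name, seq in example_introns.items()}
for name, ok in results.items():
    print(name, "follows GT/AG" if ok else "does not follow GT/AG")
```

    A real analysis would take intron coordinates from an annotation file and extract the boundary dinucleotides from the genomic sequence, rather than from hand-typed strings.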

  14. Translational control genes in the sea urchin genome.

    PubMed

    Morales, Julia; Mulner-Lorillon, Odile; Cosson, Bertrand; Morin, Emmanuelle; Bellé, Robert; Bradham, Cynthia A; Beane, Wendy S; Cormier, Patrick

    2006-12-01

    Sea urchin eggs and early cleavage stage embryos provide an example of regulated gene expression at the level of translation. The availability of the sea urchin genome offers the opportunity to investigate the "translational control" toolkit of this model system. The annotation of the genome reveals that most of the factors implicated in translational control are encoded by nonredundant genes in echinoderms, an advantage for future functional studies. In this paper, we focus on translation factors that have been shown or suggested to play a crucial role in the cell cycle and development of sea urchin embryos. Addressing the cap-binding translational control, three closely related eIF4E genes (class I, II, III) are present, whereas its repressor 4E-BP and its activator eIF4G are both encoded by one gene. Analysis of the class III eIF4E proteins in various phyla shows an echinoderm-specific amino acid substitution. Furthermore, an interaction site between eIF4G and poly(A)-binding protein is uncovered in the sea urchin eIF4G proteins and is conserved in metazoan evolution. In silico screening of the sea urchin genome has uncovered potential new regulators of eIF4E sharing the common eIF4E recognition motif. Taken together, these data provide new insights regarding the strong requirement of cap-dependent translation following fertilization. The genome analysis gives insights on the complexity of eEF1B structure and motifs of functional relevance, involved in the translational control of gene expression at the level of elongation. Finally, because deregulation of the translation process can lead to diseases and tumor formation in humans, the sea urchin orthologs of human genes implicated in human diseases and signaling pathways regulating translation are also discussed.

  15. Genomic organization of the human lysosomal acid lipase gene (LIPA)

    SciTech Connect

    Aslandis, C.; Klima, H.; Lackner, K.J.; Schmitz, G.

    1994-03-15

    Defects in the human lysosomal acid lipase gene are responsible for cholesteryl ester storage disease (CESD) and Wolman disease. Exon skipping as the cause for CESD has been demonstrated. The authors present here a summary of the exon structure of the entire human lysosomal acid lipase gene consisting of 10 exons, together with the sizes of genomic EcoRI and SacI fragments hybridizing to each exon. In addition, the DNA sequence of the putative promoter region is presented. The EMBL accession numbers for adjacent intron sequences are given. 7 refs., 2 figs., 1 tab.

  16. Visualization of RNA structure models within the Integrative Genomics Viewer.

    PubMed

    Busan, Steven; Weeks, Kevin M

    2017-07-01

    Analyses of the interrelationships between RNA structure and function are increasingly important components of genomic studies. The SHAPE-MaP strategy enables accurate RNA structure probing and realistic structure modeling of kilobase-length noncoding RNAs and mRNAs. Existing tools for visualizing RNA structure models are not suitable for efficient analysis of long, structurally heterogeneous RNAs. In addition, structure models are often advantageously interpreted in the context of other experimental data and gene annotation information, for which few tools currently exist. We have developed a module within the widely used and well supported open-source Integrative Genomics Viewer (IGV) that allows visualization of SHAPE and other chemical probing data, including raw reactivities, data-driven structural entropies, and data-constrained base-pair secondary structure models, in context with linear genomic data tracks. We illustrate the usefulness of visualizing RNA structure in the IGV by exploring structure models for a large viral RNA genome, comparing bacterial mRNA structure in cells with its structure under cell- and protein-free conditions, and comparing a noncoding RNA structure modeled using SHAPE data with a base-pairing model inferred through sequence covariation analysis. © 2017 Busan and Weeks; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  17. Molecular Characterization of Soybean Pterocarpan 2-Dimethylallyltransferase in Glyceollin Biosynthesis: Local Gene and Whole-Genome Duplications of Prenyltransferase Genes Led to the Structural Diversity of Soybean Prenylated Isoflavonoids.

    PubMed

    Yoneyama, Keisuke; Akashi, Tomoyoshi; Aoki, Toshio

    2016-12-01

    Soybean (Glycine max) accumulates several prenylated isoflavonoid phytoalexins, collectively referred to as glyceollins. Glyceollins (I, II, III, IV and V) possess modified pterocarpan skeletons with C5 moieties from dimethylallyl diphosphate, and they are commonly produced from (6aS, 11aS)-3,9,6a-trihydroxypterocarpan [(-)-glycinol]. The metabolic fate of (-)-glycinol is determined by the enzymatic introduction of a dimethylallyl group into C-4 or C-2, which is reportedly catalyzed by regiospecific prenyltransferases (PTs). 4-Dimethylallyl (-)-glycinol and 2-dimethylallyl (-)-glycinol are precursors of glyceollin I and other glyceollins, respectively. Although multiple genes encoding (-)-glycinol biosynthetic enzymes have been identified, those involved in the later steps of glyceollin formation mostly remain unidentified, except for (-)-glycinol 4-dimethylallyltransferase (G4DT), which is involved in glyceollin I biosynthesis. In this study, we identified four genes that encode isoflavonoid PTs, including (-)-glycinol 2-dimethylallyltransferase (G2DT), using homology-based in silico screening and biochemical characterization in yeast expression systems. Transcript analyses illustrated that changes in G2DT gene expression were correlated with the induction of glyceollins II, III, IV and V in elicitor-treated soybean cells and leaves, suggesting its involvement in glyceollin biosynthesis. Moreover, the genomic signatures of these PT genes revealed that G4DT and G2DT are paralogs derived from whole-genome duplications of the soybean genome, whereas other PT genes [isoflavone dimethylallyltransferase 1 (IDT1) and IDT2] were derived via local gene duplication on soybean chromosome 11.

  18. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards

    PubMed Central

    Rappaport, Noa; Hadar, Rotem; Plaschkes, Inbar; Iny Stein, Tsippi; Rosen, Naomi; Kohn, Asher; Twik, Michal; Safran, Marilyn

    2017-01-01

    A major challenge in understanding gene regulation is the unequivocal identification of enhancer elements and uncovering their connections to genes. We present GeneHancer, a novel database of human enhancers and their inferred target genes, in the framework of GeneCards. First, we integrated a total of 434 000 reported enhancers from four different genome-wide databases: the Encyclopedia of DNA Elements (ENCODE), the Ensembl regulatory build, the functional annotation of the mammalian genome (FANTOM) project and the VISTA Enhancer Browser. Employing an integration algorithm that aims to remove redundancy, GeneHancer portrays 285 000 integrated candidate enhancers (covering 12.4% of the genome), 94 000 of which are derived from more than one source, and each assigned an annotation-derived confidence score. GeneHancer subsequently links enhancers to genes, using: tissue co-expression correlation between genes and enhancer RNAs, as well as enhancer-targeted transcription factor genes; expression quantitative trait loci for variants within enhancers; and capture Hi-C, a promoter-specific genome conformation assay. The individual scores based on each of these four methods, along with gene–enhancer genomic distances, form the basis for GeneHancer’s combinatorial likelihood-based scores for enhancer–gene pairing. Finally, we define ‘elite’ enhancer–gene relations reflecting both a high-likelihood enhancer definition and a strong enhancer–gene association. GeneHancer predictions are fully integrated in the widely used GeneCards Suite, whereby candidate enhancers and their annotations are displayed on every relevant GeneCard. This assists in the mapping of non-coding variants to enhancers, and via the linked genes, forms a basis for variant–phenotype interpretation of whole-genome sequences in health and disease. Database URL: http://www.genecards.org/ PMID:28605766

  19. Genomic organization of the human skeletal muscle sodium channel gene

    SciTech Connect

    George, A.L. Jr.; Iyer, G.S.; Kleinfield, R.; Kallen, R.G.; Barchi, R.L.

    1993-03-01

    Voltage-dependent sodium channels are essential for normal membrane excitability and contractility in adult skeletal muscle. The gene encoding the principal sodium channel α-subunit isoform in human skeletal muscle (SCN4A) has recently been shown to harbor point mutations in certain hereditary forms of periodic paralysis. The authors have carried out an analysis of the detailed structure of this gene including delineation of intron-exon boundaries by genomic DNA cloning and sequence analysis. The complete coding region of SCN4A is found in 32.5 kb of genomic DNA and consists of 24 exons (54 to >2.2 kb) and 23 introns (97 bp-4.85 kb). The exon organization of the gene shows no relationship to the predicted functional domains of the channel protein and splice junctions interrupt many of the transmembrane segments. The genomic organization of sodium channels may have been partially conserved during evolution as evidenced by the observation that 10 of the 24 splice junctions in SCN4A are positioned in homologous locations in a putative sodium channel gene in Drosophila (para). The information presented here should be extremely useful both for further identifying sodium channel mutations and for gaining a better understanding of sodium channel evolution. 39 refs., 5 figs., 2 tabs.

  20. Structure and variation of the mitochondrial genome of fishes.

    PubMed

    Satoh, Takashi P; Miya, Masaki; Mabuchi, Kohji; Nishida, Mutsumi

    2016-09-07

    The mitochondrial (mt) genome has been used as an effective tool for phylogenetic and population genetic analyses in vertebrates. However, the structure and variability of the vertebrate mt genome are not well understood. A potential strategy for improving our understanding is to conduct a comprehensive comparative study of large mt genome data. The aim of this study was to characterize the structure and variability of the fish mt genome through comparative analysis of large datasets. An analysis of the secondary structure of proteins for 250 fish species (248 ray-finned and 2 cartilaginous fishes) illustrated that cytochrome c oxidase subunits (COI, COII, and COIII) and a cytochrome bc1 complex subunit (Cyt b) had substantial amino acid conservation. Among the four proteins, COI was the most conserved, as more than half of all amino acid sites were invariable among the 250 species. Our models identified 43 and 58 stems within 12S rRNA and 16S rRNA, respectively, with larger numbers than proposed previously for vertebrates. The models also identified 149 and 319 invariable sites in 12S rRNA and 16S rRNA, respectively, in all fishes. In particular, the present result verified that a region corresponding to the peptidyl transferase center in prokaryotic 23S rRNA, which is homologous to mt 16S rRNA, is also conserved in fish mt 16S rRNA. Concerning the gene order, we found 35 variations (in 32 families) that deviated from the common gene order in vertebrates. These gene rearrangements were mostly observed in the area spanning the ND5 gene to the control region as well as two tRNA gene cluster regions (IQM and WANCY regions). Although many of such gene rearrangements were unique to a specific taxon, some were shared polyphyletically between distantly related species. Through a large-scale comparative analysis of 250 fish species mt genomes, we elucidated various structural aspects of the fish mt genome and the encoded genes. The present results will be important for

  1. Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries.

    PubMed Central

    Ruberti, I; Fragapane, P; Pierandrei-Amaldi, P; Beccari, E; Amaldi, F; Bozzoni, I

    1982-01-01

    Using a cDNA clone for the histone H3 we have isolated, from two genomic libraries of Xenopus laevis and Xenopus tropicalis, clones containing four different histone gene clusters. The structural organization of X. laevis histone genes has been determined by restriction mapping, Southern blot hybridization and translation of the mRNAs which hybridize to the various restriction fragments. The arrangement of the histone genes in X. tropicalis has been determined by Southern analysis using X. laevis genomic fragments, containing individual genes, as probes. Histone genes are clustered in the genome of X. laevis and X. tropicalis and, compared to invertebrates, show a higher organization heterogeneity as demonstrated by structural analysis of the four genomic clones. In fact, the order of the genes within individual clusters is not conserved. PMID:6296782

  2. Residual dipolar couplings: synergy between NMR and structural genomics.

    PubMed

    Al-Hashimi, Hashim M; Patel, Dinshaw J

    2002-01-01

    Structural genomics is on a quest for the structure and function of a significant fraction of gene products. Current efforts are focusing on structure determination of single-domain proteins, which can readily be targeted by X-ray crystallography, NMR spectroscopy and computational homology modeling. However, comprehensive association of gene products with functions also requires systematic determination of more complex protein structures and other biomolecules participating in cellular processes such as nucleic acids, and characterization of biomolecular interactions and dynamics relevant to function. Such NMR investigations are becoming more feasible, not only due to recent advances in NMR methodology, but also because structural genomics is providing valuable structural information and new experimental and computational tools. The measurement of residual dipolar couplings in partially oriented systems and other new NMR methods will play an important role in this synergistic relationship between NMR and structural genomics. Both an expansion in the domain of NMR application, and important contributions to future structural genomics efforts can be anticipated.

  3. Gene and genome construction in yeast.

    PubMed

    Gibson, Daniel G

    2011-04-01

    The yeast Saccharomyces cerevisiae has the capacity to take up and assemble dozens of different overlapping DNA molecules in one transformation event. These DNA molecules can be single-stranded oligonucleotides, to produce gene-sized fragments, or double-stranded DNA fragments, to produce molecules up to hundreds of kilobases in length, including complete bacterial genomes. This unit presents protocols for designing the DNA molecules to be assembled, transforming them into yeast, and confirming their assembly. © 2011 by John Wiley & Sons, Inc.

  4. The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community.

    PubMed

    Arnaud, Martha B; Chibucos, Marcus C; Costanzo, Maria C; Crabtree, Jonathan; Inglis, Diane O; Lotia, Adil; Orvis, Joshua; Shah, Prachi; Skrzypek, Marek S; Binkley, Gail; Miyasato, Stuart R; Wortman, Jennifer R; Sherlock, Gavin

    2010-01-01

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at aspergillus-curator@genome.stanford.edu.

  5. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  6. Landscape genomics of Populus trichocarpa: the role of hybridization, limited gene flow, and natural selection in shaping patterns of population structure.

    PubMed

    Geraldes, Armando; Farzaneh, Nima; Grassa, Christopher J; McKown, Athena D; Guy, Robert D; Mansfield, Shawn D; Douglas, Carl J; Cronk, Quentin C B

    2014-11-01

    Populus trichocarpa is an ecologically important tree across western North America. We used a large population sample of 498 accessions over a wide geographical area genotyped with a 34K Populus SNP array to quantify geographical patterns of genetic variation in this species (landscape genomics). We present evidence that three processes contribute to the observed patterns: (1) introgression from the sister species P. balsamifera, (2) isolation by distance (IBD), and (3) natural selection. Introgression was detected only at the margins of the species' distribution. IBD was significant across the sampled area as a whole, but no evidence of restricted gene flow was detected in a core of drainages from southern British Columbia (BC). We identified a large number of FST outliers. Gene Ontology analyses revealed that FST outliers are overrepresented in genes involved in circadian rhythm and response to red/far-red light when the entire dataset is considered, whereas in southern BC heat response genes are overrepresented. We also identified strong correlations between geoclimate variables and allele frequencies at FST outlier loci that provide clues regarding the selective pressures acting at these loci.

  7. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    NASA Astrophysics Data System (ADS)

    Yanai, Itai; Camacho, Carlos J.; Delisi, Charles

    2000-09-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications.
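
    The kind of duplication-and-modification process described in this abstract can be illustrated with a toy simulation. The sketch below is not the authors' model: the function name `simulate_families` and the parameters `n_precursors`, `n_final`, and `p_new_family` (the chance that a duplicate has mutated enough to found a new family) are assumptions made for illustration only.

```python
import random
from collections import Counter

def simulate_families(n_precursors, n_final, p_new_family, seed=0):
    """Grow a genome by random duplication; return {family size: number of families}."""
    rng = random.Random(seed)
    # family[i] is the family label of gene i; each precursor founds its own family
    family = list(range(n_precursors))
    next_label = n_precursors
    while len(family) < n_final:
        parent = rng.randrange(len(family))   # duplicate a gene chosen uniformly at random
        if rng.random() < p_new_family:
            # hypothetical rule: the copy has diverged enough to start a new family
            family.append(next_label)
            next_label += 1
        else:
            family.append(family[parent])     # the copy stays in its parent's family
    members_per_family = Counter(family)              # family label -> member count
    return Counter(members_per_family.values())       # family size -> family count

hist = simulate_families(n_precursors=50, n_final=2000, p_new_family=0.1)
for size in sorted(hist)[:5]:
    print(size, hist[size])
```

    Because larger families are more likely to be picked for duplication, a rule of this kind tends to produce many small families and a few large ones; whether it reproduces the distributions reported in the abstract would have to be checked against real genome data.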

  8. Elucidation of operon structures across closely related bacterial genomes.

    PubMed

    Zhou, Chuan; Ma, Qin; Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
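
    The graph construction described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene names, the `gene_to_ogg` mapping, and the simple degree-based `topology` labels are all hypothetical.

```python
from collections import defaultdict

def build_ogg_graph(operons, gene_to_ogg):
    """Connect two OGGs when their genes are immediate neighbors in an operon."""
    graph = defaultdict(set)
    for operon in operons:                  # operons pooled from all genomes
        for gene in operon:
            graph[gene_to_ogg[gene]]        # register the node even if it stays isolated
        for left, right in zip(operon, operon[1:]):
            a, b = gene_to_ogg[left], gene_to_ogg[right]
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def connected_components(graph):
    """Return the connected components as sorted lists of OGG names."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:                        # iterative depth-first search
            node = stack.pop()
            members.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(members))
    return components

def topology(graph, component):
    """Label a component as 'singleton', 'path', 'star', 'tree', or 'cyclic'."""
    n = len(component)
    degrees = [len(graph[node]) for node in component]
    if n == 1:
        return "singleton"
    if sum(degrees) // 2 != n - 1:          # a connected graph is a tree iff |E| = n - 1
        return "cyclic"
    if max(degrees) <= 2:
        return "path"
    if max(degrees) == n - 1:
        return "star"
    return "tree"

# Hypothetical toy data: genes from three genomes (A, B, C) mapped to OGGs.
gene_to_ogg = {
    "A_rpsJ": "rpsJ", "A_rplC": "rplC", "A_rplD": "rplD",
    "B_rpsJ": "rpsJ", "B_rplC": "rplC",
    "A_perm": "permease", "A_sbp1": "sbp1",
    "B_perm": "permease", "B_sbp2": "sbp2",
    "C_perm": "permease", "C_sbp3": "sbp3",
}
operons = [
    ["A_rpsJ", "A_rplC", "A_rplD"],
    ["B_rpsJ", "B_rplC"],
    ["A_perm", "A_sbp1"],
    ["B_sbp2", "B_perm"],
    ["C_perm", "C_sbp3"],
]
graph = build_ogg_graph(operons, gene_to_ogg)
for component in connected_components(graph):
    print(topology(graph, component), component)
```

    In this toy input the ribosomal OGGs form a path and the permease sits at the center of a star of substrate-binding proteins, mirroring the two component types named in the abstract.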

  9. Elucidation of Operon Structures across Closely Related Bacterial Genomes

    PubMed Central

    Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components. PMID:24959722

  10. [GSDS: a gene structure display server].

    PubMed

    Guo, An-Yuan; Zhu, Qi-Hui; Chen, Xin; Luo, Jing-Chu

    2007-08-01

    We developed a web server, GSDS (Gene Structure Display Server), for drawing schematic diagrams of gene structure. Users can submit three types of data: CDS and genomic sequences, NCBI GenBank accession numbers or GIs, or exon positions on a gene. GSDS uses this information to obtain the gene structure and draw a diagram of it. Users can also designate special regions to mark on the gene structure diagram. The output is a picture in PNG or SVG format. Clicking the picture in PNG format shows the corresponding sequence in a new window. A Chinese version of the main page is also available. GSDS is available at http://gsds.cbi.pku.edu.cn/.

  11. Phylogenetic analysis of the mitochondrial genomes and nuclear rRNA genes of ticks reveals a deep phylogenetic structure within the genus Haemaphysalis and further elucidates the polyphyly of the genus Amblyomma with respect to Amblyomma sphenodonti and Amblyomma elaphense.

    PubMed

    Burger, Thomas D; Shao, Renfu; Barker, Stephen C

    2013-06-01

    We sequenced the entire mitochondrial genomes of 3 species of metastriate ticks: Haemaphysalis formosensis, H. parva, and Amblyomma cajennense. We also sequenced two thirds (ca. 9500bp) of the mitochondrial genomes of H. humerosa and H. hystricis. We used these 5 mitochondrial genome sequences together with the 13 tick mitochondrial genomes we sequenced previously and the 2 tick mitochondrial genomes sequenced by Black and Roehrdanz (1998), as well as the nuclear rRNA genes from 84 ticks and mites, in phylogenetic analyses. Our analyses reveal deep phylogenetic structure within the genus Haemaphysalis, with at least 2 species, H. parva and H. inermis that are highly divergent from the rest of the genus Haemaphysalis. We identify a region of the 18S rRNA gene which correlates with this division of the genus Haemaphysalis as well as a novel insertion in the mitochondrial genome of H. parva. We reject the hypotheses of Hoogstraal and Aeschlimann (1982) and Barker and Murrell (2004) on the relationships among metastriate genera. Instead, our analysis provides further evidence for the division of the Metastriata into 2 major lineages: (i) Amblyomma s.s. plus Rhipicephalinae (i.e. Rhipicephalus, Hyalomma, Rhipicentor, and Dermacentor); and (ii) Haemaphysalis plus Bothriocroton plus Amblyomma sphenodonti. We also provide further evidence for the polyphyly of the genus Amblyomma with respect to A. sphenodonti and A. elaphense. The most likely position of A. elaphense is sister to the rest of the Metastriata; the most likely position of A. sphenodonti is sister to the genus Bothriocroton. These 2 species do not belong in the genus Amblyomma, and we propose that new genera are required for A. sphenodonti and A. elaphense. Copyright © 2013 Elsevier GmbH. All rights reserved.

  12. Putative essential and core-essential genes in Mycoplasma genomes.

    PubMed

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first "synthetic life", is an important genus in the emerging field of synthetic biology. However, the essential genes, an important concept in synthetic biology, of both M. mycoides and M. capricolum, as well as of 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information on biased gene strand distribution, homology search, and the codon adaptation index. The algorithm, which achieved accuracies of 80.8% and 78.9% in self-consistency and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in the available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes.

  13. Putative essential and core-essential genes in Mycoplasma genomes

    PubMed Central

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first “synthetic life”, is an important genus in the emerging field of synthetic biology. However, the essential genes, an important concept in synthetic biology, of both M. mycoides and M. capricolum, as well as of 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information on biased gene strand distribution, homology search, and the codon adaptation index. The algorithm, which achieved accuracies of 80.8% and 78.9% in self-consistency and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in the available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes. PMID:22355572

  14. Chapter 6: Structural variation and medical genomics.

    PubMed

    Raphael, Benjamin J

    2012-01-01

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  15. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708

  16. INTEGRATE: gene fusion discovery using whole genome and transcriptome data.

    PubMed

    Zhang, Jin; White, Nicole M; Schmidt, Heather K; Fulton, Robert S; Tomlinson, Chad; Warren, Wesley C; Wilson, Richard K; Maher, Christopher A

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.

  17. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  18. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio).

    PubMed

    Dong, Chuanju; Chen, Lin; Feng, Jingyan; Xu, Jian; Mahboob, Shahid; Al-Ghanim, Khalid; Li, Xuejun; Xu, Peng

    2016-01-01

    Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that the aqp gene family in other teleosts has been lost over their evolutionary history because the functions of the genes are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family provides an essential genomic

  19. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio)

    PubMed Central

    Feng, Jingyan; Xu, Jian; Mahboob, Shahid; Al-Ghanim, Khalid; Li, Xuejun

    2016-01-01

    Background: Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. Results: In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that the aqp gene family in other teleosts has been lost over their evolutionary history because the functions of the genes are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. Conclusions: To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family

  20. Genomic analysis and gene structure of the plant carotenoid dioxygenase 4 family: a deeper study in Crocus sativus and its allies.

    PubMed

    Ahrazem, Oussama; Trapero, Almudena; Gómez, M Dolores; Rubio-Moraga, Angela; Gómez-Gómez, Lourdes

    2010-10-01

    The plastoglobule-targeted enzyme carotenoid cleavage dioxygenase (CCD4) mediates the formation of volatile C13 ketones, such as β-ionone, by cleaving the C9-C10 and C9'-C10' double bonds of cyclic carotenoids. Here, we report the isolation and analysis of CCD4 genomic DNA regions in Crocus sativus. Different CCD4 alleles have been identified: CsCCD4a which is found with and without an intron and CsCCD4b that showed the presence of a unique intron. The presence of different CCD4 alleles was also observed in other Crocus species. Furthermore, comparison of the locations of CCD4 introns within the coding region with CCD4 genes from other plant species suggests that independent gain/losses have occurred. The comparison of the promoter region of CsCCD4a and CsCCD4b with available CCD4 gene promoters from other plant species highlighted the conservation of cis-elements involved in light response, heat stress, as well as the absence and unique presence of cis-elements involved in circadian regulation and low temperature responses, respectively. Functional characterization of the Crocus sativus CCD4a promoter using Arabidopsis plants stably transformed with a DNA fragment of 1400 base pairs (P-CsCCD4a) fused to the β-glucuronidase (GUS) reporter gene showed that this sequence was sufficient to drive GUS expression in the flower, in particular high levels were detected in pollen.

  1. Whole-Genome Analysis of Gene Conversion Events

    NASA Astrophysics Data System (ADS)

    Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross; Miller, Webb

    Gene conversion events are often overlooked in analyses of genome evolution. In a conversion event, an interval of DNA sequence (not necessarily containing a gene) overwrites a highly similar sequence. The event creates relationships among genomic intervals that can confound attempts to identify orthologs and to transfer functional annotation between genomes. Here we examine 1,112,202 paralogous pairs of human genomic intervals, and detect conversion events in about 13.5% of them. Properties of the putative gene conversions are analyzed, such as the lengths of the paralogous pairs and the spacing between their sources and targets. Our approach is illustrated using conversion events in the beta-globin gene cluster.

  2. A Genome-Wide Survey of Switchgrass Genome Structure and Organization

    PubMed Central

    Sharma, Manoj K.; Sharma, Rita; Cao, Peijian; Jenkins, Jerry; Bartley, Laura E.; Qualls, Morgan; Grimwood, Jane; Schmutz, Jeremy; Rokhsar, Daniel; Ronald, Pamela C.

    2012-01-01

    The perennial grass, switchgrass (Panicum virgatum L.), is a promising bioenergy crop and the target of whole genome sequencing. We constructed two bacterial artificial chromosome (BAC) libraries from the AP13 clone of switchgrass to gain insight into the genome structure and organization, initiate functional and comparative genomic studies, and assist with genome assembly. Together representing 16 haploid genome equivalents of switchgrass, each library comprises 101,376 clones with average insert sizes of 144 (HindIII-generated) and 110 kb (BstYI-generated). A total of 330,297 high quality BAC-end sequences (BES) were generated, accounting for 263.2 Mbp (16.4%) of the switchgrass genome. Analysis of the BES identified 279,099 known repetitive elements, >50,000 SSRs, and 2,528 novel repeat elements, named switchgrass repetitive elements (SREs). Comparative mapping of 47 full-length BAC sequences and 330K BES revealed high levels of synteny with the grass genomes sorghum, rice, maize, and Brachypodium. Our data indicate that the sorghum genome has retained larger microsyntenous regions with switchgrass besides high gene order conservation with rice. The resources generated in this effort will be useful for a broad range of applications. PMID:22511929
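
    The library figures quoted in this abstract can be cross-checked with a short calculation. The sketch below uses only numbers from the abstract; the implied genome size is an inference from the stated BES total and percentage, not a value the abstract reports directly.

```python
# Numbers taken from the abstract
clones_per_library = 101_376
average_insert_kb = {"HindIII": 144, "BstYI": 110}
bes_total_mbp = 263.2          # total BAC-end sequence generated
bes_genome_fraction = 0.164    # stated as 16.4% of the genome

# Total cloned DNA across both libraries, converted from kb to Mbp
total_mbp = sum(clones_per_library * kb for kb in average_insert_kb.values()) / 1000

# Genome size implied by the BES figures (an inference, not stated in the abstract)
genome_mbp = bes_total_mbp / bes_genome_fraction

# Library coverage in haploid genome equivalents
coverage = total_mbp / genome_mbp

print(f"cloned DNA: {total_mbp:.1f} Mbp")
print(f"implied genome size: {genome_mbp:.0f} Mbp")
print(f"coverage: {coverage:.1f} genome equivalents")
```

    The result comes out at roughly 16 genome equivalents, which agrees with the coverage stated in the abstract.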

  3. Chemical genomics for studying parasite gene function and interaction

    PubMed Central

    Li, Jian; Yuan, Jing; Chen, Chin-chien; Inglese, James; Su, Xin-zhuan

    2013-01-01

    With the development of new technologies in genome sequencing, gene expression profiling, genotyping, and high-throughput screening of chemical compound libraries, small molecules are playing increasingly important roles in studying gene expression regulation, gene-gene interaction, and gene function. Here we briefly review and discuss some recent advancements in drug target identification and phenotype characterization using combinations of high-throughput screening of small-molecule libraries and various genome-wide methods such as whole genome sequencing, genome-wide association studies, and genome-wide expressional analysis. These approaches can be used to search for new drugs against parasitic infections, to identify drug targets or drug-resistance genes, and to infer gene function. PMID:24215777

  4. Wolbachia genome integrated in an insect chromosome: Evolution and fate of laterally transferred endosymbiont genes

    PubMed Central

    Nikoh, Naruo; Tanaka, Kohjiro; Shibata, Fukashi; Kondo, Natsuko; Hizume, Masahiro; Shimada, Masakazu; Fukatsu, Takema

    2008-01-01

    Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we thoroughly investigated the bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis. Exhaustive PCR detection and Southern blot analysis suggested that ∼30% of Wolbachia genes, in terms of the gene repertoire of wMel, are present on the insect nuclear genome. Fluorescent in situ hybridization located the transferred genes on the proximal region of the basal short arm of the X chromosome. Molecular evolutionary and other lines of evidence indicated that the transferred genes are probably derived from a single lateral transfer event. The transferred genes were, for the length examined, structurally disrupted, freed from functional constraints, and transcriptionally inactive. Hence, most, if not all, of the transferred genes have been pseudogenized. Notwithstanding this, the transferred genes were ubiquitously detected from Japanese and Taiwanese populations of C. chinensis, while the number of the transferred genes detected differed between the populations. The transferred genes were not detected from congenic beetle species, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago. These features of the laterally transferred endosymbiont genes are compared with the evolutionary patterns of mitochondrial and plastid genome fragments acquired by nuclear genomes through recent endosymbiotic gene transfers. PMID:18073380

  5. Parallel computation of genome-scale RNA secondary structure to detect structural constraints on human genome.

    PubMed

    Kawaguchi, Risa; Kiryu, Hisanori

    2016-05-06

    RNA secondary structure around splice sites is known to assist normal splicing by promoting spliceosome recognition. However, analyzing the structural properties of entire intronic regions or pre-mRNA sequences has been difficult hitherto, owing to serious experimental and computational limitations, such as low read coverage and numerical problems. Our novel software, "ParasoR", is designed to run on a computer cluster and enables the exact computation of various structural features of long RNA sequences under the constraint of maximal base-pairing distance. ParasoR divides dynamic programming (DP) matrices into smaller pieces, such that each piece can be computed by a separate computer node without losing the connectivity information between the pieces. ParasoR directly computes the ratios of DP variables to avoid the reduction of numerical precision caused by the cancellation of a large number of Boltzmann factors. The structural preferences of mRNAs computed by ParasoR show a high concordance with those determined by high-throughput sequencing analyses. Using ParasoR, we investigated the global structural preferences of transcribed regions in the human genome. A genome-wide folding simulation indicated that transcribed regions are significantly more structured than intergenic regions after removing repeat sequences and k-mer frequency bias. In particular, we observed a highly significant preference for base pairing over entire intronic regions as compared to their antisense sequences, as well as to intergenic regions. A comparison between pre-mRNAs and mRNAs showed that coding regions become more accessible after splicing, indicating constraints for translational efficiency. Such changes are correlated with gene expression levels, as well as GC content, and are enriched among genes associated with cytoskeleton and kinase functions. We have shown that ParasoR is very useful for analyzing the structural properties of long RNA sequences such as mRNAs, pre

  6. Genome structure of bdelloid rotifers: shaped by asexuality or desiccation?

    PubMed

    Gladyshev, Eugene A; Arkhipova, Irina R

    2010-01-01

    Bdelloid rotifers are microscopic invertebrate animals best known for their ancient asexuality and the ability to survive desiccation at any life stage. Both factors are expected to have a profound influence on their genome structure. Recent molecular studies demonstrated that, although the gene-rich regions of bdelloid genomes are organized as colinear pairs of closely related sequences and depleted in repetitive DNA, subtelomeric regions harbor diverse transposable elements and horizontally acquired genes of foreign origin. Although asexuality is expected to result in depletion of deleterious transposons, only desiccation appears to have the power to produce all the uncovered genomic peculiarities. Repair of desiccation-induced DNA damage would require the presence of a homologous template, maintaining colinear pairs in gene-rich regions and selecting against insertion of repetitive DNA that might cause chromosomal rearrangements. Desiccation may also induce a transient state of competence in recovering animals, allowing them to acquire environmental DNA. Even if bdelloids engage in rare or obscure forms of sexual reproduction, all these features could still be present. The relative contribution of asexuality and desiccation to genome organization may be clarified by analyzing whole-genome sequences and comparing foreign gene and transposon content in species which lost the ability to survive desiccation.

  7. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    PubMed

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned by using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with Pinctada maxima. The phylogenetic tree showed that it was closely related to P. maxima, based on the amino acid sequences. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Several transcriptional factors such as GATA-1, AP-1, and SP1 were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P<0.001). The expression level at salinity 27‰ was significantly higher than that at other salinities (P<0.05). Expression reached a minimum when the algal food concentration was 16×10^4 cells/mL, which was significantly lower than the level observed at 8×10^4 cells/mL and 20×10^4 cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata. Copyright © 2016 Elsevier B.V. All rights reserved.
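
    The cDNA dimensions reported in this abstract are internally consistent, which a brief check makes explicit. The sketch below uses only the quoted lengths; the assumption that the open reading frame includes one stop codon is ours, and the mature-protein length is an inference that presumes the signal peptide is cleaved.

```python
# Lengths quoted in the abstract (bp unless noted)
utr5_bp = 17
orf_bp = 1569
utr3_bp = 118
full_cdna_bp = 1704
polypeptide_aa = 522
signal_peptide_aa = 20

# The three cDNA segments should add up to the full-length cDNA
assert utr5_bp + orf_bp + utr3_bp == full_cdna_bp

# Assumption: the ORF length includes one stop codon, so codons - 1 = residues
codons = orf_bp // 3
assert orf_bp % 3 == 0
assert codons - 1 == polypeptide_aa

# Inference, not stated in the abstract: length after signal peptide removal
mature_aa = polypeptide_aa - signal_peptide_aa
print(f"{codons} codons -> {polypeptide_aa} aa precursor, {mature_aa} aa after signal peptide removal")
```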

  8. p63 gene structure in the phylum mollusca.

    PubMed

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    Roles of p53 family ancestor (p63) in the organisms' response to stressful environmental conditions (mainly pollution) have been studied among molluscs, especially in the genus Mytilus, within the last 15 years. Nevertheless, information about gene structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in the mollusc Mediterranean mussel Mytilus galloprovincialis and confirm its similarity to vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata) found only one p53 family member present in a single copy per haploid genome. Comparative analysis of those orthologues additionally confirmed the conserved p63 gene structure. Conserved p63 gene structure can be a helpful tool to complement and/or revise gene annotations of any future p63 genomic sequence records in molluscs, but also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for a reliable and complete gene annotation. Copyright © 2015 Elsevier Inc. All rights reserved.

  9. Widespread of horizontal gene transfer in the human genome.

    PubMed

    Huang, Wenze; Tsai, Lillian; Li, Yulong; Hua, Nan; Sun, Chen; Wei, Chaochun

    2017-04-04

    A fundamental concept in biology is that heritable material is passed from parents to offspring, a process called vertical gene transfer. An alternative mechanism of gene acquisition is through horizontal gene transfer (HGT), which involves movement of genetic materials between different species. Horizontal gene transfer has been found to be prevalent in prokaryotes but very rare in eukaryotes. In this paper, we investigate horizontal gene transfer in the human genome. From the pair-wise alignments between human genome and 53 vertebrate genomes, 1,467 human genome regions (2.6 M bases) from all chromosomes were found to be more conserved with non-mammals than with most mammals. These human genome regions involve 642 known genes, which are enriched with ion binding. Compared to known horizontal gene transfer regions in the human genome, there were few overlapping regions, which indicated horizontal gene transfer is more common than we expected in the human genome. Horizontal gene transfer impacts hundreds of human genes and this study provided insight into potential mechanisms of HGT in the human genome.

  10. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    SciTech Connect

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  11. Identification and characterization of essential genes in the human genome

    PubMed Central

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W.; Krupczak, Kevin M.; Post, Yorick; Wei, Jenny J.; Lander, Eric S.; Sabatini, David M.

    2015-01-01

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  12. Insular Organization of Gene Space in Grass Genomes

    PubMed Central

    Massa, Alicia N.; Wanjugi, Humphrey; Deal, Karin R.; You, Frank M.; Xu, Xiangyang; Gu, Yong Q.; Luo, Ming-Cheng; Anderson, Olin D.; Chan, Agnes P.; Rabinowicz, Pablo

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands “gene insulae” to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion creates gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes. PMID:23326580
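
    The contrast between uniformly scattered genes and genes clustered into islands can be illustrated with a much simpler diagnostic than the density-function comparison described in this abstract. The sketch below is not the authors' method. It relies on the fact that distances between consecutive points of a homogeneous Poisson process are exponentially distributed, so their coefficient of variation (CV) is close to 1, whereas clustering inflates it. All function names, genome sizes, and island parameters are hypothetical values chosen for illustration.

```python
import random
import statistics


def gap_cv(positions):
    """Coefficient of variation of distances between consecutive sorted positions."""
    pts = sorted(positions)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def simulate_uniform(n_genes, length, rng):
    """Gene positions scattered uniformly, approximating a homogeneous Poisson process."""
    return [rng.uniform(0, length) for _ in range(n_genes)]


def simulate_islands(n_islands, genes_per_island, length, spread, rng):
    """Gene positions grouped into tight islands around random centres."""
    positions = []
    for _ in range(n_islands):
        centre = rng.uniform(0, length)
        positions.extend(
            centre + rng.uniform(-spread, spread) for _ in range(genes_per_island)
        )
    return positions


rng = random.Random(0)  # fixed seed so the run is repeatable
uniform_cv = gap_cv(simulate_uniform(2000, 1_000_000, rng))
island_cv = gap_cv(simulate_islands(500, 4, 1_000_000, 100, rng))
print(f"uniform CV = {uniform_cv:.2f}, island CV = {island_cv:.2f}")
```

    A CV near 1 is consistent with the homogeneous Poisson model, while a markedly larger CV points to clustering. A formal analysis would still need a proper statistical test, as in the study itself.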

  13. Insular organization of gene space in grass genomes.

    PubMed

    Gottlieb, Andrea; Müller, Hans-Georg; Massa, Alicia N; Wanjugi, Humphrey; Deal, Karin R; You, Frank M; Xu, Xiangyang; Gu, Yong Q; Luo, Ming-Cheng; Anderson, Olin D; Chan, Agnes P; Rabinowicz, Pablo; Devos, Katrien M; Dvorak, Jan

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands "gene insulae" to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion creates gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes.

  14. Rotavirus gene structure and function.

    PubMed Central

    Estes, M K; Cohen, J

    1989-01-01

    Knowledge of the structure and function of the genes and proteins of the rotaviruses has expanded rapidly. Information obtained in the last 5 years has revealed unexpected and unique molecular properties of rotavirus proteins of general interest to virologists, biochemists, and cell biologists. Rotaviruses share some features of replication with reoviruses, yet antigenic and molecular properties of the outer capsid proteins, VP4 (a protein whose cleavage is required for infectivity, possibly by mediating fusion with the cell membrane) and VP7 (a glycoprotein), show more similarities with those of other viruses such as the orthomyxoviruses, paramyxoviruses, and alphaviruses. Rotavirus morphogenesis is a unique process, during which immature subviral particles bud through the membrane of the endoplasmic reticulum (ER). During this process, transiently enveloped particles form, the outer capsid proteins are assembled onto particles, and mature particles accumulate in the lumen of the ER. Two ER-specific viral glycoproteins are involved in virus maturation, and these glycoproteins have been shown to be useful models for studying protein targeting and retention in the ER and for studying mechanisms of virus budding. New ideas and approaches to understanding how each gene functions to replicate and assemble the segmented viral genome have emerged from knowledge of the primary structure of rotavirus genes and their proteins and from knowledge of the properties of domains on individual proteins. Localization of type-specific and cross-reactive neutralizing epitopes on the outer capsid proteins is becoming increasingly useful in dissecting the protective immune response, including evaluation of vaccine trials, with the practical possibility of enhancing the production of new, more effective vaccines. Finally, future analyses with recently characterized immunologic and gene probes and new animal models can be expected to provide a basic understanding of what regulates the

  15. Identifying potential cancer driver genes by genomic data integration

    PubMed Central

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-01-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis. PMID:24346768

  16. Identifying potential cancer driver genes by genomic data integration

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-12-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis.

  17. Molecular Characterization of Soybean Pterocarpan 2-Dimethylallyltransferase in Glyceollin Biosynthesis: Local Gene and Whole-Genome Duplications of Prenyltransferase Genes Led to the Structural Diversity of Soybean Prenylated Isoflavonoids

    PubMed Central

    Yoneyama, Keisuke; Akashi, Tomoyoshi; Aoki, Toshio

    2016-01-01

    Soybean (Glycine max) accumulates several prenylated isoflavonoid phytoalexins, collectively referred to as glyceollins. Glyceollins (I, II, III, IV and V) possess modified pterocarpan skeletons with C5 moieties from dimethylallyl diphosphate, and they are commonly produced from (6aS, 11aS)-3,9,6a-trihydroxypterocarpan [(−)-glycinol]. The metabolic fate of (−)-glycinol is determined by the enzymatic introduction of a dimethylallyl group into C-4 or C-2, which is reportedly catalyzed by regiospecific prenyltransferases (PTs). 4-Dimethylallyl (−)-glycinol and 2-dimethylallyl (−)-glycinol are precursors of glyceollin I and other glyceollins, respectively. Although multiple genes encoding (−)-glycinol biosynthetic enzymes have been identified, those involved in the later steps of glyceollin formation mostly remain unidentified, except for (−)-glycinol 4-dimethylallyltransferase (G4DT), which is involved in glyceollin I biosynthesis. In this study, we identified four genes that encode isoflavonoid PTs, including (−)-glycinol 2-dimethylallyltransferase (G2DT), using homology-based in silico screening and biochemical characterization in yeast expression systems. Transcript analyses illustrated that changes in G2DT gene expression were correlated with the induction of glyceollins II, III, IV and V in elicitor-treated soybean cells and leaves, suggesting its involvement in glyceollin biosynthesis. Moreover, the genomic signatures of these PT genes revealed that G4DT and G2DT are paralogs derived from whole-genome duplications of the soybean genome, whereas other PT genes [isoflavone dimethylallyltransferase 1 (IDT1) and IDT2] were derived via local gene duplication on soybean chromosome 11. PMID:27986914

  18. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background: The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequences pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results: In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty-eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions: The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261

  19. Genome Editing Gene Therapy for Duchenne Muscular Dystrophy.

    PubMed

    Hotta, Akitsu

    2015-09-22

    Duchenne muscular dystrophy (DMD) is a severe genetic disorder caused by loss of function of the dystrophin gene on the X chromosome. Gene augmentation of dystrophin is challenging due to the large size of the dystrophin cDNA. Emerging genome editing technologies, such as TALEN and CRISPR-Cas9 systems, open a new era in the restoration of functional dystrophin and are a hallmark of bona fide gene therapy. In this review, we summarize current genome editing approaches, properties of target cell types for ex vivo gene therapy, and perspectives of in vivo gene therapy including genome editing in human zygotes. Although technical challenges, such as efficacy, accuracy, and delivery of the genome editing components, remain to be overcome, genome editing technologies offer a new avenue for the gene therapy of DMD.

  20. Using the Gene Ontology to Scan Multi-Level Gene Sets for Associations in Genome Wide Association Studies

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; Jenkins, Gregory D.; McDonnell, Shannon K.; Ingle, James N.; Kubo, Michiaki; Goss, Paul E.; Costantino, Joseph P.; Wickerham, D. Lawrence; Weinshilboum, Richard M.

    2011-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc “fixes”. To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but assures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted p-values for each gene-set. Application of our methods to two different GWAS provide guidance on the potential strengths and weaknesses of our proposed gene-set analyses. PMID:22161999

  1. Genomic scan for genes predisposing to schizophrenia

    SciTech Connect

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping nearby D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.
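
    The lod scores quoted in this abstract compare the likelihood of the data at a given recombination fraction θ with the likelihood under free recombination (θ = 0.5). The sketch below shows only the textbook two-point case for phase-known, fully informative meioses. It is a simplification and not the pedigree analysis used in the study, which depends on inheritance models not reproduced here. The function name and the example counts are hypothetical.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses.

    lod(theta) = log10( theta^r * (1 - theta)^(n - r) / 0.5^n )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        if recombinants > 0:
            return float("-inf")
        return meioses * math.log10(2.0)
    log_likelihood = (
        recombinants * math.log10(theta)
        + non_recombinants * math.log10(1.0 - theta)
    )
    return log_likelihood - meioses * math.log10(0.5)


# Hypothetical counts: 4 recombinants in 10 meioses, evaluated at theta = 0.05.
example = lod_score(4, 10, 0.05)
print(example < -2.0)
```

    A lod score below -2.0, the threshold used in the abstract, is conventionally taken as evidence against linkage at that recombination fraction.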

  2. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation performed on the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and the results were compared to a reference annotation.

  3. Identification of structural variation in mouse genomes

    PubMed Central

    Keane, Thomas M.; Wong, Kim; Adams, David J.; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

    2014-01-01

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation. PMID:25071822

  4. From genes to genomes: universal scale-invariant properties of microbial chromosome organisation.

    PubMed

    Audit, Benjamin; Ouzounis, Christos A

    2003-09-19

    The availability of complete genome sequences for a large variety of organisms is a major advance in understanding genome structure and function. One attribute of genome structure is chromosome organisation in terms of gene localisation and orientation. For example, bacterial operons, i.e. clusters of co-oriented genes that form transcription units, enable functionally related genes to be expressed simultaneously. The description of genome organisation was pioneered with the study of the distribution of genes of the Escherichia coli partial genetic map before the full genome sequence was known. Deploying powerful techniques from circular statistics and signal processing, we revisit the issue of gene localisation and orientation using 89 complete microbial chromosomes from the eubacterial and archaeal domains. We demonstrate that there is no characteristic size pertinent to the description of chromosome structure, e.g. there does not exist any single length appropriate to describe gene clustering. Our results show that, for all 89 chromosomes, gene positions and gene orientations share a common form of scale-invariant correlations known as "long-range correlations" that we can reveal for distances from the gene length, up to the chromosome size. This observation indicates that genes tend to assemble and to co-orient over any scale of observation greater than a few kilobases. This unexpected property of chromosome structure can be portrayed as an operon-like organisation at all scales and implies that a complete scale range extending over more than three orders of magnitudes of chromosome segment lengths is necessary to properly describe prokaryotic genome organisation. We propose that this pattern results from the effects of the superhelical context on gene expression coupled with the structure and dynamics of the nucleoid, possibly accommodating the diverse gene expression profiles needed during the different stages of cellular life.

  5. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes

    PubMed Central

    Lin, Michael F.; Carlson, Joseph W.; Crosby, Madeline A.; Matthews, Beverley B.; Yu, Charles; Park, Soo; Wan, Kenneth H.; Schroeder, Andrew J.; Gramates, L. Sian; St. Pierre, Susan E.; Roark, Margaret; Wiley, Kenneth L.; Kulathinal, Rob J.; Zhang, Peili; Myrick, Kyl V.; Antone, Jerry V.; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

    2007-01-01

    The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster. PMID:17989253

  6. Whole genome sequence of Desulfovibrio magneticus strain RS-1 revealed common gene clusters in magnetotactic bacteria

    PubMed Central

    Nakazawa, Hidekazu; Arakaki, Atsushi; Narita-Yamada, Sachiko; Yashiro, Isao; Jinno, Koji; Aoki, Natsuko; Tsuruyama, Ai; Okamura, Yoshiko; Tanikawa, Satoshi; Fujita, Nobuyuki; Takeyama, Haruko; Matsunaga, Tadashi

    2009-01-01

    Magnetotactic bacteria are ubiquitous microorganisms that synthesize intracellular magnetite particles (magnetosomes) by accumulating Fe ions from aquatic environments. Recent molecular studies, including comprehensive proteomic, transcriptomic, and genomic analyses, have considerably improved our hypotheses of the magnetosome-formation mechanism. However, most of these studies have been conducted using pure-cultured bacterial strains of α-proteobacteria. Here, we report the whole-genome sequence of Desulfovibrio magneticus strain RS-1, the only isolate of magnetotactic microorganisms classified under δ-proteobacteria. Comparative genomics of the RS-1 and four α-proteobacterial strains revealed the presence of three separate gene regions (nuo and mamAB-like gene clusters, and gene region of a cryptic plasmid) conserved in all magnetotactic bacteria. The nuo gene cluster, encoding NADH dehydrogenase (complex I), was also common to the genomes of three iron-reducing bacteria exhibiting uncontrolled extracellular and/or intracellular magnetite synthesis. A cryptic plasmid, pDMC1, encodes three homologous genes that exhibit high similarities with those of other magnetotactic bacterial strains. In addition, the mamAB-like gene cluster, encoding the key components for magnetosome formation such as iron transport and magnetosome alignment, was conserved only in the genomes of magnetotactic bacteria as a similar genomic island-like structure. Our findings suggest the presence of core genetic components for magnetosome biosynthesis; these genes may have been acquired into the magnetotactic bacterial genomes by multiple gene-transfer events during proteobacterial evolution. PMID:19675025

  7. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    USDA-ARS's Scientific Manuscript database

    Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most wide...

  8. The Early ANTP Gene Repertoire: Insights from the Placozoan Genome

    PubMed Central

    Schierwater, Bernd; Kamm, Kai; Srivastava, Mansi; Rokhsar, Daniel; Rosengarten, Rafael D.; Dellaporta, Stephen L.

    2008-01-01

    The evolution of ANTP genes in the Metazoa has been the subject of conflicting hypotheses derived from full or partial gene sequences and genomic organization in higher animals. Whole genome sequences have recently filled in some crucial gaps for the basal metazoan phyla Cnidaria and Porifera. Here we analyze the complete genome of Trichoplax adhaerens, representing the basal metazoan phylum Placozoa, for its set of ANTP class genes. The Trichoplax genome encodes representatives of Hox/ParaHox-like, NKL, and extended Hox genes. This repertoire possibly mirrors the condition of a hypothetical cnidarian-bilaterian ancestor. The evolution of the cnidarian and bilaterian ANTP gene repertoires can be deduced by a limited number of cis-duplications of NKL and “extended Hox” genes and the presence of a single ancestral “ProtoHox” gene. PMID:18716659

  9. Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

    PubMed

    Chen, Geng; Wang, Charles; Shi, Leming; Tong, Weida; Qu, Xiongfei; Chen, Jiwei; Yang, Jianmin; Shi, Caiping; Chen, Long; Zhou, Peiying; Lu, Bingxin; Shi, Tieliu

    2013-08-01

    The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions remain little explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaque or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons why genes are missing from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

  10. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome

    PubMed Central

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-01-01

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome have shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853

  11. Chicken rRNA Gene Cluster Structure.

    PubMed

    Dyomin, Alexander G; Koshel, Elena I; Kiselev, Artem M; Saifitdinova, Alsu F; Galkina, Svetlana A; Fukagawa, Tatsuo; Kostareva, Anna A; Gaginskaya, Elena R

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5'ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3'ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited to GenBank under KT445934 accession number. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity.
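
    The component lengths quoted in this abstract can be checked against the stated cluster length. The short Python sketch below simply sums the seven figures given above; the dictionary name and keys are illustrative labels, not identifiers from the GenBank record.

```python
# Segment lengths (bp) as quoted in the abstract for the chicken rRNA gene cluster.
components_bp = {
    "5'ETS": 1836,
    "18S rRNA": 1823,
    "ITS1": 2530,
    "5.8S rRNA": 157,
    "ITS2": 733,
    "28S rRNA": 4441,
    "3'ETS": 343,
}

# Sum of all segments; the abstract reports a cluster of 11863 bp.
total_bp = sum(components_bp.values())
print(total_bp)
```

    The sum agrees with the 11863 bp reported for the assembled cluster.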

  12. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5’ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3’ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited to GenBank under KT445934 accession number. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity. PMID:27299357

  13. Two duplicated chicken-type lysozyme genes in disc abalone Haliotis discus discus: molecular aspects in relevance to structure, genomic organization, mRNA expression and bacteriolytic function.

    PubMed

    Umasuthan, Navaneethaiyer; Bathige, S D N K; Kasthuri, Saranya Revathy; Wan, Qiang; Whang, Ilson; Lee, Jehee

    2013-08-01

    Lysozymes are crucial antibacterial proteins that are associated with catalytic cleavage of peptidoglycan and subsequent bacteriolysis. The present study describes the identification of two lysozyme genes from disc abalone Haliotis discus discus and their characterization at sequence-, genomic-, transcriptional- and functional-levels. Two cDNAs and BAC clones bearing lysozyme genes were isolated from abalone transcriptome and BAC genomic libraries, respectively and sequences were determined. Corresponding deduced amino acid sequences harbored a chicken-type lysozyme (LysC) family profile and exhibited conserved characteristics of LysC family members including active residues (Glu and Asp) and GS(S/T)DYGIFQINS motif suggested that they are LysC counterparts in disc abalone and designated as abLysC1 and abLysC2. While abLysC1 represented the homolog recently reported in Ezo abalone [1], abLysC2 shared significant identity with LysC homologs. Unlike other vertebrate LysCs, coding sequence of abLysCs were distributed within five exons interrupted by four introns. Both abLysCs revealed a broader mRNA distribution with highest levels in mantle (abLysC1) and hepatopancreas (abLysC2) suggesting their likely main role in defense and digestion, respectively. Investigation of temporal transcriptional profiles post-LPS and -pathogen challenges revealed induced-responses of abLysCs in gills and hemocytes. The in vitro muramidase activity of purified recombinant (r) abLysCs proteins was evaluated, and findings indicated that they are active in acidic pH range (3.5-6.5) and over a broad temperature range (20-60 °C) and influenced by ionic strength. When the antibacterial spectra of (r)abLysCs were examined, they displayed differential activities against both Gram positive and Gram negative strains providing evidence for their involvement in bacteriolytic function in abalone physiology. Copyright © 2013 Elsevier Ltd. All rights reserved.

  14. Evidence-based gene predictions in plant genomes

    USDA-ARS's Scientific Manuscript database

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  15. Missing genes in the annotation of prokaryotic genomes.

    PubMed

    Warren, Andrew S; Archuleta, Jeremy; Feng, Wu-Chun; Setubal, João Carlos

    2010-03-15

    Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations have systematically missing, small genes. We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
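
    To make the "ORFs larger than or equal to 33 aa" criterion concrete, here is a minimal Python sketch of a six-frame ORF scan. It is not the authors' high-performance pipeline: it assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon, counts the start codon in the amino acid length, and ignores alternative start codons and genetic codes. The function names and the `min_aa` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=33):
    """Scan all six reading frames for ATG...stop ORFs of at least min_aa codons.

    Returns (strand, frame, start, end, aa_length) tuples. Coordinates are
    0-based, end-exclusive, and relative to the strand being scanned; the
    stop codon is included in the span but not in aa_length.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the currently open ATG, if any
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    aa_len = (i - start) // 3
                    if aa_len >= min_aa:
                        orfs.append((strand, frame, start, i + 3, aa_len))
                    start = None
    return orfs


# Toy sequence: Met followed by 32 Ala codons and a stop, i.e. exactly 33 aa.
example = "ATG" + "GCT" * 32 + "TAA"
print(find_orfs(example))
```

    Under these assumptions, a sequence one codon shorter than the toy example falls below the threshold and is not reported, which is the class of small genes the abstract describes as prone to being missed.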

  16. Mechanisms and dynamics of orphan gene emergence in insect genomes.

    PubMed

    Wissler, Lothar; Gadau, Jürgen; Simola, Daniel F; Helmkampf, Martin; Bornberg-Bauer, Erich

    2013-01-01

    Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes-such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements-act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes has their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and support a recently proposed dynamic model of frequent gene birth and death.

  17. Structural and functional characterization of a transcription-enhancing sequence element in the rbcL gene of the Chlamydomonas chloroplast genome.

    PubMed

    Anthonisen, Inger Lill; Kasai, Seitaro; Kato, Ko; Salvador, Maria Luisa; Klein, Uwe

    2002-08-01

    The structure and function of a transcription-enhancing sequence element in the coding region of the Chlamydomonas reinhardtii rbcL gene was analyzed in Chlamydomonas chloroplast transformants in vivo. The enhancer sequence is contained within a DNA segment extending from position +108 to position +143, relative to the start site of rbcL gene transcription. The sequence remains functional when inverted or when placed 34 bp closer to or 87 bp further downstream of the basic rbcL promoter. However, it does not function from a site about 250 bp downstream of its original location. Besides promoting transcription initiation from the rbcL promoter, the element is able to augment transcription from the promoter of the Chlamydomonas chloroplast atpB gene, but has an inhibitory effect on transcription from the promoter of the chloroplast ribosomal RNA genes. The results suggest that the enhancer-like sequence acts upon transcription initiation in a position-specific and promoter type-specific manner.

  18. Flexibility and Symmetry of Prokaryotic Genome Rearrangement Reveal Lineage-Associated Core-Gene-Defined Genome Organizational Frameworks

    PubMed Central

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei

    2014-01-01

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. PMID:25425232

  19. Evolution of paralogous genes: Reconstruction of genome rearrangements through comparison of multiple genomes within Staphylococcus aureus.

    PubMed

    Tsuru, Takeshi; Kawai, Mikihiko; Mizutani-Ui, Yoko; Uchiyama, Ikuo; Kobayashi, Ichizo

    2006-06-01

    Analysis of evolution of paralogous genes in a genome is central to our understanding of genome evolution. Comparison of closely related bacterial genomes, which has provided clues as to how genome sequences evolve under natural conditions, would help in such an analysis. With species Staphylococcus aureus, whole-genome sequences have been decoded for seven strains. We compared their DNA sequences to detect large genome polymorphisms and to deduce mechanisms of genome rearrangements that have formed each of them. We first compared strains N315 and Mu50, which make one of the most closely related strain pairs, at the single-nucleotide resolution to catalogue all the middle-sized (more than 10 bp) to large genome polymorphisms such as indels and substitutions. These polymorphisms include two paralogous gene sets, one in a tandem paralogue gene cluster for toxins in a genomic island and the other in a ribosomal RNA operon. We also focused on two other tandem paralogue gene clusters and type I restriction-modification (RM) genes on the genomic islands. Then we reconstructed rearrangement events responsible for these polymorphisms, in the paralogous genes and the others, with reference to the other five genomes. For the tandem paralogue gene clusters, we were able to infer sequences for homologous recombination generating the change in the repeat number. These sequences were conserved among the repeated paralogous units likely because of their functional importance. The sequence specificity (S) subunit of type I RM systems showed recombination, likely at the homology of a conserved region, between the two variable regions for sequence specificity. We also noticed novel alleles in the ribosomal RNA operons and suggested a role for illegitimate recombination in their formation. These results revealed importance of recombination involving long conserved sequence in the evolution of paralogous genes in the genome.

  20. Molecular cloning, genomic structure, polymorphism analysis and recombinant expression of a α1-antitrypsin like gene from swamp eel, Monopterus albus.

    PubMed

    Li, Wei; Wang, Quanhe; Li, Shaobin; Jiang, Ao; Sun, Wenxiu

    2017-03-01

    Alpha-1-antitrypsin (AAT) is a highly polymorphic glycoprotein antiprotease, involved in the regulation of human immune response. Beyond some genomic characterization and a few protein characterizations, the function of teleost AAT remains uncertain. In this study we cloned an AAT-like gene from a swamp eel liver identifying four exons and three introns, and the full-length cDNA. The elucidated swamp eel AAT amino acid sequence showed high homology with known AATs from other teleosts. The swamp eel AAT was examined both in ten healthy tissues and in four bacterially-stimulated tissues resulting in up-regulation of swamp eel AAT at different times. Swamp eel AAT transcripts were ubiquitously but unevenly expressed in ten tissues. Further, the mature peptide sequence of swamp eel AAT was subcloned and transformed into E. coli with the recombinant proteins successfully inhibiting bovine trypsin activity. Analysis of recombinant AAT showed equimolar formation of irreversible complexes with proteinases, high stability at pH 7.0-10.0 and temperatures below 55 °C. Serum AAT protein level significantly increased in response to inflammation with AAT anti-sera, and, NF-κB, apolipoprotein A1 and transferrin gene expression were dramatically decreased over 72 h post recombinant AAT injection. Lastly, examination of swamp eel AAT allelic polymorphism identified all alleles in both healthy and diseased stock except allele*g, found only in diseased stock, but without statistical difference between the distribution frequency of allele*g in the two stocks. These results are crucial to our ongoing study of the role of teleost AAT in the innate immune system. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Genome-wide characterization of the Pectate Lyase-like (PLL) genes in Brassica rapa.

    PubMed

    Jiang, Jingjing; Yao, Lina; Miao, Ying; Cao, Jiashu

    2013-11-01

    Pectate lyases (PL) depolymerize demethylated pectin (pectate, EC 4.2.2.2) by catalyzing the eliminative cleavage of α-1,4-glycosidic linked galacturonan. Pectate Lyase-like (PLL) genes are one of the largest and most complex families in plants. However, studies on the phylogeny, gene structure, and expression of PLL genes are limited. To understand the potential functions of PLL genes in plants, we characterized their intron-exon structure, phylogenetic relationships, and protein structures, and measured their expression patterns in various tissues, specifically the reproductive tissues in Brassica rapa. Sequence alignments revealed two characteristic motifs in PLL genes. The chromosome location analysis indicated that 18 of the 46 PLL genes were located in the least fractionated sub-genome (LF) of B. rapa, while 16 were located in the medium fractionated sub-genome (MF1) and 12 in the more fractionated sub-genome (MF2). Quantitative RT-PCR analysis showed that BrPLL genes were expressed in various tissues, with most of them being expressed in flowers. Detailed qRT-PCR analysis identified 11 pollen specific PLL genes and several other genes with unique spatial expression patterns. In addition, some duplicated genes showed similar expression patterns. The phylogenetic analysis identified three PLL gene subfamilies in plants, among which subfamily II might have evolved from gene neofunctionalization or subfunctionalization. Therefore, this study opens the possibility for exploring the roles of PLL genes during plant development.

  2. Genome-wide analysis reveals recurrent structural abnormalities of TP63 and other p53-related genes in peripheral T-cell lymphomas

    PubMed Central

    Vasmatzis, George; Johnson, Sarah H.; Knudson, Ryan A.; Ketterling, Rhett P.; Braggio, Esteban; Fonseca, Rafael; Viswanatha, David S.; Law, Mark E.; Kip, N. Sertac; Özsan, Nazan; Grebe, Stefan K.; Frederick, Lori A.; Eckloff, Bruce W.; Thompson, E. Aubrey; Kadin, Marshall E.; Milosevic, Dragana; Porcher, Julie C.; Asmann, Yan W.; Smith, David I.; Kovtun, Irina V.; Ansell, Stephen M.; Dogan, Ahmet

    2012-01-01

    Peripheral T-cell lymphomas (PTCLs) are aggressive malignancies of mature T lymphocytes with 5-year overall survival rates of only ∼ 35%. Improvement in outcomes has been stymied by poor understanding of the genetics and molecular pathogenesis of PTCL, with a resulting paucity of molecular targets for therapy. We developed bioinformatic tools to identify chromosomal rearrangements using genome-wide, next-generation sequencing analysis of mate-pair DNA libraries and applied these tools to 16 PTCL patient tissue samples and 6 PTCL cell lines. Thirteen recurrent abnormalities were identified, of which 5 involved p53-related genes (TP53, TP63, CDKN2A, WWOX, and ANKRD11). Among these abnormalities were novel TP63 rearrangements encoding fusion proteins homologous to ΔNp63, a dominant-negative p63 isoform that inhibits the p53 pathway. TP63 rearrangements were seen in 11 (5.8%) of 190 PTCLs and were associated with inferior overall survival; they also were detected in 2 (1.2%) of 164 diffuse large B-cell lymphomas. As TP53 mutations are rare in PTCL compared with other malignancies, our findings suggest that a constellation of alternate genetic abnormalities may contribute to disruption of p53-associated tumor suppressor function in PTCL. PMID:22855598
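
    The two frequencies quoted above follow directly from the reported counts. The Python sketch below recomputes them; the helper name `percent` is illustrative only.

```python
def percent(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# TP63 rearrangements: 11 of 190 PTCLs and 2 of 164 diffuse large B-cell lymphomas.
ptcl_rate = percent(11, 190)
dlbcl_rate = percent(2, 164)
print(ptcl_rate, dlbcl_rate)
```

    Both values match the percentages given in the abstract (5.8% and 1.2%).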

  3. Microfluidic gene arrays for rapid genomic profiling

    NASA Astrophysics Data System (ADS)

    West, Jay A.; Hukari, Kyle W.; Hux, Gary A.; Shepodd, Timothy J.

    2004-12-01

    Genomic analysis tools have recently become indispensable for the evaluation of gene expression in a variety of experimental protocols. Two of the main drawbacks of this technology are the labor- and time-intensive process of sample preparation and the relatively long times required for target/probe hybridization. To overcome these two technological barriers, we have developed a microfluidic chip that performs on-chip sample purification and labeling, integrated with a high-density gene array. Sample purification was performed using a porous polymer monolithic material functionalized with an oligo dT nucleotide sequence for the isolation of high-purity mRNA. These purified mRNAs can then be rapidly labeled using a fluorescent molecule which forms a selective covalent bond at the N7 position of guanine residues. The labeled mRNAs can then be released from the polymer monolith to allow for direct hybridization with oligonucleotide probes deposited in a microfluidic channel. To allow for rapid target/probe hybridization, high-density microarrays were printed in microchannels. The channels can accommodate array densities as high as 4000 probes. When oligonucleotide deposition is complete, these channels are sealed using a polymer film which forms a pressure-tight seal to allow sample reagent flow to the arrayed probes. This process will allow for real-time monitoring of target-to-probe hybridization using a top-mounted CCD fiber bundle combination. Using this process, we have been able to perform the full sequence from multi-step sample preparation to labeled target/probe hybridization in less than 30 minutes. These results demonstrate the capability to perform rapid genomic screening on a high-density microfluidic microarray of oligonucleotides.

  4. Mechanisms and Dynamics of Orphan Gene Emergence in Insect Genomes

    PubMed Central

    Wissler, Lothar; Gadau, Jürgen; Simola, Daniel F.; Helmkampf, Martin; Bornberg-Bauer, Erich

    2013-01-01

    Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes—such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements—act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes has their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and support a recently proposed dynamic model of frequent gene birth and death. PMID:23348040

  5. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  6. Hemipteran genomics and psyllid gene expression

    USDA-ARS's Scientific Manuscript database

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research on psyllid genomics, while still in its infancy, is providing information which will aid strategies to suppress...

  7. PIECE: a database for plant gene structure comparison and evolution

    PubMed Central

    Wang, Yi; You, Frank M.; Lazo, Gerard R.; Luo, Ming-Cheng; Thilmony, Roger; Gordon, Sean; Kianian, Shahryar F.; Gu, Yong Q.

    2013-01-01

    Gene families often show degrees of differences in terms of exon–intron structures depending on their distinct evolutionary histories. Comparative analysis of gene structures is important for understanding their evolutionary and functional relationships within plant species. Here, we present a comparative genomics database named PIECE (http://wheat.pw.usda.gov/piece) for Plant Intron and Exon Comparison and Evolution studies. The database contains all the annotated genes extracted from 25 sequenced plant genomes. These genes were classified based on Pfam motifs. Phylogenetic trees were pre-constructed for each gene category. PIECE provides a user-friendly interface for different types of searches and a graphical viewer for displaying a gene structure pattern diagram linked to the resulting bootstrapped dendrogram for each gene family. The gene structure evolution of orthologous gene groups was determined using the GLOOME, Exalign and GECA software programs that can be accessed within the database. PIECE also provides a web server version of the software, GSDraw, for drawing schematic diagrams of gene structures. PIECE is a powerful tool for comparing gene sequences and provides valuable insights into the evolution of gene structure in plant genomes. PMID:23180792

  8. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    PubMed

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  9. Genome changes after gene duplication: haploidy vs. diploidy.

    PubMed

    Xue, Cheng; Huang, Ren; Maxwell, Taylor J; Fu, Yun-Xin

    2010-09-01

    Since genome size and the number of duplicate genes observed in genomes increase from haploid to diploid organisms, diploidy might provide more evolutionary probabilities through gene duplication. It is still unclear how diploidy promotes genomic evolution in detail. In this study, we explored the evolution of segmental gene duplication in haploid and diploid populations by analytical and simulation approaches. Results show that (1) under the double null recessive (DNR) selective model, given the same recombination rate, the evolutionary trajectories and consequences are very similar between the same-size gene-pool haploid vs. diploid populations; (2) recombination enlarges the probability of preservation of duplicate genes in either haploid or diploid large populations, and haplo-insufficiency reinforces this effect; and (3) the loss of duplicate genes at the ancestor locus is limited under recombination while under complete linkage the loss of duplicate genes is always random at the ancestor and newly duplicated loci. Therefore, we propose a model to explain the advantage of diploidy: diploidy might facilitate the increase of recombination rate, especially under sexual reproduction; more duplicate genes are preserved under more recombination by originalization (by which duplicate genes are preserved intact at a special quasi-mutation-selection balance under the DNR or haplo-insufficient selective model), so genome sizes and the number of duplicate genes in diploid organisms become larger. Additionally, it is suggested that small genomic rearrangements due to the random loss of duplicate genes might be limited under recombination.

  10. Directed genomic integration, gene replacement, and integrative gene expression in Streptococcus thermophilus.

    PubMed Central

    Mollet, B; Knol, J; Poolman, B; Marciset, O; Delley, M

    1993-01-01

    Several pGEM5- and pUC19-derived plasmids containing a selectable erythromycin resistance marker were integrated into the chromosome of Streptococcus thermophilus at the loci of the lactose-metabolizing genes. Integration occurred via homologous recombination and resulted in cointegrates between plasmid and genome, flanked by the homologous DNA used for integration. Selective pressure on the plasmid-located erythromycin resistance gene resulted in multiple amplifications of the integrated plasmid. Release of this selective pressure, however, gave way to homologous resolution of the cointegrate structures. By integration and subsequent resolution, we were able to replace the chromosomal lacZ gene with a modified copy carrying an in vitro-generated deletion. In the same way, we integrated a promoterless chloramphenicol acetyltransferase (cat) gene between the chromosomal lacS and lacZ genes of the lactose operon. The inserted cat gene became a functional part of the operon and was expressed and regulated accordingly. Selective pressure on the essential lacS and lacZ genes under normal growth conditions in milk ensures the maintenance and expression of the integrated gene. As there are only minimal repeated DNA sequences (an NdeI site) flanking the inserted cat gene, it was stably maintained even in the absence of lactose, i.e., when grown on sucrose or glucose. The methodology represents a stable system in which to express and regulate foreign genes in S. thermophilus, which could qualify in the future for an application with food. PMID:8331064

  11. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi

    PubMed Central

    Dupont, Pierre-Yves; Cox, Murray P.

    2017-01-01

    Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827

  12. Mitochondrial Genome of Palpitomonas bilix: Derived Genome Structure and Ancestral System for Cytochrome c Maturation

    PubMed Central

    Nishimura, Yuki; Tanifuji, Goro; Kamikawa, Ryoma; Yabuki, Akinori; Hashimoto, Tetsuo; Inagaki, Yuji

    2016-01-01

    We report here the mitochondrial (mt) genome of Palpitomonas bilix, one of the heterotrophic microeukaryotes related to cryptophytes. The P. bilix mt genome was found to be a linear molecule composed of a “single copy region” (∼16 kb) and repeat regions (∼30 kb) arranged in an inverse manner at both ends of the genome. Linear mt genomes with large inverted repeats are known for three distantly related eukaryotes (including P. bilix), suggesting that this particular mt genome structure has emerged at least three times in the eukaryotic tree of life. The P. bilix mt genome contains 47 protein-coding genes including ccmA, ccmB, ccmC, and ccmF, which encode protein subunits involved in the system for cytochrome c maturation inherited from a bacterium (System I). We present data indicating that the phylogenetic relatives of P. bilix, namely, cryptophytes, goniomonads, and kathablepharids, utilize an alternative system for cytochrome c maturation, which has most likely emerged during the evolution of eukaryotes (System III). To explain the distribution of Systems I and III in P. bilix and its phylogenetic relatives, two scenarios are possible: (i) System I was replaced by System III on the branch leading to the common ancestor of cryptophytes, goniomonads, and kathablepharids, and (ii) the two systems co-existed in their common ancestor and were lost differentially among the four descendants. PMID:27604877

  13. Genomic organization and 5′-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5′-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over approximately 25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5′-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  14. Genomic structure, chromosomal localization and expression profile of a novel melanoma differentiation associated (mda-7) gene with cancer specific growth suppressing and apoptosis inducing properties.

    SciTech Connect

    Huang, E. Y.; Madireddi, M. T.; Gopalkrishnan, R. V.; Leszczyniecka, M.; Su, Z. Z.; Lebedeva, I. V.; Kang, D. C.; Jian, H.; Lin, J. J.; Alexandre, D.; Chen, Y.; Vozhilla, N.; Mei, M. X.; Christiansen, K. A.; Sivo, F.; Goldstein, N. I.; Chada, S.; Huberman, E.; Pestka, S.; Fisher, P. B.; Biochip Technology Center; Columbia Univ.; Introgen Therapeutics Inc.; UMDNJ-Robert Wood Johnson Medical School

    2001-10-25

    Abnormalities in cellular differentiation are frequent occurrences in human cancers. Treatment of human melanoma cells with recombinant fibroblast interferon (IFN-beta) and the protein kinase C activator mezerein (MEZ) results in an irreversible loss in growth potential, suppression of tumorigenic properties and induction of terminal cell differentiation. Subtraction hybridization identified melanoma differentiation associated gene-7 (mda-7), as a gene induced during these physiological changes in human melanoma cells. Ectopic expression of mda-7 by means of a replication defective adenovirus results in growth suppression and induction of apoptosis in a broad spectrum of additional cancers, including melanoma, glioblastoma multiforme, osteosarcoma and carcinomas of the breast, cervix, colon, lung, nasopharynx and prostate. In contrast, no apparent harmful effects occur when mda-7 is expressed in normal epithelial or fibroblast cells. Human clones of mda-7 were isolated and its organization resolved in terms of intron/exon structure and chromosomal localization. Hu-mda-7 encompasses seven exons and six introns and encodes a protein with a predicted size of 23.8 kDa, consisting of 206 amino acids. Hu-mda-7 mRNA is stably expressed in the thymus, spleen and peripheral blood leukocytes. De novo mda-7 mRNA expression is also detected in human melanocytes and expression is inducible in cells of melanocyte/melanoma lineage and in certain normal and cancer cell types following treatment with a combination of IFN-beta plus MEZ. Mda-7 expression is also induced during megakaryocyte differentiation induced in human hematopoietic cells by treatment with TPA (12-O-tetradecanoyl phorbol-13-acetate). In contrast, de novo expression of mda-7 is not detected nor is it inducible by IFN-beta+MEZ in a spectrum of additional normal and cancer cells. No correlation was observed between induction of mda-7 mRNA expression and growth suppression following treatment with IFN-beta+MEZ and

  15. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression.

    PubMed

    Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda

    2017-06-26

    The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3' and 5' UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis

  16. Structural Variation Mutagenesis of the Human Genome: Impact on Disease and Evolution

    PubMed Central

    Lupski, James R.

    2015-01-01

    Watson-Crick base-pair changes, or single-nucleotide variants (SNV), have long been known as a source of mutations. However, the extent to which DNA structural variation, including duplication and deletion copy number variants (CNV) and copy number neutral inversions and translocations, contributes to human genome variation and disease has been appreciated only recently. Moreover, the potential complexity of structural variants (SV) was not envisioned; thus, the frequency of complex genomic rearrangements (CGR) and how such events form remained a mystery. The concept of genomic disorders, diseases due to genomic rearrangements and not sequence-based changes for which genomic architecture incites genomic instability, delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. PMID:25892534

  17. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses

    PubMed Central

    Ward, John M.; Mäser, Pascal; Schroeder, Julian I.

    2016-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide–gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport. PMID:18842100

  18. Structure and sequence of the saimiriine herpesvirus 1 genome.

    PubMed

    Tyler, Shaun; Severini, Alberto; Black, Darla; Walker, Matthew; Eberle, R

    2011-02-05

    We report here the complete genome sequence of the squirrel monkey α-herpesvirus saimiriine herpesvirus 1 (HVS1). Unlike the simplexviruses of other primate species, only the unique short region of the HVS1 genome is bounded by inverted repeats. While all Old World simian simplexviruses characterized to date lack the herpes simplex virus RL1 (γ34.5) gene, HVS1 has an RL1 gene. HVS1 lacks several genes that are present in other primate simplexviruses (US8.5, US10-12, UL43/43.5 and UL49A). Although the overall genome structure appears more like that of varicelloviruses, the encoded HVS1 proteins are most closely related to homologous proteins of the primate simplexviruses. Phylogenetic analyses confirm that HVS1 is a simplexvirus. Limited comparison of two HVS1 strains revealed a very low degree of sequence variation more typical of varicelloviruses. HVS1 is thus unique among the primate α-herpesviruses in that its genome has properties of both simplexviruses and varicelloviruses.

  19. Primary structure of the herpesvirus saimiri genome.

    PubMed Central

    Albrecht, J C; Nicholas, J; Biller, D; Cameron, K R; Biesinger, B; Newman, C; Wittmann, S; Craxton, M A; Coleman, H; Fleckenstein, B

    1992-01-01

    This report describes the complete nucleotide sequence of the genome of herpesvirus saimiri, the prototype of gammaherpesvirus subgroup 2 (rhadinoviruses). The unique low-G + C-content DNA region has 112,930 bp with an average base composition of 34.5% G + C and is flanked by about 35 noncoding high-G + C-content DNA repeats of 1,444 bp (70.8% G + C) in tandem orientation. We identified 76 major open reading frames and a set of seven U-RNA genes for a total of 83 potential genes. The genes are closely arranged, with only a few regions of sizable noncoding sequences. For 60 of the predicted proteins, homologous sequences are found in other herpesviruses. Genes conserved between herpesvirus saimiri and Epstein-Barr virus (gammaherpesvirus subgroup 1) show that their genomes are generally collinear, although conserved gene blocks are separated by unique genes that appear to determine the particular phenotype of these viruses. Several deduced protein sequences of herpesvirus saimiri without counterparts in most of the other sequenced herpesviruses exhibited significant homology with cellular proteins of known function. These include thymidylate synthase, dihydrofolate reductase, complement control proteins, the cell surface antigen CD59, cyclins, and G protein-coupled receptors. Searching for functional protein motifs revealed that the virus may encode a cytosine-specific methylase and a tyrosine-specific protein kinase. Several herpesvirus saimiri genes are potential candidates to cooperate with the gene for saimiri transformation-associated protein of subgroup A (STP-A) in T-lymphocyte growth stimulation. PMID:1321287

  20. Gene copy number variation throughout the Plasmodium falciparum genome.

    PubMed

    Cheeseman, Ian H; Gomez-Escobar, Natalia; Carret, Celine K; Ivens, Alasdair; Stewart, Lindsay B; Tetteh, Kevin K A; Conway, David J

    2009-08-04

    Gene copy number variation (CNV) is responsible for several important phenotypes of the malaria parasite Plasmodium falciparum, including drug resistance, loss of infected erythrocyte cytoadherence and alteration of receptor usage for erythrocyte invasion. Despite the known effects of CNV, little is known about its extent throughout the genome. We performed a whole-genome survey of CNV genes in P. falciparum using comparative genome hybridisation of a diverse set of 16 laboratory culture-adapted isolates to a custom designed high density Affymetrix GeneChip array. Overall, 186 genes showed hybridisation signals consistent with deletion or amplification in one or more isolates. There is a strong association of CNV with gene length, genomic location, and low orthology to genes in other Plasmodium species. Sub-telomeric regions of all chromosomes are strongly associated with CNV genes independent from members of previously described multigene families. However, approximately 40% of CNV genes were located in more central regions of the chromosomes. Among the previously undescribed CNV genes, several that are of potential phenotypic relevance are identified. CNV represents a major form of genetic variation within the P. falciparum genome; the distribution of gene features indicates the involvement of highly non-random mutational and selective processes. Additional studies should be directed at examining CNV in natural parasite populations to extend conclusions to clinical settings.

  1. The multiple facets of homology and their use in comparative genomics to study the evolution of genes, genomes, and species.

    PubMed

    Descorps-Declère, Stéphane; Lemoine, Frédéric; Sculo, Quentin; Lespinet, Olivier; Labedan, Bernard

    2008-04-01

    The incredible development of comparative genomics during the last decade has required correct use of the concept of homology, which was previously used only by evolutionary biologists. Unfortunately, this concept has often been misunderstood and thus misused when exploited outside its evolutionary context. This review returns to the correct definition of homology and explains how this definition has been progressively refined to adapt it to the various new kinds of analysis of gene properties and of their products that have appeared with the progress of comparative genomics. We then illustrate the power and usefulness of this concept when using the available genomics data to study the evolution of individual genes, of entire genomes, and of species, respectively. After explaining how we detect homologues by an exhaustive comparison of a hundred complete proteomes, we describe three main lines of research we have developed in recent years. The first one exploits synteny and gene context data to better understand the mechanisms of genome evolution in prokaryotes. The second one is based on phylogenomics approaches to reconstruct the tree of life. The last one is devoted to recalling that protein homology is often limited to structural segments (SOH = segment of homology, or module). Detecting and numbering modules allows tracing back protein history by identifying the events of gene duplication and gene fusion. We stress that one of the main present difficulties in such studies is the lack of a reliable method to identify genuine orthologues. Finally, we show how these homology studies are helpful to annotate genes and genomes and to study the complexity of the relationships between sequence and function of a gene.

  2. Genomic organization of a cellulase gene family in Phanerochaete chrysosporium

    Treesearch

    Sarah F. Covert; Jennifer Bolduc; Daniel Cullen

    1992-01-01

    Southern blot and nucleotide sequence analysis of Phanerochaete chrysosporium BKM-F-1767 genomic clones indicate that this wood-degrading fungus contains at least six genes with significant homology to the Trichoderma reesei cellobiohydrolase I gene (cbh1). Using pulsed-field gel electrophoresis to separate P. chrysosporium chromosomes, the six cellulase genes were...

  3. Genome-editing Technologies for Gene and Cell Therapy.

    PubMed

    Maeder, Morgan L; Gersbach, Charles A

    2016-03-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed.

  4. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  5. Genome-editing Technologies for Gene and Cell Therapy

    PubMed Central

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  6. A data management system for structural genomics

    PubMed Central

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-01-01

    Background: Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. Results: We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. Conclusion: Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  7. Flexibility and symmetry of prokaryotic genome rearrangement reveal lineage-associated core-gene-defined genome organizational frameworks.

    PubMed

    Kang, Yu; Gu, Chaohao; Yuan, Lina; Wang, Yue; Zhu, Yanmin; Li, Xinna; Luo, Qibin; Xiao, Jingfa; Jiang, Daquan; Qian, Minping; Ahmed Khan, Aftab; Chen, Fei; Zhang, Zhang; Yu, Jun

    2014-11-25

    The prokaryotic pangenome partitions genes into core and dispensable genes. The order of core genes, albeit assumed to be stable under selection in general, is frequently interrupted by horizontal gene transfer and rearrangement, but how a core-gene-defined genome maintains its stability or flexibility remains to be investigated. Based on data from 30 species, including 425 genomes from six phyla, we grouped core genes into syntenic blocks in the context of a pangenome according to their stability across multiple isolates. A subset of the core genes, often species specific and lineage associated, formed a core-gene-defined genome organizational framework (cGOF). Such cGOFs are either single segmental (one-third of the species analyzed) or multisegmental (the rest). Multisegment cGOFs were further classified into symmetric or asymmetric according to segment orientations toward the origin-terminus axis. The cGOFs in Gram-positive species are exclusively symmetric and often reversible in orientation, as opposed to those of the Gram-negative bacteria, which are all asymmetric and irreversible. Meanwhile, all species showing strong strand-biased gene distribution contain symmetric cGOFs and often specific DnaE (α subunit of DNA polymerase III) isoforms. Furthermore, functional evaluations revealed that cGOF genes are hub associated with regard to cellular activities, and the stability of cGOF provides efficient indexes for scaffold orientation as demonstrated by assembling virtual and empirical genome drafts. cGOFs show species specificity, and the symmetry of multisegmental cGOFs is conserved among taxa and constrained by DNA polymerase-centric strand-biased gene distribution. The definition of species-specific cGOFs provides powerful guidance for genome assembly and other structure-based analysis. Prokaryotic genomes are frequently interrupted by horizontal gene transfer (HGT) and rearrangement. To know whether there is a set of genes not only conserved in position

  8. Amplification and characterization of eukaryotic structural genes.

    PubMed

    Maniatis, T; Efstratiadis, A; Sim, G K; Kafatos, F

    1978-05-01

    An approach to the study of eukaryotic structural genes which are differentially expressed during development is described. This approach involves the isolation and amplification of mRNA sequences by in vitro conversion of mRNA to double-stranded cDNA followed by molecular cloning in bacterial plasmids. This procedure provides highly specific hybridization probes that can be used to identify genes and their contiguous DNA sequences in genomic DNA, and to detect specific RNA transcripts during development. The nature of the method allows the isolation of individual mRNA sequences from a complex population of molecules at different stages of development.

  9. Interrogating the druggable genome with structural informatics.

    PubMed

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  10. Comparative mapping and genomic annotation of the bovine oncosuppressor gene WWOX.

    PubMed

    Manera, S; Bonfiglio, S; Malusà, A; Denis, C; Boussaha, M; Russo, V; Roperto, F; Perucatti, A; Di Meo, G P; Eggen, A; Ferretti, L

    2009-01-01

    WWOX (WW domain-containing oxidoreductase) is the gene mapping at FRA16D HSA16q23.1, the second most active common fragile site in the human genome. In this study we characterized at a detailed molecular level WWOX in the bovine genome. First, we sequenced cDNA from various tissues and obtained evidence in support of a 9-exon structure for the gene, similar to the human gene. Then, we recovered BACs using exon tags and annotated the gene to a >1-Mb genomic region of BTA18 using the Btau 4.0 genome assembly as a reference, thus resolving an issue related to exon 9, which is not included in the genomic annotation of the gene in the Entrez database. Finally, BACs spanning WWOX were used as FISH probes to obtain comparative mapping of the gene in Bos taurus, Bubalus bubalis, Ovis aries and Capra hircus to BTA18q12.1, BBU18q13, OAR14q12.1 and CHI18q12.1, respectively. Our data show that the chromosomal location of WWOX is conserved between man and 4 major domesticated species. Moreover, the annotation of the bovine gene also suggests a highly conserved genomic arrangement, including number and size of introns.

  11. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression.

    PubMed

    Ay, Ferhat; Bunnik, Evelien M; Varoquaux, Nelle; Bol, Sebastiaan M; Prudhomme, Jacques; Vert, Jean-Philippe; Noble, William Stafford; Le Roch, Karine G

    2014-06-01

    The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.

  12. Complete female mitochondrial genome of Anodonta anatina (Mollusca: Unionidae): confirmation of a novel protein-coding gene (F ORF).

    PubMed

    Soroka, Marianna; Burzyński, Artur

    2015-04-01

    Freshwater mussels are among the animals that have two different, gender-specific mitochondrial genomes. We sequenced complete female mitochondrial genomes from five individuals of Anodonta anatina, a bivalve species common in the Palearctic ecozone. The length of the genome was variable: 15,637-15,653 bp. This variation was almost entirely confined to the non-coding parts, which constituted approximately 5% of the genome. Nucleotide diversity was moderate, at 0.3%. Nucleotide composition was typically biased towards AT (66.0%). All genes normally seen in animal mtDNA were identified, as well as the ORF characteristic for unionid mitochondrial genomes, bringing the total number of genes present to 38. If this additional ORF does encode a protein, it must evolve under very relaxed selection, since all substitutions within this gene were non-synonymous. The gene order and structure of the genome were identical to those of all female mitochondrial genomes described in unionid bivalves except the Gonideini.
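
    The two summary statistics quoted in this abstract, AT content and nucleotide diversity, can be illustrated with a minimal Python sketch. It assumes nucleotide diversity means the mean proportion of differing sites over all pairs of aligned sequences; the function names and the toy sequences are hypothetical, and alignment gaps are not treated specially.

```python
from itertools import combinations


def at_content(seq):
    """Fraction of A/T among the unambiguous bases (A, C, G, T) of one sequence."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("A") + seq.count("T")) / total


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites over equal-length aligned sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching columns for every pair, then average per pair and per site.
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment, not data from the study.
toy = ["ATTAGCATTA", "ATTAGCATTA", "ATTAGTATTA"]
print(at_content(toy[0]))
print(nucleotide_diversity(toy))
```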

  13. The inheritance of organelle genes and genomes: patterns and mechanisms.

    PubMed

    Xu, Jianping

    2005-12-01

    Unlike nuclear genes and genomes, the inheritance of organelle genes and genomes does not follow Mendel's laws. In this mini-review, I summarize recent research progress on the patterns and mechanisms of the inheritance of organelle genes and genomes. While most sexual eukaryotes show uniparental inheritance of organelle genes and genomes in some progeny at least part of the time, increasing evidence indicates that strictly uniparental inheritance is rare and that organelle inheritance patterns are very diverse and complex. In contrast with the predominance of uniparental inheritance in multicellular organisms, organelle genes in eukaryotic microorganisms, such as protists, algae, and fungi, typically show a greater diversity of inheritance patterns, with sex-determining loci playing significant roles. The diverse patterns of inheritance are matched by the rich variety of potential mechanisms. Indeed, many factors, both deterministic and stochastic, can influence observed patterns of organelle inheritance. Interestingly, in multicellular organisms, progeny from interspecific crosses seem to exhibit more frequent paternal leakage and biparental organelle genome inheritance than those from intraspecific crosses. The recent observation of a sex-determining gene in the basidiomycete yeast Cryptococcus neoformans, which controls mitochondrial DNA inheritance, has opened up potentially exciting research opportunities for identifying specific molecular genetic pathways that control organelle inheritance, as well as for testing evolutionary hypotheses regarding the prevalence of uniparental inheritance of organelle genes and genomes.

  14. Full genome comparison and characterization of avian H10 viruses with different pathogenicity in Mink (Mustela vison) reveals genetic and functional differences in the non-structural gene

    PubMed Central

    2010-01-01

    Background The unique property of some avian H10 viruses, particularly the ability to cause severe disease in mink without prior adaptation, enabled our study. Coupled with previous experimental data and genetic characterization here we tried to investigate the possible influence of different genes on the virulence of these H10 avian influenza viruses in mink. Results Phylogenetic analysis revealed a close relationship between the viruses studied. Our study also showed that there are no genetic differences in receptor specificity or the cleavability of the haemagglutinin proteins of these viruses regardless of whether they are of low or high pathogenicity in mink. In poly I:C stimulated mink lung cells the NS1 protein of influenza A virus showing high pathogenicity in mink down regulated the type I interferon promoter activity to a greater extent than the NS1 protein of the virus showing low pathogenicity in mink. Conclusions Differences in pathogenicity and virulence in mink between these strains could be related to clear amino acid differences in the non structural 1 (NS1) protein. The NS gene of mink/84 appears to have contributed to the virulence of the virus in mink by helping the virus evade the innate immune responses. PMID:20591155

  15. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    PubMed Central

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716
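
    The co-localization of early stop mutations with microsatellites described above can be sketched as follows. The abstract does not give the repeat definition used, so the thresholds here (motif of 1 to 6 bp, at least three tandem copies, at least 6 bp in total) and the function names are assumptions made purely for illustration.

```python
import re

# Assumed definition (not taken from the abstract): a motif of 1-6 bp
# followed by at least two further tandem copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def find_microsatellites(seq, min_len=6):
    """Return (start, end, motif) for each simple repeat at least min_len bp long."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        if match.end() - match.start() >= min_len:
            hits.append((match.start(), match.end(), match.group(1)))
    return hits


def colocalizes(position, hits, flank=0):
    """True if a 0-based mutation position falls inside (or within flank bp of) a repeat."""
    return any(start - flank <= position < end + flank for start, end, _ in hits)


# Hypothetical toy sequence with one (AT)4 repeat.
hits = find_microsatellites("GCCATATATATGGC")
print(hits)
print(colocalizes(5, hits), colocalizes(12, hits))
```

    The lazy quantifier makes the regular expression prefer the shortest motif, so a run such as ATATATAT is reported with motif AT rather than ATAT.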

  16. Structure of the human annexin VI gene

    SciTech Connect

    Smith, P.D.; Moss, S.E.; Davies, A.; Crumpton, M.J.

    1994-03-29

    The authors report the structure of the human annexin VI gene and compare the intron-exon organization with the known structures of the human annexin I and II genes. The gene is approximately 60 kbp long and contains 26 exons. Consistent with the published annexin VI cDNA sequence, the genomic sequence at the 3′ end does not contain a canonical polyadenylation signal. The genomic sequence upstream of the transcription start site contains TATAA and CAAT motifs. The spatial organization of the exons does not reveal any obvious similarities between the two halves of the annexin VI gene. Comparison of the intron-exon boundary positions of the annexin VI gene with those of annexins I and II reveals that within the repeated domains the break points are perfectly conserved except for exon 8, which is one codon smaller in annexin II. The corresponding point in the second half of annexin VI is represented by two exons, exons 20 and 21. The latter exon is alternatively spliced, giving rise to two annexin VI isoforms that differ with respect to a 6-amino acid insertion at the start of repeat 7. 32 refs., 6 figs.

  17. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    SciTech Connect

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.
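
    To make the idea of a duplication-and-modification model concrete, here is a toy simulation in Python. It is not the authors' model: the abstract gives no equations, so the update rule (each new gene is a copy of a randomly chosen gene, and with probability p_diverge the copy has mutated enough to found a new family) and all parameter names and values are assumptions.

```python
import random
from collections import Counter


def simulate_family_sizes(n_precursors, target_genes, p_diverge, seed=0):
    """Grow a genome by random duplication; return {family size: number of families}."""
    rng = random.Random(seed)
    genes = list(range(n_precursors))  # one family label per gene; precursors are singletons
    next_label = n_precursors
    while len(genes) < target_genes:
        parent = rng.randrange(len(genes))  # every gene is equally likely to be duplicated
        if rng.random() < p_diverge:
            # Assumed rule: the copy has diverged beyond recognition and starts a new family.
            genes.append(next_label)
            next_label += 1
        else:
            genes.append(genes[parent])  # the copy stays in its parent's family
    sizes_per_family = Counter(genes)
    return Counter(sizes_per_family.values())


# Hypothetical parameters chosen only to produce a readable distribution.
distribution = simulate_family_sizes(n_precursors=50, target_genes=2000, p_diverge=0.1, seed=1)
for size in sorted(distribution)[:5]:
    print(size, distribution[size])
```

    Because large families are duplicated more often under this rule, the sketch tends to produce many small families and a few large ones, which is the qualitative shape such models are meant to explain.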

  18. Genome engineering and gene expression control for bacterial strain development.

    PubMed

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to engineer by homologous recombination. Due to the increasing demand for high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  19. Higher plant mitochondrial DNA: Genomes, genes, mutants, transcription, translation

    SciTech Connect

    Not Available

    1986-01-01

    This volume contains brief summaries of 63 presentations given at the International Workshop on Higher Plant Mitochondrial DNA. The presentations are organized into topical discussions addressing plant genomes, mitochondrial genes, cytoplasmic male sterility, transcription, translation, plasmids and tissue culture. (DT)

  20. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure.

    PubMed

    Zuccolo, Andrea; Bowers, John E; Estill, James C; Xiong, Zhiyong; Luo, Meizhong; Sebastian, Aswathy; Goicoechea, José Luis; Collura, Kristi; Yu, Yeisoo; Jiao, Yuannian; Duarte, Jill; Tang, Haibao; Ayyampalayam, Saravanaraj; Rounsley, Steve; Kudrna, Dave; Paterson, Andrew H; Pires, J Chris; Chanderbali, Andre; Soltis, Douglas E; Chamala, Srikar; Barbazuk, Brad; Soltis, Pamela S; Albert, Victor A; Ma, Hong; Mandoli, Dina; Banks, Jody; Carlson, John E; Tomkins, Jeffrey; dePamphilis, Claude W; Wing, Rod A; Leebens-Mack, Jim

    2011-01-01

    Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution.

  1. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  2. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  3. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of the two are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  4. High-throughput Crystallography for Structural Genomics

    PubMed Central

    Joachimiak, Andrzej

    2009-01-01

    Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now over 53,000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide a robust foundation for structural molecular biology and assure a strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact. PMID:19765976

  5. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but a systematic analysis of essential genes in GIs has not been carried out before. Here, we have analyzed the essential genes in 28 prokaryotes by statistical methods and concluded that essential genes are significantly less frequent inside GIs than outside them. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give a clue to the functionality of essential genes in genomic islands. PMID:26223387

  6. Genomic location and characterisation of MIC genes in cattle.

    PubMed

    Birch, James; De Juan Sanjuan, Cristina; Guzman, Efrain; Ellis, Shirley A

    2008-08-01

    Major histocompatibility complex (MHC) class I chain-related (MIC) genes have been previously identified and characterised in humans. They encode polymorphic class I-like molecules that are stress-inducible, and constitute one of the ligands of the activating natural killer cell receptor NKG2D. We have identified three MIC genes within the cattle genome, located close to three non-classical MHC class I genes. The genomic position relative to other genes is very similar to the arrangement reported in the pig MHC region. Analysis of MIC cDNA sequences derived from a range of cattle cell lines suggests there may be four MIC genes in total. We have investigated the presence of the genes in distinct and well-defined MHC haplotypes, and show that one gene is consistently present, while the configuration of the other three genes appears variable.

  7. Demarcating the gene-rich regions of the wheat genome

    PubMed Central

    Erayman, Mustafa; Sandhu, Devinder; Sidhu, Deepak; Dilbirligi, Muharrem; Baenziger, P. S.; Gill, Kulvinder S.

    2004-01-01

    By physically mapping 3025 loci including 252 phenotypically characterized genes and 17 quantitative trait loci (QTLs) relative to 334 deletion breakpoints, we localized the gene-containing fraction to 29% of the wheat genome present as 18 major and 30 minor gene-rich regions (GRRs). The GRRs varied both in gene number and density. The five largest GRRs physically spanning <3% of the genome contained 26% of the wheat genes. Approximate size of the GRRs ranged from 3 to 71 Mb. Recombination mainly occurred in the GRRs. Various GRRs varied as much as 128-fold for gene density and 140-fold for recombination rates. Except for a general suppression in 25–40% of the chromosomal region around centromeres, no correlation of recombination was observed with the gene density, the size, or chromosomal location of GRRs. More than 30% of the wheat genes are in recombination-poor regions thus are inaccessible to map-based cloning. PMID:15240829

  8. A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives

    PubMed Central

    Sabath, Niv; Price, Nicholas; Graur, Dan

    2009-01-01

    The Israeli acute paralysis virus (IAPV) is a honeybee-infecting virus that was found to be associated with colony collapse disorder. The IAPV genome contains two genes encoding a structural and a nonstructural polyprotein. We applied a recently developed method for the estimation of selection in overlapping genes to detect purifying selection and, hence, functionality. We provide evolutionary evidence for the existence of a functional overlapping gene, which is translated in the +1 reading frame of the structural polyprotein gene. Conserved orthologs of this putative gene, which we provisionally call pog (predicted overlapping gene), were also found in the genomes of a monophyletic clade of dicistroviruses that includes IAPV, acute bee paralysis virus, Kashmir bee virus, and Solenopsis invicta (red imported fire ant) virus 1. PMID:19761605

  9. Impact of recurrent gene duplication on adaptation of plant genomes

    PubMed Central

    2014-01-01

    Background Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation but so far this hypothesis has not been tested at the genome scale. Results We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage specific expanded (LSE) gene families compared to single-copy genes using a dN/dS-based test in a phylogenetic framework. We found 5.38% of alignments in LSE genes with codons under positive selection. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detect significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. Conclusions In this study, to our knowledge the first at the genome scale, we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints as well as our results can be of interest for further studies on (plant) gene family evolution. PMID:24884640

  10. Distinct Gene Number-Genome Size Relationships for Eukaryotes and Non-Eukaryotes: Gene Content Estimation for Dinoflagellate Genomes

    PubMed Central

    Hou, Yubo; Lin, Senjie

    2009-01-01

    The ability to predict gene content is highly desirable for characterization of not-yet sequenced genomes like those of dinoflagellates. Using data from completely sequenced and annotated genomes from phylogenetically diverse lineages, we investigated the relationship between gene content and genome size using regression analyses. Distinct relationships between log10-transformed protein-coding gene number (Y′) versus log10-transformed genome size (X′, genome size in kbp) were found for eukaryotes and non-eukaryotes. Eukaryotes best fit a logarithmic model, Y′ = ln(-46.200 + 22.678X′), whereas non-eukaryotes fit a linear model, Y′ = 0.045 + 0.977X′, both with high significance (p < 0.001, R² > 0.91). Total gene number shows similar trends in both groups to their respective protein coding regressions. The distinct correlations reflect lower and decreasing gene-coding percentages as genome size increases in eukaryotes (82%–1%) compared to higher and relatively stable percentages in prokaryotes and viruses (97%–47%). The eukaryotic regression models project that the smallest dinoflagellate genome (3×10⁶ kbp) contains 38,188 protein-coding (40,086 total) genes and the largest (245×10⁶ kbp) 87,688 protein-coding (92,013 total) genes, corresponding to 1.8% and 0.05% gene-coding percentages. These estimates do not likely represent extraordinarily high functional diversity of the encoded proteome but rather highly redundant genomes as evidenced by high gene copy numbers documented for various dinoflagellate species. PMID:19750009
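
    The two regression models quoted in this abstract can be evaluated with a short Python sketch. It assumes that "ln" is the natural logarithm, that X′ is log10 of the genome size in kbp, and that Y′ is log10 of the protein-coding gene number; the function names are hypothetical. Because the coefficients are printed in rounded form, the values computed here need not reproduce the projections quoted in the abstract exactly.

```python
import math


def eukaryote_protein_genes(genome_kbp):
    """Logarithmic model: Y' = ln(-46.200 + 22.678 X'), with X' = log10(size in kbp)."""
    x = math.log10(genome_kbp)
    argument = -46.200 + 22.678 * x
    if argument <= 0:
        raise ValueError("genome too small for the logarithmic model to be defined")
    return 10 ** math.log(argument)  # back-transform from log10 gene number


def non_eukaryote_protein_genes(genome_kbp):
    """Linear model: Y' = 0.045 + 0.977 X', with X' = log10(size in kbp)."""
    x = math.log10(genome_kbp)
    return 10 ** (0.045 + 0.977 * x)


# 3e6 kbp is the smallest dinoflagellate genome size given in the abstract;
# 4,600 kbp is a hypothetical bacterial genome size used only as an example input.
print(round(eukaryote_protein_genes(3e6)))
print(round(non_eukaryote_protein_genes(4600)))
```

    The near-unit slope of the linear model means gene number grows almost in proportion to genome size in non-eukaryotes, whereas the logarithmic model flattens as eukaryotic genomes grow, which is consistent with the falling gene-coding percentages described in the abstract.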

  11. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    PubMed

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in abortion or sterility in susceptible animal hosts and grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to define the pan-genome and core genome and to identify their composition and function. The results showed that the total of 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with the dispensable genome. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in growth and reproduction in Brucella spp. This study might help us to better understand the mechanisms of Brucella persistent infection and provide some clues for further exploring the gene modules of intracellular survival in Brucella spp.
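
    The three-way split of gene clusters reported in this abstract (core, dispensable, strain-specific) can be sketched in Python as below. The abstract does not state the exact criteria used, so this sketch assumes the common convention: a cluster present in every genome is core, a cluster present in exactly one genome is strain-specific, and everything else is dispensable. The function name and the toy data are hypothetical.

```python
def partition_pangenome(presence, n_genomes):
    """Split clusters by how many genomes contain them.

    presence maps a cluster id to the set of genome ids in which it was found.
    """
    partition = {"core": [], "dispensable": [], "strain_specific": []}
    for cluster, genomes in presence.items():
        count = len(genomes)
        if count == n_genomes:
            partition["core"].append(cluster)  # found in every genome
        elif count == 1:
            partition["strain_specific"].append(cluster)  # found in a single genome
        else:
            partition["dispensable"].append(cluster)  # found in some but not all genomes
    return partition


# Hypothetical toy input with three genomes (g1, g2, g3).
toy_presence = {
    "clusterA": {"g1", "g2", "g3"},
    "clusterB": {"g1", "g2"},
    "clusterC": {"g3"},
    "clusterD": {"g1", "g2", "g3"},
}
result = partition_pangenome(toy_presence, n_genomes=3)
print({name: len(clusters) for name, clusters in result.items()})
```

    Every cluster lands in exactly one category, which matches the abstract's counts: 1710 core, 2477 dispensable and 1182 strain-specific clusters add up to the 5369 clusters reported.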

  12. Structural divergence between the human and chimpanzee genomes.

    PubMed

    Kehrer-Sawatzki, Hildegard; Cooper, David N

    2007-02-01

    The structural microheterogeneity evident between the human and chimpanzee genomes is quite considerable and includes inversions and duplications as well as deletions, ranging in size from a few base-pairs up to several megabases (Mb). Insertions and deletions have together given rise to at least 150 Mb of genomic DNA sequence that is either present or absent in humans as compared to chimpanzees. Such regions often contain paralogous sequences and members of multigene families thereby ensuring that the human and chimpanzee genomes differ by a significant fraction of their gene content. There is as yet no evidence to suggest that the large chromosomal rearrangements which serve to distinguish the human and chimpanzee karyotypes have influenced either speciation or the evolution of lineage-specific traits. However, the myriad submicroscopic rearrangements in both genomes, particularly those involving copy number variation, are unlikely to represent exclusively neutral changes and hence promise to facilitate the identification of genes that have been important for human-specific evolution.

  13. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

    PubMed

    Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

    2017-09-30

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.

  14. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    SciTech Connect

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives when

  15. What makes up plant genomes: The vanishing line between transposable elements and genes.

    PubMed

    Zhao, Dongyan; Ferguson, Ann A; Jiang, Ning

    2016-02-01

    The ultimate source of evolution is mutation. As the largest component in plant genomes, transposable elements (TEs) create numerous types of mutations that cannot be mimicked by other genetic mechanisms. When TEs insert into genomic sequences, they influence the expression of nearby genes as well as genes unlinked to the insertion. TEs can duplicate, mobilize, and recombine normal genes or gene fragments, with the potential to generate new genes or modify the structure of existing genes. TEs also donate their transposase coding regions for cellular functions in a process called TE domestication. Despite the host defense against TE activity, a subset of TEs survived and thrived through discreet selection of transposition activity, target site, element size, and the internal sequence. Finally, TEs have established strategies to reduce the efficacy of host defense system by increasing the cost of silencing TEs. This review discusses the recent progress in the area of plant TEs with a focus on the interaction between TEs and genes.

  16. Genome-wide identification and expression profiling of ankyrin-repeat gene family in maize.

    PubMed

    Jiang, Haiyang; Wu, Qingqing; Jin, Jing; Sheng, Lei; Yan, Hanwei; Cheng, Beijiu; Zhu, Suwen

    2013-09-01

    Members of the ankyrin repeat (ANK) gene family encode ANK domains that are common in diverse organisms and play important roles in cell growth and development, such as cell-cell signal transduction and cell cycle regulation. Recently, genome-wide identification and evolutionary analyses of the ANK gene family have been carried out in Arabidopsis and rice. However, little is known regarding the ANK genes in the entire maize genome. In this study, we describe the identification and structural characterization of 71 ANK genes in maize (ZmANK). Comprehensive bioinformatics analyses of the ZmANK gene family were then performed, including phylogenetic, domain and motif analysis, chromosomal localization, intron/exon structural patterns, gene duplications and expression profiling. Domain composition analyses showed that ZmANK genes formed ten subfamilies. Five tandem duplications and 14 segmental duplications were identified in ZmANK genes. Furthermore, a comparative analysis of the total ANK gene family in Arabidopsis, rice and maize showed that ZmANKs were more closely paired with OsANKs than with AtANKs. Finally, expression profile analyses were performed. Forty-one ZmANK genes had EST sequence records. Semi-quantitative expression and microarray data analysis of these 41 ZmANK genes demonstrated that ZmANK genes exhibit varied expression patterns, suggesting functional diversification of the ZmANK gene family. The results provide significant insights for exploring ANK gene expression and function in future studies in maize.

  17. Exogenous gene integration mediated by genome editing technologies in zebrafish.

    PubMed

    Morita, Hitoshi; Taimatsu, Kiyohito; Yanagi, Kanoko; Kawahara, Atsuo

    2017-03-08

    Genome editing technologies, such as transcription activator-like effector nuclease (TALEN) and the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas) systems, can induce DNA double-strand breaks (DSBs) at the targeted genomic locus, leading to frameshift-mediated gene disruption in the process of DSB repair. Recently, technology-induced DSBs followed by DSB repair have been applied to integrate exogenous genes into the targeted genomic locus in various model organisms. In addition to conventional knock-in technology mediated by homology-directed repair (HDR), novel knock-in technologies using refined donor vectors have also been developed with genome editing technologies based on other DSB repair mechanisms, including non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ). These improved knock-in technologies should therefore make it possible to freely modify the genome of model organisms.

  18. Evolution of genes and genomes on the Drosophila phylogeny.

    PubMed

    Clark, Andrew G; Eisen, Michael B; Smith, Douglas R; Bergman, Casey M; Oliver, Brian; Markow, Therese A; Kaufman, Thomas C; Kellis, Manolis; Gelbart, William; Iyer, Venky N; Pollard, Daniel A; Sackton, Timothy B; Larracuente, Amanda M; Singh, Nadia D; Abad, Jose P; Abt, Dawn N; Adryan, Boris; Aguade, Montserrat; Akashi, Hiroshi; Anderson, Wyatt W; Aquadro, Charles F; Ardell, David H; Arguello, Roman; Artieri, Carlo G; Barbash, Daniel A; Barker, Daniel; Barsanti, Paolo; Batterham, Phil; Batzoglou, Serafim; Begun, Dave; Bhutkar, Arjun; Blanco, Enrico; Bosak, Stephanie A; Bradley, Robert K; Brand, Adrianne D; Brent, Michael R; Brooks, Angela N; Brown, Randall H; Butlin, Roger K; Caggese, Corrado; Calvi, Brian R; Bernardo de Carvalho, A; Caspi, Anat; Castrezana, Sergio; Celniker, Susan E; Chang, Jean L; Chapple, Charles; Chatterji, Sourav; Chinwalla, Asif; Civetta, Alberto; Clifton, Sandra W; Comeron, Josep M; Costello, James C; Coyne, Jerry A; Daub, Jennifer; David, Robert G; Delcher, Arthur L; Delehaunty, Kim; Do, Chuong B; Ebling, Heather; Edwards, Kevin; Eickbush, Thomas; Evans, Jay D; Filipski, Alan; Findeiss, Sven; Freyhult, Eva; Fulton, Lucinda; Fulton, Robert; Garcia, Ana C L; Gardiner, Anastasia; Garfield, David A; Garvin, Barry E; Gibson, Greg; Gilbert, Don; Gnerre, Sante; Godfrey, Jennifer; Good, Robert; Gotea, Valer; Gravely, Brenton; Greenberg, Anthony J; Griffiths-Jones, Sam; Gross, Samuel; Guigo, Roderic; Gustafson, Erik A; Haerty, Wilfried; Hahn, Matthew W; Halligan, Daniel L; Halpern, Aaron L; Halter, Gillian M; Han, Mira V; Heger, Andreas; Hillier, LaDeana; Hinrichs, Angie S; Holmes, Ian; Hoskins, Roger A; Hubisz, Melissa J; Hultmark, Dan; Huntley, Melanie A; Jaffe, David B; Jagadeeshan, Santosh; Jeck, William R; Johnson, Justin; Jones, Corbin D; Jordan, William C; Karpen, Gary H; Kataoka, Eiko; Keightley, Peter D; Kheradpour, Pouya; Kirkness, Ewen F; Koerich, Leonardo B; Kristiansen, Karsten; Kudrna, Dave; Kulathinal, Rob J; Kumar, Sudhir; Kwok, 
Roberta; Lander, Eric; Langley, Charles H; Lapoint, Richard; Lazzaro, Brian P; Lee, So-Jeong; Levesque, Lisa; Li, Ruiqiang; Lin, Chiao-Feng; Lin, Michael F; Lindblad-Toh, Kerstin; Llopart, Ana; Long, Manyuan; Low, Lloyd; Lozovsky, Elena; Lu, Jian; Luo, Meizhong; Machado, Carlos A; Makalowski, Wojciech; Marzo, Mar; Matsuda, Muneo; Matzkin, Luciano; McAllister, Bryant; McBride, Carolyn S; McKernan, Brendan; McKernan, Kevin; Mendez-Lago, Maria; Minx, Patrick; Mollenhauer, Michael U; Montooth, Kristi; Mount, Stephen M; Mu, Xu; Myers, Eugene; Negre, Barbara; Newfeld, Stuart; Nielsen, Rasmus; Noor, Mohamed A F; O'Grady, Patrick; Pachter, Lior; Papaceit, Montserrat; Parisi, Matthew J; Parisi, Michael; Parts, Leopold; Pedersen, Jakob S; Pesole, Graziano; Phillippy, Adam M; Ponting, Chris P; Pop, Mihai; Porcelli, Damiano; Powell, Jeffrey R; Prohaska, Sonja; Pruitt, Kim; Puig, Marta; Quesneville, Hadi; Ram, Kristipati Ravi; Rand, David; Rasmussen, Matthew D; Reed, Laura K; Reenan, Robert; Reily, Amy; Remington, Karin A; Rieger, Tania T; Ritchie, Michael G; Robin, Charles; Rogers, Yu-Hui; Rohde, Claudia; Rozas, Julio; Rubenfield, Marc J; Ruiz, Alfredo; Russo, Susan; Salzberg, Steven L; Sanchez-Gracia, Alejandro; Saranga, David J; Sato, Hajime; Schaeffer, Stephen W; Schatz, Michael C; Schlenke, Todd; Schwartz, Russell; Segarra, Carmen; Singh, Rama S; Sirot, Laura; Sirota, Marina; Sisneros, Nicholas B; Smith, Chris D; Smith, Temple F; Spieth, John; Stage, Deborah E; Stark, Alexander; Stephan, Wolfgang; Strausberg, Robert L; Strempel, Sebastian; Sturgill, David; Sutton, Granger; Sutton, Granger G; Tao, Wei; Teichmann, Sarah; Tobari, Yoshiko N; Tomimura, Yoshihiko; Tsolas, Jason M; Valente, Vera L S; Venter, Eli; Venter, J Craig; Vicario, Saverio; Vieira, Filipe G; Vilella, Albert J; Villasante, Alfredo; Walenz, Brian; Wang, Jun; Wasserman, Marvin; Watts, Thomas; Wilson, Derek; Wilson, Richard K; Wing, Rod A; Wolfner, Mariana F; Wong, Alex; Wong, Gane Ka-Shu; Wu, Chung-I; Wu, 
Gabriel; Yamamoto, Daisuke; Yang, Hsiao-Pei; Yang, Shiaw-Pyng; Yorke, James A; Yoshida, Kiyohito; Zdobnov, Evgeny; Zhang, Peili; Zhang, Yu; Zimin, Aleksey V; Baldwin, Jennifer; Abdouelleil, Amr; Abdulkadir, Jamal; Abebe, Adal; Abera, Brikti; Abreu, Justin; Acer, St Christophe; Aftuck, Lynne; Alexander, Allen; An, Peter; Anderson, Erica; Anderson, Scott; Arachi, Harindra; Azer, Marc; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Berlin, Aaron; Bessette, Daniel; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Bourzgui, Imane; Brown, Adam; Cahill, Patrick; Channer, Sheridon; Cheshatsang, Yama; Chuda, Lisa; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Costello, Maura; D'Aco, Katie; Daza, Riza; De Haan, Georgius; DeGray, Stuart; DeMaso, Christina; Dhargay, Norbu; Dooley, Kimberly; Dooley, Erin; Doricent, Missole; Dorje, Passang; Dorjee, Kunsang; Dupes, Alan; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Fisher, Sheila; Foley, Chelsea D; Franke, Alicia; Friedrich, Dennis; Gadbois, Loryn; Gearin, Gary; Gearin, Christina R; Giannoukos, Georgia; Goode, Tina; Graham, Joseph; Grandbois, Edward; Grewal, Sharleen; Gyaltsen, Kunsang; Hafez, Nabil; Hagos, Birhane; Hall, Jennifer; Henson, Charlotte; Hollinger, Andrew; Honan, Tracey; Huard, Monika D; Hughes, Leanne; Hurhula, Brian; Husby, M Erii; Kamat, Asha; Kanga, Ben; Kashin, Seva; Khazanovich, Dmitry; Kisner, Peter; Lance, Krista; Lara, Marcia; Lee, William; Lennon, Niall; Letendre, Frances; LeVine, Rosie; Lipovsky, Alex; Liu, Xiaohong; Liu, Jinlei; Liu, Shangtao; Lokyitsang, Tashi; Lokyitsang, Yeshi; Lubonja, Rakela; Lui, Annie; MacDonald, Pen; Magnisalis, Vasilia; Maru, Kebede; Matthews, Charles; McCusker, William; McDonough, Susan; Mehta, Teena; Meldrim, James; Meneus, Louis; Mihai, Oana; Mihalev, Atanas; Mihova, Tanya; Mittelman, Rachel; Mlenga, Valentine; Montmayeur, Anna; Mulrain, Leonidas; Navidi, Adam; Naylor, Jerome; Negash, Tamrat; Nguyen, 
Thu; Nguyen, Nga; Nicol, Robert; Norbu, Choe; Norbu, Nyima; Novod, Nathaniel; O'Neill, Barry; Osman, Sahal; Markiewicz, Eva; Oyono, Otero L; Patti, Christopher; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Raghuraman, Sujaa; Rege, Filip; Reyes, Rebecca; Rise, Cecil; Rogov, Peter; Ross, Keenan; Ryan, Elizabeth; Settipalli, Sampath; Shea, Terry; Sherpa, Ngawang; Shi, Lu; Shih, Diana; Sparrow, Todd; Spaulding, Jessica; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Strader, Christopher; Tesfaye, Senait; Thomson, Talene; Thoulutsang, Yama; Thoulutsang, Dawa; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Young, Geneva; Yu, Qing; Zembek, Lisa; Zhong, Danni; Zimmer, Andrew; Zwirko, Zac; Jaffe, David B; Alvarez, Pablo; Brockman, Will; Butler, Jonathan; Chin, CheeWhye; Gnerre, Sante; Grabherr, Manfred; Kleber, Michael; Mauceli, Evan; MacCallum, Iain

    2007-11-08

    Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

  19. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
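
    The idea of extracting genes that play a role in at least two distinct biological processes can be sketched as follows. This is only an illustration under a stated assumption: two annotation terms count as "distinct" when neither is an ancestor of the other in the term hierarchy. The abstract does not give the authors' actual criterion, and the gene names, terms, and hierarchy below are hypothetical toy data.

```python
from itertools import combinations


def ancestors(term: str, parents: dict) -> set:
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_multifunctional(terms: set, parents: dict) -> bool:
    """True if some pair of terms is unrelated by ancestry (assumed criterion)."""
    for a, b in combinations(sorted(terms), 2):
        if a not in ancestors(b, parents) and b not in ancestors(a, parents):
            return True
    return False


def multifunctional_genes(annotations: dict, parents: dict) -> list:
    """Sorted list of genes annotated with at least two distinct terms."""
    return sorted(g for g, t in annotations.items() if is_multifunctional(t, parents))


# Hypothetical toy hierarchy (child -> parents) and gene annotations.
toy_parents = {"mitotic cell cycle": ["cell cycle"], "cell cycle": [], "DNA repair": []}
toy_annotations = {
    "geneA": {"mitotic cell cycle", "cell cycle"},  # nested terms: one process
    "geneB": {"cell cycle", "DNA repair"},          # two unrelated processes
    "geneC": {"DNA repair"},                        # a single annotation
}
print(multifunctional_genes(toy_annotations, toy_parents))
```

    Filtering out ancestor-descendant pairs keeps a gene from being counted as multifunctional merely because it carries both a general term and one of its refinements.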

  20. Evidence-based gene predictions in plant genomes.

    PubMed

    Liang, Chengzhi; Mao, Long; Ware, Doreen; Stein, Lincoln

    2009-10-01

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for transcriptional evidence (known proteins, full-length cDNAs, or expressed sequence tags [ESTs]) in the species of interest. This limitation is of particular concern for plant genomes, where the rate of genome sequencing is greatly outpacing the rate of EST- and cDNA-sequencing projects. To overcome this limitation, we have developed an evidence-based gene build system (the Gramene pipeline) that can use transcriptional evidence across related species. The Gramene pipeline uses the Ensembl computing infrastructure with a novel data processing scheme. Using the previously annotated plant genomes, the dicot Arabidopsis thaliana and the monocot Oryza sativa, we show that the cross-species ESTs from within monocot or dicot class are a valuable source of evidence for gene predictions. We also find that, using only EST and cross-species evidence, the Gramene pipeline can generate a plant gene set that is comparable in quality to the human genes based on known proteins and full-length cDNAs. We compare the Gramene pipeline to several widely used ab initio gene prediction programs in rice; this comparison shows the pipeline performs favorably at both the gene and exon levels with cross-species gene products only. We discuss the results of testing the pipeline on a 22-Mb region of the newly sequenced maize genome and discuss potential application of the pipeline to other genomes.

  1. Pinpointing disease genes through phenomic and genomic data fusion

    PubMed Central

    2015-01-01

    Background: Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. Results: To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions: pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology. PMID:25708473

  2. Pinpointing disease genes through phenomic and genomic data fusion.

    PubMed

    Jiang, Rui; Wu, Mengmeng; Li, Lianshuo

    2015-01-01

    Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.

  3. Genome-wide identification and analysis of the MADS-box gene family in apple.

    PubMed

    Tian, Yi; Dong, Qinglong; Ji, Zhirui; Chi, Fumei; Cong, Peihua; Zhou, Zongshan

    2015-01-25

    The MADS-box gene family is one of the most widely studied families in plants and has diverse developmental roles in flower pattern formation, gametophyte cell division and fruit differentiation. Although the genome-wide analysis of this family has been performed in some species, little is known regarding MADS-box genes in apple (Malus domestica). In this study, 146 MADS-box genes were identified in the apple genome and were phylogenetically clustered into six subgroups (MIKC(c), MIKC*, Mα, Mβ, Mγ and Mδ) with the MADS-box genes from Arabidopsis and rice. The predicted apple MADS-box genes were distributed across all 17 chromosomes at different densities. Additionally, the MADS-box domain, exon length, gene structure and motif compositions of the apple MADS-box genes were analysed. Moreover, the expression of all of the apple MADS-box genes was analysed in the root, stem, leaf, flower tissues and five stages of fruit development. All of the apple MADS-box genes, with the exception of some genes in each group, were expressed in at least one of the tissues tested, which indicates that the MADS-box genes are involved in various aspects of the physiological and developmental processes of the apple. To the best of our knowledge, this report describes the first genome-wide analysis of the apple MADS-box gene family, and the results should provide valuable information for understanding the classification, cloning and putative functions of this family.

  4. The cavefish genome reveals candidate genes for eye loss

    PubMed Central

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human diseases, including retinal dysfunction. PMID:25329095

  5. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss

    PubMed Central

    2010-01-01

    Background: The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. Results: To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Conclusions: Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides

  6. Comparative genomics of the bacterial genus Listeria: Genome evolution is characterized by limited gene acquisition and limited gene loss.

    PubMed

    den Bakker, Henk C; Cummings, Craig A; Ferreira, Vania; Vatta, Paolo; Orsi, Renato H; Degoricija, Lovorka; Barker, Melissa; Petrauskene, Olga; Furtado, Manohar R; Wiedmann, Martin

    2010-12-02

    The bacterial genus Listeria contains pathogenic and non-pathogenic species, including the pathogens L. monocytogenes and L. ivanovii, both of which carry homologous virulence gene clusters such as the prfA cluster and clusters of internalin genes. Initial evidence for multiple deletions of the prfA cluster during the evolution of Listeria indicates that this genus provides an interesting model for studying the evolution of virulence and also presents practical challenges with regard to definition of pathogenic strains. To better understand genome evolution and evolution of virulence characteristics in Listeria, we used a next generation sequencing approach to generate draft genomes for seven strains representing Listeria species or clades for which genome sequences were not available. Comparative analyses of these draft genomes and six publicly available genomes, which together represent the main Listeria species, showed evidence for (i) a pangenome with 2,032 core and 2,918 accessory genes identified to date, (ii) a critical role of gene loss events in transition of Listeria species from facultative pathogen to saprotroph, even though a consistent pattern of gene loss seemed to be absent, and a number of isolates representing non-pathogenic species still carried some virulence associated genes, and (iii) divergence of modern pathogenic and non-pathogenic Listeria species and strains, most likely circa 47 million years ago, from a pathogenic common ancestor that contained key virulence genes. Genome evolution in Listeria involved limited gene loss and acquisition as supported by (i) a relatively high coverage of the predicted pan-genome by the observed pan-genome, (ii) conserved genome size (between 2.8 and 3.2 Mb), and (iii) a highly syntenic genome. Limited gene loss in Listeria did include loss of virulence associated genes, likely associated with multiple transitions to a saprotrophic lifestyle. The genus Listeria thus provides an example of a group of

  7. Analysis of correlation structures in the Synechocystis PCC6803 genome.

    PubMed

    Wu, Zuo-Bing

    2014-12-01

    Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated with the recurrence plot method and the phase space reconstruction technique, revealing periodic and non-periodic correlation structures. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding regions of genes. The non-periodic correlation structures are generated by non-periodic transfer of several substrings covering or overlapping with the coding regions of genes. In both periodic and non-periodic transfer, gaps divide the long nucleotide strings into substrings and prevent their global transfer. Most of the gaps are either the replacement of one base or the insertion or deletion of one base. In the reconstructed phase space, the points generated from two or three steps of the continuous iterative transfer via the second maximal distance can be fitted by two lines, which partly reveals an intrinsic dynamics in the transfer of nucleotide strings. A comparison of relative positions and lengths shows that the substrings involved in the non-periodic correlation structures are almost identical to the mobile elements annotated in the genome. The basic results on the correlation structures therefore carry over to the mobile elements.
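    As a rough illustration of the recurrence plot idea mentioned in this abstract, the sketch below marks pairs of positions whose length-m windows are identical. The function name, the exact-match criterion, and the window length are assumptions made for illustration; the paper's own method (including its distance measure) is not reproduced here.

```python
from typing import List, Tuple


def recurrence_points(seq: str, m: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose length-m windows match exactly.

    Hypothetical simplification: each window seq[i:i+m] plays the role of a
    reconstructed phase-space point, and two points "recur" when the windows
    are identical strings.
    """
    n = len(seq) - m + 1  # number of complete windows
    windows = [seq[i:i + m] for i in range(n)]
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, n)
            if windows[i] == windows[j]]


# A string that repeats with period 4 yields a recurrence at offset 4.
points = recurrence_points("ACGTACGT", 4)
```

    In such a plot, runs of points at a constant offset j - i indicate a repeated (transferred) substring; a single-base substitution or indel would interrupt the run, which is the kind of gap the abstract describes.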

  8. Mapping ancestral genomes with massive gene loss: A matrix sandwich problem

    PubMed Central

    Gavranović, Haris; Chauve, Cedric; Salse, Jérôme; Tannier, Eric

    2011-01-01

    Motivation: Ancestral genomes provide a better way to understand the structural evolution of genomes than the simple comparison of extant genomes. Most ancestral genome reconstruction methods rely on universal markers, that is, homologous families of DNA segments present in exactly one exemplar in every considered species. Complex histories of genes or other markers, undergoing duplications and losses, are rarely taken into account. It follows that some ancestors are inaccessible by these methods, such as the proto–monocotyledon whose evolution involved massive gene loss following a whole genome duplication. Results: We propose a mapping approach based on the combinatorial notion of ‘sandwich consecutive ones matrix’, which explicitly takes gene losses into account. We introduce combinatorial optimization problems related to this concept, and propose a heuristic solver and a lower bound on the optimal solution. We use these results to propose a configuration for the proto-chromosomes of the monocot ancestor, and study the accuracy of this configuration. We also use our method to reconstruct the ancestral boreoeutherian genomes, which illustrates that the framework we propose is not specific to plant paleogenomics but is adapted to reconstruct any ancestral genome from extant genomes with heterogeneous marker content. Availability: Upon request to the authors. Contact: haris.gavranovic@gmail.com; eric.tannier@inria.fr PMID:21685079
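    The consecutive-ones notion behind this abstract can be sketched for one fixed column order as follows. The encoding ('1' present, '0' absent, 'X' unknown, for example a possibly lost marker) and the function names are assumptions for illustration; this checks a given order only and is not the authors' heuristic solver, which must also search over column orders.

```python
from typing import List


def row_is_sandwich_consecutive(row: str) -> bool:
    """True if the 'X' cells can be set to 0 or 1 so that all 1s are contiguous.

    `row` is one matrix row under a fixed column order, using the
    hypothetical alphabet '1', '0', 'X'.
    """
    first = row.find("1")
    if first == -1:  # no required 1s, so the row is trivially satisfiable
        return True
    last = row.rfind("1")
    # Every cell between the outermost 1s must be allowed to become 1.
    return "0" not in row[first:last + 1]


def matrix_is_sandwich_c1p(rows: List[str]) -> bool:
    """Check every row of the matrix for the fixed column order."""
    return all(row_is_sandwich_consecutive(r) for r in rows)


ok = matrix_is_sandwich_c1p(["01X10", "XX110", "000X0"])
bad = matrix_is_sandwich_c1p(["0101"])
```

    Treating unknown cells as fillable is what lets lost genes be tolerated: a gap caused by a loss does not by itself break the contiguity of a row.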

  9. Ligninolytic peroxidase genes in the oyster mushroom genome: heterologous expression, molecular structure, catalytic and stability properties, and lignin-degrading ability

    Treesearch

    Elena Fernández-Fueyo; Francisco J Ruiz-Dueñas; María Jesús Martinez; Antonio Romero; Kenneth E Hammel; Francisco Javier Medrano; Angel T. Martínez

    2014-01-01

    Background: The genome of Pleurotus ostreatus, an important edible mushroom and a model ligninolytic organism of interest in lignocellulose biorefineries due to its ability to delignify agricultural wastes, was sequenced with the purpose of identifying and characterizing the enzymes responsible for lignin degradation. ...

  10. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    PubMed

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  11. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool tolerates sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure based entirely on cloud computing (Amazon Web Services).

  12. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT).

  13. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    SciTech Connect

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  14. Building phylogenetic trees by using gene Nucleotide Genomic Signals.

    PubMed

    Cristea, Paul Dan

    2012-01-01

    Nucleotide genomic signal (NuGS) methodology allows a molecular level approach to determine distances between homologous genes or between conserved equivalent non-coding genome regions in various species or individuals of the same species. Therefore, distances between the genes of species or individuals can be computed and phylogenetic trees can be built. The paper illustrates the use of the nucleotide imbalance (N) and nucleotide pair imbalance (P) signals to determine the distances between the genes of several Hominidae. The results are in accordance with those of other genetic or phylogenetic approaches to establish distances between Hominidae species.

  15. Genomic imprinting-an epigenetic gene-regulatory model.

    PubMed

    Koerner, Martha V; Barlow, Denise P

    2010-04-01

    Epigenetic mechanisms (Box 1) are considered to play major gene-regulatory roles in development, differentiation and disease. However, the relative importance of epigenetics in defining the mammalian transcriptome in normal and disease states is unknown. The mammalian genome contains only a few model systems where epigenetic gene regulation has been shown to play a major role in transcriptional control. These model systems are important not only to investigate the biological function of known epigenetic modifications but also to identify new and unexpected epigenetic mechanisms in the mammalian genome. Here we review recent progress in understanding how epigenetic mechanisms control imprinted gene expression.

  16. Re-annotation of genome microbial CoDing-Sequences: finding new genes and inaccurately annotated genes

    PubMed Central

    2002-01-01

    Background Analysis of any newly sequenced bacterial genome starts with the identification of protein-coding genes. Despite the accumulation of multiple complete genome sequences, which provide useful comparisons with close relatives among other organisms during the annotation process, accurate gene prediction remains quite difficult. A major reason for this situation is that genes are tightly packed in prokaryotes, resulting in frequent overlap. Thus, detection of translation initiation sites and/or selection of the correct coding regions remain difficult unless appropriate biological knowledge (about the structure of a gene) is embedded in the approach. Results We have developed a new program that automatically identifies biologically significant candidate genes in a bacterial genome. Twenty-six complete prokaryotic genomes were analyzed using this tool, and the accuracy of gene finding was assessed by comparison with existing annotations. This analysis revealed that, despite the enormous effort of genome program annotators, a small but not negligible number of genes annotated within the framework of sequencing projects are likely to be partially inaccurate or plainly wrong. Moreover, the analysis of several putative new genes shows that, as expected, many short genes have escaped annotation. In most cases, these new genes revealed frameshifts that could be either artifacts or genuine frameshifts. Some entirely unexpected new genes have also been identified. This allowed us to get a more complete picture of prokaryotic genomes. The results of this procedure are progressively integrated into the SWISS-PROT reference databank. Conclusions The results described in the present study show that our procedure is very satisfactory in terms of gene finding accuracy. Except in a few cases, discrepancies between our results and annotations provided by individual authors can be accounted for by the nature of each annotation process or by specific characteristics of some

  17. Learning directed acyclic graphical structures with genetical genomics data.

    PubMed

    Gao, Bin; Cui, Yuehua

    2015-12-15

    A large amount of research effort has been focused on estimating gene networks from gene expression data to understand the functional basis of a living organism. Such networks are often obtained by considering pairwise correlations between genes and thus may not reflect the true connectivity between genes. By treating gene expressions as quantitative traits while considering genetic markers, genetical genomics analysis has shown its power in enhancing the understanding of gene regulations. Previous work has shown improved performance in estimating the undirected network graphical structure when genetic markers are incorporated as covariates. Because gene expressions are often due to directed regulations, it is more meaningful to estimate the directed graphical network. In this article, we introduce a covariate-adjusted Gaussian graphical model to estimate the Markov equivalence class of the directed acyclic graphs (DAGs) in a genetical genomics analysis framework. We develop a two-stage estimation procedure that first estimates the regression coefficient matrix by [Formula: see text] penalization. The estimated coefficient matrix is then used to estimate the mean values in our multi-response Gaussian model, from which the regulatory networks of gene expressions are estimated using the PC-algorithm. The estimation consistency for high dimensional sparse DAGs is established. Simulations are conducted to demonstrate our theoretical results. The method is applied to a human Alzheimer's disease dataset in which differential DAGs are identified between cases and controls. R code for implementing the method can be downloaded at http://www.stt.msu.edu/∼cui and is freely available at http://www.stt.msu.edu/∼cui/software.html.

  18. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  19. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers.

    PubMed

    Tamada, Yoshinori; Imoto, Seiya; Araki, Hiromitsu; Nagasaki, Masao; Print, Cristin; Charnock-Jones, D Stephen; Miyano, Satoru

    2011-01-01

    We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm in a shorter time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs, to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses were impossible to realize without our proposed algorithm.

  20. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs.

    PubMed

    Rawlings, Timothy A; MacInnis, Martin J; Bieler, Rüdiger; Boore, Jeffrey L; Collins, Timothy M

    2010-07-19

    Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era. Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies. We have uncovered major changes in gene order within a family of

  1. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs

    PubMed Central

    2010-01-01

    Background Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era. Results Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies. Conclusions We have uncovered major changes in gene

  2. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer

    DOE PAGES

    Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar; ...

    2016-11-16

    Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets. In this study we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between relative to within populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and in sexual eukaryotes.

  3. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer

    SciTech Connect

    Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar; LaButti, Kurt; Copeland, Alex; Sudek, Sebastian; Maitland, Ashley; Chittick, Lauren; dos Santos, Filipa; Weitz, Joshua S.; Worden, Alexandra Z.; Woyke, Tanja; Sullivan, Matthew B.

    2016-11-16

    Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets. In this study we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between relative to within populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and in sexual eukaryotes.

  4. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all the genes. Their diversity is critical to the recognition and response to attack from diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, increased by a combination of worldwide spread of pathogens, exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain where the valuable cultivars are sterile, parthenocarpic and mostly triploid so conventional genetic analysis and breeding is impossible. In this paper, we review the nature of R genes and the key motifs, particularly in the Nucleotide Binding Sites (NBS), Leucine Rich Repeat (LRR) gene class. We present data about identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species with the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and from various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters, one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for identification and selection of resistance genes within the genus which may be useful for plant selection and breeding and also for defining relationships and genome evolution

  5. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome

    PubMed Central

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S.

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  6. Beyond Genomics: Studying Evolution with Gene Coexpression Networks.

    PubMed

    Ruprecht, Colin; Vaid, Neha; Proost, Sebastian; Persson, Staffan; Mutwil, Marek

    2017-04-01

    Understanding how genomes change as organisms become more complex is a central question in evolution. Molecular evolutionary studies typically correlate the appearance of genes and gene families with the emergence of biological pathways and morphological features. While such approaches are of great importance to understand how organisms evolve, they are also limited, as functionally related genes work together in contexts of dynamic gene networks. Since functionally related genes are often transcriptionally coregulated, gene coexpression networks present a resource to study the evolution of biological pathways. In this opinion article, we discuss recent developments in this field and how coexpression analyses can be merged with existing genomic approaches to transfer functional knowledge between species to study the appearance or extension of pathways.

  7. The evolution of chloroplast genome structure in ferns.

    PubMed

    Wolf, Paul G; Roper, Jessie M; Duffy, Aaron M

    2010-09-01

    The plastid genome (plastome) is a rich source of phylogenetic and other comparative data in plants. Most land plants possess a plastome of similar structure. However, in a major group of plants, the ferns, a unique plastome structure has evolved. The gene order in ferns has been explained by a series of genomic inversions relative to the plastome organization of seed plants. Here, we examine for the first time the structure of the plastome across fern phylogeny. We used a PCR-based strategy to map and partially sequence plastomes. We found that a pair of partially overlapping inversions in the region of the inverted repeat occurred in the common ancestor of most ferns. However, the ancestral (seed plant) structure is still found in early diverging branches leading to the osmundoid and filmy fern lineages. We found that a second pair of overlapping inversions occurred on a branch leading to the core leptosporangiates. We also found that the unique placement of the gene matK in ferns (lacking a flanking intron) is not a result of a large-scale inversion, as previously thought. This is because the intron loss maps to an earlier point on the phylogeny than the nearby inversion. We speculate on why inversions may occur in pairs and what this may mean for the dynamics of plastome evolution.

  8. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant.

  9. Genome engineering using a synthetic gene circuit in Bacillus subtilis

    PubMed Central

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-01-01

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac–chloramphenicol resistance (cat) genes were located on a helper plasmid. In the presence of xylose, inactivation of XylR by xylose induced LacI expression; LacI repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was easily removed in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications. PMID:25552415
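
    The counter-selection logic in this abstract can be read as a small Boolean circuit. The sketch below is only an illustration of that reading, not the authors' model: the function and parameter names are hypothetical, each regulator is treated as simply on or off, and XylR is assumed to be inactive whenever xylose is present.

```python
# Boolean sketch of the counter-selection circuit described in the abstract.
# All names are hypothetical; regulation is simplified to on/off states.

def cat_expressed(has_pxyl_laci: bool, xylose: bool, has_helper: bool = True) -> bool:
    """Return True if the chloramphenicol resistance gene (cat) is expressed."""
    if not has_helper:
        # cat and xylR both sit on the helper plasmid; no plasmid, no cat.
        return False
    # Assumption: XylR represses Pxyl only while xylose is absent.
    xylr_active = not xylose
    # LacI is made only if Pxyl-lacI is still in the genome and Pxyl is derepressed.
    laci_expressed = has_pxyl_laci and not xylr_active
    # LacI represses Pspac, which drives cat.
    return not laci_expressed


def survives(has_pxyl_laci: bool, xylose: bool, chloramphenicol: bool,
             has_helper: bool = True) -> bool:
    """Return True if a cell with this genotype survives the given medium."""
    if not chloramphenicol:
        return True
    return cat_expressed(has_pxyl_laci, xylose, has_helper)


if __name__ == "__main__":
    # Compare cells that retain Pxyl-lacI with cells that lost it by recombination.
    for has_cassette in (True, False):
        alive = survives(has_cassette, xylose=True, chloramphenicol=True)
        print(f"Pxyl-lacI present={has_cassette}: survives={alive}")
```

    Under these assumptions, only cells that have lost Pxyl-lacI survive on xylose plus chloramphenicol, which is the selection step the abstract relies on.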

  10. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    PubMed

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR and Pspac-chloramphenicol resistance (cat) genes were located on a helper plasmid. In the presence of xylose, inactivation of XylR by xylose induced LacI expression; LacI repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was easily removed in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications.

  11. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    PubMed

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups, a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding about the adaptation and diversification of these organisms under evolutionary pressure.

  12. Genome instability mechanisms and the structure of cancer genomes.

    PubMed

    Cassidy, Liam D; Venkitaraman, Ashok R

    2012-02-01

    Genomic instability is a hallmark of cancer cells, and arises from the aberrations that these cells exhibit in the normal biological mechanisms that repair and replicate the genome, or ensure its accurate segregation during cell division. Increasingly detailed descriptions of cancer genomes have begun to emerge from next-generation sequencing (NGS), providing snapshots of their nature and heterogeneity in different cancers at different stages in their evolution. Here, we attempt to extract from these sequencing studies insights into the role of genome instability mechanisms in carcinogenesis, and to identify challenges impeding further progress.

  13. Genome-wide analysis of the MADS-box gene family in Brachypodium distachyon.

    PubMed

    Wei, Bo; Zhang, Rong-Zhi; Guo, Juan-Juan; Liu, Dan-Mei; Li, Ai-Li; Fan, Ren-Chun; Mao, Long; Zhang, Xiang-Qi

    2014-01-01

    MADS-box genes are important transcription factors for plant development, especially floral organogenesis. Brachypodium distachyon is a model for biofuel plants and temperate grasses such as wheat and barley, but a comprehensive analysis of MADS-box family proteins in Brachypodium is still missing. We report here a genome-wide analysis of the MADS-box gene family in Brachypodium distachyon. We identified 57 MADS-box genes and classified them into 32 MIKC(c)-type, 7 MIKC*-type, 9 Mα, 7 Mβ and 2 Mγ MADS-box genes according to their phylogenetic relationships to the Arabidopsis and rice MADS-box genes. Detailed gene structure and motif distribution were then studied. Investigation of their chromosomal localizations revealed that Brachypodium MADS-box genes were distributed evenly across five chromosomes. In addition, five pairs of type II MADS-box genes were found on synteny blocks derived from whole genome duplication blocks. We then performed a systematic expression analysis of Brachypodium MADS-box genes in various tissues, particularly floral organs. Further detection under salt, drought, and low-temperature conditions showed that some MADS-box genes may also be involved in abiotic stress responses, including type I genes. Comparative studies of MADS-box genes among Brachypodium, rice and Arabidopsis showed that Brachypodium had fewer gene duplication events. Taken together, this work provides useful data for further functional studies of MADS-box genes in Brachypodium distachyon.

  14. Sequencing rare marine actinomycete genomes reveals high density of unique natural product biosynthetic gene clusters.

    PubMed

    Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S

    2016-12-01

    Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.

  15. Nematoda: genes, genomes and the evolution of parasitism.

    PubMed

    Blaxter, Mark L

    2003-01-01

    Nematodes are remarkably successful, both as free-living organisms and as parasites. The diversity of parasitic lifestyles displayed by nematodes, and the diversity of hosts used, reflects both a propensity towards parasitism in the phylum, and an adaptability to new and challenging environments. Parasitism of plants and animals has evolved many times independently within the Nematoda. Analysis of these origins of parasitism using a molecular phylogeny highlights the diversity underlying the parasitic mode of life. Many vertebrate parasites have arthropod-associated sister taxa, and most invade their hosts as third stage larvae: these features co-occur across the tree and thus suggest that this may have been a shared route to parasitism. Analysis of nematode genes and genomes has been greatly facilitated by the Caenorhabditis elegans project. However, the availability of the whole genome sequence from this free-living rhabditid does not simply permit definition of 'parasitism' genes; each nematode genome is a mosaic of conserved features and evolutionary novelties. The rapid progress of parasitic nematode genome projects focussing on species from across the diversity of the phylum has defined sets of genes that have patterns of evolution that suggest their involvement with various facets of parasitism, in particular the problems of acquisition of nutrients in new hosts and the evasion of host immune defences. With the advent of functional genomics techniques in parasites, and in particular the possibility of gene knockout using RNA interference, the roles of many putative parasitism genes can now be tested.

  16. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  17. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

    PubMed Central

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first