Science.gov

Sample records for gene genomic structure

  1. Gene3D: Structural Assignment for Whole Genes and Genomes Using the CATH Domain Structure Database

    PubMed Central

    Buchan, Daniel W.A.; Shepherd, Adrian J.; Lee, David; Pearl, Frances M.G.; Rison, Stuart C.G.; Thornton, Janet M.; Orengo, Christine A.

    2002-01-01

    We present a novel web-based resource, Gene3D, of precalculated structural assignments to gene sequences and whole genomes. This resource assigns structural domains from the CATH database to whole genes and links these to their curated functional and structural annotations within the CATH domain structure database, the functional Dictionary of Homologous Superfamilies (DHS) and PDBsum. Currently Gene3D provides annotation for 36 complete genomes (two eukaryotes, six archaea, and 28 bacteria). On average, between 30% and 40% of the genes of a given genome can be structurally annotated. Matches to structural domains are found using the profile-based method (PSI-BLAST), and a novel protocol, DRange, is used to resolve conflicts in matches involving different homologous superfamilies. PMID:11875040

  2. The Complete Chloroplast Genome Sequence of Podocarpus lambertii: Genome Structure, Evolutionary Aspects, Gene Content and SSR Detection

    PubMed Central

    Vieira, Leila do Nascimento; Faoro, Helisson; Rogalski, Marcelo; Fraga, Hugo Pacheco de Freitas; Cardoso, Rodrigo Luis Alves; de Souza, Emanuel Maltempi; de Oliveira Pedrosa, Fábio; Nodari, Rubens Onofre; Guerra, Miguel Pedro

    2014-01-01

    Background: Podocarpus lambertii (Podocarpaceae) is a native conifer from the Brazilian Atlantic Forest Biome, which is considered one of the 25 biodiversity hotspots in the world. The advancement of next-generation sequencing technologies has enabled the rapid acquisition of whole chloroplast (cp) genome sequences at low cost. Several studies have proven the potential of cp genomes as tools to understand enigmatic and basal phylogenetic relationships at different taxonomic levels, as well as to further probe the structural and functional evolution of plants. In this work, we present the complete cp genome sequence of P. lambertii. Methodology/Principal Findings: The P. lambertii cp genome is 133,734 bp in length, and similar to other sequenced cupressophytes, it lacks one of the large inverted repeat regions (IR). It contains 118 unique genes and one duplicated tRNA (trnN-GUU), which occurs as an inverted repeat sequence. The rps16 gene was not found, which was previously reported for the plastid genome of another Podocarpaceae (Nageia nagi) and Araucariaceae (Agathis dammara). Structurally, P. lambertii shows 4 inversions of a large DNA fragment ∼20,000 bp compared to the Podocarpus totara cp genome. These unexpected characteristics may be attributed to geographical distance and different adaptive needs. The P. lambertii cp genome presents a total of 28 tandem repeats and 156 SSRs, with homo- and dipolymers being the most common and tri-, tetra-, penta-, and hexapolymers occurring with less frequency. Conclusion: The complete cp genome sequence of P. lambertii revealed significant structural changes, even in species from the same genus. These results reinforce the apparent loss of the rps16 gene in Podocarpaceae cp genomes. In addition, several SSRs in the P. lambertii cp genome are likely intraspecific polymorphism sites, which may allow highly sensitive phylogeographic and population structure studies, as well as phylogenetic studies of species of this genus. PMID
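
SSR counts such as the 156 reported above depend on the detection thresholds, which the abstract does not state. The following is a minimal sketch of how mono- to hexanucleotide repeats might be located in a sequence. The function names `find_ssrs` and `is_primitive` and the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, not the settings or software used in the study.

```python
# Hypothetical minimum repeat counts per motif length (1 = mono- ... 6 = hexanucleotide).
# These thresholds are assumptions for illustration only.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    size = len(motif)
    return all(
        motif != motif[:d] * (size // d)
        for d in range(1, size)
        if size % d == 0
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Scan left to right and return (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            # Skip truncated motifs, ambiguous bases and non-primitive motifs.
            if len(motif) < k or set(motif) - set("ACGT") or not is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            span = reps * k
            if reps >= min_repeats[k] and (best is None or span > best[2] * len(best[1])):
                best = (i, motif, reps)
        if best:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the repeat just recorded
        else:
            i += 1
    return hits


example = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GCC"
print(find_ssrs(example))
```

With these assumed thresholds the toy sequence yields one homopolymer run and one dinucleotide repeat. Changing `MIN_REPEATS` changes the counts, which is why SSR totals from different studies are only comparable when the thresholds match.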

  3. Comparative Genomics of Sibling Fungal Pathogenic Taxa Identifies Adaptive Evolution without Divergence in Pathogenicity Genes or Genomic Structure

    PubMed Central

    Sillo, Fabiano; Garbelotto, Matteo; Friedman, Maria; Gonthier, Paolo

    2015-01-01

    It has been estimated that the sister plant pathogenic fungal species Heterobasidion irregulare and Heterobasidion annosum may have been allopatrically isolated for 34–41 Myr. They are now sympatric due to the introduction of the first species from North America into Italy, where they freely hybridize. We used a comparative genomic approach to 1) confirm that the two species are distinct at the genomic level; 2) determine which gene groups have diverged the most and the least between species; 3) show that their overall genomic structures are similar, as predicted by the viability of hybrids, and identify genomic regions that instead are incongruent; and 4) test the previously formulated hypothesis that genes involved in pathogenicity may be less divergent between the two species than genes involved in saprobic decay and sporulation. Results based on the sequencing of three genomes per species identified a high level of interspecific similarity, but clearly confirmed the status of the two as distinct taxa. Genes involved in pathogenicity were more conserved between species than genes involved in saprobic growth and sporulation, corroborating at the genomic level that invasiveness may be determined by the two latter traits, as documented by field and inoculation studies. Additionally, the majority of genes under positive selection and the majority of genes bearing interspecific structural variations were involved either in transcriptional or in mitochondrial functions. This study provides genomic-level evidence that invasiveness of pathogenic microbes can be attained without the high levels of pathogenicity presumed to exist for pathogens challenging naïve hosts. PMID:26527650

  4. Coding exon-structure aware realigner (CESAR) utilizes genome alignments for accurate comparative gene annotation

    PubMed Central

    Sharma, Virag; Elghafari, Anas; Hiller, Michael

    2016-01-01

    Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes. PMID:27016733

  5. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome.

    PubMed Central

    Katju, Vaishali; Lynch, Michael

    2003-01-01

    The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms. PMID:14704166

  6. The structure and early evolution of recently arisen gene duplicates in the Caenorhabditis elegans genome.

    PubMed

    Katju, Vaishali; Lynch, Michael

    2003-12-01

    The significance of gene duplication in provisioning raw materials for the evolution of genomic diversity is widely recognized, but the early evolutionary dynamics of duplicate genes remain obscure. To elucidate the structural characteristics of newly arisen gene duplicates at infancy and their subsequent evolutionary properties, we analyzed gene pairs with ≤10% divergence at synonymous sites within the genome of Caenorhabditis elegans. Structural heterogeneity between duplicate copies is present very early in their evolutionary history and is maintained over longer evolutionary timescales, suggesting that duplications across gene boundaries in conjunction with shuffling events have at least as much potential to contribute to long-term evolution as do fully redundant (complete) duplicates. The median duplication span of 1.4 kb falls short of the average gene length in C. elegans (2.5 kb), suggesting that partial gene duplications are frequent. Most gene duplicates reside close to the parent copy at inception, often as tandem inverted loci, and appear to disperse in the genome as they age, as a result of reduced survivorship of duplicates located in proximity to the ancestral copy. We propose that illegitimate recombination events leading to inverted duplications play a disproportionately large role in gene duplication within this genome in comparison with other mechanisms.

  7. Genomic structure and nucleotide sequence of the p55 gene of the puffer fish Fugu rubripes

    SciTech Connect

    Elgar, G.; Rattray, F.; Greystrong, J.; Brenner, S.

    1995-06-10

    The p55 gene, which codes for a 55-kDa erythrocyte membrane protein, has been cloned and sequenced from the genome of the Japanese puffer fish Fugu rubripes (Fugu). This organism has the smallest recorded vertebrate genome and therefore provides an efficient way to sequence genes at the genomic level. The gene encoding p55 covers 5.5 kb from the beginning to the end of the coding sequence, four to six times smaller than the estimated size of the human gene, and is encoded by 12 exons. The structure of this gene has not been previously elucidated, but from this and other data we would predict a similar or identical structure in mammals. The predicted amino acid sequence of this gene in Fugu, coding for a polypeptide of 467 amino acids, is very similar to that of the human gene with the exception of the first two exons, which differ considerably. The predicted Fugu protein has a molecular weight (52.6 kDa compared with 52.3 kDa) and an isoelectric point very similar to those of human p55. In human, the p55 gene lies in the gene-dense Xq28 region, just 30 kb 3′ to the Factor VIII gene, and is estimated to cover 20-30 kb. Its 5′ end is associated with a CpG island, although there is no evidence that this is the case in Fugu. The small size of genes in Fugu and the high coding homology that they share with their mammalian equivalents, both in structure and sequence, make this compact vertebrate genome an ideal model for genomic studies. 23 refs., 3 figs.

  8. Alpha tubulin genes from Leishmania braziliensis: genomic organization, gene structure and insights on their expression

    PubMed Central

    2013-01-01

    Background: Alpha tubulin is a fundamental component of the cytoskeleton which is responsible for cell shape and is involved in cell division, ciliary and flagellar motility and intracellular transport. Alpha tubulin gene expression varies according to the morphological changes undergone by Leishmania in its life cycle. However, studying the mechanisms responsible for the differential expression has proven difficult due to the complex genome organization of tubulin genes and to the non-conventional mechanisms of gene regulation operating in Leishmania. Results: We started this work by analyzing the genomic organization of α-tubulin genes in the Leishmania braziliensis genome database. The genomic organization of L. braziliensis α-tubulin genes differs from that existing in the L. major and L. infantum genomes. Two loci containing α-tubulin genes were found in chromosomes 13 and 29, even though the existence of sequence gaps does not allow the exact number of genes at each locus to be determined. Southern blot assays showed that the α-tubulin locus at chromosome 13 contains at least 8 gene copies, which are tandemly organized with a 2.08-kb repetition unit; the locus at chromosome 29 seems to contain a sole α-tubulin gene. In addition, it was found that the L. braziliensis α-tubulin locus at chromosome 13 contains two types of α-tubulin genes differing in their 3′ UTR, each one presumably containing different regulatory motifs. It was also determined that the mRNA expression levels of these genes are controlled by post-transcriptional mechanisms tightly linked to the growth temperature. Moreover, the decrease in the α-tubulin mRNA abundance observed when promastigotes were cultured at 35°C was accompanied by parasite morphology alterations, similar to those occurring during the promastigote to amastigote differentiation. Conclusions: Information found in the genome databases indicates that α-tubulin genes have been reorganized in a drastic

  9. Structural Relationships between Highly Conserved Elements and Genes in Vertebrate Genomes

    PubMed Central

    Sun, Hong; Skogerbø, Geir; Wang, Zhen; Liu, Wei; Li, Yixue

    2008-01-01

    Large numbers of sequence elements have been identified as highly conserved among vertebrate genomes. These highly conserved elements (HCEs) are often located in or around genes that are involved in transcription regulation and early development. They have been shown to be involved in cis-regulatory activities through both in vivo and additional computational studies. We have investigated the structural relationships between such elements and genes in six vertebrate genomes (human, mouse, rat, chicken, zebrafish and tetraodon) and detected several thousand cases of conserved HCE-gene associations, and also cases of HCEs with no common target genes. A few examples underscore the potential significance of our findings about several individual genes. We found that the conserved associations between HCEs and genes are not restricted by the absolute distance between them on the genome. Notably, long-range associations were identified and the molecular functions of the associated genes do not show any particular overrepresentation of the functional categories previously reported. HCEs in close proximity are found to be linked with different sets of genes. The results reflect the highly complex correlation between HCEs and their putative target genes. PMID:19008958

  10. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags.

    PubMed

    Xu, Y; Mural, R J; Uberbacher, E C

    1997-01-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.
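
One idea in the abstract above is that predicted exons matched by the same EST probably belong to the same transcript, so shared ESTs can be used to place exons within common gene boundaries. The sketch below illustrates only that grouping step with a union-find structure. It is not the authors' algorithm: the function name `group_exons_by_est`, the input layout and the exon and EST identifiers are all hypothetical, and the biological rules the paper mentions are not modeled.

```python
from collections import defaultdict


def group_exons_by_est(exon_to_ests):
    """Merge predicted exons that share at least one EST into candidate genes.

    exon_to_ests maps an exon ID to the set of EST IDs that match it
    (a hypothetical input format chosen for this sketch).
    """
    parent = {exon: exon for exon in exon_to_ests}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_exon_for_est = {}
    for exon, ests in exon_to_ests.items():
        for est in ests:
            if est in first_exon_for_est:
                # Same EST seen before: join the two exon groups.
                parent[find(exon)] = find(first_exon_for_est[est])
            else:
                first_exon_for_est[est] = exon

    groups = defaultdict(list)
    for exon in exon_to_ests:
        groups[find(exon)].append(exon)
    return sorted(sorted(members) for members in groups.values())


hits = {
    "exon1": {"EST_A"},
    "exon2": {"EST_A", "EST_B"},
    "exon3": {"EST_B"},
    "exon4": {"EST_C"},
    "exon5": set(),  # no EST support
}
print(group_exons_by_est(hits))
```

Exons without any EST match stay in their own group, which mirrors the abstract's caveat that accurate models can be expected only when related ESTs exist for the predicted exons.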

  11. Inferring gene structures in genomic sequences using pattern recognition and expressed sequence tags

    SciTech Connect

    Xu, Y.; Mural, R.; Uberbacher, E.

    1997-02-01

    Computational methods for gene identification in genomic sequences typically have two phases: coding region prediction and gene parsing. While there are many effective methods for predicting coding regions (exons), parsing the predicted exons into proper gene structures, to a large extent, remains an unsolved problem. This paper presents an algorithm for inferring gene structures from predicted exon candidates, based on Expressed Sequence Tags (ESTs) and biological intuition/rules. The algorithm first finds all the related ESTs in the EST database (dbEST) for each predicted exon, and infers the boundaries of one or a series of genes based on the available EST information and biological rules. Then it constructs gene models within each pair of gene boundaries, that are most consistent with the EST information. By exploiting EST information and biological rules, the algorithm can (1) model complicated multiple gene structures, including embedded genes, (2) identify falsely-predicted exons and locate missed exons, and (3) make more accurate exon boundary predictions. The algorithm has been implemented and tested on long genomic sequences with a number of genes. Test results show that very accurate (predicted) gene models can be expected when related ESTs exist for the predicted exons.

  12. Genomic structure of the human BCCIP gene and its expression in cancer.

    PubMed

    Meng, Xiangbing; Liu, Jingmei; Shen, Zhiyuan

    2003-01-02

    Human BCCIPalpha (Tok-1alpha) is a BRCA2 and CDKN1A (Cip1, p21) interacting protein. Our previous studies have shown that overexpression of BCCIPalpha inhibits the growth of certain tumor cells [Oncogene 20 (2001) 336]. In this study, we report the genomic structure of the human BCCIP gene, which contains nine exons. Alternative splicing of the 3'-terminal exons produces two isoforms of BCCIP transcripts, BCCIPalpha and BCCIPbeta. The BCCIP gene is flanked by two genes that are transcribed in the opposite orientation to the BCCIP gene. It lies head-to-head and shares a bi-directional promoter with the uroporphyrinogen III synthase (UROS) gene. The last three exons of the BCCIP gene overlap the 3'-terminal seven exons of a DEAD/H helicase-like gene (DDX32). Using a matched normal/tumor cDNA array, we identified a reduced expression of BCCIP in kidney tumor, suggesting a role of BCCIP in cancer etiology.

  13. Structural genomics of highly conserved microbial genes of unknown function in search of new antibacterial targets.

    PubMed

    Abergel, Chantal; Coutard, Bruno; Byrne, Deborah; Chenivesse, Sabine; Claude, Jean-Baptiste; Deregnaucourt, Céline; Fricaux, Thierry; Gianesini-Boutreux, Celine; Jeudy, Sandra; Lebrun, Régine; Maza, Caroline; Notredame, Cédric; Poirot, Olivier; Suhre, Karsten; Varagnol, Majorie; Claverie, Jean-Michel

    2003-01-01

    With more than 100 antibacterial drugs at our disposal in the 1980s, the problem of bacterial infection was considered solved. Today, however, most hospital infections are insensitive to several classes of antibacterial drugs, and deadly strains of Staphylococcus aureus resistant to vancomycin (the last-resort antibiotic) have recently begun to appear. Other life-threatening microbes, such as Enterococcus faecalis and Mycobacterium tuberculosis, are already able to resist every available antibiotic. There is thus an urgent and continuous need for new, preferably large-spectrum, antibacterial molecules, ideally targeting new biochemical pathways. Here we report on the progress of our structural genomics program aiming at the discovery of new antibacterial gene targets among evolutionarily conserved genes of uncharacterized function. A series of bioinformatic and comparative genomics analyses were used to identify a set of 221 candidate genes common to Gram-positive and Gram-negative bacteria. These genes were split between two laboratories. They are now submitted to a systematic 3-D structure determination protocol including cloning, protein expression and purification, crystallization, X-ray diffraction, structure interpretation, and function prediction. We describe here our strategies for the 111 genes processed in our laboratory. Bioinformatics is used at most stages of the production process. Out of 111 genes processed, and 17 months into the project, 108 have been successfully cloned, 103 have exhibited detectable expression, 84 have led to the production of soluble protein, 46 have been purified, 12 have led to usable crystals, and 7 structures have been determined.
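
The stage counts quoted above can be turned into per-stage and cumulative yields to show where the pipeline loses most targets. The short sketch below does only that arithmetic. The counts come from the abstract, while the stage labels and the function name `stage_yields` are illustrative choices.

```python
# Counts taken from the abstract; labels are shortened for display.
STAGES = [
    ("processed", 111),
    ("cloned", 108),
    ("expressed", 103),
    ("soluble", 84),
    ("purified", 46),
    ("usable crystals", 12),
    ("structures determined", 7),
]


def stage_yields(stages):
    """Return (stage, yield vs. previous stage, yield vs. first stage) tuples."""
    total = stages[0][1]
    rows = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rows.append((name, count / previous, count / total))
    return rows


for name, step, cumulative in stage_yields(STAGES):
    print(f"{name:<22} step {step:6.1%}   cumulative {cumulative:6.1%}")
```

On these numbers the steepest relative drop is from purified protein to usable crystals (12 of 46), and about 6% of the processed genes (7 of 111) had reached a determined structure at the time of the report.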

  14. Gene network inference via structural equation modeling in genetical genomics experiments.

    PubMed

    Liu, Bing; de la Fuente, Alberto; Hoeschele, Ina

    2008-03-01

    Our goal is gene network inference in genetical genomics or systems genetics experiments. For species where sequence information is available, we first perform expression quantitative trait locus (eQTL) mapping by jointly utilizing cis-, cis-trans-, and trans-regulation. After using local structural models to identify regulator-target pairs for each eQTL, we construct an encompassing directed network (EDN) by assembling all retained regulator-target relationships. The EDN has nodes corresponding to expressed genes and eQTL and directed edges from eQTL to cis-regulated target genes, from cis-regulated genes to cis-trans-regulated target genes, from trans-regulator genes to target genes, and from trans-eQTL to target genes. For network inference within the strongly constrained search space defined by the EDN, we propose structural equation modeling (SEM), because it can model cyclic networks and the EDN indeed contains feedback relationships. On the basis of a factorization of the likelihood and the constrained search space, our SEM algorithm infers networks involving several hundred genes and eQTL. Structure inference is based on a penalized likelihood ratio and an adaptation of Occam's window model selection. The SEM algorithm was evaluated using data simulated with nonlinear ordinary differential equations and known cyclic network topologies and was applied to a real yeast data set.

  15. Structural Genomics: From Genes to Structures With Valuable Materials And Many Questions in Between

    SciTech Connect

    Fox, B.G.; Goulding, C.; Malkowski, M.G.; Stewart, L.; Deacon, A.; /SLAC, SSRL

    2009-04-30

    The Protein Structure Initiative (PSI), funded by the US National Institutes of Health (NIH), provides a framework for the development and systematic evaluation of methods to solve protein structures. Although the PSI and other structural genomics efforts around the world have led to the solution of many new protein structures as well as the development of new methods, methodological bottlenecks still exist and are being addressed in this 'production phase' of PSI.

  16. The mouse Fau gene: genomic structure, chromosomal localization, and characterization of two retropseudogenes.

    PubMed

    Casteels, D; Poirier, C; Guénet, J L; Merregaert, J

    1995-01-01

    The Fau gene is the cellular homolog of the fox sequence of the Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV). FBR-MuSV acquired the Fau gene by transduction in a transcriptional orientation opposite to that of the genomic Fau gene. The genomic structure of the mouse Fau gene (MMFAU) and its upstream elements have been determined and are similar to those of the human FAU gene. The gene consists of five exons and is located on chromosome 19. The first exon is not translated. The promoter region has no well-defined TATA box but contains the polypyrimidine initiator flanked by regions of high GC content (65%) and shows all of the characteristics of a housekeeping gene. The 5' end of the mRNA transcript was determined by 5' RACE analysis and is located, as expected, in the polypyrimidine initiator site. Furthermore, the sequences of two retropseudogenes (Fau-ps1 and Fau-ps2) are reported. Both pseudogenes are approximately 75% identical to the Fau cDNA, but both are shorter due to a deletion at the 5' end and do not encode a functional protein. Fau-prs is interrupted by an AG-rich region of about 350 bp within the S30 region of the Fau cDNA. Fau-ps1 was localized on chromosome 1 and Fau-ps2 on chromosome 7.

  17. Overview of PSB track on gene structure identification in large-scale genomic sequence

    SciTech Connect

    Uberbacher, E.C.; Xu, Y.

    1998-12-31

    The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

  18. Computational Integration of Structural and Functional Genomics Data Across Species to Develop Information on Porcine Inflammatory Gene Regulatory Pathway

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Comparative integration of structural and functional genomic data across species holds great promise in finding genes controlling disease resistance. We are investigating the porcine gut immune response to infection through gene expression profiling. We have collected porcine Affymetrix GeneChip da...

  19. The genomic structure of the gene encoding the human transforming growth factor β type II receptor (TGF-β RII)

    SciTech Connect

    Takenoshita, Seiichi; Hagiwara, Koichi; Nagashima, Makoto; Gemma, Akihiko

    1996-09-01

    The genomic structure of the human transforming growth factor-β type II receptor gene (TGF-β RII) was determined by two PCR-based methods, the “long distance sequencer” method and the “promoter finder” method. Genomic fragments containing exons and adjacent introns were amplified by PCR, and the nucleotide sequences were determined by direct sequencing and subcloning sequencing. The TGF-β RII protein is encoded by 567 codons in 7 exons. This is the first report about the genomic structure of a gene that belongs to the serine/threonine kinase type II receptor subfamily. Knowledge of the genomic structure of the TGF-β RII gene will facilitate investigation of the TGF-β signaling pathway in normal human cells and of the aberrations occurring during carcinogenesis. 18 refs., 2 figs., 1 tab.

  20. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    SciTech Connect

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5′-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promoter for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.

  1. A close relationship between primary nucleotides sequence structure and the composition of functional genes in the genome of prokaryotes.

    PubMed

    Garcia, Juan A L; Fernández-Guerra, Antoni; Casamayor, Emilio O

    2011-12-01

    Comparative genomics is an essential tool to unravel how genomes change over evolutionary time and to gain clues on the links between functional genomics and evolution. In prokaryotes, the large, good-quality genome sequences available in public databases and the recently developed large-scale computational methods offer an unprecedented view on the ecology and evolution of microorganisms through comparative genomics. In this work, we examined the links between genome structure (i.e., the sequential distribution of nucleotides itself by detrended fluctuation analysis, DFA) and genomic diversity (i.e., gene functionality by Clusters of Orthologous Genes, COGs) in 828 fully sequenced prokaryotic genomes from 548 different bacteria and archaea species. DFA scaling exponent α indicated persistent long-range correlations (fractality) in each genome analyzed. Higher resolution power was found when considering the sequential succession of purine (AG) vs. pyrimidine (CT) bases than either keto (GT) vs. amino (AC) forms or strongly (GC) vs. weakly (AT) bonded nucleotides. Interestingly, the phyla Aquificae, Fusobacteria, Dictyoglomi, Nitrospirae, and Thermotogae were closer to archaea than to their bacterial counterparts. A strong significant correlation was found between scaling exponent α and COGs distribution, and we consistently observed that the larger α, the more heterogeneous the gene distribution within each functional category, suggesting a close relationship between primary nucleotides sequence structure and functional genes composition.
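
Before DFA can be applied, a nucleotide sequence has to be converted into a numeric series, and the abstract compares three binary groupings of the bases for this purpose. The sketch below shows one common way to do the conversion, as a cumulative "DNA walk" of +1 and -1 steps. The function name `dna_walk`, the scheme labels and the choice of which group counts as +1 are assumptions made for illustration. The detrending and the estimation of α are not shown.

```python
from itertools import accumulate

# Bases that step +1 under each grouping; every other base steps -1.
# Which group is +1 is an arbitrary convention chosen for this sketch.
ENCODINGS = {
    "purine_pyrimidine": "AG",  # purines (A, G) vs. pyrimidines (C, T)
    "keto_amino": "GT",         # keto (G, T) vs. amino (A, C)
    "strong_weak": "GC",        # strongly (G, C) vs. weakly (A, T) bonded
}


def dna_walk(seq, scheme="purine_pyrimidine"):
    """Return the cumulative +1/-1 walk for seq, ignoring non-ACGT symbols."""
    plus = ENCODINGS[scheme]
    steps = [1 if base in plus else -1 for base in seq.upper() if base in "ACGT"]
    return list(accumulate(steps))


for scheme in ENCODINGS:
    print(scheme, dna_walk("AGCTTAGG", scheme))
```

The same sequence produces a different walk under each grouping, which is why the three encodings can differ in how well the resulting scaling exponent separates genomes.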

  2. Analysis of the murine Dtk gene identifies conservation of genomic structure within a new receptor tyrosine kinase subfamily

    SciTech Connect

    Lewis, P.M.; Crosier, K.E.; Crosier, P.S.

    1996-01-01

    The receptor tyrosine kinase Dtk/Tyro 3/Sky/rse/brt/tif is a member of a new subfamily of receptors that also includes Axl/Ufo/Ark and Eyk/Mer. These receptors are characterized by the presence of two immunoglobulin-like loops and two fibronectin type III repeats in their extracellular domains. The structure of the murine Dtk gene has been determined. The gene consists of 21 exons that are distributed over 21 kb of genomic DNA. An isoform of Dtk is generated by differential splicing of exons from the 5′ region of the gene. The overall genomic structure of Dtk is virtually identical to that determined for the human UFO gene. This particular genomic organization is likely to have been duplicated and closely maintained throughout evolution. 38 refs., 3 figs., 1 tab.

  3. Comparative mapping, genomic structure, and expression analysis of eight pseudo-response regulator genes in Brassica rapa.

    PubMed

    Kim, Jin A; Kim, Jung Sun; Hong, Joon Ki; Lee, Yeon-Hee; Choi, Beom-Soon; Seol, Young-Joo; Jeon, Chang Hoo

    2012-05-01

    Circadian clocks regulate plant growth and development in response to environmental factors. In this function, clocks influence the adaptation of species to changes in location or climate. Circadian-clock genes have been the subject of intense study in models such as Arabidopsis thaliana, but the results may not necessarily reflect clock functions in species with polyploid genomes, such as Brassica species, that include multiple copies of clock-related genes. The triplicate genome of Brassica rapa retains high sequence-level co-linearity with Arabidopsis genomes. In B. rapa we had previously identified five orthologs of the five known Arabidopsis pseudo-response regulator (PRR) genes that are key regulators of the circadian clock in this species. Three of these B. rapa genes, BrPRR1, BrPRR5, and BrPRR7, are present in two copies each in the B. rapa genome, for a total of eight B. rapa PRR (BrPRR) orthologs. We have now determined sequences and expression characteristics of the eight BrPRR genes and mapped their positions in the B. rapa genome. Although both members of each paralogous pair exhibited the same expression pattern, some variation in their gene structures was apparent. The BrPRR genes are tightly linked to several flowering genes. The knowledge about genome location, copy number variation and structural diversity of these B. rapa clock genes will improve our understanding of clock-related functions in this important crop. This will facilitate the development of Brassica crops for optimal growth in new environments and under changing conditions.

  4. Genes, genome and Gestalt.

    PubMed

    Grisolia, Cesar Koppe

    2005-03-31

    According to Gestalt thinking, biological systems cannot be viewed as the sum of their elements, but as processes of the whole. To understand organisms we must start from the whole, observing how the various parts are related. In genetics, we must observe the genome over and above the sum of its genes. Either loss or addition of one gene in a genome can change the function of the organism. Genomes are organized in networks of genes, which need to be well integrated. In the case of genetically modified organisms (GMOs), for example, soybeans, rats, Anopheles mosquitoes, and pigs, the insertion of an exogenous gene into a receptive organism generally causes disturbance in the networks, resulting in the breakdown of gene interactions. In these cases, genetic modification increased the genetic load of the GMO and consequently decreased its adaptability (fitness). Therefore, it is hard to claim that the production of such organisms with an increased genetic load does not have ethical implications.

  5. The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

    SciTech Connect

    Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

    1997-03-01

    The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.

  6. Ultra High-Resolution Gene Centric Genomic Structural Analysis of a Non-Syndromic Congenital Heart Defect, Tetralogy of Fallot

    PubMed Central

    Bittel, Douglas C.; Zhou, Xin-Gang; Kibiryeva, Nataliya; Fiedler, Stephanie; O’Brien, James E.; Marshall, Jennifer; Yu, Shihui; Liu, Hong-Yu

    2014-01-01

    Tetralogy of Fallot (TOF) is one of the most common severe congenital heart malformations. Great progress has been made in identifying key genes that regulate heart development, yet approximately 70% of TOF cases are sporadic and nonsyndromic with no known genetic cause. We created an ultra high-resolution gene centric comparative genomic hybridization (gcCGH) microarray based on 591 genes with a validated association with cardiovascular development or function. We used our gcCGH array to analyze the genomic structure of 34 infants with sporadic TOF without a deletion on chromosome 22q11.2 (n male = 20; n female = 14; age range of 2 to 10 months). Using our custom-made gcCGH microarray platform, we identified a total of 613 copy number variations (CNVs) ranging in size from 78 base pairs to 19.5 Mb. We identified 16 subjects with 33 CNVs that contained 13 different genes which are known to be directly associated with heart development. Additionally, there were 79 genes from the broader list of genes that were partially or completely contained in a CNV. All 34 individuals examined had at least one CNV involving these 79 genes. Furthermore, we had available whole genome exon arrays from right ventricular tissue in 13 of our subjects. We analyzed these for correlations between copy number and gene expression level. Surprisingly, we could detect only one clear association between CNVs and expression (GSTT1) for any of the 591 focal genes on the gcCGH array. The expression levels of GSTT1 were correlated with copy number in all cases examined (r = 0.95, p = 0.001). We identified a large number of small CNVs in genes with varying associations with heart development. Our results illustrate the complexity of human genome structural variation and underscore the need for multifactorial assessment of potential genetic/genomic factors that contribute to congenital heart defects. PMID:24498113
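
    The copy-number/expression relationship reported above (r = 0.95) is a Pearson correlation. A minimal sketch of how such a coefficient is computed follows; the function name is hypothetical and the numbers in the usage example are made-up toy values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy values only: copies of a gene per subject and an arbitrary expression signal.
    copy_number = [0, 1, 1, 2, 2, 2]
    expression = [0.1, 1.0, 1.2, 2.1, 1.9, 2.3]
    print(pearson_r(copy_number, expression))
```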

  7. Promoter-Specific Expression and Genomic Structure of IgLON Family Genes in Mouse

    PubMed Central

    Vanaveski, Taavi; Singh, Katyayani; Narvik, Jane; Eskla, Kattri-Liis; Visnapuu, Tanel; Heinla, Indrek; Jayaram, Mohan; Innos, Jürgen; Lilleväli, Kersti; Philips, Mari-Anne; Vasar, Eero

    2017-01-01

    The IgLON family is composed of five genes: Lsamp, Ntm, Opcml, Negr1, and Iglon5, encoding five highly homologous neural adhesion proteins that regulate neurite outgrowth and synapse formation. In the current study we performed in silico analysis revealing that Ntm and Opcml display a genomic structure similar to that previously reported for Lsamp, characterized by two alternative promoters, 1a and 1b. Negr1 and Iglon5 transcripts have a uniform 5′ region, suggesting a single promoter. Iglon5, the recently characterized family member, shares a high level of conservation and structural qualities characteristic of the IgLON family, such as an N-terminal signal peptide, three Ig domains, and a GPI anchor binding site. By using a custom 5′-isoform-specific TaqMan gene-expression assay, we demonstrated heterogeneous expression of IgLON transcripts in different areas of mouse brain and several-fold lower expression in selected tissues outside the central nervous system. As an example, the expression of IgLON transcripts in the urogenital and reproductive system is in line with repeated reports of urogenital tumors accompanied by mutations in IgLON genes. Considering the high levels of intra-family homology shared by IgLONs, we investigated potential compensatory effects at the level of IgLON isoforms in the brains of mice deficient in one or two family members. We found that the lack of IgLONs is not compensated by a systematic quantitative increase of the other family members. On the contrary, the expression of Ntm 1a transcript and NEGR1 protein was significantly reduced in the frontal cortex of Lsamp-deficient mice, suggesting that the expression patterns within the IgLON family are balanced coherently. The actions of individual IgLONs, however, can be antagonistic, as demonstrated by differential expression of Syp in deletion mutants of IgLONs. In conclusion, we show that the genomic twin-promoter structure has an impact on both anatomical distribution and intra-family interactions of IgLON family members.

  8. The mouse formin (Fmn) gene: Genomic structure, novel exons, and genetic mapping

    SciTech Connect

    Wang, C.C.; Chan, D.C.; Leder, P.

    1997-02-01

    Mutations in the mouse formin (Fmn) gene, formerly known as the limb deformity (ld) gene, give rise to recessively inherited limb deformities and renal malformations or aplasia. The Fmn gene encodes many differentially processed transcripts that are expressed in both adult and embryonic tissues. To study the genomic organization of the Fmn locus, we have used Fmn probes to isolate and characterize genomic clones spanning 500 kb. Our analysis of these clones shows that the Fmn gene is composed of at least 24 exons and spans 400 kb. We have identified two novel exons that are expressed in the developing embryonic limb bud as well as adult tissues such as brain and kidney. We have also used a microsatellite polymorphism from within the Fmn gene to map it genetically to a 2.2-cM interval between D2Mit58 and D2Mit103. 36 refs., 6 figs., 1 tab.

  9. The Aspergillus Genome Database: multispecies curation and incorporation of RNA-Seq data to improve structural gene annotations

    PubMed Central

    Cerqueira, Gustavo C.; Arnaud, Martha B.; Inglis, Diane O.; Skrzypek, Marek S.; Binkley, Gail; Simison, Matt; Miyasato, Stuart R.; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R.

    2014-01-01

    The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome. PMID:24194595

  10. Genome-Wide Analysis of the Expansin Gene Superfamily Reveals Grapevine-Specific Structural and Functional Characteristics

    PubMed Central

    Tornielli, Giovanni Battista; Fasoli, Marianna; Venturini, Luca; Pezzotti, Mario; Zenoni, Sara

    2013-01-01

    Background Expansins are proteins that loosen plant cell walls in a pH-dependent manner, probably by increasing the relative movement among polymers thus causing irreversible expansion. The expansin superfamily (EXP) comprises four distinct families: expansin A (EXPA), expansin B (EXPB), expansin-like A (EXLA) and expansin-like B (EXLB). There is experimental evidence that EXPA and EXPB proteins are required for cell expansion and developmental processes involving cell wall modification, whereas the exact functions of EXLA and EXLB remain unclear. The complete grapevine (Vitis vinifera) genome sequence has allowed the characterization of many gene families, but an exhaustive genome-wide analysis of expansin gene expression has not been attempted thus far. Methodology/Principal Findings We identified 29 EXP superfamily genes in the grapevine genome, representing all four EXP families. Members of the same EXP family shared the same exon–intron structure, and phylogenetic analysis confirmed a closer relationship between EXP genes from woody species, i.e. grapevine and poplar (Populus trichocarpa), compared to those from Arabidopsis thaliana and rice (Oryza sativa). We also identified grapevine-specific duplication events involving the EXLB family. Global gene expression analysis confirmed a strong correlation among EXP genes expressed in mature and green/vegetative samples, respectively, as reported for other gene families in the recently-published grapevine gene expression atlas. We also observed the specific co-expression of EXLB genes in woody organs, and the involvement of certain grapevine EXP genes in berry development and post-harvest withering. Conclusion Our comprehensive analysis of the grapevine EXP superfamily confirmed and extended current knowledge about the structural and functional characteristics of this gene family, and also identified properties that are currently unique to grapevine expansin genes. Our data provide a model for the functional

  11. Genome structure drives patterns of gene family evolution in ciliates, a case study using Chilodonella uncinata (Protista, Ciliophora, Phyllopharyngea).

    PubMed

    Gao, Feng; Song, Weibo; Katz, Laura A

    2014-08-01

    In most lineages, diversity among gene family members results from gene duplication followed by sequence divergence. Because of the genome rearrangements during the development of somatic nuclei, gene family evolution in ciliates involves more complex processes. Previous work on the ciliate Chilodonella uncinata revealed that macronuclear β-tubulin gene family members are generated by alternative processing, in which germline regions are alternatively used in multiple macronuclear chromosomes. To further study genome evolution in this ciliate, we analyzed its transcriptome and found that (1) alternative processing is extensive among gene families; and (2) such gene families are likely to be C. uncinata specific. We characterized additional macronuclear and micronuclear copies of one candidate alternatively processed gene family, a protein kinase domain-containing protein (PKc), from two C. uncinata strains. Analysis of the PKc sequences reveals that (1) multiple PKc gene family members in the macronucleus share some identical regions flanked by divergent regions; and (2) the shared identical regions are processed from a single micronuclear chromosome. We discuss analogous processes in lineages across the eukaryotic tree of life to provide further insights on the impact of genome structure on gene family evolution in eukaryotes.

  12. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection.

    PubMed

    Gichira, Andrew W; Li, Zhizhong; Saina, Josphat K; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W; Wang, Qingfeng; Chen, Jinming

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica's chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family.
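
    Microsatellite (SSR) detection of the kind mentioned in this abstract is often done by scanning for short motifs repeated in tandem. The sketch below is one simple way to do that with regular-expression backreferences; the function name and the minimum repeat counts per motif length are illustrative assumptions, not the settings or software used in the study.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    if min_repeats is None:
        min_repeats = DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AA" or "ATAT"), since the shorter motif already covers them.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    toy = "CCG" + "A" * 12 + "CGC" + "AT" * 6 + "GCG"  # synthetic sequence
    print(find_ssrs(toy))
```

    This only finds perfect repeats; compound or interrupted microsatellites would need additional logic.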

  13. The complete chloroplast genome sequence of an endemic monotypic genus Hagenia (Rosaceae): structural comparative analysis, gene content and microsatellite detection

    PubMed Central

    Saina, Josphat K.; Long, Zhicheng; Hu, Guangwan; Gituru, Robert W.

    2017-01-01

    Hagenia is an endangered monotypic genus endemic to the tropical mountains of Africa. The only species, Hagenia abyssinica (Bruce) J.F. Gmel, is an important medicinal plant producing bioactive compounds that have been traditionally used by African communities as a remedy for gastrointestinal ailments in both humans and animals. Complete chloroplast genomes have been applied in resolving phylogenetic relationships within plant families. We employed high-throughput sequencing technologies to determine the complete chloroplast genome sequence of H. abyssinica. The genome is a circular molecule of 154,961 base pairs (bp), with a pair of Inverted Repeats (IR) 25,971 bp each, separated by two single copies; a large (LSC, 84,320 bp) and a small single copy (SSC, 18,696 bp). H. abyssinica’s chloroplast genome has a 37.1% GC content and encodes 112 unique genes, 78 of which code for proteins, 30 are tRNA genes and four are rRNA genes. A comparative analysis with twenty other species, sequenced to date from the family Rosaceae, revealed similarities in structural organization, gene content and arrangement. The observed size differences are attributed to the contraction/expansion of the inverted repeats. The translational initiation factor gene (infA), which had been previously reported in other chloroplast genomes, was conspicuously missing in H. abyssinica. A total of 172 microsatellites and 49 large repeat sequences were detected in the chloroplast genome. A Maximum Likelihood analysis of 71 protein-coding genes placed Hagenia in Rosoideae. The availability of a complete chloroplast genome, the first in the Sanguisorbeae tribe, is beneficial for further molecular studies on taxonomic and phylogenomic resolution within the Rosaceae family. PMID:28097059

  14. Genome-Wide Analysis of the Sucrose Synthase Gene Family in Grape (Vitis vinifera): Structure, Evolution, and Expression Profiles.

    PubMed

    Zhu, Xudong; Wang, Mengqi; Li, Xiaopeng; Jiu, Songtao; Wang, Chen; Fang, Jinggui

    2017-03-28

    Sucrose synthase (SS) is widely considered the key enzyme in plant sugar metabolism, which is critical to plant growth and development, especially fruit quality. The members of the SS gene family have been identified and characterized in multiple plant genomes. However, detailed information about this gene family is lacking in grapevine (Vitis vinifera L.). In this study, we performed a systematic analysis of the grape (V. vinifera) genome and reported that there are five SS genes (VvSS1-5) in the grape genome. Comparison of the structures of grape SS genes showed high structural conservation, resulting from the selection pressures during the evolutionary process. The segmental duplication of grape SS genes contributed to this gene family expansion. The syntenic analyses between grape and soybean (Glycine max) demonstrated that these genes located in corresponding syntenic blocks arose before the divergence of grape and soybean. Phylogenetic analysis revealed distinct evolutionary paths for the grape SS genes. VvSS1/VvSS5, VvSS2/VvSS3 and VvSS4 originated from three ancient SS genes, which were generated by duplication events before the split of monocots and eudicots. Bioinformatics analysis of publicly available microarray data, which was validated by quantitative real-time reverse transcription PCR (qRT-PCR), revealed distinct temporal and spatial expression patterns of VvSS genes in various tissues, organs and developmental stages, as well as in response to biotic and abiotic stresses. Taken together, our results will be beneficial for further investigations into the functions of SS genes in the processes of grape resistance to environmental stresses.

  15. Improved structural annotation of protein-coding genes in the Meloidogyne hapla genome using RNA-Seq.

    PubMed

    Guo, Yuelong; Bird, David McK; Nielsen, Dahlia M

    2014-01-01

    As high-throughput cDNA sequencing (RNA-Seq) is increasingly applied to hypothesis-driven biological studies, the prediction of protein-coding genes based on these data is usurping strictly in silico approaches. Compared with computationally derived gene predictions, structural annotation is more accurate when based on biological evidence, particularly RNA-Seq data. Here, we refine the current genome annotation for the Meloidogyne hapla genome utilizing RNA-Seq data. Published structural annotation defines 14 420 protein-coding genes in the M. hapla genome. Of these, 25% (3751) were found to exhibit some incongruence with RNA-Seq data. Manual annotation enabled these discrepancies to be resolved. Our analysis revealed 544 new gene models that were missing from the prior annotation. Additionally, 1457 transcribed regions were newly identified on the ends of as-yet-unjoined contigs. We also searched for trans-spliced leaders, and based on RNA-Seq data, identified genes that appear to be trans-spliced. Four 22-bp trans-spliced leaders were identified using our pipeline, including the known trans-spliced leader, which is the M. hapla ortholog of SL1. In silico predictions of trans-splicing were validated by comparison with earlier results derived from an independent cDNA library constructed to capture trans-spliced transcripts. The new annotation, which we term HapPep5, is publicly available at www.hapla.org.

  16. Global analysis of somatic structural genomic alterations and their impact on gene expression in diverse human cancers

    PubMed Central

    Alaei-Mahabadi, Babak; Karlsson, Joakim W.; Nilsson, Jonas A.; Larsson, Erik

    2016-01-01

    Tumor genomes are mosaics of somatic structural variants (SVs) that may contribute to the activation of oncogenes or inactivation of tumor suppressors, for example, by altering gene copy number amplitude. However, there are multiple other ways in which SVs can modulate transcription, but the general impact of such events on tumor transcriptional output has not been systematically determined. Here we use whole-genome sequencing data to map SVs across 600 tumors and 18 cancers, and investigate the relationship between SVs, copy number alterations (CNAs), and mRNA expression. We find that 34% of CNA breakpoints can be clarified structurally and that most amplifications are due to tandem duplications. We observe frequent swapping of strong and weak promoters in the context of gene fusions, and find that this has a measurable global impact on mRNA levels. Interestingly, several long noncoding RNAs were strongly activated by this mechanism. Additionally, SVs were confirmed in telomere reverse transcriptase (TERT) upstream regions in several cancers, associated with elevated TERT mRNA levels. We also highlight high-confidence gene fusions supported by both genomic and transcriptomic evidence, including a previously undescribed paired box 8 (PAX8)–nuclear factor, erythroid 2 like 2 (NFE2L2) fusion in thyroid carcinoma. In summary, we combine SV, CNA, and expression data to provide insights into the structural basis of CNAs as well as the impact of SVs on gene expression in tumors. PMID:27856756

  17. The Eucalyptus Tonoplast Intrinsic Protein (TIP) Gene Subfamily: Genomic Organization, Structural Features, and Expression Profiles

    PubMed Central

    Rodrigues, Marcela I.; Takeda, Agnes A. S.; Bravo, Juliana P.; Maia, Ivan G.

    2016-01-01

    Plant aquaporins are water channels implicated in various physiological processes, including growth, development and adaptation to stress. In this study, the Tonoplast Intrinsic Protein (TIP) gene subfamily of Eucalyptus, an economically important woody species, was investigated and characterized. A genome-wide survey of the Eucalyptus grandis genome revealed the presence of eleven putative TIP genes (referred to as EgTIP), which were individually assigned by phylogeny to each of the classical TIP1–5 groups. Homology modeling confirmed the presence of the two highly conserved NPA (Asn-Pro-Ala) motifs in the identified EgTIPs. Residue variations in the corresponding selectivity filters, which might reflect differences in EgTIP substrate specificity, were observed. All EgTIP genes, except EgTIP5.1, were transcribed and the majority of them showed organ/tissue-enriched expression. Inspection of the EgTIP promoters revealed the presence of common cis-regulatory elements implicated in abiotic stress and hormone responses, pointing to an involvement of the identified genes in abiotic stress responses. In line with these observations, additional gene expression profiling demonstrated increased expression under polyethylene glycol-imposed osmotic stress. Overall, the results obtained suggest that these novel EgTIPs might be functionally implicated in eucalyptus adaptation to stress. PMID:27965702

  18. The human glia maturation factor-gamma gene: genomic structure and mutation analysis in gliomas with chromosome 19q loss.

    PubMed

    Peters, N; Smith, J S; Tachibana, I; Lee, H K; Pohl, U; Portier, B P; Louis, D N; Jenkins, R B

    1999-09-01

    Human glia maturation factor-gamma (hGMF-gamma) is a recently identified gene that may be involved in glial differentiation, neural regeneration, and inhibition of tumor cell proliferation. The gene maps to the long arm of chromosome 19 at band q13.2, a region that is frequently deleted in human malignant gliomas and is thus suspected to harbor a glioma tumor suppressor gene. Given the putative role of hGMF-gamma in cell differentiation and proliferation and its localization to chromosome 19q13, this gene is an interesting candidate for the chromosome 19q glioma tumor suppressor gene. To evaluate this possibility, we determined the genomic structure of human hGMF-gamma and performed mutation screening in a series of 41 gliomas with and without allelic loss of chromosome 19q. Mutations were not detected, which suggests that hGMF-gamma is not the chromosome 19q glioma suppressor gene. However, the elucidation of the genomic structure of hGMF-gamma may prove useful in future investigations of hGMF-gamma in the normal adult and developing human nervous system.

  19. Structural characterization of helitrons and their stepwise capturing of gene fragments in the maize genome

    PubMed Central

    2011-01-01

    Background As a newly identified category of DNA transposon, helitrons have been found in a large number of eukaryotic genomes. Helitrons have contributed significantly to the intra-specific genome diversity in maize. Although many characteristics of helitrons in the maize genome have been well documented, the sequence of an intact autonomous helitron has not been identified in maize. In addition, the process of gene fragment capturing during the transposition of helitrons has not been characterized. Results The whole genome sequences of maize inbred line B73 were analyzed, and 1,649 helitron-like transposons, including 1,515 helAs and 134 helBs, were identified. ZmhelA1, ZmhelB1 and ZmhelB2 all encode an open reading frame (ORF) with an intact replication initiator (Rep) motif and a DNA helicase (Hel) domain, which are similar to previously reported autonomous helitrons in other organisms. The putative autonomous ZmhelB1 and ZmhelB2 contain an extra replication factor-a protein1 (RPA1) transposase (RPA-TPase) including three single strand DNA-binding domains (DBD)-A/-B/-C in the ORF. Over ninety percent of maize helitrons identified have captured gene fragments. HelAs and helBs carry 4,645 and 249 gene fragments, which yield 2,507 and 187 different genes respectively. Many helitrons contain multiple terminal sequences, but only one 3'-terminal sequence had an intact "CTAG" motif. There were no significant differences in the 5'-termini sequence between the veritas terminal sequence and the pseudo sequence. Helitrons not only can capture fragments, but were also shown to lose internal sequences during the course of transposing. Conclusions Three putative autonomous elements were identified, which encoded an intact Rep motif and a DNA helicase domain, suggesting that autonomous helitrons may exist in modern maize. The results indicate that gene fragments captured during the transposition of many helitrons happen in a stepwise way, with multiple gene fragments within one

  20. Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus

    SciTech Connect

    Tschaplinski, Timothy J; Tsai, Chung-Jui; Harding, Scott A; Lindroth, Richard L; Yuan, Yinan

    2006-01-01

    Salicin-based phenolic glycosides, hydroxycinnamate derivatives and flavonoid-derived condensed tannins comprise up to one-third of Populus leaf dry mass. Genes regulating the abundance and chemical diversity of these substances have not been comprehensively analysed in tree species exhibiting this metabolically demanding level of phenolic metabolism. Here, shikimate-phenylpropanoid pathway genes thought to give rise to these phenolic products were annotated from the Populus genome, their expression assessed by semiquantitative or quantitative reverse transcription polymerase chain reaction (PCR), and metabolic evidence for function presented. Unlike Arabidopsis, Populus leaves accumulate an array of hydroxycinnamoyl-quinate esters, which is consistent with broadened function of the expanded hydroxycinnamoyl-CoA transferase gene family. Greater flavonoid pathway diversity is also represented, and flavonoid gene families are larger. Consistent with expanded pathway function, most of these genes were upregulated during wound-stimulated condensed tannin synthesis in leaves. The suite of Populus genes regulating phenylpropanoid product accumulation should have important application in managing phenolic carbon pools in relation to climate change and global carbon cycling.

  1. Genomic cloning, structure, expression pattern, and chromosomal location of the human SIX3 gene.

    PubMed

    Granadino, B; Gallardo, M E; López-Ríos, J; Sanz, R; Ramos, C; Ayuso, C; Bovolenta, P; Rodríguez de Córdoba, S

    1999-01-01

    The Drosophila gene sine oculis (so) encodes a nuclear homeoprotein that is required for eye development. Homologous genes to so, denoted SIX genes, have been found in vertebrates. Among the SIX genes, SIX3 is considered to be the functional homologue of so. To provide insight into the potential implications of SIX3 in human ocular malformations, we have cloned and characterized the human SIX3 gene. In human eye, SIX3 produces a 3-kb transcript that codes for a 332-amino-acid polypeptide that is virtually identical to its mouse and chick homologues. Expression of SIX3 was detected in human embryos as early as 5-7 weeks of gestation and found to be maintained in the eye throughout the entire period of fetal development. At 20 weeks of gestation, expression of SIX3 in the human retina was detected in the ganglion cells and in cells of the inner nuclear layer. The human SIX3 gene spans 4.4 kb of genomic DNA and is split into two exons separated by a 1659-bp intron. SIX3 was mapped to human chromosome 2p16-p21, between the genetic markers D2S119 and D2S288. Interestingly, the map position of human SIX3 overlaps the locations of two dominant disorders with ocular phenotypes that have been assigned to this chromosomal region, holoprosencephaly type 2 and Malattia Leventinese.

  2. Accounting for Population Structure in Gene-by-Environment Interactions in Genome-Wide Association Studies Using Mixed Models.

    PubMed

    Sul, Jae Hoon; Bilow, Michael; Yang, Wen-Yun; Kostem, Emrah; Furlotte, Nick; He, Dan; Eskin, Eleazar

    2016-03-01

    Although genome-wide association studies (GWASs) have discovered numerous novel genetic variants associated with many complex traits and diseases, those genetic variants typically explain only a small fraction of phenotypic variance. Factors that account for phenotypic variance include environmental factors and gene-by-environment interactions (GEIs). Recently, several studies have conducted genome-wide gene-by-environment association analyses and demonstrated important roles of GEIs in complex traits. One of the main challenges in these association studies is to control effects of population structure that may cause spurious associations. Many studies have analyzed how population structure influences statistics of genetic variants and developed several statistical approaches to correct for population structure. However, the impact of population structure on GEI statistics in GWASs has not been extensively studied, nor have there been methods designed to correct for population structure on GEI statistics. In this paper, we show both analytically and empirically that population structure may cause spurious GEIs and use both simulation and two GWAS datasets to support our finding. We propose a statistical approach based on mixed models to account for population structure on GEI statistics. We find that our approach effectively controls population structure on statistics for GEIs as well as for genetic variants.

  3. Genomic structure and mapping of precerebellin and a precerebellin-related gene.

    PubMed

    Kavety, B; Jenkins, N A; Fletcher, C F; Copeland, N G; Morgan, J I

    1994-11-01

    The cerebellum-specific hexadecapeptide, cerebellin, is derived from a larger precursor, precerebellin, that has sequence homology to the complement component C1qB. We report the cloning of the murine homolog of precerebellin, Cbln1, and a closely related gene, Cbln2. Amino acid comparison of Cbln1 with Cbln2 revealed that Cbln2 is 88% identical to the carboxy terminal region of Cbln1. That these are independent genes was confirmed by Southern analysis and genome mapping. Cbln1 was positioned to the central region of mouse chromosome 8, 2.3 cM distal of JunB and 6.0 cM proximal of Mt1, while Cbln2 mapped to the distal end of mouse chromosome 18, 1.7 cM telomeric of Mbp.

  4. Structure and chromosomal localization of the genomic locus encoding the Kiz1 LIM-kinase gene

    SciTech Connect

    Bernard, O.; Burkitt, V.; Webb, G.C.

    1996-08-01

    We have cloned and characterized the mouse gene encoding Kiz1/Limk1, a new member of the zinc-finger LIM family that also has a kinase domain. The gene encompasses 25 kb of the mouse genome, and the organization of its 16 exons does not correlate with its functional domains. The promoter region of Kiz1/Limk1 was identified by cloning a 1.06-kb genomic fragment upstream from the first ATG in a promoterless CAT vector. This construct was demonstrated to drive CAT expression in Jurkat cells. The promoter sequence lacks conventional TATA and CAAT motifs but contains consensus binding sequences for several transcriptional regulators implicated in control of transcription in many different cell types, including Sp1, Ets, and E2A. Analysis of the chromosomal localization of KIZ1/LIMK1 indicates that it lies on human chromosome 17 in the region 17q25 and on mouse Chromosome 5, band G2. 15 refs., 3 figs., 1 tab.

  5. Characterization of the genomic structure of the mouse APLP1 gene

    SciTech Connect

    Zhong, Sue; Wu, Kuo; Black, I.B.; Schaar, D.G.

    1996-02-15

    This article reports on the organization of the mouse APLP1 gene, an evolutionarily conserved amyloid precursor-like protein. The amyloid beta protein, important in Alzheimer's disease, is derived from these precursor proteins. By investigating the expression and structure of this murine gene, it is hoped that more will be learned about the function and regulation of the human homologue. 27 refs., 2 figs.

  6. Genomic structure of the luciferase gene and phylogenetic analysis in the Hotaria-group fireflies.

    PubMed

    Choi, Yong Soo; Bae, Jin Sik; Lee, Kwang Sik; Kim, Seong Ryul; Kim, Iksoo; Kim, Jong Gill; Kim, Keun Young; Kim, Sam Eun; Suzuki, Hirobumi; Lee, Sang Mong; Sohn, Hung Dae; Jin, Byung Rae

    2003-02-01

    Luminescent fireflies have species-specific flash patterns, which are recognized as sexual communication. Luciferase is the sole enzyme responsible for bioluminescence. We describe here the complete nucleotide sequence and the exon-intron structure of the luciferase gene of the Hotaria-group fireflies, H. unmunsana, H. papariensis and H. tsushimana. The luciferase gene of the Hotaria-group firefly including the known H. parvula spans 1950 bp and consists of six introns and seven exons coding for 548 amino acid residues, suggesting highly conserved structure among the Hotaria-group fireflies. Although only one luciferase gene was cloned from H. papariensis, two sequences of the gene were found in each of H. unmunsana (U1 and Uc) and H. tsushimana (T1 and T2). The amino acid sequence divergence among H. unmunsana, H. papariensis, and H. tsushimana only ranged from zero to three amino acid residues, but H. parvula differed by 10-11 amino acid residues from the other Hotaria-group fireflies, suggesting a divergent relationship of this species. Phylogenetic analysis using the deduced amino acid sequences of the luciferase gene resulted in a monophyletic group in the Hotaria excluding H. parvula, suggesting a close relationship among H. unmunsana, H. papariensis and H. tsushimana. Additionally, we also analyzed the mitochondrial cytochrome oxidase I (COI) gene of the Hotaria-group fireflies. The deduced amino acid sequence of the COI gene of H. unmunsana was identical to that of H. papariensis and H. tsushimana, but different by three positions from H. parvula. In terms of nucleotide sequences of the COI gene, intraspecific sequence divergence was sometimes larger than interspecies level, and phylogenetic analysis placed the three species into monophyletic groups unresolved among them, but excluded H. parvula. In conclusion, our results suggest that H. unmunsana, H. papariensis and H. tsushimana are very closely related or might be an identical species, at

  7. A high-resolution reference genetic map positioning 8.8 K genes for the conifer white spruce: structural genomics implications and correspondence with physical distance.

    PubMed

    Pavy, Nathalie; Lamothe, Manuel; Pelgas, Betty; Gagnon, France; Birol, Inanç; Bohlmann, Joerg; Mackay, John; Isabel, Nathalie; Bousquet, Jean

    2017-04-01

    Over the last decade, extensive genetic and genomic resources have been developed for the conifer white spruce (Picea glauca, Pinaceae), which has one of the largest plant genomes (20 Gbp). Draft genome sequences of white spruce and other conifers have recently been produced, but dense genetic maps are needed to comprehend genome macrostructure, delineate regions involved in quantitative traits, complement functional genomic investigations, and assist the assembly of fragmented genomic sequences. A greatly expanded P. glauca composite linkage map was generated from a set of 1976 full-sib progeny, with the positioning of 8793 expressed genes. Regions with significantly low or high gene density were identified. Gene family members tended to be mapped on the same chromosomes, with tandemly arrayed genes significantly biased towards specific functional classes. The map was integrated with transcriptome data surveyed across eight tissues. In total, 69 clusters of co-expressed and co-localising genes were identified. A high level of synteny was found with pine genetic maps, which should facilitate the transfer of structural information in the Pinaceae. Although the current white spruce genome sequence remains highly fragmented, dozens of scaffolds encompassing more than one mapped gene were identified. From these, the relationship between genetic and physical distances was examined and the genome-wide recombination rate was found to be much smaller than most estimates reported for angiosperm genomes. This gene linkage map shall assist the large-scale assembly of the next-generation white spruce genome sequence and provide a reference resource for the conifer genomics community.

  8. Reannotation and extended community resources for the genome of the non-seed plant Physcomitrella patens provide insights into the evolution of plant gene structures and functions

    PubMed Central

    2013-01-01

    Background The moss Physcomitrella patens as a model species provides an important reference for early-diverging lineages of plants and the release of the genome in 2008 opened the doors to genome-wide studies. The usability of a reference genome greatly depends on the quality of the annotation and the availability of centralized community resources. Therefore, in the light of accumulating evidence for missing genes, fragmentary gene structures, false annotations and a low rate of functional annotations on the original release, we decided to improve the moss genome annotation. Results Here, we report the complete moss genome re-annotation (designated V1.6) incorporating the increased transcript availability from a multitude of developmental stages and tissue types. We demonstrate the utility of the improved P. patens genome annotation for comparative genomics and new extensions to the cosmoss.org resource as a central repository for this plant “flagship” genome. The structural annotation of 32,275 protein-coding genes results in 8387 additional loci including 1456 loci with known protein domains or homologs in Plantae. This is the first release to include information on transcript isoforms, suggesting alternative splicing events for at least 10.8% of the loci. Furthermore, this release now also provides information on non-protein-coding loci. Functional annotations were improved regarding quality and coverage, resulting in 58% annotated loci (previously: 41%) that comprise also 7200 additional loci with GO annotations. Access and manual curation of the functional and structural genome annotation is provided via the http://www.cosmoss.org model organism database. Conclusions Comparative analysis of gene structure evolution along the green plant lineage provides novel insights, such as a comparatively high number of loci with 5’-UTR introns in the moss. Comparative analysis of functional annotations reveals expansions of moss house-keeping and metabolic genes

  9. Alternative splicing and genomic structure of the Wilms tumor gene WT1

    SciTech Connect

    Haber, D.A. (Massachusetts General Hospital Cancer Center, Charlestown); Sohn, R.L.; Buckler, A.J.; Pelletier, J.; Call, K.M.; Housman, D.E.

    1991-11-01

    The chromosome 11p13 Wilms tumor susceptibility gene WT1 appears to play a crucial role in regulating the proliferation and differentiation of nephroblasts and gonadal tissue. The WT1 gene consists of 10 exons, encoding a complex pattern of mRNA species: four distinct transcripts are expressed, reflecting the presence or absence of two alternative splices. Splice I consists of a separate exon, encoding 17 amino acids, which is inserted between the proline-rich amino terminus and the zinc finger domains. Splice II arises from the use of an alternative 5' splice junction and results in the insertion of 3 amino acids between zinc fingers 3 and 4. RNase protection analysis demonstrates that the most prevalent splice variant in both human and mouse is that which contains both alternative splices, whereas the least common is the transcript missing both splices. The relative distribution of splice variants is highly conserved between normal fetal kidney tissue and Wilms tumors that have intact WT1 transcripts. The ratio of these different WT1 mRNA species is also maintained as a function of development in the mouse kidney and in various mouse tissues expressing WT1. The conservation in structure and relative levels of each of the four WT1 mRNA species suggest that each encoded polypeptide makes a significant contribution to normal gene function. The control of cellular proliferation and differentiation exerted by the WT1 gene products may involve interactions between four polypeptides with distinct targets and functions.
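
    The four transcripts described in this abstract follow from two independent alternative splices, each either present or absent (2 x 2 combinations). A minimal sketch of that enumeration is given below; the dictionary keys and the function name are hypothetical labels chosen for illustration, and only the inserted amino-acid counts (17 and 3) come from the abstract.

```python
from itertools import product

# Amino acids inserted by each alternative splice, as stated in the abstract.
# The key names are illustrative labels, not identifiers from the paper.
SPLICE_INSERTS = {"splice_I": 17, "splice_II": 3}


def enumerate_variants(inserts):
    """Return every presence/absence combination of the alternative splices.

    Each entry is (tuple of splices present, total extra amino acids).
    """
    names = list(inserts)
    variants = []
    for flags in product([True, False], repeat=len(names)):
        present = tuple(n for n, f in zip(names, flags) if f)
        extra_aa = sum(inserts[n] for n in present)
        variants.append((present, extra_aa))
    return variants


variants = enumerate_variants(SPLICE_INSERTS)
for present, extra_aa in variants:
    label = " + ".join(present) if present else "neither splice"
    print(f"{label}: {extra_aa} extra amino acids")
```

    The sketch only counts combinations; it says nothing about the relative abundance of the variants, which the abstract reports from RNase protection analysis.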

  10. Mitochondrial Genome Structure of Photosynthetic Eukaryotes.

    PubMed

    Yurina, N P; Odintsova, M S

    2016-02-01

    Current ideas of plant mitochondrial genome organization are presented. Data on the size and structural organization of mtDNA, gene content, and peculiarities are summarized. Special emphasis is given to characteristic features of the mitochondrial genomes of land plants and photosynthetic algae that distinguish them from the mitochondrial genomes of other eukaryotes. The data published before the end of 2014 are reviewed.

  11. Genomic survey, gene expression analysis and structural modeling suggest diverse roles of DNA methyltransferases in legumes.

    PubMed

    Garg, Rohini; Kumari, Romika; Tiwari, Sneha; Goyal, Shweta

    2014-01-01

    DNA methylation plays a crucial role in development through inheritable gene silencing. Plants possess three types of DNA methyltransferases (MTases), namely Methyltransferase (MET), Chromomethylase (CMT) and Domains Rearranged Methyltransferase (DRM), which maintain methylation at CG, CHG and CHH sites. DNA MTases have not been studied in legumes so far. Here, we report the identification and analysis of putative DNA MTases in five legumes, including chickpea, soybean, pigeonpea, Medicago and Lotus. MTases in legumes could be classified in known MET, CMT, DRM and DNA nucleotide methyltransferases (DNMT2) subfamilies based on their domain organization. The first three MTases represent DNA MTases, whereas DNMT2 represents a transfer RNA (tRNA) MTase. Structural comparison of all the MTases in plants with known MTases in mammalian and plant systems has been reported to assign structural features in the context of the biological functions of these proteins. The structure analysis clearly specified regions crucial for protein-protein interactions and regions important for nucleosome binding in various domains of CMT and MET proteins. In addition, the structural model of DRM suggested that circular permutation of motifs does not have any effect on the overall structure of the DNA methyltransferase domain. These results provide valuable insights into the role of various domains in molecular recognition and should facilitate mechanistic understanding of their function in mediating specific methylation patterns. Further, the comprehensive gene expression analyses of MTases in legumes provided evidence of their role in various developmental processes throughout the plant life cycle and response to various abiotic stresses. Overall, our study will be very helpful in establishing the specific functions of DNA MTases in legumes.

  12. The structure of HIV-1 genomic RNA in the gp120 gene determines a recombination hot spot in vivo.

    PubMed

    Galetto, Román; Moumen, Abdeladim; Giacomoni, Véronique; Véron, Michel; Charneau, Pierre; Negroni, Matteo

    2004-08-27

    By frequently rearranging large regions of the genome, genetic recombination is a major determinant in the plasticity of the human immunodeficiency virus type I (HIV-1) population. In retroviruses, recombination mostly occurs by template switching during reverse transcription. The generation of retroviral vectors provides a means to study this process after a single cycle of infection of cells in culture. Using HIV-1-derived vectors, we present here the first characterization and estimate of the strength of a recombination hot spot in HIV-1 in vivo. In the hot spot region, located within the C2 portion of the gp120 envelope gene, the rate of recombination is up to ten times higher than in the surrounding regions. The hot region corresponds to a previously identified RNA hairpin structure. Although recombination breakpoints in vivo cluster in the top portion of the hairpin, the bias for template switching in this same region appears less marked in a cell-free system. By modulating the stability of this hairpin we were able to affect the local recombination rate both in vitro and in infected cells, indicating that the local folding of the genomic RNA is a major parameter in the recombination process. This characterization of reverse transcription products generated after a single cycle of infection provides insights in the understanding of the mechanism of recombination in vivo and suggests that specific regions of the genome might be prompted to yield different rates of evolution due to the presence of circumscribed recombination hot spots.

  13. Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene structure and coding sequence evolution in polyploid wheat

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cycles of whole genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied...

  14. Characterization of the porcine sperm adhesion molecule gene SPAM1- expression analysis, genomic structure, and chromosomal mapping.

    PubMed

    Day, A E; Quilter, C R; Sargent, C A; Mileham, A J

    2002-06-01

    Sequence analysis of cDNA products, derived from adult porcine testis mRNA, gave overlapping nucleotide sequence correlating to 1952 bp of the sperm adhesion molecule 1 (SPAM1) gene. This sequence was shown to be homologous to SPAM1 genes known in other mammalian species and contained an open reading frame encoding a 493-amino acid protein. Fluorescence in situ hybridization (FISH), using a bacterial artificial chromosome (BAC) clone from the PigE BAC library, was used to map SPAM1 to chromosome 18 of the pig. This finding is consistent with comparative mapping experiments performed between pig and human chromosomes. Polymerase chain reaction (PCR) analysis of genomic DNA has shown that the 1952 bp of cDNA sequence spans approximately 9 kb of genomic DNA and comprises at least four exons, with its size and structure being relatively conserved between mouse, human and pig. Reverse transcriptase (RT)-PCR analysis of mRNA from nine porcine tissues has also suggested that expression of SPAM1 is limited to the testis.

  15. Short interspersed nuclear elements (SINEs) are abundant in Solanaceae and have a family-specific impact on gene structure and genome organization.

    PubMed

    Seibt, Kathrin M; Wenke, Torsten; Muders, Katja; Truberg, Bernd; Schmidt, Thomas

    2016-05-01

    Short interspersed nuclear elements (SINEs) are highly abundant non-autonomous retrotransposons that are widespread in plants. They are short in size, non-coding, show high sequence diversity, and are therefore mostly not or not correctly annotated in plant genome sequences. Hence, comparative studies on genomic SINE populations are rare. To explore the structural organization and impact of SINEs, we comparatively investigated the genome sequences of the Solanaceae species potato (Solanum tuberosum), tomato (Solanum lycopersicum), wild tomato (Solanum pennellii), and two pepper cultivars (Capsicum annuum). Based on 8.5 Gbp sequence data, we annotated 82 983 SINE copies belonging to 10 families and subfamilies on a base pair level. Solanaceae SINEs are dispersed over all chromosomes with enrichments in distal regions. Depending on the genome assemblies and gene predictions, 30% of all SINE copies are associated with genes, particularly frequent in introns and untranslated regions (UTRs). The close association with genes is family specific. More than 10% of all genes annotated in the Solanaceae species investigated contain at least one SINE insertion, and we found genes harbouring up to 16 SINE copies. We demonstrate the involvement of SINEs in gene and genome evolution including the donation of splice sites, start and stop codons and exons to genes, enlargement of introns and UTRs, generation of tandem-like duplications and transduction of adjacent sequence regions.

  16. Genome-wide Analyses of the Structural Gene Families Involved in the Legume-specific 5-Deoxyisoflavonoid Biosynthesis of Lotus japonicus

    PubMed Central

    Shimada, Norimoto; Sato, Shusei; Akashi, Tomoyoshi; Nakamura, Yasukazu; Tabata, Satoshi; Ayabe, Shin-ichi; Aoki, Toshio

    2007-01-01

    Abstract A model legume Lotus japonicus (Regel) K. Larsen is one of the subjects of genome sequencing and functional genomics programs. In the course of targeted approaches to the legume genomics, we analyzed the genes encoding enzymes involved in the biosynthesis of the legume-specific 5-deoxyisoflavonoid of L. japonicus, which produces isoflavan phytoalexins on elicitor treatment. The paralogous biosynthetic genes were assigned as comprehensively as possible by biochemical experiments, similarity searches, comparison of the gene structures, and phylogenetic analyses. Among the 10 biosynthetic genes investigated, six comprise multigene families, and in many cases they form gene clusters in the chromosomes. Semi-quantitative reverse transcriptase–PCR analyses showed coordinate up-regulation of most of the genes during phytoalexin induction and complex accumulation patterns of the transcripts in different organs. Some paralogous genes exhibited similar expression specificities, suggesting their genetic redundancy. The molecular evolution of the biosynthetic genes is discussed. The results presented here provide reliable annotations of the genes and genetic markers for comparative and functional genomics of leguminous plants. PMID:17452423

  17. Chloroplast genome sequence of the moss Tortula ruralis: gene content, polymorphism, and structural arrangement relative to other green plant chloroplast genomes

    PubMed Central

    2010-01-01

    Background Tortula ruralis, a widely distributed species in the moss family Pottiaceae, is increasingly used as a model organism for the study of desiccation tolerance and mechanisms of cellular repair. In this paper, we present the chloroplast genome sequence of T. ruralis, only the second published chloroplast genome for a moss, and the first for a vegetatively desiccation-tolerant plant. Results The Tortula chloroplast genome is ~123,500 bp, and differs in a number of ways from that of Physcomitrella patens, the first published moss chloroplast genome. For example, Tortula lacks the ~71 kb inversion found in the large single copy region of the Physcomitrella genome and other members of the Funariales. Also, the Tortula chloroplast genome lacks petN, a gene found in all known land plant plastid genomes. In addition, an unusual case of nucleotide polymorphism was discovered. Conclusions Although the chloroplast genome of Tortula ruralis differs from that of the only other sequenced moss, Physcomitrella patens, we have yet to determine the biological significance of the differences. The polymorphisms we have uncovered in the sequencing of the genome offer a rare possibility (for mosses) of the generation of DNA markers for fine-level phylogenetic studies, or to investigate individual variation within populations. PMID:20187961

  18. The walnut (Juglans regia) genome sequence reveals diversity in genes coding for the biosynthesis of non-structural polyphenols.

    PubMed

    Martínez-García, Pedro J; Crepeau, Marc W; Puiu, Daniela; Gonzalez-Ibeas, Daniel; Whalen, Jeanne; Stevens, Kristian A; Paul, Robin; Butterfield, Timothy S; Britton, Monica T; Reagan, Russell L; Chakraborty, Sandeep; Walawage, Sriema L; Vasquez-Gross, Hans A; Cardeno, Charis; Famula, Randi A; Pratt, Kevin; Kuruganti, Sowmya; Aradhya, Mallikarjuna K; Leslie, Charles A; Dandekar, Abhaya M; Salzberg, Steven L; Wegrzyn, Jill L; Langley, Charles H; Neale, David B

    2016-09-01

    The Persian walnut (Juglans regia L.), a diploid species native to the mountainous regions of Central Asia, is the major walnut species cultivated for nut production and is one of the most widespread tree nut species in the world. The high nutritional value of J. regia nuts is associated with a rich array of polyphenolic compounds, whose complete biosynthetic pathways are still unknown. A J. regia genome sequence was obtained from the cultivar 'Chandler' to discover target genes and additional unknown genes. The 667-Mbp genome was assembled using two different methods (SOAPdenovo2 and MaSuRCA), with an N50 scaffold size of 464 955 bp (based on a genome size of 606 Mbp), 221 640 contigs and a GC content of 37%. Annotation with MAKER-P and other genomic resources yielded 32 498 gene models. Previous studies in walnut relying on tissue-specific methods have only identified a single polyphenol oxidase (PPO) gene (JrPPO1). Enabled by the J. regia genome sequence, a second homolog of PPO (JrPPO2) was discovered. In addition, about 130 genes in the large gallate 1-β-glucosyltransferase (GGT) superfamily were detected. Specifically, two genes, JrGGT1 and JrGGT2, were significantly homologous to the GGT from Quercus robur (QrGGT), which is involved in the synthesis of 1-O-galloyl-β-d-glucose, a precursor for the synthesis of hydrolysable tannins. The reference genome for J. regia provides meaningful insight into the complex pathways required for the synthesis of polyphenols. The walnut genome sequence provides important tools and methods to accelerate breeding and to facilitate the genetic dissection of complex traits.
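
    The abstract reports an N50 scaffold size computed against a stated genome size (606 Mbp) rather than against the total assembly length. A minimal sketch of how such a statistic is commonly computed is shown below; the function name and the toy scaffold lengths are assumptions for illustration and are not taken from the walnut assembly or from the assemblers named above.

```python
def n50(lengths, genome_size=None):
    """Return the scaffold length at which the cumulative sum of scaffolds,
    taken from longest to shortest, first reaches half of the target size.

    If genome_size is None, the target is the total assembly length
    (the classic N50). Otherwise the target is the supplied genome size
    (often called NG50). Returns None if the scaffolds never reach half
    of the target.
    """
    target = (genome_size if genome_size is not None else sum(lengths)) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None


# Toy scaffold lengths (hypothetical, arbitrary units); they sum to 300.
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy))                    # relative to the assembly length
print(n50(toy, genome_size=400))   # relative to an assumed genome size
```

    Because the reference size changes the halfway target, the same set of scaffolds can give different values under the two conventions, which is why the abstract states the genome size used.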

  19. Dynamic structures in phytoplasma genomes: sequence variable mosaics (SVMs) of clustered genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Emergence of the phytoplasma clade from an Acholeplasma-like ancestor gave rise to an intriguing group of cell wall-less prokaryotes through a remarkable and continuing evolutionary process. In a ceaseless progression, phytoplasmas have evolved reduced genomes, losing biochemical pathways for synth...

  20. Genomic structure and promoter functional analysis of GnRH3 gene in large yellow croaker (Larimichthys crocea).

    PubMed

    Huang, Wei; Zhang, Jianshe; Liao, Zhi; Lv, Zhenming; Wu, Huifei; Zhu, Aiyi; Wu, Changwen

    2016-01-15

    Gonadotropin-releasing hormone III (GnRH3) is considered to be a key neurohormone in fish reproduction control. In the present study, the cDNA and genomic sequences of GnRH3 were cloned and characterized from large yellow croaker Larimichthys crocea. The cDNA encoded a protein of 99 amino acids with four functional motifs. The full-length genome sequence was composed of 3797 nucleotides, including four exons and three introns. Higher identities of amino acid sequences and conserved exon-intron organizations were found between LcGnRH3 and other GnRH3 genes. In addition, some special features of the sequences were detected in partial species. For example, two specific residues (V and A) were found in the family Sciaenidae, and the unique 75-72 bp type of the open reading frame 2 and 3 existed in the family Cyprinidae. Analysis of the 2576 bp promoter fragment of LcGnRH3 showed a number of transcription factor binding sites, such as AP1, CREB, GATA-1, HSF, FOXA2, and FOXL1. Promoter functional analysis using an EGFP reporter fusion in zebrafish larvae presented positive signals in the brain, including the olfactory region, the terminal nerve ganglion, the telencephalon, and the hypothalamus. The expression pattern was generally consistent with the endogenous GnRH3 GFP-expressing transgenic zebrafish lines, but the details were different. These results indicate that the structure and function of LcGnRH3 are generally similar to the other teleost GnRH3 genes, but there exist some distinctions among them.

  1. Genomic structure of PIR-B, the inhibitory member of the paired immunoglobulin-like receptor genes in mice.

    PubMed

    Alley, T L; Cooper, M D; Chen, M; Kubagawa, H

    1998-03-01

    The genes encoding the murine paired immunoglobulin-like receptors PIR-A and PIR-B are members of a novel gene family which encode cell-surface receptors bearing immunoreceptor tyrosine-based inhibitory motifs (ITIMs) and their non-inhibitory/activatory counterparts. PIR-A and PIR-B have highly homologous extracellular domains but distinct transmembrane and cytoplasmic regions. A charged arginine in the transmembrane region of PIR-A suggests its potential association with other transmembrane proteins to form a signal transducing unit. PIR-B, in contrast, has an uncharged transmembrane region and several ITIMs in its cytoplasmic tail. These characteristics suggest that PIR-A and PIR-B which are coordinately expressed by B cells and myeloid cells, serve counter-regulatory roles in humoral and inflammatory responses. In the present study we have determined the genomic structure of the single copy PIR-B gene. The gene consists of 15 exons and spans approximately 8 kilobases. The first exon contains the 5' untranslated region, the ATG translation start site, and approximately half of the leader peptide sequence. The remainder of the leader peptide sequence is encoded by exon 2. Exons 3-8 encode the six extracellular immunoglobulin-like domains and exons 9 and 10 code for the extracellular membrane proximal and transmembrane regions. The final five exons (exons 11-15) encode for the ITIM-bearing cytoplasmic tail and the 3' untranslated region. The intron/exon boundaries of PIR-B obey the GT-AG rule and are in phase I, with the notable exception of the three boundaries determined for ITIM-containing exons. A microsatellite composed of the trinucleotide repeat AAG in the intron between exons 9 and 10 provides a useful marker for studying population genetics.
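
    The abstract refers to two standard properties of intron/exon boundaries: the GT-AG rule (introns begin with GT and end with AG) and intron phase (phase 1 means the intron falls after the first base of a codon). A minimal sketch of both checks follows; the function names and the example sequence are hypothetical and do not come from the PIR-B gene.

```python
def obeys_gt_ag(intron_seq):
    """True if the intron starts with GT and ends with AG (case-insensitive)."""
    s = intron_seq.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def intron_phase(coding_bases_before):
    """Intron phase from the number of coding bases that precede the intron.

    0: intron lies between two codons
    1: intron lies after the first base of a codon
    2: intron lies after the second base of a codon
    """
    return coding_bases_before % 3


# Hypothetical intron sequence, used only to exercise the check.
example_intron = "GTAAGTTTTCAG"
print(obeys_gt_ag(example_intron))   # boundary check
print(intron_phase(34))              # 34 coding bases upstream of the intron
```

    When every domain-encoding exon is flanked by introns of the same phase, as the abstract describes for most PIR-B boundaries, whole exons can be added or removed without shifting the reading frame.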

  2. Population structure and comparative genome hybridization of European flor yeast reveal a unique group of Saccharomyces cerevisiae strains with few gene duplications in their genome.

    PubMed

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries, with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation.

  3. Population Structure and Comparative Genome Hybridization of European Flor Yeast Reveal a Unique Group of Saccharomyces cerevisiae Strains with Few Gene Duplications in Their Genome

    PubMed Central

    Legras, Jean-Luc; Erny, Claude; Charpentier, Claudine

    2014-01-01

    Wine biological aging is a wine making process used to produce specific beverages in several countries in Europe, including Spain, Italy, France, and Hungary. This process involves the formation of a velum at the surface of the wine. Here, we present the first large scale comparison of all European flor strains involved in this process. We inferred the population structure of these European flor strains from their microsatellite genotype diversity and analyzed their ploidy. We show that almost all of these flor strains belong to the same cluster and are diploid, except for a few Spanish strains. Comparison of the array hybridization profile of six flor strains originating from these four countries with that of three wine strains did not reveal any large segmental amplification. Nonetheless, some genes, including YKL221W/MCH2 and YKL222C, were amplified in the genome of four out of six flor strains. Finally, we correlated ICR1 ncRNA and FLO11 polymorphisms with flor yeast population structure, and associated the presence of wild type ICR1 and a long Flo11p with thin velum formation in a cluster of Jura strains. These results provide new insight into the diversity of flor yeast and show that combinations of different adaptive changes can lead to an increase of hydrophobicity and affect velum formation. PMID:25272156

  4. Improved systematic tRNA gene annotation allows new insights into the evolution of mitochondrial tRNA structures and into the mechanisms of mitochondrial genome rearrangements

    PubMed Central

    Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.

    2012-01-01

    Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and wide-spread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921

  5. Genome-wide structural and evolutionary analysis of the P450 monooxygenase genes (P450ome) in the white rot fungus Phanerochaete chrysosporium : Evidence for gene duplications and extensive gene clustering

    PubMed Central

    Doddapaneni, Harshavardhan; Chakraborty, Ranajit; Yadav, Jagjit S

    2005-01-01

    Background Phanerochaete chrysosporium, the model white rot basidiomycetous fungus, has the extraordinary ability to mineralize (to CO2) lignin and detoxify a variety of chemical pollutants. Its cytochrome P450 monooxygenases have recently been implicated in several of these biotransformations. Our initial P450 cloning efforts in P. chrysosporium and its subsequent whole genome sequencing have revealed an extraordinary P450 repertoire ("P450ome") containing at least 150 P450 genes with yet unknown function. In order to understand the functional diversity and the evolutionary mechanisms and significance of these hemeproteins, here we report a genome-wide structural and evolutionary analysis of the P450ome of this fungus. Results Our analysis showed that P. chrysosporium P450ome could be classified into 12 families and 23 sub-families and is characterized by the presence of multigene families. A genome-level structural analysis revealed 16 organizationally homogeneous and heterogeneous clusters of tandem P450 genes. Analysis of our cloned cDNAs revealed structurally conserved characteristics (intron numbers and locations, and functional domains) among members of the two representative multigene P450 families CYP63 and CYP505 (P450foxy). Considering the unusually complex structural features of the P450 genes in this genome, including microexons (2–10 aa) and frequent small introns (45–55 bp), alternative splicing, as experimentally observed for CYP63, may be a more widespread event in the P450ome of this fungus. Clan-level phylogenetic comparison revealed that P. chrysosporium P450 families fall under 11 fungal clans and the majority of these multigene families appear to have evolved locally in this genome from their respective progenitor genes, as a result of extensive gene duplications and rearrangements. Conclusion P. chrysosporium P450ome, the largest known to date among fungi, is characterized by tandem gene clusters and multigene families. This enormous P450

  6. Structure and partial genomic sequence of the human E2F1 gene.

    PubMed

    Neuman, E; Sellers, W R; McNeil, J A; Lawrence, J B; Kaelin, W G

    1996-09-16

    The E2F family of transcription factors appears to play a critical role in the transcription of certain genes required for cell cycle progression. E2F1, the first cloned member of this family, is regulated during the cell cycle at the mRNA level by changes in transcription of the E2F1 gene and at the protein level by complex formation with proteins such as the retinoblastoma gene product (pRB), cyclin A and DP1. E2F1 can override a pRB-induced G1/S block and can behave as an oncogene in certain cells. E2F1 was cloned and was found to contain seven exons. The dinucleotides at the 5' and 3' splice sites of intron 4 do not agree with consensus splice site sequences. Fluorescence in situ hybridization localized E2F1 to chromosome 20q11. Knowledge of the organization of E2F1 may facilitate identification of additional E2F family members, as well as detection of E2F1 abnormalities in human tumors.

  7. Analysis of the grape MYB R2R3 subfamily reveals expanded wine quality-related clades and conserved gene structure organization across Vitis and Arabidopsis genomes

    PubMed Central

    Matus, José Tomás; Aquea, Felipe; Arce-Johnson, Patricio

    2008-01-01

    Background The MYB superfamily constitutes the most abundant group of transcription factors described in plants. Members control processes such as epidermal cell differentiation, stomatal aperture, flavonoid synthesis, cold and drought tolerance and pathogen resistance. No genome-wide characterization of this family has been conducted in a woody species such as grapevine. In addition, previous analysis of the recently released grape genome sequence suggested expansion events of several gene families involved in wine quality. Results We describe and classify 108 members of the grape R2R3 MYB gene subfamily in terms of their genomic gene structures and similarity to their putative Arabidopsis thaliana orthologues. Seven gene models were derived and analyzed in terms of gene expression and their DNA binding domain structures. Despite low overall sequence homology in the C-terminus of all proteins, even in those with similar functions across Arabidopsis and Vitis, highly conserved motif sequences and exon lengths were found. The grape epidermal cell fate clade is expanded when compared with the Arabidopsis and rice MYB subfamilies. Two anthocyanin MYBA related clusters were identified in chromosomes 2 and 14, one of which includes the previously described grape colour locus. Tannin related loci were also detected with eight candidate homologues in chromosomes 4, 9 and 11. Conclusion This genome wide transcription factor analysis in Vitis suggests that clade-specific grape R2R3 MYB genes are expanded while other MYB genes could be well conserved compared to Arabidopsis. MYB gene abundance, homology and orientation within particular loci also suggest that expanded MYB clades conferring quality attributes of grapes and wines, such as colour and astringency, could possess redundant, overlapping and cooperative functions. PMID:18647406

  8. Synonymous Codon Usage Bias in the Plastid Genome is Unrelated to Gene Structure and Shows Evolutionary Heterogeneity

    PubMed Central

    Qi, Yueying; Xu, Wenjing; Xing, Tian; Zhao, Mingming; Li, Nana; Yan, Li; Xia, Guangmin; Wang, Mengcheng

    2015-01-01

    Synonymous codon usage bias (SCUB) is the nonuniform usage of codons, occurring often in nearly all organisms. Our previous study found that SCUB is correlated with intron number, is unequal among exons in the plant nuclear genome, and mirrors evolutionary specialization. However, whether this rule exists in the plastid genome has not been addressed. Here, we present an analysis of SCUB in the plastid genomes of 25 species from lower to higher plants (algae, bryophytes, pteridophytes, gymnosperms, and spermatophytes). We found NNA and NNT (A- and T-ending codons) are preferential in the plastid genomes of all plants. Interestingly, this preference is heterogeneous among taxonomies of plants, with the strongest preference in bryophytes and the weakest in pteridophytes, suggesting an association between SCUB and plant evolution. In addition, SCUB frequencies are consistent among genes with varied introns and among exons, indicating that the bias of NNA and NNT is unrelated to either intron number or exon position. Further, SCUB is associated with DNA methylation–induced conversion of cytosine to thymine in the vascular plants but not in algae or bryophytes. These data demonstrate that these SCUB profiles in the plastid genome are distinctly different compared with the nuclear genome. PMID:25922569
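
    The NNA/NNT preference described above concerns the third base of each codon. As a rough illustration only, the following sketch tallies third-position bases in a coding sequence. It is a simplification and not the authors' method: a real SCUB analysis would compare synonymous codons within each amino acid family, whereas this counts every codon alike. The function name `third_base_fractions` and the toy sequence are hypothetical.

```python
from collections import Counter


def third_base_fractions(cds: str) -> dict:
    """Fraction of codons ending in A, C, G and T in a coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # drop any trailing partial codon
    thirds = Counter(seq[i + 2] for i in range(0, usable, 3))
    total = sum(thirds.values())
    if total == 0:
        return {base: 0.0 for base in "ACGT"}
    return {base: thirds.get(base, 0) / total for base in "ACGT"}


# Hypothetical toy coding sequence: ATG GCT AAA TTT GGA TAA
fractions = third_base_fractions("ATGGCTAAATTTGGATAA")
at_ending = fractions["A"] + fractions["T"]   # share of NNA + NNT codons
print(fractions, at_ending)
```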

  9. Genome Structure of the Symbiont Bifidobacterium pseudocatenulatum CECT 7765 and Gene Expression Profiling in Response to Lactulose-Derived Oligosaccharides

    PubMed Central

    Benítez-Páez, Alfonso; Moreno, F. Javier; Sanz, María L.; Sanz, Yolanda

    2016-01-01

    Bifidobacterium pseudocatenulatum CECT 7765 was isolated from stools of a breast-fed infant. Although this strain is generally considered an adult-type bifidobacterial species, it has also been shown to have pre-clinical efficacy in obesity models. In order to understand the molecular basis of its adaptation to complex carbohydrates and improve its potential functionality, we have analyzed its genome and transcriptome, as well as its metabolic output when growing in galacto-oligosaccharides derived from lactulose (GOS-Lu) as carbon source. B. pseudocatenulatum CECT 7765 shows strain-specific genome regions, including a great diversity of sugar metabolic-related genes. A preliminary and exploratory transcriptome analysis suggests candidate over-expression of several genes coding for sugar transporters and permeases; furthermore, five out of seven beta-galactosidases identified in the genome could be activated in response to GOS-Lu exposure. Here, we also propose that a specific gene cluster is involved in controlling the import and hydrolysis of certain di- and tri-saccharides, which seemed to be those primarily taken up by the bifidobacterial strain. This was discerned from mass spectrometry-based quantification of different saccharide fractions of culture supernatants. Our results confirm that the expression of genes involved in sugar transport and metabolism and in the synthesis of leucine, an amino acid with a key role in glucose and energy homeostasis, was up-regulated by GOS-Lu. This was done using qPCR in addition to the exploratory information derived from the single-replicated RNAseq approach, together with the functional annotation of genes predicted to be encoded in the B. pseudocatenulatum CECT 7765 genome. PMID:27199952

  10. Brief Guide to Genomics: DNA, Genes and Genomes

    MedlinePlus

    ... A Brief Guide to Genomics DNA, Genes and Genomes Deoxyribonucleic acid (DNA) is the ... and lead to a disease such as cancer. DNA Sequencing Sequencing simply means determining the exact order ...

  11. Evolutionary origin of Rosaceae-specific active non-autonomous hAT elements and their contribution to gene regulation and genomic structural variation.

    PubMed

    Wang, Lu; Peng, Qian; Zhao, Jianbo; Ren, Fei; Zhou, Hui; Wang, Wei; Liao, Liao; Owiti, Albert; Jiang, Quan; Han, Yuepeng

    2016-05-01

    Transposable elements account for approximately 30% of the Prunus genome; however, their evolutionary origin and functionality remain largely unclear. In this study, we identified a hAT transposon family, termed Moshan, in Prunus. The Moshan elements consist of three types, aMoshan, tMoshan, and mMoshan. The aMoshan and tMoshan types contain intact or truncated transposase genes, respectively, while the mMoshan type is a miniature inverted-repeat transposable element (MITE). The Moshan transposons are unique to Rosaceae, and the copy numbers of different Moshan types are significantly correlated. Sequence homology analysis reveals that the mMoshan MITEs are direct deletion derivatives of the tMoshan progenitors, and one kind of mMoshan containing a MuDR-derived fragment was amplified predominantly in the peach genome. The mMoshan sequences contain cis-regulatory elements that can enhance gene expression up to 100-fold. The mMoshan MITEs can serve as potential sources of micro and long noncoding RNAs. Whole-genome re-sequencing analysis indicates that mMoshan elements are highly active, and an insertion into the S-haplotype-specific F-box gene was reported to cause the breakdown of self-incompatibility in sour cherry. Taken together, all these results suggest that the mMoshan elements play important roles in regulating gene expression and driving genomic structural variation in Prunus.

  12. Weeding out the genes: the Arabidopsis genome project.

    PubMed

    Martienssen, R A

    2000-05-01

    The Arabidopsis genome sequence is scheduled for completion at the end of this year (December 2000). It will be the first higher plant genome to be sequenced, and will allow a detailed comparison with bacterial, yeast and animal genomes. Already, two of the five chromosomes have been sequenced, and we have had our first glimpse of higher eukaryotic centromeres, and the structure of heterochromatin. The implications for understanding plant gene function, genome structure and genome organization are profound. This review discusses the lessons learned for future genome projects and summarizes the initial findings in Arabidopsis.

  13. Gene Chips and Functional Genomics

    NASA Astrophysics Data System (ADS)

    Hamadeh, Hisham; Afshari, Cynthia

    2000-11-01

    These past few years of scientific discovery will undoubtedly be remembered as the "genomics era," the period in which biologists succeeded in enumerating the sequence of nucleotides making up all, or at least most, of human DNA. And while this achievement has been heralded as a technological feat equal to the moon landing, it is only the first of many advances in DNA technology. Scientists are now faced with the task of understanding the meaning of the DNA sequence. Specifically, they want to learn how the DNA code relates to protein function. An important tool in the study of "functional genomics" is the cDNA microarray, also known as the gene chip. Inspired by computer microchips, gene chips allow scientists to monitor the expression of hundreds, even thousands, of genes in a fraction of the time it used to take to monitor the expression of a single one. By altering the conditions under which a particular tissue expresses genes (say, by exposing it to toxins or growth factors), scientists can determine the suite of genes expressed in different situations and hence start to get a handle on the function of these genes. The authors discuss this important new technology and some of its practical applications.

  14. Horizontal gene transfer and the rock record: comparative genomics of phylogenetically distant bacteria that induce wrinkle structure formation in modern sediments.

    PubMed

    Flood, B E; Bailey, J V; Biddle, J F

    2014-03-01

    Wrinkle structures are sedimentary features that are produced primarily through the trapping and binding of siliciclastic sediments by mat-forming micro-organisms. Wrinkle structures and related sedimentary structures in the rock record are commonly interpreted to represent the stabilizing influence of cyanobacteria on sediments because cyanobacteria are known to produce similar textures and structures in modern tidal flat settings. However, other extant bacteria such as filamentous representatives of the family Beggiatoaceae can also interact with sediments to produce sedimentary features that morphologically resemble many of those associated with cyanobacteria-dominated mats. While Beggiatoa spp. and cyanobacteria are metabolically and phylogenetically distant, genomic analyses show that the two groups share hundreds of homologous genes, likely as the result of horizontal gene transfer. The comparative genomics results described here suggest that some horizontally transferred genes may code for phenotypic traits such as filament formation, chemotaxis, and the production of extracellular polymeric substances that potentially underlie the similar biostabilizing influences of these organisms on sediments. We suggest that the ecological utility of certain basic life modes such as the construction of mats and biofilms, coupled with the lateral mobility of genes in the microbial world, introduces an element of uncertainty into the inference of specific phylogenetic origins from gross morphological features preserved in the ancient rock record.

  15. Genomic structure and expression analysis of the RNase kappa family ortholog gene in the insect Ceratitis capitata.

    PubMed

    Rampias, Theodoros N; Fragoulis, Emmanuel G; Sideris, Diamantis C

    2008-12-01

    Cc RNase is the founding member of the recently identified RNase kappa family, which is represented by a single ortholog in a wide range of animal taxonomic groups. Although the precise biological role of this protein is still unknown, it has been shown that the recombinant proteins isolated so far from the insect Ceratitis capitata and from humans exhibit ribonucleolytic activity. In this work, we report the genomic organization and molecular evolution of the RNase kappa gene from various animal species, as well as expression analysis of the ortholog gene in C. capitata. The high degree of amino acid sequence similarity, in combination with the fact that exon sizes and intronic positions are extremely conserved among RNase kappa orthologs in 15 diverse genomes from sea anemone to human, implies a very significant biological function for this enzyme. In C. capitata, two forms of RNase kappa mRNA (0.9 and 1.5 kb) with various lengths of 3' UTR were identified as alternative products of a single gene, resulting from the use of different polyadenylation signals. Both transcripts are expressed in all insect tissues and developmental stages. Sequence analysis of the extended region of the longer transcript revealed the existence of three mRNA instability motifs (AUUUA) and five poly(U) tracts, whose functional importance in RNase kappa mRNA decay remains to be explored.
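
    The AUUUA instability motifs and poly(U) tracts mentioned above can be located with a simple pattern scan. The sketch below is illustrative only: the function name `find_instability_motifs`, the toy sequence, and the minimum tract length of five U residues are assumptions, since the abstract does not state how a poly(U) tract was defined.

```python
import re


def find_instability_motifs(utr: str, min_poly_u: int = 5):
    """Return start positions of AUUUA motifs and (start, end) spans of poly(U) tracts."""
    rna = utr.upper().replace("T", "U")       # accept DNA or RNA alphabets
    # A lookahead lets overlapping motifs (as in AUUUAUUUA) each be reported.
    auuua_starts = [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]
    # Assumed definition: a run of at least `min_poly_u` consecutive U residues.
    poly_u_spans = [m.span() for m in re.finditer(r"U{%d,}" % min_poly_u, rna)]
    return auuua_starts, poly_u_spans


# Hypothetical toy 3' UTR fragment (NOT the C. capitata sequence).
auuua, poly_u = find_instability_motifs("GGAUUUAUUUACCUUUUUUGG")
print(auuua, poly_u)
```

    Positions are 0-based, and each span's end index is exclusive, following Python's slicing convention.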

  16. Uses of antimicrobial genes from microbial genome

    DOEpatents

    Sorek, Rotem; Rubin, Edward M.

    2013-08-20

    We describe a method for mining microbial genomes to discover antimicrobial genes and proteins having broad spectrum of activity. Also described are antimicrobial genes and their expression products from various microbial genomes that were found using this method. The products of such genes can be used as antimicrobial agents or as tools for molecular biology.

  17. The mouse p97 (CDC48) gene. Genomic structure, definition of transcriptional regulatory sequences, gene expression, and characterization of a pseudogene.

    PubMed

    Müller, J M; Meyer, H H; Ruhrberg, C; Stamp, G W; Warren, G; Shima, D T

    1999-04-09

    Here we present the first description of the genomic organization, transcriptional regulatory sequences, and adult and embryonic gene expression for the mouse p97(CDC48) AAA ATPase. Clones representing two distinct p97 genes were isolated in a genomic library screen, one of them likely representing a non-functional processed pseudogene. The coding region of the gene encoding the functional mRNA is interrupted by 16 introns and encompasses 20.4 kilobase pairs. Definition of the transcriptional initiation site and sequence analysis showed that the gene contains a TATA-less, GC-rich promoter region with an initiator element spanning the transcription start site. Cis-acting elements necessary for basal transcription activity reside within 410 base pairs of the flanking region as determined by transient transfection assays. In immunohistological analyses, p97 was widely expressed in embryos and adults, but protein levels were tightly controlled in a cell type- and cell differentiation-dependent manner. A remarkable heterogeneity in p97 immunostaining was found on a cellular level within a given tissue, and protein amounts in the cytoplasm and nucleus varied widely, suggesting a highly regulated and intermittent function for p97. This study provides the basis for a detailed analysis of the complex regulation of p97 and the reagents required for assessing its functional significance using targeted gene manipulation in the mouse.

  18. Structure, expression profile and phylogenetic inference of chalcone isomerase-like genes from the narrow-leafed lupin (Lupinus angustifolius L.) genome

    PubMed Central

    Przysiecka, Łucja; Książkiewicz, Michał; Wolko, Bogdan; Naganowska, Barbara

    2015-01-01

    Lupins, like other legumes, have a unique biosynthesis scheme of 5-deoxy-type flavonoids and isoflavonoids. A key enzyme in this pathway is chalcone isomerase (CHI), a member of the CHI-fold protein family, encompassing subfamilies of CHI1, CHI2, CHI-like (CHIL), and fatty acid-binding (FAP) proteins. Here, two Lupinus angustifolius (narrow-leafed lupin) CHILs, LangCHIL1 and LangCHIL2, were identified and characterized using DNA fingerprinting, cytogenetic and linkage mapping, sequencing and expression profiling. Clones carrying CHIL sequences were assembled into two contigs. Full gene sequences were obtained from these contigs, and mapped in two L. angustifolius linkage groups by gene-specific markers. A bacterial artificial chromosome fluorescence in situ hybridization approach confirmed the localization of two LangCHIL genes in distinct chromosomes. The expression profiles of both LangCHIL isoforms were very similar. The highest level of transcription was in the roots in the third week of plant growth; thereafter, expression declined. The expression of both LangCHIL genes in leaves and stems was similar and low. Comparative mapping to reference legume genome sequences revealed strong syntenic links; however, the LangCHIL2 contig had a much more conserved structure than LangCHIL1. LangCHIL2 is assumed to be an ancestor gene, whereas LangCHIL1 probably appeared as a result of duplication. As both copies are transcriptionally active, questions arise concerning their hypothetical functional divergence. Screening of the narrow-leafed lupin genome and transcriptome with CHI-fold protein sequences, followed by Bayesian inference of phylogeny and cross-genera synteny survey, identified representatives of all but one (CHI1) of the main subfamilies. They are as follows: two copies of CHI2, FAPa2 and CHIL, and single copies of FAPb and FAPa1. Duplicated genes are remnants of whole genome duplication which is assumed to have occurred after the divergence of Lupinus, Arachis, and Glycine

  19. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  20. Informational laws of genome structures

    PubMed Central

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-01-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = log2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined. PMID:27354155
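
    To make the k-mer idea above concrete, the following sketch computes the Shannon entropy of the k-mer distribution of a sequence, with k chosen as the base-2 logarithm of the sequence length. It is an illustration under stated assumptions, not the authors' definition of their indexes: rounding the logarithm to the nearest integer, the function name `kmer_entropy`, and the toy sequences are all assumptions.

```python
import math
from collections import Counter
from typing import Optional, Tuple


def kmer_entropy(genome: str, k: Optional[int] = None) -> Tuple[int, float]:
    """Return (k, Shannon entropy in bits) of the overlapping k-mers of `genome`."""
    n = len(genome)
    if k is None:
        # Assumption: round log2(n) to the nearest integer, with a minimum of 1.
        k = max(1, round(math.log2(n)))
    windows = n - k + 1                       # number of overlapping k-mers
    counts = Counter(genome[i:i + k] for i in range(windows))
    entropy = -sum((c / windows) * math.log2(c / windows) for c in counts.values())
    return k, entropy


# Hypothetical toy sequences; real genomes are millions of bases long.
k_rep, h_rep = kmer_entropy("ACGT" * 4)       # n = 16, so k = 4
k_mono, h_mono = kmer_entropy("AAAAAAAA")     # n = 8, so k = 3
print(k_rep, h_rep, k_mono, h_mono)
```

    A sequence made of a single repeated base has zero k-mer entropy, while more varied sequences approach the maximum of log2 of the number of distinct k-mers observed.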

  1. Informational laws of genome structures

    NASA Astrophysics Data System (ADS)

    Bonnici, Vincenzo; Manca, Vincenzo

    2016-06-01

    In recent years, the analysis of genomes by means of strings of length k occurring in the genomes, called k-mers, has provided important insights into the basic mechanisms and design principles of genome structures. In the present study, we focus on the proper choice of the value of k for applying information theoretic concepts that express intrinsic aspects of genomes. The value k = log2(n), where n is the genome length, is determined to be the best choice in the definition of some genomic informational indexes that are studied and computed for seventy genomes. These indexes, which are based on information entropies and on suitable comparisons with random genomes, suggest five informational laws, which all of the considered genomes obey. Moreover, an informational genome complexity measure is proposed, which is a generalized logistic map that balances entropic and anti-entropic components of genomes and is related to their evolutionary dynamics. Finally, applications to computational synthetic biology are briefly outlined.

  2. Comparative genomic analysis of prion genes

    PubMed Central

    Premzl, Marko; Gamulin, Vera

    2007-01-01

    Background The homologues of human disease genes are expected to contribute to better understanding of physiological and pathogenic processes. We made use of the present availability of vertebrate genomic sequences, and we have conducted the most comprehensive comparative genomic analysis of the prion protein gene PRNP and its homologues, shadow of prion protein gene SPRN and doppel gene PRND, and prion testis-specific gene PRNT so far. Results While the SPRN and PRNP homologues are present in all vertebrates, PRND is known in tetrapods, and PRNT is present in primates. PRNT could be viewed as a TE-associated gene. Using human as the base sequence for genomic sequence comparisons (VISTA), we annotated numerous potential cis-elements. The conserved regions in SPRNs harbour the potential Sp1 sites in promoters (mammals, birds), C-rich intron splicing enhancers and PTB intron splicing silencers in introns (mammals, birds), and hsa-miR-34a sites in 3'-UTRs (eutherians). We showed the conserved PRNP upstream regions, which may be potential enhancers or silencers (primates, dog). In the PRNP 3'-UTRs, there are conserved cytoplasmic polyadenylation element sites (mammals, birds). The PRND core promoters include highly conserved CCAAT, CArG and TATA boxes (mammals). We deduced 42 new protein primary structures, and performed the first phylogenetic analysis of all vertebrate prion genes. Using the protein alignment which included 122 sequences, we constructed the neighbour-joining tree which showed four major clusters, including shadoos, shadoo2s and prion protein-likes (cluster 1), fish prion proteins (cluster 2), tetrapod prion proteins (cluster 3) and doppels (cluster 4). We showed that the entire prion protein conformationally plastic region is well conserved between eutherian prion proteins and shadoos (18–25% identity and 28–34% similarity), and there could be a potential structural compatibility between shadoos and the left-handed parallel beta-helical fold

  3. Single molecule real-time sequencing of Xanthomonas oryzae genomes reveals a dynamic structure and complex TAL (transcription activator-like) effector gene relationships

    PubMed Central

    Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.; Bogdanove, Adam J.

    2016-01-01

    Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456

  4. Genomic Structure and Identification of Novel Mutations in Usherin, the Gene Responsible for Usher Syndrome Type IIa

    PubMed Central

    Weston, M. D.; Eudy, J. D.; Fujita, S.; Yao, S.-F.; Usami, S.; Cremers, C.; Greenburg, J.; Ramesar, R.; Martini, A.; Moller, C.; Smith, R. J.; Sumegi, J.; Kimberling, William J.

    2000-01-01

    Usher syndrome type IIa (USHIIa) is an autosomal recessive disorder characterized by moderate to severe sensorineural hearing loss and progressive retinitis pigmentosa. This disorder maps to human chromosome 1q41. Recently, mutations in USHIIa patients were identified in a novel gene isolated from this chromosomal region. The USH2A gene encodes a protein with a predicted molecular weight of 171.5 kD and possesses laminin epidermal growth factor as well as fibronectin type III domains. These domains are observed in other protein components of the basal lamina and extracellular matrixes; they may also be observed in cell-adhesion molecules. The intron/exon organization of the gene whose protein we name “Usherin” was determined by direct sequencing of PCR products and cloned genomic DNA with cDNA-specific primers. The gene is encoded by 21 exons and spans a minimum of 105 kb. A mutation search of 57 independent USHIIa probands was performed with a combination of direct sequencing and heteroduplex analysis of PCR-amplified exons. Fifteen new mutations were found. Of 114 independent USH2A alleles, 58 harbored probable pathologic mutations. Ten cases of USHIIa were true homozygotes and 10 were compound heterozygotes; 18 heterozygotes with only one identifiable mutation were observed. Sixty-five percent (38/58) of cases had at least one mutation, and 51% (58/114) of the total number of possible mutations were identified. The allele 2299delG (previously reported as 2314delG) was the most frequent mutant allele observed (16%; 31/192). Three new missense mutations (C319Y, N346H, and C419F) were discovered; all were restricted to the previously unreported laminin domain VI region of Usherin. The possible significance of this domain, known to be necessary for laminin network assembly, is discussed in the context of domain VI mutations from other proteins. PMID:10729113

  5. [Integration of different T-DNA structures of ACC oxidase gene into carnation genome extended cut flower vase-life differently].

    PubMed

    Yu, Yi-Xun; Bao, Man-Zhu

    2004-09-01

    The cultivar 'Master' of carnation (Dianthus caryophyllus L.) was transformed, via Agrobacterium tumefaciens, with four T-DNA structures containing sense, antisense, sense direct repeat and antisense direct repeat copies of the ACC oxidase (ACO) gene. Southern blotting showed that the foreign gene was integrated into the carnation genome, and 14 transgenic lines were obtained. The transgenic plants were transplanted to soil and grew normally in the greenhouse. Of the 12 transgenic lines screened, the cut flower vase life of 8 transgenic lines was up to 11 days and the longest was 12.8 days, while the vase life of the control was 5.8 days at 25 degrees C. The vase life of 2 of the 3 lines with a single sense ACO gene was the same as that of the control, while the vase life of 3 of the 4 lines with a single antisense ACO gene was prolonged. The vase life of cut flowers of all 5 lines with direct repeat ACO genes was prolonged by about 6 days, while the vase life of 3 of the 7 lines with a single ACO gene was the same as that of the control. During the senescence of cut flowers, the ethylene production of most of the transgenic lines decreased significantly, and ethylene production was not detectable in lines T456, T556 and T575. The results demonstrate that the antisense foreign gene inhibits expression of the endogenous gene more significantly than the sense one. Both sense direct repeat and antisense direct repeat foreign genes suppress endogenous gene expression more significantly than single foreign genes. The transgenic lines obtained from this research are useful for minimizing carnation cut flower transportation and storage expenses.

  6. Multiple genome alignment for identifying the core structure among moderately related microbial genomes

    PubMed Central

    Uchiyama, Ikuo

    2008-01-01

    Background Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. Results The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. Conclusion The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes. PMID:18976470
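
    The idea of a core defined by conserved gene order can be illustrated with a minimal sketch. This is not the author's multiple-alignment method; it is a simplified pairwise stand-in that assumes each orthologous group (OG) appears at most once per genome and reports runs of OGs that are adjacent in both genomes (in either orientation). The function name, the `min_len` threshold and the toy OG identifiers are all hypothetical.

```python
def conserved_segments(order_a, order_b, min_len=3):
    """Return runs of OGs from genome A that are also contiguous in genome B.

    Assumption: every OG id occurs at most once in each gene order.
    """
    pos_b = {og: i for i, og in enumerate(order_b)}
    segments = []
    current = []
    for og in order_a:
        # Extend the run if this OG sits right next to the previous one in B.
        if current and og in pos_b and abs(pos_b[og] - pos_b[current[-1]]) == 1:
            current.append(og)
            continue
        # Otherwise close the current run and keep it if it is long enough.
        if len(current) >= min_len:
            segments.append(current)
        current = [og] if og in pos_b else []
    if len(current) >= min_len:
        segments.append(current)
    return segments


# Toy gene orders (hypothetical): one block is collinear, one is inverted.
order_a = ["og1", "og2", "og3", "og4", "x1", "og5", "og6", "og7"]
order_b = ["og7", "og6", "og5", "y1", "og1", "og2", "og3", "og4"]
segments = conserved_segments(order_a, order_b)
print(segments)
```

    In this toy case the genome-specific genes `x1` and `y1` break the order, so two segments are reported; a real analysis would also need to handle paralogs and more than two genomes.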

  7. KEGG: kyoto encyclopedia of genes and genomes.

    PubMed

    Kanehisa, M; Goto, S

    2000-01-01

    KEGG (Kyoto Encyclopedia of Genes and Genomes) is a knowledge base for systematic analysis of gene functions, linking genomic information with higher order functional information. The genomic information is stored in the GENES database, which is a collection of gene catalogs for all the completely sequenced genomes and some partial genomes with up-to-date annotation of gene functions. The higher order functional information is stored in the PATHWAY database, which contains graphical representations of cellular processes, such as metabolism, membrane transport, signal transduction and cell cycle. The PATHWAY database is supplemented by a set of ortholog group tables for the information about conserved subpathways (pathway motifs), which are often encoded by positionally coupled genes on the chromosome and which are especially useful in predicting gene functions. A third database in KEGG is LIGAND for the information about chemical compounds, enzyme molecules and enzymatic reactions. KEGG provides Java graphics tools for browsing genome maps, comparing two genome maps and manipulating expression maps, as well as computational tools for sequence comparison, graph comparison and path computation. The KEGG databases are daily updated and made freely available (http://www.genome.ad.jp/kegg/).

  8. A unified gene catalog for the laboratory mouse reference genome.

    PubMed

    Zhu, Y; Richardson, J E; Hale, P; Baldarelli, R M; Reed, D J; Recla, J M; Sinclair, R; Reddy, T B K; Bult, C J

    2015-08-01

    We report here a semi-automated process by which mouse genome feature predictions and curated annotations (i.e., genes, pseudogenes, functional RNAs, etc.) from Ensembl, NCBI and Vertebrate Genome Annotation database (Vega) are reconciled with the genome features in the Mouse Genome Informatics (MGI) database (http://www.informatics.jax.org) into a comprehensive and non-redundant catalog. Our gene unification method employs an algorithm (fjoin--feature join) for efficient detection of genome coordinate overlaps among features represented in two annotation data sets. Following the analysis with fjoin, genome features are binned into six possible categories (1:1, 1:0, 0:1, 1:n, n:1, n:m) based on coordinate overlaps. These categories are subsequently prioritized for assessment of annotation equivalencies and differences. The version of the unified catalog reported here contains more than 59,000 entries, including 22,599 protein-coding genes, 12,455 pseudogenes, and 24,007 other feature types (e.g., microRNAs, lincRNAs, etc.). More than 23,000 of the entries in the MGI gene catalog have equivalent gene models in the annotation files obtained from NCBI, Vega, and Ensembl. 12,719 of the features are unique to NCBI relative to Ensembl/Vega; 11,957 are unique to Ensembl/Vega relative to NCBI, and 3095 are unique to MGI. More than 4000 genome features fall into categories that require manual inspection to resolve structural differences in the gene models from different annotation sources. Using the MGI unified gene catalog, researchers can easily generate a comprehensive report of mouse genome features from a single source and compare the details of gene and transcript structure using MGI's mouse genome browser.
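
    The six overlap categories can be illustrated with a small sketch. It does not reproduce fjoin itself: it uses a brute-force pairwise comparison rather than an efficient join, ignores strand, and assumes one plausible reading of the categories, namely that features are grouped into connected components of the overlap graph and each component is labelled by how many features it contains from each data set. The function names, the tuple layout and the toy features are hypothetical.

```python
from collections import defaultdict


def overlaps(a, b):
    """Features are (name, chrom, start, end) with inclusive coordinates."""
    return a[1] == b[1] and a[2] <= b[3] and b[2] <= a[3]


def category(n_a, n_b):
    """Label a component by its feature counts from set A and set B."""
    if n_b == 0:
        return "1:0"
    if n_a == 0:
        return "0:1"
    if n_a == 1:
        return "1:1" if n_b == 1 else "1:n"
    return "n:1" if n_b == 1 else "n:m"


def classify_overlaps(set_a, set_b):
    """Bin features into 1:1, 1:0, 0:1, 1:n, n:1 and n:m groups."""
    # Bipartite overlap graph; every feature gets a node even if isolated.
    graph = {("A", f[0]): set() for f in set_a}
    graph.update({("B", f[0]): set() for f in set_b})
    for fa in set_a:
        for fb in set_b:
            if overlaps(fa, fb):
                graph[("A", fa[0])].add(("B", fb[0]))
                graph[("B", fb[0])].add(("A", fa[0]))
    bins = defaultdict(list)
    seen = set()
    for node in graph:
        if node in seen:
            continue
        # Depth-first walk to collect one connected component.
        seen.add(node)
        stack, component = [node], []
        while stack:
            cur = stack.pop()
            component.append(cur)
            for nxt in graph[cur]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        a_names = sorted(n for side, n in component if side == "A")
        b_names = sorted(n for side, n in component if side == "B")
        bins[category(len(a_names), len(b_names))].append((a_names, b_names))
    return dict(bins)


# Toy annotation sets (hypothetical coordinates).
set_a = [("a1", "chr1", 100, 200), ("a2", "chr1", 500, 600), ("a3", "chr2", 10, 50)]
set_b = [("b1", "chr1", 150, 250), ("b2", "chr1", 550, 580),
         ("b3", "chr1", 590, 700), ("b4", "chr3", 1, 10)]
bins = classify_overlaps(set_a, set_b)
print(bins)
```

    Grouping by connected component keeps chained overlaps together, which is what makes an n:m category possible; a production tool would replace the quadratic loop with a sorted sweep or an interval index.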

  9. Functional coverage of the human genome by existing structures, structural genomics targets, and homology models.

    PubMed

    Xie, Lei; Bourne, Philip E

    2005-08-01

    The bias in protein structure and function space resulting from experimental limitations and targeting of particular functional classes of proteins by structural biologists has long been recognized, but never continuously quantified. Using the Enzyme Commission and the Gene Ontology classifications as a reference frame, and integrating structure data from the Protein Data Bank (PDB), target sequences from the structural genomics projects, structure homology derived from the SUPERFAMILY database, and genome annotations from Ensembl and NCBI, we provide a quantified view, both at the domain and whole-protein levels, of the current and projected coverage of protein structure and function space relative to the human genome. Protein structures currently provide at least one domain that covers 37% of the functional classes identified in the genome; whole structure coverage exists for 25% of the genome. If all the structural genomics targets were solved (twice the current number of structures in the PDB), it is estimated that structures of one domain would cover 69% of the functional classes identified and complete structure coverage would be 44%. Homology models from existing experimental structures extend the 37% coverage to 56% of the genome as single domains and 25% to 31% for complete structures. Coverage from homology models is not evenly distributed by protein family, reflecting differing degrees of sequence and structure divergence within families. While these data provide coverage, conversely, they also systematically highlight functional classes of proteins for which structures should be determined. Current key functional families without structure representation are highlighted here; updated information on the "most wanted list" that should be solved is available on a weekly basis from http://function.rcsb.org:8080/pdb/function_distribution/index.html.

  10. Honeybee (Apis mellifera L.) mrjp gene family: computational analysis of putative promoters and genomic structure of mrjp1, the gene coding for the most abundant protein of larval food.

    PubMed

    Malecová, Barbora; Ramser, Juliane; O'Brien, John K; Janitz, Michal; Júdová, Jana; Lehrach, Hans; Simúth, Jozef

    2003-01-16

    Mrjp1 gene belongs to the honeybee mrjp gene family encoding the major royal jelly proteins (MRJPs), secreted by nurse bees into the royal jelly. In this study, we have isolated the genomic clone containing the entire mrjp1 gene and determined its sequence. The mrjp1 gene sequence spans over 3038 bp and contains six exons separated by five introns. Seven mismatches between the mrjp1 gene sequence and two previously independently published cDNA sequences were found, but these differences do not lead to any change in the deduced amino acid sequence of MRJP1. With the aid of inverse polymerase chain reaction we obtained sequences flanking the 5' ends of other mrjp genes (mrjp2, mrjp3, mrjp4 and mrjp5). Putative promoters were predicted upstream of all mrjp genes (including mrjp1). The predicted promoters contain the TATA motif (TATATATT), highly conserved both in sequence and position. Ultraspiracle (USP) transcription factor (TF) binding sites in putative promoter regions and clusters of dead ringer TF binding sites upstream of these promoters were predicted computationally. We propose that USP, as a juvenile hormone (JH) binding TF, might possibly act as a mediator of mrjp expression in response to JH. Mrjp1's genomic locus is predicted to encode an antisense transcript, partially overlapping with five mrjp1 exons and entirely overlapping with the putative promoter and predicted transcriptional start point of mrjp1. This finding may shed light on the mechanisms of regulation of mrjps expression. Southern blot analysis of genomic DNA revealed that all so far known members of mrjp gene family (mrjp1, mrjp2, mrjp3, mrjp4 and mrjp5) are present as single-copy genes per haploid honeybee genome. Although MRJPs and the yellow protein of Drosophila melanogaster share a certain degree of similarity in aa sequence and although it has been shown that they share a common evolutionary origin, neither structural similarities in the gene organization, nor significant similarities
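
    Checking whether a motif such as TATATATT is conserved in position can be sketched as a simple scan of upstream sequences. The abstract does not say which promoter-prediction software was used, so this is only an exact-match illustration; the function name and the toy sequence are hypothetical, and the sequence is assumed to end at the transcription start point.

```python
def find_motif(seq, motif="TATATATT"):
    """Return 0-based start positions of every (possibly overlapping) exact match."""
    seq = seq.upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


# Toy upstream region (made up); its last base is taken to precede the start point.
upstream = "GCGCTATATATTGCGCGCGC"
positions = find_motif(upstream)
# Express each hit relative to the end of the upstream sequence.
offsets = [p - len(upstream) for p in positions]
print(positions, offsets)
```

    Comparing the offsets obtained for several genes would show whether the motif sits at a similar distance from the start point in each of them.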

  11. KEGG: Kyoto Encyclopedia of Genes and Genomes.

    PubMed

    Ogata, H; Goto, S; Sato, K; Fujibuchi, W; Bono, H; Kanehisa, M

    1999-01-01

    Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database that consists of graphical diagrams of biochemical pathways including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables summarizing orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database for the gene catalogs of all organisms with complete genomes and selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with the graphical genome map for chromosomal locations that is represented by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from the gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).

  12. Gene enrichment in plant genomic shotgun libraries.

    PubMed

    Rabinowicz, Pablo D; McCombie, W Richard; Martienssen, Robert A

    2003-04-01

    The Arabidopsis genome (about 130 Mbp) has been completely sequenced; whereas a draft sequence of the rice genome (about 430 Mbp) is now available and the sequencing of this genome will be completed in the near future. The much larger genomes of several important crop species, such as wheat (about 16,000 Mbp) or maize (about 2500 Mbp), may not be fully sequenced with current technology. Instead, sequencing-analysis strategies are being developed to obtain sequencing and mapping information selectively for the genic fraction (gene space) of complex plant genomes.

  13. Structural Genomics of Protein Phosphatases

    SciTech Connect

    Almo, S.; Bonanno, J.; Sauder, J.; Emtage, S.; Dilorenzo, T.; Malashkevich, V.; Wasserman, S.; Swaminathan, S.; Eswaramoorthy, S.; et al

    2007-01-01

    The New York SGX Research Center for Structural Genomics (NYSGXRC) of the NIGMS Protein Structure Initiative (PSI) has applied its high-throughput X-ray crystallographic structure determination platform to systematic studies of all human protein phosphatases and protein phosphatases from biomedically-relevant pathogens. To date, the NYSGXRC has determined structures of 21 distinct protein phosphatases: 14 from human, 2 from mouse, 2 from the pathogen Toxoplasma gondii, 1 from Trypanosoma brucei, the parasite responsible for African sleeping sickness, and 2 from the principal mosquito vector of malaria in Africa, Anopheles gambiae. These structures provide insights into both normal and pathophysiologic processes, including transcriptional regulation, regulation of major signaling pathways, neural development, and type 1 diabetes. In conjunction with the contributions of other international structural genomics consortia, these efforts promise to provide an unprecedented database and materials repository for structure-guided experimental and computational discovery of inhibitors for all classes of protein phosphatases.

  14. JGI Plant Genomics Gene Annotation Pipeline

    SciTech Connect

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David; Hayes, David; Mitros, Therese

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying organisms, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes of many plant species have been sequenced, along with their transcriptomes. To use these very large amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice; the exception is chlamy, which is done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  15. Genes but Not Genomes Reveal Bacterial Domestication of Lactococcus Lactis

    PubMed Central

    Passerini, Delphine; Beltramo, Charlotte; Coddeville, Michele; Quentin, Yves; Ritzenthaler, Paul

    2010-01-01

    Background The population structure and diversity of Lactococcus lactis subsp. lactis, a major industrial bacterium involved in milk fermentation, was determined at both gene and genome level. Seventy-six lactococcal isolates of various origins were studied by different genotyping methods and thirty-six strains displaying unique macrorestriction fingerprints were analyzed by a new multilocus sequence typing (MLST) scheme. This gene-based analysis was compared to genomic characteristics determined by pulsed-field gel electrophoresis (PFGE). Methodology/Principal Findings The MLST analysis revealed that L. lactis subsp. lactis is essentially clonal with infrequent intra- and intergenic recombination; also, despite its taxonomical classification as a subspecies, it displays a genetic diversity as substantial as that within several other bacterial species. Genome-based analysis revealed a genome size variability of 20%, a value typical of bacteria inhabiting different ecological niches, and that suggests a large pan-genome for this subspecies. However, the genomic characteristics (macrorestriction pattern, genome or chromosome size, plasmid content) did not correlate to the MLST-based phylogeny, with strains from the same sequence type (ST) differing by up to 230 kb in genome size. Conclusion/Significance The gene-based phylogeny was not fully consistent with the traditional classification into dairy and non-dairy strains but supported a new classification based on ecological separation between “environmental” strains, the main contributors to the genetic diversity within the subspecies, and “domesticated” strains, subject to recent genetic bottlenecks. Comparison between gene- and genome-based analyses revealed little relationship between core and dispensable genome phylogenies, indicating that clonal diversification and phenotypic variability of the “domesticated” strains essentially arose through substantial genomic flux within the dispensable genome

  16. Gene Insertion Into Genomic Safe Harbors for Human Gene Therapy

    PubMed Central

    Papapetrou, Eirini P; Schambach, Axel

    2016-01-01

    Genomic safe harbors (GSHs) are sites in the genome able to accommodate the integration of new genetic material in a manner that ensures that the newly inserted genetic elements: (i) function predictably and (ii) do not cause alterations of the host genome posing a risk to the host cell or organism. GSHs are thus ideal sites for transgene insertion whose use can empower functional genetics studies in basic research and therapeutic applications in human gene therapy. Currently, no fully validated GSHs exist in the human genome. Here, we review our formerly proposed GSH criteria and discuss additional considerations on extending these criteria, on strategies for the identification and validation of GSHs, as well as future prospects on GSH targeting for therapeutic applications. In view of recent advances in genome biology, gene targeting technologies, and regenerative medicine, gene insertion into GSHs can potentially catalyze nearly all applications in human gene therapy. PMID:26867951

  17. Genome-Wide Views of Chromatin Structure

    PubMed Central

    Rando, Oliver J.; Chang, Howard Y.

    2010-01-01

    Eukaryotic genomes are packaged into a nucleoprotein complex known as chromatin, which affects most processes that occur on DNA. Along with genetic and biochemical studies of resident chromatin proteins and their modifying enzymes, mapping of chromatin structure in vivo is one of the main pillars in our understanding of how chromatin relates to cellular processes. In this review, we discuss the use of genomic technologies to characterize chromatin structure in vivo, with a focus on data from budding yeast and humans. The picture emerging from these studies is the detailed chromatin structure of a typical gene, where the typical behavior gives insight into the mechanisms and deep rules that establish chromatin structure. Important deviation from the archetype is also observed, usually as a consequence of unique regulatory mechanisms at special genomic loci. Chromatin structure shows substantial conservation from yeast to humans, but mammalian chromatin has additional layers of complexity that likely relate to the requirements of multicellularity such as the need to establish faithful gene regulatory mechanisms for cell differentiation. PMID:19317649

  18. Using Genomics for Natural Product Structure Elucidation.

    PubMed

    Tietz, Jonathan I; Mitchell, Douglas A

    2016-01-01

    Natural products (NPs) are the most historically bountiful source of chemical matter for drug development, especially for anti-infectives. With insights gleaned from genome mining, interest in natural product discovery has been reinvigorated. An essential stage in NP discovery is structural elucidation, which sheds light not only on the chemical composition of a molecule but also its novelty, properties, and derivatization potential. The history of structure elucidation is replete with technique-based revolutions: combustion analysis, crystallography, UV, IR, MS, and NMR have each provided game-changing advances; the latest such advance is genomics. All natural products have a genetic basis, and the ability to obtain and interpret genomic information for structure elucidation is increasingly available at low cost to non-specialists. In this review, we describe the value of genomics as a structural elucidation technique, especially from the perspective of the natural product chemist approaching an unknown metabolite. Herein we first introduce the databases and programs of interest to the natural products chemist, with an emphasis on those currently most suited for general usability. We describe strategies for linking observed natural product-linked phenotypes to their corresponding gene clusters. We then discuss techniques for extracting structural information from genes, illustrated with numerous case examples. We also provide an analysis of the biases and limitations of the field with recommendations for future development. Our overview is not only aimed at biologically-oriented researchers already at ease with bioinformatic techniques, but also, in particular, at natural product, organic, and/or medicinal chemists not previously familiar with genomic techniques.

  19. Genomic evidence for adaptation by gene duplication.

    PubMed

    Qian, Wenfeng; Zhang, Jianzhi

    2014-08-01

    Gene duplication is widely believed to facilitate adaptation, but unambiguous evidence for this hypothesis has been found in only a small number of cases. Although gene duplication may increase the fitness of the involved organisms by doubling gene dosage or neofunctionalization, it may also result in a simple division of ancestral functions into daughter genes, which need not promote adaptation. Hence, the general validity of the adaptation by gene duplication hypothesis remains uncertain. Indeed, a genome-scale experiment found similar fitness effects of deleting pairs of duplicate genes and deleting individual singleton genes from the yeast genome, leading to the conclusion that duplication rarely results in adaptation. Here we contend that the above comparison is unfair because of a known duplication bias among genes with different fitness contributions. To rectify this problem, we compare homologous genes from the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe. We discover that simultaneously deleting a duplicate gene pair in S. cerevisiae reduces fitness significantly more than deleting their singleton counterpart in S. pombe, revealing post-duplication adaptation. The duplicates-singleton difference in fitness effect is not attributable to a potential increase in gene dose after duplication, suggesting that the adaptation is owing to neofunctionalization, which we find to be explicable by acquisitions of binary protein-protein interactions rather than gene expression changes. These results provide genomic evidence for the role of gene duplication in organismal adaptation and are important for understanding the genetic mechanisms of evolutionary innovation.

  20. An integrated approach to structural genomics.

    PubMed

    Heinemann, U; Frevert, J; Hofmann, K; Illing, G; Maurer, C; Oschkinat, H; Saenger, W

    2000-01-01

    Structural genomics aims at determining a set of protein structures that will represent all domain folds present in the biosphere. These structures can be used as the basis for the homology modelling of the majority of all remaining protein domains or, indeed, proteins. Structural genomics therefore promises to provide a comprehensive structural description of the protein universe. To achieve this, a broad scientific effort is required. The Berlin-based "Protein Structure Factory" (PSF) plans to contribute to this effort by setting up a local infrastructure for the low-cost, high-throughput analysis of soluble human proteins. In close collaboration with the German Human Genome Project (DHGP) protein-coding genes will be expressed in Escherichia coli or yeast. Affinity-tagged proteins will be purified semi-automatically for biophysical characterization and structure analysis by X-ray diffraction methods and NMR spectroscopy. In all steps of the structure analysis process, possibilities for automation, parallelization and standardization will be explored. Major new facilities that are created for the PSF include a robotic station for large-scale protein crystallization, an NMR center and an experimental station for protein crystallography at the synchrotron storage ring BESSY II in Berlin.

  1. From gene action to reactive genomes

    PubMed Central

    Keller, Evelyn Fox

    2014-01-01

    Poised at a critical turning point in the history of genetics, recent work (e.g. in genomics, epigenetics, genomic plasticity) obliges us to critically reexamine many of our most basic concepts. For example, I argue that genomic research supports a radical transformation in our understanding of the genome – a shift from an earlier conception of that entity as an effectively static collection of active genes to that of a dynamic and reactive system dedicated to the context specific regulation of protein-coding sequences. PMID:24882822

  2. hSmad5 gene, a human hSmad family member: its full length cDNA, genomic structure, promoter region and mutation analysis in human tumors.

    PubMed

    Gemma, A; Hagiwara, K; Vincent, F; Ke, Y; Hancock, A R; Nagashima, M; Bennett, W P; Harris, C C

    1998-02-19

    hSmad (mothers against decapentaplegic)-related proteins are important messengers within the Transforming Growth Factor-beta1 (TGF-beta1) superfamily signal transduction pathways. To further characterize a member of this family, we obtained a full length cDNA of the human hSmad5 (hSmad5) gene by rapid amplification of cDNA ends (RACE) and then determined the genomic structure of the gene. There are eight exons and two alternative transcripts; the shorter transcript lacks exon 2. We identified the hSmad5 promoter region from a human genomic YAC clone by obtaining the nucleotide sequence extending 1235 base pairs upstream of the 5' end of the cDNA. We found a CpG island consistent with a promoter region, and we demonstrated promoter activity in a 1232 bp fragment located upstream of the transcription initiation site. To investigate the frequency of somatic hSmad5 mutations in human cancers, we designed intron-based primers to examine coding regions by polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) analysis. Neither homozygous deletions nor point mutations were found in 40 primary gastric tumors and 51 cell lines derived from diverse types of human cancer including 20 cell lines resistant to the growth inhibitory effects of TGF-beta1. These results suggest that the hSmad5 gene is not commonly mutated and that other genetic alterations mediate the loss of TGF-beta1 responsiveness in human cancers.

  3. Novel recombinant papillomavirus genomes expressing selectable genes

    PubMed Central

    Van Doorslaer, Koenraad; Porter, Samuel; McKinney, Caleb; Stepp, Wesley H.; McBride, Alison A.

    2016-01-01

    Papillomaviruses infect and replicate in keratinocytes, but viral proteins are initially expressed at low levels and there is no effective and quantitative method to determine the efficiency of infection on a cell-to-cell basis. Here we describe human papillomavirus (HPV) genomes that express marker proteins (antibiotic resistance genes and Green Fluorescent Protein), and can be used to elucidate early stages in HPV infection of primary keratinocytes. To generate these recombinant genomes, the late region of the oncogenic HPV18 genome was replaced by CpG free marker genes. Insertion of these exogenous genes did not affect early replication, and had only minimal effects on early viral transcription. When introduced into primary keratinocytes, the recombinant marker genomes gave rise to drug-resistant keratinocyte colonies and cell lines, which maintained the extrachromosomal recombinant genome long-term. Furthermore, the HPV18 “marker” genomes could be packaged into viral particles (quasivirions) and used to infect primary human keratinocytes in culture. This resulted in the outgrowth of drug-resistant keratinocyte colonies containing replicating HPV18 genomes. In summary, we describe HPV18 marker genomes that can be used to quantitatively investigate many aspects of the viral life cycle. PMID:27892937

  4. Molecular cloning, partial genomic structure and functional characterization of succinic semialdehyde dehydrogenase genes from the parasitic insects Lucilia cuprina and Ctenocephalides felis.

    PubMed

    Rothacker, B; Werr, M; Ilg, T

    2008-06-01

    The enzyme succinic semialdehyde dehydrogenase (SSADH; EC1.2.1.24) is a component of the gamma-aminobutyric acid degradation pathway in mammals and is essential for development and function of the nervous system. Here we report the identification, cDNA cloning and functional expression of SSADH from the parasitic insects Lucilia cuprina and Ctenocephalides felis. The recombinant proteins possess potent NAD+-dependent SSADH activity, while their catalytic efficiency for other aldehyde substrates is lower. A genomic copy of the L. cuprina SSADH gene contains two introns, while a genomic gene version of C. felis is devoid of introns. In contrast to the single copy SSADH genes in Drosophila melanogaster and mammals, in L. cuprina and C. felis, multiple SSADH gene copies are present in the genome.

  5. Characterization of the Wilson disease gene: Genomic organization; alternative splicing; structure/function predictions; and population frequencies of disease-specific mutations

    SciTech Connect

    Petrukhin, K.; Chernov, I.; Ross, B.M.

    1994-09-01

    The Wilson disease (WD) gene has recently been identified as a putative copper-transporting ATPase with high amino acid similarity with the Menkes disease (MNK) gene. We have further characterized the WD gene by extending the 5′-coding and non-coding DNA sequence and elucidating the intron/exon structure and genomic organization. Analysis of RNA transcripts from liver, brain, kidney and placenta reveals extensive alternative splicing which may provide a mechanism to regulate the quantity of functional protein product. Comparative sequence analysis shows that WD and MNK belong to the sub-family of heavy metal-transporting ATPases with several characterizing features which include unique amino acid motifs and distinct N-terminal and C-terminal transmembrane structure. Our data indicate that the 600 amino acid metal binding portion of the WD and MNK proteins was formed by gene duplication events and splicing of the 6 metal binding domain segment to a common ancestral protein. We have raised a WD-specific anti-peptide antibody to the N-terminal region and are beginning to explore the cellular and intracellular location of the WD protein. The metal-binding segment of the WD protein has been expressed in E. coli and metal binding assays are underway to characterize this aspect of the protein's function. We have identified numerous disease-specific mutations and developed a rapid "reverse dot blot" screening protocol to determine mutation frequencies in different populations. The most common mutation disrupts the characteristic SEHP motif and accounts for more than 40% of WD cases in North American, Russian, and Swedish populations. This mutation has not been observed in our limited Sicilian sample.

  6. PGDD: a database of gene and genome duplication in plants

    PubMed Central

    Lee, Tae-Ho; Tang, Haibao; Wang, Xiyin; Paterson, Andrew H.

    2013-01-01

    Genome duplication (GD) has permanently shaped the architecture and function of many higher eukaryotic genomes. The angiosperms (flowering plants) are outstanding models in which to elucidate consequences of GD for higher eukaryotes, owing to their propensity for chromosomal duplication or even triplication in a few cases. Duplicated genome structures often require both intra- and inter-genome alignments to unravel their evolutionary history, also providing the means to deduce both obvious and otherwise-cryptic orthology, paralogy and other relationships among genes. The burgeoning sets of angiosperm genome sequences provide the foundation for a host of investigations into the functional and evolutionary consequences of gene and GD. To provide genome alignments from a single resource based on uniform standards that have been validated by empirical studies, we built the Plant Genome Duplication Database (PGDD; freely available at http://chibba.agtec.uga.edu/duplication/), a web service providing synteny information in terms of colinearity between chromosomes. At present, PGDD contains data for 26 plants including bryophytes and chlorophyta, as well as angiosperms with draft genome sequences. In addition to the inclusion of new genomes as they become available, we are preparing new functions to enhance PGDD. PMID:23180799

  7. Gene duplication and transfer events in plant mitochondria genome

    SciTech Connect

    Xiong Aisheng; Peng Rihe; Zhuang Jing; Gao Feng; Zhu Bo; Fu Xiaoyan; Xue Yong; Jin Xiaofen; Tian Yongsheng; Zhao Wei; Yao Quanhong

    2008-11-07

    Gene or genome duplication events increase the amount of genetic material available to increase the genomic, and thereby phenotypic, complexity of organisms during evolution. Gene duplication and transfer events have been important to molecular evolution in all three domains of life, and may be the first step in the emergence of new gene functions. Gene transfer events have been proposed as another accelerator of evolution. The duplicated gene or genome, mainly nuclear, has been the subject of several recent reviews. In addition to the nuclear genome, organisms have organelle genomes, including mitochondrial genome. In this review, we briefly summarize gene duplication and transfer events in the plant mitochondrial genome.

  8. Structural Genomics on the Web

    PubMed Central

    Wixon, Jo

    2001-01-01

    In this review we provide a brief guide to some of the resources and databases that can be used to locate information and aid research in the growing field of structural genomics. The review will provide examples, for less experienced users, of what can be achieved using a selection of the available sites. We hope that this will encourage you to use these sites to their full potential and whet your appetite to search for other related sites. PMID:18628900

  9. Single nucleotide polymorphisms reveal genetic structuring of the carpathian newt and provide evidence of interspecific gene flow in the nuclear genome.

    PubMed

    Zieliński, Piotr; Dudek, Katarzyna; Stuglik, Michał Tadeusz; Liana, Marcin; Babik, Wiesław

    2014-01-01

    Genetic variation within species is commonly structured in a hierarchical manner which may result from superimposition of processes acting at different spatial and temporal scales. In organisms of limited dispersal ability, signatures of past subdivision are detectable for a long time. Studies of contemporary genetic structure in such taxa inform about the history of isolation, range changes and local admixture resulting from geographically restricted hybridization with related species. Here we use a set of 139 transcriptome-derived, unlinked nuclear single nucleotide polymorphisms (SNP) to assess the genetic structure of the Carpathian newt (Lissotriton montandoni, Lm) and introgression from its congener, the smooth newt (L. vulgaris, Lv). Two substantially differentiated groups of Lm populations likely originated from separate refugia, both located in the Eastern Carpathians. The colonization of the present range in north-western and south-western directions was accompanied by a modest loss of variation; admixture between the two groups has occurred in the middle of the Eastern Carpathians. Local, apparently recent introgression of Lv alleles into several Lm populations was detected, demonstrating increased power for admixture detection in comparison to a previous study based on a limited number of microsatellite markers. The level of introgression was higher in Lm populations classified as admixed than in syntopic populations. We discuss the possible causes and propose further tests to distinguish between alternatives. Several outlier loci were identified in tests of interspecific differentiation, suggesting genomic heterogeneity of gene flow between species.

  10. From trees to the forest: genes to genomics.

    PubMed

    Mullighan, Charles; Petersdorf, Effie; Davies, Stella M; DiPersio, John

    2011-01-01

    Crick, Watson, and colleagues revealed the genetic code in 1953, and since that time, remarkable progress has been made in understanding what makes each of us who we are. Identification of single genes important in disease, and the development of a mechanistic understanding of genetic elements that regulate gene function, have cast light on the pathophysiology of many heritable and acquired disorders. In 1990, the human genome project commenced, with the goal of sequencing the entire human genome, and a "first draft" was published with astonishing speed in 2001. The first draft, although an extraordinary achievement, reported essentially an imaginary haploid mix of alleles rather than a true diploid genome. In the years since 2001, technology has further improved, and efforts have been focused on filling in the gaps in the initial genome and starting the huge task of looking at normal variation in the human genome. This work is the beginning of understanding human genetics in the context of the structure of the genome as a complete entity, and as more than simply the sum of a series of genes. We present 3 studies in this review that apply genomic approaches to leukemia and to transplantation to improve and extend therapies.

  11. Bacterial Cellular Engineering by Genome Editing and Gene Silencing

    PubMed Central

    Nakashima, Nobutaka; Miyazaki, Kentaro

    2014-01-01

    Genome editing is an important technology for bacterial cellular engineering, which is commonly conducted by homologous recombination-based procedures, including gene knockout (disruption), knock-in (insertion), and allelic exchange. In addition, some new recombination-independent approaches have emerged that utilize catalytic RNAs, artificial nucleases, nucleic acid analogs, and peptide nucleic acids. Apart from these methods, which directly modify the genomic structure, an alternative approach is to conditionally modify the gene expression profile at the posttranscriptional level without altering the genomes. This is performed by expressing antisense RNAs to knock down (silence) target mRNAs in vivo. This review describes the features and recent advances on methods used in genomic engineering and silencing technologies that are advantageously used for bacterial cellular engineering. PMID:24552876

  12. Genome editing for human gene therapy.

    PubMed

    Meissner, Torsten B; Mandal, Pankaj K; Ferreira, Leonardo M R; Rossi, Derrick J; Cowan, Chad A

    2014-01-01

    The rapid advancement of genome-editing techniques holds much promise for the field of human gene therapy. From bacteria to model organisms and human cells, genome editing tools such as zinc-finger nucleases (ZFNs), TALENs, and CRISPR/Cas9 have been successfully used to manipulate the respective genomes with unprecedented precision. With regard to human gene therapy, it is of great interest to test the feasibility of genome editing in primary human hematopoietic cells that could potentially be used to treat a variety of human genetic disorders such as hemoglobinopathies, primary immunodeficiencies, and cancer. In this chapter, we explore the use of the CRISPR/Cas9 system for the efficient ablation of genes in two clinically relevant primary human cell types, CD4+ T cells and CD34+ hematopoietic stem and progenitor cells. By using two guide RNAs directed at a single locus, we achieve highly efficient and predictable deletions that ablate gene function. The use of a Cas9-2A-GFP fusion protein allows FACS-based enrichment of the transfected cells. The ease of designing, constructing, and testing guide RNAs makes this dual guide strategy an attractive approach for the efficient deletion of clinically relevant genes in primary human hematopoietic stem and effector cells and enables the use of CRISPR/Cas9 for gene therapy.
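
    The dual-guide idea in the abstract above can be illustrated with a toy model. This is only a sketch under strong assumptions: each guide is taken to produce a clean cut at a known 0-based position, and the two ends are taken to rejoin exactly, with no extra insertions or deletions. The function name `dual_guide_deletion`, the sequence, and the cut positions are all hypothetical and do not come from the chapter.

```python
def dual_guide_deletion(sequence: str, cut_a: int, cut_b: int) -> str:
    """Return the sequence left after removing the segment between two cuts.

    Assumption: idealized blunt cuts at 0-based positions cut_a and cut_b,
    followed by exact rejoining of the flanking ends.
    """
    left, right = sorted((cut_a, cut_b))
    if left < 0 or right > len(sequence):
        raise ValueError("cut positions must lie within the sequence")
    # Keep everything before the first cut and everything after the second.
    return sequence[:left] + sequence[right:]


# Made-up 16-nt locus; the two hypothetical guides cut after positions 4 and 12.
locus = "AAAACCCCGGGGTTTT"
edited = dual_guide_deletion(locus, 4, 12)
deleted_length = len(locus) - len(edited)
print(edited, deleted_length)
```

    Under these assumptions the size of the deletion is simply the distance between the two cut sites, which is one way to read the abstract's description of the deletions as predictable.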

  13. Structural Genomics of Minimal Organisms: Pipeline and Results

    SciTech Connect

    Kim, Sung-Hou; Shin, Dong-Hae; Kim, Rosalind; Adams, Paul; Chandonia, John-Marc

    2007-09-14

    The initial objective of the Berkeley Structural Genomics Center was to obtain near-complete three-dimensional (3D) structural information on all soluble proteins of two minimal organisms, the closely related pathogens Mycoplasma genitalium and M. pneumoniae. The former has fewer than 500 genes and the latter has fewer than 700 genes. A semiautomated structural genomics pipeline was set up, running from target selection through cloning, expression, and purification to structure determination. At the time of this writing, structural information on more than 93% of all soluble proteins of M. genitalium is available. This chapter summarizes the approaches taken by the authors' center.

  14. Genomic organization of the AODEF gene in Asparagus officinalis L.

    PubMed

    Ito, Takuro; Suzuki, Go; Ochiai, Toshinori; Nakada, Mutsumi; Kameya, Toshiaki; Kanno, Akira

    2005-04-01

    The perianths of Liliaceae plants, such as lily and tulip, have two whorls of almost identical petaloid organs, which are called tepals. According to the modified ABC model proposed in tulip, the class B genes are expressed in whorl 1 as well as whorls 2 and 3, so that the organs of whorls 1 and 2 have the same petaloid structure. The floral structure of asparagus (Asparagus officinalis L.) is similar to that of Liliaceae plants, however, the expression of B-class genes (AODEF, AOGLOA, AOGLOB) was not found in whorl 1, but was confined to whorls 2 and 3. This result does not support the modified ABC model in asparagus. In order to gain a better understanding of asparagus flower development, we have characterized a genomic clone of the AODEF gene. We compared the genomic organization and promoter sequence of AODEF with three well-studied DEF-like genes, DEFICIENS (Antirrhinum), APETALA3 (Arabidopsis), and OSMADS16 (rice). Exon-intron structures of these genes are well-conserved except for the large fifth intron in the AODEF gene and the OSMADS16 gene. Putative cis-elements including CArG-boxes were found in the promoter region and forty-two microsatellites were found in the AODEF genomic sequence.

  15. Regulatory genes in the ancestral chordate genomes.

    PubMed

    Satou, Yutaka; Wada, Shuichi; Sasakura, Yasunori; Satoh, Nori

    2008-12-01

    Changes or innovations in gene regulatory networks for the developmental program in the ancestral chordate genome appear to be a major component in the evolutionary process in which tadpole-type larvae, a unique characteristic of chordates, arose. These alterations may include new genetic interactions as well as the acquisition of new regulatory genes. Previous analyses of the Ciona genome revealed that many genes may have emerged after the divergence of the tunicate and vertebrate lineages. In this paper, we examined this possibility by examining a second non-vertebrate chordate genome. We conclude from this analysis that the ancient chordate included almost the same repertory of regulatory genes, but less redundancy than extant vertebrates, and that approximately 10% of vertebrate regulatory genes were innovated after the emergence of vertebrates. Thus, refined regulatory networks arose during vertebrate evolution mainly as preexisting regulatory genes multiplied rather than by generating new regulatory genes. The inferred regulatory gene sets of the ancestral chordate would be an important foundation for understanding how tadpole-type larvae, a unique characteristic of chordates, evolved.

  16. 2004 Structural, Function and Evolutionary Genomics

    SciTech Connect

    Douglas L. Brutlag; Nancy Ryan Gray

    2005-03-23

    This Gordon conference will cover the areas of structural, functional and evolutionary genomics. It will take a systematic approach to genomics, examining the evolution of proteins, protein functional sites, protein-protein interactions, regulatory networks, and metabolic networks. Emphasis will be placed on what we can learn from comparative genomics and entire genomes and proteomes.

  17. Domains of α- and β-globin genes in the context of the structural-functional organization of the eukaryotic genome.

    PubMed

    Razin, S V; Ulianov, S V; Ioudinkova, E S; Gushchanskaya, E S; Gavrilov, A A; Iarovaia, O V

    2012-12-01

    The eukaryotic cell genome has a multilevel regulatory system of gene expression that includes stages of preliminary activation of genes or of extended genomic regions (switching them to potentially active states) and stages of final activation of promoters and maintaining their active status in cells of a certain lineage. Current views on the regulatory systems of transcription in eukaryotes have been formed based on results of systematic studies on a limited number of model systems, in particular, on the α- and β-globin gene domains of vertebrates. Unexpectedly, these genomic domains harboring genes responsible for the synthesis of different subunits of the same protein were found to have a fundamentally different organization inside chromatin. In this review, we analyze specific features of the organization of the α- and β-globin gene domains in vertebrates, as well as principles of activities of the regulatory systems in these domains. In the final part of the review, we attempt to answer the question how the evolution of α- and β-globin genes has led to segregation of these genes into two distinct types of chromatin domains situated on different chromosomes.

  18. Genomic platform for efficient identification of fungal secondary metabolism genes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Fungal secondary metabolites (SMs) are structurally diverse natural compounds, which are thought to have great potential not only for medical industry but also for chemical and environmental industries. Since expansion of sequencing microbial genomes in 1990’s, it has been known that SM genes are ex...

  19. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  20. iCAGES: integrated CAncer GEnome Score for comprehensively prioritizing driver genes in personal cancer genomes.

    PubMed

    Dong, Chengliang; Guo, Yunfei; Yang, Hui; He, Zeyu; Liu, Xiaoming; Wang, Kai

    2016-12-22

    Cancer results from the acquisition of somatic driver mutations. Several computational tools can predict driver genes from population-scale genomic data, but tools for analyzing personal cancer genomes are underdeveloped. Here we developed iCAGES, a novel statistical framework that infers driver variants by integrating contributions from coding, non-coding, and structural variants, identifies driver genes by combining genomic information and prior biological knowledge, then generates prioritized drug treatment. Analysis on The Cancer Genome Atlas (TCGA) data showed that iCAGES predicts whether patients respond to drug treatment (P = 0.006 by Fisher's exact test) and long-term survival (P = 0.003 from Cox regression). iCAGES is available at http://icages.wglab.org .

  1. Genomic structure of the EWS gene and its relationship to EWSR1, a site of tumor-associated chromosome translocation

    SciTech Connect

    Plougastel, B.; Zucman, J.; Peter, M.; Thomas, G.; Delattre, O.

    1993-12-01

    The EWS gene has been identified based on its location at the chromosome 22 breakpoint of the t(11;22)(q24;q12) translocation that characterizes Ewing sarcoma and related neuroectodermal tumors. The EWS gene spans about 40 kb of DNA and is encoded by 17 exons. The nucleotide sequence of the exons is identical to that of the previously described cDNA. The first 7 exons encode the N-terminal domain of EWS, which consists of a repeated degenerated polypeptide of 7 to 12 residues rich in tyrosine, serine, threonine, glycine, and glutamine. Exons 11, 12, and 13 encode the putative RNA binding domain. The three glycine- and arginine-rich motifs of the gene are mainly encoded by exons 8-9, 14, and 16. The DNA sequence in the 5′ region of the gene has features of a CpG-rich island and lacks canonical promoter elements, such as TATA and CCAAT consensus sequences. Positions of the chromosome 22 breakpoints were determined for 19 Ewing tumors. They were localized in introns 7 or 8 in 18 cases and in intron 10 in 1 case. 26 refs., 5 figs.

  2. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae

    PubMed Central

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-01-01

    Numerous gene rearrangements and transfer RNA gene absences exist in the mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene, trnI, has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf shaped secondary structure and two (trnS1 and trnS2) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most absent genes exist in the genomes and have not been identified, due to a lack of technology and inference sequence. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains gather together. These results are identical to the relationships inferred from gene order. We inferred that gene rearrangement plays an important role in the evolution of whitefly mt genomes. PMID:27827992

  3. The Complete Mitochondrial Genome of Aleurocanthus camelliae: Insights into Gene Arrangement and Genome Organization within the Family Aleyrodidae.

    PubMed

    Chen, Shi-Chun; Wang, Xiao-Qing; Li, Pin-Wu; Hu, Xiang; Wang, Jin-Jun; Peng, Ping

    2016-11-07

    Numerous gene rearrangements and transfer RNA gene absences exist in the mitochondrial (mt) genomes of Aleyrodidae species. To understand how mt genomes evolved in the family Aleyrodidae, we have sequenced the complete mt genome of Aleurocanthus camelliae and comparatively analyzed all reported whitefly mt genomes. The mt genome of A. camelliae is 15,188 bp long, and consists of 13 protein-coding genes, two rRNA genes, 21 tRNA genes and a putative control region (GenBank: KU761949). The tRNA gene, trnI, has not been observed in this genome. The mt genome has a unique gene order and shares most gene boundaries with Tetraleurodes acaciae. Nineteen of 21 tRNA genes have the conventional cloverleaf shaped secondary structure and two (trnS₁ and trnS₂) lack the dihydrouridine (DHU) arm. Using ARWEN and homologous sequence alignment, we have identified five tRNA genes and revised the annotation for three whitefly mt genomes. This result suggests that most absent genes exist in the genomes and have not been identified, due to a lack of technology and inference sequence. The phylogenetic relationships among 11 whiteflies and Drosophila melanogaster were inferred by maximum likelihood and Bayesian inference methods. Aleurocanthus camelliae and T. acaciae form a sister group, and all three Bemisia tabaci and two Bemisia afer strains gather together. These results are identical to the relationships inferred from gene order. We inferred that gene rearrangement plays an important role in the evolution of whitefly mt genomes.

  4. An introduction to genes, genomes and disease.

    PubMed

    Hall, Peter A; Reis-Filho, Jorge S; Tomlinson, Ian Pm; Poulsom, Richard

    2010-01-01

    The human and other genome projects and subsequent resequencing programmes have provided new perspectives on the nature of the gene and how genes function. Understanding the complexity of the eukaryotic nucleus and the diversity of genetic regulatory mechanisms, including the role of non-coding RNAs, translational control mechanisms and the extraordinary prevalence of splicing, will be central to understanding how genes function, as will the recognition of gene dosage issues. This introduction to the 2010 Annual Review Issue, Genes, Genomes and Disease, provides overviews of these areas and then considers their relevance to a range of human diseases, including cardiovascular and renal disease, neural tube defects and cancer. The p53 gene is considered as an example of a massively regulated gene and the genetic perturbations in cancer are considered in a historical perspective. High-throughput genomic and transcriptomic methods have led to a paradigm shift in the way cancers are perceived and have changed the way translational research is performed. The progress in our understanding of chromosomal rearrangements in cancer, once believed to be incredibly rare events in epithelial malignancies, is discussed. The identification of low-penetrance cancer susceptibility genes through genome-wide association studies and their implications are reviewed. The contribution and limitations of expression profiling are discussed. In the last series of reviews, future challenges are addressed: the promise of synthetic lethality strategies in cancer therapy, a case for 'systems' approaches to genetic networks and the potential of single molecule genetic technologies. Finally, the question 'Does massively parallel DNA resequencing signify the end of histopathology as we know it?' is posed. Readers should find that the 2010 Annual Review Issue is an invaluable resource on contemporary genetics and its applications to understanding disease.

  5. Genomic hypomethylation in the human germline associates with selective structural mutability in the human genome.

    PubMed

    Li, Jian; Harris, R Alan; Cheung, Sau Wai; Coarfa, Cristian; Jeong, Mira; Goodell, Margaret A; White, Lisa D; Patel, Ankita; Kang, Sung-Hae; Shaw, Chad; Chinault, A Craig; Gambin, Tomasz; Gambin, Anna; Lupski, James R; Milosavljevic, Aleksandar

    2012-01-01

    The hotspots of structural polymorphisms and structural mutability in the human genome remain to be explained mechanistically. We examine associations of structural mutability with germline DNA methylation and with non-allelic homologous recombination (NAHR) mediated by low-copy repeats (LCRs). Combined evidence from four human sperm methylome maps, human genome evolution, structural polymorphisms in the human population, and previous genomic and disease studies consistently points to a strong association of germline hypomethylation and genomic instability. Specifically, methylation deserts, the ~1% fraction of the human genome with the lowest methylation in the germline, show a tenfold enrichment for structural rearrangements that occurred in the human genome since the branching of chimpanzee and are highly enriched for fast-evolving loci that regulate tissue-specific gene expression. Analysis of copy number variants (CNVs) from 400 human samples identified using a custom-designed array comparative genomic hybridization (aCGH) chip, combined with publicly available structural variation data, indicates that association of structural mutability with germline hypomethylation is comparable in magnitude to the association of structural mutability with LCR-mediated NAHR. Moreover, rare CNVs occurring in the genomes of individuals diagnosed with schizophrenia, bipolar disorder, and developmental delay and de novo CNVs occurring in those diagnosed with autism are significantly more concentrated within hypomethylated regions. These findings suggest a new connection between the epigenome, selective mutability, evolution, and human disease.

  6. Gene Fusion: A Genome Wide Survey

    NASA Technical Reports Server (NTRS)

    Liang, Ping; Riley, Monica

    2001-01-01

    It is well known that organisms form larger, complex multimodular (composite or chimeric) and mostly multi-functional proteins through gene fusion of two or more individual genes which have independent evolutionary histories and functions. We call each of these components a module. The existence of multimodular proteins may improve the efficiency of gene regulation and of cellular functions, and thus may give the host organism advantages in adapting to environments. Analysis of all gene fusions in present-day organisms should allow us to examine the patterns of gene fusion in the context of cellular functions, to trace back the evolutionary processes from the ancient smaller and uni-functional proteins to the present-day larger and complex multi-functional proteins, and to estimate the minimal number of ancestor proteins that existed in the last common ancestor of all life on earth. Although many multimodular proteins have been identified experimentally, systematic identification of gene fusion events at genome scale was not possible until recently, when large numbers of completed genome sequences became available. In addition, technical difficulties for such analysis also exist due to the complexity of this biological and evolutionary process. We report from this study a new strategy to computationally identify multimodular proteins using completed genome sequences and the results surveyed from 22 organisms, with the data from over 40 organisms to be presented during the meeting. Additional information is contained in the original extended abstract.

  7. Chloroplast genome structure in Ilex (Aquifoliaceae)

    PubMed Central

    Yao, Xin; Tan, Yun-Hong; Liu, Ying-Ying; Song, Yu; Yang, Jun-Bo; Corlett, Richard T.

    2016-01-01

    Aquifoliaceae is the largest family in the campanulid order Aquifoliales. It consists of a single genus, Ilex, the hollies, which is the largest woody dioecious genus in the angiosperms. Most species are in East Asia or South America. The taxonomy and evolutionary history remain unclear due to the lack of a robust species-level phylogeny. We produced the first complete chloroplast genomes in this family, including seven Ilex species, by Illumina sequencing of long-range PCR products and subsequent reference-guided de novo assembly. These genomes have a typical bicyclic structure with a conserved genome arrangement and moderate divergence. The total length is 157,741 bp and there is one large single-copy region (LSC) with 87,109 bp, one small single-copy with 18,436 bp, and a pair of inverted repeat regions (IR) with 52,196 bp. A total of 144 genes were identified, including 96 protein-coding genes, 40 tRNA and 8 rRNA. Thirty-four repetitive sequences were identified in Ilex pubescens, with lengths >14 bp and identity >90%, and 11 divergence hotspot regions that could be targeted for phylogenetic markers. This study will contribute to improved resolution of deep branches of the Ilex phylogeny and facilitate identification of Ilex species. PMID:27378489
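
    The region lengths and gene counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration: the variable names are hypothetical, and it assumes the 52,196 bp figure covers both inverted repeat copies together and that the two copies are identical in length.

```python
# Region lengths (bp) as quoted in the abstract above.
lsc = 87_109       # large single-copy region
ssc = 18_436       # small single-copy region
ir_pair = 52_196   # assumed to be both inverted repeats combined

total_length = lsc + ssc + ir_pair
single_ir = ir_pair // 2  # assumes the two IR copies are the same length

# Gene counts as quoted in the abstract.
genes = {"protein_coding": 96, "tRNA": 40, "rRNA": 8}
gene_total = sum(genes.values())

print(total_length, single_ir, gene_total)
```

    Under these assumptions the parts add up to the reported total length of 157,741 bp and the reported 144 genes.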

  8. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome.

    PubMed

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-11-28

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena's germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum.

  9. Genomic Prediction of Gene Bank Wheat Landraces

    PubMed Central

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J.; Wenzl, Peter; Singh, Sukhwinder

    2016-01-01

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, “diversity” and “prediction”, including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15–20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite

  10. Genomic Prediction of Gene Bank Wheat Landraces.

    PubMed

    Crossa, José; Jarquín, Diego; Franco, Jorge; Pérez-Rodríguez, Paulino; Burgueño, Juan; Saint-Pierre, Carolina; Vikram, Prashant; Sansaloni, Carolina; Petroli, Cesar; Akdemir, Deniz; Sneller, Clay; Reynolds, Matthew; Tattaris, Maria; Payne, Thomas; Guzman, Carlos; Peña, Roberto J; Wenzl, Peter; Singh, Sukhwinder

    2016-07-07

    This study examines genomic prediction within 8416 Mexican landrace accessions and 2403 Iranian landrace accessions stored in gene banks. The Mexican and Iranian collections were evaluated in separate field trials, including an optimum environment for several traits, and in two separate environments (drought, D and heat, H) for the highly heritable traits, days to heading (DTH), and days to maturity (DTM). Analyses accounting and not accounting for population structure were performed. Genomic prediction models include genotype × environment interaction (G × E). Two alternative prediction strategies were studied: (1) random cross-validation of the data in 20% training (TRN) and 80% testing (TST) (TRN20-TST80) sets, and (2) two types of core sets, "diversity" and "prediction", including 10% and 20%, respectively, of the total collections. Accounting for population structure decreased prediction accuracy by 15-20% as compared to prediction accuracy obtained when not accounting for population structure. Accounting for population structure gave prediction accuracies for traits evaluated in one environment for TRN20-TST80 that ranged from 0.407 to 0.677 for Mexican landraces, and from 0.166 to 0.662 for Iranian landraces. Prediction accuracy of the 20% diversity core set was similar to accuracies obtained for TRN20-TST80, ranging from 0.412 to 0.654 for Mexican landraces, and from 0.182 to 0.647 for Iranian landraces. The predictive core set gave similar prediction accuracy as the diversity core set for Mexican collections, but slightly lower for Iranian collections. Prediction accuracy when incorporating G × E for DTH and DTM for Mexican landraces for TRN20-TST80 was around 0.60, which is greater than without the G × E term. For Iranian landraces, accuracies were 0.55 for the G × E model with TRN20-TST80. Results show promising prediction accuracies for potential use in germplasm enhancement and rapid introgression of exotic germplasm into elite materials.
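
    The TRN20-TST80 scheme described above amounts to a random partition of the accessions into a 20% training set and an 80% testing set. The following is a minimal sketch of one such partition, not the authors' procedure: the function name, the seed, and the use of integer IDs in place of real accession identifiers are all assumptions made for illustration.

```python
import random


def trn_tst_split(accession_ids, train_fraction=0.20, seed=0):
    """Randomly partition accessions into training and testing sets.

    Hypothetical helper: the rounding rule and seed are illustrative choices.
    """
    rng = random.Random(seed)
    shuffled = list(accession_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 8416 Mexican landrace accessions, represented here by placeholder integers.
mexican_ids = range(8416)
trn, tst = trn_tst_split(mexican_ids)
print(len(trn), len(tst))
```

    In a cross-validation study this partition would typically be repeated with different seeds, with the model fitted on the training set and prediction accuracy evaluated on the testing set each time.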

  11. Genomic organization of the CC chemokine mip-3alpha/CCL20/larc/exodus/SCYA20, showing gene structure, splice variants, and chromosome localization.

    PubMed

    Nelson, R T; Boyd, J; Gladue, R P; Paradis, T; Thomas, R; Cunningham, A C; Lira, P; Brissette, W H; Hayes, L; Hames, L M; Neote, K S; McColl, S R

    2001-04-01

    We describe the genomic organization of a recently identified CC chemokine, MIP-3alpha/CCL20 (HGMW-approved symbol SCYA20). The MIP-3alpha/CCL20 gene was cloned and sequenced, revealing a four-exon, three-intron structure, and was localized by FISH analysis to 2q35-q36. Two distinct cDNAs were identified, encoding two forms of MIP-3alpha/CCL20, Ala MIP-3alpha/CCL20 and Ser MIP-3alpha/CCL20, that differ by one amino acid at the predicted signal peptide cleavage site. Examination of the sequence around the boundary of intron 1 and exon 2 showed that use of alternative splice acceptor sites could give rise to Ala MIP-3alpha/CCL20 or Ser MIP-3alpha/CCL20. Both forms of MIP-3alpha/CCL20 were chemically synthesized and tested for biological activity. Both flu antigen plus IL-2-activated CD4(+) and CD8(+) T lymphoblasts and cord blood-derived dendritic cells responded to Ser and Ala MIP-3alpha/CCL20. T lymphocytes exposed only to IL-2 responded inconsistently, while no response was detected in naive T lymphocytes, monocytes, or neutrophils. The biological activity of Ser MIP-3alpha/CCL20 and Ala MIP-3alpha/CCL20 and the tissue-specific preference of different splice acceptor sites are not yet known.

  12. Genes after the human genome project.

    PubMed

    Baetu, Tudor M

    2012-03-01

    While the Human Genome Nomenclature Committee (HGNC) concept of the gene can accommodate a wide variety of genomic sequences contributing to phenotypic outcomes, it fails to specify how sequences should be grouped when dealing with complex loci consisting of adjacent/overlapping sequences contributing to the same phenotype, distant sequences shown to contribute to the same gene product, and partially overlapping sequences identified by different techniques. The purpose of this paper is to review recently proposed concepts of the gene and critically assess how well they succeed in addressing the above problems while preserving the degree of generality achieved by the HGNC concept. I conclude that a dynamic interplay between mapping and syntax-based concepts is required in order to satisfy these desiderata.

  13. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  14. A gene map of the human genome.

    PubMed

    Schuler, G D; Boguski, M S; Stewart, E A; Stein, L D; Gyapay, G; Rice, K; White, R E; Rodriguez-Tomé, P; Aggarwal, A; Bajorek, E; Bentolila, S; Birren, B B; Butler, A; Castle, A B; Chiannilkulchai, N; Chu, A; Clee, C; Cowles, S; Day, P J; Dibling, T; Drouot, N; Dunham, I; Duprat, S; East, C; Edwards, C; Fan, J B; Fang, N; Fizames, C; Garrett, C; Green, L; Hadley, D; Harris, M; Harrison, P; Brady, S; Hicks, A; Holloway, E; Hui, L; Hussain, S; Louis-Dit-Sully, C; Ma, J; MacGilvery, A; Mader, C; Maratukulam, A; Matise, T C; McKusick, K B; Morissette, J; Mungall, A; Muselet, D; Nusbaum, H C; Page, D C; Peck, A; Perkins, S; Piercy, M; Qin, F; Quackenbush, J; Ranby, S; Reif, T; Rozen, S; Sanders, C; She, X; Silva, J; Slonim, D K; Soderlund, C; Sun, W L; Tabar, P; Thangarajah, T; Vega-Czarny, N; Vollrath, D; Voyticky, S; Wilmer, T; Wu, X; Adams, M D; Auffray, C; Walter, N A; Brandon, R; Dehejia, A; Goodfellow, P N; Houlgatte, R; Hudson, J R; Ide, S E; Iorio, K R; Lee, W Y; Seki, N; Nagase, T; Ishikawa, K; Nomura, N; Phillips, C; Polymeropoulos, M H; Sandusky, M; Schmitt, K; Berry, R; Swanson, K; Torres, R; Venter, J C; Sikela, J M; Beckmann, J S; Weissenbach, J; Myers, R M; Cox, D R; James, M R; Bentley, D; Deloukas, P; Lander, E S; Hudson, T J

    1996-10-25

    The human genome is thought to harbor 50,000 to 100,000 genes, of which about half have been sampled to date in the form of expressed sequence tags. An international consortium was organized to develop and map gene-based sequence tagged site markers on a set of two radiation hybrid panels and a yeast artificial chromosome library. More than 16,000 human genes have been mapped relative to a framework map that contains about 1000 polymorphic genetic markers. The gene map unifies the existing genetic and physical maps with the nucleotide and protein sequence databases in a fashion that should speed the discovery of genes underlying inherited human disease. The integrated resource is available through a site on the World Wide Web at http://www.ncbi.nlm.nih.gov/SCIENCE96/.

  15. Genomic organization of the neurofibromatosis 1 gene (NF1)

    SciTech Connect

    Li, Y.; O'Connell, P.; Huntsman Breidenbach, H.

    1995-01-01

    Neurofibromatosis 1 maps to chromosome band 17q11.2, and the NF1 locus has been partially characterized. Even though the full-length NF1 cDNA has been sequenced, the complete genomic structure of the NF1 gene has not been elucidated. The 5′ end of NF1 is embedded in a CpG island containing a NotI restriction site, and the remainder of the gene lies in the adjacent 350-kb NotI fragment. In our efforts to develop a comprehensive screen for NF1 mutations, we have isolated genomic DNA clones that together harbor the entire NF1 cDNA sequence. We have identified all intron-exon boundaries of the coding region and established that it is composed of 59 exons. Furthermore, we have defined the 3′-untranslated region (3′-UTR) of the NF1 gene; it spans approximately 3.5 kb of genomic DNA sequence and is continuous with the stop codon. Oligonucleotide primer pairs synthesized from exon-flanking DNA sequences were used in the polymerase chain reaction with cloned, chromosome 17-specific genomic DNA as template to amplify NF1 exons 1 through 27b and the exon containing the 3′-UTR separately. This information should be useful for implementing a comprehensive NF1 mutation screen using genomic DNA as template. 41 refs., 3 figs., 2 tabs.

  16. Gene organization inside replication domains in mammalian genomes

    NASA Astrophysics Data System (ADS)

    Zaghloul, Lamia; Baker, Antoine; Audit, Benjamin; Arneodo, Alain

    2012-11-01

    We investigate the large-scale organization of human genes with respect to "master" replication origins that were previously identified as bordering nucleotide compositional skew domains. We separate genes into two categories depending on their CpG enrichment at the promoter, which can be considered a marker of germline DNA methylation. Using expression data in mouse, we confirm that CpG-rich genes are highly expressed in the germline whereas CpG-poor genes are in a silent state. We further show that, whether tissue-specific or broadly expressed (housekeeping genes), the CpG-rich genes are over-represented close to the replication skew domain borders, suggesting some coordination of replication and transcription. We also reveal that the transcription of the longest CpG-rich genes is co-oriented with replication fork progression, so that the promoters of these transcriptionally active genes are located in the accessible open chromatin environment surrounding the master replication origins that border the replication skew domains. The observation of a similar gene organization in the mouse genome confirms the interplay of replication, transcription and chromatin structure as the cornerstone of mammalian genome architecture.

  17. The d4 gene family in the human genome

    SciTech Connect

    Chestkov, A.V.; Baka, I.D.; Kost, M.V.

    1996-08-15

    The d4 domain, a novel zinc finger-like structural motif, was first revealed in the rat neuro-d4 protein. Here we demonstrate that the d4 domain is conserved in evolution and that three related genes form a d4 family in the human genome. The human neuro-d4 is very similar to rat neuro-d4 at both the amino acid and the nucleotide levels. Moreover, the same splice variants have been detected among rat and human neuro-d4 transcripts. This gene has been localized on chromosome 19, and two other genes, members of the d4 family isolated by screening of the human genomic library at low stringency, have been mapped to chromosomes 11 and 14. The gene on chromosome 11 is the homolog of the ubiquitously expressed mouse gene ubi-d4/requiem, which is required for cell death after deprivation of trophic factors. A gene with a conserved d4 domain has been found in the genome of the nematode Caenorhabditis elegans. The conservation of d4 proteins from nematodes to vertebrates suggests that they have a general importance, but a diversity of d4 proteins expressed in vertebrate nervous systems suggests that some family members have special functions. 11 refs., 2 figs.

  18. Vive la différence: naming structural variants in the human reference genome.

    PubMed

    Seal, Ruth L; Wright, Mathew W; Gray, Kristian A; Bruford, Elspeth A

    2013-05-01

    The HUGO Gene Nomenclature Committee has approved gene symbols for the majority of protein-coding genes on the human reference genome. To adequately represent regions of complex structural variation, the Genome Reference Consortium now includes alternative representations of some of these regions as part of the reference genome. Here, we describe examples of how we name novel genes in these regions and how this nomenclature is displayed on our website, http://genenames.org.

  19. Cancer genomics identifies disrupted epigenetic genes.

    PubMed

    Simó-Riudalbas, Laia; Esteller, Manel

    2014-06-01

    Recent advances in genome technologies have greatly accelerated the discovery of epigenetic genes altered in cancer. The initial single candidate gene approaches have been coupled with newly developed epigenomic platforms to hasten the convergence of scientific discoveries and translational applications. Here, we present an overview of the evolution of cancer epigenomics and an updated catalog of disruptions in epigenetic pathways, whose misregulation can culminate in cancer. The creation of these basic mutational catalogs in cell lines and primary tumors will provide us with enough knowledge to move diagnostics and therapy from the laboratory bench to the bedside.

  20. The evolution of chloroplast genes and genomes in ferns.

    PubMed

    Wolf, Paul G; Der, Joshua P; Duffy, Aaron M; Davidson, Jacob B; Grusz, Amanda L; Pryer, Kathleen M

    2011-07-01

    Most of the publicly available data on chloroplast (plastid) genes and genomes come from seed plants, with relatively little information from their sister group, the ferns. Here we describe several broad evolutionary patterns and processes in fern plastid genomes (plastomes), and we include some new plastome sequence data. We review what we know about the evolutionary history of plastome structure across the fern phylogeny and we compare plastome organization and patterns of evolution in ferns to those in seed plants. A large clade of ferns is characterized by a plastome that has been reorganized with respect to the ancestral gene order (a similar order that is ancestral in seed plants). We review the sequence of inversions that gave rise to this organization. We also explore global nucleotide substitution patterns in ferns versus those found in seed plants across plastid genes, and we review the high levels of RNA editing observed in fern plastomes.

  1. 3D genome structure modeling by Lorentzian objective function.

    PubMed

    Trieu, Tuan; Cheng, Jianlin

    2016-11-29

    The 3D structure of the genome plays a vital role in biological processes such as gene interaction, gene regulation, DNA replication and genome methylation. Advanced chromosomal conformation capture techniques, such as Hi-C and tethered conformation capture, can generate chromosomal contact data that can be used to computationally reconstruct 3D structures of the genome. We developed a novel restraint-based method that is capable of reconstructing 3D genome structures utilizing both intra- and inter-chromosomal contact data. Our method was robust to noise and performed well in comparison with a panel of existing methods on a controlled simulated data set. On a real Hi-C data set of the human genome, our method produced chromosome and genome structures that are consistent with 3D FISH data and with prior knowledge about the human chromosome and genome, such as chromosome territories and the clustering of small chromosomes in the nucleus center, with the exception of chromosome 18. The tool and experimental data are available at https://missouri.box.com/v/LorDG.

  2. Gene Conversion Shapes Linear Mitochondrial Genome Architecture

    PubMed Central

    Smith, David Roy; Keeling, Patrick J.

    2013-01-01

    Recently, it was shown that gene conversion between the ends of linear mitochondrial chromosomes can cause telomere expansion and the duplication of subtelomeric loci. However, it is not yet known how widespread this phenomenon is and how significantly it has impacted organelle genome architecture. Using linear mitochondrial DNAs and mitochondrial plasmids from diverse eukaryotes, we argue that telomeric recombination has played a major role in fashioning linear organelle chromosomes. We find that mitochondrial telomeres frequently expand into subtelomeric regions, resulting in gene duplications, homogenizations, and/or fragmentations. We suggest that these features are a product of subtelomeric gene conversion, provide a hypothetical model for this process, and employ genetic diversity data to support the idea that the greater the effective population size the greater the potential for gene conversion between subtelomeric loci. PMID:23572386

  3. Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

    PubMed

    Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

    2015-02-01

    The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp.

  4. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  5. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    PubMed

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  6. Exon structure of the human dystrophin gene

    SciTech Connect

    Roberts, R.G.; Coffey, A.J.; Bobrow, M.; Bentley, D.R.

    1993-05-01

    Application of a novel vectorette PCR approach to defining intron-exon boundaries has permitted completion of analysis of the exon structure of the largest and most complex known human gene. The authors present here a summary of the exon structure of the entire human dystrophin gene, together with the sizes of genomic HindIII fragments recognized by each exon, and (where available) GenBank accession numbers for adjacent intron sequences. 20 refs., 1 tab.

  7. Coevolution of the Organization and Structure of Prokaryotic Genomes.

    PubMed

    Touchon, Marie; Rocha, Eduardo P C

    2016-01-04

    The cytoplasm of prokaryotes contains many molecular machines interacting directly with the chromosome. These vital interactions depend on the chromosome structure, as a molecule, and on the genome organization, as a unit of genetic information. Strong selection for the organization of the genetic elements implicated in these interactions drives replicon ploidy, gene distribution, operon conservation, and the formation of replication-associated traits. The genomes of prokaryotes are also very plastic, with high rates of horizontal gene transfer and gene loss. The evolutionary conflicts between plasticity and organization lead to the formation of regions with high genetic diversity whose impact on chromosome structure is poorly understood. Prokaryotic genomes are remarkable documents of natural history because they carry the imprint of all of these selective and mutational forces. Their study allows a better understanding of molecular mechanisms, their impact on microbial evolution, and how they can be tinkered with in synthetic biology.

  8. Genomic signatures of germline gene expression.

    PubMed

    McVicker, Graham; Green, Phil

    2010-11-01

    Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.

  9. Mining the genome for lipid genes.

    PubMed

    Kuivenhoven, Jan Albert; Hegele, Robert A

    2014-10-01

    Mining of the genome for lipid genes has since the early 1970s helped to shape our understanding of how triglycerides are packaged (in chylomicrons), repackaged (in very low density lipoproteins; VLDL), and hydrolyzed, and also how remnant and low-density lipoproteins (LDL) are cleared from the circulation. Gene discoveries have also provided insights into high-density lipoprotein (HDL) biogenesis and remodeling. Interestingly, at least half of these key molecular genetic studies were initiated with the benefit of prior knowledge of relevant proteins. In addition, multiple important findings originated from studies in mouse, and from other types of non-genetic approaches. Although it appears by now that the main lipid pathways have been uncovered, and that only modulators or adaptor proteins such as those encoded by LDLRAP1, APOA5, ANGPTL3/4, and PCSK9 are currently being discovered, genome wide association studies (GWAS) in particular have implicated many new loci based on statistical analyses; these may prove to have equally large impacts on lipoprotein traits as gene products that are already known. On the other hand, since 2004 - and particularly since 2010 when massively parallel sequencing has become de rigueur - no major new insights into genes governing lipid metabolism have been reported. This is probably because the etiologies of true Mendelian lipid disorders with overt clinical complications have been largely resolved. In the meantime, it has become clear that proving the importance of new candidate genes is challenging. This could be due to very low frequencies of large impact variants in the population. It must further be emphasized that functional genetic studies, while necessary, are often difficult to accomplish, making it hazardous to upgrade a variant that is simply associated to being definitively causative. Also, it is clear that applying a monogenic approach to dissect complex lipid traits that are mostly of polygenic origin is the wrong way to

  10. Hybrid Vigour? Genes, Genomics, and History

    PubMed Central

    BIVINS, ROBERTA

    2010-01-01

    Is the gene ‘special’ for historians? What effects, if any, has the notion of the ‘gene’ had on our understanding of history? Certainly, there is a widespread public and professional perception that genetics and history are or should be in dialogue with each other in some way. But historians and geneticists view history and genetics very differently – and assume very different relationships between them. And public perceptions of genes, genetics, genomics, and indeed the nature and meanings of ‘history’ differ yet again. Here, in looking at the meaning, and the implications – the significance – of the gene (and its corollary scientific disciplines and approaches) specifically to historians, I will focus on two aspects of the discourse. First, I will examine the ways in which historians have thus far approached genes and genetics, and the impact such studies have had on the field. There is considerable overlap between the subject matter of genetics/genomics and many of the most widely used analytic categories of contemporary historiography – race, gender, sexuality, ethnicity, (dis)ability, among others. Yet the impact of genetics and genomics on society has been studied principally by anthropologists, sociologists and ethicists. Only two historical sub-disciplines have engaged with the rise of genetics to any significant degree: the histories of science and of medicine. What does this indicate or suggest? Second, I will explore the impact of the ‘gene’ and genetic understandings (of, for example, the body, health, disease, identity, the family, and evolution) on public conceptions of history itself. PMID:20357894

  11. Structural genomics of pathogenic protozoa: an overview.

    PubMed

    Fan, Erkang; Baker, David; Fields, Stanley; Gelb, Michael H; Buckner, Frederick S; Van Voorhis, Wesley C; Phizicky, Eric; Dumont, Mark; Mehlin, Christopher; Grayhack, Elizabeth; Sullivan, Mark; Verlinde, Christophe; Detitta, George; Meldrum, Deirdre R; Merritt, Ethan A; Earnest, Thomas; Soltis, Michael; Zucker, Frank; Myler, Peter J; Schoenfeld, Lori; Kim, David; Worthey, Liz; Lacount, Doug; Vignali, Marissa; Li, Jizhen; Mondal, Somnath; Massey, Archna; Carroll, Brian; Gulde, Stacey; Luft, Joseph; Desoto, Larry; Holl, Mark; Caruthers, Jonathan; Bosch, Jürgen; Robien, Mark; Arakaki, Tracy; Holmes, Margaret; Le Trong, Isolde; Hol, Wim G J

    2008-01-01

    The Structural Genomics of Pathogenic Protozoa (SGPP) Consortium aimed to determine crystal structures of proteins from trypanosomatid and malaria parasites in a high throughput manner. The pipeline of target selection, protein production, crystallization, and structure determination, is sketched. Special emphasis is given to a number of technology developments including domain prediction, the use of "co-crystallants," and capillary crystallization. "Fragment cocktail crystallography" for medical structural genomics is also described.

  12. Afrobatrachian mitochondrial genomes: genome reorganization, gene rearrangement mechanisms, and evolutionary trends of duplicated and rearranged genes

    PubMed Central

    2013-01-01

    Background Mitochondrial genomic (mitogenomic) reorganizations are rarely found in closely-related animals, yet drastic reorganizations have been found in the Ranoides frogs. The phylogenetic relationships of the three major ranoid taxa (Natatanura, Microhylidae, and Afrobatrachia) have been problematic, and mitogenomic information for afrobatrachians has not been available. Several molecular models for mitochondrial (mt) gene rearrangements have been proposed, but observational evidence has been insufficient to evaluate them. Furthermore, evolutionary trends in rearranged mt genes have not been well understood. To gain molecular and phylogenetic insights into these issues, we analyzed the mt genomes of four afrobatrachian species (Breviceps adspersus, Hemisus marmoratus, Hyperolius marmoratus, and Trichobatrachus robustus) and performed molecular phylogenetic analyses. Furthermore we searched for two evolutionary patterns expected in the rearranged mt genes of ranoids. Results Extensively reorganized mt genomes having many duplicated and rearranged genes were found in three of the four afrobatrachians analyzed. In fact, Breviceps has the largest known mt genome among vertebrates. Although the kinds of duplicated and rearranged genes differed among these species, a remarkable gene rearrangement pattern of non-tandemly copied genes situated within tandemly-copied regions was commonly found. Furthermore, the existence of concerted evolution was observed between non-neighboring copies of triplicated 12S and 16S ribosomal RNA regions. Conclusions Phylogenetic analyses based on mitogenomic data support a close relationship between Afrobatrachia and Microhylidae, with their estimated divergence 100 million years ago consistent with present-day endemism of afrobatrachians on the African continent. The afrobatrachian mt data supported the first tandem and second non-tandem duplication model for mt gene rearrangements and the recombination-based model for concerted

  13. Translational control genes in the sea urchin genome.

    PubMed

    Morales, Julia; Mulner-Lorillon, Odile; Cosson, Bertrand; Morin, Emmanuelle; Bellé, Robert; Bradham, Cynthia A; Beane, Wendy S; Cormier, Patrick

    2006-12-01

    Sea urchin eggs and early cleavage stage embryos provide an example of regulated gene expression at the level of translation. The availability of the sea urchin genome offers the opportunity to investigate the "translational control" toolkit of this model system. The annotation of the genome reveals that most of the factors implicated in translational control are encoded by nonredundant genes in echinoderm, an advantage for future functional studies. In this paper, we focus on translation factors that have been shown or suggested to play a crucial role in the cell cycle and development of sea urchin embryos. Addressing the cap-binding translational control, three closely related eIF4E genes (class I, II, III) are present, whereas its repressor 4E-BP and its activator eIF4G are both encoded by one gene. Analysis of the class III eIF4E proteins in various phyla shows an echinoderm-specific amino acid substitution. Furthermore, an interaction site between eIF4G and poly(A)-binding protein is uncovered in the sea urchin eIF4G proteins and is conserved in metazoan evolution. In silico screening of the sea urchin genome has uncovered potential new regulators of eIF4E sharing the common eIF4E recognition motif. Taken together, these data provide new insights regarding the strong requirement of cap-dependent translation following fertilization. The genome analysis gives insights on the complexity of eEF1B structure and motifs of functional relevance, involved in the translational control of gene expression at the level of elongation. Finally, because deregulation of the translation process can lead to diseases and tumor formation in humans, the sea urchin orthologs of human genes implicated in human diseases and signaling pathways regulating translation are also discussed.

  14. Genomic organization of the human lysosomal acid lipase gene (LIPA)

    SciTech Connect

    Aslandis, C.; Klima, H.; Lackner, K.J.; Schmitz, G.

    1994-03-15

    Defects in the human lysosomal acid lipase gene are responsible for cholesteryl ester storage disease (CESD) and Wolman disease. Exon skipping as the cause for CESD has been demonstrated. The authors present here a summary of the exon structure of the entire human lysosomal acid lipase gene consisting of 10 exons, together with the sizes of genomic EcoRI and SacI fragments hybridizing to each exon. In addition, the DNA sequence of the putative promoter region is presented. The EMBL accession numbers for adjacent intron sequences are given. 7 refs., 2 figs., 1 tab.

  15. Structural and Operational Complexity of the Geobacter Sulfurreducens Genome

    SciTech Connect

    Qiu, Yu; Cho, Byung-Kwan; Park, Young S.; Lovley, Derek R.; Palsson, Bernhard O.; Zengler, Karsten

    2010-06-30

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes.

  16. Structural and operational complexity of the Geobacter sulfurreducens genome

    PubMed Central

    Qiu, Yu; Cho, Byung-Kwan; Park, Young Seoub; Lovley, Derek; Palsson, Bernhard Ø.; Zengler, Karsten

    2010-01-01

    Prokaryotic genomes can be annotated based on their structural, operational, and functional properties. These annotations provide the pivotal scaffold for understanding cellular functions on a genome-scale, such as metabolism and transcriptional regulation. Here, we describe a systems approach to simultaneously determine the structural and operational annotation of the Geobacter sulfurreducens genome. Integration of proteomics, transcriptomics, RNA polymerase, and sigma factor-binding information with deep-sequencing-based analysis of primary 5′-end transcripts allowed for a most precise annotation. The structural annotation is comprised of numerous previously undetected genes, noncoding RNAs, prevalent leaderless mRNA transcripts, and antisense transcripts. When compared with other prokaryotes, we found that the number of antisense transcripts reversely correlated with genome size. The operational annotation consists of 1453 operons, 22% of which have multiple transcription start sites that use different RNA polymerase holoenzymes. Several operons with multiple transcription start sites encoded genes with essential functions, giving insight into the regulatory complexity of the genome. The experimentally determined structural and operational annotations can be combined with functional annotation, yielding a new three-level annotation that greatly expands our understanding of prokaryotic genomes. PMID:20592237

  17. Structure and organization of a 25 kbp region of the genome of the photosynthetic green sulfur bacterium Chlorobium vibrioforme containing Mg-chelatase encoding genes.

    PubMed

    Petersen, B L; Møller, M G; Stummann, B M; Henningsen, K W

    1998-01-01

    A region comprising approximately 25 kbp of the genome of the strictly anaerobic and obligate photosynthetic green sulfur bacterium Chlorobium vibrioforme has been mapped, subcloned and partly sequenced. Approximately 15 kbp have been sequenced in its entirety and three genes with significant homology and feature similarity to the bchI, -D and -H genes and the chlI, -D and -H genes of Rhodobacter and Synechocystis strain PCC6803, respectively, which encode magnesium chelatase subunits, have been identified. Magnesium chelatase catalyzes the insertion of Mg2+ into protoporphyrin IX, and is the first enzyme unique to the (bacterio)chlorophyll specific branch of the porphyrin biosynthetic pathway. The organization of the three Mg-chelatase encoding genes is unique to Chlorobium and suggests that the magnesium chelatase of C. vibrioforme is encoded by a single operon. The analyzed 25 kbp region contains five additional open reading frames, two of which display significant homology and feature similarity to genes encoding lipoamide dehydrogenase and genes with function in purine synthesis, and another three display significant homology to open reading frames with unknown function in distantly related bacteria. Putative E. coli sigma 70-like promoter sequences, ribosome binding sequences and rho-independent transcriptional stop signals within the sequenced 15 kbp region are related to the identified genes and orfs. Southern analysis, restriction mapping and partial sequencing of the remaining ca. 10 kbp of the analyzed 25 kbp region have shown that this part includes the hemA, -C, -D and -B genes (MOBERG and AVISSAR 1994), which encode enzymes with function in the early part of the biosynthetic pathway of porphyrins.

  18. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation

    SciTech Connect

    Kas, K.; Meyen, E.; Van De Ven, W.J.M.

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head to head configuration separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to contain 166 amino acids and is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation. 14 refs., 2 figs.
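
    The GT/AG rule mentioned in this abstract says that an intron, read on the coding strand, begins with the dinucleotide GT and ends with AG. As a minimal sketch, the check can be written as below; the function name and the example sequences are hypothetical and are not taken from the NOF gene.

```python
def follows_gt_ag_rule(intron):
    """Return True if an intron sequence starts with GT and ends with AG."""
    seq = intron.upper()
    # Require at least four bases so the two dinucleotides cannot overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

# Made-up intron sequences for illustration only.
examples = ["GTAAGTCCTTTCAG", "GCAAGTCCTTTCAG", "gtgagtttttctag"]
for intron in examples:
    print(intron, follows_gt_ag_rule(intron))
```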

  19. Isolation, cDNA, and genomic structure of a conserved gene (NOF) at chromosome 11q13 next to FAU and oriented in the opposite transcriptional orientation.

    PubMed

    Kas, K; Lemahieu, V; Meyen, E; Van de Ven, W J; Merregaert, J

    1996-06-15

    In our effort to characterize a gene at chromosome 11q13 involved in a t(11;17)(q13;q21) translocation in B-non-Hodgkin lymphoma, we have identified a novel human gene, NOF (Neighbour of FAU). It maps right next to FAU in a head to head configuration separated by a maximum of 146 nucleotides. cDNA clones representing NOF hybridized to a 2.2-kb mRNA present in all tissues tested. The largest open reading frame appeared to contain 166 amino acids and is proline rich, and the sequence shows no homology with any known gene in the public databases. The NOF gene consists of 4 exons and 3 introns spanning approximately 5 kb, and the boundaries between exons and introns follow the GT/AG rule. The NOF locus is conserved during evolution, with the predicted protein having over 80% identity to three translated mouse and rat ESTs of unknown function. Moreover, the mouse ESTs map in the same organization, closely linked to the FAU gene, in the mouse genome. NOF, however, is not affected by the t(11;17)(q13;q21) chromosomal translocation.

  20. Structure and expression of the gene coding for the alpha-subunit of DNA-dependent RNA polymerase from the chloroplast genome of Zea mays.

    PubMed Central

    Ruf, M; Kössel, H

    1988-01-01

    The rpoA gene coding for the alpha-subunit of DNA-dependent RNA polymerase located on the DNA of Zea mays chloroplasts has been characterized with respect to its position on the chloroplast genome and its nucleotide sequence. The amino acid sequence derived for a 39 Kd polypeptide shows strong homology with sequences derived from the rpoA genes of other chloroplast species and with the amino acid sequence of the alpha-subunit from E. coli RNA polymerase. Transcripts of the rpoA gene were identified by Northern hybridization and characterized by S1 mapping using total RNA isolated from maize chloroplasts. Antibodies raised against a synthetic C-terminal heptapeptide show cross reactivity with a 39 Kd polypeptide contained in the stroma fraction of maize chloroplasts. It is concluded that the rpoA gene is a functional gene and that, therefore, at least the alpha-subunit of plastidic RNA polymerase is expressed in chloroplasts. PMID:3399379

  1. Characterization of histone genes isolated from Xenopus laevis and Xenopus tropicalis genomic libraries.

    PubMed Central

    Ruberti, I; Fragapane, P; Pierandrei-Amaldi, P; Beccari, E; Amaldi, F; Bozzoni, I

    1982-01-01

    Using a cDNA clone for the histone H3 we have isolated, from two genomic libraries of Xenopus laevis and Xenopus tropicalis, clones containing four different histone gene clusters. The structural organization of X. laevis histone genes has been determined by restriction mapping, Southern blot hybridization and translation of the mRNAs which hybridize to the various restriction fragments. The arrangement of the histone genes in X. tropicalis has been determined by Southern analysis using X. laevis genomic fragments, containing individual genes, as probes. Histone genes are clustered in the genome of X. laevis and X. tropicalis and, compared to invertebrates, show a higher organization heterogeneity as demonstrated by structural analysis of the four genomic clones. In fact, the order of the genes within individual clusters is not conserved. PMID:6296782

  2. Molecular Characterization of Soybean Pterocarpan 2-Dimethylallyltransferase in Glyceollin Biosynthesis: Local Gene and Whole-Genome Duplications of Prenyltransferase Genes Led to the Structural Diversity of Soybean Prenylated Isoflavonoids.

    PubMed

    Yoneyama, Keisuke; Akashi, Tomoyoshi; Aoki, Toshio

    2016-12-01

    Soybean (Glycine max) accumulates several prenylated isoflavonoid phytoalexins, collectively referred to as glyceollins. Glyceollins (I, II, III, IV and V) possess modified pterocarpan skeletons with C5 moieties from dimethylallyl diphosphate, and they are commonly produced from (6aS, 11aS)-3,9,6a-trihydroxypterocarpan [(-)-glycinol]. The metabolic fate of (-)-glycinol is determined by the enzymatic introduction of a dimethylallyl group into C-4 or C-2, which is reportedly catalyzed by regiospecific prenyltransferases (PTs). 4-Dimethylallyl (-)-glycinol and 2-dimethylallyl (-)-glycinol are precursors of glyceollin I and other glyceollins, respectively. Although multiple genes encoding (-)-glycinol biosynthetic enzymes have been identified, those involved in the later steps of glyceollin formation mostly remain unidentified, except for (-)-glycinol 4-dimethylallyltransferase (G4DT), which is involved in glyceollin I biosynthesis. In this study, we identified four genes that encode isoflavonoid PTs, including (-)-glycinol 2-dimethylallyltransferase (G2DT), using homology-based in silico screening and biochemical characterization in yeast expression systems. Transcript analyses illustrated that changes in G2DT gene expression were correlated with the induction of glyceollins II, III, IV and V in elicitor-treated soybean cells and leaves, suggesting its involvement in glyceollin biosynthesis. Moreover, the genomic signatures of these PT genes revealed that G4DT and G2DT are paralogs derived from whole-genome duplications of the soybean genome, whereas other PT genes [isoflavone dimethylallyltransferase 1 (IDT1) and IDT2] were derived via local gene duplication on soybean chromosome 11.

  3. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    NASA Astrophysics Data System (ADS)

    Yanai, Itai; Camacho, Carlos J.; Delisi, Charles

    2000-09-01

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications.
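
    To make the idea of a duplication-and-mutation model concrete, here is a toy simulation. It is only a sketch under stated assumptions and is not the authors' model: the function name, the parameter p_diverge (the chance that a duplicate has mutated so far that it no longer counts as a homologue), and all numeric values are hypothetical.

```python
import random
from collections import Counter

def simulate_gene_families(n_precursors, n_genes, p_diverge, seed=0):
    """Toy duplication-and-divergence simulation (illustrative only).

    Each gene is stored as the id of the family it belongs to. At every step
    a randomly chosen gene is duplicated; with probability p_diverge the copy
    is treated as mutated beyond recognition and founds a new family.
    """
    rng = random.Random(seed)
    genes = list(range(n_precursors))      # one family per precursor gene
    next_family = n_precursors
    while len(genes) < n_genes:
        parent_family = rng.choice(genes)  # duplicate a gene chosen uniformly
        if rng.random() < p_diverge:
            genes.append(next_family)      # diverged copy starts a new family
            next_family += 1
        else:
            genes.append(parent_family)    # copy remains a recognizable homologue
    family_sizes = Counter(genes)          # family id -> number of members
    return Counter(family_sizes.values())  # family size -> number of families

# Hypothetical parameter values, chosen only to produce a readable histogram.
histogram = simulate_gene_families(n_precursors=50, n_genes=2000, p_diverge=0.1)
for size in sorted(histogram)[:5]:
    print(size, histogram[size])
```

    The returned histogram (family size against number of families) is the kind of distribution the abstract refers to; comparing it with real genomes would need the authors' actual model and parameters.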

  4. Putative essential and core-essential genes in Mycoplasma genomes.

    PubMed

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first "synthetic life", has been an important species in the emerging field, synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information of biased gene strand distribution, homologous search and codon adaptation index. The algorithm, which achieved an accuracy of 80.8% and 78.9% in self-consistence and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes.

  5. Putative essential and core-essential genes in Mycoplasma genomes

    PubMed Central

    Lin, Yan; Zhang, Randy Ren

    2011-01-01

    Mycoplasma, which was used to create the first “synthetic life”, has been an important species in the emerging field, synthetic biology. However, essential genes, an important concept of synthetic biology, for both M. mycoides and M. capricolum, as well as 14 other Mycoplasma with available genomes, are still unknown. We have developed a gene essentiality prediction algorithm that incorporates information of biased gene strand distribution, homologous search and codon adaptation index. The algorithm, which achieved an accuracy of 80.8% and 78.9% in self-consistence and cross-validation tests, respectively, predicted 5880 essential genes in the 16 Mycoplasma genomes. The intersection set of essential genes in available Mycoplasma genomes consists of 153 core essential genes. The predicted essential genes (available from pDEG, tubic.tju.edu.cn/pdeg) and the proposed algorithm can be helpful for studying minimal Mycoplasma genomes as well as essential genes in other genomes. PMID:22355572

  6. A Roadmap for Functional Structural Variants in the Soybean Genome

    PubMed Central

    Anderson, Justin E.; Kantar, Michael B.; Kono, Thomas Y.; Fu, Fengli; Stec, Adrian O.; Song, Qijian; Cregan, Perry B.; Specht, James E.; Diers, Brian W.; Cannon, Steven B.; McHale, Leah K.; Stupar, Robert M.

    2014-01-01

    Gene structural variation (SV) has recently emerged as a key genetic mechanism underlying several important phenotypic traits in crop species. We screened a panel of 41 soybean (Glycine max) accessions serving as parents in a soybean nested association mapping population for deletions and duplications in more than 53,000 gene models. Array hybridization and whole genome resequencing methods were used as complementary technologies to identify SV in 1528 genes, or approximately 2.8%, of the soybean gene models. Although SV occurs throughout the genome, SV enrichment was noted in families of biotic defense response genes. Among accessions, SV was nearly eightfold less frequent for gene models that have retained paralogs since the last whole genome duplication event, compared with genes that have not retained paralogs. Increases in gene copy number, similar to that described at the Rhg1 resistance locus, account for approximately one-fourth of the genic SV events. This assessment of soybean SV occurrence presents a target list of genes potentially responsible for rapidly evolving and/or adaptive traits. PMID:24855315

  7. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708
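
    The counts quoted in this abstract can be checked with a short calculation. The sketch below reads "missed six out of the 138 validated fusions" as a sensitivity figure; that reading, and the variable names, are assumptions, since the abstract does not define how accuracy was measured.

```python
novel_fusions = 131   # newly discovered fusions confirmed by validation
known_fusions = 7     # previously reported fusions
missed = 6            # validated fusions that INTEGRATE did not report

validated = novel_fusions + known_fusions   # 138 validated fusions in total
detected = validated - missed               # 132 fusions recovered
sensitivity = detected / validated

print(f"{detected}/{validated} = {sensitivity:.1%}")   # 132/138 = 95.7%
```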

  8. INTEGRATE: gene fusion discovery using whole genome and transcriptome data.

    PubMed

    Zhang, Jin; White, Nicole M; Schmidt, Heather K; Fulton, Robert S; Tomlinson, Chad; Warren, Wesley C; Wilson, Richard K; Maher, Christopher A

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.

  9. Landscape genomics of Populus trichocarpa: the role of hybridization, limited gene flow, and natural selection in shaping patterns of population structure.

    PubMed

    Geraldes, Armando; Farzaneh, Nima; Grassa, Christopher J; McKown, Athena D; Guy, Robert D; Mansfield, Shawn D; Douglas, Carl J; Cronk, Quentin C B

    2014-11-01

    Populus trichocarpa is an ecologically important tree across western North America. We used a large population sample of 498 accessions over a wide geographical area genotyped with a 34K Populus SNP array to quantify geographical patterns of genetic variation in this species (landscape genomics). We present evidence that three processes contribute to the observed patterns: (1) introgression from the sister species P. balsamifera, (2) isolation by distance (IBD), and (3) natural selection. Introgression was detected only at the margins of the species' distribution. IBD was significant across the sampled area as a whole, but no evidence of restricted gene flow was detected in a core of drainages from southern British Columbia (BC). We identified a large number of FST outliers. Gene Ontology analyses revealed that FST outliers are overrepresented in genes involved in circadian rhythm and response to red/far-red light when the entire dataset is considered, whereas in southern BC heat response genes are overrepresented. We also identified strong correlations between geoclimate variables and allele frequencies at FST outlier loci that provide clues regarding the selective pressures acting at these loci.

  10. Elucidation of operon structures across closely related bacterial genomes.

    PubMed

    Zhou, Chuan; Ma, Qin; Li, Guojun

    2014-01-01

    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. With the evolution of genomes, operon structures are undergoing changes which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG) and a pair of nodes will be connected if any two genes, from the corresponding two OGGs respectively, are located in the same operon as immediate neighbors in any of the considered genomes. Through identifying the connected components in the above graph, we found that genes in a connected component are likely to be functionally related and these identified components tend to form treelike topology, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation as follows. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as ribosome; and (ii) a star-structure component not only groups related genes together, but also reflects the key functional roles of the central node of this component, such as the ABC transporter with a transporter permease and substrate-binding proteins surrounding it. Most interestingly, the genes from organisms with highly diverse living environments, i.e., biomass degraders and animal pathogens of clostridia in our study, can be clearly classified into different topological groups on some connected components.
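    The graph construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the gene identifiers, the `gene_to_ogg` mapping, and the simple degree-based `classify` rule for telling paths from stars are all hypothetical. It only shows the idea of linking orthologous gene groups (OGGs) whose members sit side by side in an operon in any genome, then reading off connected components.

```python
from collections import defaultdict


def build_ogg_graph(operons, gene_to_ogg):
    """Link two OGGs if any of their genes are immediate neighbours in an operon.

    operons: list of operons, each a list of gene ids in genomic order.
    gene_to_ogg: dict mapping gene id -> orthologous gene group id.
    """
    graph = defaultdict(set)
    for operon in operons:
        oggs = [gene_to_ogg[g] for g in operon]
        for ogg in oggs:
            graph[ogg]  # register the node even if it has no neighbour
        for a, b in zip(oggs, oggs[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components of an undirected graph as sets of nodes."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def classify(component, graph):
    """Crude, hypothetical topology label based only on node degrees."""
    n = len(component)
    degrees = sorted(len(graph[v]) for v in component)
    is_tree = n >= 2 and sum(degrees) // 2 == n - 1
    if not is_tree:
        return "other"
    if degrees[-1] <= 2:
        return "path"
    if degrees[-1] == n - 1:
        return "star"
    return "other"


# Toy input: gene ids are "<genome>_<OGG>", purely illustrative.
operons = [
    ["A_rplA", "A_rplB", "A_rplC", "A_rplD"],  # genome A, ribosomal-like operon
    ["B_rplB", "B_rplC"],                      # genome B, shorter version
    ["A_sbp1", "A_perm"],                      # permease next to binding protein 1
    ["B_perm", "B_sbp2"],                      # same permease OGG, other partner
    ["C_sbp3", "C_perm"],
]
gene_to_ogg = {g: g.split("_", 1)[1] for operon in operons for g in operon}

graph = build_ogg_graph(operons, gene_to_ogg)
components = connected_components(graph)
labels = sorted(classify(c, graph) for c in components)
print(labels)
```

    In this toy input the ribosomal-like genes form a chain and the permease OGG becomes the hub of a star, mirroring the two topologies the abstract mentions. Real data would also need an ortholog-assignment step and operon predictions, neither of which is shown here.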

  11. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio)

    PubMed Central

    Feng, Jingyan; Xu, Jian; Mahboob, Shahid; Al-Ghanim, Khalid; Li, Xuejun

    2016-01-01

    Background Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, providing an opportunity to study the evolution of the aqp gene family after whole genome duplication. Results In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that aqp genes in other teleosts have been lost over their evolutionary history because the functions of the genes are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. Conclusions To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family

  12. Chapter 6: Structural variation and medical genomics.

    PubMed

    Raphael, Benjamin J

    2012-01-01

    Differences between individual human genomes, or between human and cancer genomes, range in scale from single nucleotide variants (SNVs) through intermediate and large-scale duplications, deletions, and rearrangements of genomic segments. The latter class, called structural variants (SVs), has received considerable attention in the past several years as a previously underappreciated source of variation in human genomes. Much of this recent attention is the result of the availability of higher-resolution technologies for measuring these variants, including both microarray-based techniques and, more recently, high-throughput DNA sequencing. We describe the genomic technologies and computational techniques currently used to measure SVs, focusing on applications in human and cancer genomics.

  13. Wolbachia genome integrated in an insect chromosome: Evolution and fate of laterally transferred endosymbiont genes

    PubMed Central

    Nikoh, Naruo; Tanaka, Kohjiro; Shibata, Fukashi; Kondo, Natsuko; Hizume, Masahiro; Shimada, Masakazu; Fukatsu, Takema

    2008-01-01

    Recent accumulation of microbial genome data has demonstrated that lateral gene transfers constitute an important and universal evolutionary process in prokaryotes, while those in multicellular eukaryotes are still regarded as unusual, except for endosymbiotic gene transfers from mitochondria and plastids. Here we thoroughly investigated the bacterial genes derived from a Wolbachia endosymbiont on the nuclear genome of the beetle Callosobruchus chinensis. Exhaustive PCR detection and Southern blot analysis suggested that ∼30% of Wolbachia genes, in terms of the gene repertoire of wMel, are present on the insect nuclear genome. Fluorescent in situ hybridization located the transferred genes on the proximal region of the basal short arm of the X chromosome. Molecular evolutionary and other lines of evidence indicated that the transferred genes are probably derived from a single lateral transfer event. The transferred genes were, for the length examined, structurally disrupted, freed from functional constraints, and transcriptionally inactive. Hence, most, if not all, of the transferred genes have been pseudogenized. Notwithstanding this, the transferred genes were ubiquitously detected from Japanese and Taiwanese populations of C. chinensis, while the number of the transferred genes detected differed between the populations. The transferred genes were not detected from congenic beetle species, indicating that the transfer event occurred after speciation of C. chinensis, which was estimated to be one or several million years ago. These features of the laterally transferred endosymbiont genes are compared with the evolutionary patterns of mitochondrial and plastid genome fragments acquired by nuclear genomes through recent endosymbiotic gene transfers. PMID:18073380

  14. Whole-Genome Analysis of Gene Conversion Events

    NASA Astrophysics Data System (ADS)

    Hsu, Chih-Hao; Zhang, Yu; Hardison, Ross; Miller, Webb

    Gene conversion events are often overlooked in analyses of genome evolution. In a conversion event, an interval of DNA sequence (not necessarily containing a gene) overwrites a highly similar sequence. The event creates relationships among genomic intervals that can confound attempts to identify orthologs and to transfer functional annotation between genomes. Here we examine 1,112,202 paralogous pairs of human genomic intervals, and detect conversion events in about 13.5% of them. Properties of the putative gene conversions are analyzed, such as the lengths of the paralogous pairs and the spacing between their sources and targets. Our approach is illustrated using conversion events in the beta-globin gene cluster.

  15. Chemical genomics for studying parasite gene function and interaction

    PubMed Central

    Li, Jian; Yuan, Jing; Chen, Chin-chien; Inglese, James; Su, Xin-zhuan

    2013-01-01

    With the development of new technologies in genome sequencing, gene expression profiling, genotyping, and high-throughput screening of chemical compound libraries, small molecules are playing increasingly important roles in studying gene expression regulation, gene-gene interaction, and gene function. Here we briefly review and discuss some recent advancements in drug target identification and phenotype characterization using combinations of high-throughput screening of small-molecule libraries and various genome-wide methods such as whole genome sequencing, genome-wide association studies, and genome-wide expressional analysis. These approaches can be used to search for new drugs against parasitic infections, to identify drug targets or drug-resistance genes, and to infer gene function. PMID:24215777

  16. Genome Structure Gallery from the Mycobacterium Tuberculosis Structural Genomics Consortium

    DOE Data Explorer

    The TB Structural Genomics Consortium works with the structures of proteins from M. tuberculosis, analyzing these structures in the context of functional information that currently exists and that the Consortium generates. The database of linked structural and functional information constructed from this project will form a lasting basis for understanding M. tuberculosis pathogenesis and for structure-based drug design. The Consortium's structural and functional information is publicly available. The Structures Gallery makes more than 650 total structures available by PDB identifier. Some of these are not consortium targets, but all are viewable in 3D color and can be manipulated in various ways by Jmol, an open-source Java viewer for chemical structures in 3D from http://www.jmol.org/

  17. GenePRIMP: A GENE PRediction IMprovement Pipeline for Prokaryotic genomes

    SciTech Connect

    Pati, Amrita; Ivanova, Natalia N.; Mikhailova, Natalia; Ovchinnikova, Galina; Hooper, Sean D.; Lykidis, Athanasios; Kyrpides, Nikos C.

    2010-04-01

    We present 'gene prediction improvement pipeline' (GenePRIMP; http://geneprimp.jgi-psf.org/), a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missed genes and split genes. We found that manual curation of gene models using the anomaly reports generated by GenePRIMP improved their quality, and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome-sequencing and annotation technologies.

  18. Genomic analysis and gene structure of the plant carotenoid dioxygenase 4 family: a deeper study in Crocus sativus and its allies.

    PubMed

    Ahrazem, Oussama; Trapero, Almudena; Gómez, M Dolores; Rubio-Moraga, Angela; Gómez-Gómez, Lourdes

    2010-10-01

    The plastoglobule-targeted enzyme carotenoid cleavage dioxygenase (CCD4) mediates the formation of volatile C13 ketones, such as β-ionone, by cleaving the C9-C10 and C9'-C10' double bonds of cyclic carotenoids. Here, we report the isolation and analysis of CCD4 genomic DNA regions in Crocus sativus. Different CCD4 alleles have been identified: CsCCD4a which is found with and without an intron and CsCCD4b that showed the presence of a unique intron. The presence of different CCD4 alleles was also observed in other Crocus species. Furthermore, comparison of the locations of CCD4 introns within the coding region with CCD4 genes from other plant species suggests that independent gain/losses have occurred. The comparison of the promoter region of CsCCD4a and CsCCD4b with available CCD4 gene promoters from other plant species highlighted the conservation of cis-elements involved in light response, heat stress, as well as the absence and unique presence of cis-elements involved in circadian regulation and low temperature responses, respectively. Functional characterization of the Crocus sativus CCD4a promoter using Arabidopsis plants stably transformed with a DNA fragment of 1400 base pairs (P-CsCCD4a) fused to the β-glucuronidase (GUS) reporter gene showed that this sequence was sufficient to drive GUS expression in the flower, in particular high levels were detected in pollen.

  19. Identification and characterization of essential genes in the human genome

    PubMed Central

    Wang, Tim; Birsoy, Kıvanç; Hughes, Nicholas W.; Krupczak, Kevin M.; Post, Yorick; Wei, Jenny J.; Lander, Eric S.; Sabatini, David M.

    2015-01-01

    Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA (sgRNA) library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated by an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Lastly, screens in additional cell lines showed a high degree of overlap in gene essentiality, but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells. PMID:26472758

  20. Genomic structure of the α-amylase gene in the pearl oyster Pinctada fucata and its expression in response to salinity and food concentration.

    PubMed

    Huang, Guiju; Guo, Yihui; Li, Lu; Fan, Sigang; Yu, Ziniu; Yu, Dahui

    2016-08-01

    Amylase is one of the most important digestive enzymes for phytophagous animals. In this study, the cDNA, genomic DNA, and promoter region of the α-amylase gene of the pearl oyster Pinctada fucata were cloned by using reverse transcription-polymerase chain reaction (RT-PCR), rapid amplification of cDNA ends, and genome-walking methods. The full-length cDNA sequence was 1704 bp long and consisted of a 5'-untranslated region of 17 bp, a 3'-untranslated region of 118 bp, and a 1569-bp open reading frame encoding a 522-aa polypeptide with a 20-aa signal peptide. Sequence alignment revealed that P. fucata α-amylase (Pfamy) shared the highest identity (91.6%) with Pinctada maxima. The phylogenetic tree showed that it was closely related to P. maxima, based on the amino acid sequences. The genomic DNA was 10850 bp and contained nine exons, eight introns, and a promoter region of 3932 bp. Several transcriptional factors such as GATA-1, AP-1, and SP1 were predicted in the promoter region. Quantitative RT-PCR assay indicated that the relative expression level of Pfamy was significantly higher in the digestive gland than in other tissues (gonad, gills, muscle, and mantle) (P<0.001). The expression level at salinity 27‰ was significantly higher than that at other salinities (P<0.05). Expression reached a minimum when the algal food concentration was 16×10^4 cells/mL, which was significantly lower than the level observed at 8×10^4 cells/mL and 20×10^4 cells/mL (P<0.05). Our findings provide a genetic basis for further research on Pfamy activity and will facilitate studies on the growth mechanisms and genetic improvement of the pearl oyster P. fucata.

  1. p63 gene structure in the phylum mollusca.

    PubMed

    Baričević, Ana; Štifanić, Mauro; Hamer, Bojan; Batel, Renato

    2015-08-01

    Roles of the p53 family ancestor (p63) in the organisms' response to stressful environmental conditions (mainly pollution) have been studied among molluscs, especially in the genus Mytilus, within the last 15 years. Nevertheless, information about the gene structure of this regulatory gene in molluscs is scarce. Here we report the first complete genomic structure of the p53 family orthologue in the mollusc Mediterranean mussel Mytilus galloprovincialis and confirm its similarity to the vertebrate p63 gene. Our searches within the available molluscan genomes (Aplysia californica, Lottia gigantea, Crassostrea gigas and Biomphalaria glabrata) found only one p53 family member, present in a single copy per haploid genome. Comparative analysis of those orthologues additionally confirmed the conserved p63 gene structure. Conserved p63 gene structure can be a helpful tool to complement and/or revise gene annotations of any future p63 genomic sequence records in molluscs, but also in other animal phyla. Knowledge of the correct gene structure will enable better prediction of possible protein isoforms and their functions. Our analyses also pointed out possible mis-annotations of the p63 gene in sequenced molluscan genomes and stressed the value of manual inspection (based on alignments of cDNA and protein onto the genome sequence) for a reliable and complete gene annotation.

  2. Insular Organization of Gene Space in Grass Genomes

    PubMed Central

    Massa, Alicia N.; Wanjugi, Humphrey; Deal, Karin R.; You, Frank M.; Xu, Xiangyang; Gu, Yong Q.; Luo, Ming-Cheng; Anderson, Olin D.; Chan, Agnes P.; Rabinowicz, Pablo

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands “gene insulae” to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion create gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes. PMID:23326580
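    The contrast drawn in this abstract rests on a standard property: if gene positions follow a homogeneous Poisson process, the distances between neighbouring genes are exponentially distributed. The Python sketch below illustrates that idea with simulated data. It is a simplified stand-in, not the authors' density-based procedure: it uses a Kolmogorov-Smirnov distance to an exponential distribution with the same mean, and the simulation parameters (gene counts, a 16.5 kb within-island gap, a 205 kb between-island gap, four genes per island) are toy values loosely echoing the Ae. tauschii figures quoted above.

```python
import math
import random


def ks_distance_to_exponential(distances):
    """Largest gap between the empirical CDF of the distances and the CDF of
    an exponential distribution with the same mean (a KS-type statistic)."""
    xs = sorted(distances)
    n = len(xs)
    mean = sum(xs) / n
    worst = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        worst = max(worst, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return worst


def uniform_gene_gaps(n_genes, genome_kb, rng):
    """Intergenic distances when genes are dropped uniformly at random,
    which approximates a homogeneous Poisson process."""
    positions = sorted(rng.uniform(0.0, genome_kb) for _ in range(n_genes))
    return [b - a for a, b in zip(positions, positions[1:])]


def island_gene_gaps(n_islands, genes_per_island, within_kb, between_kb, rng):
    """Intergenic distances when genes sit in islands: short gaps inside an
    island, one long gap before the next island (toy model)."""
    gaps = []
    for _ in range(n_islands):
        for _ in range(genes_per_island - 1):
            gaps.append(rng.expovariate(1.0 / within_kb))
        gaps.append(rng.expovariate(1.0 / between_kb))
    return gaps


rng = random.Random(0)
ks_uniform = ks_distance_to_exponential(uniform_gene_gaps(2000, 100_000.0, rng))
ks_islands = ks_distance_to_exponential(island_gene_gaps(500, 4, 16.5, 205.0, rng))
print(f"uniform genes: KS distance = {ks_uniform:.3f}")
print(f"island genes:  KS distance = {ks_islands:.3f}")
```

    The island simulation should give a clearly larger distance than the uniform one, because a mixture of short and long gaps has too many very small and very large values to fit a single exponential. Turning the statistic into a formal test would need a reference distribution that accounts for the estimated mean, which this sketch leaves out.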

  3. Insular organization of gene space in grass genomes.

    PubMed

    Gottlieb, Andrea; Müller, Hans-Georg; Massa, Alicia N; Wanjugi, Humphrey; Deal, Karin R; You, Frank M; Xu, Xiangyang; Gu, Yong Q; Luo, Ming-Cheng; Anderson, Olin D; Chan, Agnes P; Rabinowicz, Pablo; Devos, Katrien M; Dvorak, Jan

    2013-01-01

    Wheat and maize genes were hypothesized to be clustered into islands but the hypothesis was not statistically tested. The hypothesis is statistically tested here in four grass species differing in genome size, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, and Aegilops tauschii. Density functions obtained under a model where gene locations follow a homogeneous Poisson process and thus are not clustered are compared with a model-free situation quantified through a non-parametric density estimate. A simple homogeneous Poisson model for gene locations is not rejected for the small O. sativa and B. distachyon genomes, indicating that genes are distributed largely uniformly in those species, but is rejected for the larger S. bicolor and Ae. tauschii genomes, providing evidence for clustering of genes into islands. It is proposed to call the gene islands "gene insulae" to distinguish them from other types of gene clustering that have been proposed. An average S. bicolor and Ae. tauschii insula is estimated to contain 3.7 and 3.9 genes with an average intergenic distance within an insula of 2.1 and 16.5 kb, respectively. Inter-insular distances are greater than 8 and 81 kb and average 15.1 and 205 kb, in S. bicolor and Ae. tauschii, respectively. A greater gene density observed in the distal regions of the Ae. tauschii chromosomes is shown to be primarily caused by shortening of inter-insular distances. The comparison of the four grass genomes suggests that gene locations are largely a function of a homogeneous Poisson process in small genomes. Nonrandom insertions of LTR retroelements during genome expansion create gene insulae, which become less dense and further apart with the increase in genome size. High concordance in relative lengths of orthologous intergenic distances among the investigated genomes including the maize genome suggests functional constraints on gene distribution in the grass genomes.

  4. Comparative analysis of grapevine whole-genome gene predictions, functional annotation, categorization and integration of the predicted gene sequences

    PubMed Central

    2012-01-01

    Background The first draft assembly and gene prediction of the grapevine genome (8X base coverage) was made available to the scientific community in 2007, and functional annotation was developed on this gene prediction. Since then additional Sanger sequences were added to the 8X sequence pool and a new version of the genomic sequence with superior base coverage (12X) was produced. Results In order to more efficiently annotate the function of the genes predicted in the new assembly, it is important to build on as much of the previous work as possible, by transferring 8X annotation of the genome to the 12X version. The 8X and 12X assemblies and gene predictions of the grapevine genome were compared to answer the question, “Can we uniquely map 8X predicted genes to 12X predicted genes?” The results show that while the assemblies and gene structure predictions are too different to make a complete mapping between them, most genes (18,725) showed a one-to-one relationship between 8X predicted genes and the last version of 12X predicted genes. In addition, reshuffled genomic sequence structures appeared. These highlight regions of the genome where the gene predictions need to be taken with caution. Based on the new grapevine gene functional annotation and in-depth functional categorization, twenty-eight new molecular networks have been created for VitisNet while the existing networks were updated. Conclusions The outcomes of this study provide a functional annotation of the 12X genes, an update of VitisNet, the system of the grapevine molecular networks, and a new functional categorization of genes. Data are available at the VitisNet website (http://www.sdstate.edu/ps/research/vitis/pathways.cfm). PMID:22554261

  5. Identifying potential cancer driver genes by genomic data integration

    NASA Astrophysics Data System (ADS)

    Chen, Yong; Hao, Jingjing; Jiang, Wei; He, Tong; Zhang, Xuegong; Jiang, Tao; Jiang, Rui

    2013-12-01

    Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis of copy number aberration (CNA) regions of cancer genomes, by integrating publicly available human genomic data. MAXDRIVER employs several optimization strategies to construct a heterogeneous network, by means of combining a fused gene functional similarity network, gene-disease associations and a disease phenotypic similarity network. MAXDRIVER was validated to effectively recall known associations among genes and cancers. Previously identified as well as novel driver genes were detected by scanning CNAs of breast cancer, melanoma and liver carcinoma. Three predicted driver genes (CDKN2A, AKT1, RNF139) were found common in these three cancers by comparative analysis.

  6. Genome Editing Gene Therapy for Duchenne Muscular Dystrophy.

    PubMed

    Hotta, Akitsu

    2015-09-22

    Duchenne muscular dystrophy (DMD) is a severe genetic disorder caused by loss of function of the dystrophin gene on the X chromosome. Gene augmentation of dystrophin is challenging due to the large size of the dystrophin cDNA. Emerging genome editing technologies, such as TALEN and CRISPR-Cas9 systems, open a new era in the restoration of functional dystrophin and are a hallmark of bona fide gene therapy. In this review, we summarize current genome editing approaches, properties of target cell types for ex vivo gene therapy, and perspectives of in vivo gene therapy including genome editing in human zygotes. Although technical challenges, such as the efficacy, accuracy, and delivery of the genome editing components, remain to be overcome, genome editing technologies offer a new avenue for the gene therapy of DMD.

  7. Using the Gene Ontology to Scan Multi-Level Gene Sets for Associations in Genome Wide Association Studies

    PubMed Central

    Schaid, Daniel J.; Sinnwell, Jason P.; Jenkins, Gregory D.; McDonnell, Shannon K.; Ingle, James N.; Kubo, Michiaki; Goss, Paul E.; Costantino, Joseph P.; Wickerham, D. Lawrence; Weinshilboum, Richard M.

    2011-01-01

    Gene-set analyses have been widely used in gene expression studies, and some of the developed methods have been extended to genome wide association studies (GWAS). Yet, complications due to linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs), and variable numbers of SNPs per gene and genes per gene-set, have plagued current approaches, often leading to ad hoc “fixes”. To overcome some of the current limitations, we developed a general approach to scan GWAS SNP data for both gene-level and gene-set analyses, building on score statistics for generalized linear models, and taking advantage of the directed acyclic graph structure of the gene ontology when creating gene-sets. However, other types of gene-set structures can be used, such as the popular Kyoto Encyclopedia of Genes and Genomes (KEGG). Our approach combines SNPs into genes, and genes into gene-sets, but ensures that positive and negative effects of genes on a trait do not cancel. To control for multiple testing of many gene-sets, we use an efficient computational strategy that accounts for LD and provides accurate step-down adjusted p-values for each gene-set. Application of our methods to two different GWAS provides guidance on the potential strengths and weaknesses of our proposed gene-set analyses. PMID:22161999

  8. Molecular Characterization of Soybean Pterocarpan 2-Dimethylallyltransferase in Glyceollin Biosynthesis: Local Gene and Whole-Genome Duplications of Prenyltransferase Genes Led to the Structural Diversity of Soybean Prenylated Isoflavonoids

    PubMed Central

    Yoneyama, Keisuke; Akashi, Tomoyoshi; Aoki, Toshio

    2016-01-01

    Soybean (Glycine max) accumulates several prenylated isoflavonoid phytoalexins, collectively referred to as glyceollins. Glyceollins (I, II, III, IV and V) possess modified pterocarpan skeletons with C5 moieties from dimethylallyl diphosphate, and they are commonly produced from (6aS, 11aS)-3,9,6a-trihydroxypterocarpan [(−)-glycinol]. The metabolic fate of (−)-glycinol is determined by the enzymatic introduction of a dimethylallyl group into C-4 or C-2, which is reportedly catalyzed by regiospecific prenyltransferases (PTs). 4-Dimethylallyl (−)-glycinol and 2-dimethylallyl (−)-glycinol are precursors of glyceollin I and other glyceollins, respectively. Although multiple genes encoding (−)-glycinol biosynthetic enzymes have been identified, those involved in the later steps of glyceollin formation mostly remain unidentified, except for (−)-glycinol 4-dimethylallyltransferase (G4DT), which is involved in glyceollin I biosynthesis. In this study, we identified four genes that encode isoflavonoid PTs, including (−)-glycinol 2-dimethylallyltransferase (G2DT), using homology-based in silico screening and biochemical characterization in yeast expression systems. Transcript analyses illustrated that changes in G2DT gene expression were correlated with the induction of glyceollins II, III, IV and V in elicitor-treated soybean cells and leaves, suggesting its involvement in glyceollin biosynthesis. Moreover, the genomic signatures of these PT genes revealed that G4DT and G2DT are paralogs derived from whole-genome duplications of the soybean genome, whereas other PT genes [isoflavone dimethylallyltransferase 1 (IDT1) and IDT2] were derived via local gene duplication on soybean chromosome 11. PMID:27986914

  9. Genomic scan for genes predisposing to schizophrenia

    SciTech Connect

    Coon, H.; Jensen, S.; Holik, J.

    1994-03-15

    We initiated a genome-wide search for genes predisposing to schizophrenia by ascertaining 9 families, each containing three to five cases of schizophrenia. The 9 pedigrees were initially genotyped with 329 polymorphic DNA loci distributed throughout the genome. Assuming either autosomal dominant or recessive inheritance, 254 DNA loci yielded lod scores less than -2.0 at θ = 0.0, 101 DNA markers gave lod scores less than -2.0 at θ = 0.05, while 5 DNA loci produced maximum lod scores greater than 1: D4S35, D14S17, D15S1, D22S84, and D22S55. Of the DNA markers yielding lod scores greater than 1, D4S35 and D22S55 also were suggestive of linkage when the Affected-Pedigree-Member method was used. The families were then genotyped with four highly polymorphic simple sequence repeat markers; possible linkage diminished with DNA markers mapping near D4S35, while suggestive evidence of linkage remained with loci in the region of D22S55. Although follow-up investigation of these chromosomal regions may be warranted, our linkage results should be viewed as preliminary observations, as 35 unaffected persons are not past the age of risk. 90 refs., 3 tabs.
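    For readers unfamiliar with the lod scores and the -2.0 exclusion threshold quoted in this abstract, the following Python sketch computes a two-point lod score in the simplest textbook setting: phase-known, fully informative meioses, where k recombinants are observed among n meioses and Z(θ) = log10[θ^k (1-θ)^(n-k) / 0.5^n]. This is an assumption made purely for illustration. The study analysed full pedigrees under dominant and recessive disease models, which requires pedigree likelihood software and is not reproduced here, and the counts in the example are invented.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point lod score for phase-known, fully informative meioses.

    Compares the likelihood at recombination fraction theta with the
    likelihood under free recombination (theta = 0.5), on a log10 scale.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # A single recombinant is impossible under complete linkage.
        if recombinants > 0:
            return -math.inf
        return meioses * math.log10(2.0)
    return (
        recombinants * math.log10(theta)
        + non_recombinants * math.log10(1.0 - theta)
        + meioses * math.log10(2.0)
    )


# Invented counts: 5 recombinants among 10 informative meioses.
for theta in (0.0, 0.05, 0.2, 0.5):
    print(theta, lod_score(5, 10, theta))
```

    With these invented counts the score is below -2 at θ = 0.05, so tight linkage at that recombination fraction would conventionally be treated as excluded, while the score at θ = 0.5 is zero by construction.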

  10. From genes to genomes: universal scale-invariant properties of microbial chromosome organisation.

    PubMed

    Audit, Benjamin; Ouzounis, Christos A

    2003-09-19

    The availability of complete genome sequences for a large variety of organisms is a major advance in understanding genome structure and function. One attribute of genome structure is chromosome organisation in terms of gene localisation and orientation. For example, bacterial operons, i.e. clusters of co-oriented genes that form transcription units, enable functionally related genes to be expressed simultaneously. The description of genome organisation was pioneered with the study of the distribution of genes of the Escherichia coli partial genetic map before the full genome sequence was known. Deploying powerful techniques from circular statistics and signal processing, we revisit the issue of gene localisation and orientation using 89 complete microbial chromosomes from the eubacterial and archaeal domains. We demonstrate that there is no characteristic size pertinent to the description of chromosome structure, e.g. there does not exist any single length appropriate to describe gene clustering. Our results show that, for all 89 chromosomes, gene positions and gene orientations share a common form of scale-invariant correlations known as "long-range correlations" that we can reveal for distances from the gene length, up to the chromosome size. This observation indicates that genes tend to assemble and to co-orient over any scale of observation greater than a few kilobases. This unexpected property of chromosome structure can be portrayed as an operon-like organisation at all scales and implies that a complete scale range extending over more than three orders of magnitudes of chromosome segment lengths is necessary to properly describe prokaryotic genome organisation. We propose that this pattern results from the effects of the superhelical context on gene expression coupled with the structure and dynamics of the nucleoid, possibly accommodating the diverse gene expression profiles needed during the different stages of cellular life.
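
    As a generic illustration of what circular statistics means for gene positions on a circular chromosome, the sketch below maps each position to an angle and computes the mean resultant length. The abstract does not say which circular statistics were used, so this is an assumed, standard example rather than the authors' analysis; the function name and coordinates are hypothetical.

```python
import cmath
import math


def mean_resultant_length(positions, chromosome_length):
    """Mean resultant length of positions on a circular chromosome.

    Each position p (in bp) is mapped to the angle 2*pi*p/chromosome_length.
    The result is near 0 when positions are spread evenly around the circle
    and near 1 when they are concentrated in one region.
    """
    if not positions:
        raise ValueError("positions must not be empty")
    total = sum(
        cmath.exp(1j * 2.0 * math.pi * p / chromosome_length)
        for p in positions
    )
    return abs(total) / len(positions)


# Hypothetical gene start coordinates on a 1 Mb circular chromosome.
spread = mean_resultant_length([0, 250_000, 500_000, 750_000], 1_000_000)
clustered = mean_resultant_length([100_000, 101_000, 102_000], 1_000_000)
print(spread, clustered)
```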

  11. Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes

    PubMed Central

    Lin, Michael F.; Carlson, Joseph W.; Crosby, Madeline A.; Matthews, Beverley B.; Yu, Charles; Park, Soo; Wan, Kenneth H.; Schroeder, Andrew J.; Gramates, L. Sian; St. Pierre, Susan E.; Roark, Margaret; Wiley, Kenneth L.; Kulathinal, Rob J.; Zhang, Peili; Myrick, Kyl V.; Antone, Jerry V.; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

    2007-01-01

    The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster. PMID:17989253

  12. Whole genome sequence of Desulfovibrio magneticus strain RS-1 revealed common gene clusters in magnetotactic bacteria

    PubMed Central

    Nakazawa, Hidekazu; Arakaki, Atsushi; Narita-Yamada, Sachiko; Yashiro, Isao; Jinno, Koji; Aoki, Natsuko; Tsuruyama, Ai; Okamura, Yoshiko; Tanikawa, Satoshi; Fujita, Nobuyuki; Takeyama, Haruko; Matsunaga, Tadashi

    2009-01-01

    Magnetotactic bacteria are ubiquitous microorganisms that synthesize intracellular magnetite particles (magnetosomes) by accumulating Fe ions from aquatic environments. Recent molecular studies, including comprehensive proteomic, transcriptomic, and genomic analyses, have considerably improved our hypotheses of the magnetosome-formation mechanism. However, most of these studies have been conducted using pure-cultured bacterial strains of α-proteobacteria. Here, we report the whole-genome sequence of Desulfovibrio magneticus strain RS-1, the only isolate of magnetotactic microorganisms classified under δ-proteobacteria. Comparative genomics of the RS-1 and four α-proteobacterial strains revealed the presence of three separate gene regions (nuo and mamAB-like gene clusters, and gene region of a cryptic plasmid) conserved in all magnetotactic bacteria. The nuo gene cluster, encoding NADH dehydrogenase (complex I), was also common to the genomes of three iron-reducing bacteria exhibiting uncontrolled extracellular and/or intracellular magnetite synthesis. A cryptic plasmid, pDMC1, encodes three homologous genes that exhibit high similarities with those of other magnetotactic bacterial strains. In addition, the mamAB-like gene cluster, encoding the key components for magnetosome formation such as iron transport and magnetosome alignment, was conserved only in the genomes of magnetotactic bacteria as a similar genomic island-like structure. Our findings suggest the presence of core genetic components for magnetosome biosynthesis; these genes may have been acquired into the magnetotactic bacterial genomes by multiple gene-transfer events during proteobacterial evolution. PMID:19675025

  13. The discrepancies in the results of bioinformatics tools for genomic structural annotation

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena; Nowak, Robert; Osipowski, Paweł; Rymuszka, Jacek; Świerkula, Katarzyna; Wojcieszek, Michał; Przybecki, Zbigniew

    2014-11-01

    A major focus of sequencing projects is to identify genes in genomes. However, it is first necessary to define the variety of genes and the criteria for identifying them. In this work we present discrepancies and dependencies arising from the application of different bioinformatic programs for structural annotation to the cucumber data set from the Polish Consortium of Cucumber Genome Sequencing. We used Fgenesh, GenScan and GeneMark for automated structural annotation, and compared the results to the reference annotation.

  14. Rotavirus gene structure and function.

    PubMed Central

    Estes, M K; Cohen, J

    1989-01-01

    Knowledge of the structure and function of the genes and proteins of the rotaviruses has expanded rapidly. Information obtained in the last 5 years has revealed unexpected and unique molecular properties of rotavirus proteins of general interest to virologists, biochemists, and cell biologists. Rotaviruses share some features of replication with reoviruses, yet antigenic and molecular properties of the outer capsid proteins, VP4 (a protein whose cleavage is required for infectivity, possibly by mediating fusion with the cell membrane) and VP7 (a glycoprotein), show more similarities with those of other viruses such as the orthomyxoviruses, paramyxoviruses, and alphaviruses. Rotavirus morphogenesis is a unique process, during which immature subviral particles bud through the membrane of the endoplasmic reticulum (ER). During this process, transiently enveloped particles form, the outer capsid proteins are assembled onto particles, and mature particles accumulate in the lumen of the ER. Two ER-specific viral glycoproteins are involved in virus maturation, and these glycoproteins have been shown to be useful models for studying protein targeting and retention in the ER and for studying mechanisms of virus budding. New ideas and approaches to understanding how each gene functions to replicate and assemble the segmented viral genome have emerged from knowledge of the primary structure of rotavirus genes and their proteins and from knowledge of the properties of domains on individual proteins. Localization of type-specific and cross-reactive neutralizing epitopes on the outer capsid proteins is becoming increasingly useful in dissecting the protective immune response, including evaluation of vaccine trials, with the practical possibility of enhancing the production of new, more effective vaccines. Finally, future analyses with recently characterized immunologic and gene probes and new animal models can be expected to provide a basic understanding of what regulates the

  15. Comprehensively identifying and characterizing the missing gene sequences in human reference genome with integrated analytic approaches.

    PubMed

    Chen, Geng; Wang, Charles; Shi, Leming; Tong, Weida; Qu, Xiongfei; Chen, Jiwei; Yang, Jianmin; Shi, Caiping; Chen, Long; Zhou, Peiying; Lu, Bingxin; Shi, Tieliu

    2013-08-01

    The human reference genome is still incomplete and a number of gene sequences are missing from it. The approaches to uncover them, the reasons for their absence and their functions have been little explored. Here, we comprehensively identified and characterized the missing genes of the human reference genome with RNA-Seq data from 16 different human tissues. By using a combined approach of genome-guided transcriptome reconstruction coupled with genome-wide comparison, we uncovered 3.78 and 2.37 Mb of transcribed regions in the human genome assemblies of Celera and HuRef either missed from their homologous chromosomes of NCBI human reference genome build 37.2 or partially or entirely absent from the reference. We further identified a significant number of novel transcript contigs in each tissue from de novo transcriptome assembly that are unalignable to NCBI build 37.2 but can be aligned to at least one of the genomes from Celera, HuRef, chimpanzee, macaque or mouse. Our analyses indicate that the missing genes could result from genome misassembly, transposition, copy number variation, translocation and other structural variations. Moreover, our results further suggest that a large portion of these missing genes are conserved between human and other mammals, implying their important biological functions. In total, 1,233 functional protein domains were detected in these missing genes. Collectively, our study not only provides approaches for uncovering the missing genes of a genome, but also proposes potential reasons for genes being missed from the genome and highlights the importance of uncovering the missing genes of incomplete genomes.

  16. Identification of structural variation in mouse genomes

    PubMed Central

    Keane, Thomas M.; Wong, Kim; Adams, David J.; Flint, Jonathan; Reymond, Alexandre; Yalcin, Binnaz

    2014-01-01

    Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation. PMID:25071822

  17. Evidence-based gene predictions in plant genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Automated evidence-based gene building is a rapid and cost-effective way to provide reliable gene annotations on newly sequenced genomes. One of the limitations of evidence-based gene builders, however, is their requirement for gene expression evidence—known proteins, full-length cDNAs, or expressed...

  18. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome

    PubMed Central

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-01-01

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome have shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853

  19. Mechanisms and dynamics of orphan gene emergence in insect genomes.

    PubMed

    Wissler, Lothar; Gadau, Jürgen; Simola, Daniel F; Helmkampf, Martin; Bornberg-Bauer, Erich

    2013-01-01

    Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes (such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements) act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes has their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and support a recently proposed dynamic model of frequent gene birth and death.

  20. Chicken rRNA Gene Cluster Structure

    PubMed Central

    Dyomin, Alexander G.; Koshel, Elena I.; Kiselev, Artem M.; Saifitdinova, Alsu F.; Galkina, Svetlana A.; Fukagawa, Tatsuo; Kostareva, Anna A.

    2016-01-01

    Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of genome. Despite the extensive exploration into avian genomes, no complete description of avian rRNA gene primary structure has been offered so far. We publish a complete chicken rRNA gene cluster sequence here, including 5’ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3’ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited to GenBank under KT445934 accession number. The assembly was validated through in situ fluorescent hybridization analysis on chicken metaphase chromosomes using computed and synthesized specific probes, as well as through the reference assembly against de novo assembled rRNA gene cluster sequence using sequenced fragments of BAC-clone containing chicken NOR (nucleolus organizer region). The results have confirmed the chicken rRNA gene cluster validity. PMID:27299357
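
    The segment lengths listed above can be checked against the reported cluster length with a short sketch. The lengths are taken directly from the abstract; the variable names are illustrative only.

```python
# Segment lengths (bp) of the chicken rRNA gene cluster, as listed in the abstract.
segments_bp = {
    "5'ETS": 1836,
    "18S rRNA gene": 1823,
    "ITS1": 2530,
    "5.8S rRNA gene": 157,
    "ITS2": 733,
    "28S rRNA gene": 4441,
    "3'ETS": 343,
}

# Sum the segments and compare with the reported total cluster length.
total_bp = sum(segments_bp.values())
reported_total_bp = 11863

print(total_bp, total_bp == reported_total_bp)
```

    The seven segments add up to the 11863 bp reported for the assembled cluster, so the listed components account for the whole sequence.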

  1. Mechanisms and Dynamics of Orphan Gene Emergence in Insect Genomes

    PubMed Central

    Wissler, Lothar; Gadau, Jürgen; Simola, Daniel F.; Helmkampf, Martin; Bornberg-Bauer, Erich

    2013-01-01

    Orphan genes are defined as genes that lack detectable similarity to genes in other species and therefore no clear signals of common descent (i.e., homology) can be inferred. Orphans are an enigmatic portion of the genome because their origin and function are mostly unknown and they typically make up 10% to 30% of all genes in a genome. Several case studies demonstrated that orphans can contribute to lineage-specific adaptation. Here, we study orphan genes by comparing 30 arthropod genomes, focusing in particular on seven recently sequenced ant genomes. This setup allows analyzing a major metazoan taxon and a comparison between social Hymenoptera (ants and bees) and nonsocial Diptera (flies and mosquitoes). First, we find that recently split lineages undergo accelerated genomic reorganization, including the rapid gain of many orphan genes. Second, between the two insect orders Hymenoptera and Diptera, orphan genes are more abundant and emerge more rapidly in Hymenoptera, in particular, in leaf-cutter ants. With respect to intragenomic localization, we find that ant orphan genes show little clustering, which suggests that orphan genes in ants are scattered uniformly over the genome and between nonorphan genes. Finally, our results indicate that the genetic mechanisms creating orphan genes—such as gene duplication, frame-shift fixation, creation of overlapping genes, horizontal gene transfer, and exaptation of transposable elements—act at different rates in insects, primates, and plants. In Formicidae, the majority of orphan genes has their origin in intergenic regions, pointing to a high rate of de novo gene formation or generalized gene loss, and support a recently proposed dynamic model of frequent gene birth and death. PMID:23348040

  2. Microfluidic gene arrays for rapid genomic profiling

    NASA Astrophysics Data System (ADS)

    West, Jay A.; Hukari, Kyle W.; Hux, Gary A.; Shepodd, Timothy J.

    2004-12-01

    Genomic analysis tools have recently become indispensable for the evaluation of gene expression in a variety of experimental protocols. Two of the main drawbacks of this technology are the labor- and time-intensive process of sample preparation and the relatively long times required for target/probe hybridization. To overcome these two technological barriers, we have developed a microfluidic chip that performs on-chip sample purification and labeling, integrated with a high-density gene array. Sample purification was performed using a porous polymer monolithic material functionalized with an oligo dT nucleotide sequence for the isolation of high-purity mRNA. The purified mRNAs can then be rapidly labeled using a fluorescent molecule that forms a selective covalent bond at the N7 position of guanine residues. The labeled mRNAs can then be released from the polymer monolith to allow direct hybridization with oligonucleotide probes deposited in a microfluidic channel. To allow rapid target/probe hybridization, high-density microarrays were printed in microchannels. The channels can accommodate array densities as high as 4000 probes. When oligonucleotide deposition is complete, the channels are sealed using a polymer film that forms a pressure-tight seal to allow sample reagent flow to the arrayed probes. This process will allow real-time monitoring of target-to-probe hybridization using a top-mounted CCD fiber bundle combination. Using this process we have been able to proceed from multi-step sample preparation to labeled target/probe hybridization in less than 30 minutes. These results demonstrate the capability to perform rapid genomic screening on a high-density microfluidic microarray of oligonucleotides.

  3. Structural and functional characterization of a transcription-enhancing sequence element in the rbcL gene of the Chlamydomonas chloroplast genome.

    PubMed

    Anthonisen, Inger Lill; Kasai, Seitaro; Kato, Ko; Salvador, Maria Luisa; Klein, Uwe

    2002-08-01

    The structure and function of a transcription-enhancing sequence element in the coding region of the Chlamydomonas reinhardtii rbcL gene was analyzed in Chlamydomonas chloroplast transformants in vivo. The enhancer sequence is contained within a DNA segment extending from position +108 to position +143, relative to the start site of rbcL gene transcription. The sequence remains functional when inverted or when placed 34 bp closer to or 87 bp further downstream of the basic rbcL promoter. However, it does not function from a site about 250 bp downstream of its original location. Besides promoting transcription initiation from the rbcL promoter, the element is able to augment transcription from the promoter of the Chlamydomonas chloroplast atpB gene, but has an inhibitory effect on transcription from the promoter of the chloroplast ribosomal RNA genes. The results suggest that the enhancer-like sequence acts upon transcription initiation in a position-specific and promoter type-specific manner.

  4. Genome-wide analysis of chromatin packing in Arabidopsis thaliana at single-gene resolution

    PubMed Central

    Liu, Chang; Wang, Congmao; Wang, George; Becker, Claude; Zaidem, Maricris; Weigel, Detlef

    2016-01-01

    The three-dimensional packing of the genome plays an important role in regulating gene expression. We have used Hi-C, a genome-wide chromatin conformation capture (3C) method, to analyze Arabidopsis thaliana chromosomes dissected into subkilobase segments, which is required for gene-level resolution in this species with a gene-dense genome. We found that the repressive H3K27me3 histone mark is overrepresented in the promoter regions of genes that are in conformational linkage over long distances. In line with the globally dispersed distribution of RNA polymerase II in A. thaliana nuclear space, actively transcribed genes do not show a strong tendency to associate with each other. In general, there are often contacts between 5′ and 3′ ends of genes, forming local chromatin loops. Such self-loop structures of genes are more likely to occur in more highly expressed genes, although they can also be found in silent genes. Silent genes with local chromatin loops are highly enriched for the histone variant H3.3 at their 5′ and 3′ ends but depleted of repressive marks such as heterochromatic histone modifications and DNA methylation in flanking regions. Our results suggest that, different from animals, a major theme of genome folding in A. thaliana is the formation of structural units that correspond to gene bodies. PMID:27225844

  5. Molecular cloning, genomic structure, polymorphism analysis and recombinant expression of a α1-antitrypsin like gene from swamp eel, Monopterus albus.

    PubMed

    Li, Wei; Wang, Quanhe; Li, Shaobin; Jiang, Ao; Sun, Wenxiu

    2017-03-01

    Alpha-1-antitrypsin (AAT) is a highly polymorphic glycoprotein antiprotease, involved in the regulation of human immune response. Beyond some genomic characterization and a few protein characterizations, the function of teleost AAT remains uncertain. In this study we cloned an AAT-like gene from a swamp eel liver identifying four exons and three introns, and the full-length cDNA. The elucidated swamp eel AAT amino acid sequence showed high homology with known AATs from other teleosts. The swamp eel AAT was examined both in ten healthy tissues and in four bacterially-stimulated tissues resulting in up-regulation of swamp eel AAT at different times. Swamp eel AAT transcripts were ubiquitously but unevenly expressed in ten tissues. Further, the mature peptide sequence of swamp eel AAT was subcloned and transformed into E. coli with the recombinant proteins successfully inhibiting bovine trypsin activity. Analysis of recombinant AAT showed equimolar formation of irreversible complexes with proteinases, high stability at pH 7.0-10.0 and temperatures below 55 °C. Serum AAT protein level significantly increased in response to inflammation with AAT anti-sera, and, NF-κB, apolipoprotein A1 and transferrin gene expression were dramatically decreased over 72 h post recombinant AAT injection. Lastly, examination of swamp eel AAT allelic polymorphism identified all alleles in both healthy and diseased stock except allele*g, found only in diseased stock, but without statistical difference between the distribution frequency of allele*g in the two stocks. These results are crucial to our ongoing study of the role of teleost AAT in the innate immune system.

  6. Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants.

    PubMed

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-04-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs.

  7. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi

    PubMed Central

    Dupont, Pierre-Yves; Cox, Murray P.

    2017-01-01

    Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827

  8. Hemipteran genomics and psyllid gene expression

    Technology Transfer Automated Retrieval System (TEKTRAN)

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research on psyllid genomics, while still in its infancy, is providing information which will aid strategies to suppress...

  9. Genome changes after gene duplication: haploidy vs. diploidy.

    PubMed

    Xue, Cheng; Huang, Ren; Maxwell, Taylor J; Fu, Yun-Xin

    2010-09-01

    Since genome size and the number of duplicate genes observed in genomes increase from haploid to diploid organisms, diploidy might provide more evolutionary probabilities through gene duplication. It is still unclear how diploidy promotes genomic evolution in detail. In this study, we explored the evolution of segmental gene duplication in haploid and diploid populations by analytical and simulation approaches. Results show that (1) under the double null recessive (DNR) selective model, given the same recombination rate, the evolutionary trajectories and consequences are very similar between the same-size gene-pool haploid vs. diploid populations; (2) recombination enlarges the probability of preservation of duplicate genes in either haploid or diploid large populations, and haplo-insufficiency reinforces this effect; and (3) the loss of duplicate genes at the ancestor locus is limited under recombination while under complete linkage the loss of duplicate genes is always random at the ancestor and newly duplicated loci. Therefore, we propose a model to explain the advantage of diploidy: diploidy might facilitate the increase of recombination rate, especially under sexual reproduction; more duplicate genes are preserved under more recombination by originalization (by which duplicate genes are preserved intact at a special quasi-mutation-selection balance under the DNR or haplo-insufficient selective model), so genome sizes and the number of duplicate genes in diploid organisms become larger. Additionally, it is suggested that small genomic rearrangements due to the random loss of duplicate genes might be limited under recombination.

  10. The fractal structure of the mitochondrial genomes

    NASA Astrophysics Data System (ADS)

    Oiwa, Nestor N.; Glazier, James A.

    2002-08-01

    The mitochondrial DNA genome has a definite multifractal structure. We show that loops, hairpins and inverted palindromes are responsible for this self-similarity. We can thus establish a definite relation between the function of subsequences and their fractal dimension. Intriguingly, protein coding DNAs also exhibit palindromic structures, although they do not appear in the sequence of amino acids. These structures may reflect the stabilization and transcriptional control of DNA or the control of posttranscriptional editing of mRNA.

  11. Genomic organization and 5′-flanking DNA sequence of the murine stomatin gene (Epb72)

    SciTech Connect

    Gallagher, P.G.; Turetsky, T.; Mentzer, W.C.

    1996-06-15

    Stomatin is a poorly understood integral membrane protein that is absent from the erythrocyte membranes of many patients with hereditary stomatocytosis. This report describes the cloning of the murine stomatin chromosomal gene, determination of its genomic structure, and characterization of the 5′-flanking genomic DNA sequences. The stomatin gene is encoded by seven exons spread over ~25 kb of genomic DNA. There is no concordance between the exon structure of the stomatin gene and the locations of three domains predicted on the basis of protein structure. Inspection of the 5′-flanking DNA sequences reveals features of a TATA-less housekeeping gene promoter and consensus sequences for a number of potential DNA-binding proteins. 12 refs., 2 figs., 1 tab.

  12. PIECE: a database for plant gene structure comparison and evolution

    PubMed Central

    Wang, Yi; You, Frank M.; Lazo, Gerard R.; Luo, Ming-Cheng; Thilmony, Roger; Gordon, Sean; Kianian, Shahryar F.; Gu, Yong Q.

    2013-01-01

    Gene families often show degrees of differences in terms of exon–intron structures depending on their distinct evolutionary histories. Comparative analysis of gene structures is important for understanding their evolutionary and functional relationships within plant species. Here, we present a comparative genomics database named PIECE (http://wheat.pw.usda.gov/piece) for Plant Intron and Exon Comparison and Evolution studies. The database contains all the annotated genes extracted from 25 sequenced plant genomes. These genes were classified based on Pfam motifs. Phylogenetic trees were pre-constructed for each gene category. PIECE provides a user-friendly interface for different types of searches and a graphical viewer for displaying a gene structure pattern diagram linked to the resulting bootstrapped dendrogram for each gene family. The gene structure evolution of orthologous gene groups was determined using the GLOOME, Exalign and GECA software programs that can be accessed within the database. PIECE also provides a web server version of the software, GSDraw, for drawing schematic diagrams of gene structures. PIECE is a powerful tool for comparing gene sequences and provides valuable insights into the evolution of gene structure in plant genomes. PMID:23180792

  13. Mitochondrial Genome of Palpitomonas bilix: Derived Genome Structure and Ancestral System for Cytochrome c Maturation

    PubMed Central

    Nishimura, Yuki; Tanifuji, Goro; Kamikawa, Ryoma; Yabuki, Akinori; Hashimoto, Tetsuo; Inagaki, Yuji

    2016-01-01

    We report here the mitochondrial (mt) genome of one of the heterotrophic microeukaryotes related to cryptophytes, Palpitomonas bilix. The P. bilix mt genome was found to be a linear molecule composed of a “single copy region” (∼16 kb) and repeat regions (∼30 kb) arranged in an inverse manner at both ends of the genome. Linear mt genomes with large inverted repeats are known for three distantly related eukaryotes (including P. bilix), suggesting that this particular mt genome structure has emerged at least three times in the eukaryotic tree of life. The P. bilix mt genome contains 47 protein-coding genes including ccmA, ccmB, ccmC, and ccmF, which encode protein subunits involved in the system for cytochrome c maturation inherited from a bacterium (System I). We present data indicating that the phylogenetic relatives of P. bilix, namely, cryptophytes, goniomonads, and kathablepharids, utilize an alternative system for cytochrome c maturation, which has most likely emerged during the evolution of eukaryotes (System III). To explain the distribution of Systems I and III in P. bilix and its phylogenetic relatives, two scenarios are possible: (i) System I was replaced by System III on the branch leading to the common ancestor of cryptophytes, goniomonads, and kathablepharids, and (ii) the two systems co-existed in their common ancestor, and were lost differentially among the four descendants. PMID:27604877

  14. Plant Ion Channels: Gene Families, Physiology, and Functional Genomics Analyses

    PubMed Central

    Ward, John M.; Mäser, Pascal; Schroeder, Julian I.

    2016-01-01

    Distinct potassium, anion, and calcium channels in the plasma membrane and vacuolar membrane of plant cells have been identified and characterized by patch clamping. Primarily owing to advances in Arabidopsis genetics and genomics, and yeast functional complementation, many of the corresponding genes have been identified. Recent advances in our understanding of ion channel genes that mediate signal transduction and ion transport are discussed here. Some plant ion channels, for example, ALMT and SLAC anion channel subunits, are unique. The majority of plant ion channel families exhibit homology to animal genes; such families include both hyperpolarization- and depolarization-activated Shaker-type potassium channels, CLC chloride transporters/channels, cyclic nucleotide–gated channels, and ionotropic glutamate receptor homologs. These plant ion channels offer unique opportunities to analyze the structural mechanisms and functions of ion channels. Here we review gene families of selected plant ion channel classes and discuss unique structure-function aspects and their physiological roles in plant cell signaling and transport. PMID:18842100

  15. Structural Variation Mutagenesis of the Human Genome: Impact on Disease and Evolution

    PubMed Central

    Lupski, James R.

    2015-01-01

    Watson-Crick base-pair changes, or single-nucleotide variants (SNV), have long been known as a source of mutations. However, the extent to which DNA structural variation, including duplication and deletion copy number variants (CNV) and copy number neutral inversions and translocations, contribute to human genome variation and disease has been appreciated only recently. Moreover, the potential complexity of structural variants (SV) was not envisioned; thus, the frequency of complex genomic rearrangements (CGR) and how such events form remained a mystery. The concept of genomic disorders, diseases due to genomic rearrangements and not sequence-based changes for which genomic architecture incites genomic instability, delineated a new category of conditions distinct from chromosomal syndromes and single-gene Mendelian diseases. Nevertheless, it is the mechanistic understanding of CNV/SV formation that has promoted further understanding of human biology and disease and provided insights into human genome and gene evolution. PMID:25892534

  16. Structure and sequence of the saimiriine herpesvirus 1 genome.

    PubMed

    Tyler, Shaun; Severini, Alberto; Black, Darla; Walker, Matthew; Eberle, R

    2011-02-05

    We report here the complete genome sequence of the squirrel monkey α-herpesvirus saimiriine herpesvirus 1 (HVS1). Unlike the simplexviruses of other primate species, only the unique short region of the HVS1 genome is bounded by inverted repeats. While all Old World simian simplexviruses characterized to date lack the herpes simplex virus RL1 (γ34.5) gene, HVS1 has an RL1 gene. HVS1 lacks several genes that are present in other primate simplexviruses (US8.5, US10-12, UL43/43.5 and UL49A). Although the overall genome structure appears more like that of varicelloviruses, the encoded HVS1 proteins are most closely related to homologous proteins of the primate simplexviruses. Phylogenetic analyses confirm that HVS1 is a simplexvirus. Limited comparison of two HVS1 strains revealed a very low degree of sequence variation more typical of varicelloviruses. HVS1 is thus unique among the primate α-herpesviruses in that its genome has properties of both simplexviruses and varicelloviruses.

  17. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  18. Genome-editing Technologies for Gene and Cell Therapy.

    PubMed

    Maeder, Morgan L; Gersbach, Charles A

    2016-03-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed.

  19. Genome-editing Technologies for Gene and Cell Therapy

    PubMed Central

    Maeder, Morgan L; Gersbach, Charles A

    2016-01-01

    Gene therapy has historically been defined as the addition of new genes to human cells. However, the recent advent of genome-editing technologies has enabled a new paradigm in which the sequence of the human genome can be precisely manipulated to achieve a therapeutic effect. This includes the correction of mutations that cause disease, the addition of therapeutic genes to specific sites in the genome, and the removal of deleterious genes or genome sequences. This review presents the mechanisms of different genome-editing strategies and describes each of the common nuclease-based platforms, including zinc finger nucleases, transcription activator-like effector nucleases (TALENs), meganucleases, and the CRISPR/Cas9 system. We then summarize the progress made in applying genome editing to various areas of gene and cell therapy, including antiviral strategies, immunotherapies, and the treatment of monogenic hereditary disorders. The current challenges and future prospects for genome editing as a transformative technology for gene and cell therapy are also discussed. PMID:26755333

  20. Comparative mapping and genomic annotation of the bovine oncosuppressor gene WWOX.

    PubMed

    Manera, S; Bonfiglio, S; Malusà, A; Denis, C; Boussaha, M; Russo, V; Roperto, F; Perucatti, A; Di Meo, G P; Eggen, A; Ferretti, L

    2009-01-01

    WWOX (WW domain-containing oxidoreductase) is the gene mapping at FRA16D HSA16q23.1, the second most active common fragile site in the human genome. In this study we characterized at a detailed molecular level WWOX in the bovine genome. First, we sequenced cDNA from various tissues and obtained evidence in support of a 9-exon structure for the gene, similar to the human gene. Then, we recovered BACs using exon tags and annotated the gene to a >1-Mb genomic region of BTA18 using the Btau 4.0 genome assembly as a reference, thus resolving an issue related to exon 9, which is not included in the genomic annotation of the gene in the Entrez database. Finally, BACs spanning WWOX were used as FISH probes to obtain comparative mapping of the gene in Bos taurus, Bubalus bubalis, Ovis aries and Capra hircus to BTA18q12.1, BBU18q13, OAR14q12.1 and CHI18q12.1, respectively. Our data show that the chromosomal location of WWOX is conserved between man and 4 major domesticated species. Moreover, the annotation of the bovine gene also suggests a highly conserved genomic arrangement, including number and size of introns.

  1. Primary structure of the herpesvirus saimiri genome.

    PubMed Central

    Albrecht, J C; Nicholas, J; Biller, D; Cameron, K R; Biesinger, B; Newman, C; Wittmann, S; Craxton, M A; Coleman, H; Fleckenstein, B

    1992-01-01

    This report describes the complete nucleotide sequence of the genome of herpesvirus saimiri, the prototype of gammaherpesvirus subgroup 2 (rhadinoviruses). The unique low-G + C-content DNA region has 112,930 bp with an average base composition of 34.5% G + C and is flanked by about 35 noncoding high-G + C-content DNA repeats of 1,444 bp (70.8% G + C) in tandem orientation. We identified 76 major open reading frames and a set of seven U-RNA genes for a total of 83 potential genes. The genes are closely arranged, with only a few regions of sizable noncoding sequences. For 60 of the predicted proteins, homologous sequences are found in other herpesviruses. Genes conserved between herpesvirus saimiri and Epstein-Barr virus (gammaherpesvirus subgroup 1) show that their genomes are generally collinear, although conserved gene blocks are separated by unique genes that appear to determine the particular phenotype of these viruses. Several deduced protein sequences of herpesvirus saimiri without counterparts in most of the other sequenced herpesviruses exhibited significant homology with cellular proteins of known function. These include thymidylate synthase, dihydrofolate reductase, complement control proteins, the cell surface antigen CD59, cyclins, and G protein-coupled receptors. Searching for functional protein motifs revealed that the virus may encode a cytosine-specific methylase and a tyrosine-specific protein kinase. Several herpesvirus saimiri genes are potential candidates to cooperate with the gene for saimiri transformation-associated protein of subgroup A (STP-A) in T-lymphocyte growth stimulation. PMID:1321287

  2. Three-dimensional modeling of the P. falciparum genome during the erythrocytic cycle reveals a strong connection between genome architecture and gene expression.

    PubMed

    Ay, Ferhat; Bunnik, Evelien M; Varoquaux, Nelle; Bol, Sebastiaan M; Prudhomme, Jacques; Vert, Jean-Philippe; Noble, William Stafford; Le Roch, Karine G

    2014-06-01

    The development of the human malaria parasite Plasmodium falciparum is controlled by coordinated changes in gene expression throughout its complex life cycle, but the corresponding regulatory mechanisms are incompletely understood. To study the relationship between genome architecture and gene regulation in Plasmodium, we assayed the genome architecture of P. falciparum at three time points during its erythrocytic (asexual) cycle. Using chromosome conformation capture coupled with next-generation sequencing technology (Hi-C), we obtained high-resolution chromosomal contact maps, which we then used to construct a consensus three-dimensional genome structure for each time point. We observed strong clustering of centromeres, telomeres, ribosomal DNA, and virulence genes, resulting in a complex architecture that cannot be explained by a simple volume exclusion model. Internal virulence gene clusters exhibit domain-like structures in contact maps, suggesting that they play an important role in the genome architecture. Midway during the erythrocytic cycle, at the highly transcriptionally active trophozoite stage, the genome adopts a more open chromatin structure with increased chromosomal intermingling. In addition, we observed reduced expression of genes located in spatial proximity to the repressive subtelomeric center, and colocalization of distinct groups of parasite-specific genes with coordinated expression profiles. Overall, our results are indicative of a strong association between the P. falciparum spatial genome organization and gene expression. Understanding the molecular processes involved in genome conformation dynamics could contribute to the discovery of novel antimalarial strategies.

  3. Complete female mitochondrial genome of Anodonta anatina (Mollusca: Unionidae): confirmation of a novel protein-coding gene (F ORF).

    PubMed

    Soroka, Marianna; Burzyński, Artur

    2015-04-01

    Freshwater mussels are among animals having two different, gender-specific mitochondrial genomes. We sequenced complete female mitochondrial genomes from five individuals of Anodonta anatina, a bivalve species common in the Palearctic ecozone. The length of the genome was variable: 15,637-15,653 bp. This variation was almost entirely confined to the non-coding parts, which constituted approximately 5% of the genome. Nucleotide diversity was moderate, at 0.3%. Nucleotide composition was typically biased towards AT (66.0%). All genes normally seen in animal mtDNA were identified, as well as the ORF characteristic for unionid mitochondrial genomes, bringing the total number of genes present to 38. If this additional ORF does encode a protein, it must evolve under a very relaxed selection since all substitutions within this gene were non-synonymous. The gene order and structure of the genome were identical to those of all female mitochondrial genomes described in unionid bivalves except the Gonideini.
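
    The two summary statistics quoted in this abstract (AT content and nucleotide diversity) can be illustrated with a minimal sketch. The function names and the toy sequences are hypothetical and are not taken from the paper; the diversity calculation assumes gap-free aligned sequences of equal length and uses the plain average of pairwise per-site differences, which may differ from the estimator the authors applied.

```python
from itertools import combinations

def at_content(seq):
    """Fraction of A/T among unambiguous bases (A, C, G, T) in seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)

def nucleotide_diversity(aligned_seqs):
    """Average pairwise differences per site (assumes equal-length, gap-free alignment)."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)

# Toy data, purely illustrative (not from Anodonta anatina).
toy_alignment = ["AAAA", "AAAT", "AATT"]
toy_pi = nucleotide_diversity(toy_alignment)
toy_at = at_content("AATTGC")
```

    Applied to real data, the input would be the aligned mitochondrial genomes of the sampled individuals; how gaps and ambiguous bases are handled is a design choice that this sketch sidesteps.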

  4. The inheritance of organelle genes and genomes: patterns and mechanisms.

    PubMed

    Xu, Jianping

    2005-12-01

    Unlike nuclear genes and genomes, the inheritance of organelle genes and genomes does not follow Mendel's laws. In this mini-review, I summarize recent research progress on the patterns and mechanisms of the inheritance of organelle genes and genomes. While most sexual eukaryotes show uniparental inheritance of organelle genes and genomes in some progeny at least part of the time, increasing evidence indicates that strictly uniparental inheritance is rare and that organelle inheritance patterns are very diverse and complex. In contrast with the predominance of uniparental inheritance in multicellular organisms, organelle genes in eukaryotic microorganisms, such as protists, algae, and fungi, typically show a greater diversity of inheritance patterns, with sex-determining loci playing significant roles. The diverse patterns of inheritance are matched by the rich variety of potential mechanisms. Indeed, many factors, both deterministic and stochastic, can influence observed patterns of organelle inheritance. Interestingly, in multicellular organisms, progeny from interspecific crosses seem to exhibit more frequent paternal leakage and biparental organelle genome inheritance than those from intraspecific crosses. The recent observation of a sex-determining gene in the basidiomycete yeast Cryptococcus neoformans, which controls mitochondrial DNA inheritance, has opened up potentially exciting research opportunities for identifying specific molecular genetic pathways that control organelle inheritance, as well as for testing evolutionary hypotheses regarding the prevalence of uniparental inheritance of organelle genes and genomes.

  5. Genome Variability and Gene Content in Chordopoxviruses: Dependence on Microsatellites

    PubMed Central

    Hatcher, Eneida L.; Wang, Chunlin; Lefkowitz, Elliot J.

    2015-01-01

    To investigate gene loss in poxviruses belonging to the Chordopoxvirinae subfamily, we assessed the gene content of representative members of the subfamily, and determined whether individual genes present in each genome were intact, truncated, or fragmented. When nonintact genes were identified, the early stop mutations (ESMs) leading to gene truncation or fragmentation were analyzed. Of all the ESMs present in these poxvirus genomes, over 65% co-localized with microsatellites—simple sequence nucleotide repeats. On average, microsatellites comprise 24% of the nucleotide sequence of these poxvirus genomes. These simple repeats have been shown to exhibit high rates of variation, and represent a target for poxvirus protein variation, gene truncation, and reductive evolution. PMID:25912716
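
    As a rough illustration of what locating microsatellites (simple sequence repeats) involves, the sketch below scans a DNA string for tandem runs of a short motif using a regular expression with a backreference. The function name, the motif length range (1 to 6 nt) and the minimum of three copies are assumptions chosen for illustration; the abstract does not state the criteria the authors used.

```python
import re

def find_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return (start, unit, copies) for each tandem run of a short motif.

    Thresholds are illustrative assumptions, not the paper's criteria.
    """
    seq = seq.upper()
    # Group 1 captures the shortest motif that works (lazy quantifier);
    # \1{n,} then requires at least (min_repeats - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

# Toy sequence, purely illustrative.
example_hits = find_microsatellites("CCATATATATGG")
```

    To test co-localization with early stop mutations, the start and end coordinates of each hit could then be compared with the mutation positions; that step is left out here because the abstract does not describe how overlap was scored.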

  6. Predictions of Gene Family Distributions in Microbial Genomes: Evolution by Gene Duplication and Modification

    SciTech Connect

    Yanai, Itai; Camacho, Carlos J.; DeLisi, Charles

    2000-09-18

    A universal property of microbial genomes is the considerable fraction of genes that are homologous to other genes within the same genome. The process by which these homologues are generated is not well understood, but sequence analysis of 20 microbial genomes unveils a recurrent distribution of gene family sizes. We show that a simple evolutionary model based on random gene duplication and point mutations fully accounts for these distributions and permits predictions for the number of gene families in genomes not yet complete. Our findings are consistent with the notion that a genome evolves from a set of precursor genes to a mature size by gene duplications and increasing modifications. (c) 2000 The American Physical Society.
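
    The kind of model described here, growth from a set of precursor genes by random duplication with occasional modification, can be sketched as a small simulation. This is not the authors' model: the function name, the parameters and the rule that a duplicate founds a new family with fixed probability `p_diverge` are simplifying assumptions made only to show how a distribution of gene family sizes can arise.

```python
import random
from collections import Counter

def simulate_families(n_precursors=50, n_steps=2000, p_diverge=0.1, seed=0):
    """Toy duplication model; returns Counter {family_size: number_of_families}.

    Hypothetical parameters; not fitted to any genome.
    """
    rng = random.Random(seed)
    # genes[i] holds the family id of gene i; each precursor starts its own family.
    genes = list(range(n_precursors))
    next_family = n_precursors
    for _ in range(n_steps):
        parent_family = rng.choice(genes)  # pick a random gene to duplicate
        if rng.random() < p_diverge:
            # Assumption: the copy has diverged enough to found a new family.
            genes.append(next_family)
            next_family += 1
        else:
            genes.append(parent_family)
    family_sizes = Counter(genes)          # family id -> number of members
    return Counter(family_sizes.values())  # family size -> number of families

size_distribution = simulate_families()
```

    Because the parent is drawn uniformly from all genes, larger families are duplicated more often, which tends to produce a skewed distribution with many small families and a few large ones.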

  7. Genome engineering and gene expression control for bacterial strain development.

    PubMed

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to engineer by homologous recombination. Due to the increasing demands on high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen.

  8. Higher plant mitochondrial DNA: Genomes, genes, mutants, transcription, translation

    SciTech Connect

    Not Available

    1986-01-01

    This volume contains brief summaries of 63 presentations given at the International Workshop on Higher Plant Mitochondrial DNA. The presentations are organized into topical discussions addressing plant genomes, mitochondrial genes, cytoplasmic male sterility, transcription, translation, plasmids and tissue culture. (DT)

  9. A data management system for structural genomics

    PubMed Central

    Raymond, Stéphane; O'Toole, Nicholas; Cygler, Miroslaw

    2004-01-01

    Background Structural genomics (SG) projects aim to determine thousands of protein structures by the development of high-throughput techniques for all steps of the experimental structure determination pipeline. Crucial to the success of such endeavours is the careful tracking and archiving of experimental and external data on protein targets. Results We have developed a sophisticated data management system for structural genomics. Central to the system is an Oracle-based, SQL-interfaced database. The database schema deals with all facets of the structure determination process, from target selection to data deposition. Users access the database via any web browser. Experimental data is input by users with pre-defined web forms. Data can be displayed according to numerous criteria. A list of all current target proteins can be viewed, with links for each target to associated entries in external databases. To avoid unnecessary work on targets, our data management system matches protein sequences weekly using BLAST to entries in the Protein Data Bank and to targets of other SG centers worldwide. Conclusion Our system is a working, effective and user-friendly data management tool for structural genomics projects. In this report we present a detailed summary of the various capabilities of the system, using real target data as examples, and indicate our plans for future enhancements. PMID:15210054

  10. Amplification and characterization of eukaryotic structural genes.

    PubMed

    Maniatis, T; Efstratiadis, A; Sim, G K; Kafatos, F

    1978-05-01

    An approach to the study of eukaryotic structural genes which are differentially expressed during development is described. This approach involves the isolation and amplification of mRNA sequences by in vitro conversion of mRNA to double-stranded cDNA followed by molecular cloning in bacterial plasmids. This procedure provides highly specific hybridization probes that can be used to identify genes and their contiguous DNA sequences in genomic DNA, and to detect specific RNA transcripts during development. The nature of the method allows the isolation of individual mRNA sequences from a complex population of molecules at different stages of development.

  11. Interrogating the druggable genome with structural informatics.

    PubMed

    Hambly, Kevin; Danzer, Joseph; Muskal, Steven; Debe, Derek A

    2006-08-01

    Structural genomics projects are producing protein structure data at an unprecedented rate. In this paper, we present the Target Informatics Platform (TIP), a novel structural informatics approach for amplifying the rapidly expanding body of experimental protein structure information to enhance the discovery and optimization of small molecule protein modulators on a genomic scale. In TIP, existing experimental structure information is augmented using a homology modeling approach, and binding sites across multiple target families are compared using a clique detection algorithm. We report here a detailed analysis of the structural coverage for the set of druggable human targets, highlighting drug target families where the level of structural knowledge is currently quite high, as well as those areas where structural knowledge is sparse. Furthermore, we demonstrate the utility of TIP's intra- and inter-family binding site similarity analysis using a series of retrospective case studies. Our analysis underscores the utility of a structural informatics infrastructure for extracting drug discovery-relevant information from structural data, aiding researchers in the identification of lead discovery and optimization opportunities as well as potential "off-target" liabilities.

  12. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. Two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of the two are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  13. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  14. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions needed to sustain cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are unlikely to be located in GIs, but essential genes in GIs have not previously been analyzed systematically. Here, we analyzed the essential genes in 28 prokaryotes using statistical methods and concluded that significantly fewer essential genes are found in GIs than outside GIs. The functions of the 362 essential genes found in GIs were explored further by BLAST searches against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of the PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes were found to share sequence similarity with virulence factors and phage/prophage-related genes, respectively. We also found several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of prediction algorithms for essential genes, but also give clues to the functionality of essential genes in genomic islands. PMID:26223387
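
    The abstract does not say which statistical method was used. As one plausible choice, the sketch below tests for depletion of essential genes inside GIs with a one-sided hypergeometric (Fisher-type) test written in plain Python. The function name and all counts in the usage lines are hypothetical and are not data from the study.

```python
from math import comb


def hypergeom_lower_tail(k: int, total: int, essential: int, in_gi: int) -> float:
    """P(X <= k), where X is the number of essential genes among `in_gi`
    genes drawn without replacement from `total` genes, of which
    `essential` are essential. A small value suggests depletion in GIs."""
    denom = comb(total, in_gi)
    # Smallest possible number of essential genes in the drawn set.
    lo = max(0, in_gi - (total - essential))
    numer = sum(
        comb(essential, x) * comb(total - essential, in_gi - x)
        for x in range(lo, k + 1)
    )
    return numer / denom


# Hypothetical genome: 4000 genes, 300 essential, 400 genes inside GIs,
# of which 10 are essential.
p_value = hypergeom_lower_tail(k=10, total=4000, essential=300, in_gi=400)
```

    Exact integer arithmetic via `math.comb` avoids a dependency on a statistics library; for many genomes, a library routine and a multiple-testing correction would be the more practical route.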

  15. A physical map for the Amborella trichopoda genome sheds light on the evolution of angiosperm genome structure

    PubMed Central

    2011-01-01

    Background Recent phylogenetic analyses have identified Amborella trichopoda, an understory tree species endemic to the forests of New Caledonia, as sister to a clade including all other known flowering plant species. The Amborella genome is a unique reference for understanding the evolution of angiosperm genomes because it can serve as an outgroup to root comparative analyses. A physical map, BAC end sequences and sample shotgun sequences provide a first view of the 870 Mbp Amborella genome. Results Analysis of Amborella BAC ends sequenced from each contig suggests that the density of long terminal repeat retrotransposons is negatively correlated with that of protein coding genes. Syntenic, presumably ancestral, gene blocks were identified in comparisons of the Amborella BAC contigs and the sequenced Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa genomes. Parsimony mapping of the loss of synteny corroborates previous analyses suggesting that the rate of structural change has been more rapid on lineages leading to Arabidopsis and Oryza compared with lineages leading to Populus and Vitis. The gamma paleohexaploidy event identified in the Arabidopsis, Populus and Vitis genomes is shown to have occurred after the divergence of all other known angiosperms from the lineage leading to Amborella. Conclusions When placed in the context of a physical map, BAC end sequences representing just 5.4% of the Amborella genome have facilitated reconstruction of gene blocks that existed in the last common ancestor of all flowering plants. The Amborella genome is an invaluable reference for inferences concerning the ancestral angiosperm and subsequent genome evolution. PMID:21619600

  16. A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives

    PubMed Central

    Sabath, Niv; Price, Nicholas; Graur, Dan

    2009-01-01

    The Israeli acute paralysis virus (IAPV) is a honeybee-infecting virus that was found to be associated with colony collapse disorder. The IAPV genome contains two genes encoding a structural and a nonstructural polyprotein. We applied a recently developed method for the estimation of selection in overlapping genes to detect purifying selection and, hence, functionality. We provide evolutionary evidence for the existence of a functional overlapping gene, which is translated in the +1 reading frame of the structural polyprotein gene. Conserved orthologs of this putative gene, which we provisionally call pog (predicted overlapping gene), were also found in the genomes of a monophyletic clade of dicistroviruses that includes IAPV, acute bee paralysis virus, Kashmir bee virus, and Solenopsis invicta (red imported fire ant) virus 1. PMID:19761605

  17. Demarcating the gene-rich regions of the wheat genome

    PubMed Central

    Erayman, Mustafa; Sandhu, Devinder; Sidhu, Deepak; Dilbirligi, Muharrem; Baenziger, P. S.; Gill, Kulvinder S.

    2004-01-01

    By physically mapping 3025 loci including 252 phenotypically characterized genes and 17 quantitative trait loci (QTLs) relative to 334 deletion breakpoints, we localized the gene-containing fraction to 29% of the wheat genome present as 18 major and 30 minor gene-rich regions (GRRs). The GRRs varied both in gene number and density. The five largest GRRs physically spanning <3% of the genome contained 26% of the wheat genes. Approximate size of the GRRs ranged from 3 to 71 Mb. Recombination mainly occurred in the GRRs. The GRRs varied by as much as 128-fold in gene density and 140-fold in recombination rate. Except for a general suppression in 25–40% of the chromosomal region around centromeres, no correlation of recombination was observed with the gene density, the size, or chromosomal location of GRRs. More than 30% of the wheat genes are in recombination-poor regions and are thus inaccessible to map-based cloning. PMID:15240829

  18. Genomic location and characterisation of MIC genes in cattle.

    PubMed

    Birch, James; De Juan Sanjuan, Cristina; Guzman, Efrain; Ellis, Shirley A

    2008-08-01

    Major histocompatibility complex (MHC) class I chain-related (MIC) genes have been previously identified and characterised in humans. They encode polymorphic class I-like molecules that are stress-inducible, and constitute one of the ligands of the activating natural killer cell receptor NKG2D. We have identified three MIC genes within the cattle genome, located close to three non-classical MHC class I genes. The genomic position relative to other genes is very similar to the arrangement reported in the pig MHC region. Analysis of MIC cDNA sequences derived from a range of cattle cell lines suggests there may be four MIC genes in total. We have investigated the presence of the genes in distinct and well-defined MHC haplotypes, and show that one gene is consistently present, while the configuration of the other three genes appears variable.

  19. Impact of recurrent gene duplication on adaptation of plant genomes

    PubMed Central

    2014-01-01

    Background Recurrent gene duplication and retention played an important role in angiosperm genome evolution. It has been hypothesized that these processes contribute significantly to plant adaptation but so far this hypothesis has not been tested at the genome scale. Results We studied available sequenced angiosperm genomes to assess the frequency of positive selection footprints in lineage-specific expanded (LSE) gene families compared to single-copy genes using a dN/dS-based test in a phylogenetic framework. We found 5.38% of alignments in LSE genes with codons under positive selection. In contrast, we found no evidence for codons under positive selection in the single-copy reference set. An analysis at the branch level shows that purifying selection acted more strongly on single-copy genes than on LSE gene clusters. Moreover, we detect significantly more branches indicating evolution under positive selection and/or relaxed constraint in LSE genes than in single-copy genes. Conclusions In this study, to our knowledge the first at genome scale, we provide strong empirical support for the hypothesis that LSE genes fuel adaptation in angiosperms. Our conservative approach for detecting selection footprints as well as our results can be of interest for further studies on (plant) gene family evolution. PMID:24884640

  20. Structure of the human annexin VI gene

    SciTech Connect

    Smith, P.D.; Moss, S.E.; Davies, A.; Crumpton, M.J.

    1994-03-29

    The authors report the structure of the human annexin VI gene and compare the intron-exon organization with the known structures of the human annexin I and II genes. The gene is approximately 60 kbp long and contains 26 exons. Consistent with the published annexin VI cDNA sequence, the genomic sequence at the 3′ end does not contain a canonical polyadenylation signal. The genomic sequence upstream of the transcription start site contains TATAA and CAAT motifs. The spatial organization of the exons does not reveal any obvious similarities between the two halves of the annexin VI gene. Comparison of the intron-exon boundary positions of the annexin VI gene with those of annexins I and II reveals that within the repeated domains the break points are perfectly conserved except for exon 8, which is one codon smaller in annexin II. The corresponding point in the second half of annexin VI is represented by two exons, exons 20 and 21. The latter exon is alternatively spliced, giving rise to two annexin VI isoforms that differ with respect to a 6-amino acid insertion at the start of repeat 7. 32 refs., 6 figs.

  1. Analysis of pan-genome to identify the core genes and essential genes of Brucella spp.

    PubMed

    Yang, Xiaowen; Li, Yajie; Zang, Juan; Li, Yexia; Bie, Pengfei; Lu, Yanli; Wu, Qingmin

    2016-04-01

    Brucella spp. are facultative intracellular pathogens that cause a contagious zoonotic disease, which can result in abortion or sterility in susceptible animal hosts and in grave, debilitating illness in humans. To decipher the survival mechanism of Brucella spp. in vivo, 42 complete Brucella genomes from NCBI were analyzed to identify the composition and function of the pan-genome and core genome. The results showed that the 132,143 protein-coding genes in these genomes were divided into 5369 clusters. Among these, 1710 clusters were associated with the core genome, 1182 clusters with strain-specific genes and 2477 clusters with the dispensable genome. COG analysis indicated that 44% of the core genes were devoted to metabolism, mainly energy production and conversion (COG category C) and amino acid transport and metabolism (COG category E). Meanwhile, approximately 35% of the core genes were under positive selection. In addition, 1252 potential essential genes were predicted in the core genome by comparison with a prokaryote database of essential genes. The results suggested that the core genes in Brucella genomes are relatively conserved, and that energy and amino acid metabolism play a more important role in the growth and reproduction of Brucella spp. This study might help us to better understand the mechanisms of persistent Brucella infection and provide some clues for further exploring the gene modules involved in the intracellular survival of Brucella spp.
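
    The split of gene clusters into core, strain-specific and dispensable sets can be illustrated with a small sketch. It assumes the usual presence/absence definition (core: present in every genome; strain-specific: present in exactly one; dispensable: anything in between), which the abstract does not spell out. The function, cluster and genome names are hypothetical.

```python
def classify_clusters(presence: dict, n_genomes: int) -> dict:
    """Label each gene cluster by the number of genomes that contain it."""
    labels = {}
    for cluster, genomes in presence.items():
        n = len(set(genomes))  # count each genome once
        if n == n_genomes:
            labels[cluster] = "core"
        elif n == 1:
            labels[cluster] = "strain-specific"
        else:
            labels[cluster] = "dispensable"
    return labels


# Toy input (hypothetical): cluster id -> genomes in which it was found.
presence = {
    "cluster_a": ["g1", "g2", "g3"],
    "cluster_b": ["g2"],
    "cluster_c": ["g1", "g3"],
}
labels = classify_clusters(presence, n_genomes=3)
```

    Under this definition every cluster falls into exactly one class, which is consistent with the reported counts summing to the total (1710 + 1182 + 2477 = 5369).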

  2. What makes up plant genomes: The vanishing line between transposable elements and genes.

    PubMed

    Zhao, Dongyan; Ferguson, Ann A; Jiang, Ning

    2016-02-01

    The ultimate source of evolution is mutation. As the largest component in plant genomes, transposable elements (TEs) create numerous types of mutations that cannot be mimicked by other genetic mechanisms. When TEs insert into genomic sequences, they influence the expression of nearby genes as well as genes unlinked to the insertion. TEs can duplicate, mobilize, and recombine normal genes or gene fragments, with the potential to generate new genes or modify the structure of existing genes. TEs also donate their transposase coding regions for cellular functions in a process called TE domestication. Despite the host defense against TE activity, a subset of TEs survived and thrived through discreet selection of transposition activity, target site, element size, and the internal sequence. Finally, TEs have established strategies to reduce the efficacy of host defense system by increasing the cost of silencing TEs. This review discusses the recent progress in the area of plant TEs with a focus on the interaction between TEs and genes.

  3. Genome-wide identification and expression profiling of ankyrin-repeat gene family in maize.

    PubMed

    Jiang, Haiyang; Wu, Qingqing; Jin, Jing; Sheng, Lei; Yan, Hanwei; Cheng, Beijiu; Zhu, Suwen

    2013-09-01

    Members of the ankyrin repeat (ANK) gene family encode proteins with ANK domains, which are common in diverse organisms and play important roles in cell growth and development, such as cell-cell signal transduction and cell cycle regulation. Recently, genome-wide identification and evolutionary analyses of the ANK gene family have been carried out in Arabidopsis and rice. However, little is known regarding the ANK genes in the entire maize genome. In this study, we describe the identification and structural characterization of 71 ANK genes in maize (ZmANK). Comprehensive bioinformatics analyses of the ZmANK gene family were then performed, including phylogenetic, domain and motif analysis, chromosomal localization, intron/exon structural patterns, gene duplications and expression profiling. Domain composition analyses showed that ZmANK genes formed ten subfamilies. Five tandem duplications and 14 segmental duplications were identified in ZmANK genes. Furthermore, a comparative analysis of the total ANK gene family in Arabidopsis, rice and maize showed that ZmANKs were more closely paired with OsANKs than with AtANKs. Finally, expression profile analyses were performed. Forty-one ZmANK genes had EST sequence records. Semi-quantitative expression and microarray data analysis of these 41 ZmANK genes demonstrated that ZmANK genes exhibit varied expression patterns, suggesting functional diversification of the ZmANK gene family. The results provide significant insights for exploring ANK gene expression and function in future studies in maize.

  4. Molecular Assemblies, Genes and Genomics Integrated Efficiently (MAGGIE)

    SciTech Connect

    Baliga, Nitin S

    2011-05-26

    Final report on MAGGIE. We set ambitious goals to model the functions of individual organisms and their community from molecular to systems scale. These scientific goals are driving the development of sophisticated algorithms to analyze large amounts of experimental measurements made using high throughput technologies to explain and predict how the environment influences biological function at multiple scales and how the microbial systems in turn modify the environment. By experimentally evaluating predictions made using these models we will test the degree to which our quantitative multiscale understanding will help to rationally steer individual microbes and their communities towards specific tasks. Towards this end we have made substantial progress towards understanding evolution of gene families, transcriptional structures, detailed structures of keystone molecular assemblies (proteins and complexes), protein interactions, biological networks, microbial interactions, and community structure. Using comparative analysis we have tracked the evolutionary history of gene functions to understand how novel functions evolve. One level up, we have used proteomics data, high-resolution genome tiling microarrays, and 5' RNA sequencing to revise genome annotations, discover new genes including ncRNAs, and map dynamically changing operon structures of five model organisms: Desulfovibrio vulgaris Hildenborough, Pyrococcus furiosus, Sulfolobus solfataricus, Methanococcus maripaludis and Halobacterium salinarum NRC-1. We have developed machine learning algorithms to accurately identify protein interactions at a near-zero false positive rate from noisy data generated using tagless complex purification, TAP purification, and analysis of membrane complexes. Combining other genome-scale datasets produced by ENIGMA (in particular, microarray data) and available from literature we have been able to achieve a true positive rate as high as 65% at almost zero false positives when

  5. Exogenous gene integration mediated by genome editing technologies in zebrafish.

    PubMed

    Morita, Hitoshi; Taimatsu, Kiyohito; Yanagi, Kanoko; Kawahara, Atsuo

    2017-03-08

    Genome editing technologies, such as transcription activator-like effector nuclease (TALEN) and the clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas) systems, can induce DNA double-strand breaks (DSBs) at the targeted genomic locus, leading to frameshift-mediated gene disruption in the process of DSB repair. Recently, technology-induced DSBs followed by DSB repair have been applied to integrate exogenous genes into the targeted genomic locus in various model organisms. In addition to conventional knock-in technology mediated by homology-directed repair (HDR), novel knock-in technologies using refined donor vectors have been developed that rely on other DSB repair mechanisms, including non-homologous end joining (NHEJ) and microhomology-mediated end joining (MMEJ). These improved knock-in technologies should therefore make it possible to modify the genomes of model organisms freely.

  6. Structural divergence between the human and chimpanzee genomes.

    PubMed

    Kehrer-Sawatzki, Hildegard; Cooper, David N

    2007-02-01

    The structural microheterogeneity evident between the human and chimpanzee genomes is quite considerable and includes inversions and duplications as well as deletions, ranging in size from a few base-pairs up to several megabases (Mb). Insertions and deletions have together given rise to at least 150 Mb of genomic DNA sequence that is either present or absent in humans as compared to chimpanzees. Such regions often contain paralogous sequences and members of multigene families thereby ensuring that the human and chimpanzee genomes differ by a significant fraction of their gene content. There is as yet no evidence to suggest that the large chromosomal rearrangements which serve to distinguish the human and chimpanzee karyotypes have influenced either speciation or the evolution of lineage-specific traits. However, the myriad submicroscopic rearrangements in both genomes, particularly those involving copy number variation, are unlikely to represent exclusively neutral changes and hence promise to facilitate the identification of genes that have been important for human-specific evolution.

  7. Genome-wide identification and analysis of the MADS-box gene family in apple.

    PubMed

    Tian, Yi; Dong, Qinglong; Ji, Zhirui; Chi, Fumei; Cong, Peihua; Zhou, Zongshan

    2015-01-25

    The MADS-box gene family is one of the most widely studied families in plants and has diverse developmental roles in flower pattern formation, gametophyte cell division and fruit differentiation. Although the genome-wide analysis of this family has been performed in some species, little is known regarding MADS-box genes in apple (Malus domestica). In this study, 146 MADS-box genes were identified in the apple genome and were phylogenetically clustered into six subgroups (MIKC(c), MIKC*, Mα, Mβ, Mγ and Mδ) with the MADS-box genes from Arabidopsis and rice. The predicted apple MADS-box genes were distributed across all 17 chromosomes at different densities. Additionally, the MADS-box domain, exon length, gene structure and motif compositions of the apple MADS-box genes were analysed. Moreover, the expression of all of the apple MADS-box genes was analysed in the root, stem, leaf, flower tissues and five stages of fruit development. All of the apple MADS-box genes, with the exception of some genes in each group, were expressed in at least one of the tissues tested, which indicates that the MADS-box genes are involved in various aspects of the physiological and developmental processes of the apple. To the best of our knowledge, this report describes the first genome-wide analysis of the apple MADS-box gene family, and the results should provide valuable information for understanding the classification, cloning and putative functions of this family.

  8. Pinpointing disease genes through phenomic and genomic data fusion

    PubMed Central

    2015-01-01

    Background Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. Results To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology. PMID:25708473

  9. The cavefish genome reveals candidate genes for eye loss

    PubMed Central

    McGaugh, Suzanne E.; Gross, Joshua B.; Aken, Bronwen; Blin, Maryline; Borowsky, Richard; Chalopin, Domitille; Hinaux, Hélène; Jeffery, William R.; Keene, Alex; Ma, Li; Minx, Patrick; Murphy, Daniel; O’Quin, Kelly E.; Rétaux, Sylvie; Rohner, Nicolas; Searle, Steve M. J.; Stahl, Bethany A.; Tabin, Cliff; Volff, Jean-Nicolas; Yoshizawa, Masato; Warren, Wesley C.

    2014-01-01

    Natural populations subjected to strong environmental selection pressures offer a window into the genetic underpinnings of evolutionary change. Cavefish populations, Astyanax mexicanus (Teleostei: Characiphysi), exhibit repeated, independent evolution for a variety of traits including eye degeneration, pigment loss, increased size and number of taste buds and mechanosensory organs, and shifts in many behavioural traits. Surface and cave forms are interfertile, making this system amenable to genetic interrogation; however, lack of a reference genome has hampered efforts to identify genes responsible for changes in cave forms of A. mexicanus. Here we present the first de novo genome assembly for Astyanax mexicanus cavefish, contrast repeat elements to other teleost genomes, identify candidate genes underlying quantitative trait loci (QTL), and assay these candidate genes for potential functional and expression differences. We expect the cavefish genome to advance understanding of the evolutionary process, as well as of analogous human disease, including retinal dysfunction. PMID:25329095

  10. From Genomics to Gene Therapy: Induced Pluripotent Stem Cells Meet Genome Editing.

    PubMed

    Hotta, Akitsu; Yamanaka, Shinya

    2015-01-01

    The advent of induced pluripotent stem (iPS) cells has opened up numerous avenues of opportunity for cell therapy, including the initiation in September 2014 of the first human clinical trial to treat dry age-related macular degeneration. In parallel, advances in genome-editing technologies by site-specific nucleases have dramatically improved our ability to edit endogenous genomic sequences at targeted sites of interest. In fact, clinical trials have already begun to implement this technology to control HIV infection. Genome editing in iPS cells is a powerful tool and enables researchers to investigate the intricacies of the human genome in a dish. In the near future, the groundwork laid by such an approach may expand the possibilities of gene therapy for treating congenital disorders. In this review, we summarize the exciting progress being made in the utilization of genomic editing technologies in pluripotent stem cells and discuss remaining challenges toward gene therapy applications.

  11. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa, but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and retains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  12. Genomic imprinting-an epigenetic gene-regulatory model.

    PubMed

    Koerner, Martha V; Barlow, Denise P

    2010-04-01

    Epigenetic mechanisms (Box 1) are considered to play major gene-regulatory roles in development, differentiation and disease. However, the relative importance of epigenetics in defining the mammalian transcriptome in normal and disease states is unknown. The mammalian genome contains only a few model systems where epigenetic gene regulation has been shown to play a major role in transcriptional control. These model systems are important not only to investigate the biological function of known epigenetic modifications but also to identify new and unexpected epigenetic mechanisms in the mammalian genome. Here we review recent progress in understanding how epigenetic mechanisms control imprinted gene expression.

  13. LATERAL GENE TRANSFER AND THE HISTORY OF BACTERIAL GENOMES

    SciTech Connect

    Howard Ochman

    2006-02-22

    The aims of this research were to elucidate the role and extent of lateral transfer in the differentiation of bacterial strains and species, and to assess the impact of gene transfer on the evolution of bacterial genomes. The ultimate goal of the project is to examine the dynamics of a core set of protein-coding genes (i.e., those that are distributed universally among Bacteria) by developing conserved primers that would allow their amplification and sequencing in any bacterial taxon. In addition, we adopted a bioinformatic approach to elucidate the extent of lateral gene transfer in sequenced genomes.

  14. Analysis of correlation structures in the Synechocystis PCC6803 genome.

    PubMed

    Wu, Zuo-Bing

    2014-12-01

    Transfer of nucleotide strings in the Synechocystis sp. PCC6803 genome is investigated to exhibit periodic and non-periodic correlation structures by using the recurrence plot method and the phase space reconstruction technique. The periodic correlation structures are generated by periodic transfer of several substrings in long periodic or non-periodic nucleotide strings embedded in the coding regions of genes. The non-periodic correlation structures are generated by non-periodic transfer of several substrings covering or overlapping with the coding regions of genes. In the periodic and non-periodic transfer, some gaps divide the long nucleotide strings into the substrings and prevent their global transfer. Most of the gaps are either the replacement of one base or the insertion/reduction of one base. In the reconstructed phase space, the points generated from two or three steps for the continuous iterative transfer via the second maximal distance can be fitted by two lines. It partly reveals an intrinsic dynamics in the transfer of nucleotide strings. Due to the comparison of the relative positions and lengths, the substrings concerned with the non-periodic correlation structures are almost identical to the mobile elements annotated in the genome. The mobile elements are thus endowed with the basic results on the correlation structures.
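
    The recurrence plot idea can be sketched in a generic form: mark every pair of positions at which the same short nucleotide string occurs. This is an illustrative simplification and not the author's exact procedure; the window length `m`, the function name and the toy sequence are assumptions.

```python
def recurrence_points(seq: str, m: int = 3) -> list:
    """Return sorted pairs (i, j), i < j, where the length-m substrings
    starting at positions i and j are identical."""
    positions = {}
    for i in range(len(seq) - m + 1):
        positions.setdefault(seq[i:i + m], []).append(i)
    points = []
    for starts in positions.values():
        # Every pair of occurrences of the same substring is one point.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                points.append((starts[a], starts[b]))
    return sorted(points)


# Toy sequence (hypothetical): "ACGT" repeated with period 4.
points = recurrence_points("ACGTACGT", m=3)
```

    Runs of points with a constant offset `j - i` form diagonal lines in the plot, which is how a repeated (transferred) substring shows up; a single base replacement or insertion interrupts such a line, in the spirit of the gaps described in the abstract.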

  15. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers.

    PubMed

    Tamada, Yoshinori; Imoto, Seiya; Araki, Hiromitsu; Nagasaki, Masao; Print, Cristin; Charnock-Jones, D Stephen; Miyano, Satoru

    2011-01-01

    We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm in a shorter time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs, to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses were impossible to realize without our proposed algorithm.

  16. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs

    PubMed Central

    2010-01-01

    Background Widespread sampling of vertebrates, which comprise the majority of published animal mitochondrial genomes, has led to the view that mitochondrial gene rearrangements are relatively rare, and that gene orders are typically stable across major taxonomic groups. In contrast, more limited sampling within the Phylum Mollusca has revealed an unusually high number of gene order arrangements. Here we provide evidence that the lability of the molluscan mitochondrial genome extends to the family level by describing extensive gene order changes that have occurred within the Vermetidae, a family of sessile marine gastropods that radiated from a basal caenogastropod stock during the Cenozoic Era. Results Major mitochondrial gene rearrangements have occurred within this family at a scale unexpected for such an evolutionarily young group and unprecedented for any caenogastropod examined to date. We determined the complete mitochondrial genomes of four species (Dendropoma maximum, D. gregarium, Eualetes tulipa, and Thylacodes squamigerus) and the partial mitochondrial genomes of two others (Vermetus erectus and Thylaeodus sp.). Each of the six vermetid gastropods assayed possessed a unique gene order. In addition to the typical mitochondrial genome complement of 37 genes, additional tRNA genes were evident in D. gregarium (trnK) and Thylacodes squamigerus (trnV, trnLUUR). Three pseudogenes and additional tRNAs found within the genome of Thylacodes squamigerus provide evidence of a past duplication event in this taxon. Likewise, high sequence similarities between isoaccepting leucine tRNAs in Thylacodes, Eualetes, and Thylaeodus suggest that tRNA remolding has been rife within this family. While vermetids exhibit gene arrangements diagnostic of this family, they also share arrangements with littorinimorph caenogastropods, with which they have been linked based on sperm morphology and primary sequence-based phylogenies. Conclusions We have uncovered major changes in gene

  17. Genomes, diversity and resistance gene analogues in Musa species.

    PubMed

    Azhar, M; Heslop-Harrison, J S

    2008-01-01

    Resistance genes (R genes) in plants are abundant and may represent more than 1% of all the genes. Their diversity is critical to the recognition and response to attack from diverse pathogens. Like many other crops, banana and plantain face attacks from potentially devastating fungal and bacterial diseases, increased by a combination of worldwide spread of pathogens, exploitation of a small number of varieties, new pathogen mutations, and the lack of effective, benign and cheap chemical control. The challenge for plant breeders is to identify and exploit genetic resistances to diseases, which is particularly difficult in banana and plantain where the valuable cultivars are sterile, parthenocarpic and mostly triploid so conventional genetic analysis and breeding is impossible. In this paper, we review the nature of R genes and the key motifs, particularly in the Nucleotide Binding Sites (NBS), Leucine Rich Repeat (LRR) gene class. We present data about identity, nature and evolutionary diversity of the NBS domains of Musa R genes in diploid wild species with the Musa acuminata (A), M. balbisiana (B), M. schizocarpa (S), M. textilis (T), M. velutina and M. ornata genomes, and from various cultivated hybrid and triploid accessions, using PCR primers to isolate the domains from genomic DNA. Of 135 new sequences, 75% of the sequenced clones had uninterrupted open reading frames (ORFs), and phylogenetic UPGMA tree construction showed four clusters, one from Musa ornata, one largely from the B and T genomes, one from A and M. velutina, and the largest with A, B, T and S genomes. Only genes of the coiled-coil (non-TIR) class were found, typical of the grasses and presumably monocotyledons. The analysis of R genes in cultivated banana and plantain, and their wild relatives, has implications for identification and selection of resistance genes within the genus which may be useful for plant selection and breeding and also for defining relationships and genome evolution

  18. Detection of Prokaryotic Genes in the Amphimedon queenslandica Genome

    PubMed Central

    Conaco, Cecilia; Tsoulfas, Pantelis; Sakarya, Onur; Dolan, Amanda; Werren, John; Kosik, Kenneth S.

    2016-01-01

    Horizontal gene transfer (HGT) is common between prokaryotes and phagotrophic eukaryotes. In metazoans, the scale and significance of HGT remains largely unexplored but is usually linked to a close association with parasites and endosymbionts. Marine sponges (Porifera), which host many microorganisms in their tissues and lack an isolated germ line, are potential carriers of genes transferred from prokaryotes. In this study, we identified a number of potential horizontally transferred genes within the genome of the sponge, Amphimedon queenslandica. We further identified homologs of some of these genes in other sponges. The transferred genes, most of which possess catalytic activity for carbohydrate or protein metabolism, have assimilated host genome characteristics and are actively expressed. The diversity of functions contributed by the horizontally transferred genes is likely an important factor in the adaptation and evolution of A. queenslandica. These findings highlight the potential importance of HGT on the success of sponges in diverse ecological niches. PMID:26959231

  19. Beyond Genomics: Studying Evolution with Gene Coexpression Networks.

    PubMed

    Ruprecht, Colin; Vaid, Neha; Proost, Sebastian; Persson, Staffan; Mutwil, Marek

    2017-04-01

    Understanding how genomes change as organisms become more complex is a central question in evolution. Molecular evolutionary studies typically correlate the appearance of genes and gene families with the emergence of biological pathways and morphological features. While such approaches are of great importance to understand how organisms evolve, they are also limited, as functionally related genes work together in contexts of dynamic gene networks. Since functionally related genes are often transcriptionally coregulated, gene coexpression networks present a resource to study the evolution of biological pathways. In this opinion article, we discuss recent developments in this field and how coexpression analyses can be merged with existing genomic approaches to transfer functional knowledge between species to study the appearance or extension of pathways.

  20. Sequencing of 15 622 gene-bearing BACs clarifies the gene-dense regions of the barley genome.

    PubMed

    Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J

    2015-10-01

    Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but have low recombination is particularly relevant.

  1. Whole genome phylogeny of Prochlorococcus marinus group of cyanobacteria: genome alignment and overlapping gene approach.

    PubMed

    Prabha, Ratna; Singh, Dhananjaya P; Gupta, Shailendra K; Rai, Anil

    2014-06-01

    Prochlorococcus is the smallest known oxygenic phototrophic marine cyanobacterium dominating the mid-latitude oceans. Physiologically and genetically distinct P. marinus isolates from many oceans in the world were assigned to two different groups, a tightly clustered high-light (HL)-adapted clade and a divergent low-light (LL)-adapted clade. Phylogenetic analysis of this cyanobacterium on the basis of 16S rRNA and other conserved genes did not show consistency with its phenotypic behavior. We analyzed the phylogeny of this genus on the basis of complete genome sequences through genome alignment, overlapping-gene content and gene-order approaches. The phylogenetic tree of P. marinus obtained by comparing whole genome sequences, in contrast to that based on the 16S rRNA gene, corresponded well with the HL/LL ecotypic distinction of twelve strains and showed consistency with the phenotypic classification of P. marinus. Evidence for the horizontal descent and acquisition of genes within and across the genus was observed. Many genes involved in metabolic functions were found to be conserved across these genomes and many were continuously gained by different strains as per their needs during the course of their evolution. Consistency in the physiological and genetic phylogeny based on whole genome sequence is established. These observations improve our understanding of the adaptation and diversification of these organisms under evolutionary pressure.

  2. Genome engineering using a synthetic gene circuit in Bacillus subtilis.

    PubMed

    Jeong, Da-Eun; Park, Seung-Hwan; Pan, Jae-Gu; Kim, Eui-Joong; Choi, Soo-Keun

    2015-03-31

    Genome engineering without leaving foreign DNA behind requires an efficient counter-selectable marker system. Here, we developed a genome engineering method in Bacillus subtilis using a synthetic gene circuit as a counter-selectable marker system. The system contained two repressible promoters (B. subtilis xylA (Pxyl) and spac (Pspac)) and two repressor genes (lacI and xylR). Pxyl-lacI was integrated into the B. subtilis genome with a target gene containing a desired mutation. The xylR gene and the Pspac-driven chloramphenicol resistance gene (cat) were located on a helper plasmid. In the presence of xylose, repression of XylR by xylose induced LacI expression, LacI repressed the Pspac promoter, and the cells became chloramphenicol sensitive. Thus, to survive in the presence of chloramphenicol, the cell must delete Pxyl-lacI by recombination between the wild-type and mutated target genes. The recombination leads to mutation of the target gene. The remaining helper plasmid was removed easily in the absence of chloramphenicol. In this study, we showed base insertion, deletion and point mutation of the B. subtilis genome without leaving any foreign DNA behind. Additionally, we successfully deleted a 2-kb gene (amyE) and a 38-kb operon (ppsABCDE). This method will be useful to construct designer Bacillus strains for various industrial applications.
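
    The counter-selection logic described in this abstract can be read as a small boolean circuit. The sketch below is only an illustrative interpretation of the abstract, not the authors' model; the function name and its parameters are hypothetical, and all regulation is treated as strictly on/off.

```python
def survives_chloramphenicol(xylose: bool,
                             has_pxyl_laci: bool,
                             has_helper_plasmid: bool = True) -> bool:
    """Boolean sketch of the circuit: is the cat resistance gene expressed?

    Assumptions (simplified from the abstract):
      - xylR and Pspac-cat sit on the helper plasmid,
      - Pxyl-lacI sits in the genome until it is deleted by recombination.
    """
    # XylR represses Pxyl unless xylose is present.
    xylr_active = has_helper_plasmid and not xylose
    # LacI is made only if the Pxyl-lacI cassette exists and Pxyl is not repressed.
    laci_expressed = has_pxyl_laci and not xylr_active
    # LacI represses Pspac, which drives cat on the helper plasmid.
    cat_expressed = has_helper_plasmid and not laci_expressed
    return cat_expressed


if __name__ == "__main__":
    # With xylose, a cell that still carries Pxyl-lacI is chloramphenicol sensitive.
    print(survives_chloramphenicol(xylose=True, has_pxyl_laci=True))
    # After recombination removes Pxyl-lacI, the cell is resistant again.
    print(survives_chloramphenicol(xylose=True, has_pxyl_laci=False))
```

    Under these assumptions, only cells that have lost Pxyl-lacI survive selection on xylose plus chloramphenicol, which is the counter-selection step the abstract describes.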

  3. Genome-wide analysis of the MADS-box gene family in Brachypodium distachyon.

    PubMed

    Wei, Bo; Zhang, Rong-Zhi; Guo, Juan-Juan; Liu, Dan-Mei; Li, Ai-Li; Fan, Ren-Chun; Mao, Long; Zhang, Xiang-Qi

    2014-01-01

    MADS-box genes are important transcription factors for plant development, especially floral organogenesis. Brachypodium distachyon is a model for biofuel plants and temperate grasses such as wheat and barley, but a comprehensive analysis of MADS-box family proteins in Brachypodium is still missing. We report here a genome-wide analysis of the MADS-box gene family in Brachypodium distachyon. We identified 57 MADS-box genes and classified them into 32 MIKC(c)-type, 7 MIKC*-type, 9 Mα, 7 Mβ and 2 Mγ MADS-box genes according to their phylogenetic relationships to the Arabidopsis and rice MADS-box genes. Detailed gene structure and motif distribution were then studied. Investigation of their chromosomal localizations revealed that Brachypodium MADS-box genes were distributed evenly across five chromosomes. In addition, five pairs of type II MADS-box genes were found on synteny blocks derived from whole genome duplication. We then performed a systematic expression analysis of Brachypodium MADS-box genes in various tissues, particularly floral organs. Further detection under salt, drought, and low-temperature conditions showed that some MADS-box genes, including type I genes, may also be involved in abiotic stress responses. Comparative studies of MADS-box genes among Brachypodium, rice and Arabidopsis showed that Brachypodium had fewer gene duplication events. Taken together, this work provides useful data for further functional studies of MADS-box genes in Brachypodium distachyon.

  4. Mechanisms underlying structural variant formation in genomic disorders

    PubMed Central

    Carvalho, Claudia M. B.; Lupski, James R.

    2016-01-01

    With the recent burst of technological developments in genomics, and the clinical implementation of genome-wide assays, our understanding of the molecular basis of genomic disorders, specifically the contribution of structural variation to disease burden, is evolving quickly. Ongoing studies have revealed a ubiquitous role for genome architecture in the formation of structural variants at a given locus, both in DNA recombination-based processes and in replication-based processes. These reports showcase the influence of repeat sequences on genomic stability and structural variant complexity and also highlight the tremendous plasticity and dynamic nature of our genome in evolution, health and disease susceptibility. PMID:26924765

  5. Structural Genomics of Bacterial Virulence Factors

    DTIC Science & Technology

    2006-05-01

    involved in these processes. The large G+C content difference between orf6, orf7 and orf8 (35%), and other Bacteroides genes (42%) suggests a...initiating assembly of the central spindle, a structure that has important roles in cytokinesis. In C. elegans embryos and other animal cells, central...D. Read, T. Popovic, and C. M. Fraser. 2004. Identification of anthrax toxin genes in a Bacillus cereus associated with an illness resembling

  6. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome

    PubMed Central

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-01-01

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives. PMID:26658305

  7. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    PubMed

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  8. The evolution of chloroplast genome structure in ferns.

    PubMed

    Wolf, Paul G; Roper, Jessie M; Duffy, Aaron M

    2010-09-01

    The plastid genome (plastome) is a rich source of phylogenetic and other comparative data in plants. Most land plants possess a plastome of similar structure. However, in a major group of plants, the ferns, a unique plastome structure has evolved. The gene order in ferns has been explained by a series of genomic inversions relative to the plastome organization of seed plants. Here, we examine for the first time the structure of the plastome across fern phylogeny. We used a PCR-based strategy to map and partially sequence plastomes. We found that a pair of partially overlapping inversions in the region of the inverted repeat occurred in the common ancestor of most ferns. However, the ancestral (seed plant) structure is still found in early diverging branches leading to the osmundoid and filmy fern lineages. We found that a second pair of overlapping inversions occurred on a branch leading to the core leptosporangiates. We also found that the unique placement of the gene matK in ferns (lacking a flanking intron) is not a result of a large-scale inversion, as previously thought. This is because the intron loss maps to an earlier point on the phylogeny than the nearby inversion. We speculate on why inversions may occur in pairs and what this may mean for the dynamics of plastome evolution.

  9. Genome-wide identification and analysis of the MADS-box gene family in sesame.

    PubMed

    Wei, Xin; Wang, Linhai; Yu, Jingyin; Zhang, Yanxin; Li, Donghua; Zhang, Xiurong

    2015-09-10

    MADS-box genes encode transcription factors that play crucial roles in plant growth and development. Sesame (Sesamum indicum L.) is an oil crop that contributes to the daily oil and protein requirements of almost half of the world's population; therefore, a genome-wide analysis of the MADS-box gene family is needed. Fifty-seven MADS-box genes were identified from 14 linkage groups of the sesame genome. Analysis of phylogenetic relationships with Arabidopsis thaliana, Utricularia gibba and Solanum lycopersicum MADS-box genes was performed. Sesame MADS-box genes were clustered into four groups: 28 MIKC(c)-type, 5 MIKC*-type, 14 Mα-type and 10 Mγ-type. Gene structure analysis revealed that sesame MADS-box genes contain from 1 to 22 exons. The number of exons in type II MADS-box genes greatly exceeded the number in type I genes. Motif distribution analysis of sesame MADS-box genes also indicated that type II MADS-box genes contained more motifs than type I genes. These results suggested that type II sesame MADS-box genes had more complex structures. By analyzing expression profiles of MADS-box genes in seven sesame transcriptomes, we determined that MIKC(c)-type MADS-box genes played significant roles in sesame flower and seed development. Although most MADS-box genes in the same clade showed similar expression features, some gene functions were diversified from the orthologous Arabidopsis genes. This research will contribute to uncovering the role of MADS-box genes in sesame development.

  10. Features of Arabidopsis genes and genome discovered using full-length cDNAs.

    PubMed

    Alexandrov, Nickolai N; Troukhan, Maxim E; Brover, Vyacheslav V; Tatarinova, Tatiana; Flavell, Richard B; Feldmann, Kenneth A

    2006-01-01

    Arabidopsis is currently the reference genome for higher plants. A new, more detailed statistical analysis of Arabidopsis gene structure is presented including intron and exon lengths, intergenic distances, features of promoters, and variant 5'-ends of mRNAs transcribed from the same transcription unit. We also provide a statistical characterization of Arabidopsis transcripts in terms of their size, UTR lengths, 3'-end cleavage sites, splicing variants, and coding potential. These analyses were facilitated by scrutiny of our collection of sequenced full-length cDNAs and much larger collection of 5'-ESTs, together with another set of full-length cDNAs from Salk/Stanford/Plant Gene Expression Center/RIKEN. Examples of alternative splicing are observed for transcripts from 7% of the genes and many of these genes display multiple spliced isoforms. Most splicing variants lie in non-coding regions of the transcripts. Non-canonical splice sites constitute less than 1% of all splice sites. Genes with fewer than four introns display reduced average mRNA levels. Putative alternative transcription start sites were observed in 30% of highly expressed genes and in more than 50% of the genes with low expression. Transcription start sites correlate remarkably well with a CG skew peak in the DNA sequences. The intergenic distances vary considerably, those where genes are transcribed towards one another being significantly shorter. New transcripts, missing in the current TIGR genome annotation and ESTs that are non-coding, including those antisense to known genes, are derived and cataloged in the Supplementary Material. They identify 148 new loci in the Arabidopsis genome. The conclusions drawn provide a better understanding of the Arabidopsis genome and how the gene transcripts are processed. 
The results also allow better predictions to be made for, as yet, poorly defined genes and provide a reference for comparisons with other plant genomes whose complete sequences are currently

  11. Gene Space Dynamics During the Evolution of Aegilops tauschii, Brachypodium distachyon, Oryza sativa, and Sorghum bicolor Genomes

    PubMed Central

    Massa, A. N.; Wanjugi, H.; Deal, K. R.; O'Brien, K.; You, F. M.; Maiti, R.; Chan, A. P.; Gu, Y. Q.; Luo, M. C.; Anderson, O. D.; Rabinowicz, P. D.; Dvorak, J.; Devos, K. M.

    2011-01-01

    Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using the Sanger sequencing technology and compared with orthologous Brachypodium distachyon, Oryza sativa (rice), and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number in each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae, and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to the expectations, the level of synteny between the phylogenetically more related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the less related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size. Because the overall genome size is to a large extent a function of the repeated
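
    The gene-number figures quoted in this abstract can be cross-checked with simple arithmetic. The snippet below is a minimal sketch using only the numbers stated above; the variable and function names are hypothetical, and the calculation is a consistency check, not the authors' estimation procedure.

```python
# Figures quoted in the abstract (Brachypodieae/Triticeae common ancestor).
ANCESTOR_GENES = 28_558
LOSS_IN_B_DISTACHYON = 3_026   # net reduction relative to the ancestor
GAIN_IN_AE_TAUSCHII = 7_813    # net increase relative to the ancestor


def extant_gene_number(ancestral: int, net_change: int) -> int:
    """Ancestral gene number plus the net change along one lineage."""
    return ancestral + net_change


b_distachyon = extant_gene_number(ANCESTOR_GENES, -LOSS_IN_B_DISTACHYON)
ae_tauschii = extant_gene_number(ANCESTOR_GENES, GAIN_IN_AE_TAUSCHII)

print(b_distachyon)  # 25532
print(ae_tauschii)   # 36371
```

    The second value reproduces the 36,371 genes that the abstract gives for the extant Ae. tauschii genome, so the quoted ancestral number and net changes are mutually consistent.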

  12. Gene space dynamics during the evolution of Aegilops tauschii, Brachypodium distachyon, Oryza sativa, and Sorghum bicolor genomes.

    PubMed

    Massa, A N; Wanjugi, H; Deal, K R; O'Brien, K; You, F M; Maiti, R; Chan, A P; Gu, Y Q; Luo, M C; Anderson, O D; Rabinowicz, P D; Dvorak, J; Devos, K M

    2011-09-01

    Nine different regions totaling 9.7 Mb of the 4.02 Gb Aegilops tauschii genome were sequenced using the Sanger sequencing technology and compared with orthologous Brachypodium distachyon, Oryza sativa (rice), and Sorghum bicolor (sorghum) genomic sequences. The ancestral gene content in these regions was inferred and used to estimate gene deletion and gene duplication rates along each branch of the phylogenetic tree relating the four species. The total gene number in the extant Ae. tauschii genome was estimated to be 36,371. The gene deletion and gene duplication rates and total gene numbers in the four genomes were used to estimate the total gene number in each node of the phylogenetic tree. The common ancestor of the Brachypodieae and Triticeae lineages was estimated to have had 28,558 genes, and the common ancestor of the Panicoideae, Ehrhartoideae, and Pooideae subfamilies was estimated to have had 27,152 or 28,350 genes, depending on the ancestral gene scenario. Relative to the Brachypodieae and Triticeae common ancestor, the gene number was reduced in B. distachyon by 3,026 genes and increased in Ae. tauschii by 7,813 genes. The sum of gene deletion and gene duplication rates, which reflects the rate of gene synteny loss, was correlated with the rate of structural chromosome rearrangements and was highest in the Ae. tauschii lineage and lowest in the rice lineage. The high rate of gene space evolution in the Ae. tauschii lineage accounts for the fact that, contrary to the expectations, the level of synteny between the phylogenetically more related Ae. tauschii and B. distachyon genomes is similar to the level of synteny between the Ae. tauschii genome and the genomes of the less related rice and sorghum. The ratio of gene duplication to gene deletion rates in these four grass species closely parallels both the total number of genes in a species and the overall genome size. Because the overall genome size is to a large extent a function of the repeated

  13. Genome-wide investigation and transcriptome analysis of the WRKY gene family in Gossypium.

    PubMed

    Ding, Mingquan; Chen, Jiadong; Jiang, Yurong; Lin, Lifeng; Cao, YueFen; Wang, Minhua; Zhang, Yuting; Rong, Junkang; Ye, Wuwei

    2015-02-01

    WRKY transcription factors play important roles in various stress responses in diverse plant species. In cotton, this family has not been well studied, especially in relation to fiber development. Here, the genomes and transcriptomes of Gossypium raimondii and Gossypium arboreum were investigated to identify fiber development related WRKY genes. This represents the first comprehensive comparative study of WRKY transcription factors in both diploid A and D cotton species. In total, 112 G. raimondii and 109 G. arboreum WRKY genes were identified. No significant gene structure or domain alterations were detected between the two species, but many SNPs were distributed unequally in exon and intron regions. Physical mapping revealed that the WRKY genes in G. arboreum were not located in the corresponding chromosomes of G. raimondii, suggesting great chromosome rearrangement in the diploid cotton genomes. The cotton WRKY genes, especially subgroups I and II, have expanded through multiple whole genome duplications and tandem duplications compared with other plant species. Sequence comparison showed many functionally divergent sites between WRKY subgroups, while the genes within each group are under strong purifying selection. Transcriptome analysis suggested that many WRKY genes participate in specific fiber development processes such as fiber initiation, elongation and maturation, with different expression patterns between species. Complex WRKY gene expression, such as differential Dt and At allelic gene expression in G. hirsutum and alternative splicing events, was also observed in both diploid and tetraploid cottons during the fiber development process. In conclusion, this study provides important information on the evolution and function of the WRKY gene family in cotton species.

  14. Comparative genomic analysis of Drosophila melanogaster and vector mosquito developmental genes.

    PubMed

    Behura, Susanta K; Haugen, Morgan; Flannery, Ellen; Sarro, Joseph; Tessier, Charles R; Severson, David W; Duman-Scheel, Molly

    2011-01-01

    Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that 1) are components of developmental signaling pathways, 2) regulate fundamental developmental processes, 3) are critical for the development of tissues of vector importance, 4) function in developmental processes known to have diverged within insects, and 5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.

  15. Genome instability mechanisms and the structure of cancer genomes.

    PubMed

    Cassidy, Liam D; Venkitaraman, Ashok R

    2012-02-01

    Genomic instability is a hallmark of cancer cells, and arises from the aberrations that these cells exhibit in the normal biological mechanisms that repair and replicate the genome, or ensure its accurate segregation during cell division. Increasingly detailed descriptions of cancer genomes have begun to emerge from next-generation sequencing (NGS), providing snapshots of their nature and heterogeneity in different cancers at different stages in their evolution. Here, we attempt to extract from these sequencing studies insights into the role of genome instability mechanisms in carcinogenesis, and to identify challenges impeding further progress.

  16. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

    PubMed

    Lindblad-Toh, Kerstin; Wade, Claire M; Mikkelsen, Tarjei S; Karlsson, Elinor K; Jaffe, David B; Kamal, Michael; Clamp, Michele; Chang, Jean L; Kulbokas, Edward J; Zody, Michael C; Mauceli, Evan; Xie, Xiaohui; Breen, Matthew; Wayne, Robert K; Ostrander, Elaine A; Ponting, Chris P; Galibert, Francis; Smith, Douglas R; DeJong, Pieter J; Kirkness, Ewen; Alvarez, Pablo; Biagi, Tara; Brockman, William; Butler, Jonathan; Chin, Chee-Wye; Cook, April; Cuff, James; Daly, Mark J; DeCaprio, David; Gnerre, Sante; Grabherr, Manfred; Kellis, Manolis; Kleber, Michael; Bardeleben, Carolyne; Goodstadt, Leo; Heger, Andreas; Hitte, Christophe; Kim, Lisa; Koepfli, Klaus-Peter; Parker, Heidi G; Pollinger, John P; Searle, Stephen M J; Sutter, Nathan B; Thomas, Rachael; Webber, Caleb; Baldwin, Jennifer; Abebe, Adal; Abouelleil, Amr; Aftuck, Lynne; Ait-Zahra, Mostafa; Aldredge, Tyler; Allen, Nicole; An, Peter; Anderson, Scott; Antoine, Claudel; Arachchi, Harindra; Aslam, Ali; Ayotte, Laura; Bachantsang, Pasang; Barry, Andrew; Bayul, Tashi; Benamara, Mostafa; Berlin, Aaron; Bessette, Daniel; Blitshteyn, Berta; Bloom, Toby; Blye, Jason; Boguslavskiy, Leonid; Bonnet, Claude; Boukhgalter, Boris; Brown, Adam; Cahill, Patrick; Calixte, Nadia; Camarata, Jody; Cheshatsang, Yama; Chu, Jeffrey; Citroen, Mieke; Collymore, Alville; Cooke, Patrick; Dawoe, Tenzin; Daza, Riza; Decktor, Karin; DeGray, Stuart; Dhargay, Norbu; Dooley, Kimberly; Dooley, Kathleen; Dorje, Passang; Dorjee, Kunsang; Dorris, Lester; Duffey, Noah; Dupes, Alan; Egbiremolen, Osebhajajeme; Elong, Richard; Falk, Jill; Farina, Abderrahim; Faro, Susan; Ferguson, Diallo; Ferreira, Patricia; Fisher, Sheila; FitzGerald, Mike; Foley, Karen; Foley, Chelsea; Franke, Alicia; Friedrich, Dennis; Gage, Diane; Garber, Manuel; Gearin, Gary; Giannoukos, Georgia; Goode, Tina; Goyette, Audra; Graham, Joseph; Grandbois, Edward; Gyaltsen, Kunsang; Hafez, Nabil; Hagopian, Daniel; Hagos, Birhane; Hall, Jennifer; Healy, Claire; Hegarty, Ryan; Honan, 
Tracey; Horn, Andrea; Houde, Nathan; Hughes, Leanne; Hunnicutt, Leigh; Husby, M; Jester, Benjamin; Jones, Charlien; Kamat, Asha; Kanga, Ben; Kells, Cristyn; Khazanovich, Dmitry; Kieu, Alix Chinh; Kisner, Peter; Kumar, Mayank; Lance, Krista; Landers, Thomas; Lara, Marcia; Lee, William; Leger, Jean-Pierre; Lennon, Niall; Leuper, Lisa; LeVine, Sarah; Liu, Jinlei; Liu, Xiaohong; Lokyitsang, Yeshi; Lokyitsang, Tashi; Lui, Annie; Macdonald, Jan; Major, John; Marabella, Richard; Maru, Kebede; Matthews, Charles; McDonough, Susan; Mehta, Teena; Meldrim, James; Melnikov, Alexandre; Meneus, Louis; Mihalev, Atanas; Mihova, Tanya; Miller, Karen; Mittelman, Rachel; Mlenga, Valentine; Mulrain, Leonidas; Munson, Glen; Navidi, Adam; Naylor, Jerome; Nguyen, Tuyen; Nguyen, Nga; Nguyen, Cindy; Nguyen, Thu; Nicol, Robert; Norbu, Nyima; Norbu, Choe; Novod, Nathaniel; Nyima, Tenchoe; Olandt, Peter; O'Neill, Barry; O'Neill, Keith; Osman, Sahal; Oyono, Lucien; Patti, Christopher; Perrin, Danielle; Phunkhang, Pema; Pierre, Fritz; Priest, Margaret; Rachupka, Anthony; Raghuraman, Sujaa; Rameau, Rayale; Ray, Verneda; Raymond, Christina; Rege, Filip; Rise, Cecil; Rogers, Julie; Rogov, Peter; Sahalie, Julie; Settipalli, Sampath; Sharpe, Theodore; Shea, Terrance; Sheehan, Mechele; Sherpa, Ngawang; Shi, Jianying; Shih, Diana; Sloan, Jessie; Smith, Cherylyn; Sparrow, Todd; Stalker, John; Stange-Thomann, Nicole; Stavropoulos, Sharon; Stone, Catherine; Stone, Sabrina; Sykes, Sean; Tchuinga, Pierre; Tenzing, Pema; Tesfaye, Senait; Thoulutsang, Dawa; Thoulutsang, Yama; Topham, Kerri; Topping, Ira; Tsamla, Tsamla; Vassiliev, Helen; Venkataraman, Vijay; Vo, Andy; Wangchuk, Tsering; Wangdi, Tsering; Weiand, Michael; Wilkinson, Jane; Wilson, Adam; Yadav, Shailendra; Yang, Shuli; Yang, Xiaoping; Young, Geneva; Yu, Qing; Zainoun, Joanne; Zembek, Lisa; Zimmer, Andrew; Lander, Eric S

    2005-12-08

    Here we report a high-quality draft genome sequence of the domestic dog (Canis familiaris), together with a dense map of single nucleotide polymorphisms (SNPs) across breeds. The dog is of particular interest because it provides important evolutionary information and because existing breeds show great phenotypic diversity for morphological, physiological and behavioural traits. We use sequence comparison with the primate and rodent lineages to shed light on the structure and evolution of genomes and genes. Notably, the majority of the most highly conserved non-coding sequences in mammalian genomes are clustered near a small subset of genes with important roles in development. Analysis of SNPs reveals long-range haplotypes across the entire dog genome, and defines the nature of genetic diversity within and across breeds. The current SNP map now makes it possible for genome-wide association studies to identify genes responsible for diseases and traits, with important consequences for human and companion animal health.

  17. X:Map: annotation and visualization of genome structure for Affymetrix exon array analysis

    PubMed Central

    Yates, Tim; Okoniewski, Michał J.; Miller, Crispin J.

    2008-01-01

    Affymetrix exon arrays aim to target every known and predicted exon in the human, mouse or rat genomes, and have reporters that extend beyond protein coding regions to other areas of the transcribed genome. This combination of increased coverage and precision is important because a substantial proportion of protein coding genes are predicted to be alternatively spliced, and because many non-coding genes are known also to be of biological significance. In order to fully exploit these arrays, it is necessary to associate each reporter on the array with the features of the genome it is targeting, and to relate these to gene and genome structure. X:Map is a genome annotation database that provides this information. Data can be browsed using a novel Google-maps based interface, and analysed and further visualized through an associated BioConductor package. The database can be found at http://xmap.picr.man.ac.uk. PMID:17932061

  18. Study of a chimeric foot-and-mouth disease virus DNA vaccine containing structural genes of serotype O in a genome backbone of serotype Asia 1 in guinea pigs.

    PubMed

    Chockalingam, A K; Thiyagarajan, S; Govindasamy, N; Patnaikuni, R; Garlapati, S; Golla, R R; Joyappa, D H; Krishnamshetty, P; Veluvarti, V V S; Veluvati, V V S

    2010-01-01

    Since foot-and-mouth disease virus (FMDV) serotypes display a great genetic and antigenic diversity, there is a constant requirement to monitor the performance of FMDV vaccines in the field with respect to their antigenic coverage. To avoid possible antigenic changes in field FMDV isolates during their adaptation to BHK-21 cells, a standard step used in production of conventional FMDV vaccines, the custom-made chimeric conventional or DNA vaccines, in which antigenic determinants are replaced with those of appropriate field strains, should be constructed. Using this approach, we made a plasmid-based chimeric FMDV DNA vaccine containing structural genes of serotype O in the genome backbone of serotype Asia 1, all under the control of Human cytomegalovirus (HCMV) immediate early gene promoter. BHK-21 cells transfected with the chimeric DNA vaccine did not show cytopathic effect (CPE), but expressed virus-specific proteins as demonstrated by 35S-methionine labeling and immunoprecipitation. Guinea pigs immunized with the chimeric DNA vaccine produced virus-specific antibodies assayed by ELISA and virus neutralization test (VNT), respectively. The chimeric DNA vaccine showed a partial protection of guinea pigs challenged with the virulent FMDV. Although the chimeric DNA vaccine, in general, was not as effective as a conventional one, this study encourages further work towards the development of genetically engineered custom-made chimeric vaccines against FMDV.

  19. Expression and genomic structure of the dormancy-associated MADS box genes MADS13 in Japanese pears (Pyrus pyrifolia Nakai) that differ in their chilling requirement for endodormancy release.

    PubMed

    Saito, Takanori; Bai, Songling; Ito, Akiko; Sakamoto, Daisuke; Saito, Toshihiro; Ubi, Banjamin Ewa; Imai, Tsuyoshi; Moriguchi, Takaya

    2013-06-01

    We isolated three dormancy-associated MADS-box (DAM) genes (MADS13-1, MADS13-2 and MADS13-3) and showed regulated expression concomitant with endodormancy establishment and release in the leaf buds of Japanese pear 'Kosui'. Comparative analysis between 'Kosui' and Taiwanese pear TP-85-119 ('Hengshanli'), a less dormant pear cultivar, showed reduction of MADS13-1 expression level in 'Hengshanli' earlier than in 'Kosui' towards endodormancy release, suggesting the possible relationship between chilling requirement and MADS13-1 expression. Application of hydrogen cyanamide accelerated endodormancy release with a reduction in MADS13 expression, whereas heat treatment in autumn inhibited endodormancy establishment without induction of MADS13 expression, indicating a close relationship between the MADS13 expression pattern and endodormancy phase transitions. Moreover, both the cis-acting regulatory elements and the methylation status in the 5' upstream region of the MADS13-1 gene were not largely different between 'Kosui' and 'Hengshanli'. Genomic structures of MADS13-1 from 'Kosui' and 'Hengshanli' revealed a 3218 bp insertion in the first intron of 'Hengshanli' that might be ascribed to the lower expression of MADS13-1tw; however, this insertion was also found in pear genotypes with a high chilling requirement. These results indicated that the low expression of MADS13-1 in 'Hengshanli' towards endodormancy release could not be explained by the identified cis-acting regulatory elements, the methylation status of the putative promoter or by intron insertion.

  20. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. Availability A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924

  1. Integrated genome-wide analysis of genomic changes and gene regulation in human adrenocortical tissue samples.

    PubMed

    Gara, Sudheer Kumar; Wang, Yonghong; Patel, Dhaval; Liu-Chittenden, Yi; Jain, Meenu; Boufraqech, Myriem; Zhang, Lisa; Meltzer, Paul S; Kebebew, Electron

    2015-10-30

    To gain insight into the pathogenesis of adrenocortical carcinoma (ACC) and whether there is progression from normal-to-adenoma-to-carcinoma, we performed genome-wide gene expression, gene methylation, microRNA expression and comparative genomic hybridization (CGH) analysis in human adrenocortical tissue (normal, adrenocortical adenomas and ACC) samples. A pairwise comparison of normal, adrenocortical adenoma and ACC gene expression profiles, using more than four-fold expression differences and an adjusted P-value < 0.05, revealed no major differences in normal versus adrenocortical adenoma, whereas 808 and 1085 genes were dysregulated in ACC versus adrenocortical adenoma and ACC versus normal, respectively. The majority of the dysregulated genes in ACC were downregulated. By integrating the CGH, gene methylation and expression profiles of potential miRNAs with the gene expression of dysregulated genes, we found that there are more alterations in ACC versus normal than in ACC versus adrenocortical adenoma. Importantly, we identified several novel molecular pathways that are associated with dysregulated genes and further experimentally validated that oncostatin m signaling induces caspase 3 dependent apoptosis and suppresses cell proliferation. Finally, we propose that there is a higher number of genomic changes from normal-to-adenoma-to-carcinoma and identified oncostatin m signaling as a plausible druggable pathway for therapeutics.
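
    The selection rule quoted in this abstract (more than four-fold expression difference and adjusted P-value < 0.05) can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name, the gene names and all expression values below are hypothetical, and the record does not say how the authors actually computed fold changes or adjusted P-values.

```python
import math


def is_dysregulated(expr_a, expr_b, adj_p, min_fold=4.0, alpha=0.05):
    """Return True when expression differs by more than `min_fold`
    in either direction and the adjusted P-value is below `alpha`.

    Hypothetical helper; thresholds default to those quoted in the abstract.
    """
    log2_fc = math.log2(expr_a / expr_b)
    return abs(log2_fc) > math.log2(min_fold) and adj_p < alpha


# Made-up values: (expression in group A, expression in group B, adjusted P)
genes = {
    "geneA": (800.0, 100.0, 0.001),  # 8-fold up, significant
    "geneB": (100.0, 500.0, 0.01),   # 5-fold down, significant
    "geneC": (300.0, 100.0, 0.001),  # 3-fold: below the fold threshold
    "geneD": (900.0, 100.0, 0.20),   # 9-fold, but not significant
}

hits = sorted(name for name, (a, b, p) in genes.items()
              if is_dysregulated(a, b, p))
print(hits)
```

    Working on the log2 scale treats up- and downregulation symmetrically, which matters here because the abstract reports that most dysregulated genes in ACC were downregulated.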

  2. Draft genome sequence of an Acinetobacter genomic species 3 strain harboring a bla(NDM-1) gene.

    PubMed

    Chen, Yong; Cui, Yujun; Pu, Fei; Jiang, Guoqin; Zhao, Xiangna; Yuan, Yanting; Zhao, Wei; Li, Dongfang; Liu, Hui; Li, Yin; Liang, Ting; Xu, Li; Wang, Yan; Song, Qing; Yang, Jiyong; Liang, Long; Yang, Ruifu; Han, Li; Song, Yajun

    2012-01-01

    Here we report the draft genome sequence of one Acinetobacter genomic species 3 strain, D499, which harbors the bla(NDM-1) gene. The total length of the assembled genome is 4,103,824 bp, and 3,896 coding sequences (CDSs) were predicted within the genome. A previously unreported bla(NDM-1)-bearing plasmid was identified in this strain.

  3. Genomic structures and characterization of the 5'-flanking regions of acyl carrier protein and Delta4-palmitoyl-ACP desaturase genes from Coriandrum sativum.

    PubMed

    Kim, Mi Jung; Shin, Jeong Sheop; Kim, Jeong-Kook; Suh, Mi Chung

    2005-09-25

    The seed-specific or seed-predominant promoters of acyl carrier protein (Cs-ACP1) and Delta4-palmitoyl-acyl carrier protein desaturase (Cs-4PAD) genes, which are involved in the biosynthesis of petroselinic acid, were isolated from coriander (Coriandrum sativum) and analyzed in coriander endosperms and transgenic Arabidopsis. The expression of Cs-ACP1 and Cs-4PAD genes was coordinately regulated during seed development.

  4. Genome Sequencing Fishes out Longevity Genes.

    PubMed

    Lakhina, Vanisha; Murphy, Coleen T

    2015-12-03

    Understanding the molecular basis underlying aging is critical if we are to fully understand how and why we age, and possibly how to delay the aging process. Up until now, most longevity pathways were discovered in invertebrates because of their short lifespans and availability of genetic tools. Now, Reichwald et al. and Valenzano et al. independently provide a reference genome for the short-lived African turquoise killifish, establishing its role as a vertebrate system for aging research.

  5. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216.

    PubMed

    Franco, Mario E E; López, Silvina; Medina, Rocio; Saparrat, Mario C N; Balatti, Pedro

    2015-09-24

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation.

  6. Draft Genome Sequence and Gene Annotation of Stemphylium lycopersici Strain CIDEFI-216

    PubMed Central

    Franco, Mario E. E.; López, Silvina; Medina, Rocio; Saparrat, Mario C. N.

    2015-01-01

    Stemphylium lycopersici is a plant-pathogenic fungus that is widely distributed throughout the world. In tomatoes, it is one of the etiological agents of gray leaf spot disease. Here, we report the first draft genome sequence of S. lycopersici, including its gene structure and functional annotation. PMID:26404600

  7. Genomic organization of SLC3A1, a transporter gene mutated in cystinuria

    SciTech Connect

    Pras, E.; Sood, R.; Raben, N.

    1996-08-15

    The SLC3A1 gene encodes a transport protein for cystine and the dibasic amino acids. Recently mutations in this gene have been shown to cause cystinuria. We report the genomic structure and organization of SLC3A1, which is composed of 10 exons and spans nearly 45 kb. Until now screening for mutations in SLC3A1 has been based on RT-PCR amplification of illegitimate mRNA transcripts from white blood cells. In this report we provide primers for amplification of exons from genomic DNA, thus simplifying the process of screening for SLC3A1 mutations in cystinuria. 20 refs., 3 figs., 2 tabs.

  8. Identification of the major structural and nonstructural proteins encoded by human parvovirus B19 and mapping of their genes by procaryotic expression of isolated genomic fragments

    SciTech Connect

    Cotmore, S.F.; McKie, V.C.; Anderson, L.J.; Astell, C.R.; Tattersall, P.

    1986-11-01

    Plasma from a child with homozygous sickle-cell disease, sampled during the early phase of an aplastic crisis, contained human parvovirus B19 virions. Plasma taken 10 days later (during the convalescent phase) contained both immunoglobulin M and immunoglobulin G antibodies directed against two viral polypeptides with apparent molecular weights of 83,000 and 58,000 which were present exclusively in the particulate fraction of the plasma taken during the acute phase. These two protein species comigrated at 110S on neutral sucrose velocity gradients with the B19 viral DNA and thus appear to constitute the viral capsid polypeptides. The B19 genome was molecularly cloned into a bacterial plasmid vector. Two expression constructs containing B19 sequences from different halves of the viral genome were obtained, which directed the synthesis, in bacteria, of segments of virally encoded protein. These polypeptide fragments were then purified and used to immunize rabbits. Antibodies against a protein sequence specified between nucleotides 2897 and 3749 recognized both the 83- and 58-kilodalton capsid polypeptides in aplastic plasma taken during the acute phase and detected similar proteins in the tissues of a stillborn fetus which had been infected transplacentally with B19. Antibodies against a protein sequence encoded in the other half of the B19 genome (nucleotides 1072 through 2044) did not react specifically with any protein in plasma taken during the acute phase but recognized three nonstructural polypeptides of 71, 63, and 52 kilodaltons present in the liver and, at lower levels, in some other tissues of the transplacentally infected fetus.

  9. Simplified DGS procedure for large-scale genome structural study.

    PubMed

    Jung, Yong-Chul; Xu, Jia; Chen, Jun; Kim, Yeong; Winchester, David; Wang, San Ming

    2009-11-01

    Ditag genome scanning (DGS) uses next-generation DNA sequencing to sequence the ends of ditag fragments produced by restriction enzymes. These sequences are compared to known genome sequences to determine their structure. In order to use DGS for large-scale genome structural studies, we have substantially revised the original protocol by replacing the in vivo genomic DNA cloning with in vitro adaptor ligation, eliminating the ditag concatemerization steps, and replacing the 454 sequencer with Solexa or SOLiD sequencers for ditag sequence collection. This revised protocol further increases genome coverage and resolution and allows DGS to be used to analyze multiple genomes simultaneously.

  10. A genome-wide 20 K citrus microarray for gene expression analysis

    PubMed Central

    Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose

    2008-01-01

    Background Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In recent years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results We have designed and constructed a publicly available genome-wide cDNA microarray that includes 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database [1] was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, like general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. Conclusion This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global studies in citrus by using it to

  11. yrGATE: a web-based gene-structure annotation tool for the identification and dissemination of eukaryotic genes.

    PubMed

    Wilkerson, Matthew D; Schlueter, Shannon D; Brendel, Volker

    2006-01-01

    Your Gene structure Annotation Tool for Eukaryotes (yrGATE) provides an Annotation Tool and Community Utilities for worldwide web-based community genome and gene annotation. Annotators can evaluate gene structure evidence derived from multiple sources to create gene structure annotations. Administrators regulate the acceptance of annotations into published gene sets. yrGATE is designed to facilitate rapid and accurate annotation of emerging genomes as well as to confirm, refine, or correct currently published annotations. yrGATE is highly portable and supports different standard input and output formats. The yrGATE software and usage cases are available at http://www.plantgdb.org/prj/yrGATE.

  12. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
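
    The N50 statistic mentioned in this abstract can be illustrated with a short sketch. It uses the common definition (the largest contig length L such that contigs of length at least L together cover at least half of the total assembly length); the record does not state which tool or exact convention the authors used, and the contig lengths below are invented for illustration. The last lines redo the coverage arithmetic from the abstract's own figures.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), total = 300.
contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(contigs))  # 80 + 70 = 150 reaches half of 300, so this prints 70

# Coverage = sequenced bases / genome size. With the abstract's figures
# (181 Gbp at approximately 60x) the implied genome size is about 3 Gbp.
implied_genome_size_gbp = 181 / 60
print(round(implied_genome_size_gbp, 2))
```

    Because N50 depends only on the length distribution, a doubling of N50 indicates longer contigs, not necessarily a more correct assembly.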

  13. Analyses of the complete genome and gene expression of chloroplast of sweet potato [Ipomoea batata].

    PubMed

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family with a genome size between 2,200 and 3,000 Mb. The genomic resources for this crop are deficient due to its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, which is a circular molecule of 161,303 bp in the typical quadripartite structure with large (LSC) and small (SSC) single-copy regions separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes of which there are 72 single-copy and 11 double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of Ipomoea (L.) species and some basal important angiosperms, respectively. Some boundary gene-flow and gene gain-and-loss events were identified at intra- and inter-species levels. In addition, by comparing with the transcriptome sequences of sweet potato, the RNA editing events and differential expressions of the chloroplast functional-genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa and the result may contribute to a better understanding of the evolution progress of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression.

  14. Analyses of the Complete Genome and Gene Expression of Chloroplast of Sweet Potato [Ipomoea batata]

    PubMed Central

    Yan, Lang; Lai, Xianjun; Li, Xuedan; Wei, Changhe; Tan, Xuemei; Zhang, Yizheng

    2015-01-01

    Sweet potato [Ipomoea batatas (L.) Lam] ranks among the top seven most important food crops cultivated worldwide and is a hexaploid plant (2n=6x=90) in the Convolvulaceae family with a genome size between 2,200 and 3,000 Mb. The genomic resources for this crop are deficient due to its complicated genetic structure. Here, we report the complete nucleotide sequence of the chloroplast (cp) genome of sweet potato, which is a circular molecule of 161,303 bp in the typical quadripartite structure with large (LSC) and small (SSC) single-copy regions separated by a pair of inverted repeats (IRs). The chloroplast DNA contains a total of 145 genes, including 94 protein-encoding genes of which there are 72 single-copy and 11 double-copy genes. The organization and structure of the chloroplast genome (gene content and order, IR expansion/contraction, random repeating sequences, structural rearrangement) of sweet potato were compared with those of Ipomoea (L.) species and some basal important angiosperms, respectively. Some boundary gene-flow and gene gain-and-loss events were identified at intra- and inter-species levels. In addition, by comparing with the transcriptome sequences of sweet potato, the RNA editing events and differential expressions of the chloroplast functional-genes were detected. Moreover, phylogenetic analysis was conducted based on 77 protein-coding genes from 33 taxa and the result may contribute to a better understanding of the evolution progress of the genus Ipomoea (L.), including phylogenetic relationships, intraspecific differentiation and interspecific introgression. PMID:25874767

  15. Mining Bacterial Genomes for Secondary Metabolite Gene Clusters.

    PubMed

    Adamek, Martina; Spohn, Marius; Stegmann, Evi; Ziemert, Nadine

    2017-01-01

    With the emergence of bacterial resistance against frequently used antibiotics, novel antibacterial compounds are urgently needed. Traditional bioactivity-guided drug discovery strategies involve laborious screening efforts and display high rediscovery rates. With the progress in next generation sequencing methods and the knowledge that the majority of antibiotics in clinical use are produced as secondary metabolites by bacteria, mining bacterial genomes for secondary metabolites with antimicrobial activity is a promising approach, which can guide a more time- and cost-effective identification of novel compounds. However, what sounds easy to accomplish comes with several challenges. To date, several tools for the prediction of secondary metabolite gene clusters are available, some of which are based on the detection of signature genes, while others are searching for specific patterns in gene content or regulation. Apart from the mere identification of gene clusters, several other factors such as determining cluster boundaries and assessing the novelty of the detected cluster are important. For this purpose, comparison of the predicted secondary metabolite genes with different cluster and compound databases is necessary. Furthermore, it is advisable to classify detected clusters into gene cluster families. So far, there is no standardized procedure for genome mining; however, different approaches to overcome all of these challenges exist and are addressed in this chapter. We give practical guidance on the workflow for secondary metabolite gene cluster identification, which includes the determination of gene cluster boundaries, addresses problems occurring with the use of draft genomes, and gives an outlook on the different methods for gene cluster classification. Based on comprehensible examples a protocol is set, which should enable the readers to mine their own genome data for interesting secondary metabolites.

  16. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis.

    PubMed

    Skwark, Marcin J; Croucher, Nicholas J; Puranen, Santeri; Chewapreecha, Claire; Pesonen, Maiju; Xu, Ying Ying; Turner, Paul; Harris, Simon R; Beres, Stephen B; Musser, James M; Parkhill, Julian; Bentley, Stephen D; Aurell, Erik; Corander, Jukka

    2017-02-01

    Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly

  17. Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis

    PubMed Central

    Pesonen, Maiju; Musser, James M.; Bentley, Stephen D.; Aurell, Erik; Corander, Jukka

    2017-01-01

    Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly

  18. [Evolution of gene orders in genomes of cyanobacteria].

    PubMed

    Markov, A V; Zakharov, I A

    2009-08-01

    Genomes of 23 strains of cyanobacteria were comparatively analyzed using quantitative methods of estimation of gene order similarity. It has been found that reconstructions of phylogenesis of cyanobacteria based on the comparison of the orders of genes in chromosomes and nucleotide sequences appear to be similar. This confirms the applicability of quantitative measures of similarity of gene orders for phylogenetic reconstructions. In the evolution of marine unicellular plankton cyanobacteria, genome rearrangements are fixed with a low rate (about 3% of gene order changes per 1% of 16S rRNA changes), whereas in other groups of cyanobacteria the gene order can change several times more rapidly. The gene orders in genomes of cyanobacteria and chloroplasts preserve a considerable degree of similarity. The closest relatives of chloroplasts among the analyzed cyanobacteria are likely to be strains from hot springs belonging to the genus Synechococcus. Comparative analysis of gene orders and nucleotide sequences strongly suggests that Synechococcus strains from different environments (sea, fresh waters, hot springs) are not related and belong to evolutionarily distant lineages.
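
    One simple way to quantify gene order similarity is to compare which genes sit next to each other in two genomes. The sketch below is an illustration only: the abstract does not say which measure the authors used, so the Jaccard index of adjacency sets, the function names, and the toy gene orders are all assumptions. Chromosomes are treated as linear for simplicity.

```python
def adjacencies(order):
    """Return the set of unordered neighbouring gene pairs in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def adjacency_similarity(order_a, order_b):
    """Jaccard index of the two adjacency sets (1.0 means identical neighbourhoods)."""
    adj_a, adj_b = adjacencies(order_a), adjacencies(order_b)
    union = adj_a | adj_b
    return len(adj_a & adj_b) / len(union) if union else 1.0


# Hypothetical toy genomes: genome_b carries an inversion of the g3-g4 block.
genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g1", "g2", "g4", "g3", "g5"]

similarity = adjacency_similarity(genome_a, genome_b)
print(round(similarity, 3))
```

    Two of the six distinct adjacencies are shared here, so a single inversion already lowers the score noticeably; real analyses would also need to handle circular chromosomes and genes missing from one genome.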

  19. The Plasmodium apicoplast genome: conserved structure and close relationship of P. ovale to rodent malaria parasites.

    PubMed

    Arisue, Nobuko; Hashimoto, Tetsuo; Mitsui, Hideya; Palacpac, Nirianne M Q; Kaneko, Akira; Kawai, Satoru; Hasegawa, Masami; Tanabe, Kazuyuki; Horii, Toshihiro

    2012-09-01

    Apicoplast, a nonphotosynthetic plastid derived from secondary symbiotic origin, is essential for the survival of malaria parasites of the genus Plasmodium. Elucidation of the evolution of the apicoplast genome in Plasmodium species is important to better understand the functions of the organelle. However, the complete apicoplast genome is available for only the most virulent human malaria parasite, Plasmodium falciparum. Here, we obtained the near-complete apicoplast genome sequences from eight Plasmodium species that infect a wide variety of vertebrate hosts and performed structural and phylogenetic analyses. We found that gene repertoire, gene arrangement, and other structural attributes were highly conserved. Phylogenetic reconstruction using 30 protein-coding genes of the apicoplast genome inferred, for the first time, a close relationship between P. ovale and rodent parasites. This close relatedness was robustly supported using multiple evolutionary assumptions and models. The finding suggests that an ancestral host switch occurred between rodent and human Plasmodium parasites.

  20. GeneTack database: genes with frameshifts in prokaryotic genomes and eukaryotic mRNA sequences.

    PubMed

    Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark

    2013-01-01

    Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively little attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programmed frameshifts (related to recoding events).
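
    To make the notion of a frame transition concrete, the sketch below shows how a single-base deletion changes every downstream codon. It illustrates the effect only and is not the GeneTack algorithm; the sequence, the deletion position, and the helper name codons are hypothetical.

```python
def codons(seq: str, frame: int = 0) -> list[str]:
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Hypothetical 15-nt coding fragment.
original = "ATGGCCAAGTTTGGA"

# Simulate an indel: delete the single base at 0-based index 5.
deleted = original[:5] + original[6:]

print(codons(original))  # ['ATG', 'GCC', 'AAG', 'TTT', 'GGA']
print(codons(deleted))   # ['ATG', 'GCA', 'AGT', 'TTG']
```

    The first codon survives, but every codon after the deletion is read in a shifted frame, which is why such events disrupt the encoded protein unless the shift is a programmed recoding event.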

  1. Genome-wide analysis of the GRAS gene family in Chinese cabbage (Brassica rapa ssp. pekinensis).

    PubMed

    Song, Xiao-Ming; Liu, Tong-Kun; Duan, Wei-Ke; Ma, Qing-Hua; Ren, Jun; Wang, Zhen; Li, Ying; Hou, Xi-Lin

    2014-01-01

    The GRAS gene family is one of the most important families of transcriptional regulators. In this study, 48 GRAS genes are identified from Chinese cabbage, and they are classified into eight groups according to the classification of Arabidopsis. The characterization, classification, gene structure and phylogenetic construction of GRAS proteins are performed. Distribution mapping shows that GRAS proteins are nonrandomly localized in 10 chromosomes. Fifty-five orthologous gene pairs are shared by Chinese cabbage and Arabidopsis, and interaction networks of these orthologous genes are constructed. The expansion of GRAS genes in Chinese cabbage results from genome triplication. Among the 17 species examined, 14 higher plants carry the GRAS genes, whereas two lower plants and one fungal species do not. Furthermore, the expression patterns of GRAS genes exhibit differences in three tissues based on RNA-seq data. Taken together, this comprehensive analysis will provide rich resources for studying GRAS protein functions in Chinese cabbage.

  2. Chloroplast Genome Analysis of Resurrection Tertiary Relict Haberlea rhodopensis Highlights Genes Important for Desiccation Stress Response.

    PubMed

    Ivanova, Zdravka; Sablok, Gaurav; Daskalova, Evelina; Zahmanova, Gergana; Apostolova, Elena; Yahubyan, Galina; Baev, Vesselin

    2017-01-01

    Haberlea rhodopensis is a paleolithic tertiary relict species, best known as a resurrection plant with remarkable tolerance to desiccation. When exposed to severe drought stress, H. rhodopensis shows an ability to maintain the structural integrity of its photosynthetic apparatus, which re-activates easily upon rehydration. We present here the results from the assembly and annotation of the chloroplast (cp) genome of H. rhodopensis, which was further subjected to comparative analysis with the cp genomes of closely related species. H. rhodopensis showed a cp genome size of 153,099 bp, harboring a pair of inverted repeats (IR) of 25,415 bp separated by small and large copy regions (SSC and LSC) of 17,826 and 84,443 bp. The genome structure, gene order, GC content and codon usage are similar to those of the typical angiosperm cp genomes. The genome hosts 137 genes representing 70.66% of the plastome, which includes 86 protein-coding genes, 36 tRNAs, and 4 rRNAs. A comparative plastome analysis with other closely related Lamiales members revealed conserved gene order in the IR and LSC/SSC regions. A phylogenetic analysis based on protein-coding genes from 33 species defines this species as belonging to the Gesneriaceae family. From an evolutionary point of view, a site-specific selection analysis detected positively selected sites in 17 genes, most of which are involved in photosynthesis (e.g., rbcL, ndhF, accD, atpE, etc.). The observed codon substitutions may be interpreted as being a consequence of molecular adaptation to drought stress, which ensures an evolutionary advantage to H. rhodopensis.
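
    The region sizes quoted above can be cross-checked against the reported genome size. The sketch below assumes the usual quadripartite plastome layout (one LSC, one SSC, and two IR copies); the variable and function names are illustrative and do not come from the paper.

```python
# Region sizes in bp, as reported in the abstract.
LSC_BP = 84_443
SSC_BP = 17_826
IR_BP = 25_415  # length of one inverted-repeat copy


def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length assuming the layout LSC + SSC + 2 x IR."""
    return lsc + ssc + 2 * ir


plastome_bp = plastome_length(LSC_BP, SSC_BP, IR_BP)
print(plastome_bp)  # 153099, matching the reported cp genome size
```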

  3. Chloroplast Genome Analysis of Resurrection Tertiary Relict Haberlea rhodopensis Highlights Genes Important for Desiccation Stress Response

    PubMed Central

    Ivanova, Zdravka; Sablok, Gaurav; Daskalova, Evelina; Zahmanova, Gergana; Apostolova, Elena; Yahubyan, Galina; Baev, Vesselin

    2017-01-01

    Haberlea rhodopensis is a paleolithic tertiary relict species, best known as a resurrection plant with remarkable tolerance to desiccation. When exposed to severe drought stress, H. rhodopensis shows an ability to maintain the structural integrity of its photosynthetic apparatus, which re-activates easily upon rehydration. We present here the results from the assembly and annotation of the chloroplast (cp) genome of H. rhodopensis, which was further subjected to comparative analysis with the cp genomes of closely related species. H. rhodopensis showed a cp genome size of 153,099 bp, harboring a pair of inverted repeats (IR) of 25,415 bp separated by small and large copy regions (SSC and LSC) of 17,826 and 84,443 bp. The genome structure, gene order, GC content and codon usage are similar to those of the typical angiosperm cp genomes. The genome hosts 137 genes representing 70.66% of the plastome, which includes 86 protein-coding genes, 36 tRNAs, and 4 rRNAs. A comparative plastome analysis with other closely related Lamiales members revealed conserved gene order in the IR and LSC/SSC regions. A phylogenetic analysis based on protein-coding genes from 33 species defines this species as belonging to the Gesneriaceae family. From an evolutionary point of view, a site-specific selection analysis detected positively selected sites in 17 genes, most of which are involved in photosynthesis (e.g., rbcL, ndhF, accD, atpE, etc.). The observed codon substitutions may be interpreted as being a consequence of molecular adaptation to drought stress, which ensures an evolutionary advantage to H. rhodopensis. PMID:28265281

  4. Bacterial Genes in the Aphid Genome: Absence of Functional Gene Transfer from Buchnera to Its Host

    PubMed Central

    Nikoh, Naruo; McCutcheon, John P.; Kudo, Toshiaki; Miyagishima, Shin-ya; Moran, Nancy A.; Nakabachi, Atsushi

    2010-01-01

    Genome reduction is typical of obligate symbionts. In cellular organelles, this reduction partly reflects transfer of ancestral bacterial genes to the host genome, but little is known about gene transfer in other obligate symbioses. Aphids harbor anciently acquired obligate mutualists, Buchnera aphidicola (Gammaproteobacteria), which have highly reduced genomes (420–650 kb), raising the possibility of gene transfer from ancestral Buchnera to the aphid genome. In addition, aphids often harbor other bacteria that also are potential sources of transferred genes. Previous limited sampling of genes expressed in bacteriocytes, the specialized cells that harbor Buchnera, revealed that aphids acquired at least two genes from bacteria. The newly sequenced genome of the pea aphid, Acyrthosiphon pisum, presents the first opportunity for a complete inventory of genes transferred from bacteria to the host genome in the context of an ancient obligate symbiosis. Computational screening of the entire A. pisum genome, followed by phylogenetic and experimental analyses, provided strong support for the transfer of 12 genes or gene fragments from bacteria to the aphid genome: three LD–carboxypeptidases (LdcA1, LdcA2, ψLdcA), five rare lipoprotein As (RlpA1-5), N-acetylmuramoyl-L-alanine amidase (AmiD), 1,4-beta-N-acetylmuramidase (bLys), DNA polymerase III alpha chain (ψDnaE), and ATP synthase delta chain (ψAtpH). Buchnera was the apparent source of two highly truncated pseudogenes (ψDnaE and ψAtpH). Most other transferred genes were closely related to genes from relatives of Wolbachia (Alphaproteobacteria). At least eight of the transferred genes (LdcA1, AmiD, RlpA1-5, bLys) appear to be functional, and expression of seven (LdcA1, AmiD, RlpA1-5) is highly upregulated in bacteriocytes. The LdcAs and RlpAs appear to have been duplicated after transfer. Our results excluded the hypothesis that genome reduction in Buchnera has been accompanied by gene transfer to the host

  5. Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes.

    PubMed

    Lin, Yongxiang; Cheng, Ying; Jin, Jing; Jin, Xiaolei; Jiang, Haiyang; Yan, Hanwei; Cheng, Beijiu

    2014-01-01

    Whole-genome duplication events (polyploidy events) and gene loss events have played important roles in the evolution of legumes. Here we show that the vast majority of Hsf gene duplications resulted from whole genome duplication events rather than tandem duplication, and significant differences in gene retention exist between species. By searching for intraspecies gene colinearity (microsynteny) and dating the age distributions of duplicated genes, we found that genome duplications accounted for 42 of 46 Hsf-containing segments in Glycine max, while paired segments were rarely identified in Lotus japonicus, Medicago truncatula and Cajanus cajan. However, by comparing interspecies microsynteny, we determined that the great majority of Hsf-containing segments in Lotus japonicus, Medicago truncatula and Cajanus cajan show extensive conservation with the duplicated regions of Glycine max. These segments formed 17 groups of orthologous segments. These results suggest that these regions shared ancient genome duplication with Hsf genes in Glycine max, but more than half of the copies of these genes were lost. On the other hand, the Glycine max Hsf gene family retained approximately 75% and 84% of duplicated genes produced from the ancient genome duplication and recent Glycine-specific genome duplication, respectively. Continuous purifying selection has played a key role in the maintenance of Hsf genes in Glycine max. Expression analysis of the Hsf genes in Lotus japonicus revealed their putative involvement in multiple tissue-/developmental stages and responses to various abiotic stimuli. This study traces the evolution of Hsf genes in legume species and demonstrates that the rates of gene gain and loss are far from equilibrium in different species.

  6. Epigenomics and the structure of the living genome.

    PubMed

    Friedman, Nir; Rando, Oliver J

    2015-10-01

    Eukaryotic genomes are packaged into an extensively folded state known as chromatin. Analysis of the structure of eukaryotic chromosomes has been revolutionized by development of a suite of genome-wide measurement technologies, collectively termed "epigenomics." We review major advances in epigenomic analysis of eukaryotic genomes, covering aspects of genome folding at scales ranging from whole chromosome folding down to nucleotide-resolution assays that provide structural insights into protein-DNA interactions. We then briefly outline several challenges remaining and highlight new developments such as single-cell epigenomic assays that will help provide us with a high-resolution structural understanding of eukaryotic genomes.

  7. Exploring laccase genes from plant pathogen genomes: a bioinformatic approach.

    PubMed

    Feng, B Z; Li, P Q; Fu, L; Yu, X M

    2015-10-30

    To date, research on laccases has mostly been focused on plant and fungal laccases and their current use in biotechnological applications. In contrast, little is known about laccases from plant pathogens, although recent rapid progress in whole genome sequencing of an increasing number of organisms has facilitated their identification and ascertainment of their origins. In this study, a comparative analysis was performed to elucidate the distribution of laccases among bacteria, fungi, and oomycetes, and, through comparison of their amino acids, to determine the relationships between them. We retrieved the laccase genes for the 20 publicly available plant pathogen genomes. From these, 125 laccase genes were identified in total, including seven in bacterial genomes, 101 in fungal genomes, and 17 in oomycete genomes. Most of the predicted protein models of these genes shared typical fungal laccase characteristics, possessing four conserved domains with one cysteine and ten histidine residues at these domains. Phylogenetic analysis illustrated that laccases from bacteria and oomycetes were grouped into two distinct clades, whereas fungal laccases clustered in three main clades. These results provide the theoretical groundwork regarding the role of laccases in plant pathogens and might be used to guide future research into these enzymes.

  8. Gene discovery in the Acanthamoeba castellanii genome

    SciTech Connect

    Anderson, Iain J.; Watkins, Russell F.; Samuelson, John; Spencer, David F.; Majoros, William H.; Gray, Michael W.; Loftus, Brendan J.

    2005-08-01

    Acanthamoeba castellanii is a free-living amoeba found in soil, freshwater, and marine environments and an important predator of bacteria. Acanthamoeba castellanii is also an opportunistic pathogen of clinical interest, responsible for several distinct diseases in humans. In order to provide a genomic platform for the study of this ubiquitous and important protist, we generated a sequence survey of approximately 0.5× coverage of the genome. The data predict that A. castellanii exhibits a greater biosynthetic capacity than the free-living Dictyostelium discoideum and the parasite Entamoeba histolytica, providing an explanation for the ability of A. castellanii to inhabit a diversity of environments. Alginate lyase may provide access to bacteria within biofilms by breaking down the biofilm matrix, and polyhydroxybutyrate depolymerase may facilitate utilization of the bacterial storage compound polyhydroxybutyrate as a food source. Enzymes for the synthesis and breakdown of cellulose were identified, and they likely participate in encystation and excystation as in D. discoideum. Trehalose-6-phosphate synthase is present, suggesting that trehalose plays a role in stress adaptation. Detection and response to a number of stress conditions is likely accomplished with a large set of signal transduction histidine kinases and a set of putative receptor serine/threonine kinases similar to those found in E. histolytica. Serine, cysteine and metalloproteases were identified, some of which are likely involved in pathogenicity.

  9. Genome-wide analysis of homeobox genes from Mesobuthus martensii reveals Hox gene duplication in scorpions.

    PubMed

    Di, Zhiyong; Yu, Yao; Wu, Yingliang; Hao, Pei; He, Yawen; Zhao, Huabin; Li, Yixue; Zhao, Guoping; Li, Xuan; Li, Wenxin; Cao, Zhijian

    2015-06-01

    Homeobox genes belong to a large gene group, which encodes the famous DNA-binding homeodomain that plays a key role in development and cellular differentiation during embryogenesis in animals. Here, one hundred forty-nine homeobox genes were identified from the Asian scorpion, Mesobuthus martensii (Chelicerata: Arachnida: Scorpiones: Buthidae) based on our newly assembled genome sequence with approximately 248 × coverage. The identified homeobox genes were categorized into eight classes including 82 families: 67 ANTP class genes, 33 PRD genes, 11 LIM genes, five POU genes, six SINE genes, 14 TALE genes, five CUT genes, two ZF genes and six unclassified genes. Transcriptome data confirmed that more than half of the genes were expressed in adults. The homeobox gene diversity of the eight classes is similar to the previously analyzed Mandibulata arthropods. Interestingly, it is hypothesized that the scorpion M. martensii may have two Hox clusters. The first complete genome-wide analysis of homeobox genes in Chelicerata not only reveals the repertoire of scorpion, arachnid and chelicerate homeobox genes, but also shows some insights into the evolution of arthropod homeobox genes.
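
    The per-class counts listed above can be tallied to confirm that they account for all 149 homeobox genes. This is a small consistency check only; the dictionary simply restates the numbers from the abstract, and the variable names are illustrative.

```python
# Homeobox gene counts per class, as listed in the abstract.
homeobox_counts = {
    "ANTP": 67,
    "PRD": 33,
    "LIM": 11,
    "POU": 5,
    "SINE": 6,
    "TALE": 14,
    "CUT": 5,
    "ZF": 2,
    "unclassified": 6,
}

homeobox_total = sum(homeobox_counts.values())
named_classes = [name for name in homeobox_counts if name != "unclassified"]

print(homeobox_total)      # 149
print(len(named_classes))  # 8 named classes
```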

  10. IGD: a resource for intronless genes in the human genome.

    PubMed

    Louhichi, Amel; Fourati, Ahmed; Rebaï, Ahmed

    2011-11-15

    The fraction of intronless genes (IGs) varies between 2.7 and 97.7% in eukaryotic genomes. Although many databases on exons and introns exist, there was no curated database for such genes which allowed their study in a concerted manner. Such a database would be useful to identify the functional features and the distribution of these genes across the genome. Here, a new database of IGs in eukaryotes based on GenBank data is described. This database, called IGD (Intronless Gene Database), is a collection of gene sequences that were annotated and curated. The current version of IGD contains 687 human intronless genes with their protein and CDS sequences. Some features of the entries are given in this paper. Data was extracted from GenBank release 183 using a Perl script. Data extraction was followed by a manual curation step. Intronless genes were then analyzed based on their RefSeq annotation and Gene Ontology functional class. IGD represents a useful resource for retrieval and in silico study of intronless genes. IGD is available at http://www.bioinfo-cbs.org/igd with comprehensive help and FAQ pages that illustrate the main uses of this resource.

  11. Gene map of large yellow croaker (Larimichthys crocea) provides insights into teleost genome evolution and conserved regions associated with growth.

    PubMed

    Xiao, Shijun; Wang, Panpan; Zhang, Yan; Fang, Lujing; Liu, Yang; Li, Jiong-Tang; Wang, Zhi-Yong

    2015-12-22

    The genetic map of a species is essential for its whole genome assembly and can be applied to the mapping of important traits. In this study, we performed RNA-seq for a family of large yellow croakers (Larimichthys crocea) and constructed a high-density genetic map. In this map, 24 linkage groups comprised 3,448 polymorphic SNP markers. Approximately 72.4% (2,495) of the markers were located in protein-coding regions. Comparison of the croaker genome with those of five model fish species revealed that the croaker genome structure was closer to that of the medaka than to the remaining four genomes. Because the medaka genome preserves the teleost ancestral karyotype, this result indicated that the croaker genome might also maintain the teleost ancestral genome structure. The analysis also revealed different genome rearrangements across teleosts. QTL mapping and association analysis consistently identified growth-related QTL regions and associated genes. Orthologs of the associated genes in other species were demonstrated to regulate development, indicating that these genes might regulate development and growth in croaker. This gene map will enable us to construct the croaker genome for comparative studies and to provide an important resource for selective breeding of croaker.
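
    The quoted share of markers in protein-coding regions follows directly from the two marker counts. A minimal arithmetic check, with illustrative variable names:

```python
# Marker counts as reported in the abstract.
total_markers = 3_448
coding_markers = 2_495

coding_percent = round(100 * coding_markers / total_markers, 1)
print(coding_percent)  # 72.4
```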

  12. A Genomic Signature and the Identification of New Sporulation Genes

    PubMed Central

    Abecasis, Ana B.; Serrano, Mónica; Alves, Renato; Quintais, Leonor

    2013-01-01

    Bacterial endospores are the most resistant cell type known to humans, as they are able to withstand extremes of temperature, pressure, chemical injury, and time. They are also of interest because the endospore is the infective particle in a variety of human and livestock diseases. Endosporulation is characterized by the morphogenesis of an endospore within a mother cell. Based on the genes known to be involved in endosporulation in the model organism Bacillus subtilis, a conserved core of about 100 genes was derived, representing the minimal machinery for endosporulation. The core was used to define a genomic signature of about 50 genes that are able to distinguish endospore-forming organisms, based on complete genome sequences, and we show this 50-gene signature is robust against phylogenetic proximity and other artifacts. This signature includes previously uncharacterized genes that we can now show are important for sporulation in B. subtilis and/or are under developmental control, thus further validating this genomic signature. We also predict that a series of polyextremophilic organisms, as well as several gut bacteria, are able to form endospores, and we identified 3 new loci essential for sporulation in B. subtilis: ytaF, ylmC, and ylzA. In all, the results support the view that endosporulation likely evolved once, at the base of the Firmicutes phylum, and is unrelated to other bacterial cell differentiation programs and that this involved the evolution of new genes and functions, as well as the cooption of ancestral, housekeeping functions. PMID:23396918

  13. GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

    SciTech Connect

    DAVIS J M

    2007-10-11

    Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.

  14. GeneOrder3.0: Software for comparing the order of genes in pairs of small bacterial genomes

    PubMed Central

    Celamkoti, Srikanth; Kundeti, Sashidhara; Purkayastha, Anjan; Mazumder, Raja; Buck, Charles; Seto, Donald

    2004-01-01

    Background An increasing number of whole viral and bacterial genomes are being sequenced and deposited in public databases. In parallel to the mounting interest in whole genomes, the number of whole-genome analysis software tools is also increasing. GeneOrder was originally developed to provide an analysis of genes between two genomes, allowing visualization of gene order and synteny comparisons of any small genomes. It was originally developed for comparing virus, mitochondrion and chloroplast genomes. This is now extended to small bacterial genomes of sizes less than 2 Mb. Results GeneOrder3.0 has been developed and validated successfully on several small bacterial genomes (ca. 580 kb to 1.83 Mb) archived in the NCBI GenBank database. It is an updated web-based "on-the-fly" computational tool allowing gene order and synteny comparisons of any two small bacterial genomes. Analyses of several bacterial genomes show that a large amount of gene and genome re-arrangement occurs, as seen with earlier DNA software tools. This can be displayed at the protein level using GeneOrder3.0. Whole genome alignments of genes are presented in both a table and a dot plot. This allows the detection of evolutionarily more distant relationships since protein sequences are more conserved than DNA sequences. Conclusions GeneOrder3.0 allows researchers to perform comparative analysis of gene order and synteny in genomes of sizes up to 2 Mb "on-the-fly." PMID:15128433

  15. A genome-wide analysis of the expansin genes in Malus × Domestica.

    PubMed

    Zhang, Shizhong; Xu, Ruirui; Gao, Zheng; Chen, Changtian; Jiang, Zesheng; Shu, Huairui

    2014-04-01

    Expansins were first identified as cell wall-loosening proteins; they are involved in regulating cell expansion, fruit softening and many other physiological processes. However, our knowledge about the expansin family members and their evolutionary relationships in fruit trees, such as apple, is limited. In this study, we identified 41 members of the expansin gene family in the genome of apple (Malus × Domestica L. Borkh). Phylogenetic analysis revealed that expansin genes in apple could be divided into four subfamilies according to their gene structures and protein motifs. By phylogenetic analysis of the expansins in five plants (Arabidopsis, rice, poplar, grape and apple), the expansins were divided into 17 subgroups. Our gene duplication analysis revealed that whole-genome and chromosomal-segment duplications contributed to the expansion of Mdexpansins. The microarray and expressed sequence tag (EST) data showed that 34 Mdexpansin genes could be divided into five groups by the EST analysis; they may also play different roles during fruit development. An expression model for MdEXPA16 and MdEXPA20 showed their potential role in developing fruit. Overall, our study provides useful data and novel insights into the functions and regulatory mechanisms of the expansin genes in apple, as well as their evolution and divergence. As the first step towards genome-wide analysis of the expansin genes in apple, our results have established a solid foundation for future studies on the function of the expansin genes in fruit development.

  16. Re-Examining the Gene in Personalized Genomics

    ERIC Educational Resources Information Center

    Bartol, Jordan

    2013-01-01

    Personalized genomics companies (PG; also called "direct-to-consumer genetics") are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept…

  17. Comparative 3D genome structure analysis of the fission and the budding yeast.

    PubMed

    Gong, Ke; Tjong, Harianto; Zhou, Xianghong Jasmine; Alber, Frank

    2015-01-01

    We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes, and the extent of nuclear territories for genes and chromosomes. Despite those differences, functional similarities are remarkably maintained, which is evident when comparing spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species.

  18. Genomic structure and expression of immunoglobulins in Squamata.

    PubMed

    Olivieri, David N; Garet, Elina; Estevez, Olivia; Sánchez-Espinel, Christian; Gambón-Deza, Francisco

    2016-04-01

    The Squamata order represents a major evolutionary reptile lineage, yet the structure and expression of immunoglobulins in this order have scarcely been studied in detail. From the genome sequences of four Squamata species (Gekko japonicus, Ophisaurus gracilis, Pogona vitticeps and Ophiophagus hannah) and RNA-seq datasets from 18 other Squamata species, we identified the immunoglobulins present in these animals as well as the tissues in which they are found. All Squamata have at least three immunoglobulin classes; namely, the immunoglobulins M, D, and Y. Unlike mammals, however, we provide evidence that some Squamata lineages possess more than one Cμ gene which is located downstream from the Cδ gene. The existence of two evolutionary lineages of immunoglobulin Y is shown. Additionally, it is demonstrated that while all Squamata species possess the λ light chain, only Iguanidae species possess the κ light chain.

  19. Plasmodium vivax apicoplast genome: a comparative analysis of major genes from Indian field isolates.

    PubMed

    Saxena, Vishal; Garg, Shilpi; Tripathi, Jyotsna; Sharma, Sonal; Pakalapati, Deepak; Subudhi, Amit K; Boopathi, P A; Saggu, Gagandeep S; Kochar, Sanjay K; Kochar, Dhanpat K; Das, Ashis

    2012-04-01

    The apicomplexan parasite Plasmodium vivax is responsible for causing more than 70% of human malaria cases in Central and South America, Southeastern Asia and the Indian subcontinent. The rising severity of the disease and the increasing incidence of resistance shown by this parasite towards usual therapeutic regimens have necessitated investigation of putative novel drug targets to combat this disease. The apicoplast, an organelle of prokaryotic origin, and its circular genome carrying genes of possible functional importance, are being looked upon as potential drug targets. The genes on this circular genome are believed to be highly conserved among all Plasmodium species. To date, the plastid genomes of P. falciparum, P. berghei and P. chabaudi have been detailed while partial sequences of some genes from other parasites including P. vivax have been studied for identifying evolutionary positions of these parasites. The functional aspects and significance of most of these genes are still hypothetical. In one of our previous reports, we have detailed the complete sequence, as well as structural and functional characteristics of the Elongation factor encoding tufA gene from the plastid genome of P. vivax. We present here the sequences of large and small subunit rRNA (lsu and ssu rRNA) genes, sufB (ORF470) gene, RNA polymerase (rpo B, C) subunit genes and clpC (caseinolytic protease) gene from the plastid genome of P. vivax. A comparative analysis of these genes between P. vivax and P. falciparum reveals approximately 5-16% differences. A codon usage analysis of major plastid genes has shown a high frequency of codons rich in A/T at any or all of the three positions in all the species. TTA, AAT, AAA, TAT, and ATA are the major preferred codons. The sequences, functional domains and structural analysis of respective proteins do not show any variations in the active sites. A comparative analysis of these Indian P. vivax plastid genome encoded genes has also been done

  20. An integrated map of structural variation in 2,504 human genomes.

    PubMed

    Sudmant, Peter H; Rausch, Tobias; Gardner, Eugene J; Handsaker, Robert E; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Hsi-Yang Fritz, Markus; Konkel, Miriam K; Malhotra, Ankit; Stütz, Adrian M; Shi, Xinghua; Paolo Casale, Francesco; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark J P; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y K; Jasmine Mu, Xinmeng; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter; Chong, Zechen; Clarke, Laura; Dal, Elif; Ding, Li; Emery, Sarah; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M; Kong, Yu; Lameijer, Eric-Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A; Marth, Gabor; Mason, Christopher E; Menelaou, Androniki; Muzny, Donna M; Nelson, Bradley J; Noor, Amina; Parrish, Nicholas F; Pendleton, Matthew; Quitadamo, Andrew; Raeder, Benjamin; Schadt, Eric E; Romanovitch, Mallory; Schlattl, Andreas; Sebra, Robert; Shabalin, Andrey A; Untergasser, Andreas; Walker, Jerilyn A; Wang, Min; Yu, Fuli; Zhang, Chengsheng; Zhang, Jing; Zheng-Bradley, Xiangqun; Zhou, Wanding; Zichner, Thomas; Sebat, Jonathan; Batzer, Mark A; McCarroll, Steven A; Mills, Ryan E; Gerstein, Mark B; Bashir, Ali; Stegle, Oliver; Devine, Scott E; Lee, Charles; Eichler, Evan E; Korbel, Jan O

    2015-10-01

    Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.

  1. Genome-wide identification and functional analyses of calmodulin genes in Solanaceous species

    PubMed Central

    2013-01-01

    Background Calmodulin (CaM) is a major calcium sensor in all eukaryotes. It binds calcium and modulates the activity of a wide range of downstream proteins in response to calcium signals. However, little is known about the CaM gene family in Solanaceous species, including the economically important species, tomato (Solanum lycopersicum), and the gene silencing model plant, Nicotiana benthamiana. Moreover, the potential function of CaM in plant disease resistance remains largely unclear. Results We performed genome-wide identification of CaM gene families in Solanaceous species. Employing bioinformatics approaches, multiple full-length CaM genes were identified from tomato, N. benthamiana and potato (S. tuberosum) genomes, with tomato having 6 CaM genes, N. benthamiana having 7 CaM genes, and potato having 4 CaM genes. Sequence comparison analyses showed that three tomato genes, SlCaM3/4/5, two potato genes StCaM2/3, and two sets of N. benthamiana genes, NbCaM1/2/3/4 and NbCaM5/6, encode identical CaM proteins, yet the genes contain different intron/exon organization and are located on different chromosomes. Further sequence comparisons and gene structural and phylogenetic analyses reveal that Solanaceous species gained a new group of CaM genes during evolution. These new CaM genes are unusual in that they contain three introns in contrast to only a single intron typical of known CaM genes in plants. The tomato CaM (SlCaM) genes were found to be expressed in all organs. Prediction of cis-acting elements in 5' upstream sequences and expression analyses demonstrated that SlCaM genes have potential to be highly responsive to a variety of biotic and abiotic stimuli. Additionally, silencing of SlCaM2 and SlCaM6 altered expression of a set of signaling and defense-related genes and resulted in significantly lower resistance to Tobacco rattle virus and the oomycete pathogen, Pythium aphanidermatum. Conclusions The CaM gene families in the Solanaceous species tomato, N

  2. Genome-Wide Scans for Delineation of Candidate Genes Regulating Seed-Protein Content in Chickpea

    PubMed Central

    Upadhyaya, Hari D.; Bajaj, Deepak; Narnoliya, Laxmi; Das, Shouvik; Kumar, Vinod; Gowda, C. L. L.; Sharma, Shivali; Tyagi, Akhilesh K.; Parida, Swarup K.

    2016-01-01

    Identification of potential genes/alleles governing complex seed-protein content (SPC) is essential in marker-assisted breeding for quality trait improvement of chickpea. Hence, the present study utilized an integrated genomics-assisted breeding strategy encompassing trait association analysis, selective genotyping in traditional bi-parental mapping population and differential expression profiling for the first time to understand the complex genetic architecture of quantitative SPC trait in chickpea. For GWAS (genome-wide association study), high-throughput genotyping information of 16376 genome-based SNPs (single nucleotide polymorphism) discovered from a structured population of 336 sequenced desi and kabuli accessions [with 150–200 kb LD (linkage disequilibrium) decay] was utilized. This led to identification of seven most effective genomic loci (genes) associated [10–20% with 41% combined PVE (phenotypic variation explained)] with SPC trait in chickpea. Regardless of the diverse desi and kabuli genetic backgrounds, a comparable level of association potential of the identified seven genomic loci with SPC trait was observed. Five SPC-associated genes were validated successfully in parental accessions and homozygous individuals of an intra-specific desi RIL (recombinant inbred line) mapping population (ICC 12299 × ICC 4958) by selective genotyping. The seed-specific expression, including differential up-regulation (>four fold) of six SPC-associated genes particularly in accessions, parents and homozygous individuals of the aforementioned mapping population with a high level of contrasting SPC (21–22%) was evident. Collectively, the integrated genomic approach delineated diverse naturally occurring novel functional SNP allelic variants in six potential candidate genes regulating SPC trait in chickpea. Of these, a non-synonymous SNP allele-carrying zinc finger transcription factor gene exhibiting strong association with SPC trait was found to be the most

  3. From structure prediction to genomic screens for novel non-coding RNAs.

    PubMed

    Gorodkin, Jan; Hofacker, Ivo L

    2011-08-01

    Non-coding RNAs (ncRNAs) are receiving more and more attention not only as an abundant class of genes, but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods were developed early for folding and prediction of RNA structure with the aim of assisting in functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of these are highly structured. Interestingly, a large part of the structure is comprised of regular Watson-Crick and GU wobble base pairs. This and the increased amount of available genomes have made it possible to employ structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence using the RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure preserving changes of base pairs has been efficient in improving accuracy, and today this constitutes a key component in genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures in searches for novel ncRNA genes and regulatory RNA structure on mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.

  4. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    PubMed Central

    Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating multiple biological processes, especially in regulating defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed based on WRKY domain sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophthora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768

  5. Comparative genomic analysis of teleost fish bmal genes.

    PubMed

    Wang, Han

    2009-05-01

    The Bmal1 (Brain and muscle ARNT like 1) gene is a key circadian clock gene. Tetrapods also have a second Bmal gene, Bmal2. The fruit fly has only one bmal1/cycle gene. Interrogation of the five teleost fish genome sequences coupled with phylogenetic and splice site analyses found that zebrafish have two bmal1 genes, bmal1a and bmal1b, and bmal2a; Japanese pufferfish (fugu), green spotted pufferfish (tetraodon) and Japanese medaka fish each have two bmal2 genes, bmal2a and bmal2b, and bmal1a; and three-spine stickleback have bmal1a and bmal2b. Syntenic analysis further indicated that zebrafish bmal1a/bmal1b, and fugu, tetraodon and medaka bmal2a/bmal2b are ancient duplicates. Although the dN/dS ratios of these four fish bmal duplicates are all <1, implying that they have been under purifying selection, the Tajima relative rate test showed that fugu, tetraodon and medaka bmal2a/bmal2b have asymmetric evolutionary rates, suggesting that one of these duplicates has been subject to positive selection or relaxed functional constraint. These results support the notion that teleost fish bmal genes were derived from the fish-specific genome duplication (FSGD), and that divergent resolution following the duplication led to retaining different ancient bmal duplicates in different fishes, which could have shaped the evolution of the complex teleost fish timekeeping mechanisms.

  6. Population structure and minimum core genome typing of Legionella pneumophila

    PubMed Central

    Qin, Tian; Zhang, Wen; Liu, Wenbin; Zhou, Haijian; Ren, Hongyu; Shao, Zhujun; Lan, Ruiting; Xu, Jianguo

    2016-01-01

    Legionella pneumophila is an important human pathogen causing Legionnaires’ disease. In this study, whole genome sequencing (WGS) was used to study the characteristics and population structure of L. pneumophila strains. We sequenced and compared 53 isolates of L. pneumophila covering different serogroups and sequence-based typing (SBT) types (STs). We found that 1,896 single-copy orthologous genes were shared by all isolates and were defined as the minimum core genome (MCG) of L. pneumophila. A total of 323,224 single-nucleotide polymorphisms (SNPs) were identified among the 53 strains. After excluding 314,059 SNPs which were likely to be results of recombination, the remaining 9,165 SNPs were referred to as MCG SNPs. Population structure analysis based on MCG divided the 53 L. pneumophila isolates into nine MCG groups. The within-group distances were much smaller than the between-group distances, indicating considerable divergence between MCG groups. MCG groups were also supported by phylogenetic analysis and may be considered as robust taxonomic units within L. pneumophila. Among the nine MCG groups, eight showed high intracellular growth ability while one showed low intracellular growth ability. Furthermore, MCG typing also showed high resolution in subtyping ST1 strains. The results obtained in this study provided significant insights into the evolution, population structure and pathogenicity of L. pneumophila. PMID:26888563

  7. Diversity of human tRNA genes from the 1000-genomes project.

    PubMed

    Parisien, Marc; Wang, Xiaoyun; Pan, Tao

    2013-12-01

    The sequence diversity of individual human genomes has been extensively analyzed for variations and phenotypic implications for mRNA, miRNA, and long non-coding RNA genes. Transfer RNA (tRNA) also exhibits large sequence diversity in the human genome, but tRNA gene sequence variation and potential functional implications in individual human genomes have not been investigated. Here we capitalize on the sequencing data from the 1000-genomes project to examine the diversity of tRNA genes in the human population. Previous analysis of the reference human genome indicated an unexpectedly large number of diverse tRNA genes beyond the necessity of translation, suggesting that some tRNA transcripts may perform non-canonical functions. We found 24 new tRNA sequences in >1% and 76 new tRNA sequences in >0.2% of all individuals, indicating that tRNA genes are also subject to evolutionary changes in the human population. Unexpectedly, two abundant new tRNA genes contain base-pair mismatches in the anticodon stem. We experimentally determined that these two new tRNAs have altered structures in vitro; however, one new tRNA is not aminoacylated but extremely stable in HeLa cells, suggesting that this new tRNA can be used for non-canonical function. Our results show that at the scale of human population, tRNA genes are more diverse than conventionally understood, and some new tRNAs may perform non-canonical, extra-translational functions that may be linked to human health and disease.

  8. Genome-wide identification, phylogeny, and expression of fibroblast growth genes in common carp.

    PubMed

    Jiang, Likun; Zhang, Songhao; Dong, Chuanju; Chen, Baohua; Feng, Jingyan; Peng, Wenzhu; Mahboob, Shahid; Al-Ghanim, Khalid A; Xu, Peng

    2016-03-10

    Fibroblast growth factors (FGFs) are a large family of polypeptide growth factors, which are found in organisms ranging from nematodes to humans. In vertebrates, a number of FGFs have been shown to play important roles in developing embryos and adult organisms. Among the vertebrate species, FGFs are highly conserved in both gene structure and amino-acid sequence. However, studies on teleost FGFs are mainly limited to model species, hence we investigated FGFs in the common carp genome. We identified 35 FGFs in the common carp genome. Phylogenetic analysis revealed that most of the FGFs are highly conserved, though recent gene duplication and gene losses do exist. By examining the copy number of FGFs in several vertebrate genomes, we found that eight FGFs in common carp have undergone gene duplications, including FGF6a, FGF6b, FGF7, FGF8b, FGF10a, FGF11b, FGF13a, and FGF18b. The expression patterns of all FGFs were examined in various tissues, including the blood, brain, gill, heart, intestine, muscle, skin, spleen and kidney, showing that most of the FGFs were ubiquitously expressed, indicating their critical role in common carp. To some extent, examination of gene families with detailed phylogenetic or orthology analysis verified the authenticity and accuracy of assembly and annotation of the recently published common carp whole genome sequences. Gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp FGF gene family provides an important genomic resource for future biochemical, physiological, and phylogenetic studies on FGFs in teleosts.

  9. Genomic architecture of MHC-linked odorant receptor gene repertoires among 16 vertebrate species.

    PubMed

    Santos, Pablo Sandro Carvalho; Kellermann, Thomas; Uchanska-Ziegler, Barbara; Ziegler, Andreas

    2010-09-01

    The recent sequencing and assembly of the genomes of different organisms have shown that almost all vertebrates studied in detail so far have one or more clusters of genes encoding odorant receptors (OR) in close physical linkage to the major histocompatibility complex (MHC). It has been postulated that MHC-linked OR genes could be involved in MHC-influenced mate choice, comprising both pre- as well as post-copulatory mechanisms. We have therefore carried out a systematic comparison of protein sequences of these receptors from the genomes of man, chimpanzee, gorilla, orangutan, rhesus macaque, mouse, rat, dog, cat, cow, pig, horse, elephant, opossum, frog and zebra fish (amounting to a total of 559 protein sequences) in order to identify OR families exhibiting evolutionarily conserved MHC linkage. In addition, we compared the genomic structure of this region within these 16 species, accounting for presence or absence of OR gene families, gene order, transcriptional orientation and linkage to the MHC or framework genes. The results are presented in the form of gene maps and phylogenetic analyses that reveal largely concordant repertoires of gene families, at least among tetrapods, although each of the eight taxa studied (primates, rodents, ungulates, carnivores, proboscids, marsupials, amphibians and teleosts) exhibits a typical architecture of MHC (or MHC framework loci)-linked OR genes. Furthermore, the comparison of the genomic organization of this region has implications for phylogenetic relationships between closely related taxa, especially in disputed cases such as the evolutionary history of even- and odd-toed ungulates and carnivores. Finally, the largely conserved linkage between distinct OR genes and the MHC supports the concept that particular alleles within a given haplotype function in a concerted fashion during self-/non-self-discrimination processes in reproduction.

  10. Genome assembly has a major impact on gene content: a comparison of annotation in two Bos taurus assemblies.

    PubMed

    Florea, Liliana; Souvorov, Alexander; Kalbfleisch, Theodore S; Salzberg, Steven L

    2011-01-01

    Gene and SNP annotation are among the first and most important steps in analyzing a genome. As the number of sequenced genomes continues to grow, a key question is: how does the quality of the assembled sequence affect the annotations? We compared the gene and SNP annotations for two different Bos taurus genome assemblies built from the same data but with significant improvements in the later assembly. The same annotation software was used for annotating both sequences. While some annotation differences are expected even between high-quality assemblies such as these, we found that a staggering 40% of the genes (>9,500) varied significantly between assemblies, due in part to the availability of new gene evidence but primarily to genome mis-assembly events and local sequence variations. For instance, although the later assembly is generally superior, 660 protein coding genes in the earlier assembly are entirely missing from the later genome's annotation, and approximately 3,600 (15%) of the genes have complex structural differences between the two assemblies. In addition, 12-20% of the predicted proteins in both assemblies have relatively large sequence differences when compared to their RefSeq models, and 6-15% of bovine dbSNP records are unrecoverable in the two assemblies. Our findings highlight the consequences of genome assembly quality on gene and SNP annotation and argue for continued improvements in any draft genome sequence. We also found that tracking a gene between different assemblies of the same genome is surprisingly difficult, due to the numerous changes, both small and large, that occur in some genes. As a side benefit, our analyses helped us identify many specific loci for improvement in the Bos taurus genome assembly.

  11. Comparative genomic analysis of equilibrative nucleoside transporters suggests conserved protein structure despite limited sequence identity.

    PubMed

    Sankar, Narendra; Machado, Jerry; Abdulla, Parween; Hilliker, Arthur J; Coe, Imogen R

    2002-10-15

    Equilibrative nucleoside transporters (ENTs) are a recently characterized and poorly understood group of membrane proteins that are important in the uptake of endogenous nucleosides required for nucleic acid and nucleoside triphosphate synthesis. Despite their central importance in cellular metabolism and nucleoside analog chemotherapy, no human ENT gene has been described and nothing is known about gene structure and function. To gain insight into the ENT gene family, we used experimental and in silico comparative genomic approaches to identify ENT genes in three evolutionarily diverse organisms with completely (or almost completely) sequenced genomes, Homo sapiens, Caenorhabditis elegans and Drosophila melanogaster. We describe the chromosomal location, the predicted ENT gene structure and putative structural topologies of predicted ENT proteins derived from the open reading frames. Despite variations in genomic layout and limited ortholog protein sequence identity (≤27.45%), predicted topologies of ENT proteins are strikingly similar, suggesting an evolutionary conservation of a prototypic structure. In addition, a similar distribution of protein domains on exons is apparent in all three taxa. These data demonstrate that comparative sequence analyses should be combined with other approaches (such as genomic and proteomic analyses) to fully understand structure, function and evolution of protein families.

  12. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    PubMed Central

    Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-01

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP–encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence. PMID:23357949

  13. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    SciTech Connect

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang; Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinlzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-24

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP-encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveals a strong correlation with a role in virulence.

  14. Genome-Wide Architecture of Disease Resistance Genes in Lettuce.

    PubMed

    Christopoulou, Marilena; Wo, Sebastian Reyes-Chin; Kozik, Alex; McHale, Leah K; Truco, Maria-Jose; Wroblewski, Tadeusz; Michelmore, Richard W

    2015-10-08

    Genome-wide motif searches identified 1134 genes in the lettuce reference genome of cv. Salinas that are potentially involved in pathogen recognition, of which 385 were predicted to encode nucleotide binding-leucine rich repeat receptor (NLR) proteins. Using a maximum-likelihood approach, we grouped the NLRs into 25 multigene families and 17 singletons. Forty-one percent of these NLR-encoding genes belong to three families, the largest being RGC16 with 62 genes in cv. Salinas. The majority of NLR-encoding genes are located in five major resistance clusters (MRCs) on chromosomes 1, 2, 3, 4, and 8 and cosegregate with multiple disease resistance phenotypes. Most MRCs contain primarily members of a single NLR gene family but a few are more complex. MRC2 spans 73 Mb and contains 61 NLRs of six different gene families that cosegregate with nine disease resistance phenotypes. MRC3, which is 25 Mb, contains 22 RGC21 genes and colocates with Dm13. A library of 33 transgenic RNA interference tester stocks was generated for functional analysis of NLR-encoding genes that cosegregated with disease resistance phenotypes in each of the MRCs. Members of four NLR-encoding families, RGC1, RGC2, RGC21, and RGC12, were shown to be required for 16 disease resistance phenotypes in lettuce. The general composition of MRCs is conserved across different genotypes; however, the specific repertoire of NLR-encoding genes varied, particularly for the rapidly evolving Type I genes. These tester stocks are valuable resources for future analyses of additional resistance phenotypes.

  15. Genomic analyses of bacterial porin-cytochrome gene clusters

    DOE PAGES

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    In this study, the porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular electron transfer reactions with substrates other than Fe(III) and Mn(IV) oxides.

  16. Population and Functional Genomics of Neisseria Revealed with Gene-by-Gene Approaches

    PubMed Central

    Harrison, Odile B.

    2016-01-01

    Rapid low-cost whole-genome sequencing (WGS) is revolutionizing microbiology; however, complementary advances in accessible, reproducible, and rapid analysis techniques are required to realize the potential of these data. Here, investigations of the genus Neisseria illustrated the gene-by-gene conceptual approach to the organization and analysis of WGS data. Using the gene and its link to phenotype as a starting point, the BIGSdb database, which powers the PubMLST databases, enables the assembly of large open-access collections of annotated genomes that provide insight into the evolution of the Neisseria, the epidemiology of meningococcal and gonococcal disease, and mechanisms of Neisseria pathogenicity. PMID:27098959

  17. Systematically fragmented genes in a multipartite mitochondrial genome

    PubMed Central

    Vlcek, Cestmir; Marande, William; Teijeiro, Shona; Lukeš, Julius; Burger, Gertraud

    2011-01-01

    Arguably, the most bizarre mitochondrial DNA (mtDNA) is that of the euglenozoan eukaryote Diplonema papillatum. The genome consists of numerous small circular chromosomes, none of which appears to encode a complete gene. For instance, the cox1 coding sequence is spread out over nine different chromosomes in non-overlapping pieces (modules), which are transcribed separately and joined to a contiguous mRNA by trans-splicing. Here, we examine how many genes are encoded by Diplonema mtDNA and whether all are fragmented and their transcripts trans-spliced. Module identification is challenging due to the sequence divergence of Diplonema mitochondrial genes. By employing the most sensitive protein profile search algorithms and comparing genomic with cDNA sequence, we recognize a total of 11 typical mitochondrial genes. The 10 protein-coding genes are systematically chopped up into three to 12 modules of 60–350 bp length. The corresponding mRNAs are all trans-spliced. Identification of ribosomal RNAs is most difficult. So far, we only detect the 3′-module of the large subunit ribosomal RNA (rRNA); it does not trans-splice with other pieces. The small subunit rRNA gene remains elusive. Our results open new intriguing questions about the biochemistry and evolution of mitochondrial trans-splicing in Diplonema. PMID:20935050

  18. Stability domains of actin genes and genomic evolution

    NASA Astrophysics Data System (ADS)

    Carlon, E.; Dkhissi, A.; Malki, M. Lejard; Blossey, R.

    2007-11-01

    In eukaryotic genes, the protein coding sequence is split into several fragments, the exons, separated by noncoding DNA stretches, the introns. Prokaryotes do not have introns in their genomes. We report calculations of the stability domains of actin genes for various organisms in the animal, plant, and fungi kingdoms. Actin genes have been chosen because they have been highly conserved during evolution. In these genes, all introns were removed so as to mimic ancient genes at the time of the early eukaryotic development, i.e., before intron insertion. Common stability boundaries are found in evolutionarily distant organisms, which implies that these boundaries date from the early origin of eukaryotes. In general, the boundaries correspond with intron positions in the actins of vertebrates and other animals, but not much for plants and fungi. The sharpest boundary is found in a locus where fungi, algae, and animals have introns in positions separated by one nucleotide only, which identifies a hot spot for insertion. These results suggest that some introns may have been incorporated into the genomes through a thermodynamically driven mechanism, in agreement with previous observations on human genes. They also suggest a different mechanism for intron insertion in plants and animals.

  19. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    PubMed

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.
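
    The core-genome idea in the record above can be sketched as a set operation: a gene is "core" if it is called present in every isolate, and "accessory" otherwise. This is only an illustrative sketch, not the authors' mCGH analysis; the function name, isolate labels and gene labels are all hypothetical, and real hybridization data would first need presence/absence calls derived from signal ratios.

```python
from typing import Dict, Set, Tuple


def core_and_accessory(presence: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Split genes into core (present in every isolate) and accessory (the rest).

    `presence` maps an isolate name to the set of genes called present in it.
    """
    if not presence:
        return set(), set()
    gene_sets = list(presence.values())
    core = set.intersection(*gene_sets)   # genes shared by all isolates
    pan = set.union(*gene_sets)           # every gene seen in any isolate
    return core, pan - core


# Hypothetical presence calls for three isolates (labels are made up).
calls = {
    "isolate_A": {"gene1", "gene2", "island_x"},
    "isolate_B": {"gene1", "gene2"},
    "isolate_C": {"gene1", "gene2", "island_x", "island_y"},
}
core, accessory = core_and_accessory(calls)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
```

    In this toy input the two "island" labels end up in the accessory set, which mirrors how differentially present islands separate isolates in the abstract.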

  20. Corynebacterium diphtheriae: genome diversity, population structure and genotyping perspectives.

    PubMed

    Mokrousov, Igor

    2009-01-01

    The epidemic re-emergence of diphtheria in Russia and the Newly Independent States (NIS) of the former Soviet Union in the 1990s demonstrated the continued threat of this disease, which had been thought to be rare. The bacteriophage-encoded toxin is a main virulence factor of Corynebacterium diphtheriae; however, an analysis of the first complete genome sequence of C. diphtheriae revealed a recent acquisition of other pathogenicity factors, including iron-uptake systems, adhesins and fimbrial proteins, as this extracellular pathogen has more possibilities for lateral gene transfer than, e.g., its close relative, the mainly intracellular Mycobacterium tuberculosis. C. diphtheriae appears to have a phylogeographical structure mainly represented by area-specific variants whose circulation is under strong influence of human host factors, including health control measures (above all, vaccination) and socioeconomic conditions. This framework core population structure may be challenged by importation of endemic and eventually toxigenic strains from new areas, thus leading to localized or large epidemics caused directly by imported strains or by bacteriophage-lysogenized indigenous strains converted to toxin production. A feature of C. diphtheriae co-existence with humans is its periodicity: following the large epidemic in the 1990s, the present period is marked by increasing heterogeneity of the circulating populations, whereas re-emergence of new toxigenic variants along with persistent circulation of invasive non-toxigenic strains appear alarming. To identify and rapidly monitor subtle changes in the genome structure at an infraclonal level during and between epidemics, portable and discriminatory typing methods of C. diphtheriae are still needed. In this view, CRISPRs and minisatellites are promising genomic markers for development of high-resolution typing schemes and databasing of C. diphtheriae.

  1. The banana E2 gene family: Genomic identification, characterization, expression profiling analysis.

    PubMed

    Dong, Chen; Hu, Huigang; Jue, Dengwei; Zhao, Qiufang; Chen, Hongliang; Xie, Jianghui; Jia, Liqiang

    2016-04-01

    The E2 is at the center of a cascade of Ub1 transfers, and it links activation of the Ub1 by E1 to its eventual E3-catalyzed attachment to substrate. Although genome-wide analysis of this family has been performed in some species, little is known about the E2 genes in banana. In this study, 74 E2 genes of banana were identified and phylogenetically clustered into thirteen subgroups. The predicted banana E2 genes were distributed across all 11 chromosomes at different densities. Additionally, the E2 domain, gene structure and motif compositions were analyzed. The expression of all of the banana E2 genes was analyzed in the root, stem, leaf, flower organs, five stages of fruit development and under abiotic stresses. All of the banana E2 genes, with the exception of a few genes in each group, were expressed in at least one of the organs and fruit developmental stages, which indicated that the E2 genes might be involved in various aspects of the physiological and developmental processes of the banana. Quantitative RT-PCR (qRT-PCR) analysis identified that 45 E2s were induced under drought and 33 E2s under salt. To the best of our knowledge, this report describes the first genome-wide analysis of the banana E2 gene family, and the results should provide valuable information for understanding the classification, cloning and putative functions of this family.

  2. Inference of gene regulatory networks from genome-wide knockout fitness data

    PubMed Central

    Wang, Liming; Wang, Xiaodong; Arkin, Adam P.; Samoilov, Michael S.

    2013-01-01

    Motivation: Genome-wide fitness is an emerging type of high-throughput biological data generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer certain advantages when insights into such phenotypical and functional features are of primary interest over individual gene expression. Previous works have shown that genome-wide fitness data can be used to uncover novel gene regulatory interactions, when compared with results of more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural, but also the dynamical and non-linear nature of the biomolecular systems involved. A state–space model with a non-linear basis is used for dynamically describing gene regulatory networks. Network structure is then elucidated by estimating unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run in on-line mode for practical use. Here, we demonstrate that the algorithm provides satisfying results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR–LiuR network in the bacterium Shewanella oneidensis. Availability: MATLAB code and datasets are available to download at http://www.duke.edu/~lw174/Fitness.zip and http://genomics.lbl.gov/supplemental/fitness-bioinf/ Contact: wangx@ee.columbia.edu or mssamoilov@lbl.gov Supplementary information
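
    The record above relies on an unscented Kalman filter to handle non-linearities. As a rough illustration of the underlying step (the unscented transform), the sketch below propagates a scalar Gaussian through a non-linear function using three sigma points. It is not the authors' MATLAB code and not a full filter or network model; the function name and the default values of alpha, beta and kappa are assumptions chosen for the example.

```python
import math
from typing import Callable, Tuple


def unscented_transform_1d(
    f: Callable[[float], float],
    mean: float,
    var: float,
    alpha: float = 1.0,   # assumed spread parameter (illustrative default)
    beta: float = 0.0,    # assumed prior-knowledge parameter (illustrative default)
    kappa: float = 2.0,   # assumed secondary scaling (illustrative default)
) -> Tuple[float, float]:
    """Approximate the mean and variance of f(x) for x ~ N(mean, var)."""
    n = 1  # state dimension (scalar case only)
    lam = alpha ** 2 * (n + kappa) - n
    spread = math.sqrt((n + lam) * var)

    # Three sigma points: the mean and one point on each side of it.
    points = [mean, mean + spread, mean - spread]
    w_mean = [lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)]
    w_cov = [w_mean[0] + (1.0 - alpha ** 2 + beta), w_mean[1], w_mean[2]]

    # Push each sigma point through the non-linearity, then re-estimate moments.
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(w_mean, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(w_cov, ys))
    return y_mean, y_var


if __name__ == "__main__":
    # Example: square a standard normal variable.
    print(unscented_transform_1d(lambda x: x * x, 0.0, 1.0))
```

    A full filter would alternate this propagation with a measurement update at every time step, which is what makes on-line operation possible.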

  3. Genome-Wide Analysis of Homeobox Gene Family in Legumes: Identification, Gene Duplication and Expression Profiling

    PubMed Central

    Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species, including chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues/developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has significantly contributed to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development. PMID:25745864
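
    The Ka/Ks reasoning in the record above can be made concrete: a ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates well below 1 is conventionally read as purifying selection, near 1 as neutral evolution, and above 1 as positive selection. The sketch below only applies that rule of thumb; the function name, the tolerance band around 1 and the example rates are assumptions for illustration and do not come from the study.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.1) -> str:
    """Label a duplicated gene pair by its Ka/Ks ratio.

    `tolerance` is an assumed band around 1.0 treated as "neutral";
    it is illustrative, not a published threshold.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a Ka/Ks ratio")
    ratio = ka / ks
    if ratio < 1.0 - tolerance:
        return "purifying"
    if ratio > 1.0 + tolerance:
        return "positive"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical substitution rates for one duplicated pair.
    print(classify_selection(ka=0.02, ks=0.10))
```

    Estimating Ka and Ks themselves requires a codon alignment and a substitution model, which this sketch deliberately leaves out.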

  4. Genome-wide analysis of homeobox gene family in legumes: identification, gene duplication and expression profiling.

    PubMed

    Bhattacharjee, Annapurna; Ghangal, Rajesh; Garg, Rohini; Jain, Mukesh

    2015-01-01

    Homeobox genes encode transcription factors that are known to play a major role in different aspects of plant growth and development. In the present study, we identified homeobox genes belonging to 14 different classes in five legume species, including chickpea, soybean, Medicago, Lotus and pigeonpea. The characteristic differences within homeodomain sequences among various classes of the homeobox gene family were quite evident. Genome-wide expression analysis using publicly available datasets (RNA-seq and microarray) indicated that homeobox genes are differentially expressed in various tissues/developmental stages and under stress conditions in different legumes. We validated the differential expression of selected chickpea homeobox genes via quantitative reverse transcription polymerase chain reaction. Genome duplication analysis in soybean indicated that segmental duplication has significantly contributed to the expansion of the homeobox gene family. The Ka/Ks ratio of duplicated homeobox genes in soybean showed that several members of this family have undergone purifying selection. Moreover, expression profiling indicated that duplicated genes might have been retained due to sub-functionalization. The genome-wide identification and comprehensive gene expression profiling of homeobox gene family members in legumes will provide opportunities for functional analysis to unravel their exact role in plant growth and development.

  5. Convergent gene loss following gene and genome duplications creates single-copy families in flowering plants.

    PubMed

    De Smet, Riet; Adams, Keith L; Vandepoele, Klaas; Van Montagu, Marc C E; Maere, Steven; Van de Peer, Yves

    2013-02-19

    The importance of gene gain through duplication has long been appreciated. In contrast, the importance of gene loss has only recently attracted attention. Indeed, studies in organisms ranging from plants to worms and humans suggest that duplication of some genes might be better tolerated than that of others. Here we have undertaken a large-scale study to investigate the existence of duplication-resistant genes in the sequenced genomes of 20 flowering plants. We demonstrate that there is a large set of genes that is convergently restored to single-copy status following multiple genome-wide and smaller scale duplication events. We rule out the possibility that such a pattern could be explained by random gene loss only and therefore propose that there is selection pressure to preserve such genes as singletons. This is further substantiated by the observation that angiosperm single-copy genes do not comprise a random fraction of the genome, but instead are often involved in essential housekeeping functions that are highly conserved across all eukaryotes. Furthermore, single-copy genes are generally expressed more highly and in more tissues than non-single-copy genes, and they exhibit higher sequence conservation. Finally, we propose different hypotheses to explain their resistance against duplication.

  6. Horizontally transferred genes in the genome of Pacific white shrimp, Litopenaeus vannamei

    PubMed Central

    2013-01-01

    Background In recent years, with the development of next-generation sequencing technology, a growing number of genes have been reported as being horizontally transferred from prokaryotes to eukaryotes, most of them involving arthropods. As a member of the phylum Arthropoda, the Pacific white shrimp Litopenaeus vannamei has to adapt to complex water environments with various symbiotic or parasitic microorganisms, which provide a platform for horizontal gene transfer (HGT). Results In this study, we analyzed the genome-wide HGT events in L. vannamei. Through homology search and phylogenetic analysis, followed by experimental PCR confirmation, 14 horizontally transferred genes were identified: 12 of them were transferred from bacteria and two from fungi. Structure analysis of these genes showed that the introns of the two fungi-originated genes were substituted by shrimp DNA fragments, and that two genes transferred from bacteria had shrimp-specific introns inserted in them. Furthermore, around three other bacteria-originated genes, there were three large DNA segments inserted into the shrimp genome. One segment was a transposon that was fully transferred, and the other two segments contained only coding regions of bacteria. Functional prediction of these 14 genes showed that 6 of them might be related to energy metabolism, and 4 others to defense of the organism. Conclusions HGT events from bacteria or fungi have occurred in the genome of L. vannamei, and these horizontally transferred genes can be transcribed in shrimp. This is the first report of horizontally transferred genes in shrimp. Importantly, most of these genes are exposed to negative selection pressure and appear to be functional. PMID:23914989

  7. Analysis of CATMA transcriptome data identifies hundreds of novel functional genes and improves gene models in the Arabidopsis genome

    PubMed Central

    Aubourg, Sébastien; Martin-Magniette, Marie-Laure; Brunaud, Véronique; Taconnat, Ludivine; Bitton, Frédérique; Balzergue, Sandrine; Jullien, Pauline E; Ingouff, Mathieu; Thareau, Vincent; Schiex, Thomas; Lecharny, Alain; Renou, Jean-Pierre

    2007-01-01

    Background Since the finishing of the sequencing of the Arabidopsis thaliana genome, the Arabidopsis community and the annotator centers have been working on the improvement of gene annotation at the structural and functional levels. In this context, we have used the large CATMA resource on the Arabidopsis transcriptome to search for genes missed by different annotation processes. Probes on the CATMA microarrays are specific gene sequence tags (GSTs) based on the CDS models predicted by the Eugene software. Among the 24 576 CATMA v2 GSTs, 677 are in regions considered as intergenic by the TAIR annotation. We analyzed the cognate transcriptome data in the CATMA resource and carried out data-mining to characterize novel genes and improve gene models. Results The statistical analysis of the results of more than 500 hybridized samples distributed among 12 organs provides an experimental validation for 465 novel genes. The hybridization evidence was confirmed by RT-PCR approaches for 88% of the 465 novel genes. Comparisons with the current annotation show that these novel genes often encode small proteins, with an average size of 137 aa. Our approach has also led to the improvement of pre-existing gene models through both the extension of 16 CDS and the identification of 13 gene models erroneously constituted of two merged CDS. Conclusion This work is a noticeable step forward in the improvement of the Arabidopsis genome annotation. We increased the number of Arabidopsis validated genes by 465 novel transcribed genes to which we associated several functional annotations such as expression profiles, sequence conservation in plants, cognate transcripts and protein motifs. PMID:17980019

  8. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa

    PubMed Central

    Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are the key players in plant growth and development, involving leaf formation, phototropism, and root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) genes are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of the Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs including 15 same species (BrIAA-BrIAA) and 36 cross species (BrIAA-AtIAA) IAA genes. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house developed microarray data from mature leaves of Chiifu and RcBr lines. Despite their huge morphological difference, tissue-specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single Nucleotide Polymorphisms between IAA genes of the reference genome Chiifu and RcBr were also identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops, and we believe our study provides a foundation for understanding the Aux/IAA genes in B. rapa. PMID

  9. Genome-Wide Analysis and Characterization of Aux/IAA Family Genes in Brassica rapa.

    PubMed

    Paul, Parameswari; Dhandapani, Vignesh; Rameneni, Jana Jeevan; Li, Xiaonan; Sivanandhan, Ganesan; Choi, Su Ryun; Pang, Wenxing; Im, Subin; Lim, Yong Pyo

    2016-01-01

    Auxins are the key players in plant growth and development, involving leaf formation, phototropism, and root, fruit and embryo development. Auxin/Indole-3-Acetic Acid (Aux/IAA) genes are early auxin response genes noted as transcriptional repressors in plant auxin signaling. However, many studies focus on Aux/ARF gene families and much less is known about the Aux/IAA gene family in Brassica rapa (B. rapa). Here we performed a comprehensive genome-wide analysis and identified 55 Aux/IAA genes in B. rapa using four conserved motifs of the Aux/IAA family (PF02309). Chromosomal mapping of the B. rapa Aux/IAA (BrIAA) genes facilitated understanding cluster rearrangement of the crucifer building blocks in the genome. Phylogenetic analysis of BrIAA with Arabidopsis thaliana, Oryza sativa and Zea mays identified 51 sister pairs including 15 same species (BrIAA-BrIAA) and 36 cross species (BrIAA-AtIAA) IAA genes. Among the 55 BrIAA genes, expression of 43 and 45 genes was verified using GenBank B. rapa ESTs and in-house developed microarray data from mature leaves of Chiifu and RcBr lines. Despite their huge morphological difference, tissue-specific expression analysis of BrIAA genes between the parental lines Chiifu and RcBr showed that the genes followed a similar pattern of expression during leaf development and a different pattern during bud, flower and siliqua development stages. The response of the BrIAA genes to abiotic and auxin stress at different time intervals revealed their involvement in stress response. Single Nucleotide Polymorphisms between IAA genes of the reference genome Chiifu and RcBr were also identified. Our study examines the scope of conservation and divergence of Aux/IAA genes and their structures in B. rapa. Analyzing the expression and structural variation between two parental lines will significantly contribute to functional genomics of Brassica crops, and we believe our study provides a foundation for understanding the Aux/IAA genes in B. rapa.

  10. Genome-wide analysis of the R2R3-MYB transcription factor gene family in sweet orange (Citrus sinensis).

    PubMed

    Liu, Chaoyang; Wang, Xia; Xu, Yuantao; Deng, Xiuxin; Xu, Qiang

    2014-10-01

    MYB transcription factors represent one of the largest gene families in plant genomes. Sweet orange (Citrus sinensis) is one of the most important fruit crops worldwide, and its genome has recently been sequenced. This provides an opportunity to investigate the organization and evolutionary characteristics of sweet orange MYB genes from a whole-genome perspective. In the present study, we identified 100 R2R3-MYB genes in the sweet orange genome. A comprehensive analysis of this gene family was performed, including phylogeny, gene structure, chromosomal localization and expression pattern analyses. The 100 genes were divided into 29 subfamilies based on sequence similarity and phylogeny, and the classification was also well supported by the highly conserved exon/intron structures and motif composition. The phylogenomic comparison of the MYB gene family among sweet orange and related plant species (Arabidopsis, cacao and papaya) suggested the existence of functional divergence during evolution. Expression profiling indicated that sweet orange R2R3-MYB genes exhibited distinct temporal and spatial expression patterns. Our analysis suggested that the sweet orange MYB genes may play important roles in different plant biological processes, some of which may be potentially involved in citrus fruit quality. These results will be useful for future functional analysis of the MYB gene family in sweet orange.

  11. Rapid and efficient genome-wide characterization of Xanthomonas TAL effector genes

    PubMed Central

    Yu, Yan-Hua; Lu, Ye; He, Yong-Qiang; Huang, Sheng; Tang, Ji-Liang

    2015-01-01

    Xanthomonas TALE transcriptional activators act as virulence or avirulence factors by activating host disease susceptibility or resistance genes. Their specificity is determined by a tandem repeat domain. Some Xanthomonas pathogens contain 10–30 TALEs per strain. Although TALEs play critical roles in pathogenesis, their study has so far been limited to a few examples, due to their highly repetitive gene structure and the extreme similarity among different members, which hinder sequencing and assembly. To facilitate TALE studies, we developed an efficient and rapid pipeline for genome-wide cloning of as many tal genes as possible from a strain. Here, we report the pipeline and its use to identify all 18 tal genes from a newly isolated strain of the rice pathogen Xanthomonas oryzae. Target prediction revealed a number of potential rice targets, including several notable genes such as those encoding SWEET, WRKY, Hen1, and BAK1 proteins, which provide candidates for further experimental functional analysis of the TALEs. PMID:26271455

  12. The ankyrin repeat gene family in rice: genome-wide identification, classification and expression profiling.

    PubMed

    Huang, Jianyan; Zhao, Xiaobo; Yu, Huihui; Ouyang, Yidan; Wang, Lei; Zhang, Qifa

    2009-10-01

    Ankyrin repeat (ANK) containing proteins comprise a large protein family. Although many members of this family have been implicated in plant growth, development and signal transduction, only a few ANK genes have been reported in rice. In this study, we analyzed the structures, phylogenetic relationship, genome localizations and expression profiles of 175 ankyrin repeat genes identified in rice (OsANK). Domain composition analysis suggested OsANK proteins can be classified into ten subfamilies. Chromosomal localizations of OsANK genes indicated nine segmental duplication events involving 17 genes, and 65 OsANK genes were involved in tandem duplications. The expression profiles of 158 OsANK genes were analyzed in 24 tissues covering the whole life cycle of two rice genotypes, Minghui 63 and Zhenshan 97. Sixteen genes showed preferential expression in given tissues compared to all the other tissues in Minghui 63 and Zhenshan 97. Nine genes were preferentially expressed in stamen 1 day before flowering, suggesting that these genes may play important roles in pollination and fertilization. Expression data of OsANK genes were also obtained with tissues of seedlings subjected to three phytohormone (NAA, GA3 and KT) treatments and light/dark treatments. Eighteen genes showed differential expression with at least one phytohormone treatment, while under light/dark treatments 13 OsANK genes showed differential expression. Our data provided a very useful reference for cloning and functional analysis of members of this gene family in rice.

  13. Transport genes and chemotaxis in Laribacter hongkongensis: a genome-wide analysis

    PubMed Central

    2011-01-01

    Background Laribacter hongkongensis is a Gram-negative, sea gull-shaped rod associated with community-acquired gastroenteritis. The bacterium has been found in diverse freshwater environments including fish, frogs and drinking water reservoirs. Using the complete genome sequence data of L. hongkongensis, we performed a comprehensive analysis of putative transport-related genes and genes related to chemotaxis, motility and quorum sensing, which may help the bacterium adapt to the changing environments and combat harmful substances. Results A genome-wide analysis using Transport Classification Database TCDB, similarity and keyword searches revealed the presence of a large diversity of transporters (n = 457) and genes related to chemotaxis (n = 52) and flagellar biosynthesis (n = 40) in the L. hongkongensis genome. The transporters included those from all seven major transporter categories, which may allow the uptake of essential nutrients or ions, and extrusion of metabolic end products and hazardous substances. L. hongkongensis is unique among closely related members of the Neisseriaceae family in possessing a higher number of proteins related to transport of ammonium, urea and dicarboxylate, which may reflect the importance of nitrogen and dicarboxylate metabolism in this asaccharolytic bacterium. Structural modeling of two C4-dicarboxylate transporters showed that they possessed similar structures to the determined structures of other DctP-TRAP transporters, with one having an unusual disulfide bond. Diverse mechanisms for iron transport, including hemin transporters for iron acquisition from host proteins, were also identified. In addition to the chemotaxis and flagella-related genes, the L. hongkongensis genome also contained two copies of qseB/qseC homologues of the AI-3 quorum sensing system. Conclusions The large number of diverse transporters and genes involved in chemotaxis, motility and quorum sensing suggested that the bacterium may utilize a complex system to

  14. Child Development and Structural Variation in the Human Genome

    ERIC Educational Resources Information Center

    Zhang, Ying; Haraksingh, Rajini; Grubert, Fabian; Abyzov, Alexej; Gerstein, Mark; Weissman, Sherman; Urban, Alexander E.

    2013-01-01

    Structural variation of the human genome sequence is the insertion, deletion, or rearrangement of stretches of DNA sequence sized from around 1,000 to millions of base pairs. Over the past few years, structural variation has been shown to be far more common in human genomes than previously thought. Very little is currently known about the effects…

  15. Complete sequence and gene organization of the mitochondrial genome of the land snail Albinaria coerulea.

    PubMed

    Hatzoglou, E; Rodakis, G C; Lecanidou, R

    1995-08-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date.

  16. Complete Sequence and Gene Organization of the Mitochondrial Genome of the Land Snail Albinaria Coerulea

    PubMed Central

    Hatzoglou, E.; Rodakis, G. C.; Lecanidou, R.

    1995-01-01

    The complete sequence (14,130 bp) of the mitochondrial DNA (mtDNA) of the land snail Albinaria coerulea was determined. It contains 13 protein, two rRNA and 22 tRNA genes. Twenty-four of these genes are encoded by one and 13 genes by the other strand. The gene arrangement shares almost no similarities with that of two other molluscs for which the complete gene content and arrangement are known, the bivalve Mytilus edulis and the chiton Katharina tunicata; the protein and rRNA gene order is similar to that of another terrestrial gastropod, Cepaea nemoralis. Unusual features include the following: (1) the absence of lengthy noncoding regions (there are only 141 intergenic nucleotides interspersed at different gene borders, the longest intergenic sequence being 42 nucleotides), (2) the presence of several overlapping genes (mostly tRNAs), (3) the presence of tRNA-like structures and other stem and loop structures within genes. An RNA editing system acting on tRNAs must necessarily be invoked for posttranscriptional extension of the overlapping tRNAs. Due to these features, and also because of the small size of its genes (e.g., it contains the smallest rRNA genes among the known coelomates), it is one of the most compact mitochondrial genomes known to date. PMID:7498775

  17. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics.

    PubMed

    Tanifuji, Goro; Archibald, John M; Hashimoto, Tetsuo

    2016-02-18

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.
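The observation that GUG and UUG may serve as initiation codons alongside AUG affects how open reading frames are called. The following is a minimal sketch only, not the authors' annotation method: it scans the forward strand of a DNA string for ORFs under a configurable set of start codons. The function name `find_orfs`, the toy sequence, and the use of the standard stop codons (TAA, TAG, TGA) are all assumptions made for illustration; an organellar genome may use a different genetic code.

```python
# Illustrative sketch: forward-strand ORF scan with alternative start codons.
# Assumption: standard stop codons; a real mitochondrial code may differ.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, start_codons=("ATG", "GTG", "TTG"), min_codons=3):
    """Return (start, end, start_codon) tuples for forward-strand ORFs.

    `start` is the 0-based index of the start codon; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            codon = dna[i:i + 3]
            if codon in start_codons:
                # Walk codon by codon until a stop codon or the sequence end.
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 <= len(dna):  # an in-frame stop codon was found
                    if (j - i) // 3 >= min_codons:
                        orfs.append((i, j + 3, codon))
                    i = j + 3
                    continue
            i += 3
    return orfs

toy = "CCGTGAAACCCTAAGG"  # hypothetical sequence with a GTG-initiated ORF
orfs_alt = find_orfs(toy)                          # AUG, GUG and UUG allowed
orfs_atg = find_orfs(toy, start_codons=("ATG",))   # AUG only
```

With the toy sequence, the ORF is found only when GTG is accepted as a start codon, which is the kind of difference an annotation pipeline would need to account for.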

  18. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    NASA Astrophysics Data System (ADS)

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-02-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  19. Identification of a genomic reservoir for new TRIM genes in primate genomes.

    PubMed

    Han, Kyudong; Lou, Dianne I; Sawyer, Sara L

    2011-12-01

    Tripartite Motif (TRIM) ubiquitin ligases act in the innate immune response against viruses. One of the best characterized members of this family, TRIM5α, serves as a potent retroviral restriction factor with activity against HIV. Here, we characterize what are likely to be the youngest TRIM genes in the human genome. For instance, we have identified 11 TRIM genes that are specific to humans and African apes (chimpanzees, bonobos, and gorillas) and another 7 that are human-specific. Many of these young genes have never been described, and their identification brings the total number of known human TRIM genes to approximately 100. These genes were acquired through segmental duplications, most of which originated from a single locus on chromosome 11. Another polymorphic duplication of this locus has resulted in these genes being copy number variable within the human population, with a Han Chinese woman identified as having 12 additional copies of these TRIM genes compared to other individuals screened in this study. Recently, this locus was annotated as one of 34 "hotspot" regions that are also copy number variable in the genomes of chimpanzees and rhesus macaques. Most of the young TRIM genes originating from this locus are expressed, spliced, and contain signatures of positive natural selection in regions known to determine virus recognition in TRIM5α. However, we find that they do not restrict the same retroviruses as TRIM5α, consistent with the high degree of divergence observed in the regions that control target specificity. We propose that this recombinationally volatile locus serves as a reservoir from which new TRIM genes arise through segmental duplication, allowing primates to continually acquire new antiviral genes that can be selected to target new and evolving pathogens.

  20. Gene3D: Multi-domain annotations for protein sequence and comparative genome analysis.

    PubMed

    Lees, Jonathan G; Lee, David; Studer, Romain A; Dawson, Natalie L; Sillitoe, Ian; Das, Sayoni; Yeats, Corin; Dessailly, Benoit H; Rentzsch, Robert; Orengo, Christine A

    2014-01-01

    Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of protein domain structure annotations for protein sequences. Domains are predicted using a library of profile HMMs from 2738 CATH superfamilies. Gene3D assigns domain annotations to Ensembl and UniProt sequence sets including >6000 cellular genomes and >20 million unique protein sequences. This represents an increase of 45% in the number of protein sequences since our last publication. Thanks to improvements in the underlying data and pipeline, we see large increases in the domain coverage of sequences. We have expanded this coverage by integrating Pfam and SUPERFAMILY domain annotations, and we now resolve domain overlaps to provide highly comprehensive composite multi-domain architectures. To make these data more accessible for comparative genome analyses, we have developed novel search algorithms for searching genomes to identify related multi-domain architectures. In addition to providing domain family annotations, we have now developed a pipeline for 3D homology modelling of domains in Gene3D. This has been applied to the human genome and will be rolled out to other major organisms over the next year.

  1. Genomic Aberrations Frequently Alter Chromatin Regulatory Genes in Chordoma

    PubMed Central

    Wang, Lu; Zehir, Ahmet; Nafa, Khedoudja; Zhou, Nengyi; Berger, Michael F.; Casanova, Jacklyn; Sadowska, Justyna; Lu, Chao; Allis, C. David; Gounder, Mrinal; Chandhanayingyong, Chandhanarat; Ladanyi, Marc; Boland, Patrick J; Hameed, Meera

    2016-01-01

    Chordoma is a rare primary bone neoplasm that is resistant to standard chemotherapies. Despite aggressive surgical management, local recurrence and metastasis are not uncommon. To identify the specific genetic aberrations that play key roles in chordoma pathogenesis, we utilized a genome-wide high-resolution SNP-array and next generation sequencing (NGS)-based molecular profiling platform to study 24 patient samples with typical histopathologic features of chordoma. Matching normal tissues were available for 16 samples. SNP-array analysis revealed nonrandom copy number losses across the genome, frequently involving 3, 9p, 1p, 14, 10, and 13. In contrast, copy number gain is uncommon in chordomas. Two minimum deleted regions were observed on 3p within a ~8 Mb segment at 3p21.1–p21.31, which overlaps SETD2, BAP1 and PBRM1. The minimum deleted region on 9p was mapped to the CDKN2A locus at 9p21.3, and homozygous deletion of CDKN2A was detected in 5/22 chordomas (~23%). NGS-based molecular profiling demonstrated an extremely low mutation rate in chordomas, with an average of 0.5 mutations per sample for the 16 cases with matched normal tissue. When the mutated genes were grouped based on molecular functions, many of the mutation events (~40%) were found in chromatin regulatory genes. The combined copy number and mutation profiling revealed that SETD2 is the single gene affected most frequently in chordomas, either by deletion or by mutations. Our study demonstrated that chordoma belongs to the C-class (copy number changes) tumors, whose oncogenic signature is non-random multiple copy number losses across the genome, and that genomic aberrations frequently alter chromatin regulatory genes. PMID:27072194

  2. Re-examining the Gene in Personalized Genomics

    NASA Astrophysics Data System (ADS)

    Bartol, Jordan

    2013-10-01

    Personalized genomics companies (PG; also called 'direct-to-consumer genetics') are businesses marketing genetic testing to consumers over the Internet. While much has been written about these new businesses, little attention has been given to their roles in science communication. This paper provides an analysis of the gene concept presented to customers and the relation between the information given and the science behind PG. Two quite different gene concepts are present in company rhetoric, but only one features in the science. To explain this, we must appreciate the delicate tension between PG, academic science, public expectation, and market forces.

  3. The genome of Salinibacter ruber: Convergence and gene exchange among hyperhalophilic bacteria and archaea

    PubMed Central

    Mongodin, E. F.; Nelson, K. E.; Daugherty, S.; DeBoy, R. T.; Wister, J.; Khouri, H.; Weidman, J.; Walsh, D. A.; Papke, R. T.; Sanchez Perez, G.; Sharma, A. K.; Nesbø, C. L.; MacLeod, D.; Bapteste, E.; Doolittle, W. F.; Charlebois, R. L.; Legault, B.; Rodriguez-Valera, F.

    2005-01-01

    Saturated thalassic brines are among the most physically demanding habitats on Earth: few microbes survive in them. Salinibacter ruber is among these organisms and has been found repeatedly in significant numbers in climax saltern crystallizer communities. The phenotype of this bacterium is remarkably similar to that of the hyperhalophilic Archaea (Haloarchaea). The genome sequence suggests that this resemblance has arisen through convergence at the physiological level (different genes producing similar overall phenotype) and the molecular level (independent mutations yielding similar sequences or structures). Several genes and gene clusters also derive by lateral transfer from (or may have been laterally transferred to) haloarchaea. S. ruber encodes four rhodopsins. One resembles bacterial proteorhodopsins and three are of the haloarchaeal type, previously uncharacterized in a bacterial genome. The impact of these modular adaptive elements on the cell biology and ecology of S. ruber is substantial, affecting salt adaptation, bioenergetics, and photobiology. PMID:16330755

  4. Phylogeny, genomic organization and expression of lambda and kappa immunoglobulin light chain genes in a reptile, Anolis carolinensis.

    PubMed

    Wu, Qian; Wei, Zhiguo; Yang, Zhi; Wang, Tao; Ren, Liming; Hu, Xiaoxiang; Meng, Qingyong; Guo, Ying; Zhu, Qinghong; Robert, Jacques; Hammarström, Lennart; Li, Ning; Zhao, Yaofeng

    2010-05-01

    The reptiles are the last major taxon of jawed vertebrates in which immunoglobulin light chain isotypes have not been well characterized. Using the recently released genome sequencing data, we show in this study that the reptile Anolis carolinensis expresses both lambda and kappa light chain genes. The genomic organization of both gene loci is structurally similar to their respective counterparts in mammals. The identified lambda locus contains three constant region genes each preceded by a joining gene segment, and a total of 37 variable gene segments. In contrast, the kappa locus contains only a single constant region gene, and two joining gene segments with a single family of 14 variable gene segments located upstream. Analysis of junctions of the recombined VJ transcripts reveals a paucity of N and P nucleotides in both expressed lambda and kappa sequences. These results help us to understand the generation of the immunoglobulin repertoire in reptiles and immunoglobulin evolution in vertebrates.

  5. Metabolic Genes within Cyanophage Genomes: Implications for Diversity and Evolution

    PubMed Central

    Gao, E-Bin; Huang, Youhua; Ning, Degang

    2016-01-01

    Cyanophages, a group of viruses specifically infecting cyanobacteria, are genetically diverse and extensively abundant in water environments. As a result of selective pressure, cyanophages often acquire a range of metabolic genes from host genomes. The host-derived genes make a significant contribution to the ecological success of cyanophages. In this review, we summarize the host-derived metabolic genes, as well as their origin and roles in cyanophage evolution and important host metabolic pathways, such as the light-dependent reactions of photosynthesis, the pentose phosphate pathway, nutrient acquisition and nucleotide biosynthesis. We also discuss the suitability of the host-derived metabolic genes as potential diagnostic markers for the detection of genetic diversity of cyanophages in natural environments. PMID:27690109

  6. Nonclinical and Clinical Enterococcus faecium Strains, but Not Enterococcus faecalis Strains, Have Distinct Structural and Functional Genomic Features

    PubMed Central

    Kim, Eun Bae

    2014-01-01

    Certain strains of Enterococcus faecium and Enterococcus faecalis contribute beneficially to animal health and food production, while others are associated with nosocomial infections. To determine whether there are structural and functional genomic features that are distinct between nonclinical (NC) and clinical (CL) strains of those species, we analyzed the genomes of 31 E. faecium and 38 E. faecalis strains. Hierarchical clustering of 7,017 orthologs found in the E. faecium pangenome revealed that NC strains clustered into two clades and are distinct from CL strains. NC E. faecium genomes are significantly smaller than CL genomes, and this difference was partly explained by significantly fewer mobile genetic elements (ME), virulence factors (VF), and antibiotic resistance (AR) genes. E. faecium ortholog comparisons identified 68 and 153 genes that are enriched for NC and CL strains, respectively. Proximity analysis showed that CL-enriched loci, and not NC-enriched loci, are more frequently colocalized on the genome with ME. In CL genomes, AR genes are also colocalized with ME, and VF are more frequently associated with CL-enriched loci. Genes in 23 functional groups are also differentially enriched between NC and CL E. faecium genomes. In contrast, differences were not observed between NC and CL E. faecalis genomes despite their having larger genomes than E. faecium. Our findings show that unlike E. faecalis, NC and CL E. faecium strains are equipped with distinct structural and functional genomic features indicative of adaptation to different environments. PMID:24141120

  7. Regulatory Features for Odorant Receptor Genes in the Mouse Genome

    PubMed Central

    Degl’Innocenti, Andrea; D’Errico, Anna

    2017-01-01

    The odorant receptor genes, seven transmembrane receptor genes constituting the vastest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron–one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate in cis their transcription, but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic markings, put in place during the maturation of the sensory neuron on each odorant receptor locus. Here we review the fast-changing state of the art for the study of regulatory features for odorant receptor genes. PMID:28270833

  8. Regulatory Features for Odorant Receptor Genes in the Mouse Genome.

    PubMed

    Degl'Innocenti, Andrea; D'Errico, Anna

    2017-01-01

    The odorant receptor genes, seven transmembrane receptor genes constituting the vastest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron-one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate in cis their transcription, but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic markings, put in place during the maturation of the sensory neuron on each odorant receptor locus. Here we review the fast-changing state of the art for the study of regulatory features for odorant receptor genes.

  9. Advances in Pig Genomics and Functional Gene Discovery

    PubMed Central

    2003-01-01

    Advances in pig gene identification, mapping and functional analysis have continued to make rapid progress. The porcine genetic linkage map now has nearly 3000 loci, including several hundred genes, and is likely to expand considerably in the next few years, with many more genes and amplified fragment length polymorphism (AFLP) markers being added to the map. The physical genetic map is also growing rapidly and has over 3000 genes and markers. Several recent quantitative trait loci (QTL) scans and candidate gene analyses have identified important chromosomal regions and individual genes associated with traits of economic interest. The commercial pig industry is actively using this information and traditional performance information to improve pig production by marker-assisted selection (MAS). Research to study the co-expression of thousands of genes is now advancing and methods to combine these approaches to aid in gene discovery are under way. The pig's role in xenotransplantation and biomedical research makes the study of its genome important for the study of human disease. This review will briefly describe advances made, directions for future research and the implications for both the pig industry and human health. PMID:18629119

  10. New Markov Model Approaches to Deciphering Microbial Genome Function and Evolution: Comparative Genomics of Laterally Transferred Genes

    SciTech Connect

    Borodovsky, M.

    2013-04-11

    Algorithmic methods for gene prediction have been developed and successfully applied to many different prokaryotic genome sequences. As the set of genes in a particular genome is not homogeneous with respect to DNA sequence composition features, the GeneMark.hmm program utilizes two Markov models representing distinct classes of protein coding genes denoted "typical" and "atypical". Atypical genes are those whose DNA features deviate significantly from those classified as typical and they represent approximately 10% of any given genome. In addition to the inherent interest of more accurately predicting genes, the atypical status of these genes may also reflect their separate evolutionary ancestry from other genes in that genome. We hypothesize that atypical genes are largely comprised of those genes that have been relatively recently acquired through lateral gene transfer (LGT). If so, what fraction of atypical genes are such bona fide LGTs? We have made atypical gene predictions for all fully completed prokaryotic genomes; we have been able to compare these results to other "surrogate" methods of LGT prediction.
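The two-class idea described above can be illustrated with a small sketch. This is not the GeneMark.hmm implementation; it only shows, under simplifying assumptions, how a sequence might be assigned to a "typical" or "atypical" class by comparing log-likelihoods under two first-order Markov models. The function names, the toy training sequences, and the pseudocount value are hypothetical.

```python
# Illustrative sketch: classify a sequence by comparing its log-likelihood
# under two first-order Markov models (assumed "typical" vs "atypical").
import math

ALPHABET = "ACGT"

def train_markov(seqs, pseudocount=1.0):
    """Estimate log P(next base | current base) from training sequences."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: math.log(counts[a][b] / total) for b in ALPHABET}
    return model

def log_likelihood(seq, model):
    """Sum of transition log-probabilities along the sequence."""
    return sum(model[a][b] for a, b in zip(seq, seq[1:]))

def classify(seq, typical, atypical):
    """Label the sequence with the class whose model scores it higher."""
    if log_likelihood(seq, typical) >= log_likelihood(seq, atypical):
        return "typical"
    return "atypical"

# Toy training data (hypothetical): GC-rich "typical", AT-rich "atypical".
typical_model = train_markov(["GCGCGCGGCCGCGC", "CGCGGCGCCGCG"])
atypical_model = train_markov(["ATATTTAATATAAT", "TTAATATATTAA"])

label_gc = classify("GCGCGGCG", typical_model, atypical_model)
label_at = classify("ATTATAAT", typical_model, atypical_model)
```

A production gene finder would use higher-order, frame-aware models and a hidden Markov framework; the sketch keeps only the likelihood comparison between two composition classes.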

  11. Whole-Genome Sequencing of a Family with Hereditary Pulmonary Alveolar Proteinosis Identifies a Rare Structural Variant Involving CSF2RA/CRLF2/IL3RA Gene Disruption

    PubMed Central

    Chiu, Chih-Yung; Su, Shih-Chi; Fan, Wen-Lang; Lai, Shen-Hao; Tsai, Ming-Han; Chen, Shih-Hsiang; Wong, Kin-Sun; Chung, Wen-Hung

    2017-01-01

    Pulmonary alveolar proteinosis (PAP) is a rare pulmonary disease in which the abnormalities in alveolar surfactant accumulation are caused by impairments of the GM-CSF pathway attributable to defects in a variety of genes. However, hereditary PAP is extremely uncommon, and a detailed understanding of the genetic inheritance of PAP in a family may provide timely diagnosis, treatment and proper intervention including genetic consultation. Here, we describe a comprehensive analysis of genome and gene expression for a family containing one affected child with a diagnosis of PAP and two other healthy siblings. Family-based whole-genome analysis revealed a homozygous deletion that disrupts the CSF2RA, CRLF2, and IL3RA genes in the pseudoautosomal region of the X chromosome in the affected child and one of the asymptomatic siblings. Further functional pathway analysis of differentially expressed genes in IL-1β-treated peripheral blood mononuclear cells highlighted the insufficiency of immune response in the child with PAP, especially the protection against bacterial infection. Collectively, our results reveal a novel allele as the genetic determinant of a family with PAP and provide insights into variable expressivity and incomplete penetrance of this rare disease, which will be helpful for proper genetic consultation and prompt treatment to avoid mortality and morbidity. PMID:28233860

  12. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis

    PubMed Central

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5’ portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16 taxa angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids. PMID:26046631

  13. Essential Genes Predicted in the Genome of Rubrivivax gelatinosus

    PubMed Central

    2016-01-01

    ABSTRACT Rubrivivax gelatinosus is a betaproteobacterium with impressive metabolic diversity. It is capable of phototrophy, chemotrophy, two different mechanisms of sugar metabolism, fermentation, and H2 gas production. To identify core essential genes, R. gelatinosus was subjected to saturating transposon mutagenesis and high-throughput sequencing (TnSeq) analysis using nutrient-rich, aerobic conditions. Results revealed that virtually no primary metabolic genes are essential to the organism and that genomic redundancy only explains a portion of the nonessentiality, but some biosynthetic pathways are still essential under nutrient-rich conditions. Different essentialities of different portions of the Pho regulatory pathway suggest that overexpression of the regulon is toxic and hint at a larger connection between phosphate regulation and cellular health. Lastly, various essentialities of different tRNAs hint at a more complex situation than would be expected for such a core process. These results expand upon research regarding cross-organism gene essentiality and further enrich the study of purple nonsulfur bacteria. IMPORTANCE Microbial genomic data are increasing at a tremendous rate, but physiological characterization of those data lags far behind. One mechanism of high-throughput physiological characterization is TnSeq, which uses high-volume transposon mutagenesis and high-throughput sequencing to identify all of the essential genes in a given organism's genome. Here TnSeq was used to identify essential genes in the metabolically versatile betaproteobacterium Rubrivivax gelatinosus. The results presented here add to the growing TnSeq field and also reveal important aspects of R. gelatinosus physiology, which are applicable to researchers working on metabolically flexible organisms. PMID:27274029
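The core intuition behind TnSeq, that genes tolerating few or no transposon insertions are candidates for essentiality, can be sketched as follows. This is a deliberately simplified illustration and not the analysis used in the study: real TnSeq pipelines rely on statistical models of insertion density. The gene coordinates, insertion sites, function names, and the density threshold are hypothetical.

```python
# Illustrative sketch: flag genes with no (or very few) transposon insertions.
from bisect import bisect_left, bisect_right

def insertion_counts(genes, insertion_sites):
    """Count insertion sites falling inside each gene (inclusive coordinates)."""
    sites = sorted(insertion_sites)
    return {
        name: bisect_right(sites, end) - bisect_left(sites, start)
        for name, (start, end) in genes.items()
    }

def candidate_essential(genes, insertion_sites, max_per_kb=0.0):
    """Return genes whose insertion density is at or below `max_per_kb`."""
    counts = insertion_counts(genes, insertion_sites)
    flagged = []
    for name, (start, end) in genes.items():
        length_kb = (end - start + 1) / 1000.0
        if counts[name] / length_kb <= max_per_kb:
            flagged.append(name)
    return flagged

# Hypothetical toy data: three 1 kb genes and six mapped insertion sites.
genes = {"geneA": (100, 1099), "geneB": (1500, 2499), "geneC": (3000, 3999)}
sites = [150, 400, 900, 1200, 3100, 3500]

counts = insertion_counts(genes, sites)
essential = candidate_essential(genes, sites)
```

Here only the gene with no insertions is flagged; under saturating mutagenesis, the absence of insertions is more informative than it would be in a sparse library.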

  14. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee.

    PubMed

    Ventura, Mario; Catacchio, Claudia R; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H; Chen, Lin; Kidd, Jeffrey M; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K; Eichler, Evan E

    2011-10-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes.

  15. Gorilla genome structural variation reveals evolutionary parallelisms with chimpanzee

    PubMed Central

    Ventura, Mario; Catacchio, Claudia R.; Alkan, Can; Marques-Bonet, Tomas; Sajjadian, Saba; Graves, Tina A.; Hormozdiari, Fereydoun; Navarro, Arcadi; Malig, Maika; Baker, Carl; Lee, Choli; Turner, Emily H.; Chen, Lin; Kidd, Jeffrey M.; Archidiacono, Nicoletta; Shendure, Jay; Wilson, Richard K.; Eichler, Evan E.

    2011-01-01

    Structural variation has played an important role in the evolutionary restructuring of human and great ape genomes. Recent analyses have suggested that the genomes of chimpanzee and human have been particularly enriched for this form of genetic variation. Here, we set out to assess the extent of structural variation in the gorilla lineage by generating 10-fold genomic sequence coverage from a western lowland gorilla and integrating these data into a physical and cytogenetic framework of structural variation. We discovered and validated over 7665 structural changes within the gorilla lineage, including sequence resolution of inversions, deletions, duplications, and mobile element insertions. A comparison with human and other ape genomes shows that the gorilla genome has been subjected to the highest rate of segmental duplication. We show that both the gorilla and chimpanzee genomes have experienced independent yet convergent patterns of structural mutation that have not occurred in humans, including the formation of subtelomeric heterochromatic caps, the hyperexpansion of segmental duplications, and bursts of retroviral integrations. Our analysis suggests that the chimpanzee and gorilla genomes are structurally more derived than either orangutan or human genomes. PMID:21685127

  16. A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

    PubMed Central

    Koop, Ben F; von Schalburg, Kristian R; Leong, Jong; Walker, Neil; Lieph, Ryan; Cooper, Glenn A; Robb, Adrienne; Beetz-Sargent, Marianne; Holt, Robert A; Moore, Richard; Brahmbhatt, Sonal; Rosner, Jamie; Rexroad, Caird E; McGowan, Colin R; Davidson, William S

    2008-01-01

    Background Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most widely studied groups of fish. Results 298,304 expressed sequence tags (ESTs) from Atlantic salmon (69% of the total), 11,664 chinook, 10,813 sockeye, 10,051 brook trout, 10,975 grayling, 8,630 lake whitefish, and 3,624 northern pike ESTs were obtained in this study and have been deposited into the public databases. Contigs were built and putative full-length Atlantic salmon clones have been identified. A database containing ESTs, assemblies, consensus sequences, open reading frames, gene predictions and putative annotation is available. The overall similarity between Atlantic salmon ESTs and those of rainbow trout, chinook, sockeye, brook trout, grayling, lake whitefish, northern pike and rainbow smelt is 93.4, 94.2, 94.6, 94.4, 92.5, 91.7, 89.6, and 86.2% respectively. An analysis of 78 transcript sets shows Salmo as a sister group to Oncorhynchus and Salvelinus within Salmoninae, and Thymallinae as a sister group to Salmoninae and Coregoninae within Salmonidae. Extensive gene duplication is consistent with a genome duplication in the common ancestor of salmonids. Using all of the available EST data, a new expanded salmonid cDNA microarray of 32,000 features was created. Cross-species hybridizations to this cDNA microarray indicate that this resource will be useful for studies of all 68 salmonid species. Conclusion An extensive collection and analysis of salmonid RNA putative transcripts indicate that Pacific salmon, Atlantic salmon and charr are 94–96% similar while the more distant whitefish, grayling, pike and smelt are 93, 92, 89 and 86% similar to salmon. The salmonid transcriptome reveals a complex history of gene duplication that is consistent with an ancestral

  17. CCor: A whole genome network-based similarity measure between two genes.

    PubMed

    Hu, Yiming; Zhao, Hongyu

    2016-12-01

    Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules.
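
    For orientation, the pairwise baseline mentioned in this abstract, the Pearson correlation coefficient between two gene expression profiles, can be sketched as follows. The expression values are invented for illustration, and the sketch covers only the Pearson baseline; it does not implement CCor, whose definition is not given in the abstract.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression levels of three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```

    A pairwise measure of this kind looks only at the two profiles being compared, which is the limitation that network-based measures aim to address.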

  18. CCor: a whole genome network-based similarity measure between two genes

    PubMed Central

    Hu, Yiming; Zhao, Hongyu

    2016-01-01

    Summary Measuring the similarity between genes is often the starting point for building gene regulatory networks. Most similarity measures used in practice consider only pairwise information, with a few also considering network structure. Although the theoretical properties of pairwise measures are well understood in the statistics literature, little is known about the statistical properties of similarity measures based on network structure. In this article, we consider a new whole genome network-based similarity measure, called CCor, that makes use of information from all the genes in the network. We derive a concentration inequality for CCor and compare it with the commonly used Pearson correlation coefficient for inferring network modules. Both theoretical analysis and a real data example demonstrate the advantages of CCor over existing measures for inferring gene modules. PMID:26953524

  19. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments.

    PubMed

    Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R

    2008-01-11

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.
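
    The phrase "weighted consensus of all available evidence" can be made concrete with a toy example. The sketch below is a deliberately simplified per-base weighted vote and is not EVM's actual algorithm; the evidence source names, weights, coordinates and the majority rule are all assumptions made for illustration. It also assumes that intervals within a single source do not overlap.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates


def weighted_exon_consensus(
    evidence: Dict[str, List[Interval]],
    weights: Dict[str, float],
    length: int,
) -> List[Interval]:
    """Return intervals supported by more than half of the total evidence weight."""
    total = sum(weights[source] for source in evidence)
    score = [0.0] * (length + 1)  # index 0 is unused
    for source, intervals in evidence.items():
        for start, end in intervals:
            for pos in range(start, end + 1):
                score[pos] += weights[source]

    consensus: List[Interval] = []
    run_start = None
    for pos in range(1, length + 1):
        supported = score[pos] > total / 2
        if supported and run_start is None:
            run_start = pos
        elif not supported and run_start is not None:
            consensus.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:
        consensus.append((run_start, length))
    return consensus


# Hypothetical evidence for a single exon on a 100-bp region.
evidence = {
    "ab_initio": [(10, 50)],
    "protein_alignment": [(20, 60)],
    "transcript_alignment": [(20, 50)],
}
weights = {"ab_initio": 1.0, "protein_alignment": 2.0, "transcript_alignment": 5.0}
print(weighted_exon_consensus(evidence, weights, 100))
```

    Here the heavily weighted transcript alignment dominates the vote, so the consensus exon follows its boundaries rather than those of the ab initio prediction.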

  20. Automated Eukaryotic Gene Structure Annotation Using EVidenceModeler and the Program to Assemble Spliced Alignments

    SciTech Connect

    Haas, B J; Salzberg, S L; Zhu, W; Pertea, M; Allen, J E; Orvis, J; White, O; Buell, C R; Wortman, J R

    2007-12-10

    EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation.

  1. Genomics and genetics of gonadotropin beta-subunit genes: Unique FSHB and duplicated LHB/CGB loci

    PubMed Central

    Nagirnaja, Liina; Rull, Kristiina; Uusküla, Liis; Hallast, Pille; Grigorova, Marina; Laan, Maris

    2010-01-01

    The follicle stimulating hormone (FSH), luteinizing hormone (LH) and chorionic gonadotropin (HCG) play a critical role in human reproduction. Despite the common evolutionary ancestry and functional relatedness of the gonadotropin hormone beta (GtHB) genes, the single-copy FSHB (at 11p13) and the multi-copy LHB/CGB genes (at 19q13.32) exhibit locus-specific differences regarding their genomic context, evolution, genetic variation and expressional profile. FSHB represents a conservative vertebrate gene with a unique function and it is located in a structurally stable gene-poor region. In contrast, the primate-specific LHB/CGB gene cluster is located in a gene-rich genomic context and is an example of an evolutionarily young and unstable genomic region. The gene cluster is shaped by a constant balance between selection that acts on specific functions of the loci and frequent gene conversion events among duplicons. As the transcription of the GtHB genes is rate-limiting in the assembly of respective hormones, the genomic and genetic context of the FSHB and the LHB/CGB genes largely affects the profile of the hormone production. PMID:20488225

  2. Genomic analysis of primordial dwarfism reveals novel disease genes.

    PubMed

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but the use of certain phenotypic aspects such as head circumference and facial appearance has proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. Our analysis also reveals, in addition to mutations in known PD disease genes, the first instance of biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. In addition, we have identified a novel locus for Seckel syndrome based on a consanguineous multiplex family and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD disease candidate gene XRCC4 was identified by autozygome/exome analysis, and the knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one which we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  3. Sugarcane Functional Genomics: Gene Discovery for Agronomic Trait Development

    PubMed Central

    Menossi, M.; Silva-Filho, M. C.; Vincentz, M.; Van-Sluys, M.-A.; Souza, G. M.

    2008-01-01

    Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390

  4. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes

    PubMed Central

    Biankin, Andrew V.; Waddell, Nicola; Kassahn, Karin S.; Gingras, Marie-Claude; Muthuswamy, Lakshmi B.; Johns, Amber L.; Miller, David K.; Wilson, Peter J.; Patch, Ann-Marie; Wu, Jianmin; Chang, David K.; Cowley, Mark J.; Gardiner, Brooke B.; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J.; Gill, Anthony J.; Pinho, Andreia V.; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J. Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R. Scott; Humphris, Jeremy L.; Kaplan, Warren; Jones, Marc D.; Colvin, Emily K.; Nagrial, Adnan M.; Humphrey, Emily S.; Chou, Angela; Chin, Venessa T.; Chantrill, Lorraine A.; Mawson, Amanda; Samra, Jaswinder S.; Kench, James G.; Lovell, Jessica A.; Daly, Roger J.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M.; Fisher, William E.; Brunicardi, F. Charles; Hodges, Sally E.; Reid, Jeffrey G.; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R.; Dinh, Huyen; Buhay, Christian J.; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E.; Yung, Christina K.; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A.; Petersen, Gloria M.; Gallinger, Steven; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Schulick, Richard D.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A.; Mann, Karen M.; Jenkins, Nancy A.; Perez-Mancera, Pedro A.; Adams, David J.; Largaespada, David A.; Wessels, Lodewyk F. A.; Rust, Alistair G.; Stein, Lincoln D.; Tuveson, David A.; Copeland, Neal G.; Musgrove, Elizabeth A.; Scarpa, Aldo; Eshleman, James R.; Hudson, Thomas J.; Sutherland, Robert L.; Wheeler, David A.; Pearson, John V.; McPherson, John D.; Gibbs, Richard A.; Grimmond, Sean M.

    2012-01-01

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis. PMID:23103869

  5. Genome-Wide Identification of KANADI1 Target Genes

    PubMed Central

    Ott, Felix; Weigel, Detlef; Bowman, John L.; Heisler, Marcus G.; Wenkel, Stephan

    2013-01-01

    Plant organ development and polarity establishment is mediated by the action of several transcription factors. Among these, the KANADI (KAN) subclade of the GARP protein family plays important roles in polarity-associated processes during embryo, shoot and root patterning. In this study, we have identified a set of potential direct target genes of KAN1 through a combination of chromatin immunoprecipitation/DNA sequencing (ChIP-Seq) and genome-wide transcriptional profiling using tiling arrays. Target genes are over-represented for genes involved in the regulation of organ development as well as in the response to auxin. KAN1 affects directly the expression of several genes previously shown to be important in the establishment of polarity during lateral organ and vascular tissue development. We also show that KAN1 controls through its target genes auxin effects on organ development at different levels: transport and its regulation, and signaling. In addition, KAN1 regulates genes involved in the response to abscisic acid, jasmonic acid, brassinosteroids, ethylene, cytokinins and gibberellins. The role of KAN1 in organ polarity is antagonized by HD-ZIPIII transcription factors, including REVOLUTA (REV). A comparison of their target genes reveals that the REV/KAN1 module acts in organ patterning through opposite regulation of shared targets. Evidence of mutual repression between closely related family members is also shown. PMID:24155946

  6. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes.

    PubMed

    Biankin, Andrew V; Waddell, Nicola; Kassahn, Karin S; Gingras, Marie-Claude; Muthuswamy, Lakshmi B; Johns, Amber L; Miller, David K; Wilson, Peter J; Patch, Ann-Marie; Wu, Jianmin; Chang, David K; Cowley, Mark J; Gardiner, Brooke B; Song, Sarah; Harliwong, Ivon; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Gongora, Milena; Pajic, Marina; Scarlett, Christopher J; Gill, Anthony J; Pinho, Andreia V; Rooman, Ilse; Anderson, Matthew; Holmes, Oliver; Leonard, Conrad; Taylor, Darrin; Wood, Scott; Xu, Qinying; Nones, Katia; Fink, J Lynn; Christ, Angelika; Bruxner, Tim; Cloonan, Nicole; Kolle, Gabriel; Newell, Felicity; Pinese, Mark; Mead, R Scott; Humphris, Jeremy L; Kaplan, Warren; Jones, Marc D; Colvin, Emily K; Nagrial, Adnan M; Humphrey, Emily S; Chou, Angela; Chin, Venessa T; Chantrill, Lorraine A; Mawson, Amanda; Samra, Jaswinder S; Kench, James G; Lovell, Jessica A; Daly, Roger J; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Kakkar, Nipun; Zhao, Fengmei; Wu, Yuan Qing; Wang, Min; Muzny, Donna M; Fisher, William E; Brunicardi, F Charles; Hodges, Sally E; Reid, Jeffrey G; Drummond, Jennifer; Chang, Kyle; Han, Yi; Lewis, Lora R; Dinh, Huyen; Buhay, Christian J; Beck, Timothy; Timms, Lee; Sam, Michelle; Begley, Kimberly; Brown, Andrew; Pai, Deepa; Panchal, Ami; Buchner, Nicholas; De Borja, Richard; Denroche, Robert E; Yung, Christina K; Serra, Stefano; Onetto, Nicole; Mukhopadhyay, Debabrata; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Gallinger, Steven; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Schulick, Richard D; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Capelli, Paola; Corbo, Vincenzo; Scardoni, Maria; Tortora, Giampaolo; Tempero, Margaret A; Mann, Karen M; Jenkins, Nancy A; Perez-Mancera, Pedro A; Adams, David J; Largaespada, David A; Wessels, Lodewyk F A; Rust, Alistair G; Stein, Lincoln D; Tuveson, David A; Copeland, Neal G; Musgrove, Elizabeth A; Scarpa, Aldo; Eshleman, James R; Hudson, Thomas J; Sutherland, Robert L; Wheeler, David A; Pearson, John V; McPherson, John D; Gibbs, Richard A; Grimmond, Sean M

    2012-11-15

    Pancreatic cancer is a highly lethal malignancy with few effective therapies. We performed exome sequencing and copy number analysis to define genomic aberrations in a prospectively accrued clinical cohort (n = 142) of early (stage I and II) sporadic pancreatic ductal adenocarcinoma. Detailed analysis of 99 informative tumours identified substantial heterogeneity with 2,016 non-silent mutations and 1,628 copy-number variations. We define 16 significantly mutated genes, reaffirming known mutations (KRAS, TP53, CDKN2A, SMAD4, MLL3, TGFBR2, ARID1A and SF3B1), and uncover novel mutated genes including additional genes involved in chromatin modification (EPC1 and ARID2), DNA damage repair (ATM) and other mechanisms (ZIM2, MAP2K4, NALCN, SLC16A4 and MAGEA6). Integrative analysis with in vitro functional data and animal models provided supportive evidence for potential roles for these genetic aberrations in carcinogenesis. Pathway-based analysis of recurrently mutated genes recapitulated clustering in core signalling pathways in pancreatic ductal adenocarcinoma, and identified new mutated genes in each pathway. We also identified frequent and diverse somatic aberrations in genes described traditionally as embryonic regulators of axon guidance, particularly SLIT/ROBO signalling, which was also evident in murine Sleeping Beauty transposon-mediated somatic mutagenesis models of pancreatic cancer, providing further supportive evidence for the potential involvement of axon guidance genes in pancreatic carcinogenesis.

  7. Gene duplication, genome duplication, and the functional diversification of vertebrate globins

    PubMed Central

    Storz, Jay F.; Opazo, Juan C.; Hoffmann, Federico G.

    2015-01-01

    The functional diversification of the vertebrate globin gene superfamily provides an especially vivid illustration of the role of gene duplication and whole-genome duplication in promoting evolutionary innovation. For example, key globin proteins that evolved specialized functions in various aspects of oxidative metabolism and oxygen signaling pathways (hemoglobin [Hb], myoglobin [Mb], and cytoglobin [Cygb]) trace their origins to two whole-genome duplication events in the stem lineage of vertebrates. The retention of the proto-Hb and Mb genes in the ancestor of jawed vertebrates permitted a physiological division of labor between the oxygen-carrier function of Hb and the oxygen-storage function of Mb. In the Hb gene lineage, a subsequent tandem gene duplication gave rise to the proto α- and β-globin genes, which permitted the formation of multimeric Hbs composed of unlike subunits (α2β2). The evolution of this heteromeric quaternary structure was central to the emergence of Hb as a specialized oxygen-transport protein because it provided a mechanism for cooperative oxygen-binding and allosteric regulatory control. Subsequent rounds of duplication and divergence have produced diverse repertoires of α- and β-like globin genes that are ontogenetically regulated such that functionally distinct Hb isoforms are expressed during different stages of prenatal development and postnatal life. In the ancestor of jawless fishes, the proto Mb and Hb genes appear to have been secondarily lost, and the Cygb homolog evolved a specialized respiratory function in blood-oxygen transport. Phylogenetic and comparative genomic analyses of the vertebrate globin gene superfamily have revealed numerous instances in which paralogous globins have convergently evolved similar expression patterns and/or similar functional specializations in different organismal lineages. PMID:22846683

  8. Gene duplication, genome duplication, and the functional diversification of vertebrate globins.

    PubMed

    Storz, Jay F; Opazo, Juan C; Hoffmann, Federico G

    2013-02-01

    The functional diversification of the vertebrate globin gene superfamily provides an especially vivid illustration of the role of gene duplication and whole-genome duplication in promoting evolutionary innovation. For example, key globin proteins that evolved specialized functions in various aspects of oxidative metabolism and oxygen signaling pathways (hemoglobin [Hb], myoglobin [Mb], and cytoglobin [Cygb]) trace their origins to two whole-genome duplication events in the stem lineage of vertebrates. The retention of the proto-Hb and Mb genes in the ancestor of jawed vertebrates permitted a physiological division of labor between the oxygen-carrier function of Hb and the oxygen-storage function of Mb. In the Hb gene lineage, a subsequent tandem gene duplication gave rise to the proto α- and β-globin genes, which permitted the formation of multimeric Hbs composed of unlike subunits (α(2)β(2)). The evolution of this heteromeric quaternary structure was central to the emergence of Hb as a specialized oxygen-transport protein because it provided a mechanism for cooperative oxygen-binding and allosteric regulatory control. Subsequent rounds of duplication and divergence have produced diverse repertoires of α- and β-like globin genes that are ontogenetically regulated such that functionally distinct Hb isoforms are expressed during different stages of prenatal development and postnatal life. In the ancestor of jawless fishes, the proto Mb and Hb genes appear to have been secondarily lost, and the Cygb homolog evolved a specialized respiratory function in blood-oxygen transport. Phylogenetic and comparative genomic analyses of the vertebrate globin gene superfamily have revealed numerous instances in which paralogous globins have convergently evolved similar expression patterns and/or similar functional specializations in different organismal lineages.

  9. Genome size diversity in angiosperms and its influence on gene space.

    PubMed

    Dodsworth, Steven; Leitch, Andrew R; Leitch, Ilia J

    2015-12-01

    Genome size varies c. 2400-fold in angiosperms (flowering plants), although the range of genome size is skewed towards small genomes, with a mean genome size of 1C=5.7Gb. One of the most crucial factors governing genome size in angiosperms is the relative amount and activity of repetitive elements. Recently, there have been new insights into how these repeats, previously discarded as 'junk' DNA, can have a significant impact on gene space (i.e. the part of the genome comprising all the genes and gene-related DNA). Here we review these new findings and explore in what ways genome size itself plays a role in influencing how repeats impact genome dynamics and gene space, including gene expression.

  10. The emerging biofuel crop Camelina sativa retains a highly undifferentiated hexaploid genome structure.

    PubMed

    Kagale, Sateesh; Koh, Chushin; Nixon, John; Bollina, Venkatesh; Clarke, Wayne E; Tuteja, Reetu; Spillane, Charles; Robinson, Stephen J; Links, Matthew G; Clarke, Carling; Higgins, Erin E; Huebert, Terry; Sharpe, Andrew G; Parkin, Isobel A P

    2014-04-23

    Camelina sativa is an oilseed with desirable agronomic and oil-quality attributes for a viable industrial oil platform crop. Here we generate the first chromosome-scale high-quality reference genome sequence for C. sativa and annotate 89,418 protein-coding genes, representing a whole-genome triplication event relative to the crucifer model Arabidopsis thaliana. C. sativa represents the first crop species to be sequenced from lineage I of the Brassicaceae. The well-preserved hexaploid genome structure of C. sativa surprisingly mirrors those of economically important amphidiploid Brassica crop species from lineage II as well as wheat and cotton. The three genomes of C. sativa show no evidence of fractionation bias and limited expression-level bias, both characteristics commonly associated with polyploid evolution. The highly undifferentiated polyploid genome of C. sativa presents significant consequences for breeding and genetic manipulation of this industrial oil crop.

  11. Genomic Structure of an Economically Important Cyanobacterium, Arthrospira (Spirulina) platensis NIES-39

    PubMed Central

    Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

    2010-01-01

    A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057

  12. Genome-wide identification and characterization of WRKY gene family in Salix suchowensis

    PubMed Central

    Ye, Qiaolin; Yin, Tongming

    2016-01-01

    WRKY proteins are zinc finger transcription factors that were first identified in plants. They can specifically interact with the W-box, which can be found in the promoter region of a large number of plant target genes, to regulate the expression of downstream target genes. They also participate in diverse physiological and growth processes in plants. Prior to this study, many WRKY genes had been identified and characterized in herbaceous species, but there was no large-scale study of WRKY genes in willow. With the whole-genome sequence of Salix suchowensis available, we have the opportunity to conduct genome-wide research on the willow WRKY gene family. In this study, we identified 85 WRKY genes in the willow genome and renamed them from SsWRKY1 to SsWRKY85 on the basis of their specific distributions on chromosomes. Due to their diverse structural features, the 85 willow WRKY genes could be further classified into three main groups (group I–III), with five subgroups (IIa–IIe) in group II. With the multiple sequence alignment and the manual search, we found three variations of the WRKYGQK heptapeptide: WRKYGRK, WKKYGQK and WRKYGKK, and four variations of the normal zinc finger motif, which might execute some new biological functions. In addition, the SsWRKY genes from the same subgroup share similar exon–intron structures and conserved motif domains. Further studies of SsWRKY genes revealed that segmental duplication events (SDs) played a more prominent role in the expansion of SsWRKY genes. Expression profiles of SsWRKY genes from RNA sequencing data revealed diverse expression patterns among five tissues, including tender roots, young leaves, vegetative buds, non-lignified stems and barks. The analyses of the WRKY gene family in willow are not only beneficial for completing the functional and annotation information of the WRKY gene family in woody plants, but also provide important references to investigate the expansion and evolution

  13. Mapping Our Genes: The Genome Projects: How Big, How Fast

    DOE R&D Accomplishments Database

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. The Office of Technology Assessment (OTA) prepared this report with the assistance of several hundred experts throughout the world.

  14. Mapping our genes: The genome projects: How big, how fast

    SciTech Connect

    none,

    1988-04-01

    For the past 2 years, scientific and technical journals in biology and medicine have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part of the public agenda. The debate involves science, technology, and politics. Congress is responsible for "writing the rules" of what various federal agencies do and for funding their work. This report surveys the points made so far in the debate, focusing on those that most directly influence the policy options facing the US Congress. Congressional interest focused on how to assess the rationales for conducting human genome projects, how to fund human genome projects (at what level and through which mechanisms), how to coordinate the scientific and technical programs of the several federal agencies and private interests already supporting various genome projects, and how to strike a balance regarding the impact of genome projects on international scientific cooperation and international economic competition in biotechnology. OTA prepared this report with the assistance of several hundred experts throughout the world. 342 refs., 26 figs., 11 tabs.

  15. OxyGene: an innovative platform for investigating oxidative-response genes in whole prokaryotic genomes

    PubMed Central

    Thybert, David; Avner, Stéphane; Lucchetti-Miganeh, Céline; Chéron, Angélique; Barloy-Hubler, Frédérique

    2008-01-01

    Background Oxidative stress is a common stress encountered by living organisms and is due to an imbalance between intracellular reactive oxygen and nitrogen species (ROS, RNS) and cellular antioxidant defence. To defend themselves against ROS/RNS, bacteria possess a subsystem of detoxification enzymes, which are classified with regard to their substrates. To identify such enzymes in prokaryotic genomes, different approaches based on similarity, enzyme profiles or patterns exist. Unfortunately, several problems persist in the annotation, classification and naming of these enzymes due mainly to some erroneous entries in databases, mistake propagation, absence of updating and disparity in function description. Description In order to improve the current annotation of oxidative stress subsystems, an innovative platform named OxyGene has been developed. It integrates an original database called OxyDB, holding thoroughly tested anchor-based signatures associated to subfamilies of oxidative stress enzymes, and a new anchor-driven annotator, for ab initio detection of ROS/RNS response genes. All complete Bacterial and Archaeal genomes have been re-annotated, and the results stored in the OxyGene repository can be interrogated via a Graphical User Interface. Conclusion OxyGene enables the exploration and comparative analysis of enzymes belonging to 37 detoxification subclasses in 664 microbial genomes. It proposes a new classification that improves both the ontology and the annotation of the detoxification subsystems in prokaryotic whole genomes, while discovering new ORFs and attributing precise function to hypothetical annotated proteins. OxyGene is freely available at: PMID:19117520

  16. Genome-Wide Identification and Functional Classification of Tomato (Solanum lycopersicum) Aldehyde Dehydrogenase (ALDH) Gene Superfamily

    PubMed Central

    Lopez-Valverde, Francisco J.; Robles-Bolivar, Paula; Lima-Cabello, Elena; Gachomo, Emma W.; Kotchoni, Simeon O.

    2016-01-01

    Aldehyde dehydrogenases (ALDHs) are a protein superfamily that catalyzes the oxidation of aldehyde molecules into their corresponding non-toxic carboxylic acids and responds to different environmental stresses, offering promising genetic approaches for improving plant adaptation. The aim of the current study is the systematic identification and functional analysis of the S. lycopersicum ALDH gene superfamily. We performed genome-based identification and functional classification of ALDH genes, together with analyses of phylogenetic relationships, structure and catalytic domains, and microarray-based gene expression. Twenty-nine unique tomato ALDH sequences encoding 11 ALDH families were identified, including a unique member of the family 19 ALDH. Phylogenetic analysis revealed 13 groups, with a conserved relationship among ALDH families. Functional structure analysis of ALDH2 showed a catalytic mechanism involving a Cys-Glu couple. However, the analysis of ALDH3 showed no functional gene duplication or potential neo-functionalities. Gene expression analysis reveals that particular ALDH genes, such as ALDH2B7, might respond to wounding stress by increasing their expression. Overall, this study reveals the complexity of the S. lycopersicum ALDH gene superfamily and offers new insights into the structure-function features and evolution of ALDH gene families in vascular plants. The functional characterization of ALDHs is valuable for promoting molecular breeding in tomato for the improvement of stress tolerance and signaling. PMID:27755582

  17. Visible integration of the adenosine deaminase (ADA) gene into the recipient genome after gene therapy.

    PubMed

    Egashira, M; Ariga, T; Kawamura, N; Miyoshi, O; Niikawa, N; Sakiyama, Y

    1998-01-23

    Gene therapy for patients with adenosine deaminase (ADA) deficiency has become practical in the 1990s, and the exogenous gene has been reported to survive for several years in the recipient genome. To evaluate the integration efficiency of the ADA gene (ADA) into peripheral blood lymphocytes (PBL) of a patient with ADA deficiency who is receiving gene therapy, we performed two-color interphase fluorescence in situ hybridization (FISH) analysis by using digoxigenin-labeled ADA-cDNA and the biotin-labeled lambda-genomic ADA clone as probes. After each of 9 sequential series of gene therapy, interphase nuclei of 100 mononuclear cells from the patient were analyzed, and those of a LASN-producing cell line were used as a control. FISH signals were detected with rhodamine and FITC for the cDNA and the genomic DNA, respectively. The number of PBL giving a transgene signal grew after the sequential gene therapies, and the proportion of signal-positive cells reached about 10%. Our results indicate that the two-color FISH system can be used as a potential aid to monitor the efficiency of the ADA gene therapy.

  18. Gene structure in the sea urchin Strongylocentrotus purpuratus based on transcriptome analysis.

    PubMed

    Tu, Qiang; Cameron, R Andrew; Worley, Kim C; Gibbs, Richard A; Davidson, Eric H

    2012-10-01

    A comprehensive transcriptome analysis has been performed on protein-coding RNAs of Strongylocentrotus purpuratus, including 10 different embryonic stages, six feeding larval and metamorphosed juvenile stages, and six adult tissues. In this study, we pooled the transcriptomes from all of these sources and focused on the insights they provide for gene structure in the genome of this recently sequenced model system. The genome had initially been annotated by use of computational gene model prediction algorithms. A large fraction of these predicted genes were recovered in the transcriptome when the reads were mapped to the genome and appropriately filtered and analyzed. However, in a manually curated subset, we discovered that more than half the computational gene model predictions were imperfect, containing errors such as missing exons, prediction of nonexistent exons, erroneous intron/exon boundaries, fusion of adjacent genes, and prediction of multiple genes from single genes. The transcriptome data have been used to provide a systematic upgrade of the gene model predictions throughout the genome, very greatly improving the research usability of the genomic sequence. We have constructed new public databases that incorporate information from the transcriptome analyses. The transcript-based gene model data were used to define average structural parameters for S. purpuratus protein-coding genes. In addition, we constructed a custom sea urchin gene ontology, and assigned about 7000 different annotated transcripts to 24 functional classes. Strong correlations became evident between given functional ontology classes and structural properties, including gene size, exon number, and exon and intron size.

  19. Distal chromatin structure influences local nucleosome positions and gene expression.

    PubMed

    Jansen, An; van der Zande, Elisa; Meert, Wim; Fink, Gerald R; Verstrepen, Kevin J

    2012-05-01

    The positions of nucleosomes across the genome influence several cellular processes, including gene transcription. However, our understanding of the factors dictating where nucleosomes are located and how this affects gene regulation is still limited. Here, we perform an extensive in vivo study to investigate the influence of the neighboring chromatin structure on local nucleosome positioning and gene expression. Using truncated versions of the Saccharomyces cerevisiae URA3 gene, we show that nucleosome positions in the URA3 promoter are at least partly determined by the local DNA sequence, with so-called 'anti-nucleosomal elements' like poly(dA:dT) tracts being key determinants of nucleosome positions. In addition, we show that changes in the nucleosome positions in the URA3 promoter strongly affect the promoter activity. Most interestingly, in addition to demonstrating the effect of the local DNA sequence, our study provides novel in vivo evidence that nucleosome positions are also affected by the position of neighboring nucleosomes. Nucleosome structure may therefore be an important selective force for conservation of gene order on a chromosome, because relocating a gene to another genomic position (where the positions of neighboring nucleosomes are different from the original locus) can have dramatic consequences for the gene's nucleosome structure and thus its expression.

  20. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling

    PubMed Central

    Sato, Yukuto; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-01-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post–teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70–80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis. PMID:26578810

  1. Rapid genome reshaping by multiple-gene loss after whole-genome duplication in teleost fish suggested by mathematical modeling.

    PubMed

    Inoue, Jun; Sato, Yukuto; Sinclair, Robert; Tsukamoto, Katsumi; Nishida, Mutsumi

    2015-12-01

    Whole-genome duplication (WGD) is believed to be a significant source of major evolutionary innovation. Redundant genes resulting from WGD are thought to be lost or acquire new functions. However, the rates of gene loss and thus temporal process of genome reshaping after WGD remain unclear. The WGD shared by all teleost fish, one-half of all jawed vertebrates, was more recent than the two ancient WGDs that occurred before the origin of jawed vertebrates, and thus lends itself to analysis of gene loss and genome reshaping. Using a newly developed orthology identification pipeline, we inferred the post-teleost-specific WGD evolutionary histories of 6,892 protein-coding genes from nine phylogenetically representative teleost genomes on a time-calibrated tree. We found that rapid gene loss did occur in the first 60 My, with a loss of more than 70-80% of duplicated genes, and produced similar genomic gene arrangements within teleosts in that relatively short time. Mathematical modeling suggests that rapid gene loss occurred mainly by events involving simultaneous loss of multiple genes. We found that the subsequent 250 My were characterized by slow and steady loss of individual genes. Our pipeline also identified about 1,100 shared single-copy genes that are inferred to have become singletons before the divergence of clupeocephalan teleosts. Therefore, our comparative genome analysis suggests that rapid gene loss just after the WGD reshaped teleost genomes before the major divergence, and provides a useful set of marker genes for future phylogenetic analysis.

  2. Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution

    PubMed Central

    Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.

    2015-01-01

    The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691

  3. Gene organization in the UL region and inverted repeats of the canine herpesvirus genome.

    PubMed

    Rémond, M; Sheldrick, P; Lebreton, F; Nardeux, P; Foulon, T

    1996-01-01

    Restriction mapping and the determination of scattered nucleotide sequences have permitted a description of the global structure and evolutionary affinities of the canine herpesvirus (CHV) genome. The global structure closely resembles that of the totally sequenced genomes of varicella-zoster virus and equine herpesvirus 1 (EHV-1) in having a 37 bp inverted repeat flanking a long unique region (UL) of approximately 100,000 bp, and a 10,100-10,700 bp inverted repeat flanking a short unique region (US) of roughly 7,400-8,600 bp. On the basis of the sequences obtained, 35 homologues to previously identified herpesvirus gene products were found in UL and the major inverted repeat, and the level of the similarities indicated that CHV belongs to the genus Varicellovirus. Within the genus, CHV appears to be most closely related to EHV-1, pseudorabies virus and feline herpesvirus. Surprisingly, genes for both subunits of the viral ribonucleotide reductase were found to be missing from their equivalent place in other herpesvirus genomes. Either they have been translocated to another position in the CHV genome or, we think more likely, they have been lost.

  4. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    PubMed

    Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

    2015-01-01

    The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.
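
    As a minimal sketch of the kind of bookkeeping this abstract describes (a core genome shared by every isolate, accessory genes, and the frequency distribution of genes present in between 1 and N genomes), the following Python example works on a toy presence/absence mapping. The function name, the isolate labels and the gene identifiers are hypothetical and are not taken from the study; the authors' actual pipeline and homolog definition are not described in this record.

```python
from collections import Counter


def pangenome_summary(presence):
    """Summarize a gene presence/absence mapping.

    presence: dict mapping a genome name to the set of gene-family
    identifiers found in that genome (hypothetical input format).
    Returns (core, accessory, spectrum), where spectrum maps
    "number of genomes carrying a gene" -> "number of such genes".
    """
    n_genomes = len(presence)
    # Count in how many genomes each gene family occurs.
    counts = Counter(gene for genes in presence.values() for gene in genes)
    # Core genes occur in every genome; everything else is accessory.
    core = {gene for gene, c in counts.items() if c == n_genomes}
    accessory = set(counts) - core
    # Gene-frequency spectrum, sorted by number of genomes.
    spectrum = dict(sorted(Counter(counts.values()).items()))
    return core, accessory, spectrum


# Toy data: three made-up isolates, not real B. pseudomallei genomes.
toy = {
    "isolate_A": {"g1", "g2", "g3"},
    "isolate_B": {"g1", "g2", "g4"},
    "isolate_C": {"g1", "g5"},
}
core, accessory, spectrum = pangenome_summary(toy)
print(sorted(core), sorted(accessory), spectrum)
```

    In a real analysis the gene identifiers would come from a homolog-clustering step, and whether a pangenome is 'open' would be judged from how the number of new genes changes as genomes are added, which this sketch does not attempt.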

  5. Evolutionary Genomics and Adaptive Evolution of the Hedgehog Gene Family (Shh, Ihh and Dhh) in Vertebrates

    PubMed Central

    Pereira, Joana; Johnson, Warren E.; O’Brien, Stephen J.; Jarvis, Erich D.; Zhang, Guojie; Gilbert, M. Thomas P.; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is found typically in invertebrate genomes, most vertebrate species have three (Sonic hedgehog – Shh; Indian hedgehog – Ihh; and Desert hedgehog – Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of avian Hh-coding sequences in publicly available databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots. PMID:25549322

  6. Evolutionary genomics and adaptive evolution of the Hedgehog gene family (Shh, Ihh and Dhh) in vertebrates.

    PubMed

    Pereira, Joana; Johnson, Warren E; O'Brien, Stephen J; Jarvis, Erich D; Zhang, Guojie; Gilbert, M Thomas P; Vasconcelos, Vitor; Antunes, Agostinho

    2014-01-01

    The Hedgehog (Hh) gene family codes for a class of secreted proteins composed of two active domains that act as signalling molecules during embryo development, namely for the development of the nervous and skeletal systems and the formation of the testis cord. While only one Hh gene is found typically in invertebrate genomes, most vertebrate species have three (Sonic hedgehog--Shh; Indian hedgehog--Ihh; and Desert hedgehog--Dhh), each with different expression patterns and functions, which likely helped promote the increasing complexity of vertebrates and their successful diversification. In this study, we used comparative genomic and adaptive evolutionary analyses to characterize the evolution of the Hh genes in vertebrates following the two major whole genome duplication (WGD) events. To overcome the lack of avian Hh-coding sequences in publicly available databases, we used an extensive dataset of 45 avian and three non-avian reptilian genomes to show that birds have all three Hh paralogs. We find suggestions that following the WGD events, vertebrate Hh paralogous genes evolved independently within similar linkage groups and under different evolutionary rates, especially within the catalytic domain. The structural regions around the ion-binding site were identified to be under positive selection in the signaling domain. These findings contrast with those observed in invertebrates, where different lineages that experienced gene duplication retained similar selective constraints in the Hh orthologs. Our results provide new insights on the evolutionary history of the Hh gene family, the functional roles of these paralogs in vertebrate species, and on the location of mutational hotspots.

  7. Genomic analyses of bacterial porin-cytochrome gene clusters

    SciTech Connect

    Shi, Liang; Fredrickson, James K.; Zachara, John M.

    2014-11-26

    In this study, the porin-cytochrome (Pcc) protein complex is responsible for trans-outer membrane electron transfer during extracellular reduction of Fe(III) by the dissimilatory metal-reducing bacterium Geobacter sulfurreducens PCA. The identified and characterized Pcc complex of G. sulfurreducens PCA consists of a porin-like outer-membrane protein, a periplasmic 8-heme c type cytochrome (c-Cyt) and an outer-membrane 12-heme c-Cyt, and the genes encoding the Pcc proteins are clustered in the same regions of the genome (i.e., the pcc gene clusters) of G. sulfurreducens PCA. A survey of additional microbial genomes has identified the pcc gene clusters in all sequenced Geobacter spp. and other bacteria from six different phyla, including Anaeromyxobacter dehalogenans 2CP-1, A. dehalogenans 2CP-C, Anaeromyxobacter sp. K, Candidatus Kuenenia stuttgartiensis, Denitrovibrio acetiphilus DSM 12809, Desulfurispirillum indicum S5, Desulfurivibrio alkaliphilus AHT2, Desulfurobacterium thermolithotrophum DSM 11699, Desulfuromonas acetoxidans DSM 684, Ignavibacterium album JCM 16511, and Thermovibrio ammonificans HB-1. The numbers of genes in the pcc gene clusters vary, ranging from two to nine. Similar to the metal-reducing (Mtr) gene clusters of other Fe(III)-reducing bacteria, such as Shewanella spp., additional genes that encode putative c-Cyts with predicted cellular localizations at the cytoplasmic membrane, periplasm and outer membrane often associate with the pcc gene clusters. This suggests that the Pcc-associated c-Cyts may be part of the pathways for extracellular electron transfer reactions. The presence of pcc gene clusters in the microorganisms that do not reduce solid-phase Fe(III) and Mn(IV) oxides, such as D. alkaliphilus AHT2 and I. album JCM 16511, also suggests that some of the pcc gene clusters may be involved in extracellular

  8. Genome-Wide Analysis of the Aquaporin Gene Family in Chickpea (Cicer arietinum L.).

    PubMed

    Deokar, Amit A; Tar'an, Bunyamin

    2016-01-01

    Aquaporins (AQPs) are essential membrane proteins that play a critical role in the transport of water and many other solutes across cell membranes. In this study, a comprehensive genome-wide analysis identified 40 AQP genes in chickpea (Cicer arietinum L.). A complete overview of the chickpea AQP (CaAQP) gene family is presented, including their chromosomal locations, gene structure, phylogeny, gene duplication, conserved functional motifs, gene expression, and conserved promoter motifs. To understand the evolution of AQPs, a comparative analysis of chickpea AQPs with AQP orthologs from soybean, Medicago, common bean, and Arabidopsis was performed. The chickpea AQP genes were found on all of the chickpea chromosomes, except chromosome 7, with a maximum of six genes on chromosome 6, and a minimum of one gene on chromosome 5. Gene duplication analysis indicated that the expansion of the chickpea AQP gene family might have been due to segmental and tandem duplications. CaAQPs were grouped into four subfamilies including 15 NOD26-like intrinsic proteins (NIPs), 13 tonoplast intrinsic proteins (TIPs), eight plasma membrane intrinsic proteins (PIPs), and four small basic intrinsic proteins (SIPs) based on sequence similarities and phylogenetic position. Gene structure analysis revealed a highly conserved exon-intron pattern within CaAQP subfamilies supporting the CaAQP family classification. Functional prediction based on conserved Ar/R selectivity filters, Froger's residues, and specificity-determining positions suggested wide differences in substrate specificity among the subfamilies of CaAQPs. Expression analysis of the AQP genes indicated that some of the genes are tissue-specific, whereas a few other AQP genes showed differential expression in response to biotic and abiotic stresses. Promoter profiling of CaAQP genes for conserved cis-acting regulatory elements revealed enrichment of cis-elements involved in circadian control, light response, defense and stress responsiveness

  9. Genome-Wide Analysis of the Aquaporin Gene Family in Chickpea (Cicer arietinum L.)

    PubMed Central

    Deokar, Amit A.; Tar'an, Bunyamin

    2016-01-01

    Aquaporins (AQPs) are essential membrane proteins that play a critical role in the transport of water and many other solutes across cell membranes. In this study, a comprehensive genome-wide analysis identified 40 AQP genes in chickpea (Cicer arietinum L.). A complete overview of the chickpea AQP (CaAQP) gene family is presented, including their chromosomal locations, gene structure, phylogeny, gene duplication, conserved functional motifs, gene expression, and conserved promoter motifs. To understand the evolution of AQPs, a comparative analysis of chickpea AQPs with AQP orthologs from soybean, Medicago, common bean, and Arabidopsis was performed. The chickpea AQP genes were found on all of the chickpea chromosomes, except chromosome 7, with a maximum of six genes on chromosome 6, and a minimum of one gene on chromosome 5. Gene duplication analysis indicated that the expansion of the chickpea AQP gene family might have been due to segmental and tandem duplications. CaAQPs were grouped into four subfamilies including 15 NOD26-like intrinsic proteins (NIPs), 13 tonoplast intrinsic proteins (TIPs), eight plasma membrane intrinsic proteins (PIPs), and four small basic intrinsic proteins (SIPs) based on sequence similarities and phylogenetic position. Gene structure analysis revealed a highly conserved exon-intron pattern within CaAQP subfamilies supporting the CaAQP family classification. Functional prediction based on conserved Ar/R selectivity filters, Froger's residues, and specificity-determining positions suggested wide differences in substrate specificity among the subfamilies of CaAQPs. Expression analysis of the AQP genes indicated that some of the genes are tissue-specific, whereas a few other AQP genes showed differential expression in response to biotic and abiotic stresses. Promoter profiling of CaAQP genes for conserved cis-acting regulatory elements revealed enrichment of cis-elements involved in circadian control, light response, defense and stress responsiveness

  10. Genome-Wide Survey of Flavonoid Biosynthesis Genes and Gene Expression Analysis between Black- and Yellow-Seeded Brassica napus

    PubMed Central

    Qu, Cunmin; Zhao, Huiyan; Fu, Fuyou; Wang, Zhen; Zhang, Kai; Zhou, Yan; Wang, Xin; Wang, Rui; Xu, Xinfu; Tang, Zhanglin; Lu, Kun; Li, Jia-Na

    2016-01-01

    Flavonoids, the compounds that impart color to fruits, flowers, and seeds, are the most widespread secondary metabolites in plants. However, a systematic analysis of these loci has not been performed in Brassicaceae. In this study, we isolated 649 nucleotide sequences related to flavonoid biosynthesis, i.e., the Transparent Testa (TT) genes, and their associated amino acid sequences in 17 Brassicaceae species, grouped into Arabidopsis or Brassicaceae subgroups. Moreover, 36 copies of 21 genes of the flavonoid biosynthesis pathway were identified in Arabidopsis thaliana, 53 were identified in Brassica rapa, 50 in Brassica oleracea, and 95 in B. napus, followed by analyses of their genomic distribution, collinearity and gene triplication among Brassicaceae species. The results showed that extensive gene loss, whole genome triplication, and diploidization occurred after divergence from the common ancestor. Using qRT-PCR methods, we analyzed the expression of 18 flavonoid biosynthesis genes in 6 yellow- and black-seeded B. napus inbred lines with different genetic backgrounds, and found that 12 of them were preferentially expressed during seed development, whereas the remaining genes were expressed in all B. napus tissues examined. Moreover, 14 of these genes showed significant differences in expression level during seed development, and all but four of these (i.e., BnTT5, BnTT7, BnTT10, and BnTTG1) had similar expression patterns among the yellow- and black-seeded B. napus. Results showed that the structural genes (BnTT3, BnTT18, and BnBAN), regulatory genes (BnTTG2 and BnTT16) and three genes encoding transfer proteins (BnTT12, BnTT19, and BnAHA10) might play crucial roles in the formation of different seed coat colors in B. napus. These data will be helpful for illustrating the molecular mechanisms of flavonoid biosynthesis in Brassicaceae species. PMID:27999578

  11. Genome Diversification Mechanism of Rodent and Lagomorpha Chemokine Genes

    PubMed Central

    Shibata, Kanako; Yoshie, Osamu; Tanase, Sumio

    2013-01-01

    Chemokines are a large family of small cytokines that are involved in host defence and body homeostasis through recruitment of cells expressing their receptors. Their genes are known to undergo rapid evolution. Therefore, the number and content of chemokine genes can be quite diverse among the different species, making the orthologous relationships often ambiguous even between closely related species. Given that rodents and rabbit are useful experimental models in medicine and drug development, we have deduced the chemokine genes from the genome sequences of several rodent species and rabbit and compared them with those of human and mouse to determine the orthologous relationships. The interspecies differences should be taken into consideration when experimental results from animal models are extrapolated into humans. The chemokine gene lists and their orthologous relationships presented here will be useful for studies using these animal models. Our analysis also enables us to reconstruct possible gene duplication processes that generated the different sets of chemokine genes in these species. PMID:23991422

  12. Genome-enabled Discovery of Carbon Sequestration Genes

    SciTech Connect

    Tuskan, Gerald A; Tschaplinski, Timothy J; Kalluri, Udaya C; Yin, Tongming; Yang, Xiaohan; Zhang, Xinye; Engle, Nancy L; Ranjan, Priya; Basu, Manojit M; Gunter, Lee E; Jawdy, Sara; Martin, Madhavi Z; Campbell, Alina S; DiFazio, Stephen P; Davis, John M; Hinchee, Maud; Pinnacchio, Christa; Meilan, R; Busov, V.; Strauss, S

    2009-01-01

    The fate of carbon below ground is likely to be a major factor determining the success of carbon sequestration strategies involving plants. Despite their importance, molecular processes controlling belowground C allocation and partitioning are poorly understood. This project is leveraging the Populus trichocarpa genome sequence to discover genes important to C sequestration in plants and soils. The focus is on the identification of genes that provide key control points for the flow and chemical transformations of carbon in roots, concentrating on genes that control the synthesis of chemical forms of carbon that result in slower turnover rates of soil organic matter (i.e., increased recalcitrance). We propose to enhance carbon allocation and partitioning to roots by 1) modifying the auxin signaling pathway, and the invertase family, which controls sucrose metabolism, and by 2) increasing root proliferation through transgenesis with genes known to control fine root proliferation (e.g., ANT), 3) increasing the production of recalcitrant C metabolites by identifying genes controlling secondary C metabolism by a major mQTL-based gene discovery effort, and 4) increasing aboveground productivity by enhancing drought tolerance to achieve maximum C sequestration. This broad, integrated approach is aimed at ultimately enhancing root biomass as well as root detritus longevity, providing the best prospects for significant enhancement of belowground C sequestration.

  13. Synthetic zinc finger proteins: the advent of targeted gene regulation and genome modification technologies.

    PubMed

    Gersbach, Charles A; Gaj, Thomas; Barbas, Carlos F

    2014-08-19

    The understanding of gene regulation and the structure and function of the human genome increased dramatically at the end of the 20th century. Yet the technologies for manipulating the genome have been slower to develop. For instance, the field of gene therapy has been focused on correcting genetic diseases and augmenting tissue repair for more than 40 years. However, with the exception of a few very low efficiency approaches, conventional genetic engineering methods have only been able to add auxiliary genes to cells. This has been a substantial obstacle to the clinical success of gene therapies and has also led to severe unintended consequences in several cases. Therefore, technologies that facilitate the precise modification of cellular genomes have diverse and significant implications in many facets of research and are essential for translating the products of the Genomic Revolution into tangible benefits for medicine and biotechnology. To address this need, in the 1990s, we embarked on a mission to develop technologies for engineering protein-DNA interactions with the aim of creating custom tools capable of targeting any DNA sequence. Our goal has been to allow researchers to reach into genomes to specifically regulate, knock out, or replace any gene. To realize these goals, we initially focused on understanding and manipulating zinc finger proteins. In particular, we sought to create a simple and straightforward method that enables unspecialized laboratories to engineer custom DNA-modifying proteins using only defined modular components, a web-based utility, and standard recombinant DNA technology. Two significant challenges we faced were (i) the development of zinc finger domains that target sequences not recognized by naturally occurring zinc finger proteins and (ii) determining how individual zinc finger domains could be tethered together as polydactyl proteins to recognize unique locations within complex genomes. 
We and others have since used this modular

  14. Census of solo LuxR genes in prokaryotic genomes

    PubMed Central

    Hudaiberdiev, Sanjarbek; Choudhary, Kumari S.; Vera Alvarez, Roberto; Gelencsér, Zsolt; Ligeti, Balázs; Lamba, Doriano; Pongor, Sándor

    2015-01-01

    luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation. PMID:25815274
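
    The solo/paired distinction described in this abstract can be illustrated with a minimal sketch. It is not the authors' HMM-based pipeline: it assumes the genes have already been assigned to families, and the `Gene` record, the `classify_luxr` helper, and the `window` of two gene positions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    locus: str    # hypothetical locus tag
    family: str   # "luxR", "luxI", or anything else
    contig: str   # chromosome or contig identifier
    index: int    # ordinal position of the gene on its contig


def classify_luxr(genes, window=2):
    """Return {locus: "paired" | "solo"} for every luxR gene.

    A luxR is called "paired" if a luxI lies within `window` gene
    positions on the same contig, otherwise "solo". The window size
    is an assumption of this sketch, not a value from the abstract.
    """
    # Collect luxI positions per contig.
    luxi_positions = {}
    for g in genes:
        if g.family == "luxI":
            luxi_positions.setdefault(g.contig, []).append(g.index)

    labels = {}
    for g in genes:
        if g.family != "luxR":
            continue
        nearby = any(abs(g.index - i) <= window
                     for i in luxi_positions.get(g.contig, []))
        labels[g.locus] = "paired" if nearby else "solo"
    return labels


# Toy data, invented for illustration only.
toy_genome = [
    Gene("g1", "luxI", "chr", 10),
    Gene("g2", "luxR", "chr", 11),      # adjacent to a luxI
    Gene("g3", "luxR", "chr", 40),      # far from any luxI
    Gene("g4", "luxR", "plasmid", 3),   # no luxI on this contig
]
labels = classify_luxr(toy_genome)
```

    For scale, the counts reported in the abstract (2698 solos among 3550 luxR genes) correspond to 76% of the luxR genes found.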

  15. Orthopoxvirus Genome Evolution: The Role of Gene Loss

    PubMed Central

    Hendrickson, Robert Curtis; Wang, Chunlin; Hatcher, Eneida L.; Lefkowitz, Elliot J.

    2010-01-01

    Poxviruses are highly successful pathogens, known to infect a variety of hosts. The family Poxviridae includes Variola virus, the causative agent of smallpox, which has been eradicated as a public health threat but could potentially reemerge as a bioterrorist threat. The risk scenario includes other animal poxviruses and genetically engineered manipulations of poxviruses. Studies of orthologous gene sets have established the evolutionary relationships of members within the Poxviridae family. It is not clear, however, how variations between family members arose in the past, an important issue in understanding how these viruses may vary and possibly produce future threats. Using a newly developed poxvirus-specific tool, we predicted accurate gene sets for viruses with completely sequenced genomes in the genus Orthopoxvirus. Employing sensitive sequence comparison techniques together with comparison of syntenic gene maps, we established the relationships between all viral gene sets. These techniques allowed us to unambiguously identify the gene loss/gain events that have occurred over the course of orthopoxvirus evolution. It is clear that for all existing Orthopoxvirus species, no individual species has acquired protein-coding genes unique to that species. All existing species contain genes that are all present in members of the species Cowpox virus, and cowpox virus strains contain every gene present in any other orthopoxvirus strain. These results support a theory of reductive evolution in which the reduction in size of the core gene set of a putative ancestral virus played a critical role in speciation and confining any newly emerging virus species to a particular environmental (host or tissue) niche. PMID:21994715

  16. Three-dimensional Structure of a Viral Genome-delivery Portal Vertex

    SciTech Connect

    A Olia; P Prevelige Jr.; J Johnson; G Cingolani

    2011-12-31

    DNA viruses such as bacteriophages and herpesviruses deliver their genome into and out of the capsid through large proteinaceous assemblies, known as portal proteins. Here, we report two snapshots of the dodecameric portal protein of bacteriophage P22. The 3.25-Å-resolution structure of the portal-protein core bound to 12 copies of gene product 4 (gp4) reveals a ~1.1-MDa assembly formed by 24 proteins. Unexpectedly, a lower-resolution structure of the full-length portal protein unveils the unique topology of the C-terminal domain, which forms a ~200-Å-long α-helical barrel. This domain inserts deeply into the virion and is highly conserved in the Podoviridae family. We propose that the barrel domain facilitates genome spooling onto the interior surface of the capsid during genome packaging and, in analogy to a rifle barrel, increases the accuracy of genome ejection into the host cell.

  17. Genes encoding calmodulin-binding proteins in the Arabidopsis genome

    NASA Technical Reports Server (NTRS)

    Reddy, Vaka S.; Ali, Gul S.; Reddy, Anireddy S N.

    2002-01-01

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  18. Genes encoding calmodulin-binding proteins in the Arabidopsis genome.

    PubMed

    Reddy, Vaka S; Ali, Gul S; Reddy, Anireddy S N

    2002-03-22

    Analysis of the recently completed Arabidopsis genome sequence indicates that approximately 31% of the predicted genes could not be assigned to functional categories, as they do not show any sequence similarity with proteins of known function from other organisms. Calmodulin (CaM), a ubiquitous and multifunctional Ca(2+) sensor, interacts with a wide variety of cellular proteins and modulates their activity/function in regulating diverse cellular processes. However, the primary amino acid sequence of the CaM-binding domain in different CaM-binding proteins (CBPs) is not conserved. One way to identify most of the CBPs in the Arabidopsis genome is by protein-protein interaction-based screening of expression libraries with CaM. Here, using a mixture of radiolabeled CaM isoforms from Arabidopsis, we screened several expression libraries prepared from flower meristem, seedlings, or tissues treated with hormones, an elicitor, or a pathogen. Sequence analysis of 77 positive clones that interact with CaM in a Ca(2+)-dependent manner revealed 20 CBPs, including 14 previously unknown CBPs. In addition, by searching the Arabidopsis genome sequence with the newly identified and known plant or animal CBPs, we identified a total of 27 CBPs. Among these, 16 CBPs are represented by families with 2-20 members in each family. Gene expression analysis revealed that CBPs and CBP paralogs are expressed differentially. Our data suggest that Arabidopsis has a large number of CBPs including several plant-specific ones. Although CaM is highly conserved between plants and animals, only a few CBPs are common to both plants and animals. Analysis of Arabidopsis CBPs revealed the presence of a variety of interesting domains. Our analyses identified several hypothetical proteins in the Arabidopsis genome as CaM targets, suggesting their involvement in Ca(2+)-mediated signaling networks.

  19. Genome-wide analysis of SAUR gene family in Solanaceae species.

    PubMed

    Wu, Jian; Liu, Songyu; He, Yanjun; Guan, Xiaoyan; Zhu, Xiangfei; Cheng, Lin; Wang, Jie; Lu, Gang

    2012-11-01

    The plant hormone auxin plays a vital role in regulating many aspects of plant growth and development. Small auxin up-regulated RNAs (SAURs) are primary auxin response genes hypothesized to be involved in the auxin signaling pathway, but their functions remain unclear. Here, a genome-wide search for SAUR gene homologues in Solanaceae species identified 99 and 134 members of the SAUR gene family from tomato and potato, respectively. Phylogenetic analysis indicated that the SAUR proteins from Arabidopsis, rice, sorghum, tomato and potato were divided into four major groups with 16 subgroups. Among them, 25 histidine-rich SAUR genes with metal-binding characteristics were found in Arabidopsis, sorghum and Solanaceae species, but not in rice. Using tomato as a model, a comprehensive overview of the SAUR gene family is presented, including the gene structures, phylogeny and chromosome locations. Quantitative real-time PCR analysis indicated that 11 randomly selected SlSAUR genes in tomato were expressed in at least one of the tomato organs/tissues tested. However, different SlSAUR genes displayed distinctive expression levels. SlSAUR16 and SlSAUR71 exhibited highly tissue-specific expression patterns. Almost all of the detected SlSAURs showed an accumulating pattern of mRNA along tomato flower and fruit development. Some of them displayed differential responses to exogenous IAA treatment. Abiotic stresses (cold, salt and drought) significantly modified transcript levels of SlSAUR genes. Most of them were down-regulated in response to abiotic stresses (drought, heat and salinity), but SlSAUR58, a histidine-rich SAUR gene, was up-regulated after salt treatment, indicating that it may play a specific role in the salt signal transduction pathway. Our comparative analysis provides some basic genomic information for the SAUR genes in the Solanaceae species and will pave the way for deciphering their function during plant development.

  20. Phylogeny Inference of Closely Related Bacterial Genomes: Combining the Features of Both Overlapping Genes and Collinear Genomic Regions

    PubMed Central

    Zhang, Yan-Cong; Lin, Kui

    2015-01-01

    Overlapping genes (OGs) represent one type of widespread genomic feature in bacterial genomes and have been used as rare genomic markers in phylogeny inference of closely related bacterial species. However, the inference may experience a decrease in performance for phylogenomic analysis of too closely or too distantly related genomes. Another drawback of OGs as phylogenetic markers is that they usually take little account of the effects of genomic rearrangement on the similarity estimation, such as intra-chromosome/genome translocations, horizontal gene transfer, and gene losses. To explore such effects on the accuracy of phylogeny reconstruction, we combine phylogenetic signals of OGs with collinear genomic regions, here called locally collinear blocks (LCBs). By putting these together, we refine our previous metric of pairwise similarity between two closely related bacterial genomes. As a case study, we used this new method to reconstruct the phylogenies of 88 Enterobacteriale genomes of the class Gammaproteobacteria. Our results demonstrated that the topological accuracy of the inferred phylogeny was improved when both OGs and LCBs were simultaneously considered, suggesting that combining these two phylogenetic markers may reduce, to some extent, the influence of gene loss on phylogeny inference. Such phylogenomic studies, we believe, will help us to explore a more effective approach to increasing the robustness of phylogeny reconstruction of closely related bacterial organisms. PMID:26715828

  1. Gene Loss and Movement in the Maize Genome

    PubMed Central

    Lai, Jinsheng; Ma, Jianxin; Swigoňová, Zuzana; Ramakrishna, Wusirika; Linton, Eric; Llaca, Victor; Tanyolac, Bahattin; Park, Yong-Jin; Jeong, O-Young; Bennetzen, Jeffrey L.; Messing, Joachim

    2004-01-01

    Maize (Zea mays L. ssp. mays), one of the most important agricultural crops in the world, originated by hybridization of two closely related progenitors. To investigate the fate of its genes after tetraploidization, we analyzed the sequence of five duplicated regions from different chromosomal locations. We also compared corresponding regions from sorghum and rice, two important crops that have largely collinear maps with maize. The split of sorghum and maize progenitors was recently estimated to be 11.9 Mya, whereas rice diverged from the common ancestor of maize and sorghum ∼50 Mya. A data set of roughly 4 Mb yielded 206 predicted genes from the three species, excluding any transposon-related genes, but including eight gene remnants. On average, 14% of the genes within the aligned regions are noncollinear between any two species. However, scoring each maize region separately, the set of noncollinear genes between all four regions jumps to 68%. This is largely because at least 50% of the duplicated genes from the two progenitors of maize have been lost over a very short period of time, possibly as short as 5 million years. Using the nearly completed rice sequence, we found noncollinear genes in other chromosomal positions, frequently in more than one. This demonstrates that many genes in these species have moved to new chromosomal locations in the last 50 million years or less, most as single gene events that did not dramatically alter gene structure. PMID:15466290

  2. Genomic Copy Number Dictates a Gene-Independent Cell Response to CRISPR/Cas9 Targeting | Office of Cancer Genomics

    Cancer.gov

    The CRISPR/Cas9 system enables genome editing and somatic cell genetic screens in mammalian cells. We performed genome-scale loss-of-function screens in 33 cancer cell lines to identify genes essential for proliferation/survival and found a strong correlation between increased gene copy number and decreased cell viability after genome editing. Within regions of copy-number gain, CRISPR/Cas9 targeting of both expressed and unexpressed genes, as well as intergenic loci, led to significantly decreased cell proliferation through induction of a G2 cell-cycle arrest.

  3. 3D structures of individual mammalian genomes studied by single-cell Hi-C.

    PubMed

    Stevens, Tim J; Lando, David; Basu, Srinjan; Atkinson, Liam P; Cao, Yang; Lee, Steven F; Leeb, Martin; Wohlfahrt, Kai J; Boucher, Wayne; O'Shaughnessy-Kirwan, Aoife; Cramard, Julie; Faure, Andre J; Ralser, Meryem; Blanco, Enrique; Morey, Lluis; Sansó, Miriam; Palayret, Matthieu G S; Lehner, Ben; Di Croce, Luciano; Wutz, Anton; Hendrich, Brian; Klenerman, Dave; Laue, Ernest D

    2017-04-06

    The folding of genomic DNA from the beads-on-a-string-like structure of nucleosomes into higher-order assemblies is crucially linked to nuclear processes. Here we calculate 3D structures of entire mammalian genomes using data from a new chromosome conformation capture procedure that allows us to first image and then process single cells. The technique enables genome folding to be examined at a scale of less than 100 kb, and chromosome structures to be validated. The structures of individual topological-associated domains and loops vary substantially from cell to cell. By contrast, A and B compartments, lamina-associated domains and active enhancers and promoters are organized in a consistent way on a genome-wide basis in every cell, suggesting that they could drive chromosome and genome folding. By studying genes regulated by pluripotency factor and nucleosome remodelling deacetylase (NuRD), we illustrate how the determination of single-cell genome structure provides a new approach for investigating biological processes.

  4. The vacuolar protein sorting genes in insects: A comparative genome view.

    PubMed

    Li, Zhaofei; Blissard, Gary

    2015-07-01

    In eukaryotic cells, regulated vesicular trafficking is critical for directing protein transport and for recycling and degradation of membrane lipids and proteins. Through carefully regulated transport vesicles, the endomembrane system performs a large and important array of dynamic cellular functions while maintaining the integrity of the cellular membrane system. Genetic studies in yeast Saccharomyces cerevisiae have identified approximately 50 vacuolar protein sorting (VPS) genes involved in vesicle trafficking, and most of these genes are also characterized in mammals. The VPS proteins form distinct functional complexes, which include complexes known as ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III. Little is known about the orthologs of VPS proteins in insects. Here, with the newly annotated Manduca sexta genome, we carried out genomic comparative analysis of VPS proteins in yeast, humans, and 13 sequenced insect genomes representing the Orders Hymenoptera, Diptera, Hemiptera, Phthiraptera, Lepidoptera, and Coleoptera. Amino acid sequence alignments and domain/motif structure analyses reveal that most of the components of ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III are evolutionarily conserved across yeast, insects, and humans. However, in contrast to the VPS gene expansions observed in the human genome, only four VPS genes (VPS13, VPS16, VPS33, and VPS37) were expanded in the six insect Orders. Additionally, VPS2 was expanded only in species from Phthiraptera, Lepidoptera, and Coleoptera. These studies provide a baseline for understanding the evolution of vesicular trafficking across yeast, insect, and human genomes, and also provide a basis for further addressing specific functional roles of VPS proteins in insects.

  5. Sense-antisense gene pairs: sequence, transcription, and structure are not conserved between human and mouse

    PubMed Central

    Wood, Emily J.; Chin-Inmanu, Kwanrutai; Jia, Hui; Lipovich, Leonard

    2013-01-01

    Previous efforts to characterize conservation between the human and mouse genomes focused largely on sequence comparisons. These studies are inherently limited because they don't account for gene structure differences, which may exist despite genomic sequence conservation. Recent high-throughput transcriptome studies have revealed widespread and extensive overlaps between genes, and transcripts, encoded on both strands of the genomic sequence. This overlapping gene organization, which produces sense-antisense (SAS) gene pairs, is capable of effecting regulatory cascades through established mechanisms. We present an evolutionary conservation assessment of SAS pairs, on three levels: genomic, transcriptomic, and structural. From a genome-wide dataset of human SAS pairs, we first identified orthologous loci in the mouse genome, then assessed their transcription in the mouse, and finally compared the genomic structures of SAS pairs expressed in both species. We found that approximately half of human SAS loci have single orthologous locations in the mouse genome; however, only half of those orthologous locations have SAS transcriptional activity in the mouse. This suggests that high human-mouse gene conservation overlooks widespread distinctions in SAS pair incidence and expression. We compared gene structures at orthologous SAS loci, finding frequent differences in gene structure between human and orthologous mouse SAS pair members. Our categorization of human SAS pairs with respect to mouse conservation of expression as well as structure points to limitations of mouse models. Gene structure differences, including at SAS loci, may account for some of the phenotypic distinctions between primates and rodents. Genes in non-conserved SAS pairs may contribute to evolutionary lineage-specific regulatory outcomes. PMID:24133500

  6. The compact Selaginella genome identifies changes in gene content associated with the evolution of vascular plants

    SciTech Connect

    Grigoriev, Igor V.; Banks, Jo Ann; Nishiyama, Tomoaki; Hasebe, Mitsuyasu; Bowman, John L.; Gribskov, Michael; dePamphilis, Claude; Albert, Victor A.; Aono, Naoki; Aoyama, Tsuyoshi; Ambrose, Barbara A.; Ashton, Neil W.; Axtell, Michael J.; Barker, Elizabeth; Barker, Michael S.; Bennetzen, Jeffrey L.; Bonawitz, Nicholas D.; Chapple, Clint; Cheng, Chaoyang; Correa, Luiz Gustavo Guedes; Dacre, Michael; DeBarry, Jeremy; Dreyer, Ingo; Elias, Marek; Engstrom, Eric M.; Estelle, Mark; Feng, Liang; Finet, Cedric; Floyd, Sandra K.; Frommer, Wolf B.; Fujita, Tomomichi; Gramzow, Lydia; Gutensohn, Michael; Harholt, Jesper; Hattori, Mitsuru; Heyl, Alexander; Hirai, Tadayoshi; Hiwatashi, Yuji; Ishikawa, Masaki; Iwata, Mineko; Karol, Kenneth G.; Koehler, Barbara; Kolukisaoglu, Uener; Kubo, Minoru; Kurata, Tetsuya; Lalonde, Sylvie; Li, Kejie; Li, Ying; Litt, Amy; Lyons, Eric; Manning, Gerard; Maruyama, Takeshi; Michael, Todd P.; Mikami, Koji; Miyazaki, Saori; Morinaga, Shin-ichi; Murata, Takashi; Mueller-Roeber, Bernd; Nelson, David R.; Obara, Mari; Oguri, Yasuko; Olmstead, Richard G.; Onodera, Naoko; Petersen, Bent Larsen; Pils, Birgit; Prigge, Michael; Rensing, Stefan A.; Riano-Pachon, Diego Mauricio; Roberts, Alison W.; Sato, Yoshikatsu; Scheller, Henrik Vibe; Schulz, Burkhard; Schulz, Christian; Shakirov, Eugene V.; Shibagaki, Nakako; Shinohara, Naoki; Shippen, Dorothy E.; Sorensen, Iben; Sotooka, Ryo; Sugimoto, Nagisa; Sugita, Mamoru; Sumikawa, Naomi; Tanurdzic, Milos; Theilsen, Gunter; Ulvskov, Peter; Wakazuki, Sachiko; Weng, Jing-Ke; Willats, William W.G.T.; Wipf, Daniel; Wolf, Paul G.; Yang, Lixing; Zimmer, Andreas D.; Zhu, Qihui; Mitros, Therese; Hellsten, Uffe; Loque, Dominique; Otillar, Robert; Salamov, Asaf; Schmutz, Jeremy; Shapiro, Harris; Lindquist, Erika; Lucas, Susan; Rokhsar, Daniel

    2011-04-28

    We report the genome sequence of the nonseed vascular plant, Selaginella moellendorffii, and by comparative genomics identify genes that likely played important roles in the early evolution of vascular plants and their subsequent evolution

  7. Protein interaction maps for complete genomes based on gene fusion events

    NASA Astrophysics Data System (ADS)

    Enright, Anton J.; Iliopoulos, Ioannis; Kyrpides, Nikos C.; Ouzounis, Christos A.

    1999-11-01

    A large-scale effort to measure, detect and analyse protein-protein interactions using experimental methods is under way. These include biochemistry such as co-immunoprecipitation or crosslinking, molecular biology such as the two-hybrid system or phage display, and genetics such as unlinked noncomplementing mutant detection. Using the two-hybrid system, an international effort to analyse the complete yeast genome is in progress. Evidently, all these approaches are tedious, labour intensive and inaccurate. From a computational perspective, the question is how can we predict that two proteins interact from structure or sequence alone. Here we present a method that identifies gene-fusion events in complete genomes, solely based on sequence comparison. Because there must be selective pressure for certain genes to be fused over the course of evolution, we are able to predict functional associations of proteins. We show that 215 genes or proteins in the complete genomes of Escherichia coli, Haemophilus influenzae and Methanococcus jannaschii are involved in 64 unique fusion events. The approach is general, and can be applied even to genes of unknown function.
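
    The core idea of this abstract, that two separate proteins in one genome matching different parts of a single protein in another genome point to a fusion event, can be sketched as follows. This is an illustration under stated assumptions rather than the published method: the hit tuples, the `find_fusion_candidates` helper, and the `max_overlap` parameter are hypothetical, the similarity search itself is assumed to have been run already, and further filters a real analysis would need are omitted.

```python
from collections import defaultdict
from itertools import combinations


def find_fusion_candidates(hits, max_overlap=0):
    """Find composite proteins matched by two distinct components.

    hits: iterable of (component, composite, start, end), where
    start/end are inclusive coordinates of the match on the composite
    protein. Two components form a candidate when their matched
    regions on the same composite overlap by at most `max_overlap`
    residues (an assumed threshold).
    Returns a sorted list of (composite, component_a, component_b).
    """
    by_composite = defaultdict(list)
    for component, composite, start, end in hits:
        by_composite[composite].append((component, start, end))

    candidates = set()
    for composite, regions in by_composite.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            if a == b:
                continue  # same component hitting twice is not a fusion
            overlap = min(a_end, b_end) - max(a_start, b_start) + 1
            if overlap <= max_overlap:
                candidates.add((composite, *sorted((a, b))))
    return sorted(candidates)


# Toy hits, invented for illustration only.
toy_hits = [
    ("A1", "X", 1, 200),    # A1 matches the N-terminal part of X
    ("A2", "X", 210, 400),  # A2 matches the C-terminal part of X
    ("A3", "Y", 1, 300),    # A3 and A4 match the same region of Y,
    ("A4", "Y", 50, 280),   # so Y is not reported
]
fusions = find_fusion_candidates(toy_hits)
```

    Requiring non-overlapping matched regions is what separates a composite protein from an ordinary family of homologs that all align to the same stretch.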

  8. Bidirectional promoters of insects: genome-wide comparison, evolutionary implication and influence on gene expression.

    PubMed

    Behura, Susanta K; Severson, David W

    2015-01-30

    Bidirectional promoters are widespread in insect genomes. By analyzing 23 insect genomes we show that the frequency of bidirectional gene pairs varies according to genome compactness and density of genes among the species. The density of bidirectional genes expected based on the number of genes per megabase of genome explains the observed density, suggesting that bidirectional pairing of genes may be due to a random event. We identified specific transcription factor binding motifs that are enriched in bidirectional promoters across insect species. Furthermore, we observed that bidirectional promoters may act as transcriptional hotspots in insect genomes where protein coding genes tend to aggregate in a significantly biased (p < 0.001) manner compared to unidirectional promoters. Natural selection seems to have an association with the extent of bidirectionality of genes among the species. The rate of non-synonymous-to-synonymous changes (dN/dS) shows a second-order polynomial distribution with bidirectionality between species, indicating that bidirectionality is dependent upon evolutionary pressure acting on the genomes. Analysis of genome-wide microarray expression data of multiple insect species suggested that bidirectionality has a similar association with transcriptome variation across species. Furthermore, bidirectional promoters show significant association with correlated expression of the divergent gene pairs depending upon their motif composition. Analysis of gene ontology showed that bidirectional genes tend to have a common association with functions related to "binding" (including ion binding, nucleotide binding and protein binding) across genomes. Such functional constraint of bidirectional genes may explain their widespread persistence in the genomes of diverse insect species.
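
    As a rough illustration of what a bidirectional (divergent) gene pair is, the sketch below scans neighbouring genes for a minus-strand gene followed by a plus-strand gene whose transcription start sites lie close together. The abstract does not state the criteria used in the study; the `Gene` record, the `bidirectional_pairs` helper, and the 1000 bp `max_distance` cutoff are assumptions made here for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    contig: str
    strand: str  # "+" or "-"
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate


def tss(gene):
    """Transcription start site: left end on '+', right end on '-'."""
    return gene.start if gene.strand == "+" else gene.end


def bidirectional_pairs(genes, max_distance=1000):
    """Return (minus_gene, plus_gene) name pairs in head-to-head layout.

    Only neighbouring genes on the same contig are compared, and the
    distance cutoff between their start sites is an assumed value.
    """
    by_contig = {}
    for g in genes:
        by_contig.setdefault(g.contig, []).append(g)

    pairs = []
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g.start)
        for left, right in zip(contig_genes, contig_genes[1:]):
            # Divergent: left gene reads leftwards, right gene rightwards.
            if left.strand == "-" and right.strand == "+":
                gap = tss(right) - tss(left)
                if 0 < gap <= max_distance:
                    pairs.append((left.name, right.name))
    return pairs


# Toy annotation, invented for illustration only.
toy_genes = [
    Gene("a", "chr1", "-", 100, 900),
    Gene("b", "chr1", "+", 1300, 2500),   # 400 bp from a: divergent pair
    Gene("c", "chr1", "+", 3000, 4000),
    Gene("d", "chr1", "-", 4200, 5000),   # c/d are convergent, not divergent
    Gene("e", "chr2", "-", 100, 500),
    Gene("f", "chr2", "+", 5000, 6000),   # divergent but too far apart
]
pairs = bidirectional_pairs(toy_genes)
```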

  9. The genomic organization of the human transcription factor 3 (TFE3) gene

    SciTech Connect

    Macchi, P.; Repetto, M.; Villa, A.; Vezzoni, P.

    1995-08-10

    We have determined the exon-intron structure of the human TFE3 gene located on Xp11.22-23. By designing PCR primers, we were able to amplify various segments of the TFE3 genomic region, thus establishing that this gene is composed of seven exons, the first six of which are small (from 56 to 159 nt). The 5′ UT region is contained entirely in the first exon, while the 3′ UT region is contained in the seventh exon. The comparison of the genomic and the published cDNA versions revealed that the deduced amino acid sequence of TFE3 in the C-terminus region is 125 amino acids shorter than previously reported. This eliminates most of the putative proline- and arginine-rich domain and makes the human sequence more similar to its mouse homolog. The activation domain at the N-terminus is contained in exon 2, as has been described for the mouse. The basic helix-loop-helix (BHLH) motif is spread over exons 4 to 6, while the leucine zipper (LZ) is almost all contained in the last portion of exon 6. This split BHLH is different from other BHLH-LZ genes whose genomic structures have been determined up to now. 20 refs., 1 fig., 1 tab.

  10. Genomic organization of the human SCN5A gene encoding the cardiac sodium channel

    SciTech Connect

    Wang, Qing; Li, Zhizhong; Shen, Jiaxiang; Keating, M.T.

    1996-05-15

    The voltage-gated cardiac sodium channel, SCN5A, is responsible for the initial upstroke of the action potential. Mutations in the human SCN5A gene cause susceptibility to cardiac arrhythmias and sudden death in the long QT syndrome (LQT). In this report we characterize the genomic structure of SCN5A. SCN5A consists of 28 exons spanning approximately 80 kb on chromosome 3p21. We describe the sequences of all intron/exon boundaries and a dinucleotide repeat polymorphism in intron 16. Oligonucleotide primers based on exon-flanking sequences amplify all SCN5A exons by PCR. This work establishes the complete genomic organization of SCN5A and will enable high-resolution analyses of this locus for mutations associated with LQT and other phenotypes for which SCN5A may be a candidate gene. 40 refs., 4 figs., 2 tabs.

  11. Horizontal gene transfer from diverse bacteria to an insect genome enables a tripartite nested mealybug symbiosis.

    PubMed

    Husnik, Filip; Nikoh, Naruo; Koga, Ryuichi; Ross, Laura; Duncan, Rebecca P; Fujie, Manabu; Tanaka, Makiko; Satoh, Nori; Bachtrog, Doris; Wilson, Alex C C; von Dohlen, Carol D; Fukatsu, Takema; McCutcheon, John P

    2013-06-20

    The smallest reported bacterial genome belongs to Tremblaya princeps, a symbiont of Planococcus citri mealybugs (PCIT). Tremblaya PCIT not only has a 139 kb genome, but possesses its own bacterial endosymbiont, Moranella endobia. Genome and transcriptome sequencing, including genome sequencing from a Tremblaya lineage lacking intracellular bacteria, reveals that the extreme genomic degeneracy of Tremblaya PCIT likely resulted from acquiring Moranella as an endosymbiont. In addition, at least 22 expressed horizontally transferred genes from multiple diverse bacteria to the mealybug genome likely complement missing symbiont genes. However, none of these horizontally transferred genes are from Tremblaya, showing that genome reduction in this symbiont has not been enabled by gene transfer to the host nucleus. Our results thus indicate that the functioning of this three-way symbiosis is dependent on genes from at least six lineages of organisms and reveal a path to intimate endosymbiosis distinct from that followed by organelles.

  12. Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

    SciTech Connect

    Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

    2015-04-09

    Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.