Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue, half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue, half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes.
De Novo Origin of Human Protein-Coding Genes
Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping
2011-01-01
The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.
Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H
2017-12-20
Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total 612 kb of the genome comprise group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Robson, James F; Barker, Daniel
2015-10-13
To demonstrate the bioinformatics capabilities of a low-cost computer, the Raspberry Pi, we present a comparison of the protein-coding gene content of two species in phylum Chlamydiae: Chlamydia trachomatis, a common sexually transmitted infection of humans, and Candidatus Protochlamydia amoebophila, a recently discovered amoebal endosymbiont. Identifying species-specific proteins and differences in protein families could provide insights into the unique phenotypes of the two species. Using a Raspberry Pi computer, sequence similarity-based protein families were predicted across the two species, C. trachomatis and P. amoebophila, and their members counted. Examples include nine multi-protein families unique to C. trachomatis, 132 multi-protein families unique to P. amoebophila and one family with multiple copies in both. Most families unique to C. trachomatis were polymorphic outer-membrane proteins. Additionally, multiple protein families lacking functional annotation were found. Predicted functional interactions suggest one of these families is involved with the exodeoxyribonuclease V complex. The Raspberry Pi computer is adequate for a comparative genomics project of this scope. The protein families unique to P. amoebophila may provide a basis for investigating the host-endosymbiont interaction. However, additional species should be included; and further laboratory research is required to identify the functions of unknown or putative proteins. Multiple outer membrane proteins were found in C. trachomatis, suggesting importance for host evasion. The tyrosine transport protein family is shared between both species, with four proteins in C. trachomatis and two in P. amoebophila. Shared protein families could provide a starting point for discovery of wide-spectrum drugs against Chlamydiae.
Boyd, David A.; Thevenot, Tracy; Gumbmann, Markus; Honeyman, Allen L.; Hamilton, Ian R.
2000-01-01
Transposon mutagenesis and marker rescue were used to isolate and identify an 8.5-kb contiguous region containing six open reading frames constituting the operon for the sorbitol P-enolpyruvate phosphotransferase transport system (PTS) of Streptococcus mutans LT11. The first gene, srlD, codes for sorbitol-6-phosphate dehydrogenase, followed downstream by srlR, coding for a transcriptional regulator; srlM, coding for a putative activator; and the srlA, srlE, and srlB genes, coding for the EIIC, EIIBC, and EIIA components of the sorbitol PTS, respectively. Among all sorbitol PTS operons characterized to date, the srlD gene is found after the genes coding for the EII components; thus, the location of the gene in S. mutans is unique. The SrlR protein is similar to several transcriptional regulators found in Bacillus spp. that contain PTS regulator domains (J. Stülke, M. Arnaud, G. Rapoport, and I. Martin-Verstraete, Mol. Microbiol. 28:865–874, 1998), and its gene overlaps the srlM gene by 1 bp. The arrangement of these two regulatory genes is unique, having not been reported for other bacteria. PMID:10639465
Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Kazakoff, Stephen H; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T; Gresshoff, Peter M
2012-01-01
Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® 'Second Generation DNA Sequencing (2GS)' and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites.
AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer
2014-10-07
The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M
2003-01-01
Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true with protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed at developing a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of SC and the conformation of SC proteins responsible for these. Proteins involved in SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Basing on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer
2014-01-01
The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact “nanogenome.” PMID:25253891
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks
Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.
2012-01-01
While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
Li, Hu; Liu, Hui; Shi, Aimin; Štys, Pavel; Zhou, Xuguo; Cai, Wanzhi
2012-01-01
Many of true bugs are important insect pests to cultivated crops and some are important vectors of human diseases, but few cladistic analyses have addressed relationships among the seven infraorders of Heteroptera. The Enicocephalomorpha and Nepomorpha are consider the basal groups of Heteroptera, but the basal-most lineage remains unresolved. Here we report the mitochondrial genome of the unique-headed bug Stenopirates sp., the first mitochondrial genome sequenced from Enicocephalomorpha. The Stenopirates sp. mitochondrial genome is a typical circular DNA molecule of 15, 384 bp in length, and contains 37 genes and a large non-coding fragment. The gene order differs substantially from other known insect mitochondrial genomes, with rearrangements of both tRNA genes and protein-coding genes. The overall AT content (82.5%) of Stenopirates sp. is the highest among all the known heteropteran mitochondrial genomes. The strand bias is consistent with other true bugs with negative GC-skew and positive AT-skew for the J-strand. The heteropteran mitochondrial atp8 exhibits the highest evolutionary rate, whereas cox1 appears to have the lowest rate. Furthermore, a negative correlation was observed between the variation of nucleotide substitutions and the GC content of each protein-coding gene. A microsatellite was identified in the putative control region. Finally, phylogenetic reconstruction suggests that Enicocephalomorpha is the sister group to all the remaining Heteroptera. PMID:22235294
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits.
Larsson, John; Nylander, Johan Aa; Bergman, Birgitta
2011-06-30
Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargment and generates a broad adaptive potential; while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets.
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits
2011-01-01
Background Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. Results A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. Conclusions The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargment and generates a broad adaptive potential; while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities. A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets. PMID:21718514
Ambigapathy, Ganesh; Zheng, Zhaoqing; Li, Wei; Keifer, Joyce
2013-01-01
Brain-derived neurotrophic factor (BDNF) has a diverse functional role and complex pattern of gene expression. Alternative splicing of mRNA transcripts leads to further diversity of mRNAs and protein isoforms. Here, we describe the regulation of BDNF mRNA transcripts in an in vitro model of eyeblink classical conditioning and a unique transcript that forms a functionally distinct truncated BDNF protein isoform. Nine different mRNA transcripts from the BDNF gene of the pond turtle Trachemys scripta elegans (tBDNF) are selectively regulated during classical conditioning: exon I mRNA transcripts show no change, exon II transcripts are downregulated, while exon III transcripts are upregulated. One unique transcript that codes from exon II, tBDNF2a, contains a 40 base pair deletion in the protein coding exon that generates a truncated tBDNF protein. The truncated transcript and protein are expressed in the naïve untrained state and are fully repressed during conditioning when full-length mature tBDNF is expressed, thereby having an alternate pattern of expression in conditioning. Truncated BDNF is not restricted to turtles as a truncated mRNA splice variant has been described for the human BDNF gene. Further studies are required to determine the ubiquity of truncated BDNF alternative splice variants across species and the mechanisms of regulation and function of this newly recognized BDNF protein.
Ambigapathy, Ganesh; Zheng, Zhaoqing; Li, Wei; Keifer, Joyce
2013-01-01
Brain-derived neurotrophic factor (BDNF) has a diverse functional role and complex pattern of gene expression. Alternative splicing of mRNA transcripts leads to further diversity of mRNAs and protein isoforms. Here, we describe the regulation of BDNF mRNA transcripts in an in vitro model of eyeblink classical conditioning and a unique transcript that forms a functionally distinct truncated BDNF protein isoform. Nine different mRNA transcripts from the BDNF gene of the pond turtle Trachemys scripta elegans (tBDNF) are selectively regulated during classical conditioning: exon I mRNA transcripts show no change, exon II transcripts are downregulated, while exon III transcripts are upregulated. One unique transcript that codes from exon II, tBDNF2a, contains a 40 base pair deletion in the protein coding exon that generates a truncated tBDNF protein. The truncated transcript and protein are expressed in the naïve untrained state and are fully repressed during conditioning when full-length mature tBDNF is expressed, thereby having an alternate pattern of expression in conditioning. Truncated BDNF is not restricted to turtles as a truncated mRNA splice variant has been described for the human BDNF gene. Further studies are required to determine the ubiquity of truncated BDNF alternative splice variants across species and the mechanisms of regulation and function of this newly recognized BDNF protein. PMID:23825634
Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster
Wang, Wen; Brunet, Frédéric G.; Nevo, Eviatar; Long, Manyuan
2002-01-01
Non-protein-coding RNA genes play an important role in various biological processes. How new RNA genes originated and whether this process is controlled by similar evolutionary mechanisms for the origin of protein-coding genes remains unclear. A young chimeric RNA gene that we term sphinx (spx) provides the first insight into the early stage of evolution of RNA genes. spx originated as an insertion of a retroposed sequence of the ATP synthase chain F gene at the cytological region 60DB since the divergence of Drosophila melanogaster from its sibling species 2–3 million years ago. This retrosequence, which is located at 102F on the fourth chromosome, recruited a nearby exon and intron, thereby evolving a chimeric gene structure. This molecular process suggests that the mechanism of exon shuffling, which can generate protein-coding genes, also plays a role in the origin of RNA genes. The subsequent evolutionary process of spx has been associated with a high nucleotide substitution rate, possibly driven by a continuous positive Darwinian selection for a novel function, as is shown in its sex- and development-specific alternative splicing. To test whether spx has adapted to different environments, we investigated its population genetic structure in the unique “Evolution Canyon” in Israel, revealing a similar haplotype structure in spx, and thus similar evolutionary forces operating on spx between environments. PMID:11904380
Yatawara, Lalani; Wickramasinghe, Susiji; Rajapakse, R P V J; Agatsuma, Takeshi
2010-09-01
In the present study, we determined the complete mitochondrial (mt) genome sequence (13,839bp) of parasitic nematode Setaria digitata and its structure and organization compared with Onchocerca volvulus, Dirofilaria immitis and Brugia malayi. The mt genome of S. digitata is slightly larger than the mt genomes of other filarial nematodes. S. digitata mt genome contains 36 genes (12 protein-coding genes, 22 transfer RNAs and 2 ribosomal RNAs) that are typically found in metazoans. This genome contains a high A+T (75.1%) content and low G+C content (24.9%). The mt gene order for S. digitata is the same as those for O. volvulus, D. immitis and B. malayi but it is distinctly different from other nematodes compared. The start codons inferred in the mt genome of S. digitata are TTT, ATT, TTG, ATG, GTT and ATA. Interestingly, the initiation codon TTT is unique to S. digitata mt genome and four protein-coding genes use this codon as a translation initiation codon. Five protein-coding genes use TAG as a stop codon whereas three genes use TAA and four genes use T as a termination codon. Out of 64 possible codons, only 57 are used for mitochondrial protein-coding genes of S. digitata. T-rich codons such as TTT (18.9%), GTT (7.9%), TTG (7.8%), TAT (7%), ATT (5.7%), TCT (4.8%) and TTA (4.1%) are used more frequently. This pattern of codon usage reflects the strong bias for T in the mt genome of S. digitata. In conclusion, the present investigation provides new molecular data for future studies of the comparative mitochondrial genomics and systematic of parasitic nematodes of socio-economic importance. 2010 Elsevier B.V. All rights reserved.
Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prochnik, Simon E.; Umen, James; Nedelcu, Aurora
2010-07-01
Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its {approx}14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similarmore » protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.« less
Chen, Frank; Spano, Anthony; Goodman, Benjamin E.; Blasier, Kiev R.; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F.; Lebedev, Nikolai
2010-01-01
The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf’s 3, 5, 6–9, 11, 13, and 15. PMID:19105630
Chen, Frank; Spano, Anthony; Goodman, Benjamin E; Blasier, Kiev R; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F; Lebedev, Nikolai
2009-02-01
The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf's 3, 5, 6-9, 11, 13, and 15.
Chamber Specific Gene Expression Landscape of the Zebrafish Heart
Singh, Angom Ramcharan; Sivadas, Ambily; Sabharwal, Ankit; Vellarikal, Shamsudheen Karuthedath; Jayarajan, Rijith; Verma, Ankit; Kapoor, Shruti; Joshi, Adita; Scaria, Vinod; Sivasubbu, Sridhar
2016-01-01
The organization of structure and function of cardiac chambers in vertebrates is defined by chamber-specific distinct gene expression. This peculiarity and uniqueness of the genetic signatures demonstrates functional resolution attributed to the different chambers of the heart. Altered expression of the cardiac chamber genes can lead to individual chamber related dysfunctions and disease patho-physiologies. Information on transcriptional repertoire of cardiac compartments is important to understand the spectrum of chamber specific anomalies. We have carried out a genome wide transcriptome profiling study of the three cardiac chambers in the zebrafish heart using RNA sequencing. We have captured the gene expression patterns of 13,396 protein coding genes in the three cardiac chambers—atrium, ventricle and bulbus arteriosus. Of these, 7,260 known protein coding genes are highly expressed (≥10 FPKM) in the zebrafish heart. Thus, this study represents nearly an all-inclusive information on the zebrafish cardiac transcriptome. In this study, a total of 96 differentially expressed genes across the three cardiac chambers in zebrafish were identified. The atrium, ventricle and bulbus arteriosus displayed 20, 32 and 44 uniquely expressing genes respectively. We validated the expression of predicted chamber-restricted genes using independent semi-quantitative and qualitative experimental techniques. In addition, we identified 23 putative novel protein coding genes that are specifically restricted to the ventricle and not in the atrium or bulbus arteriosus. In our knowledge, these 23 novel genes have either not been investigated in detail or are sparsely studied. The transcriptome identified in this study includes 68 differentially expressing zebrafish cardiac chamber genes that have a human ortholog. We also carried out spatiotemporal gene expression profiling of the 96 differentially expressed genes throughout the three cardiac chambers in 11 developmental stages and 6 tissue types of zebrafish. We hypothesize that clustering the differentially expressed genes with both known and unknown functions will deliver detailed insights on fundamental gene networks that are important for the development and specification of the cardiac chambers. It is also postulated that this transcriptome atlas will help utilize zebrafish in a better way as a model for studying cardiac development and to explore functional role of gene networks in cardiac disease pathogenesis. PMID:26815362
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages
Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.
2018-01-01
Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R
2018-01-01
Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H
2017-05-12
Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
The complete mitochondrial genome sequence of the maned wolf (Chrysocyon brachyurus).
Zhao, Chao; Yang, Xiufeng; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2016-01-01
In this study, the complete mitochondrial genome of the maned wolf (Chrysocyon brachyurus), the unique species in Chrysocyon, was sequenced and reported for the first time using blood samples obtained from a female individual in Shanghai Zoo, China. Sequence analysis showed that the genome structure was in accordance with other Canidae species and it contained 12 S rRNA gene, 16 S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region.
Fu, Cheng-Jie; Sheikh, Sanea; Miao, Wei; Andersson, Siv G E; Baldauf, Sandra L
2014-08-21
Discoba (Excavata) is an ancient group of eukaryotes with great morphological and ecological diversity. Unlike the other major divisions of Discoba (Jakobida and Euglenozoa), little is known about the mitochondrial DNAs (mtDNAs) of Heterolobosea. We have assembled a complete mtDNA genome from the aggregating heterolobosean amoeba, Acrasis kona, which consists of a single circular highly AT-rich (83.3%) molecule of 51.5 kb. Unexpectedly, A. kona mtDNA is missing roughly 40% of the protein-coding genes and nearly half of the transfer RNAs found in the only other sequenced heterolobosean mtDNAs, those of Naegleria spp. Instead, over a quarter of A. kona mtDNA consists of novel open reading frames. Eleven of the 16 protein-coding genes missing from A. kona mtDNA were identified in its nuclear DNA and polyA RNA, and phylogenetic analyses indicate that at least 10 of these 11 putative nuclear-encoded mitochondrial (NcMt) proteins arose by direct transfer from the mitochondrion. Acrasis kona mtDNA also employs C-to-U type RNA editing, and 12 homologs of DYW-type pentatricopeptide repeat (PPR) proteins implicated in plant organellar RNA editing are found in A. kona nuclear DNA. A mapping of mitochondrial gene content onto a consensus phylogeny reveals a sporadic pattern of relative stasis and rampant gene loss in Discoba. Rampant loss occurred independently in the unique common lineage leading to Heterolobosea + Tsukubamonadida and later in the unique lineage leading to Acrasis. Meanwhile, mtDNA gene content appears to be remarkably stable in the Acrasis sister lineage leading to Naegleria and in their distant relatives Jakobida. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Efficient analysis of mouse genome sequences reveal many nonsense variants
Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude
2016-01-01
Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M
2015-05-15
The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar unicellular opisthokont genomes than to other animal genomes.
Myelin protein zero gene sequencing diagnoses Charcot-Marie-Tooth Type 1B disease
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, Y.; Zhang, H.; Madrid, R.
1994-09-01
Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses ismore » to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pars to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent lables was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and when possible, by restriction enzyme analysis. This protocol can be used to distringuish CMT1B patients from othre CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.« less
genenames.org: the HGNC resources in 2011
Seal, Ruth L.; Gordon, Susan M.; Lush, Michael J.; Wright, Mathew W.; Bruford, Elspeth A.
2011-01-01
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30 000 approved gene symbols, over 19 000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility. PMID:20929869
Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan
2015-07-01
In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
The Unique Biosynthetic Route from Lupinus β-Conglutin Gene to Blad
Monteiro, Sara; Freitas, Regina; Rajasekhar, Baru T.; Teixeira, Artur R.; Ferreira, Ricardo B.
2010-01-01
Background During seed germination, β-conglutin undergoes a major cycle of limited proteolysis in which many of its constituent subunits are processed into a 20 kDa polypeptide termed blad. Blad is the main component of a glycooligomer, accumulating exclusively in the cotyledons of Lupinus species, between days 4 and 12 after the onset of germination. Principal Findings The sequence of the gene encoding β-conglutin precursor (1791 nucleotides) is reported. This gene, which shares 44 to 57% similarity and 20 to 37% identity with other vicilin-like protein genes, includes several features in common with these globulins, but also specific hallmarks. Most notable is the presence of an ubiquitin interacting motif (UIM), which possibly links the unique catabolic route of β-conglutin to the ubiquitin/proteasome proteolytic pathway. Significance Blad forms through a unique route from and is a stable intermediary product of its precursor, β-conglutin, the major Lupinus seed storage protein. It is composed of 173 amino acid residues, is encoded by an intron-containing, internal fragment of the gene that codes for β-conglutin precursor (nucleotides 394 to 913) and exhibits an isoelectric point of 9.6 and a molecular mass of 20,404.85 Da. Consistent with its role as a storage protein, blad contains an extremely high proportion of the nitrogen-rich amino acids. PMID:20066045
The sea cucumber genome provides insights into morphological evolution and visceral regeneration
Dai, Hui; Hamel, Jean-François; Liu, Chengzhang; Yu, Yang; Liu, Shilin; Lin, Wenchao; Guo, Kaimin; Jin, Songjun; Xu, Peng; Storey, Kenneth B.; Huan, Pin; Zhang, Tao; Zhou, Yi; Zhang, Jiquan; Lin, Chenggang; Li, Xiaoni; Xing, Lili; Huo, Da; Sun, Mingzhe; Wang, Lei; Mercier, Annie; Li, Fuhua; Yang, Hongsheng
2017-01-01
Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, A. japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs. PMID:29023486
The sea cucumber genome provides insights into morphological evolution and visceral regeneration.
Zhang, Xiaojun; Sun, Lina; Yuan, Jianbo; Sun, Yamin; Gao, Yi; Zhang, Libin; Li, Shihao; Dai, Hui; Hamel, Jean-François; Liu, Chengzhang; Yu, Yang; Liu, Shilin; Lin, Wenchao; Guo, Kaimin; Jin, Songjun; Xu, Peng; Storey, Kenneth B; Huan, Pin; Zhang, Tao; Zhou, Yi; Zhang, Jiquan; Lin, Chenggang; Li, Xiaoni; Xing, Lili; Huo, Da; Sun, Mingzhe; Wang, Lei; Mercier, Annie; Li, Fuhua; Yang, Hongsheng; Xiang, Jianhai
2017-10-01
Apart from sharing common ancestry with chordates, sea cucumbers exhibit a unique morphology and exceptional regenerative capacity. Here we present the complete genome sequence of an economically important sea cucumber, A. japonicus, generated using Illumina and PacBio platforms, to achieve an assembly of approximately 805 Mb (contig N50 of 190 Kb and scaffold N50 of 486 Kb), with 30,350 protein-coding genes and high continuity. We used this resource to explore key genetic mechanisms behind the unique biological characters of sea cucumbers. Phylogenetic and comparative genomic analyses revealed the presence of marker genes associated with notochord and gill slits, suggesting that these chordate features were present in ancestral echinoderms. The unique shape and weak mineralization of the sea cucumber adult body were also preliminarily explained by the contraction of biomineralization genes. Genome, transcriptome, and proteome analyses of organ regrowth after induced evisceration provided insight into the molecular underpinnings of visceral regeneration, including a specific tandem-duplicated prostatic secretory protein of 94 amino acids (PSP94)-like gene family and a significantly expanded fibrinogen-related protein (FREP) gene family. This high-quality genome resource will provide a useful framework for future research into biological processes and evolution in deuterostomes, including remarkable regenerative abilities that could have medical applications. Moreover, the multiomics data will be of prime value for commercial sea cucumber breeding programs.
Genenames.org: the HGNC and VGNC resources in 2017.
Yates, Bethan; Braschi, Bryony; Gray, Kristian A; Seal, Ruth L; Tweedie, Susan; Bruford, Elspeth A
2017-01-04
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC) which is responsible for assigning names to vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.
Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun
2017-03-27
Long non-coding RNA overlapping with protein-coding gene (lncRNA-coding pair) is a special type of overlapping genes. Protein-coding overlapping genes have been well studied and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and retaining genes in lncRNA-coding pairs were given higher priority than non-overlapping genes. Besides, the preference of overlapping configurations preserved during evolution was based on the origin of lncRNA-coding pairs. Further investigations showed that lncRNAs promoting the splicing of their embedded protein-coding partners was a unilateral interaction, but the existence of overlapping partners improving the gene expression was bidirectional and the effect was decreased with the increased evolutionary age of genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation and the expression correlation was associated with their overlapping configurations, local genomic environment and evolutionary age of genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that the lineage-specific pairs including old protein-coding genes may play an important role in tumorigenesis. This work presents a systematically comprehensive understanding of the evolution and the expression pattern of human lncRNA-coding pairs.
Sinha, Pallavi; Pazhamala, Lekha T.; Singh, Vikas K.; Saxena, Rachit K.; Krishnamurthy, L.; Azam, Sarwar; Khan, Aamir W.; Varshney, Rajeev K.
2016-01-01
Pigeonpea is a resilient crop, which is relatively more drought tolerant than many other legume crops. To understand the molecular mechanisms of this unique feature of pigeonpea, 51 genes were selected using the Hidden Markov Models (HMM) those codes for proteins having close similarity to universal stress protein domain. Validation of these genes was conducted on three pigeonpea genotypes (ICPL 151, ICPL 8755, and ICPL 227) having different levels of drought tolerance. Gene expression analysis using qRT-PCR revealed 6, 8, and 18 genes to be ≥2-fold differentially expressed in ICPL 151, ICPL 8755, and ICPL 227, respectively. A total of 10 differentially expressed genes showed ≥2-fold up-regulation in the more drought tolerant genotype, which encoded four different classes of proteins. These include plant U-box protein (four genes), universal stress protein A-like protein (four genes), cation/H(+) antiporter protein (one gene) and an uncharacterized protein (one gene). Genes C.cajan_29830 and C.cajan_33874 belonging to uspA, were found significantly expressed in all the three genotypes with ≥2-fold expression variations. Expression profiling of these two genes on the four other legume crops revealed their specific role in pigeonpea. Therefore, these genes seem to be promising candidates for conferring drought tolerance specifically to pigeonpea. PMID:26779199
The complete chloroplast genome sequence of the medicinal plant Andrographis paniculata.
Ding, Ping; Shao, Yanhua; Li, Qian; Gao, Junli; Zhang, Runjing; Lai, Xiaoping; Wang, Deqin; Zhang, Huiye
2016-07-01
The complete chloroplast genome of Andrographis paniculata, an important medicinal plant with great economic value, has been studied in this article. The genome size is 150,249 bp in length, with 38.3% GC content. A pair of inverted repeats (IRs, 25,300 bp) are separated by a large single copy region (LSC, 82,459 bp) and a small single-copy region (SSC, 17,190 bp). The chloroplast genome contains 114 unique genes, 80 protein-coding genes, 30 tRNA genes and 4 rRNA genes. In these genes, 15 genes contained 1 intron and 3 genes comprised of 2 introns.
Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
2014-01-01
Background Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. Results The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. Conclusions We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome. PMID:24647560
Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
2014-01-01
Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5'- or 3'-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome.
Zhang, Yan-Qiong; Chen, Dong-Liang; Tian, Hai-Feng; Zhang, Bao-Hong; Wen, Jian-Fan
2009-10-01
Using a combined computational program, we identified 50 potential microRNAs (miRNAs) in Giardia lamblia, one of the most primitive unicellular eukaryotes. These miRNAs are unique to G. lamblia and no homologues have been found in other organisms; miRNAs, currently known in other species, were not found in G. lamblia. This suggests that miRNA biogenesis and miRNA-mediated gene regulation pathway may evolve independently, especially in evolutionarily distant lineages. A majority (43) of the predicted miRNAs are located at one single locus; however, some miRNAs have two or more copies in the genome. Among the 58 miRNA genes, 28 are located in the intergenic regions whereas 30 are present in the anti-sense strands of the protein-coding sequences. Five predicted miRNAs are expressed in G. lamblia trophozoite cells evidenced by expressed sequence tags or RT-PCR. Thirty-seven identified miRNAs may target 50 protein-coding genes, including seven variant-specific surface proteins (VSPs). Our findings provide a clue that miRNA-mediated gene regulation may exist in the early stage of eukaryotic evolution, suggesting that it is an important regulation system ubiquitous in eukaryotes.
The resurrection genome of Boea hygrometrica: A blueprint for survival of dehydration.
Xiao, Lihong; Yang, Ge; Zhang, Liechi; Yang, Xinhua; Zhao, Shuang; Ji, Zhongzhong; Zhou, Qing; Hu, Min; Wang, Yu; Chen, Ming; Xu, Yu; Jin, Haijing; Xiao, Xuan; Hu, Guipeng; Bao, Fang; Hu, Yong; Wan, Ping; Li, Legong; Deng, Xin; Kuang, Tingyun; Xiang, Chengbin; Zhu, Jian-Kang; Oliver, Melvin J; He, Yikun
2015-05-05
"Drying without dying" is an essential trait in land plant evolution. Unraveling how a unique group of angiosperms, the Resurrection Plants, survive desiccation of their leaves and roots has been hampered by the lack of a foundational genome perspective. Here we report the ∼1,691-Mb sequenced genome of Boea hygrometrica, an important resurrection plant model. The sequence revealed evidence for two historical genome-wide duplication events, a compliment of 49,374 protein-coding genes, 29.15% of which are unique (orphan) to Boea and 20% of which (9,888) significantly respond to desiccation at the transcript level. Expansion of early light-inducible protein (ELIP) and 5S rRNA genes highlights the importance of the protection of the photosynthetic apparatus during drying and the rapid resumption of protein synthesis in the resurrection capability of Boea. Transcriptome analysis reveals extensive alternative splicing of transcripts and a focus on cellular protection strategies. The lack of desiccation tolerance-specific genome organizational features suggests the resurrection phenotype evolved mainly by an alteration in the control of dehydration response genes.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.
Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor
2017-08-30
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-01-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
In silico pattern-based analysis of the human cytomegalovirus genome.
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-04-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
Genome analysis of the platypus reveals unique signatures of evolution.
Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K
2008-05-08
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.
Genome analysis of the platypus reveals unique signatures of evolution
Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.
2009-01-01
We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Yongyan; Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi; Ai, Zhiying
2013-10-15
Embryonic stem cells (ESCs) can proliferate indefinitely in vitro and differentiate into cells of all three germ layers. These unique properties make them exceptionally valuable for drug discovery and regenerative medicine. However, the practical application of ESCs is limited because it is difficult to derive and culture ESCs. It has been demonstrated that CHIR99021 (CHIR) promotes self-renewal and enhances the derivation efficiency of mouse (m)ESCs. However, the downstream targets of CHIR are not fully understood. In this study, we identified CHIR-regulated genes in mESCs using microarray analysis. Our microarray data demonstrated that CHIR not only influenced the Wnt/β-catenin pathway bymore » stabilizing β-catenin, but also modulated several other pluripotency-related signaling pathways such as TGF-β, Notch and MAPK signaling pathways. More detailed analysis demonstrated that CHIR inhibited Nodal signaling, while activating bone morphogenetic protein signaling in mESCs. In addition, we found that pluripotency-maintaining transcription factors were up-regulated by CHIR, while several developmental-related genes were down-regulated. Furthermore, we found that CHIR altered the expression of epigenetic regulatory genes and long intergenic non-coding RNAs. Quantitative real-time PCR results were consistent with microarray data, suggesting that CHIR alters the expression pattern of protein-encoding genes (especially transcription factors), epigenetic regulatory genes and non-coding RNAs to establish a relatively stable pluripotency-maintaining network. - Highlights: • Combined use of CHIR with LIF promotes self-renewal of J1 mESCs. • CHIR-regulated genes are involved in multiple pathways. • CHIR inhibits Nodal signaling and promotes Bmp4 expression to activate BMP signaling. • Expression of epigenetic regulatory genes and lincRNAs is altered by CHIR.« less
The complete chloroplast genome sequence of Dendrobium nobile.
Yan, Wenjin; Niu, Zhitao; Zhu, Shuying; Ye, Meirong; Ding, Xiaoyu
2016-11-01
The complete chloroplast (cp) genome sequence of Dendrobium nobile, an endangered and traditional Chinese medicine with important economic value, is presented in this article. The total genome size is 150,793 bp, containing a large single copy (LSC) region (84,939 bp) and a small single copy region (SSC) (13,310 bp) which were separated by two inverted repeat (IRs) regions (26,272 bp). The overall GC contents of the plastid genome were 38.8%. In total, 130 unique genes were annotated and they were consisted of 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Fourteen genes contained one or two introns.
Hashim, Noor Haza Fazlin; Bharudin, Izwan; Abu Bakar, Mohd Faizal; Huang, Kie Kyon; Alias, Halimah; Lee, Bernard K. B.; Mat Isa, Mohd Noor; Mat-Sharani, Shuhaila; Sulaiman, Suhaila; Tay, Lih Jinq; Zolkefli, Radziah; Muhammad Noor, Yusuf; Law, Douglas Sie Nguong; Abdul Rahman, Siti Hamidah; Md-Illias, Rosli; Abu Bakar, Farah Diba; Najimudin, Nazalan; Abdul Murad, Abdul Munir; Mahadi, Nor Muhammad
2018-01-01
Extremely low temperatures present various challenges to life that include ice formation and effects on metabolic capacity. Psyhcrophilic microorganisms typically have an array of mechanisms to enable survival in cold temperatures. In this study, we sequenced and analysed the genome of a psychrophilic yeast isolated in the Antarctic region, Glaciozyma antarctica. The genome annotation identified 7857 protein coding sequences. From the genome sequence analysis we were able to identify genes that encoded for proteins known to be associated with cold survival, in addition to annotating genes that are unique to G. antarctica. For genes that are known to be involved in cold adaptation such as anti-freeze proteins (AFPs), our gene expression analysis revealed that they were differentially transcribed over time and in response to different temperatures. This indicated the presence of an array of adaptation systems that can respond to a changing but persistent cold environment. We were also able to validate the activity of all the AFPs annotated where the recombinant AFPs demonstrated anti-freeze capacity. This work is an important foundation for further collective exploration into psychrophilic microbiology where among other potential, the genes unique to this species may represent a pool of novel mechanisms for cold survival. PMID:29385175
Shamimuzzaman, Md.
2018-01-01
To understand translational capacity on a genome-wide scale across three developmental stages of immature soybean seed cotyledons, ribosome profiling was performed in combination with RNA sequencing and cluster analysis. Transcripts representing 216 unique genes demonstrated a higher level of translational activity in at least one stage by exhibiting higher translational efficiencies (TEs) in which there were relatively more ribosome footprint sequence reads mapping to the transcript than were present in the control total RNA sample. The majority of these transcripts were more translationally active at the early stage of seed development and included 12 unique serine or cysteine proteases and 16 2S albumin and low molecular weight cysteine-rich proteins that may serve as substrates for turnover and mobilization early in seed development. It would appear that the serine proteases and 2S albumins play a vital role in the early stages. In contrast, our investigation of profiles of 19 genes encoding high abundance seed storage proteins, such as glycinins, beta-conglycinins, lectin, and Kunitz trypsin inhibitors, showed that they all had similar patterns in which the TE values started at low levels and increased approximately 2 to 6-fold during development. The highest levels of these seed protein transcripts were found at the mid-developmental stage, whereas the highest ribosome footprint levels of only up to 1.6 TE were found at the late developmental stage. These experimental findings suggest that the major seed storage protein coding genes are primarily regulated at the transcriptional level during normal soybean cotyledon development. Finally, our analyses also identified a total of 370 unique gene models that showed very low TE values including over 48 genes encoding ribosomal family proteins and 95 gene models that are related to energy and photosynthetic functions, many of which have homology to the chloroplast genome. Additionally, we showed that genes of the chloroplast were relatively translationally inactive during seed development. PMID:29570733
Complete genome sequence of Paenibacillus sp. strain JDR-2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, Virginia; Nong, Guang; St. John, Franz J.
2012-01-01
Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by the identification of genes and characterization of encoded enzymes and has been further defined by the sequencing and annotation of the complete genome, which we describe. In addition to genes implicated in the utilization of -1,4-xylan, genes have also been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single repliconmore » with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for bioprocessing of hemicellulose fractions derived from lignocellulosic resources.« less
Carrasco-Rando, Marta; Tutor, Antonio S.; Prieto-Sánchez, Silvia; González-Pérez, Esther; Barrios, Natalia; Letizia, Annalisa; Martín, Paloma; Campuzano, Sonsoles; Ruiz-Gómez, Mar
2011-01-01
A central issue of myogenesis is the acquisition of identity by individual muscles. In Drosophila, at the time muscle progenitors are singled out, they already express unique combinations of muscle identity genes. This muscle code results from the integration of positional and temporal signalling inputs. Here we identify, by means of loss-of-function and ectopic expression approaches, the Iroquois Complex homeobox genes araucan and caupolican as novel muscle identity genes that confer lateral transverse muscle identity. The acquisition of this fate requires that Araucan/Caupolican repress other muscle identity genes such as slouch and vestigial. In addition, we show that Caupolican-dependent slouch expression depends on the activation state of the Ras/Mitogen Activated Protein Kinase cascade. This provides a comprehensive insight into the way Iroquois genes integrate in muscle progenitors, signalling inputs that modulate gene expression and protein activity. PMID:21811416
Jankowitsch, Frank; Schwarz, Julia; Rückert, Christian; Gust, Bertolt; Szczepanowski, Rafael; Blom, Jochen; Pelzer, Stefan; Kalinowski, Jörn
2012-01-01
Streptomyces davawensis JCM 4913 synthesizes the antibiotic roseoflavin, a structural riboflavin (vitamin B2) analog. Here, we report the 9,466,619-bp linear chromosome of S. davawensis JCM 4913 and a 89,331-bp linear plasmid. The sequence has an average G+C content of 70.58% and contains six rRNA operons (16S-23S-5S) and 69 tRNA genes. The 8,616 predicted protein-coding sequences include 32 clusters coding for secondary metabolites, several of which are unique to S. davawensis. The chromosome contains long terminal inverted repeats of 33,255 bp each and atypical telomeres. Sequence analysis with regard to riboflavin biosynthesis revealed three different patterns of gene organization in Streptomyces species. Heterologous expression of a set of genes present on a subgenomic fragment of S. davawensis resulted in the production of roseoflavin by the host Streptomyces coelicolor M1152. Phylogenetic analysis revealed that S. davawensis is a close relative of Streptomyces cinnabarinus, and much to our surprise, we found that the latter bacterium is a roseoflavin producer as well. PMID:23043000
Wang, Shuo; Gao, Li-Zhi
2016-11-01
The complete chloroplast genome sequence of foxtail millet (Setaria italica), an important food and fodder crop in the family Poaceae, is first reported in this study. The genome consists of 1 35 516 bp containing a pair of inverted repeats (IRs) of 21 804 bp separated by a large single-copy (LSC) region and a small single-copy (SSC) region of 79 896 bp and 12 012 bp, respectively. Coding sequences constitute 58.8% of the genome harboring 111 unique genes, 71 of which are protein-coding genes, 4 are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated foxtail millet clustered with Panicum virgatum and Echinochloa crus-galli belonging to the tribe Paniceae of the subfamily Panicoideae. This newly determined chloroplast genome will provide valuable information for the future breeding programs of valuable cereal crops in the family Poaceae.
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator
Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.
2012-01-01
While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kB in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.
Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen
2016-02-02
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen
2016-01-01
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814
Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome.
Mahesh, Hirehally Basavarajegowda; Subba, Pratigya; Advani, Jayshree; Shirke, Meghana Deepak; Loganathan, Ramya Malarini; Chandana, Shankara Lingu; Shilpa, Siddappa; Chatterjee, Oishi; Pinto, Sneha Maria; Prasad, Thottethodi Subrahmanya Keshava; Gowda, Malali
2018-04-01
Indian sandalwood ( Santalum album ) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees. © 2018 American Society of Plant Biologists. All Rights Reserved.
Hücker, Sarah M.; Ardern, Zachary; Goldberg, Tatyana; Schafferhans, Andrea; Bernhofer, Michael; Vestergaard, Gisle; Nelson, Chase W.; Schloter, Michael; Rost, Burkhard; Scherer, Siegfried
2017-01-01
In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set. PMID:28902868
Sugai, Akihiro; Sato, Hiroki; Yoneda, Misako; Kai, Chieko
2017-08-01
The regulation of transcription during Nipah virus (NiV) replication is poorly understood. Using a bicistronic minigenome system, we investigated the involvement of non-coding regions (NCRs) in the transcriptional re-initiation efficiency of NiV RNA polymerase. Reporter assays revealed that attenuation of NiV gene expression was not constant at each gene junction, and that the attenuating property was controlled by the 3' NCR. However, this regulation was independent of the gene-end, gene-start and intergenic regions. Northern blot analysis indicated that regulation of viral gene expression by the phosphoprotein (P) and large protein (L) 3' NCRs occurred at the transcription level. We identified uridine-rich tracts within the L 3' NCR that are similar to gene-end signals. These gene-end-like sequences were recognized as weak transcription termination signals by the viral RNA polymerase, thereby reducing downstream gene transcription. Thus, we suggest that NiV has a unique mechanism of transcriptional regulation. Copyright © 2017 Elsevier Inc. All rights reserved.
Ramesh, S V
2013-09-01
Of late non-coding RNAs (ncRNAs)-mediated gene silencing is an influential tool deliberately deployed to negatively regulate the expression of targeted genes. In addition to the widely employed small interfering RNA (siRNA)-mediated gene silencing approach, other variants like artificial miRNA (amiRNA), miRNA mimics, and artificial transacting siRNAs (tasiRNAs) are being explored and successfully deployed in developing non-coding RNA-based genetically modified plants. The ncRNA-based gene manipulations are typified with mobile nature of silencing signals, interference from viral genome-derived suppressor proteins, and an obligation for meticulous computational analysis to prevaricate any inadvertent effects. In a broad sense, risk assessment inquiries for genetically modified plants based on the expression of ncRNAs are competently addressed by the environmental risk assessment (ERA) models, currently in vogue, designed for the first generation transgenic plants which are based on the expression of heterologous proteins. Nevertheless, transgenic plants functioning on the foundation of ncRNAs warrant due attention with respect to their unique attributes like off-target or non-target gene silencing effects, small RNAs (sRNAs) persistence, food and feed safety assessments, problems in detection and tracking of sRNAs in food, impact of ncRNAs in plant protection measures, effect of mutations etc. The role of recent developments in sequencing techniques like next generation sequencing (NGS) and the ERA paradigm of the different countries in vogue are also discussed in the context of ncRNA-based gene manipulations.
Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan
2009-01-01
We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
The complete chloroplast genome of Aconitum chiisanense Nakai (Ranunculaceae).
Lim, Chae Eun; Kim, Goon-Bo; Baek, Seunghoon; Han, Su-Min; Yu, Hee-Ju; Mun, Jeong-Hwan
2017-01-01
We determined the complete chloroplast DNA sequence of Aconitum chiisanense Nakai, a rare Aconitum species endemic to Korea. The chloroplast genome is 155 934 bp in length and contains 4 rRNA, 30 tRNA, and 78 protein-coding genes. Phylogenetic analysis revealed that the chloroplast genome of A. chiisanense is closely related to that of A. barbatum var. puberulum. Sequence comparison with other Ranunculaceae chloroplasts identified a unique deletion in the rps16 gene of A. chiisanense chloroplast DNA that can serve as a molecular marker for species identification.
The whole chloroplast genome of wild rice (Oryza australiensis).
Wu, Zhiqiang; Ge, Song
2016-01-01
The whole chloroplast genome of wild rice (Oryza australiensis) is characterized in this study. The genome size is 135,224 bp, exhibiting a typical circular structure including a pair of 25,776 bp inverted repeats (IRa,b) separated by a large single-copy region (LSC) of 82,212 bp and a small single-copy region (SSC) of 12,470 bp. The overall GC content of the genome is 38.95%. 110 unique genes were annotated, including 76 protein-coding genes, 4 ribosomal RNA genes, and 30t RNA genes. Among these, 18 are duplicated in the inverted repeat regions, 13 genes contain one intron, and 2 genes (rps12 and ycf3) have two introns.
Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C
2015-04-01
The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
NASA Technical Reports Server (NTRS)
Stolc, Viktor; Samanta, Manoj Pratim; Tongprasit, Waraporn; Marshall, Wallace F.
2005-01-01
The important role that cilia and flagella play in human disease creates an urgent need to identify genes involved in ciliary assembly and function. The strong and specific induction of flagellar-coding genes during flagellar regeneration in Chlamydomonas reinhardtii suggests that transcriptional profiling of such cells would reveal new flagella-related genes. We have conducted a genome-wide analysis of RNA transcript levels during flagellar regeneration in Chlamydomonas by using maskless photolithography method-produced DNA oligonucleotide microarrays with unique probe sequences for all exons of the 19,803 predicted genes. This analysis represents previously uncharacterized whole-genome transcriptional activity profiling study in this important model organism. Analysis of strongly induced genes reveals a large set of known flagellar components and also identifies a number of important disease-related proteins as being involved with cilia and flagella, including the zebrafish polycystic kidney genes Qilin, Reptin, and Pontin, as well as the testis-expressed tubby-like protein TULP2.
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...
2014-10-02
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
McGill, Susan E; Barker, Daniel
2017-07-20
" Candidatus Ruthia magnifica", "Candidatus Vesicomyosocius okutanii" and Thiomicrospira crunogena are all sulfur-oxidising bacteria found in deep-sea vent environments. Recent research suggests that the two symbiotic organisms, "Candidatus R. magnifica" and "Candidatus V. okutanii", may share common ancestry with the autonomously living species T. crunogena. We used comparative genomics to examine the genome-wide protein-coding content of all three species to explore their similarities. In particular, we used the OrthoMCL algorithm to sort proteins into groups of putative orthologs on the basis of sequence similarity. The OrthoMCL inflation parameter was tuned using biological criteria. Using the tuned value, OrthoMCL delimited 1070 protein groups. 63.5% of these groups contained one protein from each species. Two groups contained duplicate protein copies from all three species. 123 groups were unique to T. crunogena and ten groups included multiple copies of T. crunogena proteins but only single copies from the other species. "Candidatus R. magnifica" had one unique group, and had multiple copies in one group where the other species had a single copy. There were no groups unique to "Candidatus V. okutanii", and no groups in which there were multiple "Candidatus V. okutanii" proteins but only single proteins from the other species. Results align with previous suggestions that all three species share a common ancestor. However this is not definitive evidence to make taxonomic conclusions and the possibility of horizontal gene transfer was not investigated. Methodologically, the tuning of the OrthoMCL inflation parameter using biological criteria provides further methods to refine the OrthoMCL procedure.
2004-12-09
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Recombination and Population Mosaic of a Multifunctional Viral Gene, Adeno-Associated Virus cap
Takeuchi, Yasuhiro; Myers, Richard; Danos, Olivier
2008-01-01
Homologous recombination is a dominant force in evolution and results in genetic mosaics. To detect evidence of recombination events and assess the biological significance of genetic mosaics, genome sequences for various viral populations of reasonably large size are now available in the GenBank. We studied a multi-functional viral gene, the adeno-associated virus (AAV) cap gene, which codes for three capsid proteins, VP1, VP2 and VP3. VP1-3 share a common C-terminal domain corresponding to VP3, which forms the viral core structure, while the VP1 unique N-terminal part contains an enzymatic domain with phospholipase A2 activity. Our recombinant detection program (RecI) revealed five novel recombination events, four of which have their cross-over points in the N-terminal, VP1 and VP2 unique region. Comparison of phylogenetic trees for different cap gene regions confirmed discordant phylogenies for the recombinant sequences. Furthermore, differences in the phylogenetic tree structures for the VP1 unique (VP1u) region and the rest of cap highlighted the mosaic nature of cap gene in the AAV population: two dominant forms of VP1u sequences were identified and these forms are linked to diverse sequences in the rest of cap gene. This observation together with the finding of frequent recombination in the VP1 and 2 unique regions suggests that this region is a recombination hot spot. Recombination events in this region preserve protein blocks of distinctive functions and contribute to convergence in VP1u and divergence of the rest of cap. Additionally the possible biological significance of two dominant VP1u forms is inferred. PMID:18286191
Kang, Seokha; Sultana, Tahera; Eom, Keeseon S; Park, Yung Chul; Soonthornpong, Nathan; Nadler, Steven A; Park, Joong-Ki
2009-01-15
The complete mitochondrial genome sequence was determined for the human pinworm Enterobius vermicularis (Oxyurida: Nematoda) and used to infer its phylogenetic relationship to other major groups of chromadorean nematodes. The E. vermicularis genome is a 14,010-bp circular DNA molecule that encodes 36 genes (12 proteins, 22 tRNAs, and 2 rRNAs). This mtDNA genome lacks atp8, as reported for almost all other nematode species investigated. Phylogenetic analyses (maximum parsimony, maximum likelihood, neighbor joining, and Bayesian inference) of nucleotide sequences for the 12 protein-coding genes of 25 nematode species placed E. vermicularis, a representative of the order Oxyurida, as sister to the main Ascaridida+Rhabditida group. Tree topology comparisons using statistical tests rejected an alternative hypothesis favoring a closer relationship among Ascaridida, Spirurida, and Oxyurida, which has been supported from most studies based on nuclear ribosomal DNA sequences. Unlike the relatively conserved gene arrangement found for most chromadorean taxa, E. vermicularis mtDNA gene order is very unique, not sharing similarity to any other nematode species reported to date. This lack of gene order similarity may represent idiosyncratic gene rearrangements unique to this specific lineage of the oxyurids. To more fully understand the extent of gene rearrangement and its evolutionary significance within the nematode phylogenetic framework, additional mitochondrial genomes representing a greater evolutionary diversity of species must be characterized.
Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin
2011-03-29
Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome that has a slow rate of evolution and rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA), and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes deduced from mtDNA rearrangements causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports about mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared to our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that non-coding sequences of higher plants had mostly divergent multiple reorganizations during the mtDNA evolution of higher plants. The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
Liu, Yangyang; Han, Xiao; Yuan, Junting; Geng, Tuoyu; Chen, Shihao; Hu, Xuming; Cui, Isabelle H; Cui, Hengmi
2017-04-07
The type II bacterial CRISPR/Cas9 system is a simple, convenient, and powerful tool for targeted gene editing. Here, we describe a CRISPR/Cas9-based approach for inserting a poly(A) transcriptional terminator into both alleles of a targeted gene to silence protein-coding and non-protein-coding genes, which often play key roles in gene regulation but are difficult to silence via insertion or deletion of short DNA fragments. The integration of 225 bp of bovine growth hormone poly(A) signals into either the first intron or the first exon or behind the promoter of target genes caused efficient termination of expression of PPP1R12C , NSUN2 (protein-coding genes), and MALAT1 (non-protein-coding gene). Both NeoR and PuroR were used as markers in the selection of clonal cell lines with biallelic integration of a poly(A) signal. Genotyping analysis indicated that the cell lines displayed the desired biallelic silencing after a brief selection period. These combined results indicate that this CRISPR/Cas9-based approach offers an easy, convenient, and efficient novel technique for gene silencing in cell lines, especially for those in which gene integration is difficult because of a low efficiency of homology-directed repair. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Convergent evolution of marine mammals is associated with distinct substitutions in common genes
Zhou, Xuming; Seim, Inge; Gladyshev, Vadim N.
2015-01-01
Phenotypic convergence is thought to be driven by parallel substitutions coupled with natural selection at the sequence level. Multiple independent evolutionary transitions of mammals to an aquatic environment offer an opportunity to test this thesis. Here, whole genome alignment of coding sequences identified widespread parallel amino acid substitutions in marine mammals; however, the majority of these changes were not unique to these animals. Conversely, we report that candidate aquatic adaptation genes, identified by signatures of likelihood convergence and/or elevated ratio of nonsynonymous to synonymous nucleotide substitution rate, are characterized by very few parallel substitutions and exhibit distinct sequence changes in each group. Moreover, no significant positive correlation was found between likelihood convergence and positive selection in all three marine lineages. These results suggest that convergence in protein coding genes associated with aquatic lifestyle is mainly characterized by independent substitutions and relaxed negative selection. PMID:26549748
Biomimetic Artificial Epigenetic Code for Targeted Acetylation of Histones.
Taniguchi, Junichi; Feng, Yihong; Pandian, Ganesh N; Hashiya, Fumitaka; Hidaka, Takuya; Hashiya, Kaori; Park, Soyoung; Bando, Toshikazu; Ito, Shinji; Sugiyama, Hiroshi
2018-06-13
While the central role of locus-specific acetylation of histone proteins in eukaryotic gene expression is well established, the availability of designer tools to regulate acetylation at particular nucleosome sites remains limited. Here, we develop a unique strategy to introduce acetylation by constructing a bifunctional molecule designated Bi-PIP. Bi-PIP has a P300/CBP-selective bromodomain inhibitor (Bi) as a P300/CBP recruiter and a pyrrole-imidazole polyamide (PIP) as a sequence-selective DNA binder. Biochemical assays verified that Bi-PIPs recruit P300 to the nucleosomes having their target DNA sequences and extensively accelerate acetylation. Bi-PIPs also activated transcription of genes that have corresponding cognate DNA sequences inside living cells. Our results demonstrate that Bi-PIPs could act as a synthetic programmable histone code of acetylation, which emulates the bromodomain-mediated natural propagation system of histone acetylation to activate gene expression in a sequence-selective manner.
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”
Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".
Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method.The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
Balfanz, Sabine; Strünker, Timo; Frings, Stephan; Baumann, Arnd
2005-04-01
In invertebrates, the biogenic-amine octopamine is an important physiological regulator. It controls and modulates neuronal development, circadian rhythm, locomotion, 'fight or flight' responses, as well as learning and memory. Octopamine mediates its effects by activation of different GTP-binding protein (G protein)-coupled receptor types, which induce either cAMP production or Ca(2+) release. Here we describe the functional characterization of two genes from Drosophila melanogaster that encode three octopamine receptors. The first gene (Dmoa1) codes for two polypeptides that are generated by alternative splicing. When heterologously expressed, both receptors cause oscillatory increases of the intracellular Ca(2+) concentration in response to applying nanomolar concentrations of octopamine. The second gene (Dmoa2) codes for a receptor that specifically activates adenylate cyclase and causes a rise of intracellular cAMP with an EC(50) of approximately 3 x 10(-8) m octopamine. Tyramine, the precursor of octopamine biosynthesis, activates all three receptors at > or = 100-fold higher concentrations, whereas dopamine and serotonin are non-effective. Developmental expression of Dmoa genes was assessed by RT-PCR. Overlapping but not identical expression patterns were observed for the individual transcripts. The genes characterized in this report encode unique receptors that display signature properties of native octopamine receptors.
Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi
2017-12-02
The study of codon usage coupled with phylogenetic analysis is an important tool to understand the genetic and evolutionary relationship of a gene. The 13 protein coding genes of human mitochondria are involved in electron transport chain for the generation of energy currency (ATP). However, no work has yet been reported on the codon usage of the mitochondrial protein coding genes across six continents. To understand the patterns of codon usage in mitochondrial genes across six different continents, we used bioinformatic analyses to analyze the protein coding genes. The codon usage bias was low as revealed from high ENC value. Correlation between codon usage and GC3 suggested that all the codons ending with G/C were positively correlated with GC3 but vice versa for A/T ending codons with the exception of ND4L and ND5 genes. Neutrality plot revealed that for the genes ATP6, COI, COIII, CYB, ND4 and ND4L, natural selection might have played a major role while mutation pressure might have played a dominant role in the codon usage bias of ATP8, COII, ND1, ND2, ND3, ND5 and ND6 genes. Phylogenetic analysis indicated that evolutionary relationships in each of 13 protein coding genes of human mitochondria were different across six continents and further suggested that geographical distance was an important factor for the origin and evolution of 13 protein coding genes of human mitochondria. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Assignment of the {beta}-arrestin 1 gene (ARRB1) to human chromosome 11q13
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calabrese, G.; Morizio, E.; Palka, G.
1994-11-01
Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: {beta}-adrenergic receptor kinase ({beta}ARK), which phosphorylates the agonist-occupied receptor, and its functional cofactor, {beta}-arrestin. {beta}ARK is a member of a multigene family, consisting of six known subtypes, which have also been named G-protein-coupled receptor kinases (GRK 1 to 6) due to the apparently unique functional association of such kinases with this receptor family. The gene for {beta}ARK1 has been localized to human chromosome 11q13. The four members of the arrestin/{beta}-arrestin gene family identified so far are arrestin, X-arrestin, {beta}-arrestin 1, and {beta}-arrestin 2. Here themore » authors report the chromosome mapping of the human gene for {beta}-arrestin 1 (ARRB1) to chromosome 11q13 by fluorescence in situ hybridization (FISH). Two-color FISH confirmed that the two genes coding for the functionally related proteins {beta}ARK1 and {beta}arrestin 1 both map to 11q13. 16 refs., 1 fig., 1 tab.« less
Long non-coding RNAs and mRNAs profiling during spleen development in pig.
Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou
2018-01-01
Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
NASA Technical Reports Server (NTRS)
Yu, Y.; Okayasu, R.; Weil, M. M.; Silver, A.; McCarthy, M.; Zabriskie, R.; Long, S.; Cox, R.; Ullrich, R. L.
2001-01-01
Female BALB/c mice are unusually radiosensitive and more susceptible than C57BL/6 and other tested inbred mice to ionizing radiation (IR)-induced mammary tumors. This breast cancer susceptibility is correlated with elevated susceptibility for mammary cell transformation and genomic instability following irradiation. In this study, we report the identification of two BALB/c strain-specific polymorphisms in the coding region of Prkdc, the gene encoding the DNA-dependent protein kinase catalytic subunit, which is known to be involved in DNA double-stranded break repair and post-IR signal transduction. First, we identified an A --> G transition at base 11530 resulting in a Met --> Val conversion at codon 3844 (M3844V) in the phosphatidylinositol 3-kinase domain upstream of the scid mutation (Y4046X). Second, we identified a C --> T transition at base 6418 resulting in an Arg --> Cys conversion at codon 2140 (R2140C) downstream of the putative leucine zipper domain. This unique PrkdcBALB variant gene is shown to be associated with decreased DNA-dependent protein kinase catalytic subunit activity and with increased susceptibility to IR-induced genomic instability in primary mammary epithelial cells. The data provide the first evidence that naturally arising allelic variation in a mouse DNA damage response gene may associate with IR response and breast cancer risk.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teumer, J.; Green, H.
1989-02-01
The gene for involucrin, an epidermal protein, has been remodeled in the higher primates. Most of the coding region of the human gene consists of a modern segment of repeats derived from a 10-codon sequence present in the ancestral segment of the gene. The modern segment can be divided into early, middle, and late regions. The authors report here the nucleotide sequence of three alleles of the gorilla involucrin gene. Each possesses a modern segment homologous to that of the human and consisting of 10-codon repeats. The early and middle regions are similar to the corresponding regions of the humanmore » allele and are nearly identical among the different gorilla alleles. The late region consists of recent duplications whose pattern is unique in each of the gorilla alleles and in the human allele. The early region is located in what is now the 3{prime} third of the modern segment, and the late, polymorphic region is located in what is now the 5{prime} third. Therefore, as the modern segment expanded during evolution, its 3{prime} end became stabilized, and continuing duplications became confined to its 5{prime} end. The expansion of the involucrin coding region, which began long before the separation of the gorilla and human, has continued in both species after their separation.« less
Wang, Shuo; Gao, Li-Zhi
2016-09-01
The complete chloroplast genome of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis, is first reported in this study. The genome harbors a large single copy (LSC) region of 81 016 bp and a small single copy (SSC) region of 12 456 bp separated by a pair of inverted repeat (IRa and IRb) regions of 22 315 bp. GC content is 38.92%. The proportion of coding sequence is 57.97%, comprising of 111 (19 duplicated in IR regions) unique genes, 71 of which are protein-coding genes, four are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated that S. viridis was clustered with its cultivated species S. italica in the tribe Paniceae of the family Poaceae. This newly determined chloroplast genome will provide valuable genetic resources to assist future studies on C4 photosynthesis in grasses.
Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar
2018-06-12
We sequenced the Hyposidra talaca NPV (HytaNPV) double stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculovirus, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculovirus viruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV that infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis oblique, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motif consistent with the temporal expression of the genes observed in the RNA-seq data.
The complete chloroplast genome of Sinopodophyllum hexandrum (Berberidaceae).
Li, Huie; Guo, Qiqiang
2016-07-01
The complete chloroplast (cp) genome of the Sinopodophyllum hexandrum (Berberidaceae) was determined in this study. The circular genome is 157,940 bp in size, and comprises a pair of inverted repeat (IR) regions of 26,077 bp each, a large single-copy (LSC) region of 86,460 bp and a small single-copy (SSC) region of 19,326 bp. The GC content of the whole cp genome was 38.5%. A total of 133 genes were identified, including 88 protein-coding genes, 37 tRNA genes and eight rRNA genes. The whole cp genome consists of 114 unique genes, and 19 genes are duplicated in the IR regions. The phylogenetic analysis revealed that S. hexandrum is closely related to Nandina domestica within the family Berberidaceae.
2014-01-01
Linear algebraic concept of subspace plays a significant role in the recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequence. With the vast growth of genomic sequences, the demand to identify accurately the protein-coding regions in DNA is increasingly rising. Several techniques of DNA feature extraction which involves various cross fields have come up in the recent past, among which application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions completely eliminating background noise. Comparison of proposed method with existing sliding discrete Fourier transform (SDFT) method popularly known as modified periodogram method has been drawn on several genes from various organisms and the results show that the proposed method has better as well as an effective approach towards gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish superiority of least-norm gene prediction method over existing method. PMID:24386895
The genome of Mesobuthus martensii reveals a unique adaptation model of arthropods
Cao, Zhijian; Yu, Yao; Wu, Yingliang; Hao, Pei; Di, Zhiyong; He, Yawen; Chen, Zongyun; Yang, Weishan; Shen, Zhiyong; He, Xiaohua; Sheng, Jia; Xu, Xiaobo; Pan, Bohu; Feng, Jing; Yang, Xiaojuan; Hong, Wei; Zhao, Wenjuan; Li, Zhongjie; Huang, Kai; Li, Tian; Kong, Yimeng; Liu, Hui; Jiang, Dahe; Zhang, Binyan; Hu, Jun; Hu, Youtian; Wang, Bin; Dai, Jianliang; Yuan, Bifeng; Feng, Yuqi; Huang, Wei; Xing, Xiaojing; Zhao, Guoping; Li, Xuan; Li, Yixue; Li, Wenxin
2013-01-01
Representing a basal branch of arachnids, scorpions are known as ‘living fossils’ that maintain an ancient anatomy and are adapted to have survived extreme climate changes. Here we report the genome sequence of Mesobuthus martensii, containing 32,016 protein-coding genes, the most among sequenced arthropods. Although M. martensii appears to evolve conservatively, it has a greater gene family turnover than the insects that have undergone diverse morphological and physiological changes, suggesting the decoupling of the molecular and morphological evolution in scorpions. Underlying the long-term adaptation of scorpions is the expansion of the gene families enriched in basic metabolic pathways, signalling pathways, neurotoxins and cytochrome P450, and the different dynamics of expansion between the shared and the scorpion lineage-specific gene families. Genomic and transcriptomic analyses further illustrate the important genetic features associated with prey, nocturnal behaviour, feeding and detoxification. The M. martensii genome reveals a unique adaptation model of arthropods, offering new insights into the genetic bases of the living fossils. PMID:24129506
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. COI gene begins with GTG as start codon, whereas other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as stop codon.
Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes.
Hsu, Jacob Shujui; Kwan, Johnny S H; Pan, Zhicheng; Garcia-Barcelo, Maria-Mercè; Sham, Pak Chung; Li, Miaoxin
2016-10-15
Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, the identification of disease causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilize this information. We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with autosomal dominant (AD) inheritance mode are more haploinsufficiency and de novo mutation sensitive, whereas those autosomal recessive (AR) genes have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. As a result, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigned to each annotated protein-coding gene (N = 18 859) three pathogenic scores according to the inheritance mode (AD, AR and XL). This inheritance mode-specific framework achieved higher accuracy (area under curve = 0.84) in XL mode. The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics. ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from (https://github.com/jacobhsu35/ISPP.git). mxli@hku.hkSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred
2014-11-20
Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Network perturbation by recurrent regulatory variants in cancer
Cho, Ara; Lee, Insuk; Choi, Jung Kyoon
2017-01-01
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry
2006-08-31
Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats > or = 30 bp with a sequence identity > or = 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.
Complete mitochondrial genome of Ostrea denselamellosa (Bivalvia, Ostreidae).
Yu, Hong; Kong, Lingfeng; Li, Qi
2016-01-01
The complete mitochondrial (mt) genome of the flat oyster, Ostrea denselamellosa, was determined using Long-PCR and genome walking techniques in this study. The total length of the mt genome sequence of O. denselamellosa was 16,227 bp, which is the smallest reported Ostreidae mt genome to date. It contained 12 protein-coding genes (lacking of ATP8), 23 transfer RNA genes, and two ribosomal RNA genes. A bias towards a higher representation of nucleotides A and T (60.7%) was detected in the mt genome of O. denselamellosa. The rrnL was split into two fragments (3' half, 711 bp; 5' half, 509 bp), which seems to be the unique characteristics of Ostreidae mt genomes.
Trypanosome RNA polymerases and transcription factors: sensible trypanocidal drug targets?
Vanhamme, Luc
2008-11-01
Trypanosomes and Leishmaniae are the agents of several important parasitic diseases threatening hundreds of million human beings worldwide. As they diverged early in evolution, they display original molecular characteristics. These peculiarities are each defining putative specific targets for anti-parasitic drugs. Transcription displays its lot of unique characteristics in trypanosomes and will be taken as an example to uncover these targets. Unique features of transcription in trypanosomes include constitutive and poly-cistronic transcription by RNA polymerase II as well as transcription of protein-coding genes by RNA polymerase I. It is becoming clear that these unique mechanisms are performed by dedicated molecular players. The first of them have been recently characterized. They are reviewed and their suitability as drug targets is commented.
Isolation and characterization of the gene coding for Escherichia coli arginyl-tRNA synthetase.
Eriani, G; Dirheimer, G; Gangloff, J
1989-01-01
The gene coding for Escherichia coli arginyl-tRNA synthetase (argS) was isolated as a fragment of 2.4 kb after analysis and subcloning of recombinant plasmids from the Clarke and Carbon library. The clone bearing the gene overproduces arginyl-tRNA synthetase by a factor 100. This means that the enzyme represents more than 20% of the cellular total protein content. Sequencing revealed that the fragment contains a unique open reading frame of 1734 bp flanked at its 5' and 3' ends respectively by 247 bp and 397 bp. The length of the corresponding protein (577 aa) is well consistent with earlier Mr determination (about 70 kd). Primer extension analysis of the ArgRS mRNA by reverse transcriptase, located its 5' end respectively at 8 and 30 nucleotides downstream of a TATA and a TTGAC like element (CTGAC) and 60 nucleotides upstream of the unusual translation initiation codon GUG; nuclease S1 analysis located the 3'-end at 48 bp downstream of the translation termination codon. argS has a codon usage pattern typical for highly expressed E. coli genes. With the exception of the presence of a HVGH sequence similar to the HIGH consensus element, ArgRS has no relevant sequence homologies with other aminoacyl-tRNA synthetases. Images PMID:2668891
Zhang, Huijuan; Bartley, Glenn E; Mitchell, Cheryl R; Zhang, Hui; Yokoyama, Wallace
2011-10-26
The physiological effects of the hydrolysates of white rice protein (WRP), brown rice protein (BRP), and soy protein (SP) hydrolyzed by the food grade enzyme, alcalase2.4 L, were compared to the original protein source. Male Syrian Golden hamsters were fed high-fat diets containing either 20% casein (control) or 20% extracted proteins or their hydrolysates as the protein source for 3 weeks. The brown rice protein hydrolysate (BRPH) diet group reduced weight gain 76% compared with the control. Animals fed the BRPH supplemented diet also had lower final body weight, liver weight, very low density lipoprotein cholesterol (VLDL-C), and liver cholesterol, and higher fecal fat and bile acid excretion than the control. Expression levels of hepatic genes for lipid oxidation, PPARα, ACOX1, and CPT1, were highest for hamsters fed the BRPH supplemented diet. Expression of CYP7A1, the gene regulating bile acid synthesis, was higher in all test groups. Expression of CYP51, a gene coding for an enzyme involved in cholesterol synthesis, was highest in the BRPH diet group. The results suggest that BRPH includes unique peptides that reduce weight gain and hepatic cholesterol synthesis.
Nowell, Victoria J; Kropinski, Andrew M; Songer, J Glenn; MacInnes, Janet I; Parreira, Valeria R; Prescott, John F
2012-01-01
Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified.
Nowell, Victoria J.; Kropinski, Andrew M.; Songer, J. Glenn; MacInnes, Janet I.; Parreira, Valeria R.; Prescott, John F.
2012-01-01
Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified. PMID:22412860
A fresh look at the male-specific region of the human Y chromosome.
Jangravi, Zohreh; Alikhani, Mehdi; Arefnezhad, Babak; Sharifi Tabar, Mehdi; Taleahmad, Sara; Karamzadeh, Razieh; Jadaliha, Mahdieh; Mousavi, Seyed Ahmad; Ahmadi Rastegar, Diba; Parsamatin, Pouria; Vakilian, Haghighat; Mirshahvaladi, Shahab; Sabbaghian, Marjan; Mohseni Meybodi, Anahita; Mirzaei, Mehdi; Shahhoseini, Maryam; Ebrahimi, Marzieh; Piryaei, Abbas; Moosavi-Movahedi, Ali Akbar; Haynes, Paul A; Goodchild, Ann K; Nasr-Esfahani, Mohammad Hossein; Jabbari, Esmaiel; Baharvand, Hossein; Sedighi Gilani, Mohammad Ali; Gourabi, Hamid; Salekdeh, Ghasem Hosseini
2013-01-04
The Chromosome-centric Human Proteome Project (C-HPP) aims to systematically map the entire human proteome with the intent to enhance our understanding of human biology at the cellular level. This project attempts simultaneously to establish a sound basis for the development of diagnostic, prognostic, therapeutic, and preventive medical applications. In Iran, current efforts focus on mapping the proteome of the human Y chromosome. The male-specific region of the Y chromosome (MSY) is unique in many aspects and comprises 95% of the chromosome's length. The MSY continually retains its haploid state and is full of repeated sequences. It is responsible for important biological roles such as sex determination and male fertility. Here, we present the most recent update of MSY protein-encoding genes and their association with various traits and diseases including sex determination and reversal, spermatogenesis and male infertility, cancers such as prostate cancers, sex-specific effects on the brain and behavior, and graft-versus-host disease. We also present information available from RNA sequencing, protein-protein interaction, post-translational modification of MSY protein-coding genes and their implications in biological systems. An overview of Human Y chromosome Proteome Project is presented and a systematic approach is suggested to ensure that at least one of each predicted protein-coding gene's major representative proteins will be characterized in the context of its major anatomical sites of expression, its abundance, and its functional relevance in a biological and/or medical context. There are many technical and biological issues that will need to be overcome in order to accomplish the full scale mapping.
Delimitation of essential genes of cassava latent virus DNA 2.
Etessami, P; Callis, R; Ellwood, S; Stanley, J
1988-01-01
Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. Images PMID:3387209
Abernathy, Jason; Overturf, Ken
2018-01-04
Reformulation of aquafeeds in salmonid diets to include more plant proteins is critical for sustainable aquaculture. However, increasing plant proteins can lead to stunted growth and enteritis. Toward an understanding of the regulatory mechanisms behind plant protein utilization, directional RNA sequencing of liver tissues from a rainbow trout strain selected for growth on an all plant-protein diet and a control strain, both fed a plant diet for 12 weeks, were utilized to construct long noncoding RNAs. Antisense long noncoding RNAs were selected for differential expression and functional analyses since they have been shown to have regulatory actions within a genome. A total of 142 unique antisense long noncoding RNAs were differentially expressed between strains, 60 of which could be mapped to a gene. Genes underlying these noncoding RNAs are indicated in lipid metabolism and immunity. Six noncoding transcripts were also found to overlap with differentially expressed protein-coding genes, all of which were co-expressed. Associating variation in regulatory elements between rainbow trout strains with differing tolerance to plant-protein diets will assist in future studies toward increased gains throughout carnivorous aquaculture.
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.
Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S
2016-01-01
Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dash, Paban Kumar, E-mail: pabandash@rediffmail.com; Sharma, Shashi; Soni, Manisha
Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of the South East Asia including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is notmore » available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960), from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2 respectively. Presence of unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins was observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection in few amino acid sites of the genes encoding for structural and non-structural proteins. The molecular phylogenetic analysis based on comparison of both complete coding region and envelope protein gene with globally diverse DENV-2 viruses classified the recent Indian isolates into a unique South Asian clade within Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in 1970s characterized the evolution of DENV-2 in India. Present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a unique clade in South Asia.« less
Small, Clayton M; Harlin-Cognato, April D; Jones, Adam G
2013-01-01
Evolutionary studies have revealed that reproductive proteins in animals and plants often evolve more rapidly than the genome-wide average. The causes of this pattern, which may include relaxed purifying selection, sexual selection, sexual conflict, pathogen resistance, reinforcement, or gene duplication, remain elusive. Investigative expansions to additional taxa and reproductive tissues have the potential to shed new light on this unresolved problem. Here, we embark on such an expansion, in a comparison of the brood-pouch transcriptome between two male-pregnant species of the pipefish genus Syngnathus. Male brooding tissues in syngnathid fishes represent a novel, nonurogenital reproductive trait, heretofore mostly uncharacterized from a molecular perspective. We leveraged next-generation sequencing (Roche 454 pyrosequencing) to compare transcript abundance in the male brooding tissues of pregnant with nonpregnant samples from Gulf (S. scovelli) and dusky (S. floridae) pipefish. A core set of protein-coding genes, including multiple members of astacin metalloprotease and c-type lectin gene families, is consistent between species in both the direction and magnitude of expression bias. As predicted, coding DNA sequence analysis of these putative “male pregnancy proteins” suggests rapid evolution relative to nondifferentially expressed genes and reflects signatures of adaptation similar in magnitude to those reported from Drosophila male accessory gland proteins. Although the precise drivers of male pregnancy protein divergence remain unknown, we argue that the male pregnancy transcriptome in syngnathid fishes, a clade diverse with respect to brooding morphology and mating system, represents a unique and promising object of study for understanding the perplexing evolutionary nature of reproductive molecules. PMID:24324861
Barrero, Roberto A; Guerrero, Felix D; Black, Michael; McCooke, John; Chapman, Brett; Schilkey, Faye; Pérez de León, Adalberto A; Miller, Robert J; Bruns, Sara; Dobry, Jason; Mikhaylenko, Galina; Stormo, Keith; Bell, Callum; Tao, Quanzhou; Bogden, Robert; Moolhuijzen, Paula M; Hunter, Adam; Bellgard, Matthew I
2017-08-01
The genome of the cattle tick Rhipicephalus microplus, an ectoparasite with global distribution, is estimated to be 7.1Gbp in length and consists of approximately 70% repetitive DNA. We report the draft assembly of a tick genome that utilized a hybrid sequencing and assembly approach to capture the repetitive fractions of the genome. Our hybrid approach produced an assembly consisting of 2.0Gbp represented in 195,170 scaffolds with a N50 of 60,284bp. The Rmi v2.0 assembly is 51.46% repetitive with a large fraction of unclassified repeats, short interspersed elements, long interspersed elements and long terminal repeats. We identified 38,827 putative R. microplus gene loci, of which 24,758 were protein coding genes (≥100 amino acids). OrthoMCL comparative analysis against 11 selected species including insects and vertebrates identified 10,835 and 3,423 protein coding gene loci that are unique to R. microplus or common to both R. microplus and Ixodes scapularis ticks, respectively. We identified 191 microRNA loci, of which 168 have similarity to known miRNAs and 23 represent novel miRNA families. We identified the genomic loci of several highly divergent R. microplus esterases with sequence similarity to acetylcholinesterase. Additionally we report the finding of a novel cytochrome P450 CYP41 homolog that shows similar protein folding structures to known CYP41 proteins known to be involved in acaricide resistance. Copyright © 2017 Australian Society for Parasitology. Published by Elsevier Ltd. All rights reserved.
Kirkness, Ewen F; Haas, Brian J; Sun, Weilin; Braig, Henk R; Perotti, M Alejandra; Clark, John M; Lee, Si Hyeock; Robertson, Hugh M; Kennedy, Ryan C; Elhaik, Eran; Gerlach, Daniel; Kriventseva, Evgenia V; Elsik, Christine G; Graur, Dan; Hill, Catherine A; Veenstra, Jan A; Walenz, Brian; Tubío, José Manuel C; Ribeiro, José M C; Rozas, Julio; Johnston, J Spencer; Reese, Justin T; Popadic, Aleksandar; Tojo, Marta; Raoult, Didier; Reed, David L; Tomoyasu, Yoshinori; Kraus, Emily; Krause, Emily; Mittapalli, Omprakash; Margam, Venu M; Li, Hong-Mei; Meyer, Jason M; Johnson, Reed M; Romero-Severson, Jeanne; Vanzee, Janice Pagel; Alvarez-Ponce, David; Vieira, Filipe G; Aguadé, Montserrat; Guirao-Rico, Sara; Anzola, Juan M; Yoon, Kyong S; Strycharz, Joseph P; Unger, Maria F; Christley, Scott; Lobo, Neil F; Seufferheld, Manfredo J; Wang, Naikuan; Dasch, Gregory A; Struchiner, Claudio J; Madey, Greg; Hannick, Linda I; Bidwell, Shelby; Joardar, Vinita; Caler, Elisabet; Shao, Renfu; Barker, Stephen C; Cameron, Stephen; Bruggner, Robert V; Regier, Allison; Johnson, Justin; Viswanathan, Lakshmi; Utterback, Terry R; Sutton, Granger G; Lawson, Daniel; Waterhouse, Robert M; Venter, J Craig; Strausberg, Robert L; Berenbaum, May R; Collins, Frank H; Zdobnov, Evgeny M; Pittendrigh, Barry R
2010-07-06
As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
Kirkness, Ewen F.; Haas, Brian J.; Sun, Weilin; Braig, Henk R.; Perotti, M. Alejandra; Clark, John M.; Lee, Si Hyeock; Robertson, Hugh M.; Kennedy, Ryan C.; Elhaik, Eran; Gerlach, Daniel; Kriventseva, Evgenia V.; Elsik, Christine G.; Graur, Dan; Hill, Catherine A.; Veenstra, Jan A.; Walenz, Brian; Tubío, José Manuel C.; Ribeiro, José M. C.; Rozas, Julio; Johnston, J. Spencer; Reese, Justin T.; Popadic, Aleksandar; Tojo, Marta; Raoult, Didier; Reed, David L.; Tomoyasu, Yoshinori; Kraus, Emily; Mittapalli, Omprakash; Margam, Venu M.; Li, Hong-Mei; Meyer, Jason M.; Johnson, Reed M.; Romero-Severson, Jeanne; VanZee, Janice Pagel; Alvarez-Ponce, David; Vieira, Filipe G.; Aguadé, Montserrat; Guirao-Rico, Sara; Anzola, Juan M.; Yoon, Kyong S.; Strycharz, Joseph P.; Unger, Maria F.; Christley, Scott; Lobo, Neil F.; Seufferheld, Manfredo J.; Wang, NaiKuan; Dasch, Gregory A.; Struchiner, Claudio J.; Madey, Greg; Hannick, Linda I.; Bidwell, Shelby; Joardar, Vinita; Caler, Elisabet; Shao, Renfu; Barker, Stephen C.; Cameron, Stephen; Bruggner, Robert V.; Regier, Allison; Johnson, Justin; Viswanathan, Lakshmi; Utterback, Terry R.; Sutton, Granger G.; Lawson, Daniel; Waterhouse, Robert M.; Venter, J. Craig; Strausberg, Robert L.; Collins, Frank H.; Zdobnov, Evgeny M.; Pittendrigh, Barry R.
2010-01-01
As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens. PMID:20566863
Aslam, Luqman; Beal, Kathryn; Ann Blomberg, Le; Bouffard, Pascal; Burt, David W.; Crasta, Oswald; Crooijmans, Richard P. M. A.; Cooper, Kristal; Coulombe, Roger A.; De, Supriyo; Delany, Mary E.; Dodgson, Jerry B.; Dong, Jennifer J.; Evans, Clive; Frederickson, Karin M.; Flicek, Paul; Florea, Liliana; Folkerts, Otto; Groenen, Martien A. M.; Harkins, Tim T.; Herrero, Javier; Hoffmann, Steve; Megens, Hendrik-Jan; Jiang, Andrew; de Jong, Pieter; Kaiser, Pete; Kim, Heebal; Kim, Kyu-Won; Kim, Sungwon; Langenberger, David; Lee, Mi-Kyung; Lee, Taeheon; Mane, Shrinivasrao; Marcais, Guillaume; Marz, Manja; McElroy, Audrey P.; Modise, Thero; Nefedov, Mikhail; Notredame, Cédric; Paton, Ian R.; Payne, William S.; Pertea, Geo; Prickett, Dennis; Puiu, Daniela; Qioa, Dan; Raineri, Emanuele; Ruffier, Magali; Salzberg, Steven L.; Schatz, Michael C.; Scheuring, Chantel; Schmidt, Carl J.; Schroeder, Steven; Searle, Stephen M. J.; Smith, Edward J.; Smith, Jacqueline; Sonstegard, Tad S.; Stadler, Peter F.; Tafer, Hakim; Tu, Zhijian (Jake); Van Tassell, Curtis P.; Vilella, Albert J.; Williams, Kelly P.; Yorke, James A.; Zhang, Liqing; Zhang, Hong-Bin; Zhang, Xiaojun; Zhang, Yang; Reed, Kent M.
2010-01-01
A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest. PMID:20838655
Lie, Kai K; Tørresen, Ole K; Solbakken, Monica Hongrø; Rønnestad, Ivar; Tooming-Klunderud, Ave; Nederbragt, Alexander J; Jentoft, Sissel; Sæle, Øystein
2018-03-06
The ballan wrasse (Labrus bergylta) belongs to a large teleost family containing more than 600 species showing several unique evolutionary traits such as lack of stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long and short read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found a declining expression gradient from the anterior to the posterior, and a distinct expression profile in the hind gut. We showed gene loss has occurred for all known genes related to stomach function in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show appetite control in ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin like role. The wrasse genome offers novel insight in to the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.
The Sequence and Analysis of Duplication Rich Human Chromosome 16
DOE R&D Accomplishments Database
Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.
2004-01-01
We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.
Ferree, Patrick M.; Fang, Christopher; Mastrodimos, Mariah; Hay, Bruce A.; Amrhein, Henry; Akbari, Omar S.
2015-01-01
The jewel wasp Nasonia vitripennis is a rising model organism for the study of haplo-diploid reproduction characteristic of hymenopteran insects, which include all wasps, bees, and ants. We performed transcriptional profiling of the ovary, the female soma, and the male soma of N. vitripennis to complement a previously existing transcriptome of the wasp testis. These data were deposited into an open-access genome browser for visualization of transcripts relative to their gene models. We used these data to identify the assemblies of genes uniquely expressed in the germ-line tissues. We found that 156 protein-coding genes are expressed exclusively in the wasp testis compared with only 22 in the ovary. Of the testis-specific genes, eight are candidates for male-specific DNA packaging proteins known as protamines. We found very similar expression patterns of centrosome associated genes in the testis and ovary, arguing that de novo centrosome formation, a key process for development of unfertilized eggs into males, likely does not rely on large-scale transcriptional differences between these tissues. In contrast, a number of meiosis-related genes show a bias toward testis-specific expression, despite the lack of true meiosis in N. vitripennis males. These patterns may reflect an unexpected complexity of male gamete production in the haploid males of this organism. Broadly, these data add to the growing number of genomic and genetic tools available in N. vitripennis for addressing important biological questions in this rising insect model organism. PMID:26464360
The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system.
Vonk, Freek J; Casewell, Nicholas R; Henkel, Christiaan V; Heimberg, Alysha M; Jansen, Hans J; McCleary, Ryan J R; Kerkkamp, Harald M E; Vos, Rutger A; Guerreiro, Isabel; Calvete, Juan J; Wüster, Wolfgang; Woods, Anthony E; Logan, Jessica M; Harrison, Robert A; Castoe, Todd A; de Koning, A P Jason; Pollock, David D; Yandell, Mark; Calderon, Diego; Renjifo, Camila; Currier, Rachel B; Salgado, David; Pla, Davinia; Sanz, Libia; Hyder, Asad S; Ribeiro, José M C; Arntzen, Jan W; van den Thillart, Guido E E J M; Boetzer, Marten; Pirovano, Walter; Dirks, Ron P; Spaink, Herman P; Duboule, Denis; McGlinn, Edwina; Kini, R Manjunatha; Richardson, Michael K
2013-12-17
Snakes are limbless predators, and many species use venom to help overpower relatively large, agile prey. Snake venoms are complex protein mixtures encoded by several multilocus gene families that function synergistically to cause incapacitation. To examine venom evolution, we sequenced and interrogated the genome of a venomous snake, the king cobra (Ophiophagus hannah), and compared it, together with our unique transcriptome, microRNA, and proteome datasets from this species, with data from other vertebrates. In contrast to the platypus, the only other venomous vertebrate with a sequenced genome, we find that snake toxin genes evolve through several distinct co-option mechanisms and exhibit surprisingly variable levels of gene duplication and directional selection that correlate with their functional importance in prey capture. The enigmatic accessory venom gland shows a very different pattern of toxin gene expression from the main venom gland and seems to have recruited toxin-like lectin genes repeatedly for new nontoxic functions. In addition, tissue-specific microRNA analyses suggested the co-option of core genetic regulatory components of the venom secretory system from a pancreatic origin. Although the king cobra is limbless, we recovered coding sequences for all Hox genes involved in amniote limb development, with the exception of Hoxd12. Our results provide a unique view of the origin and evolution of snake venom and reveal multiple genome-level adaptive responses to natural selection in this complex biological weapon system. More generally, they provide insight into mechanisms of protein evolution under strong selection.
Circular RNAs: Unexpected outputs of many protein-coding genes
Wilusz, Jeremy E.
2017-01-01
ABSTRACT Pre-mRNAs from thousands of eukaryotic genes can be non-canonically spliced to generate circular RNAs, some of which accumulate to higher levels than their associated linear mRNA. Recent work has revealed widespread mechanisms that dictate whether the spliceosome generates a linear or circular RNA. For most genes, circular RNA biogenesis via backsplicing is far less efficient than canonical splicing, but circular RNAs can accumulate due to their long half-lives. Backsplicing is often initiated when complementary sequences from different introns base pair and bring the intervening splice sites close together. This process is further regulated by the combinatorial action of RNA binding proteins, which allow circular RNAs to be expressed in unique patterns. Some genes do not require complementary sequences to generate RNA circles and instead take advantage of exon skipping events. It is still unclear what most mature circular RNAs do, but future investigations into their functions will be facilitated by recently described methods to modulate circular RNA levels. PMID:27571848
The Complete Plastome Sequence of an Antarctic Bryophyte Sanionia uncinata (Hedw.) Loeske
Park, Mira; Park, Hyun; Lee, Hyoungseok; Lee, Byeong-ha
2018-01-01
Organellar genomes of bryophytes are poorly represented with chloroplast genomes of only four mosses, four liverworts and two hornworts having been sequenced and annotated. Moreover, while Antarctic vegetation is dominated by the bryophytes, there are few reports on the plastid genomes for the Antarctic bryophytes. Sanionia uncinata (Hedw.) Loeske is one of the most dominant moss species in the maritime Antarctic. It has been researched as an important marker for ecological studies and as an extremophile plant for studies on stress tolerance. Here, we report the complete plastome sequence of S. uncinata, which can be exploited in comparative studies to identify the lineage-specific divergence across different species. The complete plastome of S. uncinata is 124,374 bp in length with a typical quadripartite structure of 114 unique genes including 82 unique protein-coding genes, 37 tRNA genes and four rRNA genes. However, two genes encoding the α subunit of RNA polymerase (rpoA) and encoding the cytochrome b6/f complex subunit VIII (petN) were absent. We could identify nuclear genes homologous to those genes, which suggests that rpoA and petN might have been relocated from the chloroplast genome to the nuclear genome. PMID:29494552
Maize GO annotation—methods, evaluation, and review (maize-GAMER)
USDA-ARS?s Scientific Manuscript database
We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...
Plant Proteins Are Smaller Because They Are Encoded by Fewer Exons than Animal Proteins.
Ramírez-Sánchez, Obed; Pérez-Rodríguez, Paulino; Delaye, Luis; Tiessen, Axel
2016-12-01
Protein size is an important biochemical feature since longer proteins can harbor more domains and therefore can display more biological functionalities than shorter proteins. We found remarkable differences in protein length, exon structure, and domain count among different phylogenetic lineages. While eukaryotic proteins have an average size of 472 amino acid residues (aa), average protein sizes in plant genomes are smaller than those of animals and fungi. Proteins unique to plants are ∼81aa shorter than plant proteins conserved among other eukaryotic lineages. The smaller average size of plant proteins could neither be explained by endosymbiosis nor subcellular compartmentation nor exon size, but rather due to exon number. Metazoan proteins are encoded on average by ∼10 exons of small size [∼176 nucleotides (nt)]. Streptophyta have on average only ∼5.7 exons of medium size (∼230nt). Multicellular species code for large proteins by increasing the exon number, while most unicellular organisms employ rather larger exons (>400nt). Among subcellular compartments, membrane proteins are the largest (∼520aa), whereas the smallest proteins correspond to the gene ontology group of ribosome (∼240aa). Plant genes are encoded by half the number of exons and also contain fewer domains than animal proteins on average. Interestingly, endosymbiotic proteins that migrated to the plant nucleus became larger than their cyanobacterial orthologs. We thus conclude that plants have proteins larger than bacteria but smaller than animals or fungi. Compared to the average of eukaryotic species, plants have ∼34% more but ∼20% smaller proteins. This suggests that photosynthetic organisms are unique and deserve therefore special attention with regard to the evolutionary forces acting on their genomes and proteomes. Copyright © 2016 The Authors. Production and hosting by Elsevier Ltd.. All rights reserved.
Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H
2017-12-01
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae , Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
Complete plastid genome of Astragalus mongholicus var. nakaianus (Fabaceae).
Choi, In-Su; Kim, Joo-Hwan; Choi, Byoung-Hee
2016-07-01
The first complete plastid genome (plastome) of the largest angiosperm genus, Astragalus, was sequenced for the Korean endangered endemic species A. mongholicus var. nakaianus. Its genome is relatively short (123,633 bp) because it lacks an Inverted Repeat (IR) region. It comprises 110 genes, including four unique rRNAs, 30 tRNAs, and 76 protein-coding genes. Similar to other closely related plastomes, rpl22 and rps16 are absent. The putative pseudogene with abnormal stop codons is atpE. This plastome has no additional inversions when compared with highly variable plastomes from IRLC tribes Fabeae and Trifolieae. Our phylogenetic analysis confirms the non-monophyly of Galegeae.
The virophage as a unique parasite of the giant mimivirus.
La Scola, Bernard; Desnues, Christelle; Pagnier, Isabelle; Robert, Catherine; Barrassi, Lina; Fournous, Ghislain; Merchat, Michèle; Suzan-Monti, Marie; Forterre, Patrick; Koonin, Eugene; Raoult, Didier
2008-09-04
Viruses are obligate parasites of Eukarya, Archaea and Bacteria. Acanthamoeba polyphaga mimivirus (APMV) is the largest known virus; it grows only in amoeba and is visible under the optical microscope. Mimivirus possesses a 1,185-kilobase double-stranded linear chromosome whose coding capacity is greater than that of numerous bacteria and archaea1, 2, 3. Here we describe an icosahedral small virus, Sputnik, 50 nm in size, found associated with a new strain of APMV. Sputnik cannot multiply in Acanthamoeba castellanii but grows rapidly, after an eclipse phase, in the giant virus factory found in amoebae co-infected with APMV4. Sputnik growth is deleterious to APMV and results in the production of abortive forms and abnormal capsid assembly of the host virus. The Sputnik genome is an 18.343-kilobase circular double-stranded DNA and contains genes that are linked to viruses infecting each of the three domains of life Eukarya, Archaea and Bacteria. Of the 21 predicted protein-coding genes, eight encode proteins with detectable homologues, including three proteins apparently derived from APMV, a homologue of an archaeal virus integrase, a predicted primase-helicase, a packaging ATPase with homologues in bacteriophages and eukaryotic viruses, a distant homologue of bacterial insertion sequence transposase DNA-binding subunit, and a Zn-ribbon protein. The closest homologues of the last four of these proteins were detected in the Global Ocean Survey environmental data set5, suggesting that Sputnik represents a currently unknown family of viruses. Considering its functional analogy with bacteriophages, we classify this virus as a virophage. The virophage could be a vehicle mediating lateral gene transfer between giant viruses.
Singhal, Dinesh K; Singhal, Raxita; Malik, Hruda N; Kumar, Surender; Kumar, Sudarshan; Mohanty, Ashok K; Kaushik, Jai K; Malakar, Dhruba
2014-01-01
Nanog is a homeodomain containing protein which plays important roles in regulation of signaling pathways for maintenance and induction of pluripotency in stem cells. Because of its unique expression in stem cells it is also regarded as pluripotency marker. In this study goat Nanog (gNanog) gene has been amplified, cloned and characterized at sequence level with successful over-expression in CHO-K1 cell line using a lentiviral based system. gNanog ORF is 903 bp long which codes for Nanog protein of size 300 amino acids (aas). Complete nucleotide sequence shows some evolutionary mutation in goat in comparision to other species. Protein sequence of goat is highly similar to other species. Overall, gNanog nucleotide sequence and predicted protein sequence showed high similarity and minimum divergence with cattle (96 % identity/4 % divergence) and buffalo (94/5 %) while low similarity and high divergence with pig (84/15 %), human (81/23 %) and mouse (69/40 %) indicating evolutionary closeness of gNanog to cattle and buffalo. gNanog lentiviral expression construct was prepared for over-expression of Nanog gene in adult goat fibroblast cells. Lentiviral expression construct of Nanog enabled continuous protein expression for induction and maintenance of pluripotency. Western blotting revealed the expression of Nanog gene at protein level which supported that the lentiviral expression system is highly promising for Nanog protein expression in differentiated goat cell.
Dubey, Bhawna; Meganathan, P R; Haque, Ikramul
2012-07-01
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp length comprising of 37 genes including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes along with duplicate control regions is described herein. The P. molurus molurus mt. genome is relatively similar to other snake mt. genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A-C % than T-G% on the positive strand as revealed by positive AT and CG skews. Comparison of individual protein coding genes, with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt. genome of P. molurus. Also, the synonymous and non-synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
Hashimoto, Takuma; Horikawa, Daiki D.; Saito, Yuki; Kuwahara, Hirokazu; Kozuka-Hata, Hiroko; Shin-I, Tadasu; Minakuchi, Yohei; Ohishi, Kazuko; Motoyama, Ayuko; Aizu, Tomoyuki; Enomoto, Atsushi; Kondo, Koyuki; Tanaka, Sae; Hara, Yuichiro; Koshikawa, Shigeyuki; Sagara, Hiroshi; Miura, Toru; Yokobori, Shin-ichi; Miyagawa, Kiyoshi; Suzuki, Yutaka; Kubo, Takeo; Oyama, Masaaki; Kohara, Yuji; Fujiyama, Asao; Arakawa, Kazuharu; Katayama, Toshiaki; Toyoda, Atsushi; Kunieda, Takekazu
2016-01-01
Tardigrades, also known as water bears, are small aquatic animals. Some tardigrade species tolerate almost complete dehydration and exhibit extraordinary tolerance to various physical extremes in the dehydrated state. Here we determine a high-quality genome sequence of Ramazzottius varieornatus, one of the most stress-tolerant tardigrade species. Precise gene repertoire analyses reveal the presence of a small proportion (1.2% or less) of putative foreign genes, loss of gene pathways that promote stress damage, expansion of gene families related to ameliorating damage, and evolution and high expression of novel tardigrade-unique proteins. Minor changes in the gene expression profiles during dehydration and rehydration suggest constitutive expression of tolerance-related genes. Using human cultured cells, we demonstrate that a tardigrade-unique DNA-associating protein suppresses X-ray-induced DNA damage by ∼40% and improves radiotolerance. These findings indicate the relevance of tardigrade-unique proteins to tolerability and tardigrades could be a bountiful source of new protection genes and mechanisms. PMID:27649274
Hashimoto, Takuma; Horikawa, Daiki D; Saito, Yuki; Kuwahara, Hirokazu; Kozuka-Hata, Hiroko; Shin-I, Tadasu; Minakuchi, Yohei; Ohishi, Kazuko; Motoyama, Ayuko; Aizu, Tomoyuki; Enomoto, Atsushi; Kondo, Koyuki; Tanaka, Sae; Hara, Yuichiro; Koshikawa, Shigeyuki; Sagara, Hiroshi; Miura, Toru; Yokobori, Shin-Ichi; Miyagawa, Kiyoshi; Suzuki, Yutaka; Kubo, Takeo; Oyama, Masaaki; Kohara, Yuji; Fujiyama, Asao; Arakawa, Kazuharu; Katayama, Toshiaki; Toyoda, Atsushi; Kunieda, Takekazu
2016-09-20
Tardigrades, also known as water bears, are small aquatic animals. Some tardigrade species tolerate almost complete dehydration and exhibit extraordinary tolerance to various physical extremes in the dehydrated state. Here we determine a high-quality genome sequence of Ramazzottius varieornatus, one of the most stress-tolerant tardigrade species. Precise gene repertoire analyses reveal the presence of a small proportion (1.2% or less) of putative foreign genes, loss of gene pathways that promote stress damage, expansion of gene families related to ameliorating damage, and evolution and high expression of novel tardigrade-unique proteins. Minor changes in the gene expression profiles during dehydration and rehydration suggest constitutive expression of tolerance-related genes. Using human cultured cells, we demonstrate that a tardigrade-unique DNA-associating protein suppresses X-ray-induced DNA damage by ∼40% and improves radiotolerance. These findings indicate the relevance of tardigrade-unique proteins to tolerability and tardigrades could be a bountiful source of new protection genes and mechanisms.
Conserved syntenic clusters of protein coding genes are missing in birds.
Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V
2014-01-01
Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
Durham, Bryndan P.; Grote, Jana; Whittaker, Kerry A.; Bender, Sara J.; Luo, Haiwei; Grim, Sharon L.; Brown, Julia M.; Casey, John R.; Dron, Antony; Florez-Leiva, Lennin; Krupke, Andreas; Luria, Catherine M.; Mine, Aric H.; Nigro, Olivia D.; Pather, Santhiska; Talarmin, Agathe; Wear, Emma K.; Weber, Thomas S.; Wilson, Jesse M.; Church, Matthew J.; DeLong, Edward F.; Karl, David M.; Steward, Grieg F.; Eppley, John M.; Kyrpides, Nikos C.; Schuster, Stephan; Rappé, Michael S.
2014-01-01
Strain HIMB11 is a planktonic marine bacterium isolated from coastal seawater in Kaneohe Bay, Oahu, Hawaii belonging to the ubiquitous and versatile Roseobacter clade of the alphaproteobacterial family Rhodobacteraceae. Here we describe the preliminary characteristics of strain HIMB11, including annotation of the draft genome sequence and comparative genomic analysis with other members of the Roseobacter lineage. The 3,098,747 bp draft genome is arranged in 34 contigs and contains 3,183 protein-coding genes and 54 RNA genes. Phylogenomic and 16S rRNA gene analyses indicate that HIMB11 represents a unique sublineage within the Roseobacter clade. Comparison with other publicly available genome sequences from members of the Roseobacter lineage reveals that strain HIMB11 has the genomic potential to utilize a wide variety of energy sources (e.g. organic matter, reduced inorganic sulfur, light, carbon monoxide), while possessing a reduced number of substrate transporters. PMID:25197450
Durham, Bryndan P; Grote, Jana; Whittaker, Kerry A; Bender, Sara J; Luo, Haiwei; Grim, Sharon L; Brown, Julia M; Casey, John R; Dron, Antony; Florez-Leiva, Lennin; Krupke, Andreas; Luria, Catherine M; Mine, Aric H; Nigro, Olivia D; Pather, Santhiska; Talarmin, Agathe; Wear, Emma K; Weber, Thomas S; Wilson, Jesse M; Church, Matthew J; DeLong, Edward F; Karl, David M; Steward, Grieg F; Eppley, John M; Kyrpides, Nikos C; Schuster, Stephan; Rappé, Michael S
2014-06-15
Strain HIMB11 is a planktonic marine bacterium isolated from coastal seawater in Kaneohe Bay, Oahu, Hawaii belonging to the ubiquitous and versatile Roseobacter clade of the alphaproteobacterial family Rhodobacteraceae. Here we describe the preliminary characteristics of strain HIMB11, including annotation of the draft genome sequence and comparative genomic analysis with other members of the Roseobacter lineage. The 3,098,747 bp draft genome is arranged in 34 contigs and contains 3,183 protein-coding genes and 54 RNA genes. Phylogenomic and 16S rRNA gene analyses indicate that HIMB11 represents a unique sublineage within the Roseobacter clade. Comparison with other publicly available genome sequences from members of the Roseobacter lineage reveals that strain HIMB11 has the genomic potential to utilize a wide variety of energy sources (e.g. organic matter, reduced inorganic sulfur, light, carbon monoxide), while possessing a reduced number of substrate transporters.
Computational Tools and Algorithms for Designing Customized Synthetic Genes
Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris
2014-01-01
Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations. PMID:25340050
CMCpy: Genetic Code-Message Coevolution Models in Python
Becich, Peter J.; Stark, Brian P.; Bhat, Harish S.; Ardell, David H.
2013-01-01
Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes (“messages”). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC models are not currently available. To meet this need we present CMCpy, an object-oriented Python API and command-line executable front-end that can reproduce all published results of CMC models. CMCpy implements multiple solvers for leading eigenpairs of quasispecies models. We also present novel analytical results that extend and generalize applications of perturbation theory to quasispecies models and pioneer the application of a homotopy method for quasispecies with non-unique maximally fit genotypes. Our results therefore facilitate the computational and analytical study of a variety of evolutionary systems. CMCpy is free open-source software available from http://pypi.python.org/pypi/CMCpy/. PMID:23532367
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu
2011-01-01
In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and the expression levels of their isoforms for human known genes, the conservation and disease association of long noncoding RNAs (ncRNAs) with two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed between brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specially expressed in brain and cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform and some genes only generated protein-coding (or noncoding) RNAs in one sample but not in another. We found 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, and most of those long ncRNAs contain conserved elements across either 46 vertebrates or 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs differentially expressed in human breast cancer or lung cancer, several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, those validated differentially expressed long ncRNAs were found significantly correlated with certain breast cancer or lung cancer related genes, indicating the important biological relevance between long ncRNAs and human cancers. Our findings reveal that the differences of gene expression profile between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level for completely illustrating the intricate transcriptome.
Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.
2013-01-01
Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite the wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle against non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.
Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis
2014-12-01
Proteogenomics involves the use of MS to refine annotation of protein-coding genes and discover genes in a genome. We carried out comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with a motive to improve annotation for methylotrophs; organisms capable of surviving in reduced carbon compounds such as methanol. Besides identifying 2482(50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in ME-AM1 genome. One such novel gene is identified with 75 peptides, lacks homolog in other methylobacteria but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and low GC-content of few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy related genes were also improved by the discovery of a short gene in methylotrophy gene island and redefining a gene important for pyrroquinoline quinone synthesis, essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in model methylotroph ME-AM1 genome. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The genome of Eucalyptus grandis.
Myburg, Alexander A; Grattapaglia, Dario; Tuskan, Gerald A; Hellsten, Uffe; Hayes, Richard D; Grimwood, Jane; Jenkins, Jerry; Lindquist, Erika; Tice, Hope; Bauer, Diane; Goodstein, David M; Dubchak, Inna; Poliakov, Alexandre; Mizrachi, Eshchar; Kullan, Anand R K; Hussey, Steven G; Pinard, Desre; van der Merwe, Karen; Singh, Pooja; van Jaarsveld, Ida; Silva-Junior, Orzenil B; Togawa, Roberto C; Pappas, Marilia R; Faria, Danielle A; Sansaloni, Carolina P; Petroli, Cesar D; Yang, Xiaohan; Ranjan, Priya; Tschaplinski, Timothy J; Ye, Chu-Yu; Li, Ting; Sterck, Lieven; Vanneste, Kevin; Murat, Florent; Soler, Marçal; Clemente, Hélène San; Saidi, Naijib; Cassan-Wang, Hua; Dunand, Christophe; Hefer, Charles A; Bornberg-Bauer, Erich; Kersting, Anna R; Vining, Kelly; Amarasinghe, Vindhya; Ranik, Martin; Naithani, Sushma; Elser, Justin; Boyd, Alexander E; Liston, Aaron; Spatafora, Joseph W; Dharmwardhana, Palitha; Raja, Rajani; Sullivan, Christopher; Romanel, Elisson; Alves-Ferreira, Marcio; Külheim, Carsten; Foley, William; Carocha, Victor; Paiva, Jorge; Kudrna, David; Brommonschenkel, Sergio H; Pasquali, Giancarlo; Byrne, Margaret; Rigault, Philippe; Tibbits, Josquin; Spokevicius, Antanas; Jones, Rebecca C; Steane, Dorothy A; Vaillancourt, René E; Potts, Brad M; Joubert, Fourie; Barry, Kerrie; Pappas, Georgios J; Strauss, Steven H; Jaiswal, Pankaj; Grima-Pettenati, Jacqueline; Salse, Jérôme; Van de Peer, Yves; Rokhsar, Daniel S; Schmutz, Jeremy
2014-06-19
Eucalypts are the world's most widely planted hardwood trees. Their outstanding diversity, adaptability and growth have made them a global renewable resource of fibre and energy. We sequenced and assembled >94% of the 640-megabase genome of Eucalyptus grandis. Of 36,376 predicted protein-coding genes, 34% occur in tandem duplications, the largest proportion thus far in plant genomes. Eucalyptus also shows the highest diversity of genes for specialized metabolites such as terpenes that act as chemical defence and provide unique pharmaceutical oils. Genome sequencing of the E. grandis sister species E. globulus and a set of inbred E. grandis tree genomes reveals dynamic genome evolution and hotspots of inbreeding depression. The E. grandis genome is the first reference for the eudicot order Myrtales and is placed here sister to the eurosids. This resource expands our understanding of the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.
Hart, Andrew; Cortés, María Paz; Latorre, Mauricio; Martinez, Servet
2018-01-01
The analysis of codon usage bias has been widely used to characterize different communities of microorganisms. In this context, the aim of this work was to study the codon usage bias in a natural consortium of five acidophilic bacteria used for biomining. The codon usage bias of the consortium was contrasted with genes from an alternative collection of acidophilic reference strains and metagenome samples. Results indicate that acidophilic bacteria preferentially have low codon usage bias, consistent with both their capacity to live in a wide range of habitats and their slow growth rate, a characteristic probably acquired independently from their phylogenetic relationships. In addition, the analysis showed significant differences in the unique sets of genes from the autotrophic species of the consortium in relation to other acidophilic organisms, principally in genes which code for proteins involved in metal and oxidative stress resistance. The lower values of codon usage bias obtained in this unique set of genes suggest higher transcriptional adaptation to living in extreme conditions, which was probably acquired as a measure for resisting the elevated metal conditions present in the mine.
Reeve, Wayne; van Berkum, Peter; Ardley, Julie; ...
2017-03-04
Bradyrhizobium elkanii USDA 76 T (INSCD = ARAG00000000), the type strain for Bradyrhizobium elkanii, is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Glycine max (L. Merr) grown in the USA. Because of its significance as a microsymbiont of this economically important legume, B. elkanii USDA 76 T was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria sequencing project. Here the symbiotic abilities of B. elkanii USDA 76 T are described, together with its genome sequence information and annotation. The 9,484,767 bpmore » high-quality draft genome is arranged in 2 scaffolds of 25 contigs, containing 9060 protein-coding genes and 91 RNA-only encoding genes. The B. elkanii USDA 76 T genome contains a low GC content region with symbiotic nod and fix genes, indicating the presence of a symbiotic island integration. A comparison of five B. elkanii genomes that formed a clique revealed that 356 of the 9060 protein coding genes of USDA 76 T were unique, including 22 genes of an intact resident prophage. A conserved set of 7556 genes were also identified for this species, including genes encoding a general secretion pathway as well as type II, III, IV and VI secretion system proteins. The type III secretion system has previously been characterized as a host determinant for Rj and/or rj soybean cultivars. Here we show that the USDA 76 T genome contains genes encoding all the type III secretion system components, including a translocon complex protein NopX required for the introduction of effector proteins into host cells. While many bradyrhizobial strains are unable to nodulate the soybean cultivar Clark (rj1), USDA 76 T was able to elicit nodules on Clark (rj1), although in reduced numbers, when plants were grown in Leonard jars containing sand or vermiculite. In these conditions, we postulate that the presence of NopX allows USDA 76 T to introduce various effector molecules into this host to enable nodulation.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reeve, Wayne; van Berkum, Peter; Ardley, Julie
Bradyrhizobium elkanii USDA 76 T (INSCD = ARAG00000000), the type strain for Bradyrhizobium elkanii, is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing root nodule of Glycine max (L. Merr) grown in the USA. Because of its significance as a microsymbiont of this economically important legume, B. elkanii USDA 76 T was selected as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria sequencing project. Here the symbiotic abilities of B. elkanii USDA 76 T are described, together with its genome sequence information and annotation. The 9,484,767 bpmore » high-quality draft genome is arranged in 2 scaffolds of 25 contigs, containing 9060 protein-coding genes and 91 RNA-only encoding genes. The B. elkanii USDA 76 T genome contains a low GC content region with symbiotic nod and fix genes, indicating the presence of a symbiotic island integration. A comparison of five B. elkanii genomes that formed a clique revealed that 356 of the 9060 protein coding genes of USDA 76 T were unique, including 22 genes of an intact resident prophage. A conserved set of 7556 genes were also identified for this species, including genes encoding a general secretion pathway as well as type II, III, IV and VI secretion system proteins. The type III secretion system has previously been characterized as a host determinant for Rj and/or rj soybean cultivars. Here we show that the USDA 76 T genome contains genes encoding all the type III secretion system components, including a translocon complex protein NopX required for the introduction of effector proteins into host cells. While many bradyrhizobial strains are unable to nodulate the soybean cultivar Clark (rj1), USDA 76 T was able to elicit nodules on Clark (rj1), although in reduced numbers, when plants were grown in Leonard jars containing sand or vermiculite. In these conditions, we postulate that the presence of NopX allows USDA 76 T to introduce various effector molecules into this host to enable nodulation.« less
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
2010-12-15
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Yallowitz, Alisha R.; Gong, Ke-Qin; Swinehart, Ilea T.; Nelson, Lisa T.; Wellik, Deneen M.
2009-01-01
Summary Hox genes control many developmental events along the AP axis, but few target genes have been identified. Whether target genes are activated or repressed, what enhancer elements are required for regulation, and how different domains of the Hox proteins contribute to regulatory specificity is poorly understood. Six2 is genetically downstream of both the Hox11 paralogous genes in the developing mammalian kidney and Hoxa2 in branchial arch and facial mesenchyme. Loss-of-function of Hox11 leads to loss of Six2 expression and loss-of-function of Hoxa2 leads to expanded Six2 expression. Herein we demonstrate that a single enhancer site upstream of the Six2 coding sequence is responsible for both activation by Hox11 proteins in the kidney and repression by Hoxa2 in the branchial arch and facial mesenchyme in vivo. DNA binding activity is required for both activation and repression, but differential activity is not controlled by differences in the homeodomains. Rather, protein domains N- and C-terminal to the homeodomain confer activation versus repression activity. These data support a model in which the DNA binding specificity of Hox proteins in vivo may be similar, consistent with accumulated in vitro data, and that unique functions result mainly from differential interactions mediated by non-homeodomain regions of Hox proteins. PMID:19716816
ten Lohuis, Michael R.; Miller, David J.
1998-01-01
In the dinoflagellate Amphidinium carterae, photoadaptation involves changes in the transcription of genes encoding both of the major classes of light-harvesting proteins, the peridinin chlorophyll a proteins (PCPs) and the major a/c-containing intrinsic light-harvesting proteins (LHCs). PCP and LHC transcript levels were increased up to 86- and 6-fold higher, respectively, under low-light conditions relative to cells grown at high illumination. These increases in transcript abundance were accompanied by decreases in the extent of methylation of CpG and CpNpG motifs within or near PCP- and LHC-coding regions. Cytosine methylation levels in A. carterae are therefore nonstatic and may vary with environmental conditions in a manner suggestive of involvement in the regulation of gene expression. However, chemically induced undermethylation was insufficient in activating transcription, because treatment with two methylation inhibitors had no effect on PCP mRNA or protein levels. Regulation of gene activity through changes in DNA methylation has traditionally been assumed to be restricted to higher eukaryotes (deuterostomes and green plants); however, the atypically large genomes of dinoflagellates may have generated the requirement for systems of this type in a relatively “primitive” organism. Dinoflagellates may therefore provide a unique perspective on the evolution of eukaryotic DNA-methylation systems. PMID:9576788
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
Anisha, Shashidharan; Bhasker, Salini; Mohankumar, Chinnamma
2012-03-01
Vechur cow, categorized as a critically maintained breed by the FAO, is a unique breed of Bos indicus due to its extremely small size, less fodder intake, adaptability, easy domestication and traditional medicinal property of the milk. Lactoferrin (Lf) is an iron-binding glycoprotein that is found predominantly in the milk of mammals. The full coding region of Lf gene of Vechur cow was cloned, sequenced and expressed in a prokaryotic system. Antibacterial activity of the recombinant Lf showed suppression of bacterial growth. To the best of our knowledge this is the first time that the full coding region of Lf gene of B. indicus Vechur breed is sequenced, successfully expressed in a prokaryotic system and characterized. Comparative analysis of Lf gene sequence of five Vechur cows with B. taurus revealed 15 SNPs in the exon region associated with 11 amino acid substitutions. The amino acid arginine was noticed as a pronounced substitution and the tertiary structure analysis of the BLfV protein confirmed the positions of arginine in the β sheet region, random coil and helix region 1. Based on the recent reports on the nutritional therapies of arginine supplementation for wound healing and for cardiovascular diseases, the higher level of arginine in the lactoferrin protein of Vechur cow milk provides enormous scope for further therapeutic studies. Copyright © 2011 Elsevier B.V. All rights reserved.
The mitochondrial genome of Priapulus caudatus Lamarck (Priapulida: Priapulidae).
Webster, Bonnie L; Mackenzie-Dodds, Jacqueline A; Telford, Maximilian J; Littlewood, D Timothy J
2007-03-01
We sequenced and annotated the complete mitochondrial (mt) genome of the priapulid Priapulus caudatus in order to provide a source of phylogenetic characters including an assessment of gene order arrangement. The genome was 14,919 bp in its entirety with few, short non-coding regions. A number of protein-coding and tRNA genes overlapped, making the genome relatively compact. The gene order was: cox1, cox2, trnK, trnD, atp8, atp6, cox3, trnG, nad3, trnA, trnR, trnN, rrnS, trnV, rrnL, trnL(yaa), trnL(nag), nad1, -trnS(nga), -cob, -nad6, trnP, -trnT, nad4L, nad4, trnH, nad5, trnF, -trnE, -trnS(nct), trnI, -trnQ, trnM, nad2, trnW, -trnC, -trnY; where '-' indicates genes transcribed on the opposite strand. The gene order, although unique amongst Metazoa, shared the greatest number of gene boundaries and the longest contiguous fragments with the chelicerate Limulus polyphemus. The mt genomes of these taxa differed only by a single inversion of 18 contiguous genes bounded by rrnS and trnS(nct). Other arthropods and nematodes shared fewer gene boundaries but considerably more than the most similar non-ecdysozoan.
Comparative Genomics of the Balsaminaceae Sister Genera Hydrocera triflora and Impatiens pinfanensis
Li, Zhi-Zhong; Saina, Josphat K.; Gichira, Andrew W.; Kyalo, Cornelius M.; Wang, Qing-Feng
2018-01-01
The family Balsaminaceae, which consists of the economically important genus Impatiens and the monotypic genus Hydrocera, lacks a reported or published complete chloroplast genome sequence. Therefore, chloroplast genome sequences of the two sister genera are significant to give insight into the phylogenetic position and understanding the evolution of the Balsaminaceae family among the Ericales. In this study, complete chloroplast (cp) genomes of Impatiens pinfanensis and Hydrocera triflora were characterized and assembled using a high-throughput sequencing method. The complete cp genomes were found to possess the typical quadripartite structure of land plants chloroplast genomes with double-stranded molecules of 154,189 bp (Impatiens pinfanensis) and 152,238 bp (Hydrocera triflora) in length. A total of 115 unique genes were identified in both genomes, of which 80 are protein-coding genes, 31 are distinct transfer RNA (tRNA) and four distinct ribosomal RNA (rRNA). Thirty codons, of which 29 had A/T ending codons, revealed relative synonymous codon usage values of >1, whereas those with G/C ending codons displayed values of <1. The simple sequence repeats comprise mostly the mononucleotide repeats A/T in all examined cp genomes. Phylogenetic analysis based on 51 common protein-coding genes indicated that the Balsaminaceae family formed a lineage with Ebenaceae together with all the other Ericales. PMID:29360746
Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.
2013-01-01
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Conserved small mRNA with an unique, extended Shine-Dalgarno sequence
Hahn, Julia; Migur, Anzhela; von Boeselager, Raphael Freiherr; Kubatova, Nina; Kubareva, Elena; Schwalbe, Harald
2017-01-01
ABSTRACT Up to now, very small protein-coding genes have remained unrecognized in sequenced genomes. We identified an mRNA of 165 nucleotides (nt), which is conserved in Bradyrhizobiaceae and encodes a polypeptide with 14 amino acid residues (aa). The small mRNA harboring a unique Shine-Dalgarno sequence (SD) with a length of 17 nt was localized predominantly in the ribosome-containing P100 fraction of Bradyrhizobium japonicum USDA 110. Strong interaction between the mRNA and 30S ribosomal subunits was demonstrated by their co-sedimentation in sucrose density gradient. Using translational fusions with egfp, we detected weak translation and found that it is impeded by both the extended SD and the GTG start codon (instead of ATG). Biophysical characterization (CD- and NMR-spectroscopy) showed that synthesized polypeptide remained unstructured in physiological puffer. Replacement of the start codon by a stop codon increased the stability of the transcript, strongly suggesting additional posttranscriptional regulation at the ribosome. Therefore, the small gene was named rreB (ribosome-regulated expression in Bradyrhizobiaceae). Assuming that the unique ribosome binding site (RBS) is a hallmark of rreB homologs or similarly regulated genes, we looked for similar putative RBS in bacterial genomes and detected regions with at least 16 nt complementarity to the 3′-end of 16S rRNA upstream of sORFs in Caulobacterales, Rhizobiales, Rhodobacterales and Rhodospirillales. In the Rhodobacter/Roseobacter lineage of α-proteobacteria the corresponding gene (rreR) is conserved and encodes an 18 aa protein. This shows how specific RBS features can be used to identify new genes with presumably similar control of expression at the RNA level. PMID:27834614
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Raju, Hemalatha B; Tsinoremas, Nicholas F; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. While modules of the olfactory receptors were clearly identified in protein-protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches.
Methylation of miRNA genes and oncogenesis.
Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A
2015-02-01
Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes, and, conversely, a number of different miRNAs usually target a structural gene. Epigenetic gene inactivation associated with methylation of promoter CpG-islands is common to both protein-coding genes and miRNA genes. Here, data on functions of miRNAs in development of tumor-cell phenotype are reviewed. Genomic organization of promoter CpG-islands of the miRNA genes located in inter- and intragenic areas is discussed. The literature and our own results on frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data regarding a link between such modification and changed activity of miRNA genes and, consequently, protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenetic processes as well as affected signaling pathways is discussed.
Dimitrieva, Slavica; Anisimova, Maria
2014-01-01
In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore are not subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites were reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with larger number of interactions or forming larger number of structures, located in intracellular components and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
McTavish, H; LaQuier, F; Arciero, D; Logan, M; Mundfrom, G; Fuchs, J A; Hooper, A B
1993-04-01
The genome of Nitrosomonas europaea contains at least three copies each of the genes coding for hydroxylamine oxidoreductase (HAO) and cytochrome c554. A copy of an HAO gene is always located within 2.7 kb of a copy of a cytochrome c554 gene. Cytochrome P-460, a protein that shares very unusual spectral features with HAO, was found to be encoded by a gene separate from the HAO genes.
McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G
2016-05-05
Mammalian chromosomes are comprised of complex chromatin architecture with the specific assembly and configuration of each chromosome influencing gene expression and function in yet undefined ways by varying degrees of heterochromatinization that result in Giemsa (G) negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out morphometric measurements of high-resolution chromosome ideograms for the first time to characterize the total euchromatic and heterochromatic chromosome band length, distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatin and G-positive heterochromatin chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, while the obesity gene distribution pattern did not significantly differ from protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions playing a role in biological processes including neurodevelopment and function, specifically genes associated with ASD.
Identification of high-efficiency 3'GG gRNA motifs in indexed FASTA files with ngg2.
Roberson, Elisha D O
CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3'GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans . Furthermore, they highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a python command-line tool, ngg2, to identify 3'GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae , Caenorhabditis elegans , Drosophila melanogaster , Danio rerio , Mus musculus , and Homo sapiens. I also scanned the genomes of pig ( Sus scrofa ) and African elephant ( Loxodonta africana ) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3'GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3'GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3'GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3'GG editing sites in any species with an available genome sequence.
Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian
2013-01-01
The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP–encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence. PMID:23357949
DOE Office of Scientific and Technical Information (OSTI.GOV)
Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang
The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five of each genome differs between strains of the same species,more » while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25 higher than those between inbred lines and 50 lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence.« less
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2015-01-01
Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. According to this facts analysis of the general properties of the structural organization of the functional sites at the protein level and, at the level of exon-intron structure of the coding gene is still an actual problem. One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency to cluster the exons that code functional sites which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
Ruhlman, Tracey; Lee, Seung-Bum; Jansen, Robert K; Hostetler, Jessica B; Tallon, Luke J; Town, Christopher D; Daniell, Henry
2006-01-01
Background Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. Results The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analysis of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. Conclusion The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements. PMID:16945140
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smialowska, Agata, E-mail: smialowskaa@gmail.com; School of Life Sciences, Södertörn Högskola, Huddinge 141-89; Djupedal, Ingela
Highlights: • Protein coding genes accumulate anti-sense sRNAs in fission yeast S. pombe. • RNAi represses protein-coding genes in S. pombe. • RNAi-mediated gene repression is post-transcriptional. - Abstract: RNA interference (RNAi) is a gene silencing mechanism conserved from fungi to mammals. Small interfering RNAs are products and mediators of the RNAi pathway and act as specificity factors in recruiting effector complexes. The Schizosaccharomyces pombe genome encodes one of each of the core RNAi proteins, Dicer, Argonaute and RNA-dependent RNA polymerase (dcr1, ago1, rdp1). Even though the function of RNAi in heterochromatin assembly in S. pombe is established, its rolemore » in controlling gene expression is elusive. Here, we report the identification of small RNAs mapped anti-sense to protein coding genes in fission yeast. We demonstrate that these genes are up-regulated at the protein level in RNAi mutants, while their mRNA levels are not significantly changed. We show that the repression by RNAi is not a result of heterochromatin formation. Thus, we conclude that RNAi is involved in post-transcriptional gene silencing in S. pombe.« less
Dam, Phuongan; Kataeva, Irina; Yang, Sung-Jae; Zhou, Fengfeng; Yin, Yanbin; Chou, Wenchi; Poole, Farris L.; Westpheling, Janet; Hettich, Robert; Giannone, Richard; Lewis, Derrick L.; Kelly, Robert; Gilbert, Harry J.; Henrissat, Bernard; Xu, Ying; Adams, Michael W. W.
2011-01-01
Caldicellulosiruptor bescii DSM 6725 utilizes various polysaccharides and grows efficiently on untreated high-lignin grasses and hardwood at an optimum temperature of ∼80°C. It is a promising anaerobic bacterium for studying high-temperature biomass conversion. Its genome contains 2666 protein-coding sequences organized into 1209 operons. Expression of 2196 genes (83%) was confirmed experimentally. At least 322 genes appear to have been obtained by lateral gene transfer (LGT). Putative functions were assigned to 364 conserved/hypothetical protein (C/HP) genes. The genome contains 171 and 88 genes related to carbohydrate transport and utilization, respectively. Growth on cellulose led to the up-regulation of 32 carbohydrate-active (CAZy), 61 sugar transport, 25 transcription factor and 234 C/HP genes. Some C/HPs were overproduced on cellulose or xylan, suggesting their involvement in polysaccharide conversion. A unique feature of the genome is enrichment with genes encoding multi-modular, multi-functional CAZy proteins organized into one large cluster, the products of which are proposed to act synergistically on different components of plant cell walls and to aid the ability of C. bescii to convert plant biomass. The high duplication of CAZy domains coupled with the ability to acquire foreign genes by LGT may have allowed the bacterium to rapidly adapt to changing plant biomass-rich environments. PMID:21227922
NASA Astrophysics Data System (ADS)
Gusev, Oleg; Kikawada, Takahiro; Shagimardanova, Elena; Suetsugu, Yoshitaka; Ayupov, Rustam
Origin of anhydrobiosis in the larvae of the sleeping chironomid Polypedilum vanderplanki represents unique example of set of evolutionary events in a single species, resulted in acquiring new ability allowing survival in extremely changeable environment. Complex comparative analysis of the genome of P. vanderplanki resulted in discovery of a set of features, including existence of the set of unique clusters of genes contributing in desiccation resistance. Surprisingly, in several cases, the genes mainly contributing to the formation of the molecular shield in the larvae are sleeping chironomid-specific and have no homology with genes from other insects, including P. nubifer - a chironomid from the same genus. Protein L-isoaspartyl methyltransferase (PIMT) acts on proteins that have been non-enzymatically damaged due to age, and partially restores aspartic residues, extending life of the polypeptides. PIMT a highly conserved enzyme present in nearly all eukaryotes, and microorganisms mostly in a single copy (or in a few isoforms in certain plants and some bacteria). While conducting a comparative analysis of the genomes of two chironomid midge species different in their ability to stand complete water loss, we have noticed that structure and number of PIMT-coding genes in the desiccation resistant (anhydrobiotic) midge (Polypedilum vanderplanki, Pv) is different from those of the common desiccation-sensitive midge (Polypedilum nubifer, Pn) and the rest of insects. Both species have a clear orthologous PIMT shared by all insects. At the same time, in contrast to Pn which has only one PIMT gene (PnPimt-1), the Pv genome contains 12 additional genes paralogous to Pimt1 (PvPimt-2-12) presumably coding functional PIMT proteins, which are arranged in a single cluster. Remarkably, PvPimt-1 location in the Pv is different from the rest of Pimt-like genes. PvPimt-1 gene is ubiquitously expressed during the life cycle, but expression of the PvPimt2-12 is limited to the eggs and larval stages. Finally, the expression of Pimt1 gene in both chironomids was not changed in response to desiccation, while the clustered PvPimt2-12 showed strong up-regulation in response to water loss and other abiotic stresses. The abundance of PvPimt2-12 mRNAs was maximal in anhydrobiotic larvae, and it resembles the case of plant seeds where accumulation of PIMT provides additional protection for proteins during long dry storage. Predicted proteins of PvPimT2-12 contain conservative L-isoaspartyl methyltransferase functional domain. At the same time the length and structure of N- and C- terminals of the predicted proteins show significant variation, suggesting different substrate preferences or other specific properties of different Pv-PIMT Furthermore, the multi-member family in Pv is the first observation of drastic expansion and evolution of Pimt genes in general, and particularly in a single insect species. This work was supported by Russian Foundation for Basic Research (No. 12-08-33157 mol_a_ved and No. 14-04-01657_A).
Solov'ev, V V; Kel', A E; Kolchanov, N A
1989-01-01
The factors, determining the presence of inverted and symmetrical repeats in genes coding for globular proteins, have been analysed. An interesting property of genetical code has been revealed in the analysis of symmetrical repeats: the pairs of symmetrical codons corresponded to pairs of amino acids with mostly similar physical-chemical parameters. This property may explain the presence of symmetrical repeats and palindromes only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physical-chemical properties occupy symmetrical positions. A stochastic model of evolution of polynucleotide sequences has been used for analysis of inverted repeats. The modelling demonstrated that only limiting of sequences (uneven frequencies of used codons) is enough for arising of nonrandom inverted repeats in genes.
The Human Cell Surfaceome of Breast Tumors
da Cunha, Júlia Pinheiro Chagas; Galante, Pedro Alexandre Favoretto; de Souza, Jorge Estefano Santana; Pieprzyk, Martin; Carraro, Dirce Maria; Old, Lloyd J.; Camargo, Anamaria Aranha; de Souza, Sandro José
2013-01-01
Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be at human cell surface. Methods. We used a high-throuput qPCR system for the analysis of 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrates different type, of the datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from, literature. Results. We found that at least 77 genes are overexpressed in breast primary tumors while at least 2 of them have also a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, a comparison was made between the genes found in this report and other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors. PMID:24195083
The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system
Vonk, Freek J.; Casewell, Nicholas R.; Henkel, Christiaan V.; Heimberg, Alysha M.; Jansen, Hans J.; McCleary, Ryan J. R.; Kerkkamp, Harald M. E.; Vos, Rutger A.; Guerreiro, Isabel; Calvete, Juan J.; Wüster, Wolfgang; Woods, Anthony E.; Logan, Jessica M.; Harrison, Robert A.; Castoe, Todd A.; de Koning, A. P. Jason; Pollock, David D.; Yandell, Mark; Calderon, Diego; Renjifo, Camila; Currier, Rachel B.; Salgado, David; Pla, Davinia; Sanz, Libia; Hyder, Asad S.; Ribeiro, José M. C.; Arntzen, Jan W.; van den Thillart, Guido E. E. J. M.; Boetzer, Marten; Pirovano, Walter; Dirks, Ron P.; Spaink, Herman P.; Duboule, Denis; McGlinn, Edwina; Kini, R. Manjunatha; Richardson, Michael K.
2013-01-01
Snakes are limbless predators, and many species use venom to help overpower relatively large, agile prey. Snake venoms are complex protein mixtures encoded by several multilocus gene families that function synergistically to cause incapacitation. To examine venom evolution, we sequenced and interrogated the genome of a venomous snake, the king cobra (Ophiophagus hannah), and compared it, together with our unique transcriptome, microRNA, and proteome datasets from this species, with data from other vertebrates. In contrast to the platypus, the only other venomous vertebrate with a sequenced genome, we find that snake toxin genes evolve through several distinct co-option mechanisms and exhibit surprisingly variable levels of gene duplication and directional selection that correlate with their functional importance in prey capture. The enigmatic accessory venom gland shows a very different pattern of toxin gene expression from the main venom gland and seems to have recruited toxin-like lectin genes repeatedly for new nontoxic functions. In addition, tissue-specific microRNA analyses suggested the co-option of core genetic regulatory components of the venom secretory system from a pancreatic origin. Although the king cobra is limbless, we recovered coding sequences for all Hox genes involved in amniote limb development, with the exception of Hoxd12. Our results provide a unique view of the origin and evolution of snake venom and reveal multiple genome-level adaptive responses to natural selection in this complex biological weapon system. More generally, they provide insight into mechanisms of protein evolution under strong selection. PMID:24297900
Peng, Xinxia; Gralinski, Lisa; Armour, Christopher D; Ferris, Martin T; Thomas, Matthew J; Proll, Sean; Bradel-Tretheway, Birgit G; Korth, Marcus J; Castle, John C; Biery, Matthew C; Bouzek, Heather K; Haynor, David R; Frieman, Matthew B; Heise, Mark; Raymond, Christopher K; Baric, Ralph S; Katze, Michael G
2010-10-26
Studies of the host response to virus infection typically focus on protein-coding genes. However, non-protein-coding RNAs (ncRNAs) are transcribed in mammalian cells, and the roles of many of these ncRNAs remain enigmas. Using next-generation sequencing, we performed a whole-transcriptome analysis of the host response to severe acute respiratory syndrome coronavirus (SARS-CoV) infection across four founder mouse strains of the Collaborative Cross. We observed differential expression of approximately 500 annotated, long ncRNAs and 1,000 nonannotated genomic regions during infection. Moreover, studies of a subset of these ncRNAs and genomic regions showed the following. (i) Most were similarly regulated in response to influenza virus infection. (ii) They had distinctive kinetic expression profiles in type I interferon receptor and STAT1 knockout mice during SARS-CoV infection, including unique signatures of ncRNA expression associated with lethal infection. (iii) Over 40% were similarly regulated in vitro in response to both influenza virus infection and interferon treatment. These findings represent the first discovery of the widespread differential expression of long ncRNAs in response to virus infection and suggest that ncRNAs are involved in regulating the host response, including innate immunity. At the same time, virus infection models provide a unique platform for studying the biology and regulation of ncRNAs.
Peng, Xinxia; Gralinski, Lisa; Armour, Christopher D.; Ferris, Martin T.; Thomas, Matthew J.; Proll, Sean; Bradel-Tretheway, Birgit G.; Korth, Marcus J.; Castle, John C.; Biery, Matthew C.; Bouzek, Heather K.; Haynor, David R.; Frieman, Matthew B.; Heise, Mark; Raymond, Christopher K.; Baric, Ralph S.; Katze, Michael G.
2010-01-01
Studies of the host response to virus infection typically focus on protein-coding genes. However, non-protein-coding RNAs (ncRNAs) are transcribed in mammalian cells, and the roles of many of these ncRNAs remain enigmas. Using next-generation sequencing, we performed a whole-transcriptome analysis of the host response to severe acute respiratory syndrome coronavirus (SARS-CoV) infection across four founder mouse strains of the Collaborative Cross. We observed differential expression of approximately 500 annotated, long ncRNAs and 1,000 nonannotated genomic regions during infection. Moreover, studies of a subset of these ncRNAs and genomic regions showed the following. (i) Most were similarly regulated in response to influenza virus infection. (ii) They had distinctive kinetic expression profiles in type I interferon receptor and STAT1 knockout mice during SARS-CoV infection, including unique signatures of ncRNA expression associated with lethal infection. (iii) Over 40% were similarly regulated in vitro in response to both influenza virus infection and interferon treatment. These findings represent the first discovery of the widespread differential expression of long ncRNAs in response to virus infection and suggest that ncRNAs are involved in regulating the host response, including innate immunity. At the same time, virus infection models provide a unique platform for studying the biology and regulation of ncRNAs. PMID:20978541
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
1996-01-01
Mutations in the Caenorhabditis elegans gene unc-89 result in nematodes having disorganized muscle structure in which thick filaments are not organized into A-bands, and there are no M-lines. Beginning with a partial cDNA from the C. elegans sequencing project, we have cloned and sequenced the unc-89 gene. An unc-89 allele, st515, was found to contain an 84-bp deletion and a 10-bp duplication, resulting in an in- frame stop codon within predicted unc-89 coding sequence. Analysis of the complete coding sequence for unc-89 predicts a novel 6,632 amino acid polypeptide consisting of sequence motifs which have been implicated in protein-protein interactions. UNC-89 begins with 67 residues of unique sequences, SH3, dbl/CDC24, and PH domains, 7 immunoglobulins (Ig) domains, a putative KSP-containing multiphosphorylation domain, and ends with 46 Ig domains. A polyclonal antiserum raised to a portion of unc-89 encoded sequence reacts to a twitchin-sized polypeptide from wild type, but truncated polypeptides from st515 and from the amber allele e2338. By immunofluorescent microscopy, this antiserum localizes to the middle of A-bands, consistent with UNC-89 being a structural component of the M-line. Previous studies indicate that myofilament lattice assembly begins with positional cues laid down in the basement membrane and muscle cell membrane. We propose that the intracellular protein UNC-89 responds to these signals, localizes, and then participates in assembling an M-line. PMID:8603916
Sebaihia, Mohammed; Preston, Andrew; Maskell, Duncan J.; Kuzmiak, Holly; Connell, Terry D.; King, Natalie D.; Orndorff, Paul E.; Miyamoto, David M.; Thomson, Nicholas R.; Harris, David; Goble, Arlette; Lord, Angela; Murphy, Lee; Quail, Michael A.; Rutter, Simon; Squares, Robert; Squares, Steven; Woodward, John; Parkhill, Julian; Temple, Louise M.
2006-01-01
Bordetella avium is a pathogen of poultry and is phylogenetically distinct from Bordetella bronchiseptica, Bordetella pertussis, and Bordetella parapertussis, which are other species in the Bordetella genus that infect mammals. In order to understand the evolutionary relatedness of Bordetella species and further the understanding of pathogenesis, we obtained the complete genome sequence of B. avium strain 197N, a pathogenic strain that has been extensively studied. With 3,732,255 base pairs of DNA and 3,417 predicted coding sequences, it has the smallest genome and gene complement of the sequenced bordetellae. In this study, the presence or absence of previously reported virulence factors from B. avium was confirmed, and the genetic bases for growth characteristics were elucidated. Over 1,100 genes present in B. avium but not in B. bronchiseptica were identified, and most were predicted to encode surface or secreted proteins that are likely to define an organism adapted to the avian rather than the mammalian respiratory tracts. These include genes coding for the synthesis of a polysaccharide capsule, hemagglutinins, a type I secretion system adjacent to two very large genes for secreted proteins, and unique genes for both lipopolysaccharide and fimbrial biogenesis. Three apparently complete prophages are also present. The BvgAS virulence regulatory system appears to have polymorphisms at a poly(C) tract that is involved in phase variation in other bordetellae. A number of putative iron-regulated outer membrane proteins were predicted from the sequence, and this regulation was confirmed experimentally for five of these. PMID:16885469
Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Lagoumintzis, George; Poulas, Konstantinos
2017-01-01
During the last few decades, the recombinant protein expression finds more and more applications. The cloning of protein-coding genes into expression vectors is required to be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed, as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors.
Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Poulas, Konstantinos
2017-01-01
During the last few decades, the recombinant protein expression finds more and more applications. The cloning of protein-coding genes into expression vectors is required to be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed, as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors. PMID:29091919
Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried
2016-02-24
Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-09-01
To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Making the Bend: DNA Tertiary Structure and Protein-DNA Interactions
Harteis, Sabrina; Schneider, Sabine
2014-01-01
DNA structure functions as an overlapping code to the DNA sequence. Rapid progress in understanding the role of DNA structure in gene regulation, DNA damage recognition and genome stability has been made. The three dimensional structure of both proteins and DNA plays a crucial role for their specific interaction, and proteins can recognise the chemical signature of DNA sequence (“base readout”) as well as the intrinsic DNA structure (“shape recognition”). These recognition mechanisms do not exist in isolation but, depending on the individual interaction partners, are combined to various extents. Driving force for the interaction between protein and DNA remain the unique thermodynamics of each individual DNA-protein pair. In this review we focus on the structures and conformations adopted by DNA, both influenced by and influencing the specific interaction with the corresponding protein binding partner, as well as their underlying thermodynamics. PMID:25026169
Raymond, Frédéric; Boisvert, Sébastien; Roy, Gaétan; Ritt, Jean-François; Légaré, Danielle; Isnard, Amandine; Stanke, Mario; Olivier, Martin; Tremblay, Michel J.; Papadopoulou, Barbara; Ouellette, Marc; Corbeil, Jacques
2012-01-01
The Leishmania tarentolae Parrot-TarII strain genome sequence was resolved to an average 16-fold mean coverage by next-generation DNA sequencing technologies. This is the first non-pathogenic to humans kinetoplastid protozoan genome to be described thus providing an opportunity for comparison with the completed genomes of pathogenic Leishmania species. A high synteny was observed between all sequenced Leishmania species. A limited number of chromosomal regions diverged between L. tarentolae and L. infantum, while remaining syntenic to L. major. Globally, >90% of the L. tarentolae gene content was shared with the other Leishmania species. We identified 95 predicted coding sequences unique to L. tarentolae and 250 genes that were absent from L. tarentolae. Interestingly, many of the latter genes were expressed in the intracellular amastigote stage of pathogenic species. In addition, genes coding for products involved in antioxidant defence or participating in vesicular-mediated protein transport were underrepresented in L. tarentolae. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae, namely the zinc metallo-peptidase surface glycoprotein GP63 and the promastigote surface antigen PSA31C. Overall, L. tarentolae's gene content appears better adapted to the promastigote insect stage rather than the amastigote mammalian stage. PMID:21998295
The complete chloroplast genome sequence of Chikusichloa aquatica (Poaceae: Oryzeae).
Zhang, Jie; Zhang, Dan; Shi, Chao; Gao, Ju; Gao, Li-Zhi
2016-07-01
The complete chloroplast sequence of the Chikusichloa aquatica was determined in this study. The genome consists of 136 563 bp containing a pair of inverted repeats (IRs) of 20 837 bp, which was separated by a large single-copy region and a small single-copy region of 82 315 bp and 33 411 bp, respectively. The C. aquatica cp genome encodes 111 functional genes (71 protein-coding genes, four rRNA genes, and 36 tRNA genes): 92 are unique, while 19 are duplicated in the IR regions. The genic regions account for 58.9% of whole cp genome, and the GC content of the plastome is 39.0%. A phylogenomic analysis showed that C. aquatica is closely related to Rhynchoryza subulata that belongs to the tribe Oryzeae.
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than the American black bear, brown bear and polar bear by 3 amino acids at the end of ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Lipovich, Leonard; Hou, Zhuo-Cheng; Jia, Hui; Sinkler, Christopher; McGowen, Michael; Sterner, Kirstin N; Weckle, Amy; Sugalski, Amara B; Pipes, Lenore; Gatti, Domenico L; Mason, Christopher E; Sherwood, Chet C; Hof, Patrick R; Kuzawa, Christopher W; Grossman, Lawrence I; Goodman, Morris; Wildman, Derek E
2016-02-01
The human brain and human cognitive abilities are strikingly different from those of other great apes despite relatively modest genome sequence divergence. However, little is presently known about the interspecies divergence in gene structure and transcription that might contribute to these phenotypic differences. To date, most comparative studies of gene structure in the brain have examined humans, chimpanzees, and macaque monkeys. To add to this body of knowledge, we analyze here the brain transcriptome of the western lowland gorilla (Gorilla gorilla gorilla), an African great ape species that is phylogenetically closely related to humans, but with a brain that is approximately one-third the size. Manual transcriptome curation from a sample of the planum temporale region of the neocortex revealed 12 protein-coding genes and one noncoding-RNA gene with exons in the gorilla unmatched by public transcriptome data from the orthologous human loci. These interspecies gene structure differences accounted for a total of 134 amino acids in proteins found in the gorilla that were absent from protein products of the orthologous human genes. Proteins varying in structure between human and gorilla were involved in immunity and energy metabolism, suggesting their relevance to phenotypic differences. This gorilla neocortical transcriptome comprises an empirical, not homology- or prediction-driven, resource for orthologous gene comparisons between human and gorilla. These findings provide a unique repository of the sequences and structures of thousands of genes transcribed in the gorilla brain, pointing to candidate genes that may contribute to the traits distinguishing humans from other closely related great apes. © 2015 Wiley Periodicals, Inc.
Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud
2000-01-01
Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
Genomic insights into the Ixodes scapularis tick vector of Lyme disease
Gulia-Nuss, Monika; Nuss, Andrew B.; Meyer, Jason M.; Sonenshine, Daniel E.; Roe, R. Michael; Waterhouse, Robert M.; Sattelle, David B.; de la Fuente, José; Ribeiro, Jose M.; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R.; Walenz, Brian P.; Koren, Sergey; Hostetler, Jessica B.; Thiagarajan, Mathangi; Joardar, Vinita S.; Hannick, Linda I.; Bidwell, Shelby; Hammond, Martin P.; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L.; Almeida, Francisca C.; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W.; Bonzon-Kulichenko, Elena; Buckingham, Steven D.; Caffrey, Daniel R.; Caimano, Melissa J.; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J.; Giraldo-Calderón, Gloria I.; Grabowski, Jeffrey M.; Jiang, David; Khalil, Sayed M. S.; Kim, Donghun; Kocan, Katherine M.; Koči, Juraj; Kuhn, Richard J.; Kurtti, Timothy J.; Lees, Kristin; Lang, Emma G.; Kennedy, Ryan C.; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin; Radolf, Justin D.; Sakamoto, Joyce M.; Sánchez-Gracia, Alejandro; Severo, Maiara S.; Silverman, Neal; Šimo, Ladislav; Tojo, Marta; Tornador, Cristian; Van Zee, Janice P.; Vázquez, Jesús; Vieira, Filipe G.; Villar, Margarita; Wespiser, Adam R.; Yang, Yunlong; Zhu, Jiwei; Arensburger, Peter; Pietrantonio, Patricia V.; Barker, Stephen C.; Shao, Renfu; Zdobnov, Evgeny M.; Hauser, Frank; Grimmelikhuijzen, Cornelis J. P.; Park, Yoonseong; Rozas, Julio; Benton, Richard; Pedra, Joao H. F.; Nelson, David R.; Unger, Maria F.; Tubio, Jose M. C.; Tu, Zhijian; Robertson, Hugh M.; Shumway, Martin; Sutton, Granger; Wortman, Jennifer R.; Lawson, Daniel; Wikel, Stephen K.; Nene, Vishvanath M.; Fraser, Claire M.; Collins, Frank H.; Birren, Bruce; Nelson, Karen E.; Caler, Elisabet; Hill, Catherine A.
2016-01-01
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome, reveals 20,486 protein-coding genes and expansions of gene families associated with tick–host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host ‘questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent. PMID:26856261
Genomic insights into the Ixodes scapularis tick vector of Lyme disease.
Gulia-Nuss, Monika; Nuss, Andrew B; Meyer, Jason M; Sonenshine, Daniel E; Roe, R Michael; Waterhouse, Robert M; Sattelle, David B; de la Fuente, José; Ribeiro, Jose M; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R; Walenz, Brian P; Koren, Sergey; Hostetler, Jessica B; Thiagarajan, Mathangi; Joardar, Vinita S; Hannick, Linda I; Bidwell, Shelby; Hammond, Martin P; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L; Almeida, Francisca C; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W; Bonzon-Kulichenko, Elena; Buckingham, Steven D; Caffrey, Daniel R; Caimano, Melissa J; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J; Giraldo-Calderón, Gloria I; Grabowski, Jeffrey M; Jiang, David; Khalil, Sayed M S; Kim, Donghun; Kocan, Katherine M; Koči, Juraj; Kuhn, Richard J; Kurtti, Timothy J; Lees, Kristin; Lang, Emma G; Kennedy, Ryan C; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin; Radolf, Justin D; Sakamoto, Joyce M; Sánchez-Gracia, Alejandro; Severo, Maiara S; Silverman, Neal; Šimo, Ladislav; Tojo, Marta; Tornador, Cristian; Van Zee, Janice P; Vázquez, Jesús; Vieira, Filipe G; Villar, Margarita; Wespiser, Adam R; Yang, Yunlong; Zhu, Jiwei; Arensburger, Peter; Pietrantonio, Patricia V; Barker, Stephen C; Shao, Renfu; Zdobnov, Evgeny M; Hauser, Frank; Grimmelikhuijzen, Cornelis J P; Park, Yoonseong; Rozas, Julio; Benton, Richard; Pedra, Joao H F; Nelson, David R; Unger, Maria F; Tubio, Jose M C; Tu, Zhijian; Robertson, Hugh M; Shumway, Martin; Sutton, Granger; Wortman, Jennifer R; Lawson, Daniel; Wikel, Stephen K; Nene, Vishvanath M; Fraser, Claire M; Collins, Frank H; Birren, Bruce; Nelson, Karen E; Caler, Elisabet; Hill, Catherine A
2016-02-09
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome, reveals 20,486 protein-coding genes and expansions of gene families associated with tick-host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host 'questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent.
[Clinical and genetic analysis of a patient with Treacher Collins syndrome in TCOF1 gene].
Li, Hongbo; Zhang, Xu; Li, Zhenyue; Chen, Jing; Lu, Yu; Jia, Jingjie; Yuan, Huijun; Han, Dongyi
2012-05-01
To analyze the clinical and genetic features of a patient with Treacher Collins syndrome (TCS), and identify the mutation in TCOF1 gene. The medical history was taken, and general physical examinations and otological examinations were conducted in this patient. Genomic DNA was extracted from this patient and his parents and complete TCOF1 gene coding exons were amplified by specific PCR primers. Direct sequencing was carried out to identify the mutations. The raw data was analyzed with GeneTool software and molecular biological website. We detected a heterozygous c. 1639 delAG mutation in exon 11 of TCOF1, which resulted in a truncated protein lacking normal function. This mutation is a novel mutation and the second case identified in exon 11 of in TCS. TCS patient reported in this study has unique clinical phenotype. TCOF1 gene mutation is the specific risk factor.
Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T G; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D; Sollid, Johanna U Ericson
2014-11-01
Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.
Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T. G.; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D.; Sollid, Johanna U. Ericson
2014-01-01
Objectives Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Methods Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. Results SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A–G), of which four (A–D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. Conclusions The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. PMID:25038069
Wu, Yueh-Lung; Wu, Carol-P; Huang, Yu-Hui; Huang, Sheng-Ping; Lo, Huei-Ru; Chang, Hao-Shuo; Lin, Pi-Hsiu; Wu, Ming-Cheng; Chang, Chia-Jung; Chao, Yu-Chan
2014-11-01
The p143 gene from Autographa californica multinucleocapsid nucleopolyhedrovirus (AcMNPV) has been found to increase the expression of luciferase, which is driven by the polyhedrin gene promoter, in a plasmid with virus coinfection. Further study indicated that this is due to the presence of a replication origin (ori) in the coding region of this gene. Transient DNA replication assays showed that a specific fragment of the p143 coding sequence, p143-3, underwent virus-dependent DNA replication in Spodoptera frugiperda IPLB-Sf-21 (Sf-21) cells. Deletion analysis of the p143-3 fragment showed that subfragment p143-3.2a contained the essential sequence of this putative ori. Sequence analysis of this region revealed a unique distribution of imperfect palindromes with high AT contents. No sequence homology or similarity between p143-3.2a and any other known ori was detected, suggesting that it is a novel baculovirus ori. Further study showed that the p143-3.2a ori can replicate more efficiently in infected Sf-21 cells than baculovirus homologous regions (hrs), the major baculovirus ori, or non-hr oris during virus replication. Previously, hr on its own was unable to replicate in mammalian cells, and for mammalian viral oris, viral proteins are generally required for their proper replication in host cells. However, the p143-3.2a ori was, surprisingly, found to function as an efficient ori in mammalian cells without the need for any viral proteins. We conclude that p143 contains a unique sequence that can function as an ori to enhance gene expression in not only insect cells but also mammalian cells. Baculovirus DNA replication relies on both hr and non-hr oris; however, so far very little is known about the latter oris. Here we have identified a new non-hr ori, the p143 ori, which resides in the coding region of p143. By developing a novel DNA replication-enhanced reporter system, we have identified and located the core region required for the p143 ori. This ori contains a large number of imperfect inverted repeats and is the most active ori in the viral genome during virus infection in insect cells. We also found that it is a unique ori that can replicate in mammalian cells without the assistance of baculovirus gene products. The identification of this ori should contribute to a better understanding of baculovirus DNA replication. Also, this ori is very useful in assisting with gene expression in mammalian cells. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roa, B.B.; Warner, L.E.; Lupski, J.R.
1994-09-01
The MPZ gene that maps to chromosome 1q22q23 encodes myelin protein zero, which is the most abundant peripheral nerve myelin protein that functions as a homophilic adhesion molecule in myelin compaction. Association of the MPZ gene with the dysmyelinating peripheral neuropathies Charcot-Marie-Tooth disease type 1B (CMT1B) and the more severe Dejerine-Sottas syndrome (DSS) was previously demonstrated by MPZ mutations identified in CMT1B and in rare DSS patients. In this study, the coding region of the MPZ gene was screened for mutations in a cohort of 74 unrelated patients with either CMT type 1 or DSS who do not carry themore » most common CMT1-associated molecular lesion of a 1.5 Mb DNA duplication on 17p11.2-p12. Heteroduplex analysis detected base mismatches in ten patients that were distributed over three exons of MPZ. Direct sequencing of PCR-amplified genomic DNA identified a de novo MPZ mutation associated with CMT1B that predicts an Ile(135)Thr substitution. This finding further confirms the role of MPZ in the CMT1B disease process. In addition, two polymorphisms were identified within the Gly(200) and Ser(228) codons that do not alter the respective amino acid residues. A fourth base mismatch in MPZ exon 3 detected by heteroduplex analysis is currently being characterized by direct sequence determination. Previously, four unrelated patients in this same cohort were found to have unique point mutations in the coding region of the PMP22 gene. The collective findings on CMT1 point mutations could suggest that regulatory region mutations, and possibly mutations in CMT gene(s) apart from the MPZ, PMP22 and Cx32 genes identified thus far, may prove to be significant for a number of CMT1 cases that do not involve DNA duplication.« less
Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.
2015-01-01
Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID:25552301
Mycobacterium ahvazicum sp. nov., the nineteenth species of the Mycobacterium simiae complex.
Bouam, Amar; Heidarieh, Parvin; Shahraki, Abodolrazagh Hashemi; Pourahmad, Fazel; Mirsaeidi, Mehdi; Hashemzadeh, Mohamad; Baptiste, Emeline; Armstrong, Nicholas; Levasseur, Anthony; Robert, Catherine; Drancourt, Michel
2018-03-07
Four slowly growing mycobacteria isolates were isolated from the respiratory tract and soft tissue biopsies collected in four unrelated patients in Iran. Conventional phenotypic tests indicated that these four isolates were identical to Mycobacterium lentiflavum while 16S rRNA gene sequencing yielded a unique sequence separated from that of M. lentiflavum. One representative strain AFP-003 T was characterized as comprising a 6,121,237-bp chromosome (66.24% guanosine-cytosine content) encoding for 5,758 protein-coding genes, 50 tRNA and one complete rRNA operon. A total of 2,876 proteins were found to be associated with the mobilome, including 195 phage proteins. A total of 1,235 proteins were found to be associated with virulence and 96 with toxin/antitoxin systems. The genome of AFP-003 T has the genetic potential to produce secondary metabolites, with 39 genes found to be associated with polyketide synthases and non-ribosomal peptide syntases and 11 genes encoding for bacteriocins. Two regions encoding putative prophages and three OriC regions separated by the dnaA gene were predicted. Strain AFP-003 T genome exhibits 86% average nucleotide identity with Mycobacterium genavense genome. Genetic and genomic data indicate that strain AFP-003 T is representative of a novel Mycobacterium species that we named Mycobacterium ahvazicum, the nineteenth species of the expanding Mycobacterium simiae complex.
Raju, Hemalatha B.; Tsinoremas, Nicholas F.; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. While modules of the olfactory receptors were clearly identified in protein–protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches. PMID:27803687
NASA Technical Reports Server (NTRS)
Patil, Shameekumar; Takezawa, D.; Poovaiah, B. W.
1995-01-01
Calcium, a universal second messenger, regulates diverse cellular processes in eukaryotes. Ca-2(+) and Ca-2(+)/calmodulin-regulated protein phosphorylation play a pivotal role in amplifying and diversifying the action of Ca-2(+)- mediated signals. A chimeric Ca-2(+)/calmodulin-dependent protein kinase (CCaMK) gene with a visinin-like Ca-2(+)- binding domain was cloned and characterized from lily. The cDNA clone contains an open reading frame coding for a protein of 520 amino acids. The predicted structure of CCaMK contains a catalytic domain followed by two regulatory domains, a calmodulin-binding domain and a visinin-like Ca-2(+)-binding domain. The amino-terminal region of CCaMK contains all 11 conserved subdomains characteristic of serine/threonine protein kinases. The calmodulin-binding region of CCaMK has high homology (79%) to alpha subunit of mammalian Ca-2(+)/calmodulin-dependent protein kinase. The calmodulin-binding region is fused to a neural visinin-like domain that contains three Ca-2(+)-binding EF-hand motifs and a biotin-binding site. The Escherichia coli-expressed protein (approx. 56 kDa) binds calmodulin in a Ca-2(+)-dependent manner. Furthermore, Ca-45-binding assays revealed that CCaMK directly binds Ca-2(+). The CCaMK gene is preferentially expressed in developing anthers. Southern blot analysis revealed that CCaMK is encoded by a single gene. The structural features of the gene suggest that it has multiple regulatory controls and could play a unique role in Ca-2(+) signaling in plants.
Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S
2017-01-01
Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303
Wu, Zongfu; Wang, Weixue; Tang, Min; Shao, Jing; Dai, Chen; Zhang, Wei; Fan, Hongjie; Yao, Huochun; Zong, Jie; Chen, Dai; Wang, Junning; Lu, Chengping
2014-02-10
Streptococcus suis (SS) is an important swine pathogen worldwide that occasionally causes serious infections in humans. SS infection may result in meningitis in pigs and humans. The pathogenic mechanisms of SS are poorly understood. Here, we provide the complete genome sequence of S. suis serotype 2 (SS2) strain SC070731 isolated from a pig with meningitis. The chromosome is 2,138,568bp in length. There are 1933 predicted protein coding sequences and 96.7% (57/59) of the known virulence-associated genes are present in the genome. Strain SC070731 showed similar virulence with SS2 virulent strains HA9801 and ZY05719, but was more virulent than SS2 virulent strain P1/7 in the zebrafish infection model. Comparative genomic analysis revealed a unique 105K genomic island in strain SC070731 that is absent in seven other sequenced SS2 strains. Further analysis of the 105K genomic island indicated that it contained a complete nisin locus similar to the nisin U locus in S. uberis strain 42, a prophage similar to S. oralis phage PH10 and several antibiotic resistance genes. Several proteins in the 105K genomic island, including nisin and RelBE toxin-antitoxin system, contribute to the bacterial fitness and virulence in other pathogenic bacteria. Further investigation of newly identified gene products, including four putative new virulence-associated surface proteins, will improve our understanding of SS pathogenesis. Copyright © 2013 Elsevier B.V. All rights reserved.
Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux
Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.
2012-01-01
We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20×coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene compliments revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480
Sugii, Yuh; Kasai, Tomonari; Ikeda, Masashi; Vaidyanath, Arun; Kumon, Kazuki; Mizutani, Akifumi; Seno, Akimasa; Tokutaka, Heizo; Kudoh, Takayuki; Seno, Masaharu
2016-01-01
To identify cell-specific markers, we designed a DNA microarray platform with oligonucleotide probes for human membrane-anchored proteins. Human glioma cell lines were analyzed using microarray and compared with normal and fetal brain tissues. For the microarray analysis, we employed a spherical self-organizing map, which is a clustering method suitable for the conversion of multidimensional data into two-dimensional data and displays the relationship on a spherical surface. Based on the gene expression profile, the cell surface characteristics were successfully mirrored onto the spherical surface, thereby distinguishing normal brain tissue from the disease model based on the strength of gene expression. The clustered glioma-specific genes were further analyzed by polymerase chain reaction procedure and immunocytochemical staining of glioma cells. Our platform and the following procedure were successfully demonstrated to categorize the genes coding for cell surface proteins that are specific to glioma cells. Our assessment demonstrates that a spherical self-organizing map is a valuable tool for distinguishing cell surface markers and can be employed in marker discovery studies for the treatment of cancer.
Cell Engineering and Molecular Pharming for Biopharmaceuticals
Abdullah, M.A; Rahmah, Anisa ur; Sinskey, A.J; Rha, C.K
2008-01-01
Biopharmaceuticals are often produced by recombinant E. coli or mammalian cell lines. This is usually achieved by the introduction of a gene or cDNA coding for the protein of interest into a well-characterized strain of producer cells. Naturally, each recombinant production system has its own unique advantages and disadvantages. This paper examines the current practices, developments, and future trends in the production of biopharmaceuticals. Platform technologies for rapid screening and analyses of biosystems are reviewed. Strategies to improve productivity via metabolic and integrated engineering are also highlighted. PMID:19662143
Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-01-01
Objective The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify codingmore » regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.« less
Complete mitochondrial genome of the agarophyte red alga Gelidium vagum (Gelidiales).
Yang, Eun Chan; Kim, Kyeong Mi; Boo, Ga Hun; Lee, Jung-Hyun; Boo, Sung Min; Yoon, Hwan Su
2014-08-01
We describe the first complete mitochondrial genome of Gelidium vagum (Gelidiales) (24,901 bp, 30.4% GC content), an agar-producing red alga. The circular mitochondrial genome contains 43 genes, including 23 protein-coding, 18 tRNA and 2 rRNA genes. All the protein-coding genes have a typical ATG start codon. No introns were found. Two genes, secY and rps12, were overlapped by 41 bp.
Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju
2018-05-15
Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differential expressed in lungs of chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that differentially expressed lncRNAs associated protein-coding genes were mainly involved in protein processing of endoplasmic reticulum pathway, and taurine and hypotaurine metabolism pathway. The combination of high throughput data analysis and the results of qRT-PCR validation in lungs of chronic CS-induced COPD mouse model, 16HBE cells with CSE treatment and PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD both in mouse and human. In conclusion, our study demonstrated that aberrant expression profiles of lncRNAs and mRNAs existed in lungs of chronic CS-induced COPD mouse model. From animal models perspective, these results might provide further clues to investigate biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.
Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An
2017-09-11
The chloroplast genome (CPG) of Pinus massoniana belonging to the genus Pinus (Pinaceae), which is a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh genes loss, and the contraction and expansion of short inverted repeats (IRs). P. massoniana CPG has a typical quadripartite structure that includes large single copy (LSC) (65,563 bp), small single copy (SSC) (53,230 bp) and two IRs (IRa and IRb, 485 bp). The 108 unique genes were identified, including 73 protein-coding genes, 31 tRNAs, and 4 rRNAs. Most of the 81 simple sequence repeats (SSRs) identified in CPG were mononucleotides motifs of A/T types and located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region; P. massoniana CPG lacks all 11 intact ndh genes (four ndh genes lost completely; the five remained truncated as pseudogenes; and the other two ndh genes remain as pseudogenes because of short insertions or deletions). A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, which resulted from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". The results of phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that the whole CPG sequences could be used as a powerful tool in phylogenetic analyses.
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Dasenko, Mark A.
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Origin and evolution of the long non-coding genes in the X-inactivation center.
Romito, Antonio; Rougeulle, Claire
2011-11-01
Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Garcia de la Serrana, Daniel; Devlin, Robert H; Johnston, Ian A
2015-07-31
Coho salmon (Oncorhynchus kisutch) transgenic for growth hormone (Gh) express Gh in multiple tissues which results in increased appetite and continuous high growth with satiation feeding. Restricting Gh-transgenics to the same lower ration (TR) as wild-type fish (WT) results in similar growth, but with the recruitment of fewer, larger diameter, muscle skeletal fibres to reach a given body size. In order to better understand the genetic mechanisms behind these different patterns of muscle growth and to investigate how the decoupling of Gh and nutritional signals affects gene regulation we used RNA-seq to compare the fast skeletal muscle transcriptome in TR and WT coho salmon. Illumina sequencing of individually barcoded libraries from 6 WT and 6 TR coho salmon yielded 704,550,985 paired end reads which were used to construct 323,115 contigs containing 19,093 unique genes of which >10,000 contained >90 % of the coding sequence. Transcripts coding for 31 genes required for myoblast fusion were identified with 22 significantly downregulated in TR relative to WT fish, including 10 (vaspa, cdh15, graf1, crk, crkl, dock1, trio, plekho1a, cdc42a and dock5) associated with signaling through the cell surface protein cadherin. Nineteen out of 44 (43 %) translation initiation factors and 14 of 47 (30 %) protein chaperones were upregulated in TR relative to WT fish. TR coho salmon showed increased growth hormone transcripts and gene expression associated with protein synthesis and folding than WT fish even though net rates of protein accretion were similar. The uncoupling of Gh and amino acid signals likely results in additional costs of transcription associated with protein turnover in TR fish. The predicted reduction in the ionic costs of homeostasis in TR fish associated with increased fibre size were shown to involve multiple pathways regulating myotube fusion, particularly cadherin signaling.
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-10-03
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-01-01
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274
Lee, Jungeun; Noh, Eun Kyeung; Choi, Hyung-Seok; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
2013-03-01
Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been studied as an extremophile that has successfully adapted to marginal land with the harshest environment for terrestrial plants. However, limited genetic research has focused on this species due to the lack of genomic resources. Here, we present the first de novo assembly of its transcriptome by massive parallel sequencing and its expression profile using D. antarctica grown under various stress conditions. Total sequence reads generated by pyrosequencing were assembled into 60,765 unigenes (28,177 contigs and 32,588 singletons). A total of 29,173 unique protein-coding genes were identified based on sequence similarities to known proteins. The combined results from all three stress conditions indicated differential expression of 3,110 genes. Quantitative reverse transcription polymerase chain reaction showed that several well-known stress-responsive genes encoding late embryogenesis abundant protein, dehydrin 1, and ice recrystallization inhibition protein were induced dramatically and that genes encoding U-box-domain-containing protein, electron transfer flavoprotein-ubiquinone, and F-box-containing protein were induced by abiotic stressors in a manner conserved with other plant species. We identified more than 2,000 simple sequence repeats that can be developed as functional molecular markers. This dataset is the most comprehensive transcriptome resource currently available for D. antarctica and is therefore expected to be an important foundation for future genetic studies of grasses and extremophiles.
Unique pathway of expression of an opal suppressor phosphoserine tRNA.
Lee, B J; de la Peña, P; Tobian, J A; Zasloff, M; Hatfield, D
1987-01-01
An opal suppressor phosphoserine tRNA gene is present in single copy in the genomes of higher vertebrates. We have shown that the product of this gene functions as a suppressor in an in vitro assay, and we have proposed that it may donate a modified amino acid directly to protein in response to specific UGA codons. In this report, we show through in vitro and in vivo studies that the human and Xenopus opal suppressor phosphoserine tRNAs are synthesized by a pathway that is, to the best of our knowledge, unlike that of any known eukaryotic tRNA. The primary transcript of this gene does not contain a 5'-leader sequence; and, therefore, transcription of this suppressor is initiated at the first nucleotide within the coding sequence. The 5'-terminal triphosphate, present on the primary transcript, remains intact through 3'-terminal maturation and through subsequent transport of the tRNA to the cytoplasm. The unique biosynthetic pathway of this opal suppressor may underlie its distinctive role in eukaryotic cells. Images PMID:3114749
Pietan, Lucas L.; Spradling, Theresa A.
2016-01-01
In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
BuD, a helix–loop–helix DNA-binding domain for genome modification
Stella, Stefano; Molina, Rafael; López-Méndez, Blanca; Juillerat, Alexandre; Bertonati, Claudia; Daboussi, Fayza; Campos-Olivas, Ramon; Duchateau, Phillippe; Montoya, Guillermo
2014-01-01
DNA editing offers new possibilities in synthetic biology and biomedicine for modulation or modification of cellular functions to organisms. However, inaccuracy in this process may lead to genome damage. To address this important problem, a strategy allowing specific gene modification has been achieved through the addition, removal or exchange of DNA sequences using customized proteins and the endogenous DNA-repair machinery. Therefore, the engineering of specific protein–DNA interactions in protein scaffolds is key to providing ‘toolkits’ for precise genome modification or regulation of gene expression. In a search for putative DNA-binding domains, BurrH, a protein that recognizes a 19 bp DNA target, was identified. Here, its apo and DNA-bound crystal structures are reported, revealing a central region containing 19 repeats of a helix–loop–helix modular domain (BurrH domain; BuD), which identifies the DNA target by a single residue-to-nucleotide code, thus facilitating its redesign for gene targeting. New DNA-binding specificities have been engineered in this template, showing that BuD-derived nucleases (BuDNs) induce high levels of gene targeting in a locus of the human haemoglobin β (HBB) gene close to mutations responsible for sickle-cell anaemia. Hence, the unique combination of high efficiency and specificity of the BuD arrays can push forward diverse genome-modification approaches for cell or organism redesign, opening new avenues for gene editing. PMID:25004980
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa
2015-01-01
Background Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications for designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. According to this facts analysis of the general properties of the structural organization of the functional sites at the protein level and, at the level of exon-intron structure of the coding gene is still an actual problem. Results One approach to this analysis is the projection of amino acid residue positions of the functional sites along with the exon boundaries to the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency to cluster the exons that code functional sites which could be considered as the unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to the exon shuffling. Conclusions These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity. PMID:26693737
Rokyta, Darin R; Wray, Kenneth P; Lemmon, Alan R; Lemmon, Emily Moriarty; Caudle, S Brian
2011-04-01
Despite causing considerable human mortality and morbidity, animal toxins represent a valuable source of pharmacologically active macromolecules, a unique system for studying molecular adaptation, and a powerful framework for examining structure-function relationships in proteins. Snake venoms are particularly useful in the latter regard as they consist primarily of a moderate number of proteins and peptides that have been found to belong to just a handful of protein families. As these proteins and peptides are produced in dedicated glands, transcriptome sequencing has proven to be an effective approach to identifying the expressed toxin genes. We generated a venom-gland transcriptome for the Eastern Diamondback Rattlesnake (Crotalus adamanteus) using Roche 454 sequencing technology. In the current work, we focus on transcripts encoding toxins. We identified 40 unique toxin transcripts, 30 of which have full-length coding sequences, and 10 have only partial coding sequences. These toxins account for 24% of the total sequencing reads. We found toxins from 11 previously described families of snake-venom toxins and have discovered two putative, previously undescribed toxin classes. The most diverse and highly expressed toxin classes in the C. adamanteus venom-gland transcriptome are the serine proteinases, metalloproteinases, and C-type lectins. The serine proteinases are the most abundant class, accounting for 35% of the toxin sequencing reads. Metalloproteinases are the most diverse; 11 different forms have been identified. Using our sequences and those available in public databases, we detected positive selection in seven of the eight toxin families for which sufficient sequences were available for the analysis. We find that the vast majority of the genes that contribute directly to this vertebrate trait show evidence for a role for positive selection in their evolutionary history. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ni, Lianghong; Zhao, Zhili; Xu, Hongxi; Chen, Shilin; Dorje, Gaawe
2016-02-15
Endemic to the Sino-Himalayan subregion, the medicinal alpine plant Gentiana straminea is a threatened species. The genetic and molecular data about it is deficient. Here we report the complete chloroplast (cp) genome sequence of G. straminea, as the first sequenced member of the family Gentianaceae. The cp genome is 148,991bp in length, including a large single copy (LSC) region of 81,240bp, a small single copy (SSC) region of 17,085bp and a pair of inverted repeats (IRs) of 25,333bp. It contains 112 unique genes, including 78 protein-coding genes, 30 tRNAs and 4 rRNAs. The rps16 gene lacks exon2 between trnK-UUU and trnQ-UUG, which is the first rps16 pseudogene found in the nonparasitic plants of Asterids clade. Sequence analysis revealed the presence of 13 forward repeats, 13 palindrome repeats and 39 simple sequence repeats (SSRs). An entire cp genome comparison study of G. straminea and four other species in Gentianales was carried out. Phylogenetic analyses using maximum likelihood (ML) and maximum parsimony (MP) were performed based on 69 protein-coding genes from 36 species of Asterids. The results strongly supported the position of Gentianaceae as one member of the order Gentianales. The complete chloroplast genome sequence will provide intragenic information for its conservation and contribute to research on the genetic and phylogenetic analyses of Gentianales and Asterids. Copyright © 2015 Elsevier B.V. All rights reserved.
Yu, Hong; Kong, Lingfeng; Li, Qi
2016-01-01
In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. Especially in closely related taxa, the character-based method showed some advantages. The results suggested that besides the standard COI barcode, other 11 mitochondrial protein-coding genes could also be potentially used as a molecular diagnostic for molluscan species discrimination. Our results also showed that the combination of mitochondrial genes did not enhance the efficacy for species identification and a single mitochondrial gene would be fully competent.
Habuka, Masato; Fagerberg, Linn; Hallström, Björn M.; Pontén, Fredrik; Yamamoto, Tadashi; Uhlen, Mathias
2015-01-01
To understand functions and diseases of urinary bladder, it is important to define its molecular constituents and their roles in urinary bladder biology. Here, we performed genome-wide deep RNA sequencing analysis of human urinary bladder samples and identified genes up-regulated in the urinary bladder by comparing the transcriptome data to those of all other major human tissue types. 90 protein-coding genes were elevated in the urinary bladder, either with enhanced expression uniquely in the urinary bladder or elevated expression together with at least one other tissue (group enriched). We further examined the localization of these proteins by immunohistochemistry and tissue microarrays and 20 of these 90 proteins were localized to the whole urothelium with a majority not yet described in the context of the urinary bladder. Four additional proteins were found specifically in the umbrella cells (Uroplakin 1a, 2, 3a, and 3b), and three in the intermediate/basal cells (KRT17, PCP4L1 and ATP1A4). 61 of the 90 elevated genes have not been previously described in the context of urinary bladder and the corresponding proteins are interesting targets for more in-depth studies. In summary, an integrated omics approach using transcriptomics and antibody-based profiling has been used to define a comprehensive list of proteins elevated in the urinary bladder. PMID:26694548
Xiong, Yan; Yue, Feng; Jia, Zhihao; Gao, Yun; Jin, Wen; Hu, Keping; Zhang, Yong; Zhu, Dahai; Yang, Gongshe; Kuang, Shihuan
2018-04-01
The thermogenic activities of brown and beige adipocytes can be exploited to reduce energy surplus and counteract obesity. Recent RNA sequencing studies have uncovered a number of long noncoding RNAs (lncRNAs) uniquely expressed in white and brown adipose tissues (WAT and BAT), but whether and how these lncRNAs function in adipogenesis remain largely unknown. Here, we report the identification of a novel brown adipocyte-enriched LncRNA (AK079912), and its nuclear localization, function and regulation. The expression of AK079912 increases during brown preadipocyte differentiation and in response to cold-stimulated browning of white adipocytes. Knockdown of AK079912 inhibits brown preadipocyte differentiation, manifested by reductions in lipid accumulation and down-regulation of adipogenic and BAT-specific genes. Conversely, ectopic expression of AK079912 in white preadipocytes up-regulates the expression of genes involved in thermogenesis. Mechanistically, inhibition of AK079912 reduces mitochondrial copy number and protein levels of mitochondria electron transport chain (ETC) complexes, whereas AK079912 overexpression increases the levels of ETC proteins. Lastly, reporter and pharmacological assays identify Pparγ as an upstream regulator of AK079912. These results provide new insights into the function of non-coding RNAs in brown adipogenesis and regulating browning of white adipocytes. Copyright © 2018 Elsevier B.V. All rights reserved.
Liu, Kaiyu; Li, Yi; Jousset, Françoise-Xavière; Zadori, Zoltan; Szelei, Jozsef; Yu, Qian; Pham, Hanh Thi; Lépine, François; Bergoin, Max; Tijssen, Peter
2011-01-01
The Acheta domesticus densovirus (AdDNV), isolated from crickets, has been endemic in Europe for at least 35 years. Severe epizootics have also been observed in American commercial rearings since 2009 and 2010. The AdDNV genome was cloned and sequenced for this study. The transcription map showed that splicing occurred in both the nonstructural (NS) and capsid protein (VP) multicistronic RNAs. The splicing pattern of NS mRNA predicted 3 nonstructural proteins (NS1 [576 codons], NS2 [286 codons], and NS3 [213 codons]). The VP gene cassette contained two VP open reading frames (ORFs), of 597 (ORF-A) and 268 (ORF-B) codons. The VP2 sequence was shown by N-terminal Edman degradation and mass spectrometry to correspond with ORF-A. Mass spectrometry, sequencing, and Western blotting of baculovirus-expressed VPs versus native structural proteins demonstrated that the VP1 structural protein was generated by joining ORF-A and -B via splicing (splice II), eliminating the N terminus of VP2. This splice resulted in a nested set of VP1 (816 codons), VP3 (467 codons), and VP4 (429 codons) structural proteins. In contrast, the two splices within ORF-B (Ia and Ib) removed the donor site of intron II and resulted in VP2, VP3, and VP4 expression. ORF-B may also code for several nonstructural proteins, of 268, 233, and 158 codons. The small ORF-B contains the coding sequence for a phospholipase A2 motif found in VP1, which was shown previously to be critical for cellular uptake of the virus. These splicing features are unique among parvoviruses and define a new genus of ambisense densoviruses. PMID:21775445
Using a Euclid distance discriminant method to find protein coding genes in the yeast genome.
Zhang, Chun-Ting; Wang, Ju; Zhang, Ren
2002-02-01
The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or eukaryotic genomes with less introns. Six-fold cross-validation tests have demonstrated that the accuracy of the algorithm is better than 93%. Based on this, it is found that the total number of protein coding genes in the yeast genome is less than or equal to 5579 only, about 3.8-7.0% less than 5800-6000, which is currently widely accepted. The base compositions at three codon positions are analyzed in details using a graphic method. The result shows that the preference codons adopted by yeast genes are of the RGW type, where R, G and W indicate the bases of purine, non-G and A/T, whereas the 'codons' in the intergenic sequences are of the form NNN, where N denotes any base. This fact constitutes the basis of the algorithm to distinguish between coding and non-coding ORFs in the yeast genome. The names of putative non-coding ORFs are listed here in detail.
Kim, Min Jee; Hong, Eui Jeong; Kim, Iksoo
2016-01-01
We sequenced the complete mitochondrial (mt) genome of Camponotus atrox (Hymenoptera: Formicidae), which is only distributed in Korea. The genome was 16 540 bp in size and contained typical sets of genes (13 protein-coding genes, 22 tRNAs, and 2 rRNAs). The C. atrox A+T-rich region, at 1402 bp, was the longest of all sequenced ant genomes and was composed of an identical tandem repeat consisting of six 100-bp copies and one 96-bp copy. A total of 315 bp of intergenic spacer sequence was spread over 23 regions. An alignment of the spacer sequences in ants was largely feasible among congeneric species, and there was substantial sequence divergence, indicating their potential use as molecular markers for congeneric species. The A/T contents at the first and second codon positions of protein-coding genes (PCGs) were similar for ant species, including C. atrox (73.9% vs. 72.3%, on average). With increased taxon sampling among hymenopteran superfamilies, differences in the divergence rates (i.e., the non-synonymous substitution rates) between the suborders Symphyta and Apocrita were detected, consistent with previous results. The C. atrox mt genome had a unique gene arrangement, trnI-trnM-trnQ, at the A+T-rich region and ND2 junction (underline indicates inverted gene). This may have originated from a tandem duplication of trnM-trnI, resulting in trnM-trnI-trnM-trnI-trnQ, and the subsequent loss of the first trnM and second trnI, resulting in trnI-trnM-trnQ.
Sporophyte Formation and Life Cycle Completion in Moss Requires Heterotrimeric G-Proteins1[OPEN
Hackenberg, Dieter; Quatrano, Ralph
2016-01-01
In this study, we report the functional characterization of heterotrimeric G-proteins from a nonvascular plant, the moss Physcomitrella patens. In plants, G-proteins have been characterized from only a few angiosperms to date, where their involvement has been shown during regulation of multiple signaling and developmental pathways affecting overall plant fitness. In addition to its unparalleled evolutionary position in the plant lineages, the P. patens genome also codes for a unique assortment of G-protein components, which includes two copies of Gβ and Gγ genes, but no canonical Gα. Instead, a single gene encoding an extra-large Gα (XLG) protein exists in the P. patens genome. Here, we demonstrate that in P. patens the canonical Gα is biochemically and functionally replaced by an XLG protein, which works in the same genetic pathway as one of the Gβ proteins to control its development. Furthermore, the specific G-protein subunits in P. patens are essential for its life cycle completion. Deletion of the genomic locus of PpXLG or PpGβ2 results in smaller, slower growing gametophores. Normal reproductive structures develop on these gametophores, but they are unable to form any sporophyte, the only diploid stage in the moss life cycle. Finally, the mutant phenotypes of ΔPpXLG and ΔPpGβ2 can be complemented by the homologous genes from Arabidopsis, AtXLG2 and AtAGB1, respectively, suggesting an overall conservation of their function throughout the plant evolution. PMID:27550997
Splicing regulation and dysregulation of cholinergic genes expressed at the neuromuscular junction.
Ohno, Kinji; Rahman, Mohammad Alinoor; Nazim, Mohammad; Nasrin, Farhana; Lin, Yingni; Takeda, Jun-Ichi; Masuda, Akio
2017-08-01
We humans have evolved by acquiring diversity of alternative RNA metabolisms including alternative means of splicing and transcribing non-coding genes, and not by acquiring new coding genes. Tissue-specific and developmental stage-specific alternative RNA splicing is achieved by tightly regulated spatiotemporal regulation of expressions and activations of RNA-binding proteins that recognize their cognate splicing cis-elements on nascent RNA transcripts. Genes expressed at the neuromuscular junction are also alternatively spliced. In addition, germline mutations provoke aberrant splicing by compromising binding of RNA-binding proteins, and cause congenital myasthenic syndromes (CMS). We present physiological splicing mechanisms of genes for agrin (AGRN), acetylcholinesterase (ACHE), MuSK (MUSK), acetylcholine receptor (AChR) α1 subunit (CHRNA1), and collagen Q (COLQ) in human, and their aberration in diseases. Splicing isoforms of AChE T , AChE H , and AChE R are generated by hnRNP H/F. Skipping of MUSK exon 10 makes a Wnt-insensitive MuSK isoform, which is unique to human. Skipping of exon 10 is achieved by coordinated binding of hnRNP C, YB-1, and hnRNP L to exon 10. Exon P3A of CHRNA1 is alternatively included to generate a non-functional AChR α1 subunit in human. Molecular dissection of splicing mutations in patients with CMS reveals that exon P3A is alternatively skipped by hnRNP H, polypyrimidine tract-binding protein 1, and hnRNP L. Similarly, analysis of an exonic mutation in COLQ exon 16 in a CMS patient discloses that constitutive splicing of exon 16 requires binding of serine arginine-rich splicing factor 1. Intronic and exonic splicing mutations in CMS enable us to dissect molecular mechanisms underlying alternative and constitutive splicing of genes expressed at the neuromuscular junction. This is an article for the special issue XVth International Symposium on Cholinergic Mechanisms. © 2017 International Society for Neurochemistry.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-02-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences
Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya
2016-01-01
Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. PMID:27289096
Stotz, Henrik U; Harvey, Pascoe J; Haddadi, Parham; Mashanova, Alla; Kukol, Andreas; Larkan, Nicholas J; Borhan, M Hossein; Fitt, Bruce D L
2018-01-01
Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens. These findings demonstrate involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-04-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-01-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
The Saccharomyces cerevisiae enolase-related regions encode proteins that are active enolases.
Kornblatt, M J; Richard Albert, J; Mattie, S; Zakaib, J; Dayanandan, S; Hanic-Joyce, P J; Joyce, P B M
2013-02-01
In addition to two genes (ENO1 and ENO2) known to code for enolase (EC4.2.1.11), the Saccharomyces cerevisiae genome contains three enolase-related regions (ERR1, ERR2 and ERR3) which could potentially encode proteins with enolase function. Here, we show that products of these genes (Err2p and Err3p) have secondary and quaternary structures similar to those of yeast enolase (Eno1p). In addition, Err2p and Err3p can convert 2-phosphoglycerate to phosphoenolpyruvate, with kinetic parameters similar to those of Eno1p, suggesting that these proteins could function as enolases in vivo. To address this possibility, we overexpressed the ERR2 and ERR3 genes individually in a double-null yeast strain lacking ENO1 and ENO2, and showed that either ERR2 or ERR3 could complement the growth defect in this strain when cells are grown in medium with glucose as the carbon source. Taken together, these data suggest that the ERR genes in Saccharomyces cerevisiae encode a protein that could function in glycolysis as enolase. The presence of these enolase-related regions in Saccharomyces cerevisiae and their absence in other related yeasts suggests that these genes may play some unique role in Saccharomyces cerevisiae. Further experiments will be required to determine whether these functions are related to glycolysis or other cellular processes. Copyright © 2012 John Wiley & Sons, Ltd.
Croucher, Peter J P; Brewer, Michael S; Winchell, Christopher J; Oxford, Geoff S; Gillespie, Rosemary G
2013-12-08
A number of spider species within the family Theridiidae exhibit a dramatic abdominal (opisthosomal) color polymorphism. The polymorphism is inherited in a broadly Mendelian fashion and in some species consists of dozens of discrete morphs that are convergent across taxa and populations. Few genomic resources exist for spiders. Here, as a first necessary step towards identifying the genetic basis for this trait we present the near complete transcriptomes of two species: the Hawaiian happy-face spider Theridion grallator and Theridion californicum. We mined the gene complement for pigment-pathway genes and examined differential expression (DE) between morphs that are unpatterned (plain yellow) and patterned (yellow with superimposed patches of red, white or very dark brown). By deep sequencing both RNA-seq and normalized cDNA libraries from pooled specimens of each species we were able to assemble a comprehensive gene set for both species that we estimate to be 98-99% complete. It is likely that these species express more than 20,000 protein-coding genes, perhaps 4.5% (ca. 870) of which might be unique to spiders. Mining for pigment-associated Drosophila melanogaster genes indicated the presence of all ommochrome pathway genes and most pteridine pathway genes and DE analyses further indicate a possible role for the pteridine pathway in theridiid color patterning. Based upon our estimates, T. grallator and T. californicum express a large inventory of protein-coding genes. Our comprehensive assembly illustrates the continuing value of sequencing normalized cDNA libraries in addition to RNA-seq in order to generate a reference transcriptome for non-model species. The identification of pteridine-related genes and their possible involvement in color patterning is a novel finding in spiders and one that suggests a biochemical link between guanine deposits and the pigments exhibited by these species.
Fourie, Gerda; van der Merwe, Nicolaas A; Wingfield, Brenda D; Bogale, Mesfin; Tudzynski, Bettina; Wingfield, Michael J; Steenkamp, Emma T
2013-09-08
The availability of mitochondrial genomes has allowed for the resolution of numerous questions regarding the evolutionary history of fungi and other eukaryotes. In the Gibberella fujikuroi species complex, the exact relationships among the so-called "African", "Asian" and "American" Clades remain largely unresolved, irrespective of the markers employed. In this study, we considered the feasibility of using mitochondrial genes to infer the phylogenetic relationships among Fusarium species in this complex. The mitochondrial genomes of representatives of the three Clades (Fusarium circinatum, F. verticillioides and F. fujikuroi) were characterized and we determined whether or not the mitochondrial genomes of these fungi have value in resolving the higher level evolutionary relationships in the complex. Overall, the mitochondrial genomes of the three species displayed a high degree of synteny, with all the genes (protein coding genes, unique ORFs, ribosomal RNA and tRNA genes) in identical order and orientation, as well as introns that share similar positions within genes. The intergenic regions and introns generally contributed significantly to the size differences and diversity observed among these genomes. Phylogenetic analysis of the concatenated protein-coding dataset separated members of the Gibberella fujikuroi complex from other Fusarium species and suggested that F. fujikuroi ("Asian" Clade) is basal in the complex. However, individual mitochondrial gene trees were largely incongruent with one another and with the concatenated gene tree, because six distinct phylogenetic trees were recovered from the various single gene datasets. The mitochondrial genomes of Fusarium species in the Gibberella fujikuroi complex are remarkably similar to those of the previously characterized Fusarium species and Sordariomycetes. Despite apparently representing a single replicative unit, all of the genes encoded on the mitochondrial genomes of these fungi do not share the same evolutionary history. This incongruence could be due to biased selection on some genes or recombination among mitochondrial genomes. The results thus suggest that the use of individual mitochondrial genes for phylogenetic inference could mask the true relationships between species in this complex.
Biosynthesis and expression of ependymin homologous sequences in zebrafish brain.
Sterrer, S; Königstorfer, A; Hoffmann, W
1990-01-01
Ependymins are unique, brain specific glycoproteins, which are major constituents of the cerebrospinal fluid. Originally, they were discovered in goldfish and are thought to be involved in synaptic plasticity. In the present study two transcripts were characterized in Brachydanio rerio originating from a single gene possibly by alternative splicing. These transcripts differ only in the length of their 3'-non-coding-regions and the encoded protein shares 90 and 88% homology with the two corresponding goldfish proteins, respectively. In situ hybridization revealed the expression of ependymins exclusively in the leptomeninx including its invaginations but not at all in the ependymal layer surrounding the ventricles. An initial developmental profile showed that ependymins first appear before hatching, i.e. between 48 and 72 h postfertilization.
Recognition of Protein-coding Genes Based on Z-curve Algorithms
-Biao Guo, Feng; Lin, Yan; -Ling Chen, Ling
2014-01-01
Recognition of protein-coding genes, a classical bioinformatics issue, is an absolutely needed step for annotating newly sequenced genomes. The Z-curve algorithm, as one of the most effective methods on this issue, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with the other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027
Highly Conserved Keratin-Associated Protein 7-1 Gene in Yak, Taurine and Zebu Cattle.
Arlud, S; He, N; Sari, E M; Ma, Z-J; Zhang, H; An, T-W; Han, J-L
2017-01-01
Keratin-associated proteins (KRTAPs) play a critical role in cross-linking the keratin intermediate filaments to build a hair shaft. The genetic polymorphisms of the bovine KRTAP7-1 gene were investigated for the first time in this study. The complete coding sequence of the KRTAP7-1 gene in 108 domestic yak, taurine and zebu cattle from China and Indonesia were successfully amplified using polymerase chain reaction and then directly sequenced. Only two single-nucleotide polymorphisms (one nonsynonymous at c.7C/G and another synonymous at c.21C/T) and three haplotypes (BOVIN-KRTAP7-1*A, B and C) were identified in the complete coding sequence of the bovine KRTAP7-1 gene among all animals. There was no polymorphism across three Chinese indigenous yak breeds and one Indonesian zebu cattle population, all sharing the BOVINKRTAP71*A haplotype. The four taurine cattle populations also had BOVIN-KRTAP7-1*A as the most common haplotype with a frequency of 0.80. The frequency of novel haplotype BOVIN-KRTAP7-1*B was only 0.07 present in one heterozygous animal in each of the four taurine cattle populations, while BOVINKRTAP7- 1*C was only found in a Simmental and a local Chinese Yellow cattle population with frequencies of 0.17 and 0.36, respectively. The monomorphic yak KRTAP7-1 gene in particular, and highly conserved bovine, sheep and goat KRTAP7-1 genes in general, demonstrated its unique intrinsic structural property (e.g., > 21% high glycine content) and primary functional importance in supporting the mechanical strength and shape of hair.
Li, Shicheng; Sun, Xiao; Miao, Shuncheng; Liu, Jia; Jiao, Wenjie
2017-11-01
Cigarette smoking is one of the greatest preventable risk factors for developing cancer, and most cases of lung squamous cell carcinoma (lung SCC) are associated with smoking. The pathogenesis mechanism of tumor progress is unclear. This study aimed to identify biomarkers in smoking-related lung cancer, including protein-coding gene, long noncoding RNA, and transcription factors. We selected and obtained messenger RNA microarray datasets and clinical data from the Gene Expression Omnibus database to identify gene expression altered by cigarette smoking. Integrated bioinformatic analysis was used to clarify biological functions of the identified genes, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the construction of a protein-protein interaction network, transcription factor, and statistical analyses. Subsequent quantitative real-time PCR was utilized to verify these bioinformatic analyses. Five hundred and ninety-eight differentially expressed genes and 21 long noncoding RNA were identified in smoking-related lung SCC. GO and KEGG pathway analysis showed that identified genes were enriched in the cancer-related functions and pathways. The protein-protein interaction network revealed seven hub genes identified in lung SCC. Several transcription factors and their binding sites were predicted. The results of real-time quantitative PCR revealed that AURKA and BIRC5 were significantly upregulated and LINC00094 was downregulated in the tumor tissues of smoking patients. Further statistical analysis indicated that dysregulation of AURKA, BIRC5, and LINC00094 indicated poor prognosis in lung SCC. Protein-coding genes AURKA, BIRC5, and LINC00094 could be biomarkers or therapeutic targets for smoking-related lung SCC. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
Barnidge, David R; Jelinek, Diane F; Muddiman, David C; Kay, Neil E
2005-01-01
Relative protein expression levels were compared in leukemic B cells from two patients with chronic lymphocytic leukemia (CLL) having either mutated (M-CLL) or unmutated (UM-CLL) immunoglobulin variable heavy chain genes (IgV(H)). Cells were separated into cytosol and membrane protein fractions then labeled with acid-cleavable ICAT reagents (cICAT). Labeled proteins were digested with trypsin then subjected to SCX and affinity chromatography followed by LC-ESI-MS/MS analysis on a linear ion trap mass spectrometer. A total of 9 proteins from the cytosol fraction and 4 from the membrane fraction showed a 3-fold or greater difference between M-CLL and UM-CLL and a subset of these were examined by Western blot where results concurred with cICAT abundance ratios. The abundance of one of the proteins in particular, the mitochondrial membrane protein cytochrome c oxidase subunit COX G was examined in 6 M-CLL and 6 UM-CLL patients using western blot and results showed significantly greater levels (P < 0.001) in M-CLL patients vs UM-CLL patients. These results demonstrate that stable isotope labeling and mass spectrometry can complement 2D gel electrophoresis and gene microarray technologies for identifying putative and perhaps unique prognostic markers in CLL.
Alam, Syed Imteyaz; Dwivedi, Pratistha
2016-10-01
The whole genome sequencing and annotation of Clostridium perfringens strains revealed several genes coding for proteins of unknown function with no significant similarities to genes in other organisms. Our previous studies clearly demonstrated that hypothetical proteins CPF_2500, CPF_1441, CPF_0876, CPF_0093, CPF_2002, CPF_2314, CPF_1179, CPF_1132, CPF_2853, CPF_0552, CPF_2032, CPF_0438, CPF_1440, CPF_2918, CPF_0656, and CPF_2364 are genuine proteins of C. perfringens expressed in high abundance. This study explored the putative role of these hypothetical proteins using bioinformatic tools and evaluated their potential as putative candidates for prophylaxis. Apart from a group of eight hypothetical proteins (HPs), a putative function was predicted for the rest of the hypothetical proteins using one or more of the algorithms used. The phylogenetic analysis did not suggest an evidence of a horizontal gene transfer event except for HP CPF_0876. HP CPF_2918 is an abundant extracellular protein, unique to C. perfringens species with maximum strain coverage and did not show any significant match in the database. CPF_2918 was cloned, recombinant protein was purified to near homogeneity, and probing with mouse anti-CPF_2918 serum revealed surface localization of the protein in C. perfringens ATCC13124 cultures. The purified recombinant CPF_2918 protein induced antibody production, a mixed Th1 and Th2 kind of response, and provided partial protection to immunized mice in direct C. perfringens challenge. Copyright © 2016 Elsevier B.V. All rights reserved.
CXCL4 induces a unique transcriptome in monocyte-derived macrophages
Gleissner, Christian A.; Shaked, Iftach; Little, Kristina M.; Ley, Klaus
2012-01-01
In atherosclerotic arteries, blood monocytes differentiate to macrophages in the presence of growth factors like macrophage colony-stimulation factor (MCSF) and chemokines like platelet factor 4 (CXCL4). To compare the gene expression signature of CXCL4-induced macrophages with MCSF-induced macrophages or macrophages polarized with IFN-γ/LPS (M1) or IL-4 (M2), we cultured primary human peripheral blood monocytes for six days. mRNA expression was measured by Affymetrix gene chips and differences were analyzed by Local Pooled Error test, Profile of Complex Functionality and Gene Set Enrichment Analysis. 375 genes were differentially expressed between MCSF- and CXCL4-induced macrophages, 206 of them overexpressed in CXCL4 macrophages coding for genes implicated in the inflammatory/immune response, antigen processing/presentation, and lipid metabolism. CXCL4-induced macrophages overexpressed some M1 and M2 genes and the corresponding cytokines at the protein level, however, their transcriptome clustered with neither M1 nor M2 transcriptomes. They almost completely lost the ability to phagocytose zymosan beads. Genes linked to atherosclerosis were not consistently up- or downregulated. Scavenger receptors showed lower and cholesterol efflux transporters higher expression in CXCL4- than MCSF-induced macrophages, resulting in lower LDL content. We conclude that CXCL4 induces a unique macrophage transcriptome distinct from known macrophage types, defining a new macrophage differentiation that we propose to call M4. PMID:20335529
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
GENCODE: the reference human genome annotation for The ENCODE Project.
Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J
2012-09-01
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple that was sequenced using the PacBio sequencing technology. In this study, the high error rate of PacBio long sequence reads of A. comosus's total genomic DNA were improved by leveraging on the high accuracy but short Illumina reads for error-correction via the latest error correction module from Novocraft. Error corrected long PacBio reads were assembled by using a single tool to produce a contig representing the pineapple chloroplast genome. The genome of 159,636 bp in length is featured with the conserved quadripartite structure of chloroplast containing a large single copy region (LSC) with a size of 87,482 bp, a small single copy region (SSC) with a size of 18,622 bp and two inverted repeat regions (IRA and IRB) each with the size of 26,766 bp. Overall, the genome contained 117 unique coding regions and 30 were repeated in the IR region with its genes contents, structure and arrangement similar to its sister taxon, Typha latifolia. A total of 35 repeats structure were detected in both the coding and non-coding regions with a majority being tandem repeats. In addition, 205 SSRs were detected in the genome with six protein-coding genes contained more than two SSRs. Comparative chloroplast genomes from the subclass Commelinidae revealed a conservative protein coding gene albeit located in a highly divergence region. Analysis of selection pressure on protein-coding genes using Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast with P less than 0.05. Phylogenetic analysis confirmed the recent taxonomical relation among the member of commelinids which support the monophyly relationship between Arecales and Dasypogonaceae and between Zingiberales to the Poales, which includes the A. comosus. The complete sequence of the chloroplast of pineapple provides insights to the divergence of genic chloroplast sequences from the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomical studies in the Bromeliaceae family when more species under the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
Zhao, Yi; Tang, Liang; Li, Zhe; Jin, Jinpu; Luo, Jingchu; Gao, Ge
2015-04-18
Long-established protein-coding genes may lose their coding potential during evolution ("unitary gene loss"). Members of the Poaceae family are a major food source and represent an ideal model clade for plant evolution research. However, the global pattern of unitary gene loss in Poaceae genomes as well as the evolutionary fate of lost genes are still less-investigated and remain largely elusive. Using a locally developed pipeline, we identified 129 unitary gene loss events for long-established protein-coding genes from four representative species of Poaceae, i.e. brachypodium, rice, sorghum and maize. Functional annotation suggested that the lost genes in all or most of Poaceae species are enriched for genes involved in development and response to endogenous stimulus. We also found that 44 mutated genomic loci of lost genes, which we referred as relics, were still actively transcribed, and of which 84% (37 of 44) showed significantly differential expression across different tissues. More interestingly, we found that there were totally five expressed relics may function as competitive endogenous RNA in brachypodium, rice and sorghum genome. Based on comparative genomics and transcriptome data, we firstly compiled a comprehensive catalogue of unitary gene loss events in Poaceae species and characterized a statistically significant functional preference for these lost genes as well showed the potential of relics functioning as competitive endogenous RNAs in Poaceae genomes.
Mutant phenotypes for thousands of bacterial genes of unknown function
Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; ...
2018-05-16
One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because theymore » are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.« less
Mutant phenotypes for thousands of bacterial genes of unknown function
DOE Office of Scientific and Technical Information (OSTI.GOV)
Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan
One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because theymore » are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.« less
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Bakera, Beata; Makowska, Bogna; Groszyk, Jolanta; Niziołek, Michał; Orczyk, Wacław; Bolibok-Brągoszewska, Hanna; Hromada-Judycka, Aneta; Rakoczy-Trojanowska, Monika
2015-08-01
Benzoxazinoids (BX) are major secondary metabolites of gramineous plants that play an important role in disease resistance and allelopathy. They also have many other unique properties including anti-bacterial and anti-fungal activity, and the ability to reduce alfa-amylase activity. The biosynthesis and modification of BX are controlled by the genes Bx1 ÷ Bx10, GT and glu, and the majority of these Bx genes have been mapped in maize, wheat and rye. However, the genetic basis of BX biosynthesis remains largely uncharacterized apart from some data from maize and wheat. The aim of this study was to isolate, sequence and characterize five genes (ScBx1, ScBx2, ScBx3, ScBx4 and ScBx5) encoding enzymes involved in the synthesis of DIBOA, an important defense compound of rye. Using a modified 3D procedure of BAC library screening, seven BAC clones containing all of the ScBx genes were isolated and sequenced. Bioinformatic analyses of the resulting contigs were used to examine the structure and other features of these genes, including their promoters, introns and 3'UTRs. Comparative analysis showed that the ScBx genes are similar to those of other Poaceae species, especially to the TaBx genes. The polymorphisms present both in the coding sequences and non-coding regions of ScBx in relation to other Bx genes are predicted to have an impact on the expression, structure and properties of the encoded proteins.
Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru
2007-01-01
The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
Moreno-Sánchez, Natalia; Rueda, Julia; Reverter, Antonio; Carabaño, María Jesús; Díaz, Clara
2012-03-01
Variations on the transcriptome from one skeletal muscle type to another still remain unknown. The reliable identification of stable gene coexpression networks is essential to unravel gene functions and define biological processes. The differential expression of two distinct muscles, M. flexor digitorum (FD) and M. psoas major (PM), was studied using microarrays in cattle to illustrate muscle-specific transcription patterns and to quantify changes in connectivity regarding the expected gene coexpression pattern. A total of 206 genes were differentially expressed (DE), 94 upregulated in PM and 112 in FD. The distribution of DE genes in pathways and biological functions was explored in the context of system biology. Global interactomes for genes of interest were predicted. Fast/slow twitch genes, genes coding for extracellular matrix, ribosomal and heat shock proteins, and fatty acid uptake centred the specific gene expression patterns per muscle. Genes involved in repairing mechanisms, such as ribosomal and heat shock proteins, suggested a differential ability of muscles to react to similar stressing factors, acting preferentially in slow twitch muscles. Muscle attributes do not seem to be completely explained by the muscle fibre composition. Changes in connectivity accounted for 24% of significant correlations between DE genes. Genes changing their connectivity mostly seem to contribute to the main differential attributes that characterize each specific muscle type. These results underscore the unique flexibility of skeletal muscle where a substantial set of genes are able to change their behavior depending on the circumstances.
Christie, Andrew E.; Sommer, Stephanie A.; Cieslak, Matthew C.; Hartline, Daniel K.; Lenz, Petra H.
2017-01-01
Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne‘ohe Bay, Oahu, Hawai‘i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Bench-marking-universal-single-copy-orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length “giant” proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species. PMID:29065152
Roncalli, Vittoria; Christie, Andrew E; Sommer, Stephanie A; Cieslak, Matthew C; Hartline, Daniel K; Lenz, Petra H
2017-01-01
Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne'ohe Bay, Oahu, Hawai'i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Bench-marking-universal-single-copy-orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length "giant" proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species.
Dominguez, Daniel; Tsai, Yi-Hsuan; Gomez, Nicholas; Jha, Deepak Kumar; Davis, Ian; Wang, Zefeng
2016-01-01
Progression through the cell cycle is largely dependent on waves of periodic gene expression, and the regulatory networks for these transcriptome dynamics have emerged as critical points of vulnerability in various aspects of tumor biology. Through RNA-sequencing of human cells during two continuous cell cycles (>2.3 billion paired reads), we identified over 1 000 mRNAs, non-coding RNAs and pseudogenes with periodic expression. Periodic transcripts are enriched in functions related to DNA metabolism, mitosis, and DNA damage response, indicating these genes likely represent putative cell cycle regulators. Using our set of periodic genes, we developed a new approach termed “mitotic trait” that can classify primary tumors and normal tissues by their transcriptome similarity to different cell cycle stages. By analyzing >4 000 tumor samples in The Cancer Genome Atlas (TCGA) and other expression data sets, we found that mitotic trait significantly correlates with genetic alterations, tumor subtype and, notably, patient survival. We further defined a core set of 67 genes with robust periodic expression in multiple cell types. Proteins encoded by these genes function as major hubs of protein-protein interaction and are mostly required for cell cycle progression. The core genes also have unique chromatin features including increased levels of CTCF/RAD21 binding and H3K36me3. Loss of these features in uterine and kidney cancers is associated with altered expression of the core 67 genes. Our study suggests new chromatin-associated mechanisms for periodic gene regulation and offers a predictor of cancer patient outcomes. PMID:27364684
Han, Limin; Chen, Chen; Wang, Zhezhi
2018-01-01
Epipremnum aureum is an important foliage plant in the Araceae family. In this study, we have sequenced the complete chloroplast genome of E. aureum by using Illumina Hiseq sequencing platforms. This genome is a double-stranded circular DNA sequence of 164,831 bp that contains 35.8% GC. The two inverted repeats (IRa and IRb; 26,606 bp) are spaced by a small single-copy region (22,868 bp) and a large single-copy region (88,751 bp). The chloroplast genome has 131 (113 unique) functional genes, including 86 (79 unique) protein-coding genes, 37 (30 unique) tRNA genes, and eight (four unique) rRNA genes. Tandem repeats comprise the majority of the 43 long repetitive sequences. In addition, 111 simple sequence repeats are present, with mononucleotides being the most common type and di- and tetranucleotides being infrequent events. Positive selection pressure on rps12 in the E. aureum chloroplast has been demonstrated via synonymous and nonsynonymous substitution rates and selection pressure sites analyses. Ycf15 and infA are pseudogenes in this species. We constructed a Maximum Likelihood phylogenetic tree based on the complete chloroplast genomes of 38 species from 13 families. Those results strongly indicated that E. aureum is positioned as the sister of Colocasia esculenta within the Araceae family. This work may provide information for further study of the molecular phylogenetic relationships within Araceae, as well as molecular markers and breeding novel varieties by chloroplast genetic-transformation of E. aureum in particular. PMID:29529038
Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume
2017-07-19
Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.
NASA Technical Reports Server (NTRS)
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; Detweiler, Angela M.; Lee, Jackson Zan; Woebken, Dagmar; Bebout, Leslie E.; Pett-Ridge, Jennifer
2016-01-01
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. Additionally, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; ...
2016-08-24
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. Onemore » striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. In addition, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.« less
Foox, Jonathan; Brugler, Mercer; Siddall, Mark Edward; Rodríguez, Estefanía
2016-07-01
Six complete and three partial actiniarian mitochondrial genomes were amplified in two semi-circles using long-range PCR and pyrosequenced in a single run on a 454 GS Junior, doubling the number of complete mitogenomes available within the order. Typical metazoan mtDNA features included circularity, 13 protein-coding genes, 2 ribosomal RNA genes, and length ranging from 17,498 to 19,727 bp. Several typical anthozoan mitochondrial genome features were also observed including the presence of only two transfer RNA genes, elevated A + T richness ranging from 54.9 to 62.4%, large intergenic regions, and group 1 introns interrupting NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit I, the latter of which possesses a homing endonuclease gene. Within the sea anemone Alicia sansibarensis, we report the first mitochondrial gene order rearrangement within the Actiniaria, as well as putative novel non-canonical protein-coding genes. Phylogenetic analyses of all 13 protein-coding and 2 ribosomal genes largely corroborated current hypotheses of sea anemone interrelatedness, with a few lower-level differences.
Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip
2015-01-01
The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P
2015-04-23
With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Quach, Tommy; Brooks, Daniel M; Miranda, Hector C
2016-01-01
The complete mitochondrial genome of the Palawan peacock-pheasant Polyplectron napoleonis is 16,710 bp and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a control-region. All protein-coding genes use the standard ATG start codon, except for cox1 which has GTG start codon. Seven out of 13 PCGs have TAA stop codons, two have AGG (cox1 and nd6), and three PCGs (nd2, cox2 and nd4) have incomplete stop codon of just T- - nucleotide.
Zampini, Massimiliano; Mur, Luis A J; Rees Stevens, Pauline; Pachebat, Justin A; Newbold, C James; Hayes, Finbarr; Kingston-Smith, Alison
2016-05-25
Synthetic biology is characterized by the development of novel and powerful DNA fabrication methods and by the application of engineering principles to biology. The current study describes Terminator Operon Reporter (TOR), a new gene assembly technology based on the conditional activation of a reporter gene in response to sequence errors occurring at the assembly stage of the synthetic element. These errors are monitored by a transcription terminator that is placed between the synthetic gene and reporter gene. Switching of this terminator between active and inactive states dictates the transcription status of the downstream reporter gene to provide a rapid and facile readout of the accuracy of synthetic assembly. Designed specifically and uniquely for the synthesis of protein coding genes in bacteria, TOR allows the rapid and cost-effective fabrication of synthetic constructs by employing oligonucleotides at the most basic purification level (desalted) and without the need for costly and time-consuming post-synthesis correction methods. Thus, TOR streamlines gene assembly approaches, which are central to the future development of synthetic biology.
Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji
2007-01-01
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stella, Stefano; University of Copenhagen, Blegdamsvej 3B, 2200 Copenhagen; Molina, Rafael
Crystal structures of BurrH and the BurrH–DNA complex are reported. DNA editing offers new possibilities in synthetic biology and biomedicine for modulation or modification of cellular functions to organisms. However, inaccuracy in this process may lead to genome damage. To address this important problem, a strategy allowing specific gene modification has been achieved through the addition, removal or exchange of DNA sequences using customized proteins and the endogenous DNA-repair machinery. Therefore, the engineering of specific protein–DNA interactions in protein scaffolds is key to providing ‘toolkits’ for precise genome modification or regulation of gene expression. In a search for putative DNA-bindingmore » domains, BurrH, a protein that recognizes a 19 bp DNA target, was identified. Here, its apo and DNA-bound crystal structures are reported, revealing a central region containing 19 repeats of a helix–loop–helix modular domain (BurrH domain; BuD), which identifies the DNA target by a single residue-to-nucleotide code, thus facilitating its redesign for gene targeting. New DNA-binding specificities have been engineered in this template, showing that BuD-derived nucleases (BuDNs) induce high levels of gene targeting in a locus of the human haemoglobin β (HBB) gene close to mutations responsible for sickle-cell anaemia. Hence, the unique combination of high efficiency and specificity of the BuD arrays can push forward diverse genome-modification approaches for cell or organism redesign, opening new avenues for gene editing.« less
[Cloning and characterization of a novel rat gene RSD-7 differentially expressed in testis].
Zhang, Xiao-dong; Gou, Da-wei; Miao, Shi-ying; Zhang, Jian-chao; Zong, Shu-dong; Wang, Lin-fang
2003-06-01
To isolate and identify the differentially expressed genes in spermatogenesis for the understanding molecular mechanism of spermatogenesis. Screening of the cDNA library, Northern blot, expression and purification in E. coli with GST expression system, immunocytochemical staining of testis sections were used. (1) A cDNA fragment designated as RSD-7 was isolated from rat testis cDNA library. It was 1,238 bp in length, coding a protein of 232 amino acids with the GenBank accession number AF315467. The encoding protein of RSD-7 cDNA had a Ubiquitin-like domain. (2) Northern blot indicated that RSD-7 was uniquely expressed in rat testis, and in the testis RSD-7 emerged on the 30th postnatal day and expressed until 120th postnatal day. (3) Expression and purification of RSD-7 protein in E. coli with GST expression system and were used to obtain anti-RSD-7 antibody. (4) Immunolocalization of RSD-7 in rat testis revealed that it is expressed only in Sertoli cells. Transcription pattern of RSD-7 and localization of RSD-7 protein in testis have been made, which established the base for the functional study of RSD-7.
The transcriptional activator ZNF143 is essential for normal development in zebrafish
2012-01-01
Background ZNF143 is a sequence-specific DNA-binding protein that stimulates transcription of both small RNA genes by RNA polymerase II or III, or protein-coding genes by RNA polymerase II, using separable activating domains. We describe phenotypic effects following knockdown of this protein in developing Danio rerio (zebrafish) embryos by injection of morpholino antisense oligonucleotides that target znf143 mRNA. Results The loss of function phenotype is pleiotropic and includes a broad array of abnormalities including defects in heart, blood, ear and midbrain hindbrain boundary. Defects are rescued by coinjection of synthetic mRNA encoding full-length ZNF143 protein, but not by protein lacking the amino-terminal activation domains. Accordingly, expression of several marker genes is affected following knockdown, including GATA-binding protein 1 (gata1), cardiac myosin light chain 2 (cmlc2) and paired box gene 2a (pax2a). The zebrafish pax2a gene proximal promoter contains two binding sites for ZNF143, and reporter gene transcription driven by this promoter in transfected cells is activated by this protein. Conclusions Normal development of zebrafish embryos requires ZNF143. Furthermore, the pax2a gene is probably one example of many protein-coding gene targets of ZNF143 during zebrafish development. PMID:22268977
The transcriptional activator ZNF143 is essential for normal development in zebrafish.
Halbig, Kari M; Lekven, Arne C; Kunkel, Gary R
2012-01-23
ZNF143 is a sequence-specific DNA-binding protein that stimulates transcription of both small RNA genes by RNA polymerase II or III, or protein-coding genes by RNA polymerase II, using separable activating domains. We describe phenotypic effects following knockdown of this protein in developing Danio rerio (zebrafish) embryos by injection of morpholino antisense oligonucleotides that target znf143 mRNA. The loss of function phenotype is pleiotropic and includes a broad array of abnormalities including defects in heart, blood, ear and midbrain hindbrain boundary. Defects are rescued by coinjection of synthetic mRNA encoding full-length ZNF143 protein, but not by protein lacking the amino-terminal activation domains. Accordingly, expression of several marker genes is affected following knockdown, including GATA-binding protein 1 (gata1), cardiac myosin light chain 2 (cmlc2) and paired box gene 2a (pax2a). The zebrafish pax2a gene proximal promoter contains two binding sites for ZNF143, and reporter gene transcription driven by this promoter in transfected cells is activated by this protein. Normal development of zebrafish embryos requires ZNF143. Furthermore, the pax2a gene is probably one example of many protein-coding gene targets of ZNF143 during zebrafish development.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rittig, S.; Siggaard, C.; Pedersen, E.B.
1996-01-01
Familial neurohypophyseal diabetes insipidus (FNDI) is an autosomal dominant disorder characterized by progressive postnatal deficiency of arginine vasopressin as a result of mutation in the gene that encodes the hormone. To determine the extent of mutations in the coding region that produce the phenotype, we studied members of 17 unrelated kindreds with the disorder. We sequenced all 3 exons of the gene by using a rapid, direct dye-terminator method and found the causative mutation in each kindred. In four kindreds, the mutations were each identical to mutations described in other affected families. In the other 13 kindreds each mutation wasmore » unique. There were two missense mutations that altered the cleavage region of the signal peptide, seven missense mutations in exon 2, which codes for the conserved portion of the protein, one nonsense mutation in exon 2, and three nonsense mutations in exon 3. These findings, together with the clinical features of FNDI, suggest that each of the mutations exerts an effect by directing the production of a pre-prohormone that cannot be folded, processed, or degraded properly and eventually destroys vasopressinergic neurons. 63 refs., 5 figs., 6 tabs.« less
Rittig, S.; Robertson, G. L.; Siggaard, C.; Kovács, L.; Gregersen, N.; Nyborg, J.; Pedersen, E. B.
1996-01-01
Familial neurohypophyseal diabetes insipidus (FNDI) is an autosomal dominant disorder characterized by progressive postnatal deficiency of arginine vasopressin as a result of mutation in the gene that encodes the hormone. To determine the extent of mutations in the coding region that produce the phenotype, we studied members of 17 unrelated kindreds with the disorder. We sequenced all 3 exons of the gene by using a rapid, direct dye-terminator method and found the causative mutation in each kindred. In four kindreds, the mutations were each identical to mutations described in other affected families. In the other 13 kindreds each mutation was unique. There were two missense mutations that altered the cleavage region of the signal peptide, seven missense mutations in exon 2, which codes for the conserved portion of the protein, one nonsense mutation in exon 2, and three nonsense mutations in exon 3. These findings, together with the clinical features of FNDI, suggest that each of the mutations exerts an effect by directing the production of a pre-prohormone that cannot be folded, processed, or degraded properly and eventually destroys vasopressinergic neurons. Images Figure 3 PMID:8554046
Su, Huei-Jiun; Hu, Jer-Ming
2012-01-01
Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 are less dramatic. Significant dS increases were detected in Balanophora PI, TM6, RPB2 and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Investigating the Role of RIO Protein Kinases in Caenorhabditis elegans
Raymant, Greta; Bertram, Sonja E.; Esmaillie, Reza; Nadarajan, Saravanapriah; Breugelmans, Bert; Hofmann, Andreas; Gasser, Robin B.; Colaiácovo, Monica P.; Boag, Peter R.
2015-01-01
RIO protein kinases (RIOKs) are a relatively conserved family of enzymes implicated in cell cycle control and ribosomal RNA processing. Despite their functional importance, they remain a poorly understood group of kinases in multicellular organisms. Here, we show that the C. elegans genome contains one member of each of the three RIOK sub-families and that each of the genes coding for them has a unique tissue expression pattern. Our analysis showed that the gene encoding RIOK-1 (riok-1) was broadly and strongly expressed. Interestingly, the intestinal expression of riok-1 was dependent upon two putative binding sites for the oxidative and xenobiotic stress response transcription factor SKN-1. RNA interference (RNAi)-mediated knock down of riok-1 resulted in germline defects, including defects in germ line stem cell proliferation, oocyte maturation and the production of endomitotic oocytes. Taken together, our findings indicate new functions for RIOK-1 in post mitotic tissues and in reproduction. PMID:25688864
Chen, Chaoyang; Sun, Chongran; Wu, Yi-Rui
2018-03-21
A wild-type solventogenic strain Clostridium diolis WST, isolated from mangrove sediments, was characterized to produce high amount of butanol and acetone with negligible level of ethanol and acids from glucose via a unique acetone-butanol (AB) fermentation pathway. Through the genomic sequencing, the assembled draft genome of strain WST is calculated to be 5.85 Mb with a GC content of 29.69% and contains 5263 genes that contribute to the annotation of 5049 protein-coding sequences. Within these annotated genes, the butanol dehydrogenase gene (bdh) was determined to be in a higher amount from strain WST compared to other Clostridial strains, which is positively related to its high-efficient production of butanol. Therefore, we present a draft genome sequence analysis of strain WST in this article that should facilitate to further understand the solventogenic mechanism of this special microorganism.
Carver, Melissa N.; Müller, Ulrika; Bekiranov, Stefan; Auble, David T.
2017-01-01
Transcriptome studies on eukaryotic cells have revealed an unexpected abundance and diversity of noncoding RNAs synthesized by RNA polymerase II (Pol II), some of which influence the expression of protein-coding genes. Yet, much less is known about biogenesis of Pol II non-coding RNA than mRNAs. In the budding yeast Saccharomyces cerevisiae, initiation of non-coding transcripts by Pol II appears to be similar to that of mRNAs, but a distinct pathway is utilized for termination of most non-coding RNAs: the Sen1-dependent or “NNS” pathway. Here, we examine the effect on the S. cerevisiae transcriptome of conditional mutations in the genes encoding six different essential proteins that influence Sen1-dependent termination: Sen1, Nrd1, Nab3, Ssu72, Rpb11, and Hrp1. We observe surprisingly diverse effects on transcript abundance for the different proteins that cannot be explained simply by differing severity of the mutations. Rather, we infer from our results that termination of Pol II transcription of non-coding RNA genes is subject to complex combinatorial control that likely involves proteins beyond those studied here. Furthermore, we identify new targets and functions of Sen1-dependent termination, including a role in repression of meiotic genes in vegetative cells. In combination with other recent whole-genome studies on termination of non-coding RNAs, our results provide promising directions for further investigation. PMID:28665995
The CDC Hemophilia A Mutation Project (CHAMP) Mutation List: a New Online Resource
Payne, Amanda B.; Miller, Connie H.; Kelly, Fiona M.; Soucie, J. Michael; Hooper, W. Craig
2015-01-01
Genotyping efforts in hemophilia A (HA) populations in many countries have identified large numbers of unique mutations in the Factor VIII gene (F8). To assist HA researchers conducting genotyping analyses, we have developed a listing of F8 mutations including those listed in existing locus-specific databases as well as those identified in patient populations and reported in the literature. Each mutation was reviewed and uniquely identified using Human Genome Variation Society (HGVS) nomenclature standards for coding DNA and predicted protein changes as well as traditional nomenclature based on the mature, processed protein. Listings also include the associated hemophilia severity classified by International Society of Thrombosis and Haemostasis (ISTH) criteria, associations of the mutations with inhibitors, and reference information. The mutation list currently contains 2,537 unique mutations known to cause HA. HA severity caused by the mutation is available for 2,022 mutations (80%) and information on inhibitors is available for 1,816 mutations (72%). The CDC Hemophilia A Mutation Project (CHAMP) Mutation List is available at http://www.cdc.gov/hemophiliamutations for download and search and will be updated quarterly based on periodic literature reviews and submitted reports. PMID:23280990
Mitochondrial divergence between slow- and fast-aging garter snakes.
Schwartz, Tonia S; Arendsee, Zebulun W; Bronikowski, Anne M
2015-11-01
Mitochondrial function has long been hypothesized to be intimately involved in aging processes--either directly through declining efficiency of mitochondrial respiration and ATP production with advancing age, or indirectly, e.g., through increased mitochondrial production of damaging free radicals with age. Yet we lack a comprehensive understanding of the evolution of mitochondrial genotypes and phenotypes across diverse animal models, particularly in species that have extremely labile physiology. Here, we measure mitochondrial genome-types and transcription in ecotypes of garter snakes (Thamnophis elegans) that are adapted to disparate habitats and have diverged in aging rates and lifespans despite residing in close proximity. Using two RNA-seq datasets, we (1) reconstruct the garter snake mitochondrial genome sequence and bioinformatically identify regulatory elements, (2) test for divergence of mitochondrial gene expression between the ecotypes and in response to heat stress, and (3) test for sequence divergence in mitochondrial protein-coding regions in these slow-aging (SA) and fast-aging (FA) naturally occurring ecotypes. At the nucleotide sequence level, we confirmed two (duplicated) mitochondrial control regions one of which contains a glucocorticoid response element (GRE). Gene expression of protein-coding genes was higher in FA snakes relative to SA snakes for most genes, but was neither affected by heat stress nor an interaction between heat stress and ecotype. SA and FA ecotypes had unique mitochondrial haplotypes with amino acid substitutions in both CYTB and ND5. The CYTB amino acid change (Isoleucine → Threonine) was highly segregated between ecotypes. This divergence of mitochondrial haplotypes between SA and FA snakes contrasts with nuclear gene-flow estimates, but correlates with previously reported divergence in mitochondrial function (mitochondrial oxygen consumption, ATP production, and reactive oxygen species consequences). Copyright © 2015 Elsevier Inc. All rights reserved.
Decoding the Long Noncoding RNA During Cardiac Maturation: A Roadmap for Functional Discovery.
Touma, Marlin; Kang, Xuedong; Zhao, Yan; Cass, Ashley A; Gao, Fuying; Biniwale, Reshma; Coppola, Giovanni; Xiao, Xinshu; Reemtsen, Brian; Wang, Yibin
2016-10-01
Cardiac maturation during perinatal transition of heart is critical for functional adaptation to hemodynamic load and nutrient environment. Perturbation in this process has major implications in congenital heart defects. Transcriptome programming during perinatal stages is an important information but incomplete in current literature, particularly, the expression profiles of the long noncoding RNAs (lncRNAs) are not fully elucidated. From comprehensive analysis of transcriptomes derived from neonatal mouse heart left and right ventricles, a total of 45 167 unique transcripts were identified, including 21 916 known and 2033 novel lncRNAs. Among these lncRNAs, 196 exhibited significant dynamic regulation along maturation process. By implementing parallel weighted gene co-expression network analysis of mRNA and lncRNA data sets, several lncRNA modules coordinately expressed in a developmental manner similar to protein coding genes, while few lncRNAs revealed chamber-specific patterns. Out of 2262 lncRNAs located within 50 kb of protein coding genes, 5% significantly correlate with the expression of their neighboring genes. The impact of Ppp1r1b-lncRNA on the corresponding partner gene Tcap was validated in cultured myoblasts. This concordant regulation was also conserved in human infantile hearts. Furthermore, the Ppp1r1b-lncRNA/Tcap expression ratio was identified as a molecular signature that differentiated congenital heart defect phenotypes. The study provides the first high-resolution landscape on neonatal cardiac lncRNAs and reveals their potential interaction with mRNA transcriptome during cardiac maturation. Ppp1r1b-lncRNA was identified as a regulator of Tcap expression, with dynamic interaction in postnatal cardiac development and congenital heart defects. © 2016 American Heart Association, Inc.
Wang, Guang-Zhong; Lercher, Martin J.; Hurst, Laurence D.
2011-01-01
Abstract How is noise in gene expression modulated? Do mechanisms of noise control impact genome organization? In yeast, the expression of one gene can affect that of a very close neighbor. As the effect is highly regionalized, we hypothesize that genes in different orientations will have differing degrees of coupled expression and, in turn, different noise levels. Divergently organized gene pairs, in particular those with bidirectional promoters, have close promoters, maximizing the likelihood that expression of one gene affects the neighbor. With more distant promoters, the same is less likely to hold for gene pairs in nondivergent orientation. Stochastic models suggest that coupled chromatin dynamics will typically result in low abundance-corrected noise (ACN). Transcription of noncoding RNA (ncRNA) from a bidirectional promoter, we thus hypothesize to be a noise-reduction, expression-priming, mechanism. The hypothesis correctly predicts that protein-coding genes with a bidirectional promoter, including those with a ncRNA partner, have lower ACN than other genes and divergent gene pairs uniquely have correlated ACN. Moreover, as predicted, ACN increases with the distance between promoters. The model also correctly predicts ncRNA transcripts to be often divergently transcribed from genes that a priori would be under selection for low noise (essential genes, protein complex genes) and that the latter genes should commonly reside in divergent orientation. Likewise, that genes with bidirectional promoters are rare subtelomerically, cluster together, and are enriched in essential gene clusters is expected and observed. We conclude that gene orientation and transcription of ncRNAs are candidate modulators of noise. PMID:21402863
Characterization of the Lymantria dispar nucleopolyhedrovirus 25K FP gene
David S. Bischoff; James M. Slavicek
1996-01-01
The Lymantria dispar nucleopolyhedrovirus (LdMNPV) gene encoding the 25K FP protein has been cloned and sequenced. The 25KFP gene codes for a 217 amino acid protein with a predicted molecular mass of 24870 Da. Expression of the 25K FP protein in a rabbit reticulocyte system generated a 27 kDa protein, in close agreement with the...
Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V
2013-10-16
Mammalian genomes are extensively transcribed producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs are characterized with different genomic architectures in relationship with their associated protein-coding genes. Our study aimed at bridging lncRNA architecture with dynamical patterns of their expression using differentiating human neuroblastoma cells model. LncRNA expression was studied in a 120-hours timecourse of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), the compound used for the treatment of neuroblastoma. A custom microarray chip was utilized to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relatively to protein-coding genes. For each architecture class, dynamics of expression of lncRNAs was studied in association with their protein-coding partners. It allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized with negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
DOE R&D Accomplishments Database
Liang, X.
1998-06-10
The genome of Methanococcus jannaschii has been sequenced completely and has been found to contain approximately 1,770 predicted protein-coding regions. When these coding regions are expressed and how their expression is regulated, however, remain open questions. In this work, mass spectrometry was combined with two-dimensional gel electrophoresis to identify which proteins the genes produce under different growth conditions, and thus investigate the regulation of genes responsible for functions characteristic of this thermophilic representative of the methanogenic Archaea.
Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric
2017-02-01
Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Tzagoloff, A; Shtanko, A
1995-06-01
Three complementation groups of a pet mutant collection have been found to be composed of respiratory-deficient deficient mutants with lesions in mitochondrial protein synthesis. Recombinant plasmids capable of restoring respiration were cloned by transformation of representatives of each complementation group with a yeast genomic library. The plasmids were used to characterize the complementing genes and to institute disruption of the chromosomal copies of each gene in respiratory-proficient yeast. The sequences of the cloned genes indicate that they code for isoleucyl-, arginyl- and glutamyl-tRNA synthetases. The properties of the mutants used to obtain the genes and of strains with the disrupted genes indicate that all three aminoacyl-tRNA synthetases function exclusively in mitochondrial proteins synthesis. The ISM1 gene for mitochondrial isoleucyl-tRNA synthetase has been localized to chromosome XVI next to UME5. The MSR1 gene for the arginyl-tRNA synthetase was previously located on yeast chromosome VIII. The third gene MSE1 for the mitochondrial glutamyl-tRNA synthetase has not been localized. The identification of three new genes coding for mitochondrial-specific aminoacyl-tRNA synthetases indicates that in Saccharomyces cerevisiae at least 11 members of this protein family are encoded by genes distinct from those coding for the homologous cytoplasmic enzymes.
Sudianto, Edi; Wu, Chung-Shien; Lin, Ching-Ping; Chaw, Shu-Miaw
2016-01-01
Phylogeny of the ten Pinaceous genera has long been contentious. Plastid genomes (plastomes) provide an opportunity to resolve this problem because they contain rich evolutionary information. To comprehend the plastid phylogenomics of all ten Pinaceous genera, we sequenced the plastomes of two previously unavailable genera, Pseudolarix amabilis (122,234 bp) and Tsuga chinensis (120,859 bp). Both plastomes share similar gene repertoire and order. Here for the first time we report a unique insertion of tandem repeats in accD of T. chinensis. From the 65 plastid protein-coding genes common to all Pinaceous genera, we re-examined the phylogenetic relationship among all Pinaceous genera. Our two phylogenetic trees are congruent in an identical tree topology, with the five genera of the Abietoideae subfamily constituting a monophyletic clade separate from the other three subfamilies: Pinoideae, Piceoideae, and Laricoideae. The five genera of Abietoideae were grouped into two sister clades consisting of (1) Cedrus alone and (2) two sister subclades of Pseudolarix—Tsuga and Abies—Keteleeria, with the former uniquely losing the gene psaM and the latter specifically excluding the 3 psbA from the residual inverted repeat. PMID:27352945
Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro
2008-01-03
The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13 nucleotides long open reading frame and a second exon comprising the remaining of the putative coding region. The expression of these genes is induced at10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multi-cellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont
Lindsey, Amelia R. I.; Werren, John H.; Richards, Stephen; Stouthamer, Richard
2016-01-01
Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum. The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. PMID:27194801
Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont.
Lindsey, Amelia R I; Werren, John H; Richards, Stephen; Stouthamer, Richard
2016-07-07
Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. Copyright © 2016 Lindsey et al.
Sporophyte Formation and Life Cycle Completion in Moss Requires Heterotrimeric G-Proteins.
Hackenberg, Dieter; Perroud, Pierre-François; Quatrano, Ralph; Pandey, Sona
2016-10-01
In this study, we report the functional characterization of heterotrimeric G-proteins from a nonvascular plant, the moss Physcomitrella patens. In plants, G-proteins have been characterized from only a few angiosperms to date, where their involvement has been shown during regulation of multiple signaling and developmental pathways affecting overall plant fitness. In addition to its unparalleled evolutionary position in the plant lineages, the P. patens genome also codes for a unique assortment of G-protein components, which includes two copies of Gβ and Gγ genes, but no canonical Gα Instead, a single gene encoding an extra-large Gα (XLG) protein exists in the P. patens genome. Here, we demonstrate that in P. patens the canonical Gα is biochemically and functionally replaced by an XLG protein, which works in the same genetic pathway as one of the Gβ proteins to control its development. Furthermore, the specific G-protein subunits in P. patens are essential for its life cycle completion. Deletion of the genomic locus of PpXLG or PpGβ2 results in smaller, slower growing gametophores. Normal reproductive structures develop on these gametophores, but they are unable to form any sporophyte, the only diploid stage in the moss life cycle. Finally, the mutant phenotypes of ΔPpXLG and ΔPpGβ2 can be complemented by the homologous genes from Arabidopsis, AtXLG2 and AtAGB1, respectively, suggesting an overall conservation of their function throughout the plant evolution. © 2016 American Society of Plant Biologists. All Rights Reserved.
Identifying personal microbiomes using metagenomic codes
Franzosa, Eric A.; Huang, Katherine; Meadow, James F.; Gevers, Dirk; Lemon, Katherine P.; Bohannan, Brendan J. M.; Huttenhower, Curtis
2015-01-01
Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among 100s of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30–300 d later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability—a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability. PMID:25964341
RCDB: Renal Cancer Gene Database.
Ramana, Jayashree
2012-05-18
Renal cell carcinoma or RCC is one of the common and most lethal urological cancers, with 40% of the patients succumbing to death because of metastatic progression of the disease. Treatment of metastatic RCC remains highly challenging because of its resistance to chemotherapy as well as radiotherapy, besides surgical resection. Whereas RCC comprises tumors with differing histological types, clear cell RCC remains the most common. A major problem in the clinical management of patients presenting with localized ccRCC is the inability to determine tumor aggressiveness and accurately predict the risk of metastasis following surgery. As a measure to improve the diagnosis and prognosis of RCC, researchers have identified several molecular markers through a number of techniques. However the wealth of information available is scattered in literature and not easily amenable to data-mining. To reduce this gap, this work describes a comprehensive repository called Renal Cancer Gene Database, as an integrated gateway to study renal cancer related data. Renal Cancer Gene Database is a manually curated compendium of 240 protein-coding and 269 miRNA genes contributing to the etiology and pathogenesis of various forms of renal cell carcinomas. The protein coding genes have been classified according to the kind of gene alteration observed in RCC. RCDB also includes the miRNAsdysregulated in RCC, along with the corresponding information regarding the type of RCC and/or metastatic or prognostic significance. While some of the miRNA genes showed an association with other types of cancers few were unique to RCC. Users can query the database using keywords, category and chromosomal location of the genes. The knowledgebase can be freely accessed via a user-friendly web interface at http://www.juit.ac.in/attachments/jsr/rcdb/homenew.html. It is hoped that this database would serve as a useful complement to the existing public resources and as a good starting point for researchers and physicians interested in RCC genetics.
Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong
2017-01-01
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.
Shen, Jinhui; Cong, Qian; Grishin, Nick V
2015-09-01
Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.
Benoit, Joshua B; Attardo, Geoffrey M; Michalkova, Veronika; Krause, Tyler B; Bohova, Jana; Zhang, Qirui; Baumann, Aaron A; Mireji, Paul O; Takáč, Peter; Denlinger, David L; Ribeiro, Jose M; Aksoy, Serap
2014-04-01
In tsetse flies, nutrients for intrauterine larval development are synthesized by the modified accessory gland (milk gland) and provided in mother's milk during lactation. Interference with at least two milk proteins has been shown to extend larval development and reduce fecundity. The goal of this study was to perform a comprehensive characterization of tsetse milk proteins using lactation-specific transcriptome/milk proteome analyses and to define functional role(s) for the milk proteins during lactation. Differential analysis of RNA-seq data from lactating and dry (non-lactating) females revealed enrichment of transcripts coding for protein synthesis machinery, lipid metabolism and secretory proteins during lactation. Among the genes induced during lactation were those encoding the previously identified milk proteins (milk gland proteins 1-3, transferrin and acid sphingomyelinase 1) and seven new genes (mgp4-10). The genes encoding mgp2-10 are organized on a 40 kb syntenic block in the tsetse genome, have similar exon-intron arrangements, and share regions of amino acid sequence similarity. Expression of mgp2-10 is female-specific and high during milk secretion. While knockdown of a single mgp failed to reduce fecundity, simultaneous knockdown of multiple variants reduced milk protein levels and lowered fecundity. The genomic localization, gene structure similarities, and functional redundancy of MGP2-10 suggest that they constitute a novel highly divergent protein family. Our data indicates that MGP2-10 function both as the primary amino acid resource for the developing larva and in the maintenance of milk homeostasis, similar to the function of the mammalian casein family of milk proteins. This study underscores the dynamic nature of the lactation cycle and identifies a novel family of lactation-specific proteins, unique to Glossina sp., that are essential to larval development. The specificity of MGP2-10 to tsetse and their critical role during lactation suggests that these proteins may be an excellent target for tsetse-specific population control approaches.
Emergence and Evolution of Hominidae-Specific Coding and Noncoding Genomic Sequences.
Saber, Morteza Mahmoudi; Adeyemi Babarinde, Isaac; Hettiarachchi, Nilmini; Saitou, Naruya
2016-07-12
Family Hominidae, which includes humans and great apes, is recognized for unique complex social behavior and intellectual abilities. Despite the increasing genome data, however, the genomic origin of its phenotypic uniqueness has remained elusive. Clade-specific genes and highly conserved noncoding sequences (HCNSs) are among the high-potential evolutionary candidates involved in driving clade-specific characters and phenotypes. On this premise, we analyzed whole genome sequences along with gene orthology data retrieved from major DNA databases to find Hominidae-specific (HS) genes and HCNSs. We discovered that Down syndrome critical region 4 (DSCR4) is the only experimentally verified gene uniquely present in Hominidae. DSCR4 has no structural homology to any known protein and was inferred to have emerged in several steps through LTR/ERV1, LTR/ERVL retrotransposition, and transversion. Using the genomic distance as neutral evolution threshold, we identified 1,658 HS HCNSs. Polymorphism coverage and derived allele frequency analysis of HS HCNSs showed that these HCNSs are under purifying selection, indicating that they may harbor important functions. They are overrepresented in promoters/untranslated regions, in close proximity of genes involved in sensory perception of sound and developmental process, and also showed a significantly lower nucleosome occupancy probability. Interestingly, many ancestral sequences of the HS HCNSs showed very high evolutionary rates. This suggests that new functions emerged through some kind of positive selection, and then purifying selection started to operate to keep these functions. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Molecular characterization of the pL40 protein in Leptospira interrogans.
Zhao, Wei; Chen, Chun-Yan; Zhang, Xiang-Yan; Lai, Wei-Qiang; Hu, Bao-Yu; Zhao, Guo-Ping; Qin, Jin-Hong; Guo, Xiao-Kui
2009-06-01
Leptospirosis is a widespread zoonotic disease caused by pathogenic leptospires. The identification of outer membrane proteins (OMPs) conserved among pathogenic leptospires, which are exposed on the leptospiral surface and expressed during mammalian infection, has become a major focus of leptospirosis research. pL40, a 40 kDa protein coded by the LA3744 gene in Leptospira interrogans, was found to be unique to Leptospira. Triton X-114 fractionation and flow cytometry analyses indicate that pL40 is a component of the leptospiral outer membrane. The conservation of pL40 among Leptospira strains prevalent in China was confirmed by both Western blotting and PCR screening. Furthermore, the pL40 antigen could be recognized by sera from guinea pigs and mice infected with low-passage L. interrogans. These findings indicate that pL40 may serve as a useful serodiagnostic antigen and vaccine candidate for L. interrogans.
A novel TBP-TAF complex on RNA polymerase II-transcribed snRNA genes.
Zaborowska, Justyna; Taylor, Alice; Roeder, Robert G; Murphy, Shona
2012-01-01
Initiation of transcription of most human genes transcribed by RNA polymerase II (RNAP II) requires the formation of a preinitiation complex comprising TFIIA, B, D, E, F, H and RNAP II. The general transcription factor TFIID is composed of the TATA-binding protein and up to 13 TBP-associated factors. During transcription of snRNA genes, RNAP II does not appear to make the transition to long-range productive elongation, as happens during transcription of protein-coding genes. In addition, recognition of the snRNA gene-type specific 3' box RNA processing element requires initiation from an snRNA gene promoter. These characteristics may, at least in part, be driven by factors recruited to the promoter. For example, differences in the complement of TAFs might result in differential recruitment of elongation and RNA processing factors. As precedent, it already has been shown that the promoters of some protein-coding genes do not recruit all the TAFs found in TFIID. Although TAF5 has been shown to be associated with RNAP II-transcribed snRNA genes, the full complement of TAFs associated with these genes has remained unclear. Here we show, using a ChIP and siRNA-mediated approach, that the TBP/TAF complex on snRNA genes differs from that found on protein-coding genes. Interestingly, the largest TAF, TAF1, and the core TAFs, TAF10 and TAF4, are not detected on snRNA genes. We propose that this snRNA gene-specific TAF subset plays a key role in gene type-specific control of expression.
Retrieval of Enterobacteriaceae drug targets using singular value decomposition.
Silvério-Machado, Rita; Couto, Bráulio R G M; Dos Santos, Marcos A
2015-04-15
The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases. A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E.coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, therefore exhibiting the advantage of causing no adverse effects or at least no known adverse effects on humans. rita_silverio@hotmail.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
MHC class I-associated peptides derive from selective regions of the human genome.
Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude
2016-12-01
MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology.
MHC class I–associated peptides derive from selective regions of the human genome
Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre
2016-01-01
MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
NASA Astrophysics Data System (ADS)
Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin
2017-06-01
This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
Identification of three novel NHS mutations in families with Nance-Horan syndrome.
Huang, Kristen M; Wu, Junhua; Brooks, Simon P; Hardcastle, Alison J; Lewis, Richard Alan; Stambolian, Dwight
2007-03-27
Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. We identified three unique NHS coding region mutations in these NHS families. This report extends the number of unique identified NHS mutations to 14.
Zhao, Yujia; Fan, Jingjing; Li, Jinlin; Li, Jun; Zhou, Xiaohong; Li, Chun
2016-12-01
Small non-coding RNAs (sRNAs) have received much attention in recent years due to their unique biological properties, which can efficiently and specifically tune target gene expressions in bacteria. Inspired by natural sRNAs, recent works have proposed the use of artificial sRNAs (asRNAs) as genetic tools to regulate desired gene that has been applied in several fields, such as metabolic engineering and bacterial physiology studies. However, the rational design of asRNAs is still a challenge. In this study, we proposed structure and length as two criteria to implement rational visualized and precise design of asRNAs. T7 expression system was one of the most useful recombinant protein expression systems. However, it was deeply limited by the formation of inclusion body. To settle this problem, we designed a series of asRNAs to inhibit the T7 RNA polymerase (Gene1) expression to balance the rate between transcription and folding of recombinant protein. Based on the heterologous expression of Aspergillus oryzae Li-3 glucuronidase in E. coli , the asRNA-antigene1-17bp can effectively decrease the inclusion body and increase the enzyme activity by 169.9%.
Mehdizadeh Gohari, Iman; Kropinski, Andrew M; Weese, Scott J; Parreira, Valeria R; Whitehead, Ashley E; Boerlin, Patrick; Prescott, John F
2016-01-01
The recent discovery of a novel beta-pore-forming toxin, NetF, which is strongly associated with canine and foal necrotizing enteritis should improve our understanding of the role of type A Clostridium perfringens associated disease in these animals. The current study presents the complete genome sequence of two netF-positive strains, JFP55 and JFP838, which were recovered from cases of foal necrotizing enteritis and canine hemorrhagic gastroenteritis, respectively. Genome sequencing was done using Single Molecule, Real-Time (SMRT) technology-PacBio and Illumina Hiseq2000. The JFP55 and JFP838 genomes include a single 3.34 Mb and 3.53 Mb chromosome, respectively, and both genomes include five circular plasmids. Plasmid annotation revealed that three plasmids were shared by the two newly sequenced genomes, including a NetF/NetE toxins-encoding tcp-conjugative plasmid, a CPE/CPB2 toxins-encoding tcp-conjugative plasmid and a putative bacteriocin-encoding plasmid. The putative beta-pore-forming toxin genes, netF, netE and netG, were located in unique pathogenicity loci on tcp-conjugative plasmids. The C. perfringens JFP55 chromosome carries 2,825 protein-coding genes whereas the chromosome of JFP838 contains 3,014 protein-encoding genes. Comparison of these two chromosomes with three available reference C. perfringens chromosome sequences identified 48 (~247 kb) and 81 (~430 kb) regions unique to JFP55 and JFP838, respectively. Some of these divergent genomic regions in both chromosomes are phage- and plasmid-related segments. Sixteen of these unique chromosomal regions (~69 kb) were shared between the two isolates. Five of these shared regions formed a mosaic of plasmid-integrated segments, suggesting that these elements were acquired early in a clonal lineage of netF-positive C. perfringens strains. These results provide significant insight into the basis of canine and foal necrotizing enteritis and are the first to demonstrate that netF resides on a large and unique plasmid-encoded locus.
Lefebvre, François; Joly, David L.; Labbé, Caroline; Teichmann, Beate; Linning, Rob; Belzile, François; Bakkeren, Guus; Bélanger, Richard R.
2013-01-01
Pseudozyma flocculosa is related to the model plant pathogen Ustilago maydis yet is not a phytopathogen but rather a biocontrol agent of powdery mildews; this relationship makes it unique for the study of the evolution of plant pathogenicity factors. The P. flocculosa genome of ∼23 Mb includes 6877 predicted protein coding genes. Genome features, including hallmarks of pathogenicity, are very similar in P. flocculosa and U. maydis, Sporisorium reilianum, and Ustilago hordei. Furthermore, P. flocculosa, a strict anamorph, revealed conserved and seemingly intact mating-type and meiosis loci typical of Ustilaginales. By contrast, we observed the loss of a specific subset of candidate secreted effector proteins reported to influence virulence in U. maydis as the singular divergence that could explain its nonpathogenic nature. These results suggest that P. flocculosa could have once been a virulent smut fungus that lost the specific effectors necessary for host compatibility. Interestingly, the biocontrol agent appears to have acquired genes encoding secreted proteins not found in the compared Ustilaginales, including necrosis-inducing-Phytophthora-protein- and Lysin-motif- containing proteins believed to have direct relevance to its lifestyle. The genome sequence should contribute to new insights into the subtle genetic differences that can lead to drastic changes in fungal pathogen lifestyles. PMID:23800965
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.
O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F
2011-07-01
To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosinbinding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Regan, Kelly; Wang, Kanix; Doughty, Emily; Li, Haiquan; Li, Jianrong; Lee, Younghee; Kann, Maricel G
2012-01-01
Objective Although trait-associated genes identified as complex versus single-gene inheritance differ substantially in odds ratio, the authors nonetheless posit that their mechanistic concordance can reveal fundamental properties of the genetic architecture, allowing the automated interpretation of unique polymorphisms within a personal genome. Materials and methods An analytical method, SPADE-gen, spanning three biological scales was developed to demonstrate the mechanistic concordance between Mendelian and complex inheritance of Alzheimer's disease (AD) genes: biological functions (BP), protein interaction modeling, and protein domain implicated in the disease-associated polymorphism. Results Among Gene Ontology (GO) biological processes (BP) enriched at a false detection rate <5% in 15 AD genes of Mendelian inheritance (Online Mendelian Inheritance in Man) and independently in those of complex inheritance (25 host genes of intragenic AD single-nucleotide polymorphisms confirmed in genome-wide association studies), 16 overlapped (empirical p=0.007) and 45 were similar (empirical p<0.009; information theory). SPAN network modeling extended the canonical pathway of AD (KEGG) with 26 new protein interactions (empirical p<0.0001). Discussion The study prioritized new AD-associated biological mechanisms and focused the analysis on previously unreported interactions associated with the biological processes of polymorphisms that affect specific protein domains within characterized AD genes and their direct interactors using (1) concordant GO-BP and (2) domain interactions within STRING protein–protein interactions corresponding to the genomic location of the AD polymorphism (eg, EPHA1, APOE, and CD2AP). Conclusion These results are in line with unique-event polymorphism theory, indicating how disease-associated polymorphisms of Mendelian or complex inheritance relate genetically to those observed as ‘unique personal variants’. They also provide insight for identifying novel targets, for repositioning drugs, and for personal therapeutics. PMID:22319180
CXC chemokine ligand 4 induces a unique transcriptome in monocyte-derived macrophages.
Gleissner, Christian A; Shaked, Iftach; Little, Kristina M; Ley, Klaus
2010-05-01
In atherosclerotic arteries, blood monocytes differentiate to macrophages in the presence of growth factors, such as macrophage colony-stimulation factor (M-CSF), and chemokines, such as platelet factor 4 (CXCL4). To compare the gene expression signature of CXCL4-induced macrophages with M-CSF-induced macrophages or macrophages polarized with IFN-gamma/LPS (M1) or IL-4 (M2), we cultured primary human peripheral blood monocytes for 6 d. mRNA expression was measured by Affymetrix gene chips, and differences were analyzed by local pooled error test, profile of complex functionality, and gene set enrichment analysis. Three hundred seventy-five genes were differentially expressed between M-CSF- and CXCL4-induced macrophages; 206 of them overexpressed in CXCL4 macrophages coding for genes implicated in the inflammatory/immune response, Ag processing and presentation, and lipid metabolism. CXCL4-induced macrophages overexpressed some M1 and M2 genes and the corresponding cytokines at the protein level; however, their transcriptome clustered with neither M1 nor M2 transcriptomes. They almost completely lost the ability to phagocytose zymosan beads. Genes linked to atherosclerosis were not consistently upregulated or downregulated. Scavenger receptors showed lower and cholesterol efflux transporters showed higher expression in CXCL4- than M-CSF-induced macrophages, resulting in lower low-density lipoprotein content. We conclude that CXCL4 induces a unique macrophage transcriptome distinct from known macrophage types, defining a new macrophage differentiation that we propose to call M4.
López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel
2017-02-01
We described the complete mitochondrial genome (mitogenome) of the Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compared it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA and the ND6 genes. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropopidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.
Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning
2006-01-01
Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
Decoding the disease-associated proteins encoded in the human chromosome 4.
Chen, Lien-Chin; Liu, Mei-Ying; Hsiao, Yung-Chin; Choong, Wai-Kok; Wu, Hsin-Yi; Hsu, Wen-Lian; Liao, Pao-Chi; Sung, Ting-Yi; Tsai, Shih-Feng; Yu, Jau-Song; Chen, Yu-Ju
2013-01-04
Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.g., hepatocellular carcinoma) and biomedical research (cardiac system, aging, metabolic disorders, immune system, cancer and stem cell) related genes (e.g., oncogenes, growth factors). As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive the biomarker discovery and validation study in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/shedable proteins. Additionally, we also identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successful selected or multiple reaction monitoring (SRM/MRM) assays developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins while 27.9% of the 757 proteins do not have any experimental evidence at the protein-level. In summary, the analysis revealed that the chromosome 4 is a rich resource for cancer-associated proteins for biomarker verification projects and for drug target discovery projects.
D'Onofrio, Giuseppe; Ghosh, Tapash Chandra
2005-01-17
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences of the GC(3) levels, the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids were also affected during the compositional transition, showing that there exists a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of isochore organization of the human genome has been discussed on the basis of these results.
Shrestha, Prashanta; Smith, Mark Thomas; Bundy, Bradley Charles
2014-01-25
Site-specific incorporation of unnatural amino acids (uAAs) during protein synthesis expands the proteomic code through the addition of unique residue chemistry. This field provides a unique tool to improve pharmacokinetics, cancer treatments, vaccine development, proteomics and protein engineering. The limited ability to predict the characteristics of proteins with uAA-incorporation creates a need for a low-cost system with the potential for rapid screening. Escherichia coli-based cell-free protein synthesis is a compelling platform for uAA incorporation due to the open and accessible nature of the reaction environment. However, typical cell-free systems can be expensive due to the high cost of energizing reagents. By employing alternative energy sources, we reduce the cost of uAA-incorporation in CFPS by 55%. While alternative energy systems reduce cost, the time investment to develop gene libraries can remain cumbersome. Cell-free systems allow the direct use of PCR products known as linear expression templates, thus alleviating tedious plasmid library preparations steps. We report the specific costs of CFPS with uAA incorporation, demonstrate that LETs are suitable expression templates with uAA-incorporation, and consider the substantial reduction in labor intensity using LET-based expression for CFPS uAA incorporation. Copyright © 2013 Elsevier B.V. All rights reserved.
Comparative Genomics of Carp Herpesviruses
Kurobe, Tomofumi; Gatherer, Derek; Cunningham, Charles; Korf, Ian; Fukuda, Hideo; Hedrick, Ronald P.; Waltzek, Thomas B.
2013-01-01
Three alloherpesviruses are known to cause disease in cyprinid fish: cyprinid herpesviruses 1 and 3 (CyHV1 and CyHV3) in common carp and koi and cyprinid herpesvirus 2 (CyHV2) in goldfish. We have determined the genome sequences of CyHV1 and CyHV2 and compared them with the published CyHV3 sequence. The CyHV1 and CyHV2 genomes are 291,144 and 290,304 bp, respectively, in size, and thus the CyHV3 genome, at 295,146 bp, remains the largest recorded among the herpesviruses. Each of the three genomes consists of a unique region flanked at each terminus by a sizeable direct repeat. The CyHV1, CyHV2, and CyHV3 genomes are predicted to contain 137, 150, and 155 unique, functional protein-coding genes, respectively, of which six, four, and eight, respectively, are duplicated in the terminal repeat. The three viruses share 120 orthologous genes in a largely colinear arrangement, of which up to 55 are also conserved in the other member of the genus Cyprinivirus, anguillid herpesvirus 1. Twelve genes are conserved convincingly in all sequenced alloherpesviruses, and two others are conserved marginally. The reference CyHV3 strain has been reported to contain five fragmented genes that are presumably nonfunctional. The CyHV2 strain has two fragmented genes, and the CyHV1 strain has none. CyHV1, CyHV2, and CyHV3 have five, six, and five families of paralogous genes, respectively. One family unique to CyHV1 is related to cellular JUNB, which encodes a transcription factor involved in oncogenesis. To our knowledge, this is the first time that JUNB-related sequences have been reported in a herpesvirus. PMID:23269803
Djordjevic, Michael A; Chen, Han Cai; Natera, Siria; Van Noorden, Giel; Menzel, Christian; Taylor, Scott; Renard, Clotilde; Geiger, Otto; Weiller, Georg F
2003-06-01
A proteomic examination of Sinorhizobium meliloti strain 1021 was undertaken using a combination of 2-D gel electrophoresis, peptide mass fingerprinting, and bioinformatics. Our goal was to identify (i) putative symbiosis- or nutrient-stress-specific proteins, (ii) the biochemical pathways active under different conditions, (iii) potential new genes, and (iv) the extent of posttranslational modifications of S. meliloti proteins. In total, we identified the protein products of 810 genes (13.1% of the genome's coding capacity). The 810 genes generated 1,180 gene products, with chromosomal genes accounting for 78% of the gene products identified (18.8% of the chromosome's coding capacity). The activity of 53 metabolic pathways was inferred from bioinformatic analysis of proteins with assigned Enzyme Commission numbers. Of the remaining proteins that did not encode enzymes, ABC-type transporters composed 12.7% and regulatory proteins 3.4% of the total. Proteins with up to seven transmembrane domains were identified in membrane preparations. A total of 27 putative nodule-specific proteins and 35 nutrient-stress-specific proteins were identified and used as a basis to define genes and describe processes occurring in S. meliloti cells in nodules and under stress. Several nodule proteins from the plant host were present in the nodule bacteria preparations. We also identified seven potentially novel proteins not predicted from the DNA sequence. Post-translational modifications such as N-terminal processing could be inferred from the data. The posttranslational addition of UMP to the key regulator of nitrogen metabolism, PII, was demonstrated. This work demonstrates the utility of combining mass spectrometry with protein arraying or separation techniques to identify candidate genes involved in important biological processes and niche occupations that may be intransigent to other methods of gene expression profiling.
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
Abstract The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main proteins isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melangaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene a the URL: http://www.itba.mi. cnr.it/webgene and TRADAT (TRAncription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
The gene coding for the B cell surface protein CD19 is localized on human chromosome 16p11.
Stapleton, P; Kozmik, Z; Weith, A; Busslinger, M
1995-02-01
The CD19 gene codes for one of the earliest markers of the human B cell lineage and is a target for the B lymphoid-specific transcription factor BSAP (Pax-5). The transmembrane protein CD19 has been implicated in controlling proliferation of mature B lymphocytes by modulating signal transduction through the antigen receptor. In this study, we have employed Southern blot and fluorescence in situ hybridization analyses to localize the CD19 gene to human chromosome 16p11.
A murC gene in Porphyromonas gingivalis 381.
Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T
1995-09-01
The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing a gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned 51 kDa protein encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possibility of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.
Li, Weijun; Wang, Zongqing; Che, Yanli
2017-11-12
In this study, the complete mitochondrial genome of Cryptocercus meridianus was sequenced. The circular mitochondrial genome is 15,322 bp in size and contains 13 protein-coding genes, two ribosomal RNA genes (12S rRNA and 16S rRNA), 22 transfer RNA genes, and one D-loop region. We compare the mitogenome of C. meridianus with that of C. relictus and C. kyebangensis . The base composition of the whole genome was 45.20%, 9.74%, 16.06%, and 29.00% for A, G, C, and T, respectively; it shows a high AT content (74.2%), similar to the mitogenomes of C. relictus and C. kyebangensis . The protein-coding genes are initiated with typical mitochondrial start codons except for cox1 with TTG. The gene order of the C. meridianus mitogenome differs from the typical insect pattern for the translocation of tRNA-Ser AGN , while the mitogenomes of the other two Cryptocercus species, C. relictus and C. kyebangensis , are consistent with the typical insect pattern. There are two very long non-coding intergenic regions lying on both sides of the rearranged gene tRNA-Ser AGN . The phylogenetic relationships were constructed based on the nucleotide sequence of 13 protein-coding genes and two ribosomal RNA genes. The mitogenome of C. meridianus is the first representative of the order Blattodea that demonstrates rearrangement, and it will contribute to the further study of the phylogeny and evolution of the genus Cryptocercus and related taxa.
Chien, Maw-Sheng; Gilbert , Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.
1992-01-01
The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen, Renibacterium salmoninarum, was determined. The gene contained an opening reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr value of 57190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting the protein in synthesized as a 557-amino acid precursor and processed to produce a mature protein of Mr 54505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat, the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene.
Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua
2018-01-01
Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goat, and wild ruminant, causing significant socioeconomic losses worldwide. Up to now, the study concerning the molecular biology of M. marshalli is limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophylies of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.
The evolution of mollusc shells.
McDougall, Carmel; Degnan, Bernard M
2018-05-01
Molluscan shells are externally fabricated by specialized epithelial cells on the dorsal mantle. Although a conserved set of regulatory genes appears to underlie specification of mantle progenitor cells, the genes that contribute to the formation of the mature shell are incredibly diverse. Recent comparative analyses of mantle transcriptomes and shell proteomes of gastropods and bivalves are consistent with shell diversity being underpinned by a rapidly evolving mantle secretome (suite of genes expressed in the mantle that encode secreted proteins) that is the product of (a) high rates of gene co-option into and loss from the mantle gene regulatory network, and (b) the rapid evolution of coding sequences, particular those encoding repetitive low complexity domains. Outside a few conserved genes, such as carbonic anhydrase, a so-called "biomineralization toolkit" has yet to be discovered. Despite this, a common suite of protein domains, which are often associated with the extracellular matrix and immunity, appear to have been independently and often uniquely co-opted into the mantle secretomes of different species. The evolvability of the mantle secretome provides a molecular explanation for the evolution and diversity of molluscan shells. These genomic processes are likely to underlie the evolution of other animal biominerals, including coral and echinoderm skeletons. This article is categorized under: Comparative Development and Evolution > Regulation of Organ Diversity Comparative Development and Evolution > Evolutionary Novelties. © 2018 Wiley Periodicals, Inc.
Identification Of Protein Vaccine Candidates Using Comprehensive Proteomic Analysis Strategies
2007-12-01
urease (URE) gene codes for a urea amidohydrolase protein that catalyzes urea hydrolysis. The protein was first isolated from C. immitis and...the Cu, Zn, Superoxide Dismutase (SOD), the Spherule Outer Wall glycoprotein (SOWgp), the T-Cell Reactive Protein (TCRP), and Urease (URE). It is...et al. 1997. Isolation and characterization of the urease gene (URE) from the pathogenic fungus Coccidioides immitis. Gene 198: 387-391. 54. Li, K
Lipinska, B; Rao, A S; Bolten, B M; Balakrishnan, R; Goldberg, E B
1989-01-01
We sequenced bacteriophage T4 genes 2 and 3 and the putative C-terminal portion of gene 50. They were found to have appropriate open reading frames directed counterclockwise on the T4 map. Mutations in genes 2 and 64 were shown to be in the same open reading frame, which we now call gene 2. This gene codes for a protein of 27,068 daltons. The open reading frame corresponding to gene 3 codes for a protein of 20,634 daltons. Appropriate bands on polyacrylamide gels were identified at 30 and 20 kilodaltons, respectively. We found that the product of the cloned gene 2 can protect T4 DNA double-stranded ends from exonuclease V action. Images PMID:2644202
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.
The human protein S locus on chromosome 3 consists of two protein S genes, PS{alpha} and PS{beta}. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PS{alpha} gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5{prime}-noncoding region. Both transcripts contain a 5{prime}-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing ofmore » an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PS{alpha} gene fully supports the hypothesis that the protein S gene is the product of an evolutional assembling process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PS{beta} gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall the two genes PS{alpha} and PS{beta} show between their exonic sequences 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene has occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implicates that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.« less
Dynamic gene expression response to altered gravity in human T cells.
Thiel, Cora S; Hauschild, Swantje; Huge, Andreas; Tauber, Svantje; Lauber, Beatrice A; Polzer, Jennifer; Paulsen, Katrin; Lier, Hartwin; Engelmann, Frank; Schmitz, Burkhard; Schütte, Andreas; Layer, Liliana E; Ullrich, Oliver
2017-07-12
We investigated the dynamics of immediate and initial gene expression response to different gravitational environments in human Jurkat T lymphocytic cells and compared expression profiles to identify potential gravity-regulated genes and adaptation processes. We used the Affymetrix GeneChip® Human Transcriptome Array 2.0 containing 44,699 protein coding genes and 22,829 non-protein coding genes and performed the experiments during a parabolic flight and a suborbital ballistic rocket mission to cross-validate gravity-regulated gene expression through independent research platforms and different sets of control experiments to exclude other factors than alteration of gravity. We found that gene expression in human T cells rapidly responded to altered gravity in the time frame of 20 s and 5 min. The initial response to microgravity involved mostly regulatory RNAs. We identified three gravity-regulated genes which could be cross-validated in both completely independent experiment missions: ATP6V1A/D, a vacuolar H + -ATPase (V-ATPase) responsible for acidification during bone resorption, IGHD3-3/IGHD3-10, diversity genes of the immunoglobulin heavy-chain locus participating in V(D)J recombination, and LINC00837, a long intergenic non-protein coding RNA. Due to the extensive and rapid alteration of gene expression associated with regulatory RNAs, we conclude that human cells are equipped with a robust and efficient adaptation potential when challenged with altered gravitational environments.
Reuther, Peter; Göpfert, Kristina; Dudek, Alexandra H.; Heiner, Monika; Herold, Susanne; Schwemmle, Martin
2015-01-01
Influenza A viruses (IAV) pose a constant threat to the human population and therefore a better understanding of their fundamental biology and identification of novel therapeutics is of upmost importance. Various reporter-encoding IAV were generated to achieve these goals, however, one recurring difficulty was the genetic instability especially of larger reporter genes. We employed the viral NS segment coding for the non-structural protein 1 (NS1) and nuclear export protein (NEP) for stable expression of diverse reporter proteins. This was achieved by converting the NS segment into a single open reading frame (ORF) coding for NS1, the respective reporter and NEP. To allow expression of individual proteins, the reporter genes were flanked by two porcine Teschovirus-1 2A peptide (PTV-1 2A)-coding sequences. The resulting viruses encoding luciferases, fluorescent proteins or a Cre recombinase are characterized by a high genetic stability in vitro and in mice and can be readily employed for antiviral compound screenings, visualization of infected cells or cells that survived acute infection. PMID:26068081
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.
Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals
Popova, Olga V.; Mikhailov, Kirill V.; Nikitin, Mikhail A.; Logacheva, Maria D.; Penin, Aleksey A.; Muntyan, Maria S.; Kedrova, Olga S.; Petrov, Nikolai B.; Panchin, Yuri V.
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha—an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia. PMID:27755612
Identification and characterization of a novel zebrafish (Danio rerio) pentraxin-carbonic anhydrase.
Patrikainen, Maarit S; Tolvanen, Martti E E; Aspatwar, Ashok; Barker, Harlan R; Ortutay, Csaba; Jänis, Janne; Laitaoja, Mikko; Hytönen, Vesa P; Azizi, Latifeh; Manandhar, Prajwol; Jáger, Edit; Vullo, Daniela; Kukkurainen, Sampo; Hilvo, Mika; Supuran, Claudiu T; Parkkila, Seppo
2017-01-01
Carbonic anhydrases (CAs) are ubiquitous, essential enzymes which catalyze the conversion of carbon dioxide and water to bicarbonate and H + ions. Vertebrate genomes generally contain gene loci for 15-21 different CA isoforms, three of which are enzymatically inactive. CA VI is the only secretory protein of the enzymatically active isoforms. We discovered that non-mammalian CA VI contains a C-terminal pentraxin (PTX) domain, a novel combination for both CAs and PTXs. We isolated and sequenced zebrafish ( Danio rerio ) CA VI cDNA, complete with the sequence coding for the PTX domain, and produced the recombinant CA VI-PTX protein. Enzymatic activity and kinetic parameters were measured with a stopped-flow instrument. Mass spectrometry, analytical gel filtration and dynamic light scattering were used for biophysical characterization. Sequence analyses and Bayesian phylogenetics were used in generating hypotheses of protein structure and CA VI gene evolution. A CA VI-PTX antiserum was produced, and the expression of CA VI protein was studied by immunohistochemistry. A knock-down zebrafish model was constructed, and larvae were observed up to five days post-fertilization (dpf). The expression of ca6 mRNA was quantitated by qRT-PCR in different developmental times in morphant and wild-type larvae and in different adult fish tissues. Finally, the swimming behavior of the morphant fish was compared to that of wild-type fish. The recombinant enzyme has a very high carbonate dehydratase activity. Sequencing confirms a 530-residue protein identical to one of the predicted proteins in the Ensembl database (ensembl.org). The protein is pentameric in solution, as studied by gel filtration and light scattering, presumably joined by the PTX domains. Mass spectrometry confirms the predicted signal peptide cleavage and disulfides, and N-glycosylation in two of the four observed glycosylation motifs. Molecular modeling of the pentamer is consistent with the modifications observed in mass spectrometry. Phylogenetics and sequence analyses provide a consistent hypothesis of the evolutionary history of domains associated with CA VI in mammals and non-mammals. Briefly, the evidence suggests that ancestral CA VI was a transmembrane protein, the exon coding for the cytoplasmic domain was replaced by one coding for PTX domain, and finally, in the therian lineage, the PTX-coding exon was lost. We knocked down CA VI expression in zebrafish embryos with antisense morpholino oligonucleotides, resulting in phenotype features of decreased buoyancy and swim bladder deflation in 4 dpf larvae. These findings provide novel insights into the evolution, structure, and function of this unique CA form.
From Genomes to Protein Models and Back
NASA Astrophysics Data System (ADS)
Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico
2007-12-01
The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
Multiple Site-Directed and Saturation Mutagenesis by the Patch Cloning Method.
Taniguchi, Naohiro; Murakami, Hiroshi
2017-01-01
Constructing protein-coding genes with desired mutations is a basic step for protein engineering. Herein, we describe a multiple site-directed and saturation mutagenesis method, termed MUPAC. This method has been used to introduce multiple site-directed mutations in the green fluorescent protein gene and in the moloney murine leukemia virus reverse transcriptase gene. Moreover, this method was also successfully used to introduce randomized codons at five desired positions in the green fluorescent protein gene, and for simple DNA assembly for cloning.
Peng, Min; Jarett, Leonard; Meade, Ray; Madaio, Michael P.; Hancock, Wayne W.; George, Alfred L.; Neilson, Eric G.; Gasser, David L.
2008-01-01
Background Mice that are homozygous for the kidney disease (kd) mutation are apparently healthy for the first 8 weeks of life, but spontaneously develop a severe form of interstitial nephritis that progresses to end-stage renal disease (ESRD) by 4 to 8 months of age. By testing for linkage to microsatellite markers, we previously localized the kd gene to a YAC/BAC contig. Methods The sequence of the entire critical region was examined, and candidate genes were identified. These candidate genes were sequenced in both mutant (kd/kd) mice and normal controls. The phenotype was further characterized by immunohistochemistry and electron microscopy. Transgenic mice were constructed that carried the wild-type allele of the prime candidate gene, and this transgene was transferred to a kd/kd background by breeding. Results We have obtained evidence that kd is a mutant allele of a novel gene for a prenyltransferase-like mitochondrial protein (PLMP). This gene is alternatively spliced, with the larger gene product having one domain that resembles transprenyltransferase and another that is similar to geranylgeranyl pyrophosphate synthase. The smaller gene product includes only the first domain. An antiserum to PLMP localizes to mitochondria, and ultrastructural defects are present in the mitochondria of renal tubular epithelial cells, and to a lesser extent, hepatocytes and heart cells from kd/kd mice. In a line of kd/kd mice that carried the wild-type PLMP allele as a transgene, only 1 out of 13 animals expressed the disease by 120 days of age. Conclusion The kd allele codes for a novel protein that localizes to the mitochondria, and the kd/kd mouse has dysmorphic mitochondria in the renal tubular epithelial cells. This mouse is therefore a unique animal model for studying mechanisms that lead to tubulointerstitial nephritis. PMID:15200409
Ming-Xing, Lu; Zhi-Teng, Chen; Wei-Wei, Yu; Yu-Zhou, Du
2017-03-01
We report the complete mitochondrial genome (mitogenome) of a spiraling whitefly, Aleurodicus dispersus (Hemiptera: Aleyrodidae). The 16 170 bp long genome consists of 13 protein-coding genes, 20 transfer RNAs, 2 ribosomal RNAs, and a control region. The A. dispersus mitogenome also includes a cytb-like non-coding region and shows several variations relative to the typical insect mitogenome. A phylogenetic tree has been constructed using the 13 protein-coding genes of 12 related species from Hemiptera. Our results would contribute to further study of phylogeny in Aleyrodidae and Hemiptera.
Identification of three novel NHS mutations in families with Nance-Horan syndrome
Wu, Junhua; Brooks, Simon P.; Hardcastle, Alison J.; Lewis, Richard Alan; Stambolian, Dwight
2007-01-01
Purpose Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Methods Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. Results We identified three unique NHS coding region mutations in these NHS families. Conclusions This report extends the number of unique identified NHS mutations to 14. PMID:17417607
Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu
2017-01-04
The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) is contributed to various biological processes and linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represent about 20 times expansion when comparing to the previous released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulations of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco
2018-05-21
To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene ( HBX ) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
NASA Astrophysics Data System (ADS)
Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming
2018-01-01
Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
Shoura, Massa J; Gabdank, Idan; Hansen, Loren; Merker, Jason; Gotlib, Jason; Levene, Stephen D; Fire, Andrew Z
2017-10-05
Investigations aimed at defining the 3D configuration of eukaryotic chromosomes have consistently encountered an endogenous population of chromosome-derived circular genomic DNA, referred to as extrachromosomal circular DNA (eccDNA). While the production, distribution, and activities of eccDNAs remain understudied, eccDNA formation from specific regions of the linear genome has profound consequences on the regulatory and coding capabilities for these regions. Here, we define eccDNA distributions in Caenorhabditis elegans and in three human cell types, utilizing a set of DNA topology-dependent approaches for enrichment and characterization. The use of parallel biophysical, enzymatic, and informatic approaches provides a comprehensive profiling of eccDNA robust to isolation and analysis methodology. Results in human and nematode systems provide quantitative analysis of the eccDNA loci at both unique and repetitive regions. Our studies converge on and support a consistent picture, in which endogenous genomic DNA circles are present in normal physiological states, and in which the circles come from both coding and noncoding genomic regions. Prominent among the coding regions generating DNA circles are several genes known to produce a diversity of protein isoforms, with mucin proteins and titin as specific examples. Copyright © 2017 Shoura et al.
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions
Kumari, Pooja; Sampath, Karuna
2015-01-01
For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Jakubowska, Agata K; Peters, Sander A; Ziemnicka, Jadwiga; Vlak, Just M; van Oers, Monique M
2006-03-01
The genome sequence of a Polish isolate of Agrotis segetum nucleopolyhedrovirus (AgseNPV-A) was determined and analysed. The circular genome is composed of 147,544 bp and has a G+C content of 45.7 mol%. It contains 153 putative, non-overlapping open reading frames (ORFs) encoding predicted proteins of more than 50 aa, together making up 89.8 % of the genome. The remaining 10.2 % of the DNA constitutes non-coding regions and homologous-repeat regions. One hundred and forty-three AgseNPV-A ORFs are homologues of previously reported baculovirus gene sequences. There are ten unique ORFs and they account for 3 % of the genome in total. All 62 lepidopteran baculovirus genes, including the 29 core baculovirus genes, were found in the AgseNPV-A genome. The gene content and gene order of AgseNPV-A are most similar to those of Spodoptera exigua (Se) multiple NPV and their shared homologous genes are 100 % collinear. Three putative enhancin genes were identified in the AgseNPV-A genome. In phylogenetic analysis, the AgseNPV-A enhancins form a cluster separated from enhancins of the Mamestra species NPVs.
A human haploid gene trap collection to study lncRNAs with unusual RNA biology.
Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M
2016-01-01
Many thousand long non-coding (lnc) RNAs are mapped in the human genome. Time consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data shows that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identifies the previously uncharacterized LOC100288798 as a potential gene regulator.
Quantifying the mechanisms of domain gain in animal proteins.
Buljan, Marija; Frankish, Adam; Bateman, Alex
2010-01-01
Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Zhao, Fang; Huang, Dun-Yuan; Sun, Xiao-Yan; Shi, Qing-Hui; Hao, Jia-Sheng; Zhang, Lan-Lan; Yang, Qun
2013-10-01
The Riodinidae is one of the lepidopteran butterfly families. This study describes the complete mitochondrial genome of the butterfly species Abisara fylloides, the first mitochondrial genome of the Riodinidae family. The results show that the entire mitochondrial genome of A. fylloides is 15 301 bp in length, and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a 423 bp A+T-rich region. The gene content, orientation and order are identical to the majority of other lepidopteran insects. Phylogenetic reconstruction was conducted using the concatenated 13 protein-coding gene (PCG) sequences of 19 available butterfly species covering all the five butterfly families (Papilionidae, Nymphalidae, Peridae, Lycaenidae and Riodinidae). Both maximum likelihood and Bayesian inference analyses highly supported the monophyly of Lycaenidae+Riodinidae, which was standing as the sister of Nymphalidae. In addition, we propose that the riodinids be categorized into the family Lycaenidae as a subfamilial taxon. The Riodinidae is one of the lepidopteran butterfly families. This study describes the complete mitochondrial genome of the butterfly species Abisara fylloides , the first mitochondrial genome of the Riodinidae family. The results show that the entire mitochondrial genome of A. fylloides is 15 301 bp in length, and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a 423 bp A+T-rich region. The gene content, orientation and order are identical to the majority of other lepidopteran insects. Phylogenetic reconstruction was conducted using the concatenated 13 protein-coding gene (PCG) sequences of 19 available butterfly species covering all the five butterfly families (Papilionidae, Nymphalidae, Peridae, Lycaenidae and Riodinidae). Both maximum likelihood and Bayesian inference analyses highly supported the monophyly of Lycaenidae+Riodinidae, which was standing as the sister of Nymphalidae. In addition, we propose that the riodinids be categorized into the family Lycaenidae as a subfamilial taxon.
Taketa, Shin; Yuo, Takahisa; Tonooka, Takuji; Tsumuraya, Yoichi; Inagaki, Yoshiaki; Haruyama, Naoto; Larroque, Oscar; Jobling, Stephen A.
2012-01-01
(1,3;1,4)-β-D-glucans (mixed-linkage glucans) are found in tissues of members of the Poaceae (grasses), and are particularly high in barley (Hordeum vulgare) grains. The present study describes the isolation of three independent (1,3;1,4)-β-D-glucanless (betaglucanless; bgl) mutants of barley which completely lack (1,3;1,4)-β-D-glucan in all the tissues tested. The bgl phenotype cosegregates with the cellulose synthase like HvCslF6 gene on chromosome arm 7HL. Each of the bgl mutants has a single nucleotide substitution in the coding region of the HvCslF6 gene resulting in a change of a highly conserved amino acid residue of the HvCslF6 protein. Microsomal membranes isolated from developing endosperm of the bgl mutants lack detectable (1,3;1,4)-β-D-glucan synthase activity indicating that the HvCslF6 protein is inactive. This was confirmed by transient expression of the HvCslF6 cDNAs in Nicotiana benthamiana leaves. The wild-type HvCslF6 gene directed the synthesis of high levels of (1,3;1,4)-β-D-glucans, whereas the mutant HvCslF6 proteins completely lack the ability to synthesize (1,3;1,4)-β-D-glucans. The fine structure of the (1,3;1,4)-β-D-glucan produced in the tobacco leaf was also very different from that found in cereals having an extremely low DP3/DP4 ratio. These results demonstrate that, among the seven CslF and one CslH genes present in the barley genome, HvCslF6 has a unique role and is the key determinant controlling the biosynthesis of (1,3;1,4)-β-D-glucans. Natural allelic variation in the HvCslF6 gene was found predominantly within introns among 29 barley accessions studied. Genetic manipulation of the HvCslF6 gene could enable control of (1,3;1,4)-β-D-glucans in accordance with the purposes of use. PMID:21940720
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest
2006-01-01
Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
The Yersinia pestis gcvB gene encodes two small regulatory RNA molecules
McArthur, Sarah D; Pulvermacher, Sarah C; Stauffer, George V
2006-01-01
Background In recent years it has become clear that small non-coding RNAs function as regulatory elements in bacterial virulence and bacterial stress responses. We tested for the presence of the small non-coding GcvB RNAs in Y. pestis as possible regulators of gene expression in this organism. Results In this study, we report that the Yersinia pestis KIM6 gcvB gene encodes two small RNAs. Transcription of gcvB is activated by the GcvA protein and repressed by the GcvR protein. The gcvB-encoded RNAs are required for repression of the Y. pestis dppA gene, encoding the periplasmic-binding protein component of the dipeptide transport system, showing that the GcvB RNAs have regulatory activity. A deletion of the gcvB gene from the Y. pestis KIM6 chromosome results in a decrease in the generation time of the organism as well as a change in colony morphology. Conclusion The results of this study indicate that the Y. pestis gcvB gene encodes two small non-coding regulatory RNAs that repress dppA expression. A gcvB deletion is pleiotropic, suggesting that the sRNAs are likely involved in controlling genes in addition to dppA. PMID:16768793
Van Hoeck, Arne; Horemans, Nele; Monsieurs, Pieter; Cao, Hieu Xuan; Vandenhove, Hildegarde; Blust, Ronny
2015-01-01
Freshwater duckweed, comprising the smallest, fastest growing and simplest macrophytes has various applications in agriculture, phytoremediation and energy production. Lemna minor, the so-called common duckweed, is a model system of these aquatic plants for ecotoxicological bioassays, genetic transformation tools and industrial applications. Given the ecotoxic relevance and high potential for biomass production, whole-genome information of this cosmopolitan duckweed is needed. The 472 Mbp assembly of the L. minor genome (2n = 40; estimated 481 Mbp; 98.1 %) contains 22,382 protein-coding genes and 61.5 % repetitive sequences. The repeat content explains 94.5 % of the genome size difference in comparison with the greater duckweed, Spirodela polyrhiza (2n = 40; 158 Mbp; 19,623 protein-coding genes; and 15.79 % repetitive sequences). Comparison of proteins from other monocot plants, protein ortholog identification, OrthoMCL, suggests 1356 duckweed-specific groups (3367 proteins, 15.0 % total L. minor proteins) and 795 Lemna-specific groups (2897 proteins, 12.9 % total L. minor proteins). Interestingly, proteins involved in biosynthetic processes in response to various stimuli and hydrolase activities are enriched in the Lemna proteome in comparison with the Spirodela proteome. The genome sequence and annotation of L. minor protein-coding genes provide new insights in biological understanding and biomass production applications of Lemna species.
Elam, W Austin; Schrank, Travis P; Campagnolo, Andrew J; Hilser, Vincent J
2013-04-01
Intrinsically disordered (ID) proteins function in the absence of a unique stable structure and appear to challenge the classic structure-function paradigm. The extent to which ID proteins take advantage of subtle conformational biases to perform functions, and whether signals for such mechanism can be identified in proteome-wide studies is not well understood. Of particular interest is the polyproline II (PII) conformation, suggested to be highly populated in unfolded proteins. We experimentally determine a complete calorimetric propensity scale for the PII conformation. Projection of the scale into representative eukaryotic proteomes reveals significant PII bias in regions coding for ID proteins. Importantly, enrichment of PII in ID proteins, or protein segments, is also captured by other PII scales, indicating that this enrichment is robustly encoded and universally detectable regardless of the method of PII propensity determination. Gene ontology (GO) terms obtained using our PII scale and other scales demonstrate a consensus for molecular functions performed by high PII proteins across the proteome. Perhaps the most striking result of the GO analysis is conserved enrichment (P < 10(-8) ) of phosphorylation sites in high PII regions found by all PII scales. Subsequent conformational analysis reveals a phosphorylation-dependent modulation of PII, suggestive of a conserved "tunability" within these regions. In summary, the application of an experimentally determined polyproline II (PII) propensity scale to proteome-wide sequence analysis and gene ontology reveals an enrichment of PII bias near disordered phosphorylation sites that is conserved throughout eukaryotes. Copyright © 2013 The Protein Society.
Gao, Zhongshan; Weg, Eric W van de; Matos, Catarina I; Arens, Paul; Bolhaar, Suzanne THP; Knulst, Andre C; Li, Yinghui; Hoffmann-Sommergruber, Karin; Gilissen, Luud JWJ
2008-01-01
Background Mal d 1 is a major apple allergen causing food allergic symptoms of the oral allergy syndrome (OAS) in birch-pollen sensitised patients. The Mal d 1 gene family is known to have at least 7 intron-containing and 11 intronless members that have been mapped in clusters on three linkage groups. In this study, the allelic diversity of the seven intron-containing Mal d 1 genes was assessed among a set of apple cultivars by sequencing or indirectly through pedigree genotyping. Protein variant constitutions were subsequently compared with Skin Prick Test (SPT) responses to study the association of deduced protein variants with allergenicity in a set of 14 cultivars. Results From the seven intron-containing Mal d 1 genes investigated, Mal d 1.01 and Mal d 1.02 were highly conserved, as nine out of ten cultivars coded for the same protein variant, while only one cultivar coded for a second variant. Mal d 1.04, Mal d 1.05 and Mal d 1.06 A, B and C were more variable, coding for three to six different protein variants. Comparison of Mal d 1 allelic composition between the high-allergenic cultivar Golden Delicious and the low-allergenic cultivars Santana and Priscilla, which are linked in pedigree, showed an association between the protein variants coded by the Mal d 1.04 and -1.06A genes (both located on linkage group 16) with allergenicity. This association was confirmed in 10 other cultivars. In addition, Mal d 1.06A allele dosage effects associated with the degree of allergenicity based on prick to prick testing. Conversely, no associations were observed for the protein variants coded by the Mal d 1.01 (on linkage group 13), -1.02, -1.06B, -1.06C genes (all on linkage group 16), nor by the Mal d 1.05 gene (on linkage group 6). Conclusion Protein variant compositions of Mal d 1.04 and -1.06A and, in case of Mal d 1.06A, allele doses are associated with the differences in allergenicity among fourteen apple cultivars. This information indicates the involvement of qualitative as well as quantitative factors in allergenicity and warrants further research in the relative importance of quantitative and qualitative aspects of Mal d 1 gene expression on allergenicity. Results from this study have implications for medical diagnostics, immunotherapy, clinical research and breeding schemes for new hypo-allergenic cultivars. PMID:19014530
Zhao, Pengju; Yu, Ying; Feng, Wen; Du, Heng; Yu, Jian; Kang, Huimin; Zheng, Xianrui; Wang, Zhiquan; Liu, George E; Ernst, Catherine W; Ran, Xueqin; Wang, Jiafu; Liu, Jian-Feng
2018-05-01
Meishan is a pig breed indigenous to China and famous for its high fecundity. The traits of Meishan are strongly associated with its distinct evolutionary history and domestication. However, the genomic evidence linking the domestication of Meishan pigs with its unique features is still poorly understood. The goal of this study is to investigate the genomic signatures and evolutionary evidence related to the phenotypic traits of Meishan via large-scale sequencing. We found that the unique domestication of Meishan pigs occurred in the Taihu Basin area between the Majiabang and Liangzhu Cultures, during which 300 protein-coding genes have underwent positive selection. Notably, enrichment of the FoxO signaling pathway with significant enrichment signal and the harbored gene IGF1R were likely associated with the high fertility of Meishan pigs. Moreover, NFKB1 exhibited strong selective sweep signals and positively participated in hyaluronan biosynthesis as the key gene of NF-kB signaling, which may have resulted in the wrinkled skin and face of Meishan pigs. Particularly, three population-specific synonymous single-nucleotide variants occurred in PYROXD1, MC1R, and FAM83G genes; the T305C substitution in the MCIR gene explained the black coat of the Meishan pigs well. In addition, the shared haplotypes between Meishan and Duroc breeds confirmed the previous Asian-derived introgression and demonstrated the specific contribution of Meishan pigs. These findings will help us explain the unique genetic and phenotypic characteristics of Meishan pigs and offer a plausible method for their utilization of Meishan pigs as valuable genetic resources in pig breeding and as an animal model for human wrinkled skin disease research.
Dos Reis, Mario
2015-04-01
First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Turgeon, B; Saba-El-Leil, M K; Meloche, S
2000-02-15
MAP (mitogen-activated protein) kinases are a family of serine/threonine kinases that have a pivotal role in signal transduction. Here we report the cloning and characterization of a mouse homologue of extracellular-signal-regulated protein kinase (ERK)3. The mouse Erk3 cDNA encodes a predicted protein of 720 residues, which displays 94% identity with human ERK3. Transcription and translation of this cDNA in vitro generates a 100 kDa protein similar to the human gene product ERK3. Immunoblot analysis with an antibody raised against a unique sequence of ERK3 also recognizes a 100 kDa protein in mouse tissues. A single transcript of Erk3 was detected in every adult mouse tissue examined, with the highest expression being found in the brain. Interestingly, expression of Erk3 mRNA is acutely regulated during mouse development, with a peak of expression observed at embryonic day 11. The mouse Erk3 gene was mapped to a single locus on central mouse chromosome 9, adjacent to the dilute mutation locus and in a region syntenic to human chromosome 15q21. Finally, we provide several lines of evidence to support the existence of a unique Erk3 gene product of 100 kDa in mammalian cells.
Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai
2016-05-01
In this study we sequenced the complete mitochondrial genome sequencing of a heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus) for the first time. The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Emblem, Åse; Karlsen, Bård Ove; Evertsen, Jussi; Johansen, Steinar D
2011-11-01
Group I introns are genetic insertion elements that invade host genomes in a wide range of organisms. In metazoans, however, group I introns are extremely rare, so far only identified within mitogenomes of hexacorals and some sponges. We sequenced the complete mitogenome of the cold-water scleractinian coral Lophelia pertusa, the dominating deep sea reef-building coral species in the North Atlantic Ocean. The mitogenome (16,150 bp) has the same gene content but organized in a unique gene order compared to that of other known scleractinian corals. A complex group I intron (6460 bp) inserted in the ND5 gene (position 717) was found to host seven essential mitochondrial protein genes and one ribosomal RNA gene. Phylogenetic analysis supports a vertical inheritance pattern of the ND5-717 intron among hexacoral mitogenomes with no examples of intron loss. Structural assessments of the Lophelia intron revealed an unusual organization that lacks the universally conserved ωG at the 3' end, as well as a highly compact RNA core structure with overlapping ribozyme and protein coding capacities. Based on phylogenetic and structural analyses we reconstructed the evolutionary history of ND5-717, from its ancestral protist origin, through intron loss in some early metazoan lineages, and into a compulsory feature with functional implications in hexacorals. Copyright © 2011 Elsevier Inc. All rights reserved.
2014-01-01
Background Nrd1 and Nab3 are essential sequence-specific yeast RNA binding proteins that function as a heterodimer in the processing and degradation of diverse classes of RNAs. These proteins also regulate several mRNA coding genes; however, it remains unclear exactly what percentage of the mRNA component of the transcriptome these proteins control. To address this question, we used the pyCRAC software package developed in our laboratory to analyze CRAC and PAR-CLIP data for Nrd1-Nab3-RNA interactions. Results We generated high-resolution maps of Nrd1-Nab3-RNA interactions, from which we have uncovered hundreds of new Nrd1-Nab3 mRNA targets, representing between 20 and 30% of protein-coding transcripts. Although Nrd1 and Nab3 showed a preference for binding near 5′ ends of relatively short transcripts, they bound transcripts throughout coding sequences and 3′ UTRs. Moreover, our data for Nrd1-Nab3 binding to 3′ UTRs was consistent with a role for these proteins in the termination of transcription. Our data also support a tight integration of Nrd1-Nab3 with the nutrient response pathway. Finally, we provide experimental evidence for some of our predictions, using northern blot and RT-PCR assays. Conclusions Collectively, our data support the notion that Nrd1 and Nab3 function is tightly integrated with the nutrient response and indicate a role for these proteins in the regulation of many mRNA coding genes. Further, we provide evidence to support the hypothesis that Nrd1-Nab3 represents a failsafe termination mechanism in instances of readthrough transcription. PMID:24393166
Biederman, Michelle K; Nelson, Megan M; Asalone, Kathryn C; Pedersen, Alyssa L; Saldanha, Colin J; Bracht, John R
2018-05-21
Developmentally programmed genome rearrangements are rare in vertebrates, but have been reported in scattered lineages including the bandicoot, hagfish, lamprey, and zebra finch (Taeniopygia guttata) [1]. In the finch, a well-studied animal model for neuroendocrinology and vocal learning [2], one such programmed genome rearrangement involves a germline-restricted chromosome, or GRC, which is found in germlines of both sexes but eliminated from mature sperm [3, 4]. Transmitted only through the oocyte, it displays uniparental female-driven inheritance, and early in embryonic development is apparently eliminated from all somatic tissue in both sexes [3, 4]. The GRC comprises the longest finch chromosome at over 120 million base pairs [3], and previously the only known GRC-derived sequence was repetitive and non-coding [5]. Because the zebra finch genome project was sourced from male muscle (somatic) tissue [6], the remaining genomic sequence and protein-coding content of the GRC remain unknown. Here we report the first protein-coding gene from the GRC: a member of the α-soluble N-ethylmaleimide sensitive fusion protein (NSF) attachment protein (α-SNAP) family hitherto missing from zebra finch gene annotations. In addition to the GRC-encoded α-SNAP, we find an additional paralogous α-SNAP residing in the somatic genome (a somatolog)-making the zebra finch the first example in which α-SNAP is not a single-copy gene. We show divergent, sex-biased expression for the paralogs and also that positive selection is detectable across the bird α-SNAP lineage, including the GRC-encoded α-SNAP. This study presents the identification and evolutionary characterization of the first protein-coding GRC gene in any organism. Copyright © 2018 Elsevier Ltd. All rights reserved.
2013-01-01
Background Most eukaryotic species represent stable karyotypes with a particular diploid number. B chromosomes are additional to standard karyotypes and may vary in size, number and morphology even between cells of the same individual. For many years it was generally believed that B chromosomes found in some plant, animal and fungi species lacked active genes. Recently, molecular cytogenetic studies showed the presence of additional copies of protein-coding genes on B chromosomes. However, the transcriptional activity of these genes remained elusive. We studied karyotypes of the Siberian roe deer (Capreolus pygargus) that possess up to 14 B chromosomes to investigate the presence and expression of genes on supernumerary chromosomes. Results Here, we describe a 2 Mbp region homologous to cattle chromosome 3 and containing TNNI3K (partial), FPGT, LRRIQ3 and a large gene-sparse segment on B chromosomes of the Siberian roe deer. The presence of the copy of the autosomal region was demonstrated by B-specific cDNA analysis, PCR assisted mapping, cattle bacterial artificial chromosome (BAC) clone localization and quantitative polymerase chain reaction (qPCR). By comparative analysis of B-specific and non-B chromosomal sequences we discovered some B chromosome-specific mutations in protein-coding genes, which further enabled the detection of a FPGT-TNNI3K transcript expressed from duplicated genes located on B chromosomes in roe deer fibroblasts. Conclusions Discovery of a large autosomal segment in all B chromosomes of the Siberian roe deer further corroborates the view of an autosomal origin for these elements. Detection of a B-derived transcript in fibroblasts implies that the protein coding sequences located on Bs are not fully inactivated. The origin, evolution and effect on host of B chromosomal genes seem to be similar to autosomal segmental duplications, which reinforces the view that supernumerary chromosomal elements might play an important role in genome evolution. PMID:23915065
Feng, X; Happ, G M
1996-11-14
The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
Sudianto, Edi; Wu, Chung-Shien; Lin, Ching-Ping; Chaw, Shu-Miaw
2016-06-27
Phylogeny of the ten Pinaceous genera has long been contentious. Plastid genomes (plastomes) provide an opportunity to resolve this problem because they contain rich evolutionary information. To comprehend the plastid phylogenomics of all ten Pinaceous genera, we sequenced the plastomes of two previously unavailable genera, Pseudolarix amabilis (122,234 bp) and Tsuga chinensis (120,859 bp). Both plastomes share similar gene repertoire and order. Here for the first time we report a unique insertion of tandem repeats in accD of T. chinensis From the 65 plastid protein-coding genes common to all Pinaceous genera, we re-examined the phylogenetic relationship among all Pinaceous genera. Our two phylogenetic trees are congruent in an identical tree topology, with the five genera of the Abietoideae subfamily constituting a monophyletic clade separate from the other three subfamilies: Pinoideae, Piceoideae, and Laricoideae. The five genera of Abietoideae were grouped into two sister clades consisting of (1) Cedrus alone and (2) two sister subclades of Pseudolarix-Tsuga and Abies-Keteleeria, with the former uniquely losing the gene psaM and the latter specifically excluding the 3 psbA from the residual inverted repeat. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Endo, Akihito; Tanizawa, Yasuhiro; Tanaka, Naoto; ...
2015-12-29
In this study, Fructobacillus spp. in fructose-rich niches belong to the family Leuconostocaceae. They were originally classified as Leuconostoc spp., but were later grouped into a novel genus, Fructobacillus , based on their phylogenetic position, morphology and specific biochemical characteristics. The unique characters, so called fructophilic characteristics, had not been reported in the group of lactic acid bacteria, suggesting unique evolution at the genome level. Here we studied four draft genome sequences of Fructobacillus spp. and compared their metabolic properties against those of Leuconostoc spp. As a result, Fructobacillus species possess significantly less protein coding sequences in their small genomes.more » The number of genes was significantly smaller in carbohydrate transport and metabolism. Several other metabolic pathways, including TCA cycle, ubiquinone and other terpenoid-quinone biosynthesis and phosphotransferase systems, were characterized as discriminative pathways between the two genera. The adhE gene for bifunctional acetaldehyde/alcohol dehydrogenase, and genes for subunits of the pyruvate dehydrogenase complex were absent in Fructobacillus spp. The two genera also show different levels of GC contents, which are mainly due to the different GC contents at the third codon position. In conclusion, the present genome characteristics in Fructobacillus spp. suggest reductive evolution that took place to adapt to specific niches.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Endo, Akihito; Tanizawa, Yasuhiro; Tanaka, Naoto
In this study, Fructobacillus spp. in fructose-rich niches belong to the family Leuconostocaceae. They were originally classified as Leuconostoc spp., but were later grouped into a novel genus, Fructobacillus , based on their phylogenetic position, morphology and specific biochemical characteristics. The unique characters, so called fructophilic characteristics, had not been reported in the group of lactic acid bacteria, suggesting unique evolution at the genome level. Here we studied four draft genome sequences of Fructobacillus spp. and compared their metabolic properties against those of Leuconostoc spp. As a result, Fructobacillus species possess significantly less protein coding sequences in their small genomes.more » The number of genes was significantly smaller in carbohydrate transport and metabolism. Several other metabolic pathways, including TCA cycle, ubiquinone and other terpenoid-quinone biosynthesis and phosphotransferase systems, were characterized as discriminative pathways between the two genera. The adhE gene for bifunctional acetaldehyde/alcohol dehydrogenase, and genes for subunits of the pyruvate dehydrogenase complex were absent in Fructobacillus spp. The two genera also show different levels of GC contents, which are mainly due to the different GC contents at the third codon position. In conclusion, the present genome characteristics in Fructobacillus spp. suggest reductive evolution that took place to adapt to specific niches.« less
Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne
2009-06-01
Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1-27 codons in comparison of L. perenne to other Poaceae and 1-68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.
Xia, Kai; Li, Yudong; Sun, Jing; Liang, Xinle
2016-01-01
Acetobacter pasteurianus, an acetic acid resistant bacterium belonging to alpha-proteobacteria, has been widely used to produce vinegar in the food industry. To understand the mechanism of its high tolerance to acetic acid and robust ability of oxidizing ethanol to acetic acid (> 12%, w/v), we described the 3.1 Mb complete genome sequence (including 0.28 M plasmid sequence) with a G+C content of 52.4% of A. pasteurianus Ab3, which was isolated from the traditional Chinese rice vinegar (Meiguichu) fermentation process. Automatic annotation of the complete genome revealed 2,786 protein-coding genes and 73 RNA genes. The comparative genome analysis among A. pasteurianus strains revealed that A. pasteurianus Ab3 possesses many unique genes potentially involved in acetic acid resistance mechanisms. In particular, two-component systems or toxin-antitoxin systems may be the signal pathway and modulatory network in A. pasteurianus to cope with acid stress. In addition, the large numbers of unique transport systems may also be related to its acid resistance capacity and cell fitness. Our results provide new clues to understanding the underlying mechanisms of acetic acid resistance in Acetobacter species and guiding industrial strain breeding for vinegar fermentation processes.
Xia, Kai; Li, Yudong; Sun, Jing; Liang, Xinle
2016-01-01
Acetobacter pasteurianus, an acetic acid resistant bacterium belonging to alpha-proteobacteria, has been widely used to produce vinegar in the food industry. To understand the mechanism of its high tolerance to acetic acid and robust ability of oxidizing ethanol to acetic acid (> 12%, w/v), we described the 3.1 Mb complete genome sequence (including 0.28 M plasmid sequence) with a G+C content of 52.4% of A. pasteurianus Ab3, which was isolated from the traditional Chinese rice vinegar (Meiguichu) fermentation process. Automatic annotation of the complete genome revealed 2,786 protein-coding genes and 73 RNA genes. The comparative genome analysis among A. pasteurianus strains revealed that A. pasteurianus Ab3 possesses many unique genes potentially involved in acetic acid resistance mechanisms. In particular, two-component systems or toxin-antitoxin systems may be the signal pathway and modulatory network in A. pasteurianus to cope with acid stress. In addition, the large numbers of unique transport systems may also be related to its acid resistance capacity and cell fitness. Our results provide new clues to understanding the underlying mechanisms of acetic acid resistance in Acetobacter species and guiding industrial strain breeding for vinegar fermentation processes. PMID:27611790
Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang
2012-05-01
The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and analyze the subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) length of Xuhuai goat H-FABP gene was 402 bp, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarity were 89%, 76%, 85%, 84%, 93%, 91%, 70%, respectively. For the corresponding amino acid sequences, the similarity were 90%, 79%, 88%, 97%, 95%, 94%, 72%, respectively. This study did not find the signal peptide region in the H-FABP protein; it revealed that H-FABP protein might be a nonsecreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
[Regulation of heat shock gene expression in response to stress].
Garbuz, D G
2017-01-01
Heat shock (HS) genes, or stress genes, code for a number of proteins that collectively form the most ancient and universal stress defense system. The system determines the cell capability of adaptation to various adverse factors and performs a variety of auxiliary functions in normal physiological conditions. Common stress factors, such as higher temperatures, hypoxia, heavy metals, and others, suppress transcription and translation for the majority of genes, while HS genes are upregulated. Transcription of HS genes is controlled by transcription factors of the HS factor (HSF) family. Certain HSFs are activated on exposure to higher temperatures or other adverse factors to ensure stress-induced HS gene expression, while other HSFs are specifically activated at particular developmental stages. The regulation of the main mammalian stress-inducible factor HSF1 and Drosophila melanogaster HSF includes many components, such as a variety of early warning signals indicative of abnormal cell activity (e.g., increases in intracellular ceramide, cytosolic calcium ions, or partly denatured proteins); protein kinases, which phosphorylate HSFs at various Ser residues; acetyltransferases; and regulatory proteins, such as SUMO and HSBP1. Transcription factors other than HSFs are also involved in activating HS gene transcription; the set includes D. melanogaster GAF, mammalian Sp1 and NF-Y, and other factors. Transcription of several stress genes coding for molecular chaperones of the glucose-regulated protein (GRP) family is predominantly regulated by another stress-detecting system, which is known as the unfolded protein response (UPR) system and is activated in response to massive protein misfolding in the endoplasmic reticulum and mitochondrial matrix. A translational fine tuning of HS protein expression occurs via changing the phosphorylation status of several proteins involved in translation initiation. In addition, specific signal sequences in the 5'-UTRs of some HS protein mRNAs ensure their preferential translation in stress.
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus spelendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Zhang, Bo; Zhang, Yan-Hong; Wang, Xin; Zhang, Hui-Xian; Lin, Qiang
2017-07-01
The deep sea is one of the most extensive ecosystems on earth. Organisms living there survive in an extremely harsh environment, and their mitochondrial energy metabolism might be a result of evolution. As one of the most important organelles, mitochondria generate energy through energy metabolism and play an important role in almost all biological activities. In this study, the mitogenome of a deep-sea sea anemone ( Bolocera sp.) was sequenced and characterized. Like other metazoans, it contained 13 energy pathway protein-coding genes and two ribosomal RNAs. However, it also exhibited some unique features: just two transfer RNA genes, two group I introns, two transposon-like noncanonical open reading frames (ORFs), and a control region-like (CR-like) element. All of the mitochondrial genes were coded by the same strand (the H-strand). The genetic order and orientation were identical to those of most sequenced actiniarians. Phylogenetic analyses showed that this species was closely related to Bolocera tuediae . Positive selection analysis showed that three residues (31 L and 42 N in ATP6 , 570 S in ND5 ) of Bolocera sp. were positively selected sites. By comparing these features with those of shallow sea anemone species, we deduced that these novel gene features may influence the activity of mitochondrial genes. This study may provide some clues regarding the adaptation of Bolocera sp. to the deep-sea environment.
Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).
Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping
2015-01-01
We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of C. sphinx complete mitochondrial genome was 16,895 bp with the base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long with the sequence CATACG repeat 64 times. Three protein-coding genes (ND1, COI and ND4) were ended with incomplete stop codon TA or T.
Niu, Fang-Fang; Zhu, Liang; Wang, Su; Wei, Shu-Jun
2016-07-01
Here, we report the mitochondrial genome sequence of the multicolored Asian lady beetle Harmonia axyridis (Pallas, 1773) (Coleoptera: Coccinellidae) (GenBank accession No. KR108208). This is the first species with sequenced mitochondrial genome from the genus Harmonia. The current length with partitial A + T-rich region of this mitochondrial genome is 16,387 bp. All the typical genes were sequenced except the trnI and trnQ. As in most other sequenced mitochondrial genomes of Coleoptera, there is no re-arrangement in the sequenced region compared with the pupative ancestral arrangement of insects. All protein-coding genes start with ATN codons. Five, five and three protein-coding genes stop with termination codon TAA, TA and T, respectively. Phylogenetic analysis using Bayesian method based on the first and second codon positions of the protein-coding genes supported that the Scirtidae is a basal lineage of Polyphaga. The Harmonia and the Coccinella form a sister lineage. The monophyly of Staphyliniformia, Scarabaeiformia and Cucujiformia was supported. The Buprestidae was found to be a sister group to the Bostrichiformia.
Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N
2016-04-01
Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Li, Wei; Zhang, Xin-Cheng; Zhao, Jian; Shi, Yan; Zhu, Xin-Ping
2015-01-25
Cuora trifasciata has become one of the most critically endangered species in the world. The complete mitochondrial genome of C. trifasciata (Chinese three-striped box turtle) was determined in this study. Its mitochondrial genome is a 16,575-bp-long circular molecule that consists of 37 genes that are typically found in other vertebrates. And the basic characteristics of the C. trifasciata mitochondrial genome were also determined. Moreover, a comparison of C. trifasciata with Cuora cyclornata, Cuora pani and Cuora aurocapitata indicated that the four mitogenomics differed in length, codons, overlaps, 13 protein-coding genes (PCGs), ND3, rRNA genes, control region, and other aspects. Phylogenetic analysis with Bayesian inference and maximum likelihood based on 12 protein-coding genes of the genus Cuora indicated the phylogenetic position of C. trifasciata within Cuora. The phylogenetic analysis also showed that C. trifasciata from Vietnam and China formed separate monophyletic clades with different Cuora species. The results of nucleotide base compositions, protein-coding genes and phylogenetic analysis showed that C. trifasciata from these two countries may represent different Cuora species. Copyright © 2014 Elsevier B.V. All rights reserved.
Bohm, C.; Chen, F.; Sevalle, J.; Qamar, S.; Dodd, R.; Li, Y.; Schmitt-Ulms, G.; Fraser, P.E.; St George-Hyslop, P.H.
2015-01-01
Inherited variants in multiple different genes are associated with increased risk for Alzheimer's disease (AD). In many of these genes, the inherited variants alter some aspect of the production or clearance of the neurotoxic amyloid β-peptide (Aβ). Thus missense, splice site or duplication mutants in the presenilin 1 (PS1), presenilin 2 (PS2) or the amyloid precursor protein (APP) genes, which alter the levels or shift the balance of Aβ produced, are associated with rare, highly penetrant autosomal dominant forms of Familial Alzheimer's Disease (FAD). Similarly, the more prevalent late-onset forms of AD are associated with both coding and non-coding variants in genes such as SORL1, PICALM and ABCA7 that affect the production and clearance of Aβ. This review summarises some of the recent molecular and structural work on the role of these genes and the proteins coded by them in the biology of Aβ. We also briefly outline how the emerging knowledge about the pathways involved in Aβ generation and clearance can be potentially targeted therapeutically. This article is part of Special Issue entitled "Neuronal Protein". PMID:25748120
Base composition and expression level of human genes.
Arhondakis, Stilianos; Auletta, Fabio; Torelli, Giuseppe; D'Onofrio, Giuseppe
2004-01-21
It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. Also the amino acid frequencies, and the hydrophobicity, of the corresponding encoded proteins are affected by the high GC level of the genes localized in the GC-rich isochores. It was hypothesized that the gene expression level as well is higher in GC-rich compared to GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity were investigated in the present paper. The results support the hypothesis reported above, since all the parameters so far studied converge to the same conclusion, that the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
Gibson, Joshua D; Hunt, Greg J
2016-01-01
The complete mitochondrial genome from an Africanized honey bee population (AHB, derived from Apis mellifera scutellata) was assembled and analyzed. The mitogenome is 16,411 bp long and contains the same gene repertoire and gene order as the European honey bee (13 protein coding genes, 22 tRNA genes and 2 rRNA genes). ND4 appears to use an alternate start codon and the long rRNA gene is 48 bp shorter in AHB due to a deletion in a terminal AT dinucleotide repeat. The dihydrouracil arm is missing from tRNA-Ser (AGN) and tRNA-Glu is missing the TV loop. The A + T content is comparable to the European honey bee (84.7%), which increases to 95% for the 3rd position in the protein coding genes.
Lagman, David; Franzén, Ilkin E; Eggert, Joel; Larhammar, Dan; Abalo, Xesús M
2016-06-13
Phosphodiesterase 6 (PDE6) is a protein complex that hydrolyses cGMP and acts as the effector of the vertebrate phototransduction cascade. The PDE6 holoenzyme consists of catalytic and inhibitory subunits belonging to two unrelated gene families. Rods and cones express distinct genes from both families: PDE6A and PDE6B code for the catalytic and PDE6G the inhibitory subunits in rods while PDE6C codes for the catalytic and PDE6H the inhibitory subunits in cones. We performed phylogenetic and comparative synteny analyses for both gene families in genomes from a broad range of animals. Furthermore, gene expression was investigated in zebrafish. We found that both gene families expanded from one to three members in the two rounds of genome doubling (2R) that occurred at the base of vertebrate evolution. The PDE6 inhibitory subunit gene family appears to be unique to vertebrates and expanded further after the teleost-specific genome doubling (3R). We also describe a new family member that originated in 2R and has been lost in amniotes, which we have named pde6i. Zebrafish has retained two additional copies of the PDE6 inhibitory subunit genes after 3R that are highly conserved, have high amino acid sequence identity, are coexpressed in the same photoreceptor type as their amniote orthologs and, interestingly, show strikingly different daily oscillation in gene expression levels. Together, these data suggest specialisation related to the adaptation to different light intensities during the day-night cycle, most likely maintaining the regulatory function of the PDE inhibitory subunits in the phototransduction cascade.
Nota, Florencia; Cambiagno, Damián A; Ribone, Pamela; Alvarez, María E
2015-06-01
DNA glycosylases recognize and excise damaged or incorrect bases from DNA initiating the base excision repair (BER) pathway. Methyl-binding domain protein 4 (MBD4) is a member of the HhH-GPD DNA glycosylase superfamily, which has been well studied in mammals but not in plants. Our knowledge on the plant enzyme is limited to the activity of the Arabidopsis recombinant protein MBD4L in vitro. To start evaluating MBD4L in its biological context, we here characterized the structure, expression and effects of its gene, AtMBD4L. Phylogenetic analysis indicated that AtMBD4L belongs to one of the seven families of HhH-GPD DNA glycosylase genes existing in plants, and is unique on its family. Two AtMBD4L transcripts coding for active enzymes were detected in leaves and flowers. Transgenic plants expressing the AtMBD4L:GUS gene confined GUS activity to perivascular leaf tissues (usually adjacent to hydathodes), flowers (anthers at particular stages of development), and the apex of immature siliques. MBD4L-GFP fusion proteins showed nuclear localization in planta. Interestingly, overexpression of the full length MBD4L, but not a truncated enzyme lacking the DNA glycosylase domain, induced the BER gene LIG1 and enhanced tolerance to oxidative stress. These results suggest that endogenous MBD4L acts on particular tissues, is capable of activating BER, and may contribute to repair DNA damage caused by oxidative stress. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Mesquita, Rafael D.; Vionette-Amaral, Raquel J.; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A.; Minx, Patrick; Spieth, John; Carvalho, A. Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q.; Ribeiro, Jose M. C.; Sorgine, Marcos H. F.; Waterhouse, Robert M.; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R.; Araujo, Helena M.; Aravind, L.; Atella, Georgia C.; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R.; Braz, Gloria R. C.; Calderón-Fernández, Gustavo; Carareto, Claudia M. A.; Christensen, Mikkel B.; Costa, Igor R.; Costa, Samara G.; Dansa, Marilvia; Daumas-Filho, Carlos R. O.; De-Paula, Iron F.; Dias, Felipe A.; Dimopoulos, George; Emrich, Scott J.; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D.; da Fonseca, Rodrigo N.; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A.; Gandara, Ana Caroline; Garcia, Eloi S.; Genta, Fernando A.; Giraldo-Calderón, Gloria I.; Gomes, Bruno; Gondim, Katia C.; Granzotto, Adriana; Guarneri, Alessandra A.; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S. T.; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M. Patricia; Koerich, Leonardo B.; Lange, Angela B.; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G.; Lazoski, Cristiano; Lazzari, Claudio R.; Lopes, Raphael R.; Lorenzo, Marcelo G.; Lugon, Magda D.; Marcet, Paula L.; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G.; Nouzova, Marcela; Nunes, Rodrigo D.; Oliveira, Raquel L. L.; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O.; Pascual, Agustina; Pavan, Marcio G.; Pedrini, Nicolás; Peixoto, Alexandre A.; Pereira, Marcos H.; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M.; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S.; Silva-Cardoso, Livia; Silva-Neto, Mario A. C.; Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L.; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M. C.; Ursic-Bedoya, Raul; Venancio, Thiago M.; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C.; Wilson, Richard K.; Huebner, Erwin; Dotson, Ellen M.; Oliveira, Pedro L.
2015-01-01
Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods. PMID:26627243
Mesquita, Rafael D; Vionette-Amaral, Raquel J; Lowenberger, Carl; Rivera-Pomar, Rolando; Monteiro, Fernando A; Minx, Patrick; Spieth, John; Carvalho, A Bernardo; Panzera, Francisco; Lawson, Daniel; Torres, André Q; Ribeiro, Jose M C; Sorgine, Marcos H F; Waterhouse, Robert M; Montague, Michael J; Abad-Franch, Fernando; Alves-Bezerra, Michele; Amaral, Laurence R; Araujo, Helena M; Araujo, Ricardo N; Aravind, L; Atella, Georgia C; Azambuja, Patricia; Berni, Mateus; Bittencourt-Cunha, Paula R; Braz, Gloria R C; Calderón-Fernández, Gustavo; Carareto, Claudia M A; Christensen, Mikkel B; Costa, Igor R; Costa, Samara G; Dansa, Marilvia; Daumas-Filho, Carlos R O; De-Paula, Iron F; Dias, Felipe A; Dimopoulos, George; Emrich, Scott J; Esponda-Behrens, Natalia; Fampa, Patricia; Fernandez-Medina, Rita D; da Fonseca, Rodrigo N; Fontenele, Marcio; Fronick, Catrina; Fulton, Lucinda A; Gandara, Ana Caroline; Garcia, Eloi S; Genta, Fernando A; Giraldo-Calderón, Gloria I; Gomes, Bruno; Gondim, Katia C; Granzotto, Adriana; Guarneri, Alessandra A; Guigó, Roderic; Harry, Myriam; Hughes, Daniel S T; Jablonka, Willy; Jacquin-Joly, Emmanuelle; Juárez, M Patricia; Koerich, Leonardo B; Lange, Angela B; Latorre-Estivalis, José Manuel; Lavore, Andrés; Lawrence, Gena G; Lazoski, Cristiano; Lazzari, Claudio R; Lopes, Raphael R; Lorenzo, Marcelo G; Lugon, Magda D; Majerowicz, David; Marcet, Paula L; Mariotti, Marco; Masuda, Hatisaburo; Megy, Karine; Melo, Ana C A; Missirlis, Fanis; Mota, Theo; Noriega, Fernando G; Nouzova, Marcela; Nunes, Rodrigo D; Oliveira, Raquel L L; Oliveira-Silveira, Gilbert; Ons, Sheila; Orchard, Ian; Pagola, Lucia; Paiva-Silva, Gabriela O; Pascual, Agustina; Pavan, Marcio G; Pedrini, Nicolás; Peixoto, Alexandre A; Pereira, Marcos H; Pike, Andrew; Polycarpo, Carla; Prosdocimi, Francisco; Ribeiro-Rodrigues, Rodrigo; Robertson, Hugh M; Salerno, Ana Paula; Salmon, Didier; Santesmasses, Didac; Schama, Renata; Seabra-Junior, Eloy S; Silva-Cardoso, Livia; Silva-Neto, Mario A C; Souza-Gomes, Matheus; Sterkel, Marcos; Taracena, Mabel L; Tojo, Marta; Tu, Zhijian Jake; Tubio, Jose M C; Ursic-Bedoya, Raul; Venancio, Thiago M; Walter-Nuno, Ana Beatriz; Wilson, Derek; Warren, Wesley C; Wilson, Richard K; Huebner, Erwin; Dotson, Ellen M; Oliveira, Pedro L
2015-12-01
Rhodnius prolixus not only has served as a model organism for the study of insect physiology, but also is a major vector of Chagas disease, an illness that affects approximately seven million people worldwide. We sequenced the genome of R. prolixus, generated assembled sequences covering 95% of the genome (∼ 702 Mb), including 15,456 putative protein-coding genes, and completed comprehensive genomic analyses of this obligate blood-feeding insect. Although immune-deficiency (IMD)-mediated immune responses were observed, R. prolixus putatively lacks key components of the IMD pathway, suggesting a reorganization of the canonical immune signaling network. Although both Toll and IMD effectors controlled intestinal microbiota, neither affected Trypanosoma cruzi, the causal agent of Chagas disease, implying the existence of evasion or tolerance mechanisms. R. prolixus has experienced an extensive loss of selenoprotein genes, with its repertoire reduced to only two proteins, one of which is a selenocysteine-based glutathione peroxidase, the first found in insects. The genome contained actively transcribed, horizontally transferred genes from Wolbachia sp., which showed evidence of codon use evolution toward the insect use pattern. Comparative protein analyses revealed many lineage-specific expansions and putative gene absences in R. prolixus, including tandem expansions of genes related to chemoreception, feeding, and digestion that possibly contributed to the evolution of a blood-feeding lifestyle. The genome assembly and these associated analyses provide critical information on the physiology and evolution of this important vector species and should be instrumental for the development of innovative disease control methods.
Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S
1999-01-01
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species.Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it. Milne 1926 PMID:10471707
Complete genome sequence of uropathogenic Escherichia coli isolate UPEC 26-1.
Subhadra, Bindu; Kim, Dong Ho; Kim, Jaeseok; Woo, Kyungho; Sohn, Kyung Mok; Kim, Hwa-Jung; Han, Kyudong; Oh, Man Hwan; Choi, Chul Hee
2018-06-01
Urinary tract infections (UTIs) are among the most common infections in humans, predominantly caused by uropathogenic Escherichia coli (UPEC). The diverse genomes of UPEC strains mostly impede disease prevention and control measures. In this study, we comparatively analyzed the whole genome sequence of a highly virulent UPEC strain, namely UPEC 26-1, which was isolated from urine sample of a patient suffering from UTI in Korea. Whole genome analysis showed that the genome consists of one circular chromosome of 5,329,753 bp, comprising 5064 protein-coding genes, 122 RNA genes (94 tRNA, 22 rRNA and 6 ncRNA genes), and 100 pseudogenes, with an average G+C content of 50.56%. In addition, we identified 8 prophage regions comprising 5 intact, 2 incomplete and 1 questionable ones and 63 genomic islands, suggesting the possibility of horizontal gene transfer in this strain. Comparative genome analysis of UPEC 26-1 with the UPEC strain CFT073 revealed an average nucleotide identity of 99.7%. The genome comparison with CFT073 provides major differences in the genome of UPEC 26-1 that would explain its increased virulence and biofilm formation. Nineteen of the total GIs were unique to UPEC 26-1 compared to CFT073 and nine of them harbored unique genes that are involved in virulence, multidrug resistance, biofilm formation and bacterial pathogenesis. The data from this study will assist in future studies of UPEC strains to develop effective control measures.
Mason, J M; Setlow, P
1987-01-01
Spores of Bacillus subtilis strains which carry deletion mutations in one gene (sspA) or two genes (sspA and sspB) which code for major alpha/beta-type small, acid-soluble spore proteins (SASP) are known to be much more sensitive to heat and UV radiation than wild-type spores. This heat- and UV-sensitive phenotype was cured completely or in part by introduction into these mutant strains of one or more copies of the sspA or sspB genes themselves; multiple copies of the B. subtilis sspD gene, which codes for a minor alpha/beta-type SASP; or multiple copies of the SASP-C gene, which codes for a major alpha/beta-type SASP of Bacillus megaterium. These findings suggest that alpha/beta-type SASP play interchangeable roles in the heat and UV radiation resistance of bacterial spores. Images PMID:3112127
Basu, Swaraj; Larsson, Erik
2018-05-31
Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis -regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis -regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis -regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis -regulatory pairs that may serve as a basis for further experimental validation and characterization. Copyright © 2018 Basu and Larsson.
Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F
1985-01-01
Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyridimine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. Images Fig. 3. Fig. 5. PMID:3841512
McLysaght, Aoife; Guerzoni, Daniele
2015-09-26
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopisis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an 'RNA-first' or 'ORF-first' pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. © 2015 The Authors.
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Discovery of a Mammalian Splice Variant of Myostatin That Stimulates Myogenesis
Jeanplong, Ferenc; Falconer, Shelley J.; Oldham, Jenny M.; Thomas, Mark; Gray, Tarra S.; Hennebry, Alex; Matthews, Kenneth G.; Kemp, Frederick C.; Patel, Ketan; Berry, Carole; Nicholas, Gina; McMahon, Christopher D.
2013-01-01
Myostatin plays a fundamental role in regulating the size of skeletal muscles. To date, only a single myostatin gene and no splice variants have been identified in mammals. Here we describe the splicing of a cryptic intron that removes the coding sequence for the receptor binding moiety of sheep myostatin. The deduced polypeptide sequence of the myostatin splice variant (MSV) contains a 256 amino acid N-terminal domain, which is common to myostatin, and a unique C-terminus of 65 amino acids. Western immunoblotting demonstrated that MSV mRNA is translated into protein, which is present in skeletal muscles. To determine the biological role of MSV, we developed an MSV over-expressing C2C12 myoblast line and showed that it proliferated faster than that of the control line in association with an increased abundance of the CDK2/Cyclin E complex in the nucleus. Recombinant protein made for the novel C-terminus of MSV also stimulated myoblast proliferation and bound to myostatin with high affinity as determined by surface plasmon resonance assay. Therefore, we postulated that MSV functions as a binding protein and antagonist of myostatin. Consistent with our postulate, myostatin protein was co-immunoprecipitated from skeletal muscle extracts with an MSV-specific antibody. MSV over-expression in C2C12 myoblasts blocked myostatin-induced Smad2/3-dependent signaling, thereby confirming that MSV antagonizes the canonical myostatin pathway. Furthermore, MSV over-expression increased the abundance of MyoD, Myogenin and MRF4 proteins (P<0.05), which indicates that MSV stimulates myogenesis through the induction of myogenic regulatory factors. To help elucidate a possible role in vivo, we observed that MSV protein was more abundant during early post-natal muscle development, while myostatin remained unchanged, which suggests that MSV may promote the growth of skeletal muscles. We conclude that MSV represents a unique example of intra-genic regulation in which a splice variant directly antagonizes the biological activity of the canonical gene product. PMID:24312578
Beltrame, M; Bianchi, M E
1990-01-01
We have cloned the genes for small acidic ribosomal proteins (A-proteins) of the fission yeast Schizosaccharomyces pombe. S. pombe contains four transcribed genes for small A-proteins per haploid genome, as is the case for Saccharomyces cerevisiae. In contrast, multicellular eucaryotes contain two transcribed genes per haploid genome. The four proteins of S. pombe, besides sharing a high overall similarity, form two couples of nearly identical sequences. Their corresponding genes have a very conserved structure and are transcribed to a similar level. Surprisingly, of each couple of genes coding for nearly identical proteins, one is essential for cell growth, whereas the other is not. We suggest that the unequal importance of the four small A-proteins for cell survival is related to their physical organization in 60S ribosomal subunits. Images PMID:2325655
Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A
2004-10-01
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499-bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs-representatives of all four Phyla are included-revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales, to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit, clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
David S. Bischoff; James M. Slavicek
1995-01-01
The Lymantria dispar multinucleocapsid nuclear polyhedrosis virus (LdMNPV) gene encoding G22 was cloned and sequenced. The G22 gene codes for a 191 amino acid protein with a predicted Mr of 22000. Expression of G22 in a rabbit reticulocyte system generated a protein with an M...
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.
Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka
2018-01-01
Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. Intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conservative in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
Tetrahymena thermophila acidic ribosomal protein L37 contains an archaebacterial type of C-terminus.
Hansen, T S; Andreasen, P H; Dreisig, H; Højrup, P; Nielsen, H; Engberg, J; Kristiansen, K
1991-09-15
We have cloned and characterized a Tetrahymena thermophila macronuclear gene (L37) encoding the acidic ribosomal protein (A-protein) L37. The gene contains a single intron located in the 3'-part of the coding region. Two major and three minor transcription start points (tsp) were mapped 39 to 63 nucleotides upstream from the translational start codon. The uppermost tsp mapped to the first T in a putative T. thermophila RNA polymerase II initiator element, TATAA. The coding region of L37 predicts a protein of 109 amino acid (aa) residues. A substantial part of the deduced aa sequence was verified by protein sequencing. The T. thermophila L37 clearly belongs to the P1-type family of eukaryotic A-proteins, but the C-terminal region has the hallmarks of archaebacterial A-proteins.
Benoit, Joshua B.; Attardo, Geoffrey M.; Michalkova, Veronika; Krause, Tyler B.; Bohova, Jana; Zhang, Qirui; Baumann, Aaron A.; Mireji, Paul O.; Takáč, Peter; Denlinger, David L.; Ribeiro, Jose M.; Aksoy, Serap
2014-01-01
In tsetse flies, nutrients for intrauterine larval development are synthesized by the modified accessory gland (milk gland) and provided in mother's milk during lactation. Interference with at least two milk proteins has been shown to extend larval development and reduce fecundity. The goal of this study was to perform a comprehensive characterization of tsetse milk proteins using lactation-specific transcriptome/milk proteome analyses and to define functional role(s) for the milk proteins during lactation. Differential analysis of RNA-seq data from lactating and dry (non-lactating) females revealed enrichment of transcripts coding for protein synthesis machinery, lipid metabolism and secretory proteins during lactation. Among the genes induced during lactation were those encoding the previously identified milk proteins (milk gland proteins 1–3, transferrin and acid sphingomyelinase 1) and seven new genes (mgp4–10). The genes encoding mgp2–10 are organized on a 40 kb syntenic block in the tsetse genome, have similar exon-intron arrangements, and share regions of amino acid sequence similarity. Expression of mgp2–10 is female-specific and high during milk secretion. While knockdown of a single mgp failed to reduce fecundity, simultaneous knockdown of multiple variants reduced milk protein levels and lowered fecundity. The genomic localization, gene structure similarities, and functional redundancy of MGP2–10 suggest that they constitute a novel highly divergent protein family. Our data indicates that MGP2–10 function both as the primary amino acid resource for the developing larva and in the maintenance of milk homeostasis, similar to the function of the mammalian casein family of milk proteins. This study underscores the dynamic nature of the lactation cycle and identifies a novel family of lactation-specific proteins, unique to Glossina sp., that are essential to larval development. The specificity of MGP2–10 to tsetse and their critical role during lactation suggests that these proteins may be an excellent target for tsetse-specific population control approaches. PMID:24763277
Maier, Uwe-G; Zauner, Stefan; Woehle, Christian; Bolte, Kathrin; Hempel, Franziska; Allen, John F.; Martin, William F.
2013-01-01
Plastid and mitochondrial genomes have undergone parallel evolution to encode the same functional set of genes. These encode conserved protein components of the electron transport chain in their respective bioenergetic membranes and genes for the ribosomes that express them. This highly convergent aspect of organelle genome evolution is partly explained by the redox regulation hypothesis, which predicts a separate plastid or mitochondrial location for genes encoding bioenergetic membrane proteins of either photosynthesis or respiration. Here we show that convergence in organelle genome evolution is far stronger than previously recognized, because the same set of genes for ribosomal proteins is independently retained by both plastid and mitochondrial genomes. A hitherto unrecognized selective pressure retains genes for the same ribosomal proteins in both organelles. On the Escherichia coli ribosome assembly map, the retained proteins are implicated in 30S and 50S ribosomal subunit assembly and initial rRNA binding. We suggest that ribosomal assembly imposes functional constraints that govern the retention of ribosomal protein coding genes in organelles. These constraints are subordinate to redox regulation for electron transport chain components, which anchor the ribosome to the organelle genome in the first place. As organelle genomes undergo reduction, the rRNAs also become smaller. Below size thresholds of approximately 1,300 nucleotides (16S rRNA) and 2,100 nucleotides (26S rRNA), all ribosomal protein coding genes are lost from organelles, while electron transport chain components remain organelle encoded as long as the organelles use redox chemistry to generate a proton motive force. PMID:24259312
DOE Office of Scientific and Technical Information (OSTI.GOV)
Louie, Tiffany S.; Giovannelli, Donato; Yee, Nathan
Sedimenticola selenatireducens strain AK4OH1 T (= DSM 17993 T = ATCC BAA-1233 T) is a microaerophilic bacterium isolated from sediment from the Arthur Kill intertidal strait between New Jersey and Staten Island, NY. S. selenatireducens is Gram-negative and belongs to the Gammaproteobacteria. Strain AK4OH1 T was the first representative of its genus to be isolated for its unique coupling of the oxidation of aromatic acids to the respiration of selenate. It is a versatile heterotroph and can use a variety of carbon compounds, but can also grow lithoautotrophically under hypoxic and anaerobic conditions. Furthermore, the draft genome comprises 4,588,530 bpmore » and 4276 predicted protein-coding genes including genes for the anaerobic degradation of 4-hydroxybenzoate and benzoate. We report the main features of the genome of S. selenatireducens strain AK4OH1 T.« less
Louie, Tiffany S.; Giovannelli, Donato; Yee, Nathan; ...
2016-09-08
Sedimenticola selenatireducens strain AK4OH1 T (= DSM 17993 T = ATCC BAA-1233 T) is a microaerophilic bacterium isolated from sediment from the Arthur Kill intertidal strait between New Jersey and Staten Island, NY. S. selenatireducens is Gram-negative and belongs to the Gammaproteobacteria. Strain AK4OH1 T was the first representative of its genus to be isolated for its unique coupling of the oxidation of aromatic acids to the respiration of selenate. It is a versatile heterotroph and can use a variety of carbon compounds, but can also grow lithoautotrophically under hypoxic and anaerobic conditions. Furthermore, the draft genome comprises 4,588,530 bpmore » and 4276 predicted protein-coding genes including genes for the anaerobic degradation of 4-hydroxybenzoate and benzoate. We report the main features of the genome of S. selenatireducens strain AK4OH1 T.« less
Wu, S W; De Lencastre, H
1999-01-01
Screening of a library of Tn551 insertional mutants selected for reduction in the methicillin resistance level of the parental Staphylococcus aureus strain COL resulted in the isolation of mutant RUSA266 in which the minimal inhibitory concentration (MIC) of the parent was reduced from 1,600 to 1.5 micrograms/mL. Cloning and sequencing of the vicinity of the insertion site omega 726 identified an open reading frame (orf1365) encoding a very large polypeptide of more than 1,365 amino acids. A unique feature of the deduced amino acid sequence was the presence of multiple tandem repeats of 75 amino acids in the polypeptide, reminiscent of the structure of high-molecular-weight cell-surface proteins EF* and Emb identified in some streptococcal strains. Mutant RUSA266 with the inactivated gene, which we shall provisionally refer to as mrp (for multiple repeat polypeptide), produced a peptidoglycan with altered muropeptide composition, and both the reduced antibiotic resistance and the altered cell wall composition were co-transduced in back-crosses into the parental strain COL. Additional sequencing upstream of mrp has revealed that this gene was part of a five-gene cluster occupying a 9.2-kb region of the staphylococcal chromosome and was composed of glmM (directly upstream of mrp), two open reading frames orf310 and orf269 coding for two hypothetical proteins, and the gene encoding the staphylococcal arginase (arg). Transcriptional analysis demonstrated that the five genes in the cluster were transcribed together.
Hrdlickova, Barbara; Kumar, Vinod; Kanduri, Kartiek; Zhernakova, Daria V; Tripathi, Subhash; Karjalainen, Juha; Lund, Riikka J; Li, Yang; Ullah, Ubaid; Modderman, Rutger; Abdulahad, Wayel; Lähdesmäki, Harri; Franke, Lude; Lahesmaa, Riitta; Wijmenga, Cisca; Withoff, Sebo
2014-01-01
Although genome-wide association studies (GWAS) have identified hundreds of variants associated with a risk for autoimmune and immune-related disorders (AID), our understanding of the disease mechanisms is still limited. In particular, more than 90% of the risk variants lie in non-coding regions, and almost 10% of these map to long non-coding RNA transcripts (lncRNAs). lncRNAs are known to show more cell-type specificity than protein-coding genes. We aimed to characterize lncRNAs and protein-coding genes located in loci associated with nine AIDs which have been well-defined by Immunochip analysis and by transcriptome analysis across seven populations of peripheral blood leukocytes (granulocytes, monocytes, natural killer (NK) cells, B cells, memory T cells, naive CD4(+) and naive CD8(+) T cells) and four populations of cord blood-derived T-helper cells (precursor, primary, and polarized (Th1, Th2) T-helper cells). We show that lncRNAs mapping to loci shared between AID are significantly enriched in immune cell types compared to lncRNAs from the whole genome (α <0.005). We were not able to prioritize single cell types relevant for specific diseases, but we observed five different cell types enriched (α <0.005) in five AID (NK cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, and psoriasis; memory T and CD8(+) T cells in juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis; Th0 and Th2 cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis). Furthermore, we show that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved. The observed enrichment of lncRNA transcripts in AID loci implies lncRNAs play an important role in AID etiology and suggests that lncRNA genes should be studied in more detail to interpret GWAS findings correctly. The co-expression results strongly support a model in which the lncRNA and protein-coding genes function together in the same pathways.
Suzuki, Takashi; Brown, Judy J.; Swift, Larry L.
2016-01-01
Microsomal triglyceride transfer protein (MTP) is essential for the assembly of triglyceride-rich apolipoprotein B-containing lipoproteins. Previous studies in our laboratory identified a novel splice variant of MTP in mice that we named MTP-B. MTP-B has a unique first exon (1B) located 2.7 kB upstream of the first exon (1A) for canonical MTP (MTP-A). The two mature isoforms, though nearly identical in sequence and function, have different tissue expression patterns. In this study we report the identification of a second MTP splice variant (MTP-C), which contains both exons 1B and 1A. MTP-C is expressed in all the tissues we tested. In cells transfected with MTP-C, protein expression was less than 15% of that found when the cells were transfected with MTP-A or MTP-B. In silico analysis of the 5’-UTR of MTP-C revealed seven ATGs upstream of the start site for MTP-A, which is the only viable start site in frame with the main coding sequence. One of those ATGs was located in the 5’-UTR for MTP-A. We generated reporter constructs in which the 5’-UTRs of MTP-A or MTP-C were inserted between an SV40 promoter and the coding sequence of the luciferase gene and transfected these constructs into HEK 293 cells. Luciferase activity was significantly reduced by the MTP-C 5’-UTR, but not by the MTP-A 5’-UTR. We conclude that alternative splicing plays a key role in regulating MTP expression by introducing unique 5’-UTRs, which contain elements that alter translation efficiency, enabling the cell to optimize MTP levels and activity. PMID:26771188
Caffrey, Sean M.; Park, Hyung Soo; Been, Jenny; Gordon, Paul; Sensen, Christoph W.; Voordouw, Gerrit
2008-01-01
The genome sequence of the sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough was reanalyzed to design unique 70-mer oligonucleotide probes against 2,824 probable protein-coding regions. These included three genes not previously annotated, including one that encodes a c-type cytochrome. Using microarrays printed with these 70-mer probes, we analyzed the gene expression profile of wild-type D. vulgaris grown on cathodic hydrogen, generated at an iron electrode surface with an imposed negative potential of −1.1 V (cathodic protection conditions). The gene expression profile of cells grown on cathodic hydrogen was compared to that of cells grown with gaseous hydrogen bubbling through the culture. Relative to the latter, the electrode-grown cells overexpressed two hydrogenases, the hyn-1 genes for [NiFe] hydrogenase 1 and the hyd genes, encoding [Fe] hydrogenase. The hmc genes for the high-molecular-weight cytochrome complex, which allows electron flow from the hydrogenases across the cytoplasmic membrane, were also overexpressed. In contrast, cells grown on gaseous hydrogen overexpressed the hys genes for [NiFeSe] hydrogenase. Cells growing on the electrode also overexpressed genes encoding proteins which promote biofilm formation. Although the gene expression profiles for these two modes of growth were distinct, they were more closely related to each other than to that for cells grown in a lactate- and sulfate-containing medium. Electrochemically measured corrosion rates were lower for iron electrodes covered with hyn-1, hyd, and hmc mutant biofilms than for wild-type biofilms. This confirms the importance, suggested by the gene expression studies, of the corresponding gene products in D. vulgaris-mediated iron corrosion. PMID:18310429
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
Hiller, Ekkehard; Istel, Fabian; Tscherner, Michael; Brunke, Sascha; Ames, Lauren; Firon, Arnaud; Green, Brian; Cabral, Vitor; Marcet-Houben, Marina; Jacobsen, Ilse D.; Quintin, Jessica; Seider, Katja; Frohner, Ingrid; Glaser, Walter; Jungwirth, Helmut; Bachellier-Bassi, Sophie; Chauvel, Murielle; Zeidler, Ute; Ferrandon, Dominique; Gabaldón, Toni; Hube, Bernhard; d'Enfert, Christophe; Rupp, Steffen; Cormack, Brendan; Haynes, Ken; Kuchler, Karl
2014-01-01
The opportunistic fungal pathogen Candida glabrata is a frequent cause of candidiasis, causing infections ranging from superficial to life-threatening disseminated disease. The inherent tolerance of C. glabrata to azole drugs makes this pathogen a serious clinical threat. To identify novel genes implicated in antifungal drug tolerance, we have constructed a large-scale C. glabrata deletion library consisting of 619 unique, individually bar-coded mutant strains, each lacking one specific gene, all together representing almost 12% of the genome. Functional analysis of this library in a series of phenotypic and fitness assays identified numerous genes required for growth of C. glabrata under normal or specific stress conditions, as well as a number of novel genes involved in tolerance to clinically important antifungal drugs such as azoles and echinocandins. We identified 38 deletion strains displaying strongly increased susceptibility to caspofungin, 28 of which encoding proteins that have not previously been linked to echinocandin tolerance. Our results demonstrate the potential of the C. glabrata mutant collection as a valuable resource in functional genomics studies of this important fungal pathogen of humans, and to facilitate the identification of putative novel antifungal drug target and virulence genes. PMID:24945925
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.
Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed
2016-01-01
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode for proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with more than 83-100 amino acids ORF, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates the lncRNA rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
Blazie, Stephen M.; Geissel, Heather C.; Wilky, Henry; Joshi, Rajan; Newbern, Jason; Mangone, Marco
2017-01-01
mRNA expression dynamics promote and maintain the identity of somatic tissues in living organisms; however, their impact in post-transcriptional gene regulation in these processes is not fully understood. Here, we applied the PAT-Seq approach to systematically isolate, sequence, and map tissue-specific mRNA from five highly studied Caenorhabditis elegans somatic tissues: GABAergic and NMDA neurons, arcade and intestinal valve cells, seam cells, and hypodermal tissues, and studied their mRNA expression dynamics. The integration of these datasets with previously profiled transcriptomes of intestine, pharynx, and body muscle tissues, precisely assigns tissue-specific expression dynamics for 60% of all annotated C. elegans protein-coding genes, providing an important resource for the scientific community. The mapping of 15,956 unique high-quality tissue-specific polyA sites in all eight somatic tissues reveals extensive tissue-specific 3′untranslated region (3′UTR) isoform switching through alternative polyadenylation (APA) . Almost all ubiquitously transcribed genes use APA and harbor miRNA targets in their 3′UTRs, which are commonly lost in a tissue-specific manner, suggesting widespread usage of post-transcriptional gene regulation modulated through APA to fine tune tissue-specific protein expression. Within this pool, the human disease gene C. elegans orthologs rack-1 and tct-1 use APA to switch to shorter 3′UTR isoforms in order to evade miRNA regulation in the body muscle tissue, resulting in increased protein expression needed for proper body muscle function. Our results highlight a major positive regulatory role for APA, allowing genes to counteract miRNA regulation on a tissue-specific basis. PMID:28348061
Blazie, Stephen M; Geissel, Heather C; Wilky, Henry; Joshi, Rajan; Newbern, Jason; Mangone, Marco
2017-06-01
mRNA expression dynamics promote and maintain the identity of somatic tissues in living organisms; however, their impact in post-transcriptional gene regulation in these processes is not fully understood. Here, we applied the PAT-Seq approach to systematically isolate, sequence, and map tissue-specific mRNA from five highly studied Caenorhabditis elegans somatic tissues: GABAergic and NMDA neurons, arcade and intestinal valve cells, seam cells, and hypodermal tissues, and studied their mRNA expression dynamics. The integration of these datasets with previously profiled transcriptomes of intestine, pharynx, and body muscle tissues, precisely assigns tissue-specific expression dynamics for 60% of all annotated C. elegans protein-coding genes, providing an important resource for the scientific community. The mapping of 15,956 unique high-quality tissue-specific polyA sites in all eight somatic tissues reveals extensive tissue-specific 3'untranslated region (3'UTR) isoform switching through alternative polyadenylation (APA) . Almost all ubiquitously transcribed genes use APA and harbor miRNA targets in their 3'UTRs, which are commonly lost in a tissue-specific manner, suggesting widespread usage of post-transcriptional gene regulation modulated through APA to fine tune tissue-specific protein expression. Within this pool, the human disease gene C. elegans orthologs rack-1 and tct-1 use APA to switch to shorter 3'UTR isoforms in order to evade miRNA regulation in the body muscle tissue, resulting in increased protein expression needed for proper body muscle function. Our results highlight a major positive regulatory role for APA, allowing genes to counteract miRNA regulation on a tissue-specific basis. Copyright © 2017 Blazie et al.
Hobbs, Matthew; Pavasovic, Ana; King, Andrew G; Prentis, Peter J; Eldridge, Mark D B; Chen, Zhiliang; Colgan, Donald J; Polkinghorne, Adam; Wilkins, Marc R; Flanagan, Cheyne; Gillett, Amber; Hanger, Jon; Johnson, Rebecca N; Timms, Peter
2014-09-11
The koala, Phascolarctos cinereus, is a biologically unique and evolutionarily distinct Australian arboreal marsupial. The goal of this study was to sequence the transcriptome from several tissues of two geographically separate koalas, and to create the first comprehensive catalog of annotated transcripts for this species, enabling detailed analysis of the unique attributes of this threatened native marsupial, including infection by the koala retrovirus. RNA-Seq data was generated from a range of tissues from one male and one female koala and assembled de novo into transcripts using Velvet-Oases. Transcript abundance in each tissue was estimated. Transcripts were searched for likely protein-coding regions and a non-redundant set of 117,563 putative protein sequences was produced. In similarity searches there were 84,907 (72%) sequences that aligned to at least one sequence in the NCBI nr protein database. The best alignments were to sequences from other marsupials. After applying a reciprocal best hit requirement of koala sequences to those from tammar wallaby, Tasmanian devil and the gray short-tailed opossum, we estimate that our transcriptome dataset represents approximately 15,000 koala genes. The marsupial alignment information was used to look for potential gene duplications and we report evidence for copy number expansion of the alpha amylase gene, and of an aldehyde reductase gene.Koala retrovirus (KoRV) transcripts were detected in the transcriptomes. These were analysed in detail and the structure of the spliced envelope gene transcript was determined. There was appreciable sequence diversity within KoRV, with 233 sites in the KoRV genome showing small insertions/deletions or single nucleotide polymorphisms. Both koalas had sequences from the KoRV-A subtype, but the male koala transcriptome has, in addition, sequences more closely related to the KoRV-B subtype. This is the first report of a KoRV-B-like sequence in a wild population. This transcriptomic dataset is a useful resource for molecular genetic studies of the koala, for evolutionary genetic studies of marsupials, for validation and annotation of the koala genome sequence, and for investigation of koala retrovirus. Annotated transcripts can be browsed and queried at http://koalagenome.org.
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.
Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni
2015-01-01
The complete mitochondrial genome of the giant manta ray (Manta birostris), consists of 18,075 bp with rich A + T and low G content. Gene organization and length is similar to other species of ray. It comprises of 13 protein-coding genes, 2 rRNAs genes, 23 tRNAs genes and 1 non-coding sequence, and the control region. We identified an AT tandem repeat region, similar to that reported in Mobula japanica.
Origins of Genes: "Big Bang" or Continuous Creation?
NASA Astrophysics Data System (ADS)
Kesse, Paul K.; Gibbs, Adrian
1992-10-01
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis
NASA Astrophysics Data System (ADS)
Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong
2016-02-01
A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil which was identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. Totally, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The degradation half-lives of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, whereas decreased with temperature from 15 °C to 35 °C. Neutral condition was the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which it was orderly converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong
2016-02-18
A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil which was identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. Totally, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway. The degradation half-lives of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, whereas decreased with temperature from 15 °C to 35 °C. Neutral condition was the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which it was orderly converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.
Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A
2015-07-01
Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.
Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P
2015-03-11
The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Yang, W; Du, W W; Li, X; Yee, A J; Yang, B B
2016-07-28
It has recently been shown that the upregulation of a pseudogene specific to a protein-coding gene could function as a sponge to bind multiple potential targeting microRNAs (miRNAs), resulting in increased gene expression. Similarly, it was recently demonstrated that circular RNAs can function as sponges for miRNAs, and could upregulate expression of mRNAs containing an identical sequence. Furthermore, some mRNAs are now known to not only translate protein, but also function to sponge miRNA binding, facilitating gene expression. Collectively, these appear to be effective mechanisms to ensure gene expression and protein activity. Here we show that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P), and Foxo3 circular RNA, both of which bind to eight miRNAs. We found that the ectopic expression of the Foxo3P, Foxo3 circular RNA and Foxo3 mRNA could all suppress tumor growth and cancer cell proliferation and survival. Our results showed that at least three mechanisms are used to ensure protein translation of Foxo3, which reflects an essential role of Foxo3 and its corresponding non-coding RNAs.
Liu, Kuimei; Dong, Yanmei; Wang, Fangzhong; Jiang, Baojie; Wang, Mingyu; Fang, Xu
2016-01-01
Homologs of the velvet protein family are encoded by the ve1, vel2, and vel3 genes in Trichoderma reesei. To test their regulatory functions, the velvet protein-coding genes were disrupted, generating Δve1, Δvel2, and Δvel3 strains. The phenotypic features of these strains were examined to identify their functions in morphogenesis, sporulation, and cellulase expression. The three velvet-deficient strains produced more hyphal branches, indicating that velvet family proteins participate in the morphogenesis in T. reesei. Deletion of ve1 and vel3 did not affect biomass accumulation, while deletion of vel2 led to a significantly hampered growth when cellulose was used as the sole carbon source in the medium. The deletion of either ve1 or vel2 led to the sharp decrease of sporulation as well as a global downregulation of cellulase-coding genes. In contrast, although the expression of cellulase-coding genes of the ∆vel3 strain was downregulated in the dark, their expression in light condition was unaffected. Sporulation was hampered in the ∆vel3 strain. These results suggest that Ve1 and Vel2 play major roles, whereas Vel3 plays a minor role in sporulation, morphogenesis, and cellulase expression.
Gozes, Illana; Yeheskel, Adva; Pasmanik-Chor, Metsada
2015-01-01
The recent finding of activity-dependent neuroprotective protein (ADNP) as a protein decreased in serum of patients with Alzheimer's disease (AD) compared to controls, alongside with the discovery of ADNP mutations in autism and coupled with the original description of cancer mutations, ignited an interest for a comparative analysis of ADNP with other AD/autism/cancer-associated genes. We strive toward a better understanding of the molecular structure of key players in psychiatric/neurodegenerative diseases including autism, schizophrenia, and AD. This article includes data mining and bioinformatics analysis on the ADNP gene and protein, in addition to other related genes, with emphasis on recent literature. ADNP is discovered here as unique to chordata with specific autism mutations different from cancer-associated mutation. Furthermore, ADNP exhibits similarities to other cancer/autism-associated genes. We suggest that key genes, which shape and maintain our brain and are prone to mutations, are by in large unique to chordata. Furthermore, these brain-controlling genes, like ADNP, are linked to cell growth and differentiation, and under different stress conditions may mutate or exhibit expression changes leading to cancer propagation. Better understanding of these genes could lead to better therapeutics.
Moroz, Leonid L.; Kohn, Andrea B.
2015-01-01
Hypotheses of origins and evolution of neurons and synapses are controversial, mostly due to limited comparative data. Here, we investigated the genome-wide distribution of the bilaterian “synaptic” and “neuronal” protein-coding genes in non-bilaterian basal metazoans (Ctenophora, Porifera, Placozoa, and Cnidaria). First, there are no recognized genes uniquely expressed in neurons across all metazoan lineages. None of the so-called pan-neuronal genes such as embryonic lethal abnormal vision (ELAV), Musashi, or Neuroglobin are expressed exclusively in neurons of the ctenophore Pleurobrachia. Second, our comparative analysis of about 200 genes encoding canonical presynaptic and postsynaptic proteins in bilaterians suggests that there are no true “pan-synaptic” genes or genes uniquely and specifically attributed to all classes of synapses. The majority of these genes encode receptive and secretory complexes in a broad spectrum of eukaryotes. Trichoplax (Placozoa) an organism without neurons and synapses has more orthologs of bilaterian synapse-related/neuron-related genes than do ctenophores—the group with well-developed neuronal and synaptic organization. Third, the majority of genes encoding ion channels and ionotropic receptors are broadly expressed in unicellular eukaryotes and non-neuronal tissues in metazoans. Therefore, they cannot be viewed as neuronal markers. Nevertheless, the co-expression of multiple types of ion channels and receptors does correlate with the presence of neural and synaptic organization. As an illustrative example, the ctenophore genomes encode a greater diversity of ion channels and ionotropic receptors compared with the genomes of the placozoan Trichoplax and the demosponge Amphimedon. Surprisingly, both placozoans and sponges have a similar number of orthologs of “synaptic” proteins as we identified in the genomes of two ctenophores. Ctenophores have a distinct synaptic organization compared with other animals. Our analysis of transcriptomes from 10 different ctenophores did not detect recognized orthologs of synthetic enzymes encoding several classical, low-molecular-weight (neuro)transmitters; glutamate signaling machinery is one of the few exceptions. Novel peptidergic signaling molecules were predicted for ctenophores, together with the diversity of putative receptors including SCNN1/amiloride-sensitive sodium channel-like channels, many of which could be examples of a lineage-specific expansion within this group. In summary, our analysis supports the hypothesis of independent evolution of neurons and, as corollary, a parallel evolution of synapses. We suggest that the formation of synaptic machinery might occur more than once over 600 million years of animal evolution. PMID:26454853
Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S
2015-07-01
Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10(-8)) with an intracellular signal transduction mechanism and in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10(-7)) with a fatty acid metabolism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
Decoding sORF translation - from small proteins to gene regulation.
Cabrera-Quio, Luis Enrique; Herberg, Sarah; Pauli, Andrea
2016-11-01
Translation is best known as the fundamental mechanism by which the ribosome converts a sequence of nucleotides into a string of amino acids. Extensive research over many years has elucidated the key principles of translation, and the majority of translated regions were thought to be known. The recent discovery of wide-spread translation outside of annotated protein-coding open reading frames (ORFs) came therefore as a surprise, raising the intriguing possibility that these newly discovered translated regions might have unrecognized protein-coding or gene-regulatory functions. Here, we highlight recent findings that provide evidence that some of these newly discovered translated short ORFs (sORFs) encode functional, previously missed small proteins, while others have regulatory roles. Based on known examples we will also speculate about putative additional roles and the potentially much wider impact that these translated regions might have on cellular homeostasis and gene regulation.
Sanders, Edward; Diehl, Svenja
2015-01-01
Background Many cancers adopt a metabolism that is characterized by the well-known Warburg effect (aerobic glycolysis). Recently, numerous attempts have been made to treat cancer by targeting one or more gene products involved in this pathway without notable success. This work outlines a transcriptomic approach to identify genes that are highly perturbed in clear cell renal cell carcinoma (CCRCC). Methods We developed a model of the extended Warburg effect and outlined the model using Cytoscape. Following this, gene expression fold changes (FCs) for tumor and adjacent normal tissue from patients with CCRCC (GSE6344) were mapped on to the network. Gene expression values with FCs of greater than two were considered as potential targets for treatment of CCRCC. Results The Cytoscape network includes glycolysis, gluconeogenesis, the pentose phosphate pathway (PPP), the TCA cycle, the serine/glycine pathway, and partial glutaminolysis and fatty acid synthesis pathways. Gene expression FCs for nine of the 10 CCRCC patients in the GSE6344 data set were consistent with a shift to aerobic glycolysis. Genes involved in glycolysis and the synthesis and transport of lactate were over-expressed, as was the gene that codes for the kinase that inhibits the conversion of pyruvate to acetyl-CoA. Interestingly, genes that code for unique proteins involved in gluconeogenesis were strongly under-expressed as was also the case for the serine/glycine pathway. These latter two results suggest that the role attributed to the M2 isoform of pyruvate kinase (PKM2), frequently the principal isoform of PK present in cancer: i.e. causing a buildup of glucose metabolites that are shunted into branch pathways for synthesis of key biomolecules, may not be operative in CCRCC. The fact that there was no increase in the expression FC of any gene in the PPP is consistent with this hypothesis. Literature protein data generally support the transcriptomic findings. Conclusions A number of key genes have been identified that could serve as valid targets for anti-cancer pharmaceutical agents. Genes that are highly over-expressed include ENO2, HK2, PFKP, SLC2A3, PDK1, and SLC16A1. Genes that are highly under-expressed include ALDOB, PKLR, PFKFB2, G6PC, PCK1, FBP1, PC, and SUCLG1. PMID:25859558
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA
Eden, E.; Brunak, S.
2004-01-01
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
The genome of Eucalyptus grandis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Myburg, Alexander A.; Grattapaglia, Dario; Tuskan, Gerald A.
Eucalypts are the world s most widely planted hardwood trees. Their broad adaptability, rich species diversity, fast growth and superior multipurpose wood, have made them a global renewable resource of fiber and energy that mitigates human pressures on natural forests. We sequenced and assembled >94% of the 640 Mbp genome of Eucalyptus grandis into its 11 chromosomes. A set of 36,376 protein coding genes were predicted revealing that 34% occur in tandem duplications, the largest proportion found thus far in any plant genome. Eucalypts also show the highest diversity of genes for plant specialized metabolism that act as chemical defencemore » against biotic agents and provide unique pharmaceutical oils. Resequencing of a set of inbred tree genomes revealed regions of strongly conserved heterozygosity, likely hotspots of inbreeding depression. The resequenced genome of the sister species E. globulus underscored the high inter-specific genome colinearity despite substantial genome size variation in the genus. The genome of E. grandis is the first reference for the early diverging Rosid order Myrtales and is placed here basal to the Eurosids. This resource expands knowledge on the unique biology of large woody perennials and provides a powerful tool to accelerate comparative biology, breeding and biotechnology.« less
Rappaport, Noa; Fishilevich, Simon; Nudel, Ron; Twik, Michal; Belinky, Frida; Plaschkes, Inbar; Stein, Tsippi Iny; Cohen, Dana; Oz-Levi, Danit; Safran, Marilyn; Lancet, Doron
2017-08-18
A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness-aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status-high quality gene-disease pairs, coming from manually curated trustworthy sources, that includes 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths-integrated biological pathways from 12 information sources. We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships. The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.
A Genomic View of the Sea Urchin Nervous System
Burke, RD; Angerer, LM; Elphick, MR; Humphrey, GW; Yaguchi, S; Kiyama, T; Liang, S; Mu, X; Agca, C; Klein, WH; Brandhorst, BP; Rowe, M; Wilson, K; Churcher, AM; Taylor, JS; Chen, N; Murray, G; Wang, D; Mellott, D; Olinski, R; Hallböök, F; Thorndyke, MC
2007-01-01
The sequencing of the Strongylocentrotus purpuratus genome provides a unique opportunity to investigate the function and evolution of neural genes. The neurobiology of sea urchins is of particular interest because they have a close phylogenetic relationship with chordates, yet a distinctive pentaradiate body plan and unusual neural organization. Orthologues of transcription factors that regulate neurogenesis in other animals have been identified and several are expressed in neurogenic domains before gastrulation indicating that they may operate near the top of a conserved neural gene regulatory network. A family of genes encoding voltage-gated ion channels is present but, surprisingly, genes encoding gap junction proteins (connexins and pannexins) appear to be absent. Genes required for synapse formation and function have been identified and genes for synthesis and transport of neurotransmitters are present. There is a large family of G-protein-coupled receptors, including 874 rhodopsin-type receptors, 28 metabotropic glutamate-like receptors and a remarkably expanded group of 161 secretin receptor-like proteins. Absence of cannabinoid, lysophospholipid and melanocortin receptors indicates that this group may be unique to chordates. There are at least 37 putative G-protein coupled peptide receptors and precursors for several neuropeptides and peptide hormones have been identified, including SALMFamides, NGFFFamide, a vasotocin-like peptide, glycoprotein hormones, and insulin/insulin-like growth factors. Identification of a neurotrophin-like gene and Trk receptor in sea urchin indicates that this neural signaling system is not unique to chordates. Several hundred chemoreceptor genes have been predicted using several approaches, a number similar to that for other animals. Intriguingly, genes encoding homologues of rhodopsin, Pax6 and several other key mammalian retinal transcription factors are expressed in tube feet, suggesting tube feet function as photosensory organs. Analysis of the sea urchin genome presents a unique perspective on the evolutionary history of deuterostome nervous systems and reveals new approaches to investigate the development and neurobiology of sea urchins. PMID:16965768
Nutt, S L; Morrison, A M; Dörfler, P; Rolink, A; Busslinger, M
1998-01-01
The Pax-5 gene codes for the transcription factor BSAP which is essential for the progression of adult B lymphopoiesis beyond an early progenitor (pre-BI) cell stage. Although several genes have been proposed to be regulated by BSAP, CD19 is to date the only target gene which has been genetically confirmed to depend on this transcription factor for its expression. We have now taken advantage of cultured pre-BI cells of wild-type and Pax-5 mutant bone marrow to screen a large panel of B lymphoid genes for additional BSAP target genes. Four differentially expressed genes were shown to be under the direct control of BSAP, as their expression was rapidly regulated in Pax-5-deficient pre-BI cells by a hormone-inducible BSAP-estrogen receptor fusion protein. The genes coding for the B-cell receptor component Ig-alpha (mb-1) and the transcription factors N-myc and LEF-1 are positively regulated by BSAP, while the gene coding for the cell surface protein PD-1 is efficiently repressed. Distinct regulatory mechanisms of BSAP were revealed by reconstituting Pax-5-deficient pre-BI cells with full-length BSAP or a truncated form containing only the paired domain. IL-7 signalling was able to efficiently induce the N-myc gene only in the presence of full-length BSAP, while complete restoration of CD19 synthesis was critically dependent on the BSAP protein concentration. In contrast, the expression of the mb-1 and LEF-1 genes was already reconstituted by the paired domain polypeptide lacking any transactivation function, suggesting that the DNA-binding domain of BSAP is sufficient to recruit other transcription factors to the regulatory regions of these two genes. In conclusion, these loss- and gain-of-function experiments demonstrate that BSAP regulates four newly identified target genes as a transcriptional activator, repressor or docking protein depending on the specific regulatory sequence context. PMID:9545244
Recurrent and functional regulatory mutations in breast cancer.
Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad
2017-07-06
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-Man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H-H; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B; Adair, Linda S; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, Xiao-Ou; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars G; Nielsen, Jonas Bille; Tse, Hung-Fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Kathiresan, Sekar; Mohlke, Karen L; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-12-01
Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J.; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N.; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H.-H.; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B.; Adair, Linda S.; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; da Chen, Yii-Der I; Shu, XiaoOu; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K.; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars; Nielsen, Jonas Bille; Tse, Hung-fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y. Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Consortium, GLGC; Kathiresan, Sekar; Mohlke, Karen L.; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-01-01
Most genome-wide association studies have been conducted in European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we examined protein-coding genetic variants in 47,532 East Asian individuals using an exome array. We identified 255 variants at 41 loci reaching chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After meta-analysis with > 300,000 European samples, we identified an additional 9 novel loci. The same 16 genes were identified by the protein-altering variants in both East Asians and Europeans, likely pointing to the functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population-specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci. PMID:29083407
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene.
Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil
2007-11-29
Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication, all three copies of the gene are active transcriptionally, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constrains.
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene
Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil
2007-01-01
Background Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Results Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication, all three copies of the gene are active transcriptionally, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. Conclusion The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constrains. PMID:18047649
Mehdizadeh Gohari, Iman; Kropinski, Andrew M.; Weese, Scott J.; Parreira, Valeria R.; Whitehead, Ashley E.; Boerlin, Patrick; Prescott, John F.
2016-01-01
The recent discovery of a novel beta-pore-forming toxin, NetF, which is strongly associated with canine and foal necrotizing enteritis should improve our understanding of the role of type A Clostridium perfringens associated disease in these animals. The current study presents the complete genome sequence of two netF-positive strains, JFP55 and JFP838, which were recovered from cases of foal necrotizing enteritis and canine hemorrhagic gastroenteritis, respectively. Genome sequencing was done using Single Molecule, Real-Time (SMRT) technology-PacBio and Illumina Hiseq2000. The JFP55 and JFP838 genomes include a single 3.34 Mb and 3.53 Mb chromosome, respectively, and both genomes include five circular plasmids. Plasmid annotation revealed that three plasmids were shared by the two newly sequenced genomes, including a NetF/NetE toxins-encoding tcp-conjugative plasmid, a CPE/CPB2 toxins-encoding tcp-conjugative plasmid and a putative bacteriocin-encoding plasmid. The putative beta-pore-forming toxin genes, netF, netE and netG, were located in unique pathogenicity loci on tcp-conjugative plasmids. The C. perfringens JFP55 chromosome carries 2,825 protein-coding genes whereas the chromosome of JFP838 contains 3,014 protein-encoding genes. Comparison of these two chromosomes with three available reference C. perfringens chromosome sequences identified 48 (~247 kb) and 81 (~430 kb) regions unique to JFP55 and JFP838, respectively. Some of these divergent genomic regions in both chromosomes are phage- and plasmid-related segments. Sixteen of these unique chromosomal regions (~69 kb) were shared between the two isolates. Five of these shared regions formed a mosaic of plasmid-integrated segments, suggesting that these elements were acquired early in a clonal lineage of netF-positive C. perfringens strains. These results provide significant insight into the basis of canine and foal necrotizing enteritis and are the first to demonstrate that netF resides on a large and unique plasmid-encoded locus. PMID:26859667
Shah, M S; Ashraf, A; Khan, M I; Rahman, M; Habib, M; Qureshi, J A
2016-01-01
Fowl adenovirus-4 is an infectious agent causing Hydropericardium syndrome in chickens. Adenovirus are non-enveloped virions having linear, double stranded DNA. Viral genome codes for few structural and non structural proteins. 100K is an important non-structural viral protein. Open reading frame for coding sequence of 100K protein was cloned with oligo histidine tag and expressed in Escherichia coli as a fusion protein. Nucleotide sequence of the gene revealed that 100K gene of FAdV-4 has high homology (98%) with the respective gene of FAdV-10. Recombinant 100K protein was expressed in E. coli and purified by nickel affinity chromatography. Immunization of chickens with recombinant 100K protein elicited significant serum antibody titers. However challenge protection test revealed that 100K protein conferred little protection (40%) to the immunized chicken against pathogenic viral challenge. So it was concluded that 100K gene has 2397 bp length and recombinant 100K protein has molecular weight of 95 kDa. It was also found that the recombinant protein has little capacity to affect the immune response because in-spite of having an important role in intracellular transport & folding of viral capsid proteins during viral replication, it is not exposed on the surface of the virus at any stage. Copyright © 2015 The International Alliance for Biological Standardization. All rights reserved.
Abdali, Narges; Younas, Farhan; Mafakheri, Samaneh; Pothula, Karunakar R; Kleinekathöfer, Ulrich; Tauch, Andreas; Benz, Roland
2018-05-09
Corynebacterium urealyticum, a pathogenic, multidrug resistant member of the mycolata, is known as causative agent of urinary tract infections although it is a bacterium of the skin flora. This pathogenic bacterium shares with the mycolata the property of having an unusual cell envelope composition and architecture, typical for the genus Corynebacterium. The cell wall of members of the mycolata contains channel-forming proteins for the uptake of solutes. In this study, we provide novel information on the identification and characterization of a pore-forming protein in the cell wall of C. urealyticum DSM 7109. Detergent extracts of whole C. urealyticum cultures formed in lipid bilayer membranes slightly cation-selective pores with a single-channel conductance of 1.75 nS in 1 M KCl. Experiments with different salts and non-electrolytes suggested that the cell wall pore of C. urealyticum is wide and water-filled and has a diameter of about 1.8 nm. Molecular modelling and dynamics has been performed to obtain a model of the pore. For the search of the gene coding for the cell wall pore of C. urealyticum we looked in the known genome of C. urealyticum for a similar chromosomal localization of the porin gene to known porH and porA genes of other Corynebacterium strains. Three genes are located between the genes coding for GroEL2 and polyphosphate kinase (PKK2). Two of the genes (cur_1714 and cur_1715) were expressed in different constructs in C. glutamicum ΔporAΔporH and in porin-deficient BL21 DE3 Omp8 E. coli strains. The results suggested that the gene cur_1714 codes alone for the cell wall channel. The cell wall porin of C. urealyticum termed PorACur was purified to homogeneity using different biochemical methods and had an apparent molecular mass of about 4 kDa on tricine-containing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). Biophysical characterization of the purified protein (PorACur) suggested indeed that cur_1714 is the gene coding for the pore-forming protein in C. urealyticum because the protein formed in lipid bilayer experiments the same pores as the detergent extract of whole cells. The study is the first report of a cell wall channel in the pathogenic C. urealyticum.
2012-01-01
Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie
2003-04-02
Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
Chen, Zhi-Teng; Wu, Hai-Yan; Du, Yu-Zhou
2016-07-01
We report the nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae), which is a circular molecule of 15,416 bp in length and consists of 13 protein-coding genes, 2 ribosomal RNAs, 20 transfer RNAs and a partial control region (645 bp). Using the 13 protein-coding genes of 8 stoneflies and 3 other related species, we constructed a phylogenetic tree to verify the accuracy of the new determined mitogenome sequences. Our results provide basic data for further study of phylogeny in Plecoptera.
Occurrence, Functions and Biological Significance of Arginine-Rich Proteins.
Chandana, Thimmegowda; Venkatesh, Yeldur P
2016-01-01
Arginine, the most basic among the 20 amino acids, occurs less frequently than lysine in proteins despite being coded by six codons. Only a few important proteins of biological significance have been found to be abundant in arginine. It has been established that these arginine-rich proteins have been assigned important roles in the biological systems. Arginine-rich cationic proteins are known to stabilize macromolecular structures by establishing appropriate interactions (salt bridges, hydrogen bonds and cation-π interactions). These proteins are also known to be the key members of many regulatory pathways such as gene expression, chromatin stability, expurgation of introns from naïve mRNA, mRNA splicing, membrane-penetrating activity and pathogenesis-related defense, to name a few. Further, arginine occurs in various combinations with other amino acids (serine, lysine, proline, tryptophan, valine, glycine and glutamic acid) which diversify the potential functions of arginine-rich proteins. Arginine-rich proteins known till date from dietary sources have been described in terms of their structure and functional properties. A variety of activities such as bactericidal, membrane-penetrating, antimicrobial, anti-hypertensive, pro-angiogenic and others have been reported for arginine-rich proteins. This review attempts to collate the occurrence, functions and the biological significance of this unique class of proteins rich in arginine.
Li, Qingshun Q; Liu, Zhaoyang; Lu, Wenjia; Liu, Man
2017-05-17
Pre-mRNA alternative splicing and alternative polyadenylation have been implicated to play important roles during eukaryotic gene expression. However, much remains unknown regarding the regulatory mechanisms and the interactions of these two processes in plants. Here we focus on an Arabidopsis gene OXT6 (Oxidative Tolerant-6) that has been demonstrated to encode two proteins through alternative splicing and alternative polyadenylation. Specifically, alternative polyadenylation at Intron-2 of OXT6 produces a transcript coding for AtCPSF30, an Arabidopsis ortholog of 30 kDa subunit of the Cleavage and Polyadenylation Specificity Factor. On the other hand, alternative splicing of Intron-2 generates a longer transcript encoding a protein named AtC30Y, a polypeptide including most part of AtCPSF30 and a YT521B domain. To investigate the expression outcome of OXT6 in plants, a set of mutations were constructed to alter the splicing and polyadenylation patterns of OXT6. Analysis of transgenic plants bearing these mutations by quantitative RT-PCR revealed a competition relationship between these two processes. Moreover, when both splice sites and poly(A) signals were mutated, polyadenylation became the preferred mode of OXT6 processing. These results demonstrate the interplay between alternative splicing and alternative polyadenylation, and it is their concerted actions that define a gene's expression outcome.
Advancing neuroscience through epigenetics: molecular mechanisms of learning and memory.
Molfese, David L
2011-01-01
Humans share 96% of our 30,000 genes with Chimpanzees. The 1,200 genes that differ appear at first glance insufficient to describe what makes us human and them apes. However, we are now discovering that the mechanisms that regulate how genes are expressed tell a much richer story than our DNA alone. Sections of our DNA are constantly being turned on or off, marked for easy access, or secluded and hidden away, all in response to ongoing cellular activity. In the brain, neurons encode information-in effect memories-at the cellular level. Yet while memories may last a lifetime, neurons are dynamic structures. Every protein in the synapse undergoes some form of turnover, some with half-lives of only hours. How can a memory persist beyond the lifetimes of its constitutive molecular building blocks? Epigenetics-changes in gene expression that do not alter the underlying DNA sequence-may be the answer. In this article, epigenetic mechanisms including DNA methylation and acetylation or methylation of the histone proteins that package DNA are described in the context of animal learning. Through the interaction of these modifications a "histone code" is emerging wherein individual memories leave unique memory traces at the molecular level with distinct time courses. A better understanding of these mechanisms has implications for treatment of memory disorders caused by normal aging or diseases including schizophrenia, Alzheimer's, depression, and drug addiction.
Pathogenomic Inference of Virulence-Associated Genes in Leptospira interrogans
Lehmann, Jason S.; Fouts, Derrick E.; Haft, Daniel H.; Cannella, Anthony P.; Ricaldi, Jessica N.; Brinkac, Lauren; Harkins, Derek; Durkin, Scott; Sanka, Ravi; Sutton, Granger; Moreno, Angelo; Vinetz, Joseph M.; Matthias, Michael A.
2013-01-01
Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens. PMID:24098822
Pathogenomic inference of virulence-associated genes in Leptospira interrogans.
Lehmann, Jason S; Fouts, Derrick E; Haft, Daniel H; Cannella, Anthony P; Ricaldi, Jessica N; Brinkac, Lauren; Harkins, Derek; Durkin, Scott; Sanka, Ravi; Sutton, Granger; Moreno, Angelo; Vinetz, Joseph M; Matthias, Michael A
2013-01-01
Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.
Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv
2010-01-01
RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
An atlas of gene expression and gene co-regulation in the human retina.
Pinelli, Michele; Carissimo, Annamaria; Cutillo, Luisa; Lai, Ching-Hung; Mutarelli, Margherita; Moretti, Maria Nicoletta; Singh, Marwah Veer; Karali, Marianthi; Carrella, Diego; Pizzo, Mariateresa; Russo, Francesco; Ferrari, Stefano; Ponzin, Diego; Angelini, Claudia; Banfi, Sandro; di Bernardo, Diego
2016-07-08
The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptors localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Lincomycin Biosynthesis Involves a Tyrosine Hydroxylating Heme Protein of an Unusual Enzyme Family
Novotna, Jitka; Olsovska, Jana; Novak, Petr; Mojzes, Peter; Chaloupkova, Radka; Kamenik, Zdenek; Spizek, Jaroslav; Kutejova, Eva; Mareckova, Marketa; Tichy, Pavel; Damborsky, Jiri; Janata, Jiri
2013-01-01
The gene lmbB2 of the lincomycin biosynthetic gene cluster of Streptomyces lincolnensis ATCC 25466 was shown to code for an unusual tyrosine hydroxylating enzyme involved in the biosynthetic pathway of this clinically important antibiotic. LmbB2 was expressed in Escherichia coli, purified near to homogeneity and shown to convert tyrosine to 3,4-dihydroxyphenylalanine (DOPA). In contrast to the well-known tyrosine hydroxylases (EC 1.14.16.2) and tyrosinases (EC 1.14.18.1), LmbB2 was identified as a heme protein. Mass spectrometry and Soret band-excited Raman spectroscopy of LmbB2 showed that LmbB2 contains heme b as prosthetic group. The CO-reduced differential absorption spectra of LmbB2 showed that the coordination of Fe was different from that of cytochrome P450 enzymes. LmbB2 exhibits sequence similarity to Orf13 of the anthramycin biosynthetic gene cluster, which has recently been classified as a heme peroxidase. Tyrosine hydroxylating activity of LmbB2 yielding DOPA in the presence of (6R)-5,6,7,8-tetrahydro-L-biopterin (BH4) was also observed. Reaction mechanism of this unique heme peroxidases family is discussed. Also, tyrosine hydroxylation was confirmed as the first step of the amino acid branch of the lincomycin biosynthesis. PMID:24324587
Contactin 4 as an Autism Susceptibility Locus
Cottrell, Catherine E.; Bir, Natalie; Varga, Elizabeth; Alvarez, Carlos E.; Bouyain, Samuel; Zernzach, Randall; LambThrush, Devon; Evans, Johnna; Trimarchi, Michael; Butter, Eric M.; Cunningham, David; Gastier-Foster, Julie M.; McBride, Kim; Herman, Gail E.
2011-01-01
Scientific Abstract Structural and sequence variation have been described in several members of the contactin (CNTN) and contactin associated protein (CNTNAP) gene families in association with neurodevelopmental disorders, including autism. Using array comparative genome hybridization (CGH), we identified a maternally inherited ~535 kb deletion at 3p26.3 encompassing the 5′ end of the contactin 4 gene (CNTN4) in a patient with autism. Based on this finding and previous reports implicating genomic rearrangements of CNTN4 in autism spectrum disorders (ASDs) and 3p− microdeletion syndrome, we undertook sequencing of the coding regions of the gene in a local ASD cohort in comparison with a set of controls. Unique missense variants were identified in 4/75 unrelated individuals with an ASD, as well as in 1/107 controls. All of the amino acid substitutions were nonsynonomous, occurred at evolutionarily conserved positions, and were, thus, felt likely to be deleterious. However, these data did not reach statistical significance, nor did the variants segregate with disease within all of the ASD families. Finally, there was no detectable difference in binding of two of the variants to the interacting protein PTPRG in vitro. Thusadditional, larger studies will be necessary to determine whether CNTN4 functions as an autism susceptibility locus in combination with other genetic and/or environmental factors. PMID:21308999
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.
Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D
2017-12-03
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae . Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae . We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae , but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.
Borodovsky, M; Rudd, K E; Koonin, E V
1994-01-01
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. Images PMID:7984428
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Divergent transcription is associated with promoters of transcriptional regulators
2013-01-01
Background Divergent transcription is a wide-spread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. Results We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription. PMID:24365181
Complete mitochondrial genome of a Asian lion (Panthera leo goojratensis).
Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning
2016-01-01
The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length, gene composition and arrangement conformed to other lions, which contained the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristic of the mitochondrial genome was analyzed in detail.
A Plain English Map of the Human Glycolysis Enzymes.
ERIC Educational Resources Information Center
Offner, Susan
1999-01-01
Presents a plain English map of the gene coding for the glycolysis enzymes in humans to be used as a teaching tool. The map can be used to illustrate that every reaction in a cell requires an enzyme, and that every enzyme is a protein coded for by a gene somewhere on the chromosomes. (WRM)
Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph
2017-01-01
Abstract Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
Lee, Hyoung-Joo; Jeong, Seul-Ki; Na, Keun; Lee, Min Jung; Lee, Sun Hee; Lim, Jong-Sun; Cha, Hyun-Jeong; Cho, Jin-Young; Kwon, Ja-Young; Kim, Hoguen; Song, Si Young; Yoo, Jong Shin; Park, Young Mok; Kim, Hail; Hancock, William S; Paik, Young-Ki
2013-06-07
As a starting point of the Chromosome-Centric Human Proteome Project (C-HPP), we established strategies of genome-wide proteomic analysis, including protein identification, quantitation of disease-specific proteins, and assessment of post-translational modifications, using paired human placental tissues from healthy and preeclampsia patients. This analysis resulted in identification of 4239 unique proteins with high confidence (two or more unique peptides with a false discovery rate less than 1%), covering 21% of approximately 20, 059 (Ensembl v69, Oct 2012) human proteins, among which 28 proteins exhibited differentially expressed preeclampsia-specific proteins. When these proteins are assigned to all human chromosomes, the pattern of the newly identified placental protein population is proportional to that of the gene count distribution of each chromosome. We also identified 219 unique N-linked glycopeptides, 592 unique phosphopeptides, and 66 chromosome 13-specific proteins. In particular, protein evidence of 14 genes previously known to be specifically up-regulated in human placenta was verified by mass spectrometry. With respect to the functional implication of these proteins, 38 proteins were found to be involved in regulatory factor biosynthesis or the immune system in the placenta, but the molecular mechanism of these proteins during pregnancy warrants further investigation. As far as we know, this work produced the highest number of proteins identified in the placenta and will be useful for annotating and mapping all proteins encoded in the human genome.
Sequence divergence of the red and green visual pigments in great apes and humans.
Deeb, S S; Jorgensen, A L; Battisti, L; Iwasaki, L; Motulsky, A G
1994-01-01
We have determined the coding sequences of red and green visual pigment genes of the chimpanzee, gorilla, and orangutan. The deduced amino acid sequences of these pigments are highly homologous to the equivalent human pigments. None of the amino acid differences occurred at sites that were previously shown to influence pigment absorption characteristics. Therefore, we predict the spectra of red and green pigments of the apes to have wavelengths of maximum absorption that differ by < 2 nm from the equivalent human pigments and that color vision in these nonhuman primates will be very similar, if not identical, to that in humans. A total of 14 within-species polymorphisms (6 involving silent substitutions) were observed in the coding sequences of the red and green pigment genes of the great apes. Remarkably, the polymorphisms at 6 of these sites had been observed in human populations, suggesting that they predated the evolution of higher primates. Alleles at polymorphic sites were often shared between the red and green pigment genes. The average synonymous rate of divergence of red from green sequences was approximately 1/10th that estimated for other proteins of higher primates, indicating the involvement of gene conversion in generating these polymorphisms. The high degree of homology and juxtaposition of these two genes on the X chromosome has promoted unequal recombination and/or gene conversion that led to sequence homogenization. However, natural selection operated to maintain the degree of separation in peak absorbance between the red and green pigments that resulted in optimal chromatic discrimination. This represents a unique case of molecular coevolution between two homologous genes that functionally interact at the behavioral level. PMID:8041777
A review of bacterial methyl halide degradation: biochemistry, genetics and molecular ecology
McDonald, I.R.; Warner, K.L.; McAnulla, C.; Woodall, C.A.; Oremland, R.S.; Murrell, J.C.
2002-01-01
Methyl halide-degrading bacteria are a diverse group of organisms that are found in both terrestrial and marine environments. They potentially play an important role in mitigating ozone depletion resulting from methyl chloride and methyl bromide emissions. The first step in the pathway(s) of methyl halide degradation involves a methyltransferase and, recently, the presence of this pathway has been studied in a number of bacteria. This paper reviews the biochemistry and genetics of methyl halide utilization in the aerobic bacteria Methylobacterium chloromethanicum CM4T, Hyphomicrobium chloromethanicum CM2T, Aminobacter strain IMB-1 and Aminobacter strain CC495. These bacteria are able to use methyl halides as a sole source of carbon and energy, are all members of the α-Proteobacteria and were isolated from a variety of polluted and pristine terrestrial environments. An understanding of the genetics of these bacteria identified a unique gene (cmuA) involved in the degradation of methyl halides, which codes for a protein (CmuA) with unique methyltransferase and corrinoid functions. This unique functional gene, cmuA, is being used to develop molecular ecology techniques to examine the diversity and distribution of methyl halide-utilizing bacteria in the environment and hopefully to understand their role in methyl halide degradation in different environments. These techniques will also enable the detection of potentially novel methyl halide-degrading bacteria.
Ganeshan, Seedhabadee; Sharma, Pallavi; Young, Lester; Kumar, Ashwani; Fowler, D Brian; Chibbar, Ravindra N
2011-03-01
Low-temperature (LT) tolerance in winter wheat (Triticum aestivum L.) is an economically important but complex trait. Four selected wheat genotypes, a winter hardy cultivar, Norstar, a tender spring cultivar, Manitou and two near-isogenic lines with Vrn-A1 (spring Norstar) and vrn-A1 (winter Manitou) alleles of Manitou and Norstar were cold-acclimated at 6°C and crown and leaf tissues were collected at 0, 2, 14, 21, 35, 42, 56 and 70 days of cold acclimation. cDNA-AFLP profiling was used to determine temporal expression profiles of transcripts during cold-acclimation in crown and leaf tissues, separately to determine if LT regulatory circuitries in crown and leaf tissues could be delineated using this approach. Screening 64 primer combinations identified 4,074 and 2,757 differentially expressed transcript-derived fragments (TDFs) out of which 38 and 16% were up-regulated as compared to 3 and 6% that were down-regulated in crown and leaf tissues, respectively. DNA sequencing of TDFs revealed sequences common to both tissues including genes coding for DEAD-box RNA helicase, choline-phosphate cytidylyltransferase and delta-1-pyrroline carboxylate synthetase. TDF specific to crown tissues included genes coding for phospahtidylinositol kinase, auxin response factor protein and brassinosteroid insensitive 1-associated receptor kinase. In leaf, genes such as methylene tetrahydrofolate reductase, NADH-cytochrome b5 reductase and malate dehydrogenase were identified. However, 30 and 14% of the DNA sequences from the crown and leaf tissues, respectively, were hypothetical or unknown proteins. Cluster analysis of up-, down-regulated and unique TDFs, DNA sequence and real-time PCR validation, infer that mechanisms operating in crown and leaf tissue in response to LT are differently regulated and warrant further studies.
Das, Alok; Soubam, D; Singh, P K; Thakur, S; Singh, N K; Sharma, T R
2012-06-01
The dominant rice blast resistance gene, Pi54 confers resistance to Magnaporthe oryzae in different parts of India. In our effort to identify more effective forms of this gene, we isolated an orthologue of Pi54 named as Pi54rh from the blast-resistant wild species of rice, Oryza rhizomatis, using allele mining approach and validated by complementation. The Pi54rh belongs to CC-NBS-LRR family of disease resistance genes with a unique Zinc finger (C(3)H type) domain. The 1,447 bp Pi54rh transcript comprises of 101 bp 5'-UTR, 1,083 bp coding region and 263 bp 3'-UTR, driven by pathogen inducible promoter. We showed the extracellular localization of Pi54rh protein and the presence of glycosylation, myristoylation and phosphorylation sites which implicates its role in signal transduction process. This is in contrast to other blast resistance genes that are predicted to be intracellular NBS-LRR-type resistance proteins. The Pi54rh was found to express constitutively at basal level in the leaves, but upregulates 3.8-fold at 96 h post-inoculation with the pathogen. Functional validation of cloned Pi54rh gene using complementation test showed high degree of resistance to seven isolates of M. oryzae collected from different geographical locations of India. In this study, for the first time, we demonstrated that a rice blast resistance gene Pi54rh cloned from wild species of rice provides broad spectrum resistance to M. oryzae hence can be used in rice improvement breeding programme.
Song, Sheng-Nan; Chen, Peng-Yan; Wei, Shu-Jun; Chen, Xue-Xin
2016-07-01
The mitochondrial genome sequence of Polistes jokahamae (Radoszkowski, 1887) (Hymenoptera: Vespidae) (GenBank accession no. KR052468) was sequenced. The current length with partial A + T-rich region of this mitochondrial genome is 16,616 bp. All the typical mitochondrial genes were sequenced except for three tRNAs (trnI, trnQ, and trnY) located between the A + T-rich region and nad2. At least three rearrangement events occurred in the sequenced region compared with the pupative ancestral arrangement of insects, corresponding to the shuffling of trnK and trnD, translocation or remote inversion of tnnY and translocation of trnL1. All protein-coding genes start with ATN codons. Eleven, one, and another one protein-coding genes stop with termination codon TAA, TA, and T, respectively. Phylogenetic analysis using the Bayesian method based on all codon positions of the 13 protein-coding genes supports the monophyly of Vespidae and Formicidae. Within the Formicidae, the Myrmicinae and Formicinae form a sister lineage and then sister to the Dolichoderinae, while within the Vespidae, the Eumeninae is sister to the lineage of Vespinae + Polistinae.
Artificial “ping-pong” cascade of PIWI-interacting RNA in silkworm cells
Shoji, Keisuke; Suzuki, Yutaka; Sugano, Sumio; Shimada, Toru; Katsuma, Susumu
2017-01-01
PIWI-interacting RNAs (piRNAs) play essential roles in the defense system against selfish elements in animal germline cells by cooperating with PIWI proteins. A subset of piRNAs is predicted to be generated via the “ping-pong” cascade, which is mainly controlled by two different PIWI proteins. Here we established a cell-based artificial piRNA production system using a silkworm ovarian cultured cell line that is believed to possess a complete piRNA pathway. In addition, we took advantage of a unique silkworm sex-determining one-to-one ping-pong piRNA pair, which enabled us to precisely monitor the behavior of individual artificial piRNAs. With this novel strategy, we successfully generated artificial piRNAs against endogenous protein-coding genes via the expected back-and-forth traveling mechanism. Furthermore, we detected “primary” piRNAs from the upstream region of the artificial “ping-pong” site in the endogenous gene. This artificial piRNA production system experimentally confirms the existence of the “ping-pong” cascade of piRNAs. Also, this system will enable us to identify the factors involved in both, or each, of the “ping” and “pong” cascades and the sequence features that are required for efficient piRNA production. PMID:27777367
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Intragenome Diversity of Gene Families Encoding Toxin-like Proteins in Venomous Animals.
Rodríguez de la Vega, Ricardo C; Giraud, Tatiana
2016-11-01
The evolution of venoms is the story of how toxins arise and of the processes that generate and maintain their diversity. For animal venoms these processes include recruitment for expression in the venom gland, neofunctionalization, paralogous expansions, and functional divergence. The systematic study of these processes requires the reliable identification of the venom components involved in antagonistic interactions. High-throughput sequencing has the potential of uncovering the entire set of toxins in a given organism, yet the existence of non-venom toxin paralogs and the misleading effects of partial census of the molecular diversity of toxins make necessary to collect complementary evidence to distinguish true toxins from their non-venom paralogs. Here, we analyzed the whole genomes of two scorpions, one spider and one snake, aiming at the identification of the full repertoires of genes encoding toxin-like proteins. We classified the entire set of protein-coding genes into paralogous groups and monotypic genes, identified genes encoding toxin-like proteins based on known toxin families, and quantified their expression in both venom-glands and pooled tissues. Our results confirm that genes encoding toxin-like proteins are part of multigene families, and that these families arise by recruitment events from non-toxin genes followed by limited expansions of the toxin-like protein coding genes. We also show that failing to account for sequence similarity with non-toxin proteins has a considerable misleading effect that can be greatly reduced by comparative transcriptomics. Our study overall contributes to the understanding of the evolutionary dynamics of proteins involved in antagonistic interactions. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
Domb, Katherine; Keidar, Danielle; Yaakov, Beery; Khasdan, Vadim; Kashkush, Khalil
2017-10-27
Natural populations of the tetraploid wild emmer wheat (genome AABB) were previously shown to demonstrate eco-geographically structured genetic and epigenetic diversity. Transposable elements (TEs) might make up a significant part of the genetic and epigenetic variation between individuals and populations because they comprise over 80% of the wild emmer wheat genome. In this study, we performed detailed analyses to assess the dynamics of transposable elements in 50 accessions of wild emmer wheat collected from 5 geographically isolated sites. The analyses included: the copy number variation of TEs among accessions in the five populations, population-unique insertional patterns, and the impact of population-unique/specific TE insertions on structure and expression of genes. We assessed the copy numbers of 12 TE families using real-time quantitative PCR, and found significant copy number variation (CNV) in the 50 wild emmer wheat accessions, in a population-specific manner. In some cases, the CNV difference reached up to 6-fold. However, the CNV was TE-specific, namely some TE families showed higher copy numbers in one or more populations, and other TE families showed lower copy numbers in the same population(s). Furthermore, we assessed the insertional patterns of 6 TE families using transposon display (TD), and observed significant population-specific insertional patterns. The polymorphism levels of TE-insertional patterns reached 92% among all wild emmer wheat accessions, in some cases. In addition, we observed population-specific/unique TE insertions, some of which were located within or close to protein-coding genes, creating allelic variations in a population-specific manner. We also showed that those genes are differentially expressed in wild emmer wheat. For the first time, this study shows that TEs proliferate in wild emmer wheat in a population-specific manner, creating new alleles of genes, which contribute to the divergent evolution of homeologous genes from the A and B subgenomes.
Comparative Genomics and Phylogenomics of East Asian Tulips (Amana, Liliaceae)
Li, Pan; Lu, Rui-Sen; Xu, Wu-Qin; Ohi-Toma, Tetsuo; Cai, Min-Qi; Qiu, Ying-Xiong; Cameron, Kenneth M.; Fu, Cheng-Xin
2017-01-01
The genus Amana Honda (Liliaceae), when it is treated as separate from Tulipa, comprises six perennial herbaceous species that are restricted to China, Japan and the Korean Peninsula. Although all six Amana species have important medicinal and horticultural uses, studies focused on species identification and molecular phylogenetics are few. Here we report the nucleotide sequences of six complete Amana chloroplast (cp) genomes. The cp genomes of Amana range from 150,613 bp to 151,136 bp in length, all including a pair of inverted repeats (25,629–25,859 bp) separated by the large single-copy (81,482–82,218 bp) and small single-copy (17,366–17,465 bp) regions. Each cp genome equivalently contains 112 unique genes consisting of 30 transfer RNA genes, four ribosomal RNA genes, and 78 protein coding genes. Gene content, gene order, AT content, and IR/SC boundary structure are nearly identical among all Amana cp genomes. However, the relative contraction and expansion of the IR/SC borders among the six Amana cp genomes results in length variation among them. Simple sequence repeat (SSR) analyses of these Amana cp genomes indicate that the richest SSRs are A/T mononucleotides. The number of repeats among the six Amana species varies from 54 (A. anhuiensis) to 69 (Amana kuocangshanica) with palindromic (28–35) and forward repeats (23–30) as the most common types. Phylogenomic analyses based on these complete cp genomes and 74 common protein-coding genes strongly support the monophyly of the genus, and a sister relationship between Amana and Erythronium, rather than a shared common ancestor with Tulipa. Nine DNA markers (rps15–ycf1, accD–psaI, petA–psbJ, rpl32–trnL, atpH–atpI, petD–rpoA, trnS–trnG, psbM–trnD, and ycf4–cemA) with number of variable sites greater than 0.9% were identified, and these may be useful for future population genetic and phylogeographic studies of Amana species. PMID:28421090
2011-01-01
Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. The control region in Neuropterida, as in other insects, is fast-evolving genomic region, characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history. PMID:21569260
Lin, Feng-Jiau; Liu, Yuan; Sha, Zhongli; Tsang, Ling Ming; Chu, Ka Hou; Chan, Tin-Yam; Liu, Ruiyu; Cui, Zhaoxia
2012-11-16
The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders. We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophiles) share unique gene order specific to their infraorders and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genome from more taxa, in particular the reptant infraorders Polychelida and Glypheidea is required in further analysis. Phylogenetic analyses on the mt genome sequences and the distinct gene orders provide further evidences for the divergence between the two mud shrimp infraorders, Gebiidea and Axiidea, corroborating previous molecular phylogeny and justifying their infraordinal status. Mitochondrial genome sequences appear to be promising markers for resolving phylogenetic issues concerning decapod crustaceans that warrant further investigations and our present study has also provided further information concerning the mt genome evolution of the Decapoda.
2012-01-01
Background The evolutionary history and relationships of the mud shrimps (Crustacea: Decapoda: Gebiidea and Axiidea) are contentious, with previous attempts revealing mixed results. The mud shrimps were once classified in the infraorder Thalassinidea. Recent molecular phylogenetic analyses, however, suggest separation of the group into two individual infraorders, Gebiidea and Axiidea. Mitochondrial (mt) genome sequence and structure can be especially powerful in resolving higher systematic relationships that may offer new insights into the phylogeny of the mud shrimps and the other decapod infraorders, and test the hypothesis of dividing the mud shrimps into two infraorders. Results We present the complete mitochondrial genome sequences of five mud shrimps, Austinogebia edulis, Upogebia major, Thalassina kelanang (Gebiidea), Nihonotrypaea thermophilus and Neaxius glyptocercus (Axiidea). All five genomes encode a standard set of 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and a putative control region. Except for T. kelanang, mud shrimp mitochondrial genomes exhibited rearrangements and novel patterns compared to the pancrustacean ground pattern. Each of the two Gebiidea species (A. edulis and U. major) and two Axiidea species (N. glyptocercus and N. thermophiles) share unique gene order specific to their infraorders and analyses further suggest these two derived gene orders have evolved independently. Phylogenetic analyses based on the concatenated nucleotide and amino acid sequences of 13 protein-coding genes indicate the possible polyphyly of mud shrimps, supporting the division of the group into two infraorders. However, the infraordinal relationships among the Gebiidea and Axiidea, and other reptants are poorly resolved. The inclusion of mt genome from more taxa, in particular the reptant infraorders Polychelida and Glypheidea is required in further analysis. Conclusions Phylogenetic analyses on the mt genome sequences and the distinct gene orders provide further evidences for the divergence between the two mud shrimp infraorders, Gebiidea and Axiidea, corroborating previous molecular phylogeny and justifying their infraordinal status. Mitochondrial genome sequences appear to be promising markers for resolving phylogenetic issues concerning decapod crustaceans that warrant further investigations and our present study has also provided further information concerning the mt genome evolution of the Decapoda. PMID:23153176
Hansen, Karina K; Hauser, Frank; Williamson, Michael; Weber, Stine B; Grimmelikhuijzen, Cornelis J P
2011-01-07
Recently, a novel neuropeptide, CCHamide, was discovered in the silkworm Bombyx mori (L. Roller et al., Insect Biochem. Mol. Biol. 38 (2008) 1147-1157). We have now found that all insects with a sequenced genome have two genes, each coding for a different CCHamide, CCHamide-1 and -2. We have also cloned and deorphanized two Drosophila G-protein-coupled receptors (GPCRs) coded for by genes CG14593 and CG30106 that are selectively activated by Drosophila CCH-amide-1 (EC(50), 2×10(-9) M) and CCH-amide-2 (EC(50), 5×10(-9) M), respectively. Gene CG30106 (symbol synonym CG14484) has in a previous publication (E.C. Johnson et al., J. Biol. Chem. 278 (2003) 52172-52178) been wrongly assigned to code for an allatostatin-B receptor. This conclusion is based on our findings that the allatostatins-B do not activate the CG30106 receptor and on the recent findings from other research groups that the allatostatins-B activate an unrelated GPCR coded for by gene CG16752. Comparative genomics suggests that a duplication of the CCHamide neuropeptide signalling system occurred after the split of crustaceans and insects, about 410 million years ago, because only one CCHamide neuropeptide gene is found in the water flea Daphnia pulex (Crustacea) and the tick Ixodes scapularis (Chelicerata). Copyright © 2010 Elsevier Inc. All rights reserved.
Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji
2015-12-01
Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs-DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region-could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions, and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for the particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Ikonomidis, Alexandros; Grapsa, Anastasia; Pavlioglou, Charikleia; Demiri, Antonia; Batarli, Alexandra; Panopoulou, Maria
2016-12-01
Fifty-six Staphylococcus epidermidis clinical isolates, showing high-level linezolid resistance and causing bacteremia in critically ill patients, were studied. All isolates belonged to ST22 clone and carried the T2504A and C2534T mutations in gene coding for 23SrRNA as well as the C189A, G208A, C209T and G384C missense mutations in L3 protein which resulted in Asp159Tyr, Gly152Asp and Leu94Val substitutions. Other silent mutations were also detected in genes coding for ribosomal proteins L3 and L22. In silico analysis of missense mutations showed that although L3 protein retained the sequence of secondary motifs, the tertiary structure was influenced. The observed alteration in L3 protein folding provides an indication on the putative role of L3-coding gene mutations in high-level linezolid resistance. Furthermore, linezolid pressure in health care settings where linezolid consumption is of high rates might lead to the selection of resistant mutants possessing L3 mutations that might confer high-level linezolid resistance.
Liu, Feng; Pang, Shaojun
2016-01-01
Sargassum muticum (Yendo) Fensholt is an invasive canopy-forming brown alga, expanding its presence from Northeast Asia to North America and Europe. The complete mitochondrial genome of S. muticum is characterized as a circular molecule of 34,720 bp. The overall AT content of S. muticum mitogenome is 63.41%. This mitogenome contains 65 genes typically found in brown algae, including 3 ribosomal RNA genes, 25 transfer RNA genes, 35 protein-coding genes, and 2 conserved open reading frames (ORFs). The gene order of mitogenome for S. muticum is identical to that for Sargassum horneri, Fucus vesiculosus and Desmarestia viridis. Phylogenetic analyses based on 35 protein-coding genes reveal that S. muticum has a close evolutionary relationship with S. horneri and a distant relationship with Dictyota dichotoma, supporting current taxonomic systems. The present investigation provides new molecular data for studies of S. muticum population diversity as well as comparative genomics in the Phaeophyceae.
Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda
2013-01-07
Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem-loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping-pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration.
Krishna, Srikar; Nair, Aparna; Cheedipudi, Sirisha; Poduval, Deepak; Dhawan, Jyotsna; Palakodeti, Dasaradhi; Ghanekar, Yashoda
2013-01-01
Small non-coding RNAs such as miRNAs, piRNAs and endo-siRNAs fine-tune gene expression through post-transcriptional regulation, modulating important processes in development, differentiation, homeostasis and regeneration. Using deep sequencing, we have profiled small non-coding RNAs in Hydra magnipapillata and investigated changes in small RNA expression pattern during head regeneration. Our results reveal a unique repertoire of small RNAs in hydra. We have identified 126 miRNA loci; 123 of these miRNAs are unique to hydra. Less than 50% are conserved across two different strains of Hydra vulgaris tested in this study, indicating a highly diverse nature of hydra miRNAs in contrast to bilaterian miRNAs. We also identified siRNAs derived from precursors with perfect stem–loop structure and that arise from inverted repeats. piRNAs were the most abundant small RNAs in hydra, mapping to transposable elements, the annotated transcriptome and unique non-coding regions on the genome. piRNAs that map to transposable elements and the annotated transcriptome display a ping–pong signature. Further, we have identified several miRNAs and piRNAs whose expression is regulated during hydra head regeneration. Our study defines different classes of small RNAs in this cnidarian model system, which may play a role in orchestrating gene expression essential for hydra regeneration. PMID:23166307
Darbani, Behrooz; Noeparvar, Shahin; Borg, Søren
2016-01-01
RNA circularization made by head-to-tail back-splicing events is involved in the regulation of gene expression from transcriptional to post-translational levels. By exploiting RNA-Seq data and down-stream analysis, we shed light on the importance of circular RNAs in plants. The results introduce circular RNAs as novel interactors in the regulation of gene expression in plants and imply the comprehensiveness of this regulatory pathway by identifying circular RNAs for a diverse set of genes. These genes are involved in several aspects of cellular metabolism as hormonal signaling, intracellular protein sorting, carbohydrate metabolism and cell-wall biogenesis, respiration, amino acid biosynthesis, transcription and translation, and protein ubiquitination. Additionally, these parental loci of circular RNAs, from both nuclear and mitochondrial genomes, encode for different transcript classes including protein coding transcripts, microRNA, rRNA, and long non-coding/microprotein coding RNAs. The results shed light on the mitochondrial exonic circular RNAs and imply the importance of circular RNAs for regulation of mitochondrial genes. Importantly, we introduce circular RNAs in barley and elucidate their cellular-level alterations across tissues and in response to micronutrients iron and zinc. In further support of circular RNAs' functional roles in plants, we report several cases where fluctuations of circRNAs do not correlate with the levels of their parental-loci encoded linear transcripts. PMID:27375638
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Williams, R R; Hassan-Walker, A F; Lavender, F L; Morgan, M; Faik, P; Ragoussis, J
2001-05-16
Minisatellites are tandemly repeated DNA sequences found throughout the genomes of all eukaryotes. They are regions often prone to instability and hence hypervariability; thus repeat unit sequence is generally not conserved beyond closely related species. We have studied the minisatellite located in intron 9 of the human glucose phosphate isomerase (GPI) gene (also known as neuroleukin, autocrine motility factor, maturation and differentiation factor) and have found, by Zoo blotting coupled with PCR amplification and DNA sequencing, that similar repeat units are present in seven other species of mammal. There is also evidence for the presence of the minisatellite in chicken. The repeat unit does not appear to be present at any other locus in these genomes. Minisatellite DNA has been reported to be involved in recombination activity, control of gene expression of nearby gene(s) (both transcriptional and translational), whilst others form protein coding regions. The high level of conservation exhibited by the GPI minisatellite, coupled with the unique location, strongly suggests a functional role. Our results from transient and stable transfections using luciferase reporter constructs have shown that the GPI minisatellite region can act to increase transcription from the SV40 promoter, CMV promoter and the human GPI promoter.
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.
Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D
2017-01-23
The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
Probing the Boundaries of Orthology: The Unanticipated Rapid Evolution of Drosophila centrosomin
Eisman, Robert C.; Kaufman, Thomas C.
2013-01-01
The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically result in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins. PMID:23749319
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B
2017-11-24
Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression-on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
Biotin protein ligase from Corynebacterium glutamicum: role for growth and L: -lysine production.
Peters-Wendisch, P; Stansen, K C; Götker, S; Wendisch, V F
2012-03-01
Corynebacterium glutamicum is a biotin auxotrophic Gram-positive bacterium that is used for large-scale production of amino acids, especially of L-glutamate and L-lysine. It is known that biotin limitation triggers L-glutamate production and that L-lysine production can be increased by enhancing the activity of pyruvate carboxylase, one of two biotin-dependent proteins of C. glutamicum. The gene cg0814 (accession number YP_225000) has been annotated to code for putative biotin protein ligase BirA, but the protein has not yet been characterized. A discontinuous enzyme assay of biotin protein ligase activity was established using a 105aa peptide corresponding to the carboxyterminus of the biotin carboxylase/biotin carboxyl carrier protein subunit AccBC of the acetyl CoA carboxylase from C. glutamicum as acceptor substrate. Biotinylation of this biotin acceptor peptide was revealed with crude extracts of a strain overexpressing the birA gene and was shown to be ATP dependent. Thus, birA from C. glutamicum codes for a functional biotin protein ligase (EC 6.3.4.15). The gene birA from C. glutamicum was overexpressed and the transcriptome was compared with the control strain revealing no significant gene expression changes of the bio-genes. However, biotin protein ligase overproduction increased the level of the biotin-containing protein pyruvate carboxylase and entailed a significant growth advantage in glucose minimal medium. Moreover, birA overexpression resulted in a twofold higher L-lysine yield on glucose as compared with the control strain.
Chocu, Sophie; Evrard, Bertrand; Lavigne, Régis; Rolland, Antoine D; Aubry, Florence; Jégou, Bernard; Chalmel, Frédéric; Pineau, Charles
2014-11-01
Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872. © 2014 by the Society for the Study of Reproduction, Inc.
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.
Craig, Catherine L; Riekel, Christian
2002-12-01
The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks (Flag or silk from flagelliform gland in spiders; Fhc or heavy chain fibroin silks produced by Lepidoptera larvae) are made up of hierarchically organized, repetitive arrays of amino acids. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differs. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated in the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
Analysis of the cbhE' plasmid gene from acute disease-causing isolates of Coxiella burnetii.
Minnick, M F; Small, C L; Frazier, M E; Mallavia, L P
1991-07-15
A gene termed cbhE' was cloned from the QpH1 plasmid of Coxiella burnetii. Expression of recombinants containing cbhE' in vitro and in Escherichia coli maxicells, produced an insert-encoded polypeptide of approx. 42 kDa. The CbhE protein was not cleaved when intact maxicells were treated with trypsin. Hybridizations of total DNA isolated from the six strains of C. burnetii indicate that this gene is unique to C. burnetii strains associated with acute disease, i.e., Hamilton[I], Vacca[II], and Rasche[III]. The cbhE' gene was not detected in strains associated with chronic disease (Biotzere[IV] and Corazon[V]) or the Dod[VI] strain. The cbhE' open reading frame (ORF) is 1022 bp in length and is preceded by a predicted promoter/Shine-Dalgarno (SD) region of TCAACT(-35)-N16-TAAAAT(-10)-N14-AGAAGGA (SD) located 10 nucleotides (nt) before the presumed AUG start codon. The ORF ends with a single UAA stop codon and has no apparent Rho-factor-independent terminator following it. The cbhE' gene codes for the CbhE protein of 341 amino acid (aa) residues with a deduced Mr of 39,442. CbhE is predominantly hydrophilic with a predicted pI of 4.43. The function of CbhE is unknown. No nt or aa sequences with homology to cbhE' or CbhE, respectively, were found in searches of a number of data bases.
Gawin, Agnieszka; Valla, Svein; Brautaset, Trygve
2017-07-01
The XylS/Pm regulator/promoter system originating from the Pseudomonas putida TOL plasmid pWW0 is widely used for regulated low- and high-level recombinant expression of genes and gene clusters in Escherichia coli and other bacteria. Induction of this system can be graded by using different cheap benzoic acid derivatives, which enter cells by passive diffusion, operate in a dose-dependent manner and are typically not metabolized by the host cells. Combinatorial mutagenesis and selection using the bla gene encoding β-lactamase as a reporter have demonstrated that the Pm promoter, the DNA sequence corresponding to the 5' untranslated end of its cognate mRNA and the xylS coding region can be modified and improved relative to various types of applications. By combining such mutant genetic elements, altered and extended expression profiles were achieved. Due to their unique properties, obtained systems serve as a genetic toolbox valuable for heterologous protein production and metabolic engineering, as well as for basic studies aiming at understanding fundamental parameters affecting bacterial gene expression. The approaches used to modify XylS/Pm should be adaptable for similar improvements also of other microbial expression systems. In this review, we summarize constructions, characteristics, refinements and applications of expression tools using the XylS/Pm system. © 2017 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
Reciprocal Translocation Observed in End-of-Production Cells of a Commercial CHO-Based Process.
Rouiller, Yolande; Kleuser, Beate; Toso, Emiliano; Palinksy, Wolf; Rossi, Mara; Rossatto, Paola; Barberio, Davide; Broly, Hervé
2015-01-01
During the validation of an additional working cell bank derived from a validated master cell bank to support the commercial production continuum of a recombinant protein, we observed an unexpected chromosomal location of the gene of interest in some end-of-production cells. This event-identified by fluorescence in situ hybridization and multicolour chromosome painting as a reciprocal translocation involving a chromosome region containing the gene of interest with its integral coding and flanking sequences-was unique, occurred probably during or prior to multicolour chromosome painting establishment, and was transmitted to the descending generations. Cells bearing the translocation had a transient and process-independent selective advantage, which did not affect process performance and product quality. However, this first report of a translocation affecting the gene of interest location in Chinese Hamster Ovary cells used for producing a biotherapeutic indicates the importance of the demonstration of the integrity of the gene of interest in end-of-production cells. The expression of recombinant therapeutic proteins in mammalian cells depends on the establishment of a cell line with the gene of interest integrated in the host genome and stably expressed over time. Before being used for commercial production, cell lines are submitted to a qualification program in order to ensure their phenotypic and genotypic characteristics and the efficacy and safety of the product. During the production life cycle of a therapeutic protein, additional cells banks have to be validated after exhaustion of the current qualified cell bank in order to support the commercial production continuum of the recombinant protein. It is during the validation of an additional working cell bank derived from a validated master cell bank that we detected a different chromosome bearing the gene of interest in a portion of cells at the end of the upstream production phase. In our case, this event did not affect the process performance, the product quality, or its safety profile, but it highlights the need to characterize the integrity of the gene of interest in end-of-production cells when producing recombinant proteins for human use. © PDA, Inc. 2015.
Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Chrstina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Drumm, Allen DozorMitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne
2014-01-01
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
Lange, Leslie A; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M; Smith, Joshua D; Turner, Emily H; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-Ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A; Holmen, Oddgeir L; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C; Correa, Adolfo; Griswold, Michael E; Jakobsdottir, Johanna; Smith, Albert V; Schreiner, Pamela J; Feitosa, Mary F; Zhang, Qunyuan; Huffman, Jennifer E; Crosby, Jacy; Wassel, Christina L; Do, Ron; Franceschini, Nora; Martin, Lisa W; Robinson, Jennifer G; Assimes, Themistocles L; Crosslin, David R; Rosenthal, Elisabeth A; Tsai, Michael; Rieder, Mark J; Farlow, Deborah N; Folsom, Aaron R; Lumley, Thomas; Fox, Ervin R; Carlson, Christopher S; Peters, Ulrike; Jackson, Rebecca D; van Duijn, Cornelia M; Uitterlinden, André G; Levy, Daniel; Rotter, Jerome I; Taylor, Herman A; Gudnason, Vilmundur; Siscovick, David S; Fornage, Myriam; Borecki, Ingrid B; Hayward, Caroline; Rudan, Igor; Chen, Y Eugene; Bottinger, Erwin P; Loos, Ruth J F; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M; Gabriel, Stacey B; O'Donnell, Christopher J; Post, Wendy S; North, Kari E; Reiner, Alexander P; Boerwinkle, Eric; Psaty, Bruce M; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P; Cupples, L Adrienne; Kooperberg, Charles; Wilson, James G; Nickerson, Deborah A; Abecasis, Goncalo R; Rich, Stephen S; Tracy, Russell P; Willer, Cristen J
2014-02-06
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98(th) or <2(nd) percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Evolution of the eukaryotic dynactin complex, the activator of cytoplasmic dynein
2012-01-01
Background Dynactin is a large multisubunit protein complex that enhances the processivity of cytoplasmic dynein and acts as an adapter between dynein and the cargo. It is composed of eleven different polypeptides of which eight are unique to this complex, namely dynactin1 (p150Glued), dynactin2 (p50 or dynamitin), dynactin3 (p24), dynactin4 (p62), dynactin5 (p25), dynactin6 (p27), and the actin-related proteins Arp1 and Arp10 (Arp11). Results To reveal the evolution of dynactin across the eukaryotic tree the presence or absence of all dynactin subunits was determined in most of the available eukaryotic genome assemblies. Altogether, 3061 dynactin sequences from 478 organisms have been annotated. Phylogenetic trees of the various subunit sequences were used to reveal sub-family relationships and to reconstruct gene duplication events. Especially in the metazoan lineage, several of the dynactin subunits were duplicated independently in different branches. The largest subunit repertoire is found in vertebrates. Dynactin diversity in vertebrates is further increased by alternative splicing of several subunits. The most prominent example is the dynactin1 gene, which may code for up to 36 different isoforms due to three different transcription start sites and four exons that are spliced as differentially included exons. Conclusions The dynactin complex is a very ancient complex that most likely included all subunits in the last common ancestor of extant eukaryotes. The absence of dynactin in certain species coincides with that of the cytoplasmic dynein heavy chain: Organisms that do not encode cytoplasmic dynein like plants and diplomonads also do not encode the unique dynactin subunits. The conserved core of dynactin consists of dynactin1, dynactin2, dynactin4, dynactin5, Arp1, and the heterodimeric actin capping protein. The evolution of the remaining subunits dynactin3, dynactin6, and Arp10 is characterized by many branch- and species-specific gene loss events. PMID:22726940
Stewart, Paul A; Parapatics, Katja; Welsh, Eric A; Müller, André C; Cao, Haoyun; Fang, Bin; Koomen, John M; Eschrich, Steven A; Bennett, Keiryn L; Haura, Eric B
2015-01-01
We performed a pilot proteogenomic study to compare lung adenocarcinoma to lung squamous cell carcinoma using quantitative proteomics (6-plex TMT) combined with a customized Affymetrix GeneChip. Using MaxQuant software, we identified 51,001 unique peptides that mapped to 7,241 unique proteins and from these identified 6,373 genes with matching protein expression for further analysis. We found a minor correlation between gene expression and protein expression; both datasets were able to independently recapitulate known differences between the adenocarcinoma and squamous cell carcinoma subtypes. We found 565 proteins and 629 genes to be differentially expressed between adenocarcinoma and squamous cell carcinoma, with 113 of these consistently differentially expressed at both the gene and protein levels. We then compared our results to published adenocarcinoma versus squamous cell carcinoma proteomic data that we also processed with MaxQuant. We selected two proteins consistently overexpressed in squamous cell carcinoma in all studies, MCT1 (SLC16A1) and GLUT1 (SLC2A1), for further investigation. We found differential expression of these same proteins at the gene level in our study as well as in other public gene expression datasets. These findings combined with survival analysis of public datasets suggest that MCT1 and GLUT1 may be potential prognostic markers in adenocarcinoma and druggable targets in squamous cell carcinoma. Data are available via ProteomeXchange with identifier PXD002622.