Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.
Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A
2015-01-01
Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for understanding the evolution of molecular systems and can provide new knowledge for many applications in designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains a relevant problem. One approach to this analysis is the projection of the amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were located in the same exon or in nearby exons. The observed tendency of exons that code functional sites to cluster suggests that such clusters could be considered a unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residues of functional sites, which may be evidence of evolutionary limitations on exon shuffling.
These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
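The projection of functional-site residues onto exon structure can be illustrated with a minimal sketch. It assumes hypothetical coding-exon lengths (in nucleotides, coding part only) and 1-based residue positions; the function names and toy numbers are illustrative and are not taken from the paper. An intron has phase 0 when it falls between two codons and phase 1 or 2 when it splits a codon.

```python
from itertools import accumulate

def intron_phases(exon_lengths):
    """Phase of the intron that follows each exon except the last one.

    The phase is the cumulative coding length modulo 3:
    0 = intron lies between codons, 1 or 2 = intron splits a codon.
    """
    ends = list(accumulate(exon_lengths))
    return [end % 3 for end in ends[:-1]]

def exons_for_residue(exon_lengths, residue):
    """0-based indices of the exons that contribute bases to a residue's codon.

    `residue` is a 1-based position in the protein sequence.
    """
    first_nt = 3 * (residue - 1)   # 0-based CDS coordinate of the codon's first base
    last_nt = first_nt + 2
    hits = []
    start = 0
    for index, length in enumerate(exon_lengths):
        end = start + length       # exon covers the half-open interval [start, end)
        if start <= last_nt and first_nt < end:
            hits.append(index)
        start = end
    return hits

# Hypothetical gene: four coding exons, 360 nt in total (120 codons).
exon_lengths = [90, 61, 122, 87]
# Hypothetical functional site made of three residues.
site_residues = [12, 35, 51]

phases = intron_phases(exon_lengths)
site_exons = {r: exons_for_residue(exon_lengths, r) for r in site_residues}
exons_used = sorted({i for hits in site_exons.values() for i in hits})

print(phases)      # intron phases along the gene
print(site_exons)  # exon(s) encoding each site residue
print(exons_used)  # distinct exons touched by the site
```

In this toy gene, residue 51 is encoded by a codon that is split by a phase 1 intron, so it maps to two exons; counting the distinct exons touched by a site is one simple way to quantify its discontinuity.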
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.
Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor
2017-08-30
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa
2015-01-01
Background: Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for understanding the evolution of molecular systems and can provide new knowledge for many applications in designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains a relevant problem. Results: One approach to this analysis is the projection of the amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional site encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were located in the same exon or in nearby exons. The observed tendency of exons that code functional sites to cluster suggests that such clusters could be considered a unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residues of functional sites, which may be evidence of evolutionary limitations on exon shuffling.
Conclusions: These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity. PMID:26693737
Long non-coding RNAs and mRNAs profiling during spleen development in pig.
Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou
2018-01-01
Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried
2016-02-24
Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
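The comparison against 100-times randomly shuffled proteins can be sketched as follows. The abstract does not describe the shuffling procedure, so this assumes a plain permutation of residues that preserves amino acid composition; the function name, the seed, and the example sequence are hypothetical.

```python
import random

def shuffled_controls(protein, n=100, seed=0):
    """Return n residue-shuffled versions of a protein sequence.

    Each control keeps the amino acid composition of the input but
    randomizes residue order (an assumed null model, not necessarily
    the one used in the study).
    """
    rng = random.Random(seed)  # fixed seed so the controls are reproducible
    controls = []
    for _ in range(n):
        residues = list(protein)
        rng.shuffle(residues)
        controls.append("".join(residues))
    return controls

protein = "MKTLLVAGSRPEQ"  # hypothetical short protein
controls = shuffled_controls(protein)
print(len(controls), controls[0])
```

Any structure or function predictor could then be run on both the real sequence and its controls, and a feature would count as informative only if it stands out against the shuffled background.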
Prediction of plant lncRNA by ensemble machine learning classifiers.
Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian
2018-05-02
In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
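Classifiers of this kind take numeric features computed from each transcript sequence. The abstract does not list the features used, so the sketch below assumes three common ones (length, GC content, and the longest forward-strand ORF); the function names and the toy sequence are hypothetical and only show how a transcript becomes a feature vector.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf_length(seq):
    """Length in nucleotides (stop codon included) of the longest
    ATG-to-stop ORF in the three forward reading frames."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None                   # look for the next ORF in this frame
    return best

def transcript_features(seq):
    """Assumed feature set for a coding / non-coding classifier."""
    seq = seq.upper().replace("U", "T")        # accept RNA or DNA alphabets
    length = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length if length else 0.0
    orf = longest_orf_length(seq)
    return {
        "length": length,
        "gc_content": gc,
        "longest_orf": orf,
        "orf_coverage": orf / length if length else 0.0,
    }

features = transcript_features("GGATGAAACCCTAGGC")  # toy transcript
print(features)
```

A low ORF coverage relative to transcript length is one signal such models typically exploit, although which features carry the most weight in the published ensemble is not stated in the abstract.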
Mutant phenotypes for thousands of bacterial genes of unknown function
Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; ...
2018-05-16
One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S
2017-01-01
Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303
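The idea of an alternative ORF can be made concrete with a short sketch that scans the three forward reading frames of an mRNA and reports every ATG-to-stop ORF other than the annotated CDS. The minimum length, the toy sequence, and the function names are assumptions for illustration; the criteria used to build the authors' altORF database are not given in the abstract.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=2):
    """Return (start, end) half-open coordinates of ATG-to-stop ORFs
    in the three forward frames. min_codons counts codons before the stop."""
    mrna = mrna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

def alt_orfs(mrna, annotated_cds, min_codons=2):
    """ORFs on the transcript that differ from the annotated CDS."""
    return [orf for orf in find_orfs(mrna, min_codons) if orf != annotated_cds]

mrna = "ATGGCATGCCCTAAGTGA"   # toy transcript
annotated_cds = (0, 18)       # hypothetical annotated CDS in frame 0
candidates = alt_orfs(mrna, annotated_cds)
print(candidates, [mrna[s:e] for s, e in candidates])
```

Here the second ORF overlaps the annotated CDS in a shifted frame, which is the situation the term "multicoding gene" refers to. Real pipelines would also need evidence of translation, such as ribosome profiling or mass spectrometry, before calling a candidate an alternative protein.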
Maize GO annotation—methods, evaluation, and review (maize-GAMER)
USDA-ARS's Scientific Manuscript database
We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions
Kumari, Pooja; Sampath, Karuna
2015-01-01
For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.
2013-01-01
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Zhao, Yi; Tang, Liang; Li, Zhe; Jin, Jinpu; Luo, Jingchu; Gao, Ge
2015-04-18
Long-established protein-coding genes may lose their coding potential during evolution ("unitary gene loss"). Members of the Poaceae family are a major food source and represent an ideal model clade for plant evolution research. However, the global pattern of unitary gene loss in Poaceae genomes, as well as the evolutionary fate of lost genes, has been little investigated and remains largely elusive. Using a locally developed pipeline, we identified 129 unitary gene loss events for long-established protein-coding genes from four representative species of Poaceae, i.e. brachypodium, rice, sorghum and maize. Functional annotation suggested that the genes lost in all or most Poaceae species are enriched for genes involved in development and response to endogenous stimulus. We also found that 44 mutated genomic loci of lost genes, which we refer to as relics, were still actively transcribed, of which 84% (37 of 44) showed significantly differential expression across tissues. More interestingly, we found a total of five expressed relics that may function as competitive endogenous RNAs in the brachypodium, rice and sorghum genomes. Based on comparative genomics and transcriptome data, we compiled, for the first time, a comprehensive catalogue of unitary gene loss events in Poaceae species, characterized a statistically significant functional preference for these lost genes, and showed the potential of relics to function as competitive endogenous RNAs in Poaceae genomes.
De Novo Origin of Human Protein-Coding Genes
Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping
2011-01-01
The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831
Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji
2007-01-01
We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
From Genomes to Protein Models and Back
NASA Astrophysics Data System (ADS)
Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico
2007-12-01
The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M
2003-01-01
Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed to develop a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of the SC and the conformation of the SC proteins responsible for these. Proteins involved in the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for a synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G
2016-05-05
Mammalian chromosomes are comprised of complex chromatin architecture with the specific assembly and configuration of each chromosome influencing gene expression and function in yet undefined ways by varying degrees of heterochromatinization that result in Giemsa (G) negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out morphometric measurements of high-resolution chromosome ideograms for the first time to characterize the total euchromatic and heterochromatic chromosome band length, distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatin and G-positive heterochromatin chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, while the obesity gene distribution pattern did not significantly differ from protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions playing a role in biological processes including neurodevelopment and function, specifically genes associated with ASD.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
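The proportions quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts stated above; the variable names and the `percent` helper are illustrative assumptions and are not part of H-InvDB.

```python
# Counts taken from the abstract above; all names are illustrative only.
validated_loci = 21_037      # human gene candidates validated by cDNA clones
no_good_orf_loci = 1_377     # loci without a good protein-coding ORF
ncrna_candidates = 296       # strong candidates for non-protein-coding RNA genes


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


share_no_orf = percent(no_good_orf_loci, validated_loci)
share_ncrna = percent(ncrna_candidates, no_good_orf_loci)

# Coding-region variants listed in the abstract.
coding_variants = {"nonsynonymous": 13_215, "nonsense": 315, "indel": 452}
total_coding_variants = sum(coding_variants.values())

print(f"{share_no_orf:.1f}% of validated loci lack a good ORF")
print(f"{share_ncrna:.1f}% of those loci are strong ncRNA candidates")
print(f"{total_coding_variants} coding-region SNPs and indels in total")
```

The first figure reproduces the 6.5% stated in the abstract, which confirms that the percentage is taken relative to the 21,037 validated gene candidates.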
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.
Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A
2015-07-01
Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.
Dimitrieva, Slavica; Anisimova, Maria
2014-01-01
In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore are not subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites have been reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play an important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with a larger number of interactions or forming a larger number of structures, located in intracellular components and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
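To make the notion of a synonymous site concrete, the following minimal sketch classifies codon differences between two aligned coding sequences as synonymous or nonsynonymous under the standard genetic code. It is only an illustration of the terminology: it is not the SRV estimation procedure used in the study, and the function name `classify_codon_differences` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (synonymous, nonsynonymous) codon differences.

    Assumes two gap-free, in-frame DNA sequences of equal length.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no difference
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # same amino acid, different codon
        else:
            nonsynonymous += 1   # amino acid (or stop) changes
    return synonymous, nonsynonymous


# Hypothetical example: CTT -> CTC keeps Leu; AAA -> AGA changes Lys to Arg.
print(classify_codon_differences("ATGCTTAAA", "ATGCTCAGA"))
```

Counting whole-codon differences is the simplest possible choice; rate estimates such as dN and dS additionally normalise by the number of synonymous and nonsynonymous sites and correct for multiple substitutions.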
Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-01-01
Objective: The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods: Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results: Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. 
Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions: We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
Origin and evolution of the long non-coding genes in the X-inactivation center.
Romito, Antonio; Rougeulle, Claire
2011-11-01
Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example systems biology-oriented genome scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop proper functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and importantly detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provides protein-level experimental confirmation for 44% of predicted protein-coding genes, suggests revisions to 48 genes assigned incorrect translational start sites, and uncovers 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regards to the annotation of mature protein products.
A murC gene in Porphyromonas gingivalis 381.
Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T
1995-09-01
The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing a gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned 51 kDa protein encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possibility of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages
Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.; Walker, James R.
2018-01-01
Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
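The annotation breakdown reported above can be verified directly: the four functional categories should add up to the 2,892 potential protein coding sequences. The sketch below uses only the counts from the abstract; the dictionary keys are shorthand labels chosen for illustration.

```python
# Counts from the abstract; category labels are illustrative shorthand.
cds_by_category = {
    "specific function": 2_124,
    "general function": 147,
    "conserved, function unknown": 228,
    "hypothetical": 393,
}
reported_total = 2_892  # potential protein coding sequences on the chromosome

total_cds = sum(cds_by_category.values())
print(f"sum of categories: {total_cds} (reported: {reported_total})")

# Share of each category among all predicted coding sequences.
for label, count in cds_by_category.items():
    print(f"{label}: {100.0 * count / total_cds:.1f}%")
```

The categories sum exactly to the reported total, so every predicted coding sequence is assigned to one of the four classes.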
Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal and breast tumor tissue; half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. 
Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Methylation of miRNA genes and oncogenesis.
Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A
2015-02-01
Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes, and, conversely, a number of different miRNAs usually target a structural gene. Epigenetic gene inactivation associated with methylation of promoter CpG-islands is common to both protein-coding genes and miRNA genes. Here, data on functions of miRNAs in development of tumor-cell phenotype are reviewed. Genomic organization of promoter CpG-islands of the miRNA genes located in inter- and intragenic areas is discussed. The literature and our own results on frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data regarding a link between such modification and changed activity of miRNA genes and, consequently, protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenetic processes as well as affected signaling pathways is discussed.
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks
Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.
2012-01-01
While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.
Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H
2017-12-20
Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
Su, Huei-Jiun; Hu, Jer-Ming
2012-01-01
Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 is less dramatic. Significant dS increases were detected in Balanophora PI, TM6 and RPB2, and dN acceleration was detected in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA.
The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Li, Shicheng; Sun, Xiao; Miao, Shuncheng; Liu, Jia; Jiao, Wenjie
2017-11-01
Cigarette smoking is one of the greatest preventable risk factors for developing cancer, and most cases of lung squamous cell carcinoma (lung SCC) are associated with smoking. The pathogenic mechanism of tumor progression is unclear. This study aimed to identify biomarkers in smoking-related lung cancer, including protein-coding genes, long noncoding RNAs, and transcription factors. We selected and obtained messenger RNA microarray datasets and clinical data from the Gene Expression Omnibus database to identify gene expression altered by cigarette smoking. Integrated bioinformatic analysis was used to clarify biological functions of the identified genes, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the construction of a protein-protein interaction network, transcription factor, and statistical analyses. Subsequent quantitative real-time PCR was utilized to verify these bioinformatic analyses. Five hundred and ninety-eight differentially expressed genes and 21 long noncoding RNAs were identified in smoking-related lung SCC. GO and KEGG pathway analysis showed that the identified genes were enriched in cancer-related functions and pathways. The protein-protein interaction network revealed seven hub genes in lung SCC. Several transcription factors and their binding sites were predicted. The results of real-time quantitative PCR revealed that AURKA and BIRC5 were significantly upregulated and LINC00094 was downregulated in the tumor tissues of smoking patients. Further statistical analysis indicated that dysregulation of AURKA, BIRC5, and LINC00094 indicated poor prognosis in lung SCC. The protein-coding genes AURKA and BIRC5 and the long noncoding RNA LINC00094 could be biomarkers or therapeutic targets for smoking-related lung SCC. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
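The abstract above says only that the combinatorial effects are "realized by enrichment analysis" and does not name the statistical test. As a minimal sketch, assuming a one-sided hypergeometric test (a common choice for GO/KEGG enrichment, not confirmed as the one Co-LncRNA uses), the p-value for a term can be computed as follows. The function name and all parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background protein-coding genes
    annotated:  background genes carrying the GO term / KEGG pathway
    sample:     genes co-expressed with the lncRNA(s) of interest
    overlap:    co-expressed genes that also carry the term

    Assumption: this test stands in for the unspecified
    "enrichment analysis" mentioned in the abstract.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of every outcome at least as extreme.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated, 2 co-expressed, 1 shared.
p = hypergeom_enrichment_p(10, 3, 2, 1)
```

With real data the p-values for many terms would normally be corrected for multiple testing, which this sketch leaves out.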
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smialowska, Agata, E-mail: smialowskaa@gmail.com; School of Life Sciences, Södertörn Högskola, Huddinge 141-89; Djupedal, Ingela
Highlights: • Protein coding genes accumulate anti-sense sRNAs in fission yeast S. pombe. • RNAi represses protein-coding genes in S. pombe. • RNAi-mediated gene repression is post-transcriptional. - Abstract: RNA interference (RNAi) is a gene silencing mechanism conserved from fungi to mammals. Small interfering RNAs are products and mediators of the RNAi pathway and act as specificity factors in recruiting effector complexes. The Schizosaccharomyces pombe genome encodes one of each of the core RNAi proteins, Dicer, Argonaute and RNA-dependent RNA polymerase (dcr1, ago1, rdp1). Even though the function of RNAi in heterochromatin assembly in S. pombe is established, its role in controlling gene expression is elusive. Here, we report the identification of small RNAs mapped anti-sense to protein coding genes in fission yeast. We demonstrate that these genes are up-regulated at the protein level in RNAi mutants, while their mRNA levels are not significantly changed. We show that the repression by RNAi is not a result of heterochromatin formation. Thus, we conclude that RNAi is involved in post-transcriptional gene silencing in S. pombe.
A human haploid gene trap collection to study lncRNAs with unusual RNA biology.
Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M
2016-01-01
Many thousand long non-coding (lnc) RNAs are mapped in the human genome. Time-consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data show that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identify the previously uncharacterized LOC100288798 as a potential gene regulator.
DOE R&D Accomplishments Database
Liang, X.
1998-06-10
The genome of Methanococcus jannaschii has been sequenced completely and has been found to contain approximately 1,770 predicted protein-coding regions. When these coding regions are expressed and how their expression is regulated, however, remain open questions. In this work, mass spectrometry was combined with two-dimensional gel electrophoresis to identify which proteins the genes produce under different growth conditions, and thus investigate the regulation of genes responsible for functions characteristic of this thermophilic representative of the methanogenic Archaea.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequence is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Zhang, Qingbin; Chen, Li; Cui, Shiman; Li, Yan; Zhao, Qi; Cao, Wei; Lai, Shixiang; Yin, Sanjun; Zuo, Zhixiang; Ren, Jian
2017-10-25
Although long noncoding RNAs (lncRNAs) have been emerging as critical regulators in various tissues and biological processes, little is known about their expression and regulation during the osteogenic differentiation of periodontal ligament stem cells (PDLSCs) in inflammatory microenvironment. In this study, we have identified 63 lncRNAs that are not annotated in previous database. These novel lncRNAs were not randomly located in the genome but preferentially located near protein-coding genes related to particular functions and diseases, such as stem cell maintenance and differentiation, development disorders and inflammatory diseases. Moreover, we have identified 650 differentially expressed lncRNAs among different subsets of PDLSCs. Pathway enrichment analysis for neighboring protein-coding genes of these differentially expressed lncRNAs revealed stem cell differentiation related functions. Many of these differentially expressed lncRNAs function as competing endogenous RNAs that regulate protein-coding transcripts through competing shared miRNAs.
Yang, W; Du, W W; Li, X; Yee, A J; Yang, B B
2016-07-28
It has recently been shown that the upregulation of a pseudogene specific to a protein-coding gene could function as a sponge to bind multiple potential targeting microRNAs (miRNAs), resulting in increased gene expression. Similarly, it was recently demonstrated that circular RNAs can function as sponges for miRNAs, and could upregulate expression of mRNAs containing an identical sequence. Furthermore, some mRNAs are now known to not only translate protein, but also function to sponge miRNA binding, facilitating gene expression. Collectively, these appear to be effective mechanisms to ensure gene expression and protein activity. Here we show that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P), and Foxo3 circular RNA, both of which bind to eight miRNAs. We found that the ectopic expression of the Foxo3P, Foxo3 circular RNA and Foxo3 mRNA could all suppress tumor growth and cancer cell proliferation and survival. Our results showed that at least three mechanisms are used to ensure protein translation of Foxo3, which reflects an essential role of Foxo3 and its corresponding non-coding RNAs.
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
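The exchange rules listed in the abstract above (for example C↔U, or A↔U+C↔G with two pairs exchanged at once) are simple symmetric substitutions, so a hypothetical "exchange transcript" can be generated from a DNA sequence in a few lines. The sketch below is illustrative only: the function name, the ASCII rule syntax (`<->` in place of ↔), and the choice to start from the coding-strand sequence are assumptions, not the author's pipeline.

```python
def exchange_transcribe(seq, rule):
    """Apply a systematic symmetric nucleotide exchange to a sequence.

    seq:  coding-strand DNA (assumption: the regular transcript is this
          sequence with T replaced by U).
    rule: one or more pairs joined by '+', e.g. 'C<->U' or 'A<->U+C<->G'.
    """
    mapping = {}
    for pair in rule.split("+"):
        a, b = pair.split("<->")
        # Symmetric exchange: each member of the pair becomes the other.
        mapping[a] = b
        mapping[b] = a
    regular_rna = seq.upper().replace("T", "U")
    return "".join(mapping.get(base, base) for base in regular_rna)


print(exchange_transcribe("ATGC", "C<->U"))        # ACGU
print(exchange_transcribe("ATGC", "A<->U+C<->G"))  # UACG
```

Searching a transcript database for such transformed sequences, as the abstract describes, would then be an ordinary similarity search using the transformed string as the query.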
Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu
2011-01-01
In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and the expression levels of the isoforms of known human genes, and the conservation and disease association of long noncoding RNAs (ncRNAs), with two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed between brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specifically expressed in brain and cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform and some genes only generated protein-coding (or noncoding) RNAs in one sample but not in another. We found 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, and most of those long ncRNAs contain conserved elements across either 46 vertebrates or 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer; several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, those validated differentially expressed long ncRNAs were found to be significantly correlated with certain breast cancer or lung cancer related genes, indicating the important biological relevance between long ncRNAs and human cancers. Our findings reveal that the differences of gene expression profile between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level for completely illustrating the intricate transcriptome.
Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro
2008-01-03
The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13-nucleotide-long open reading frame and a second exon comprising the remainder of the putative coding region. The expression of these genes is induced at 10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multi-cellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N
2016-04-01
Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Decoding sORF translation - from small proteins to gene regulation.
Cabrera-Quio, Luis Enrique; Herberg, Sarah; Pauli, Andrea
2016-11-01
Translation is best known as the fundamental mechanism by which the ribosome converts a sequence of nucleotides into a string of amino acids. Extensive research over many years has elucidated the key principles of translation, and the majority of translated regions were thought to be known. The recent discovery of wide-spread translation outside of annotated protein-coding open reading frames (ORFs) came therefore as a surprise, raising the intriguing possibility that these newly discovered translated regions might have unrecognized protein-coding or gene-regulatory functions. Here, we highlight recent findings that provide evidence that some of these newly discovered translated short ORFs (sORFs) encode functional, previously missed small proteins, while others have regulatory roles. Based on known examples we will also speculate about putative additional roles and the potentially much wider impact that these translated regions might have on cellular homeostasis and gene regulation.
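To make the notion of a short ORF concrete, the following minimal sketch scans the forward strand of a DNA sequence for ATG-initiated reading frames that end at an in-frame stop codon and span at most a given number of codons. It is an illustration under stated assumptions, not a method from the review: the function name is hypothetical, the 100-codon cutoff is a commonly used convention rather than a figure from the abstract, and real sORF discovery relies on evidence such as ribosome profiling rather than sequence scanning alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_sorfs(seq, max_codons=100):
    """Return (start, end, orf) tuples for short ORFs on the forward strand.

    start/end are 0-based, end-exclusive coordinates; the ORF string
    includes the stop codon. max_codons counts sense codons only
    (the cutoff of 100 is an assumption, see the lead-in).
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                n_codons = (pos - start) // 3
                if n_codons <= max_codons:
                    hits.append((start, pos + 3, seq[start:pos + 3]))
                break
    return hits


print(find_sorfs("CCATGAAATGACC"))  # [(2, 11, 'ATGAAATGA')]
```

ATG codons with no in-frame stop before the end of the sequence are skipped, which is why the second ATG in the toy sequence yields no hit.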
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-01-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis
2008-01-01
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju
2018-05-15
Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of a chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differentially expressed in lungs of the chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that protein-coding genes associated with the differentially expressed lncRNAs were mainly involved in the protein processing in endoplasmic reticulum pathway and the taurine and hypotaurine metabolism pathway. The combination of high-throughput data analysis and the results of qRT-PCR validation in lungs of the chronic CS-induced COPD mouse model, 16HBE cells with CSE treatment and PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD in both mouse and human. In conclusion, our study demonstrated that aberrant expression profiles of lncRNAs and mRNAs existed in lungs of the chronic CS-induced COPD mouse model. From an animal-model perspective, these results might provide further clues to investigate biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and its performance is: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of the WebGene at the URL: http://www.itba.mi.cnr.it/webgene and TRADAT (TRAnscription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
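The nucleotide-level sensitivity, specificity and correlation coefficient quoted above follow the conventions of gene-prediction benchmarking, where "specificity" is the fraction of predicted coding nucleotides that are truly coding, TP / (TP + FP), rather than the true-negative rate. The sketch below shows how the three figures can be computed from a per-nucleotide comparison; the function name and the string encoding ("C" for coding, anything else for non-coding) are assumptions made for illustration and are not part of GeneBuilder.

```python
from math import sqrt


def nucleotide_accuracy(actual, predicted, coding="C"):
    """Return (sensitivity, specificity, correlation coefficient).

    actual, predicted: equal-length strings with one label per nucleotide.
    Assumes every class occurs at least once, so no denominator is zero.
    """
    if len(actual) != len(predicted):
        raise ValueError("annotations must have equal length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == coding and p == coding:
            tp += 1
        elif a != coding and p == coding:
            fp += 1
        elif a == coding and p != coding:
            fn += 1
        else:
            tn += 1
    sn = tp / (tp + fn)  # coding nucleotides that were predicted as coding
    sp = tp / (tp + fp)  # predicted coding nucleotides that are truly coding
    cc = (tp * tn - fn * fp) / sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn)
    )
    return sn, sp, cc


sn, sp, cc = nucleotide_accuracy("CCCCNNNN", "CCCNCNNN")
print(sn, sp, cc)  # 0.75 0.75 0.5
```

In the toy comparison one coding nucleotide is missed and one non-coding nucleotide is called coding, giving TP = 3, FN = 1, FP = 1, TN = 3.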
Decoding the genome beyond sequencing: the new phase of genomic research.
Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J
2011-10-01
While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems: genes code for parts such as proteins and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Recurrent and functional regulatory mutations in breast cancer.
Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad
2017-07-06
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-Man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H-H; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B; Adair, Linda S; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, Xiao-Ou; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars G; Nielsen, Jonas Bille; Tse, Hung-Fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Kathiresan, Sekar; Mohlke, Karen L; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-12-01
Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J.; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N.; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H.-H.; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B.; Adair, Linda S.; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; da Chen, Yii-Der I; Shu, XiaoOu; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K.; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars; Nielsen, Jonas Bille; Tse, Hung-fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y. Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Consortium, GLGC; Kathiresan, Sekar; Mohlke, Karen L.; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-01-01
Most genome-wide association studies have been conducted in European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we examined protein-coding genetic variants in 47,532 East Asian individuals using an exome array. We identified 255 variants at 41 loci reaching chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After meta-analysis with > 300,000 European samples, we identified an additional 9 novel loci. The same 16 genes were identified by the protein-altering variants in both East Asians and Europeans, likely pointing to the functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population-specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci. PMID:29083407
Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu
2017-01-04
The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, an approximately 20-fold expansion compared with the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed the 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotation or KEGG pathway are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.
2008-01-01
The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
Pietan, Lucas L.; Spradling, Theresa A.
2016-01-01
In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V
2013-10-16
Mammalian genomes are extensively transcribed, producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs have different genomic architectures relative to their associated protein-coding genes. Our study aimed to link lncRNA architecture with the dynamic patterns of their expression using a model of differentiating human neuroblastoma cells. LncRNA expression was studied in a 120-hour time course of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), a compound used for the treatment of neuroblastoma. A custom microarray chip was used to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relative to protein-coding genes. For each architecture class, the expression dynamics of lncRNAs were studied in association with their protein-coding partners. This allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized by negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression.
Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
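The positive and negative "correlation modes" mentioned in the abstract above can be illustrated with a small sketch. It is not the authors' method: the abstract does not say which correlation measure was used, so Pearson correlation is assumed here, and the function names, the 0.5 threshold, and all expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def correlation_mode(lncrna, partner, threshold=0.5):
    """Label a lncRNA/protein-coding pair by the sign of its correlation.

    The threshold is an illustrative assumption, not a value from the study.
    """
    r = pearson(lncrna, partner)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "uncorrelated"


# Hypothetical expression levels at successive time points of a time course.
lncrna_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
co_induced_gene = [2.0, 4.1, 5.9, 8.2, 10.0]   # rises together with the lncRNA
anti_regulated_gene = [10.0, 8.0, 6.5, 4.0, 2.0]  # falls as the lncRNA rises

print(correlation_mode(lncrna_expr, co_induced_gene))
print(correlation_mode(lncrna_expr, anti_regulated_gene))
```

In practice a rank-based measure or a significance test might be preferred for short, noisy time courses; the sketch only shows how a pair would be assigned to one mode or the other.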
Nutt, S L; Morrison, A M; Dörfler, P; Rolink, A; Busslinger, M
1998-01-01
The Pax-5 gene codes for the transcription factor BSAP which is essential for the progression of adult B lymphopoiesis beyond an early progenitor (pre-BI) cell stage. Although several genes have been proposed to be regulated by BSAP, CD19 is to date the only target gene which has been genetically confirmed to depend on this transcription factor for its expression. We have now taken advantage of cultured pre-BI cells of wild-type and Pax-5 mutant bone marrow to screen a large panel of B lymphoid genes for additional BSAP target genes. Four differentially expressed genes were shown to be under the direct control of BSAP, as their expression was rapidly regulated in Pax-5-deficient pre-BI cells by a hormone-inducible BSAP-estrogen receptor fusion protein. The genes coding for the B-cell receptor component Ig-alpha (mb-1) and the transcription factors N-myc and LEF-1 are positively regulated by BSAP, while the gene coding for the cell surface protein PD-1 is efficiently repressed. Distinct regulatory mechanisms of BSAP were revealed by reconstituting Pax-5-deficient pre-BI cells with full-length BSAP or a truncated form containing only the paired domain. IL-7 signalling was able to efficiently induce the N-myc gene only in the presence of full-length BSAP, while complete restoration of CD19 synthesis was critically dependent on the BSAP protein concentration. In contrast, the expression of the mb-1 and LEF-1 genes was already reconstituted by the paired domain polypeptide lacking any transactivation function, suggesting that the DNA-binding domain of BSAP is sufficient to recruit other transcription factors to the regulatory regions of these two genes. In conclusion, these loss- and gain-of-function experiments demonstrate that BSAP regulates four newly identified target genes as a transcriptional activator, repressor or docking protein depending on the specific regulatory sequence context. PMID:9545244
New Genes and Functional Innovation in Mammals
Luis Villanueva-Cañas, José; Ruiz-Orera, Jorge; Agea, M. Isabel; Gallo, Maria; Andreu, David
2017-01-01
The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes. PMID:28854603
Liu, Kuimei; Dong, Yanmei; Wang, Fangzhong; Jiang, Baojie; Wang, Mingyu; Fang, Xu
2016-01-01
Homologs of the velvet protein family are encoded by the ve1, vel2, and vel3 genes in Trichoderma reesei. To test their regulatory functions, the velvet protein-coding genes were disrupted, generating Δve1, Δvel2, and Δvel3 strains. The phenotypic features of these strains were examined to identify their functions in morphogenesis, sporulation, and cellulase expression. The three velvet-deficient strains produced more hyphal branches, indicating that velvet family proteins participate in morphogenesis in T. reesei. Deletion of ve1 and vel3 did not affect biomass accumulation, while deletion of vel2 led to significantly hampered growth when cellulose was used as the sole carbon source in the medium. The deletion of either ve1 or vel2 led to a sharp decrease in sporulation as well as a global downregulation of cellulase-coding genes. In contrast, although the expression of cellulase-coding genes in the Δvel3 strain was downregulated in the dark, their expression under light conditions was unaffected. Sporulation was hampered in the Δvel3 strain. These results suggest that Ve1 and Vel2 play major roles, whereas Vel3 plays a minor role in sporulation, morphogenesis, and cellulase expression.
Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil
2015-02-01
The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
FunGene: the functional gene pipeline and repository.
Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R
2013-01-01
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.
Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun
2017-03-27
A long non-coding RNA overlapping with a protein-coding gene (lncRNA-coding pair) is a special type of overlapping gene pair. Protein-coding overlapping genes have been well studied, and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in the human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and that retention of genes in lncRNA-coding pairs was given higher priority than that of non-overlapping genes. In addition, which overlapping configurations were preferentially preserved during evolution depended on the origin of the lncRNA-coding pairs. Further investigations showed that the promotion of splicing by lncRNAs of their embedded protein-coding partners was a unilateral interaction, whereas the improvement of gene expression by the existence of an overlapping partner was bidirectional, and the effect decreased with increasing evolutionary age of the genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation, and the expression correlation was associated with their overlapping configurations, local genomic environment and evolutionary age of genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that the lineage-specific pairs including old protein-coding genes may play an important role in tumorigenesis. This work presents a systematic and comprehensive understanding of the evolution and the expression pattern of human lncRNA-coding pairs.
2014-01-01
Background: Nrd1 and Nab3 are essential sequence-specific yeast RNA binding proteins that function as a heterodimer in the processing and degradation of diverse classes of RNAs. These proteins also regulate several mRNA coding genes; however, it remains unclear exactly what percentage of the mRNA component of the transcriptome these proteins control. To address this question, we used the pyCRAC software package developed in our laboratory to analyze CRAC and PAR-CLIP data for Nrd1-Nab3-RNA interactions. Results: We generated high-resolution maps of Nrd1-Nab3-RNA interactions, from which we have uncovered hundreds of new Nrd1-Nab3 mRNA targets, representing between 20 and 30% of protein-coding transcripts. Although Nrd1 and Nab3 showed a preference for binding near 5′ ends of relatively short transcripts, they bound transcripts throughout coding sequences and 3′ UTRs. Moreover, our data for Nrd1-Nab3 binding to 3′ UTRs was consistent with a role for these proteins in the termination of transcription. Our data also support a tight integration of Nrd1-Nab3 with the nutrient response pathway. Finally, we provide experimental evidence for some of our predictions, using northern blot and RT-PCR assays. Conclusions: Collectively, our data support the notion that Nrd1 and Nab3 function is tightly integrated with the nutrient response and indicate a role for these proteins in the regulation of many mRNA coding genes. Further, we provide evidence to support the hypothesis that Nrd1-Nab3 represents a failsafe termination mechanism in instances of readthrough transcription. PMID:24393166
Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis
NASA Astrophysics Data System (ADS)
Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong
2016-02-01
A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-lives of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, whereas they decreased with temperature from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which it was sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
Deng, Lei; Wu, Hongjie; Liu, Chuyao; Zhan, Weihua; Zhang, Jingpu
2018-06-01
Long non-coding RNAs (lncRNAs) are involved in many biological processes, such as immune response, development, differentiation and gene imprinting, and are associated with diseases and cancers. But the functions of the vast majority of lncRNAs are still unknown. Predicting the biological functions of lncRNAs is one of the key challenges in the post-genomic era. In our work, we first build a global network including a lncRNA similarity network, a lncRNA-protein association network and a protein-protein interaction network according to the expressions and interactions, then extract the topological feature vectors of the global network. Using these features, we present an SVM-based machine learning approach, PLNRGO, to annotate human lncRNAs. In PLNRGO, we construct a training data set according to the proteins with GO annotations and train a binary classifier for each GO term. We assess the performance of PLNRGO on our manually annotated lncRNA benchmark and a protein-coding gene benchmark with known functional annotations. As a result, the performance of our method is significantly better than that of other state-of-the-art methods in terms of maximum F-measure and coverage. Copyright © 2018 Elsevier Ltd. All rights reserved.
Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael
2013-01-01
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
Maier, Uwe-G; Zauner, Stefan; Woehle, Christian; Bolte, Kathrin; Hempel, Franziska; Allen, John F.; Martin, William F.
2013-01-01
Plastid and mitochondrial genomes have undergone parallel evolution to encode the same functional set of genes. These encode conserved protein components of the electron transport chain in their respective bioenergetic membranes and genes for the ribosomes that express them. This highly convergent aspect of organelle genome evolution is partly explained by the redox regulation hypothesis, which predicts a separate plastid or mitochondrial location for genes encoding bioenergetic membrane proteins of either photosynthesis or respiration. Here we show that convergence in organelle genome evolution is far stronger than previously recognized, because the same set of genes for ribosomal proteins is independently retained by both plastid and mitochondrial genomes. A hitherto unrecognized selective pressure retains genes for the same ribosomal proteins in both organelles. On the Escherichia coli ribosome assembly map, the retained proteins are implicated in 30S and 50S ribosomal subunit assembly and initial rRNA binding. We suggest that ribosomal assembly imposes functional constraints that govern the retention of ribosomal protein coding genes in organelles. These constraints are subordinate to redox regulation for electron transport chain components, which anchor the ribosome to the organelle genome in the first place. As organelle genomes undergo reduction, the rRNAs also become smaller. Below size thresholds of approximately 1,300 nucleotides (16S rRNA) and 2,100 nucleotides (26S rRNA), all ribosomal protein coding genes are lost from organelles, while electron transport chain components remain organelle encoded as long as the organelles use redox chemistry to generate a proton motive force. PMID:24259312
Freed, Nikki E; Bumann, Dirk; Silander, Olin K
2016-09-06
Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
Non-coding RNAs in lung cancer
Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio
2014-01-01
The discovery that protein-coding genes represent less than 2% of the human genome, together with the evidence that more than 90% of it is actively transcribed, changed the classical view of the central dogma of molecular biology, which had been based on the assumption that RNA functions mainly as an intermediate bridge between DNA sequences and the protein synthesis machinery. Accumulating data indicate that non-coding RNAs are involved in different physiological processes that maintain cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes in cancer cells. This evidence, and the prospect of using these molecules as diagnostic markers and therapeutic targets, currently makes non-coding RNAs among the most relevant molecules in cancer research. In this paper we provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996
Diallinas, G; Gorfinkiel, L; Arst, H N; Cecchetto, G; Scazzocchio, C
1995-04-14
In Aspergillus nidulans, loss-of-function mutations in the uapA and azgA genes, encoding the major uric acid-xanthine and hypoxanthine-adenine-guanine permeases, respectively, result in impaired utilization of these purines as sole nitrogen sources. The residual growth of the mutant strains is due to the activity of a broad specificity purine permease. We have identified uapC, the gene coding for this third permease through the isolation of both gain-of-function and loss-of-function mutations. Uptake studies with wild-type and mutant strains confirmed the genetic analysis and showed that the UapC protein contributes 30% and 8-10% to uric acid and hypoxanthine transport rates, respectively. The uapC gene was cloned, its expression studied, its sequence and transcript map established, and the sequence of its putative product analyzed. uapC message accumulation is: (i) weakly induced by 2-thiouric acid; (ii) repressed by ammonium; (iii) dependent on functional uaY and areA regulatory gene products (mediating uric acid induction and nitrogen metabolite repression, respectively); (iv) increased by uapC gain-of-function mutations which specifically, but partially, suppress a leucine to valine mutation in the zinc finger of the protein coded by the areA gene. The putative uapC gene product is a highly hydrophobic protein of 580 amino acids (M(r) = 61,251) including 12-14 putative transmembrane segments. The UapC protein is highly similar (58% identity) to the UapA permease and significantly similar (23-34% identity) to a number of bacterial transporters. Comparisons of the sequences and hydropathy profiles of members of this novel family of transporters yield insights into their structure, functionally important residues, and possible evolutionary relationships.
The transcriptional activator ZNF143 is essential for normal development in zebrafish
Halbig, Kari M; Lekven, Arne C; Kunkel, Gary R
2012-01-23
ZNF143 is a sequence-specific DNA-binding protein that stimulates transcription of both small RNA genes by RNA polymerase II or III and protein-coding genes by RNA polymerase II, using separable activating domains. We describe phenotypic effects following knockdown of this protein in developing Danio rerio (zebrafish) embryos by injection of morpholino antisense oligonucleotides that target znf143 mRNA. The loss-of-function phenotype is pleiotropic and includes a broad array of abnormalities, including defects in the heart, blood, ear and midbrain-hindbrain boundary. Defects are rescued by coinjection of synthetic mRNA encoding full-length ZNF143 protein, but not by protein lacking the amino-terminal activation domains. Accordingly, expression of several marker genes is affected following knockdown, including GATA-binding protein 1 (gata1), cardiac myosin light chain 2 (cmlc2) and paired box gene 2a (pax2a). The zebrafish pax2a gene proximal promoter contains two binding sites for ZNF143, and reporter gene transcription driven by this promoter in transfected cells is activated by this protein. Normal development of zebrafish embryos requires ZNF143. Furthermore, the pax2a gene is probably one example of many protein-coding gene targets of ZNF143 during zebrafish development. PMID:22268977
Ambigapathy, Ganesh; Zheng, Zhaoqing; Li, Wei; Keifer, Joyce
2013-01-01
Brain-derived neurotrophic factor (BDNF) has a diverse functional role and complex pattern of gene expression. Alternative splicing of mRNA transcripts leads to further diversity of mRNAs and protein isoforms. Here, we describe the regulation of BDNF mRNA transcripts in an in vitro model of eyeblink classical conditioning and a unique transcript that forms a functionally distinct truncated BDNF protein isoform. Nine different mRNA transcripts from the BDNF gene of the pond turtle Trachemys scripta elegans (tBDNF) are selectively regulated during classical conditioning: exon I mRNA transcripts show no change, exon II transcripts are downregulated, while exon III transcripts are upregulated. One unique transcript that codes from exon II, tBDNF2a, contains a 40 base pair deletion in the protein coding exon that generates a truncated tBDNF protein. The truncated transcript and protein are expressed in the naïve untrained state and are fully repressed during conditioning when full-length mature tBDNF is expressed, thereby having an alternate pattern of expression in conditioning. Truncated BDNF is not restricted to turtles as a truncated mRNA splice variant has been described for the human BDNF gene. Further studies are required to determine the ubiquity of truncated BDNF alternative splice variants across species and the mechanisms of regulation and function of this newly recognized BDNF protein. PMID:23825634
Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru
2007-01-01
The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
New Genes and Functional Innovation in Mammals.
Luis Villanueva-Cañas, José; Ruiz-Orera, Jorge; Agea, M Isabel; Gallo, Maria; Andreu, David; Albà, M Mar
2017-07-01
The birth of genes that encode new protein sequences is a major source of evolutionary innovation. However, we still understand relatively little about how these genes come into being and which functions they are selected for. To address these questions, we have obtained a large collection of mammalian-specific gene families that lack homologues in other eukaryotic groups. We have combined gene annotations and de novo transcript assemblies from 30 different mammalian species, obtaining ∼6,000 gene families. In general, the proteins in mammalian-specific gene families tend to be short and depleted in aromatic and negatively charged residues. Proteins which arose early in mammalian evolution include milk and skin polypeptides, immune response components, and proteins involved in reproduction. In contrast, the functions of proteins which have a more recent origin remain largely unknown, despite the fact that these proteins also have extensive proteomics support. We identify several previously described cases of genes originated de novo from noncoding genomic regions, supporting the idea that this mechanism frequently underlies the evolution of new protein-coding genes in mammals. Finally, we show that most young mammalian genes are preferentially expressed in testis, suggesting that sexual selection plays an important role in the emergence of new functional genes. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Retrieval of Enterobacteriaceae drug targets using singular value decomposition.
Silvério-Machado, Rita; Couto, Bráulio R G M; Dos Santos, Marcos A
2015-04-15
The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases. A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E. coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, therefore exhibiting the advantage of causing no adverse effects or at least no known adverse effects on humans. © The Author 2014. Published by Oxford University Press. All rights reserved.
CORUM: the comprehensive resource of mammalian protein complexes
Ruepp, Andreas; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Stransky, Michael; Waegele, Brigitte; Schmidt, Thorsten; Doudieu, Octave Noubibou; Stümpflen, Volker; Mewes, H. Werner
2008-01-01
Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references as well as the function of the complexes. For functional annotation, we use the FunCat catalogue, which makes it possible to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes that are built from 2400 different genes, thus representing 12% of the protein-coding genes in human. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes. PMID:17965090
Tzagoloff, A; Shtanko, A
1995-06-01
Three complementation groups of a pet mutant collection have been found to be composed of respiratory-deficient mutants with lesions in mitochondrial protein synthesis. Recombinant plasmids capable of restoring respiration were cloned by transformation of representatives of each complementation group with a yeast genomic library. The plasmids were used to characterize the complementing genes and to disrupt the chromosomal copies of each gene in respiratory-proficient yeast. The sequences of the cloned genes indicate that they code for isoleucyl-, arginyl- and glutamyl-tRNA synthetases. The properties of the mutants used to obtain the genes and of strains with the disrupted genes indicate that all three aminoacyl-tRNA synthetases function exclusively in mitochondrial protein synthesis. The ISM1 gene for mitochondrial isoleucyl-tRNA synthetase has been localized to chromosome XVI next to UME5. The MSR1 gene for the arginyl-tRNA synthetase was previously located on yeast chromosome VIII. The third gene, MSE1, for the mitochondrial glutamyl-tRNA synthetase has not been localized. The identification of three new genes coding for mitochondrial-specific aminoacyl-tRNA synthetases indicates that in Saccharomyces cerevisiae at least 11 members of this protein family are encoded by genes distinct from those coding for the homologous cytoplasmic enzymes.
APPRIS 2017: principal isoforms for multiple gene sets
Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso
2018-01-01
The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database: new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Probing the Boundaries of Orthology: The Unanticipated Rapid Evolution of Drosophila centrosomin
Eisman, Robert C.; Kaufman, Thomas C.
2013-01-01
The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically result in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins. PMID:23749319
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P
2015-04-23
With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Chamber Specific Gene Expression Landscape of the Zebrafish Heart
Singh, Angom Ramcharan; Sivadas, Ambily; Sabharwal, Ankit; Vellarikal, Shamsudheen Karuthedath; Jayarajan, Rijith; Verma, Ankit; Kapoor, Shruti; Joshi, Adita; Scaria, Vinod; Sivasubbu, Sridhar
2016-01-01
The organization of structure and function of cardiac chambers in vertebrates is defined by distinct chamber-specific gene expression. This peculiarity and uniqueness of the genetic signatures demonstrates the functional resolution attributed to the different chambers of the heart. Altered expression of the cardiac chamber genes can lead to individual chamber-related dysfunctions and disease pathophysiologies. Information on the transcriptional repertoire of cardiac compartments is important for understanding the spectrum of chamber-specific anomalies. We have carried out a genome-wide transcriptome profiling study of the three cardiac chambers in the zebrafish heart using RNA sequencing. We have captured the gene expression patterns of 13,396 protein-coding genes in the three cardiac chambers: atrium, ventricle and bulbus arteriosus. Of these, 7,260 known protein-coding genes are highly expressed (≥10 FPKM) in the zebrafish heart. Thus, this study provides nearly all-inclusive information on the zebrafish cardiac transcriptome. In this study, a total of 96 differentially expressed genes across the three cardiac chambers in zebrafish were identified. The atrium, ventricle and bulbus arteriosus displayed 20, 32 and 44 uniquely expressed genes, respectively. We validated the expression of predicted chamber-restricted genes using independent semi-quantitative and qualitative experimental techniques. In addition, we identified 23 putative novel protein-coding genes that are specifically restricted to the ventricle and not expressed in the atrium or bulbus arteriosus. To our knowledge, these 23 novel genes have either not been investigated in detail or are sparsely studied. The transcriptome identified in this study includes 68 differentially expressed zebrafish cardiac chamber genes that have a human ortholog.
We also carried out spatiotemporal gene expression profiling of the 96 differentially expressed genes throughout the three cardiac chambers in 11 developmental stages and 6 tissue types of zebrafish. We hypothesize that clustering the differentially expressed genes with both known and unknown functions will deliver detailed insights on fundamental gene networks that are important for the development and specification of the cardiac chambers. It is also postulated that this transcriptome atlas will help utilize zebrafish in a better way as a model for studying cardiac development and to explore functional role of gene networks in cardiac disease pathogenesis. PMID:26815362
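The abstract above uses a cut-off of ≥10 FPKM to call genes highly expressed. As a minimal sketch, assuming the standard definition of FPKM (fragments per kilobase of transcript per million mapped fragments), the calculation and the threshold filter could look like the following. The function names and the toy gene values are hypothetical and are not taken from the study.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def highly_expressed(gene_fpkm, threshold=10.0):
    """Return the sorted names of genes whose FPKM meets the threshold."""
    return sorted(g for g, value in gene_fpkm.items() if value >= threshold)


if __name__ == "__main__":
    # Invented counts for two made-up genes in a 25-million-fragment library.
    total = 25_000_000
    values = {
        "geneA": fpkm(500, 2000, total),  # 10.0 FPKM, passes the cut-off
        "geneB": fpkm(100, 4000, total),  # 1.0 FPKM, filtered out
    }
    print(highly_expressed(values))
```

Normalizing by both transcript length and library size is what makes a single threshold comparable across genes and samples.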
Carver, Melissa N.; Müller, Ulrika; Bekiranov, Stefan; Auble, David T.
2017-01-01
Transcriptome studies on eukaryotic cells have revealed an unexpected abundance and diversity of noncoding RNAs synthesized by RNA polymerase II (Pol II), some of which influence the expression of protein-coding genes. Yet, much less is known about biogenesis of Pol II non-coding RNA than mRNAs. In the budding yeast Saccharomyces cerevisiae, initiation of non-coding transcripts by Pol II appears to be similar to that of mRNAs, but a distinct pathway is utilized for termination of most non-coding RNAs: the Sen1-dependent or “NNS” pathway. Here, we examine the effect on the S. cerevisiae transcriptome of conditional mutations in the genes encoding six different essential proteins that influence Sen1-dependent termination: Sen1, Nrd1, Nab3, Ssu72, Rpb11, and Hrp1. We observe surprisingly diverse effects on transcript abundance for the different proteins that cannot be explained simply by differing severity of the mutations. Rather, we infer from our results that termination of Pol II transcription of non-coding RNA genes is subject to complex combinatorial control that likely involves proteins beyond those studied here. Furthermore, we identify new targets and functions of Sen1-dependent termination, including a role in repression of meiotic genes in vegetative cells. In combination with other recent whole-genome studies on termination of non-coding RNAs, our results provide promising directions for further investigation. PMID:28665995
Benítez-Burraco, A
FOXP2 is the first gene linked to a hereditary variant of specific language impairment and seems to code for a transcriptional repressor that intervenes in the regulation of the development and the functioning of certain thalamic-cortical-striatal circuits. In the last three years, significant progress has been made in the determination of the structural and functional properties of the gene. These advances essentially have to do with the precise analysis of the most important structural motifs of the protein that it codes for and the main parameters that determine its interaction with DNA. They also concern the determination of the functional and behavioural properties in vivo of the main isoforms of the FOXP2 protein, the exact determination of the pattern of expression of new orthologues of the gene, and the identification of the different target genes of the FOXP2 factor. This new evidence suggests that the FOXP2 protein has a high degree of versatility in vivo when it comes to binding to DNA; that its different isoforms are biologically functional; and that the FOXP2 gene is functional during embryonic development and during the adult phase. It also supports the involvement of the gene in the development and/or functioning of the thalamic-cortical-striatal circuits associated with motor planning, sequential behaviour and procedural learning (a significant saving in developmental terms of the regulatory mechanism in which the gene is involved), as well as the accuracy of the models of linguistic processing that consider language to be, to a large extent, the result of an interaction between certain cortical and subcortical structures.
Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis
Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong
2016-01-01
A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-lives of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, whereas they decreased as temperature rose from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which DDT was sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment. PMID:26888254
Flather, Dylan; Semler, Bert L.
2015-01-01
The compartmentalization of DNA replication and gene transcription in the nucleus and protein production in the cytoplasm is a defining feature of eukaryotic cells. The nucleus functions to maintain the integrity of the nuclear genome of the cell and to control gene expression based on intracellular and environmental signals received through the cytoplasm. The spatial separation of the major processes that lead to the expression of protein-coding genes establishes the necessity of a transport network to allow biomolecules to translocate between these two regions of the cell. The nucleocytoplasmic transport network is therefore essential for regulating normal cellular functioning. The Picornaviridae virus family is one of many viral families that disrupt the nucleocytoplasmic trafficking of cells to promote viral replication. Picornaviruses contain positive-sense, single-stranded RNA genomes and replicate in the cytoplasm of infected cells. As a result of the limited coding capacity of these viruses, cellular proteins are required by these intracellular parasites for both translation and genomic RNA replication. Being of messenger RNA polarity, a picornavirus genome can immediately be translated upon entering the cell cytoplasm. However, the replication of viral RNA requires the activity of RNA-binding proteins, many of which function in host gene expression, and are consequently localized to the nucleus. As a result, picornaviruses disrupt nucleocytoplasmic trafficking to exploit protein functions normally localized to a different cellular compartment from which they translate their genome to facilitate efficient replication. Furthermore, picornavirus proteins are also known to enter the nucleus of infected cells to limit host-cell transcription and down-regulate innate antiviral responses. The interactions of picornavirus proteins and host-cell nuclei are extensive, required for a productive infection, and are the focus of this review. PMID:26150805
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas many public databases of eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with MySQL Server 5.1.33, and the complete dataset of SEG sequences and their functional predictions is available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) BLAST searches against the in-house SinEx DB of SEGs and (iii) an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs. Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
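The SEG definition given in this abstract (nuclear, protein-coding, no introns in the CDS) can be expressed as a simple filter. The following is a minimal Python sketch of that rule only; it is not the SinEx DB pipeline, and the `Gene` record, its field names and the example values are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """Hypothetical gene record; field names are assumptions, not SinEx DB's schema."""
    gene_id: str
    is_nuclear: bool
    is_protein_coding: bool
    # Genomic intervals (start, end) of the coding sequence, one per coding exon.
    cds_segments: List[Tuple[int, int]]


def is_single_exon_gene(gene: Gene) -> bool:
    """Apply the abstract's definition: nuclear, protein-coding, CDS not split by introns.

    A CDS with exactly one segment has no intron inside it. Introns that lie
    only in untranslated regions do not add CDS segments, so they do not
    disqualify a gene under this definition.
    """
    return (
        gene.is_nuclear
        and gene.is_protein_coding
        and len(gene.cds_segments) == 1
    )


# Illustrative records (made-up coordinates).
intronless = Gene("g1", True, True, [(100, 1000)])
multi_exon = Gene("g2", True, True, [(100, 300), (500, 900)])
mitochondrial = Gene("g3", False, True, [(10, 700)])

print([g.gene_id for g in (intronless, multi_exon, mitochondrial) if is_single_exon_gene(g)])
```

Counting CDS segments rather than annotated exons is a deliberate choice in this sketch, because the definition refers to introns in the CDS and not to introns anywhere in the transcript.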
2012-01-01
Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Pitchiaya, Sethuramasundaram; Krishnan, Vishalakshi; Custer, Thomas C.; Walter, Nils G.
2013-01-01
Non-coding RNAs (ncRNAs) recently were discovered to outnumber their protein-coding counterparts, yet their diverse functions are still poorly understood. Here we report on a method for the intracellular Single-molecule High Resolution Localization and Counting (iSHiRLoC) of microRNAs (miRNAs), a conserved, ubiquitous class of regulatory ncRNAs that controls the expression of over 60% of all mammalian protein coding genes post-transcriptionally, by a mechanism shrouded by seemingly contradictory observations. We present protocols to execute single particle tracking (SPT) and single-molecule counting of functional microinjected, fluorophore-labeled miRNAs and thereby extract diffusion coefficients and molecular stoichiometries of micro-ribonucleoprotein (miRNP) complexes from living and fixed cells, respectively. This probing of miRNAs at the single molecule level sheds new light on the intracellular assembly/disassembly of miRNPs, thus beginning to unravel the dynamic nature of this important gene regulatory pathway and facilitating the development of a parsimonious model for their obscured mechanism of action. PMID:23820309
Levine, Mia T; Holloway, Alisha K; Arshad, Umbreen; Begun, David J
2007-11-01
Dosage compensation refers to the equalization of X-linked gene transcription among heterogametic and homogametic sexes. In Drosophila, the dosage compensation complex (DCC) mediates the twofold hypertranscription of the single male X chromosome. Loss-of-function mutations at any DCC protein-coding gene are male lethal. Here we report a population genetic analysis suggesting that four of the five core DCC proteins--MSL1, MSL2, MSL3, and MOF--are evolving under positive selection in D. melanogaster. Within these four proteins, several domains that range in function from X chromosome localization to protein-protein interactions have elevated, D. melanogaster-specific, amino acid divergence.
Genes encoding cuticular proteins are components of the Nimrod gene cluster in Drosophila.
Cinege, Gyöngyi; Zsámboki, János; Vidal-Quadras, Maite; Uv, Anne; Csordás, Gábor; Honti, Viktor; Gábor, Erika; Hegedűs, Zoltán; Varga, Gergely I B; Kovács, Attila L; Juhász, Gábor; Williams, Michael J; Andó, István; Kurucz, Éva
2017-08-01
The Nimrod gene cluster, located on the second chromosome of Drosophila melanogaster, is the largest syntenic unit of the Drosophila genome. Nimrod genes show blood cell specific expression and code for phagocytosis receptors that play a major role in fruit fly innate immune functions. We previously identified three homologous genes (vajk-1, vajk-2 and vajk-3) located within the Nimrod cluster, which are unrelated to the Nimrod genes, but are homologous to a fourth gene (vajk-4) located outside the cluster. Here we show that, unlike the Nimrod candidates, the Vajk proteins are expressed in cuticular structures of the late embryo and the late pupa, indicating that they contribute to cuticular barrier functions. Copyright © 2017 Elsevier Ltd. All rights reserved.
Nucleic acids encoding plant glutamine phenylpyruvate transaminase (GPT) and uses thereof
Unkefer, Pat J.; Anderson, Penelope S.; Knight, Thomas J.
2016-03-29
Glutamine phenylpyruvate transaminase (GPT) proteins, nucleic acid molecules encoding GPT proteins, and uses thereof are disclosed. Provided herein are various GPT proteins and GPT gene coding sequences isolated from a number of plant species. As disclosed herein, GPT proteins share remarkable structural similarity within plant species, and are active in catalyzing the synthesis of 2-hydroxy-5-oxoproline (2-oxoglutaramate), a powerful signal metabolite which regulates the function of a large number of genes involved in the photosynthesis apparatus, carbon fixation and nitrogen metabolism.
Junk DNA and the long non-coding RNA twist in cancer genetics
Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A
2015-01-01
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNAs in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNAs, often via interaction with proteins, function in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphisms (SNPs) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
Altruistic functions for selfish DNA.
Faulkner, Geoffrey J; Carninci, Piero
2009-09-15
Mammalian genomes are comprised of 30-50% transposed elements (TEs). The vast majority of these TEs are truncated and mutated fragments of retrotransposons that are no longer capable of transposition. Although initially regarded as important factors in the evolution of gene regulatory networks, TEs are now commonly perceived as neutrally evolving and non-functional genomic elements. In a major development, recent works have strongly contradicted this "selfish DNA" or "junk DNA" dogma by demonstrating that TEs use a host of novel promoters to generate RNA on a massive scale across most eukaryotic cells. This transcription frequently functions to control the expression of protein-coding genes via alternative promoters, cis regulatory non protein-coding RNAs and the formation of double stranded short RNAs. If considered in sum, these findings challenge the designation of TEs as selfish and neutrally evolving genomic elements. Here, we will expand upon these themes and discuss challenges in establishing novel TE functions in vivo.
Quantifying the mechanisms of domain gain in animal proteins.
Buljan, Marija; Frankish, Adam; Bateman, Alex
2010-01-01
Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster
Wang, Wen; Brunet, Frédéric G.; Nevo, Eviatar; Long, Manyuan
2002-01-01
Non-protein-coding RNA genes play an important role in various biological processes. How new RNA genes originated and whether this process is controlled by similar evolutionary mechanisms for the origin of protein-coding genes remains unclear. A young chimeric RNA gene that we term sphinx (spx) provides the first insight into the early stage of evolution of RNA genes. spx originated as an insertion of a retroposed sequence of the ATP synthase chain F gene at the cytological region 60DB since the divergence of Drosophila melanogaster from its sibling species 2–3 million years ago. This retrosequence, which is located at 102F on the fourth chromosome, recruited a nearby exon and intron, thereby evolving a chimeric gene structure. This molecular process suggests that the mechanism of exon shuffling, which can generate protein-coding genes, also plays a role in the origin of RNA genes. The subsequent evolutionary process of spx has been associated with a high nucleotide substitution rate, possibly driven by a continuous positive Darwinian selection for a novel function, as is shown in its sex- and development-specific alternative splicing. To test whether spx has adapted to different environments, we investigated its population genetic structure in the unique “Evolution Canyon” in Israel, revealing a similar haplotype structure in spx, and thus similar evolutionary forces operating on spx between environments. PMID:11904380
The Yersinia pestis gcvB gene encodes two small regulatory RNA molecules
McArthur, Sarah D; Pulvermacher, Sarah C; Stauffer, George V
2006-01-01
Background: In recent years it has become clear that small non-coding RNAs function as regulatory elements in bacterial virulence and bacterial stress responses. We tested for the presence of the small non-coding GcvB RNAs in Y. pestis as possible regulators of gene expression in this organism. Results: In this study, we report that the Yersinia pestis KIM6 gcvB gene encodes two small RNAs. Transcription of gcvB is activated by the GcvA protein and repressed by the GcvR protein. The gcvB-encoded RNAs are required for repression of the Y. pestis dppA gene, encoding the periplasmic-binding protein component of the dipeptide transport system, showing that the GcvB RNAs have regulatory activity. A deletion of the gcvB gene from the Y. pestis KIM6 chromosome results in a decrease in the generation time of the organism as well as a change in colony morphology. Conclusion: The results of this study indicate that the Y. pestis gcvB gene encodes two small non-coding regulatory RNAs that repress dppA expression. A gcvB deletion is pleiotropic, suggesting that the sRNAs are likely involved in controlling genes in addition to dppA. PMID:16768793
Cioffi, Anna Valentina; Ferrara, Diana; Cubellis, Maria Vittoria; Aniello, Francesco; Corrado, Marcella; Liguori, Francesca; Amoroso, Alessandro; Fucci, Laura; Branno, Margherita
2002-08-01
Analysis of the genome structure of the Paracentrotus lividus (sea urchin) DNA methyltransferase (DNA MTase) gene showed the presence of an open reading frame, named METEX, in intron 7 of the gene. METEX expression is developmentally regulated, showing no correlation with DNA MTase expression. In fact, DNA MTase transcripts are present at high concentrations in the early developmental stages, while METEX is expressed at late stages of development. Two METEX cDNA clones (Met1 and Met2) that are different in the 3' end have been isolated in a cDNA library screening. The putative translated protein from Met2 cDNA clone showed similarity with Escherichia coli endonuclease III on the basis of sequence and predictive three-dimensional structure. The protein, overexpressed in E. coli and purified, had functional properties similar to the endonuclease specific for apurinic/apyrimidinic (AP) sites on the basis of the lyase activity. Therefore the open reading frame, present in intron 7 of the P. lividus DNA MTase gene, codes for a functional AP endonuclease designated SuAP1.
2014-01-01
Background: The genome is pervasively transcribed but most transcripts do not code for proteins, constituting non-protein-coding RNAs. Despite increasing numbers of functional reports of individual long non-coding RNAs (lncRNAs), assessing the extent of functionality among the non-coding transcriptional output of mammalian cells remains intricate. In the protein-coding world, transcripts differentially expressed in the context of processes essential for the survival of multicellular organisms have been instrumental in the discovery of functionally relevant proteins and their deregulation is frequently associated with diseases. We therefore systematically identified lncRNAs expressed differentially in response to oncologically relevant processes and cell-cycle, p53 and STAT3 pathways, using tiling arrays. Results: We found that up to 80% of the pathway-triggered transcriptional responses are non-coding. Among these we identified very large macroRNAs with pathway-specific expression patterns and demonstrated that these are likely continuous transcripts. MacroRNAs contain elements conserved in mammals and sauropsids, which in part exhibit conserved RNA secondary structure. Comparing evolutionary rates of a macroRNA to adjacent protein-coding genes suggests a local action of the transcript. Finally, in different grades of astrocytoma, a tumor disease unrelated to the initially used cell lines, macroRNAs are differentially expressed. Conclusions: It has been shown previously that the majority of expressed non-ribosomal transcripts are non-coding. We now conclude that differential expression triggered by signaling pathways gives rise to a similar abundance of non-coding content. It is thus unlikely that the prevalence of non-coding transcripts in the cell is a trivial consequence of leaky or random transcription events. PMID:24594072
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. Contact: bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Towards a complete map of the human long non-coding RNA transcriptome.
Uszczynska-Ratajczak, Barbara; Lagarde, Julien; Frankish, Adam; Guigó, Roderic; Johnson, Rory
2018-05-23
Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Lie, Kai K; Tørresen, Ole K; Solbakken, Monica Hongrø; Rønnestad, Ivar; Tooming-Klunderud, Ave; Nederbragt, Alexander J; Jentoft, Sissel; Sæle, Øystein
2018-03-06
The ballan wrasse (Labrus bergylta) belongs to a large teleost family containing more than 600 species showing several unique evolutionary traits such as lack of stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other things, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long and short read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found a declining expression gradient from the anterior to the posterior, and a distinct expression profile in the hind gut. We showed gene loss has occurred for all known genes related to stomach function in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show appetite control in ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin-like role. The wrasse genome offers novel insight into the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.
McLysaght, Aoife; Guerzoni, Daniele
2015-09-26
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an 'RNA-first' or 'ORF-first' pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. © 2015 The Authors.
NASA Astrophysics Data System (ADS)
Boulter, Jim; Connolly, John; Deneris, Evan; Goldman, Dan; Heinemann, Steven; Patrick, Jim
1987-11-01
A family of genes coding for proteins homologous to the α subunit of the muscle nicotinic acetylcholine receptor has been identified in the rat genome. These genes are transcribed in the central and peripheral nervous systems in areas known to contain functional nicotinic receptors. In this paper, we demonstrate that three of these genes, which we call alpha3, alpha4, and beta2, encode proteins that form functional nicotinic acetylcholine receptors when expressed in Xenopus oocytes. Oocytes expressing either alpha3 or alpha4 protein in combination with the beta2 protein produced a strong response to acetylcholine. Oocytes expressing only the alpha4 protein gave a weak response to acetylcholine. These receptors are activated by acetylcholine and nicotine and are blocked by Bungarus toxin 3.1. They are not blocked by α-bungarotoxin, which blocks the muscle nicotinic acetylcholine receptor. Thus, the receptors formed by the alpha3, alpha4, and beta2 subunits are pharmacologically similar to the ganglionic-type neuronal nicotinic acetylcholine receptor. These results indicate that the alpha3, alpha4, and beta2 genes encode functional nicotinic acetylcholine receptor subunits that are expressed in the brain and peripheral nervous system.
Decoding the non-coding RNAs in Alzheimer's disease.
Schonrock, Nicole; Götz, Jürgen
2012-11-01
Non-coding RNAs (ncRNAs) are integral components of biological networks with fundamental roles in regulating gene expression. They can integrate sequence information from the DNA code, epigenetic regulation and functions of multimeric protein complexes to potentially determine the epigenetic status and transcriptional network in any given cell. Humans potentially contain more ncRNAs than any other species, especially in the brain, where they may well play a significant role in human development and cognitive ability. This review discusses their emerging role in Alzheimer's disease (AD), a human pathological condition characterized by the progressive impairment of cognitive functions. We discuss the complexity of the ncRNA world and how this is reflected in the regulation of the amyloid precursor protein and Tau, two proteins with central functions in AD. By understanding this intricate regulatory network, there is hope for a better understanding of disease mechanisms and ultimately developing diagnostic and therapeutic tools.
Favre, Patrick; Bapaume, Laure; Bossolini, Eligio; Delorenzi, Mauro; Falquet, Laurent; Reinhardt, Didier
2014-12-03
Genes involved in arbuscular mycorrhizal (AM) symbiosis have been identified primarily by mutant screens, followed by identification of the mutated genes (forward genetics). In addition, a number of AM-related genes has been identified by their AM-related expression patterns, and their function has subsequently been elucidated by knock-down or knock-out approaches (reverse genetics). However, genes that are members of functionally redundant gene families, or genes that have a vital function and therefore result in lethal mutant phenotypes, are difficult to identify. If such genes are constitutively expressed and therefore escape differential expression analyses, they remain elusive. The goal of this study was to systematically search for AM-related genes with a bioinformatics strategy that is insensitive to these problems. The central element of our approach is based on the fact that many AM-related genes are conserved only among AM-competent species. Our approach involves genome-wide comparisons at the proteome level of AM-competent host species with non-mycorrhizal species. Using a clustering method we first established orthologous/paralogous relationships and subsequently identified protein clusters that contain members only of the AM-competent species. Proteins of these clusters were then analyzed in an extended set of 16 plant species and ranked based on their relatedness among AM-competent monocot and dicot species, relative to non-mycorrhizal species. In addition, we combined the information on the protein-coding sequence with gene expression data and with promoter analysis. As a result we present a list of yet uncharacterized proteins that show a strongly AM-related pattern of sequence conservation, indicating that the respective genes may have been under selection for a function in AM. 
Among the top candidates are three genes that encode a small family of similar receptor-like kinases that are related to the S-locus receptor kinases involved in sporophytic self-incompatibility. We present a new systematic strategy of gene discovery based on conservation of the protein-coding sequence that complements classical forward and reverse genetics. This strategy can be applied to diverse other biological phenomena if species with established genome sequences fall into distinguished groups that differ in a defined functional trait of interest.
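The cluster-selection step described in this abstract (keeping only protein clusters whose members all come from AM-competent species) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function name `am_specific_clusters`, the cluster layout, and the toy species labels are all assumptions made for the example.

```python
from typing import Dict, List, Set


def am_specific_clusters(clusters: Dict[str, List[str]],
                         am_competent: Set[str]) -> List[str]:
    """Return IDs of clusters whose members all come from AM-competent species.

    `clusters` maps a cluster ID to the species contributing proteins to it.
    Both the data layout and the names are hypothetical.
    """
    selected = []
    for cluster_id, species in clusters.items():
        # Keep non-empty clusters with no member from a non-mycorrhizal species.
        if species and set(species) <= am_competent:
            selected.append(cluster_id)
    return selected


# Toy input (illustrative labels only, not data from the study).
AM_COMPETENT = {"host_A", "host_B", "host_C"}
CLUSTERS = {
    "c1": ["host_A", "host_B"],
    "c2": ["host_A", "nonhost_X"],
    "c3": ["host_C", "host_B", "host_A"],
}

print(am_specific_clusters(CLUSTERS, AM_COMPETENT))
```

The ranking by relatedness, the expression data, and the promoter analysis mentioned in the abstract are not modelled here.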
Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C
2015-04-01
The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...
2014-10-02
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.
The human protein S locus on chromosome 3 consists of two protein S genes, PSα and PSβ. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PSα gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5′-noncoding region. Both transcripts contain a 5′-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing of an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PSα gene fully supports the hypothesis that the protein S gene is the product of an evolutionary assembly process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PSβ gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall, the exonic sequences of PSα and PSβ show 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implies that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.
SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate
Gretchen H. Roffler; Stephen J. Amish; Seth Smith; Ted Cosart; Marty Kardos; Michael K. Schwartz; Gordon Luikart
2016-01-01
Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding...
Identification of functional elements and regulatory circuits by Drosophila modENCODE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Sushmita; Ernst, Jason; Kharchenko, Peter V.
2010-12-22
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation. Several years after the complete genetic sequencing of many species, it is still unclear how to translate genomic information into a functional map of cellular and developmental programs. The Encyclopedia of DNA Elements (ENCODE) (1) and model organism ENCODE (modENCODE) (2) projects use diverse genomic assays to comprehensively annotate the Homo sapiens (human), Drosophila melanogaster (fruit fly), and Caenorhabditis elegans (worm) genomes, through systematic generation and computational integration of functional genomic data sets. Previous genomic studies in flies have made seminal contributions to our understanding of basic biological mechanisms and genome functions, facilitated by genetic, experimental, computational, and manual annotation of the euchromatic and heterochromatic genome (3), small genome size, short life cycle, and a deep knowledge of development, gene function, and chromosome biology.
The functions of ~40% of the protein and nonprotein-coding genes [FlyBase 5.12 (4)] have been determined from cDNA collections (5, 6), manual curation of gene models (7), gene mutations and comprehensive genome-wide RNA interference screens (8-10), and comparative genomic analyses (11, 12). The Drosophila modENCODE project has generated more than 700 data sets that profile transcripts, histone modifications and physical nucleosome properties, general and specific transcription factors (TFs), and replication programs in cell lines, isolated tissues, and whole organisms across several developmental stages (Fig. 1). Here, we computationally integrate these data sets and report (i) improved and additional genome annotations, including full-length protein-coding genes and peptides as short as 21 amino acids; (ii) noncoding transcripts, including 132 candidate structural RNAs and 1608 nonstructural transcripts; (iii) additional Argonaute (Ago)-associated small RNA genes and pathways, including new microRNAs (miRNAs) encoded within protein-coding exons and endogenous small interfering RNAs (siRNAs) from 3′ untranslated regions; (iv) chromatin 'states' defined by combinatorial patterns of 18 chromatin marks that are associated with distinct functions and properties; (v) regions of high TF occupancy and replication activity with likely epigenetic regulation; (vi) mixed TF and miRNA regulatory networks with hierarchical structure and enriched feed-forward loops; (vii) coexpression- and co-regulation-based functional annotations for nearly 3000 genes; (viii) stage- and tissue-specific regulators; and (ix) predictive models of gene expression levels and regulator function.
Microbial metatranscriptomics in a permanent marine oxygen minimum zone.
Stewart, Frank J; Ulloa, Osvaldo; DeLong, Edward F
2012-01-01
Simultaneous characterization of taxonomic composition, metabolic gene content and gene expression in marine oxygen minimum zones (OMZs) has potential to broaden perspectives on the microbial and biogeochemical dynamics in these environments. Here, we present a metatranscriptomic survey of microbial community metabolism in the Eastern Tropical South Pacific OMZ off northern Chile. Community RNA was sampled in late austral autumn from four depths (50, 85, 110, 200 m) extending across the oxycline and into the upper OMZ. Shotgun pyrosequencing of cDNA yielded 180,000 to 550,000 transcript sequences per depth. Based on functional gene representation, transcriptome samples clustered apart from corresponding metagenome samples from the same depth, highlighting the discrepancies between metabolic potential and actual transcription. BLAST-based characterizations of non-ribosomal RNA sequences revealed a dominance of genes involved with both oxidative (nitrification) and reductive (anammox, denitrification) components of the marine nitrogen cycle. Using annotations of protein-coding genes as proxies for taxonomic affiliation, we observed depth-specific changes in gene expression by key functional taxonomic groups. Notably, transcripts most closely matching the genome of the ammonia-oxidizing archaeon Nitrosopumilus maritimus dominated the transcriptome in the upper three depths, representing one in five protein-coding transcripts at 85 m. In contrast, transcripts matching the anammox bacterium Kuenenia stuttgartiensis dominated at the core of the OMZ (200 m; 1 in 12 protein-coding transcripts). The distribution of N. maritimus-like transcripts paralleled that of transcripts matching ammonia monooxygenase genes, which, despite being represented by both bacterial and archaeal sequences in the community DNA, were dominated (> 99%) by archaeal sequences in the RNA, suggesting a substantial role for archaeal nitrification in the upper OMZ. 
These data, as well as those describing other key OMZ metabolic processes (e.g. sulfur oxidation), highlight gene-specific expression patterns in the context of the entire community transcriptome, as well as identify key functional groups for taxon-specific genomic profiling. © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.
Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed
2016-01-01
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode for proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with more than 83-100 amino acids ORF, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates the lncRNA rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
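The first two filters mentioned in this abstract (minimum transcript length and maximum ORF size) can be sketched as below. This is a simplified illustration under stated assumptions, not the study's pipeline: the function names are hypothetical, only forward-strand ATG-to-stop ORFs are scanned, the 100-amino-acid cutoff is taken from the upper end of the 83-100 range quoted above, and the homology and protein-coding-score steps are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Length (in amino acids) of the longest ATG-to-stop ORF.

    Assumption: only the three forward reading frames are scanned, and
    ORFs that run off the end without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


def is_candidate_lncrna(seq: str, min_len: int = 200, max_orf_aa: int = 100) -> bool:
    """Keep transcripts of at least `min_len` nt whose longest ORF is short."""
    return len(seq) >= min_len and longest_orf_aa(seq) <= max_orf_aa


# Toy sequences (synthetic, for illustration only).
short_tx = "A" * 150                          # fails the length filter
noncoding_tx = "AC" * 150                     # 300 nt, no start codon
coding_tx = "ATG" + "GCT" * 120 + "TAA"       # 121-codon ORF

for name, tx in [("short", short_tx), ("noncoding", noncoding_tx), ("coding", coding_tx)]:
    print(name, is_candidate_lncrna(tx))
```

A production filter would also need to handle the reverse strand and ambiguous bases, which this sketch deliberately leaves out.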
Gene evolution and functions of extracellular matrix proteins in teeth
Yoshizaki, Keigo; Yamada, Yoshihiko
2013-01-01
The extracellular matrix (ECM) not only provides physical support for tissues, but it is also critical for tissue development, homeostasis and disease. Over 300 ECM molecules have been defined as comprising the “core matrisome” in mammals through the analysis of whole genome sequences. During tooth development, the structure and functions of the ECM dynamically change. In the early stages, basement membranes (BMs) separate two cell layers of the dental epithelium and the mesenchyme. Later in the differentiation stages, the BM layer is replaced with the enamel matrix and the dentin matrix, which are secreted by ameloblasts and odontoblasts, respectively. The enamel matrix genes and the dentin matrix genes are each clustered in two closed regions located on human chromosome 4 (mouse chromosome 5), except for the gene coding for amelogenin, the major enamel matrix protein, which is located on the sex chromosomes. These genes for enamel and dentin matrix proteins are derived from a common ancestral gene, but as a result of evolution, they diverged in terms of their specific functions. These matrix proteins play important roles in cell adhesion, polarity, and differentiation and mineralization of enamel and dentin matrices. Mutations of these genes cause diseases such as odontogenesis imperfecta (OI) and amelogenesis imperfecta (AI). In this review, we discuss the recently defined terms matrisome and matrixome for ECMs, as well as focus on genes and functions of enamel and dentin matrix proteins. PMID:23539364
Homology-dependent Gene Silencing in Paramecium
Ruiz, Françoise; Vayssié, Laurence; Klotz, Catherine; Sperling, Linda; Madeddu, Luisa
1998-01-01
Microinjection at high copy number of plasmids containing only the coding region of a gene into the Paramecium somatic macronucleus led to a marked reduction in the expression of the corresponding endogenous gene(s). The silencing effect, which is stably maintained throughout vegetative growth, has been observed for all Paramecium genes examined so far: a single-copy gene (ND7), as well as members of multigene families (centrin genes and trichocyst matrix protein genes) in which all closely related paralogous genes appeared to be affected. This phenomenon may be related to posttranscriptional gene silencing in transgenic plants and quelling in Neurospora and allows the efficient creation of specific mutant phenotypes thus providing a potentially powerful tool to study gene function in Paramecium. For the two multigene families that encode proteins that coassemble to build up complex subcellular structures the analysis presented herein provides the first experimental evidence that the members of these gene families are not functionally redundant. PMID:9529389
NASA Astrophysics Data System (ADS)
Yuan, Ye; Wang, Xiuli; Guo, Sheping; Qiu, Xuemei
2011-06-01
Gram-negative Vibrio parahaemolyticus is a common pathogen in humans and marine animals. The outer membrane protein of bacteria plays an important role in the infection and pathogenicity to the host. Thus, the outer membrane proteins are an ideal target for vaccines. We amplified a complete outer membrane protein gene (ompW) from V. parahaemolyticus ATCC 17802. We then cloned the gene and expressed it in Escherichia coli BL21 (DE3) cells. The gene coded for a protein of 42.78 kDa. We purified the protein using Ni-NTA affinity chromatography and detected it by anti-His antibody Western blotting. Our results provide a basis for future application of the OmpW protein as a vaccine candidate against infection by V. parahaemolyticus. In addition, the purified OmpW protein can be used for further functional and structural studies.
An expanding universe of the non-coding genome in cancer biology.
Xue, Bin; He, Lin
2014-06-01
Neoplastic transformation is caused by accumulation of genetic and epigenetic alterations that ultimately convert normal cells into tumor cells with uncontrolled proliferation and survival, unlimited replicative potential and invasive growth [Hanahan,D. et al. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646-674]. Although the majority of the cancer studies have focused on the functions of protein-coding genes, emerging evidence has started to reveal the importance of the vast non-coding genome, which constitutes more than 98% of the human genome. A number of non-coding RNAs (ncRNAs) derived from the 'dark matter' of the human genome exhibit cancer-specific differential expression and/or genomic alterations, and it is increasingly clear that ncRNAs, including small ncRNAs and long ncRNAs (lncRNAs), play an important role in cancer development by regulating protein-coding gene expression through diverse mechanisms. In addition to ncRNAs, nearly half of the mammalian genomes consist of transposable elements, particularly retrotransposons. Once depicted as selfish genomic parasites that propagate at the expense of host fitness, retrotransposon elements could also confer regulatory complexity to the host genomes during development and disease. Reactivation of retrotransposons in cancer, while capable of causing insertional mutagenesis and genome rearrangements to promote oncogenesis, could also alter host gene expression networks to favor tumor development. Taken together, the functional significance of non-coding genome in tumorigenesis has been previously underestimated, and diverse transcripts derived from the non-coding genome could act as integral functional components of the oncogene and tumor suppressor network. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
The Long Noncoding RNA Landscape of the Mouse Eye.
Chen, Weiwei; Yang, Shuai; Zhou, Zhonglou; Zhao, Xiaoting; Zhong, Jiayun; Reinach, Peter S; Yan, Dongsheng
2017-12-01
Long noncoding RNAs (lncRNAs) are important regulators of diverse biological functions. However, an extensive in-depth analysis of their expression profile and function in mammalian eyes is still lacking. Here we describe comprehensive landscapes of stage-dependent and tissue-specific lncRNA expression in the mouse eye. Affymetrix transcriptome array profiled lncRNA signatures from six different ocular tissue subsets (i.e., cornea, lens, retina, RPE, choroid, and sclera) in newborn and 8-week-old mice. Quantitative RT-PCR analysis validated array findings. Cis analyses and Gene Ontology (GO) annotation of protein-coding genes adjacent to signature lncRNA loci clarified potential lncRNA roles in maintaining tissue identity and regulating eye maturation during the aforementioned phase. In newborn and 8-week-old mice, we identified 47,332 protein-coding and noncoding gene transcripts. LncRNAs comprise 19,313 of these transcripts annotated in public data banks. During this maturation phase of these six different tissue subsets, the expression levels of more than 1000 lncRNAs underwent ≥2-fold changes. qRT-PCR analysis confirmed part of the gene microarray analysis results. K-means clustering identified 910 lncRNAs in the P0 groups and 686 lncRNAs in the postnatal 8-week-old groups, suggesting distinct tissue-specific lncRNA clusters. GO analysis of protein-coding genes proximal to lncRNA signatures resolved close correlations with their tissue-specific functional maturation between P0 and 8 weeks of age in the 6 tissue subsets. Characterizing maturational changes in lncRNA expression patterns as well as tissue-specific lncRNA signatures in six ocular tissues suggests important contributions made by lncRNA to the control of developmental processes in the mouse eye.
Zorc, Minja; Kunej, Tanja
2016-05-01
MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5 %) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region. 
Identified miRNA-related genomic hotspots in chicken can serve researchers as a starting point for further functional studies and association studies with poultry production and health traits and the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
Araya, Carlos L.; Cenik, Can; Reuter, Jason A.; Kiss, Gert; Pande, Vijay S.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
Cancer sequencing studies have primarily identified cancer-driver genes by the accumulation of protein-altering mutations. An improved method would be annotation-independent, sensitive to unknown distributions of functions within proteins, and inclusive of non-coding drivers. We employed density-based clustering methods in 21 tumor types to detect variably-sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and non-coding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs reveal spatial clustering of mutations at molecular domains and interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated among tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally-agnostic driver identification. PMID:26691984
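The idea of detecting variably-sized mutated regions from the density of mutations can be illustrated with a deliberately simple one-dimensional sketch. This is not the clustering method used in the study: it chains sorted positions that lie within `eps` bases of each other and keeps chains with at least `min_count` mutations, which is only a rough stand-in for density-based clustering. The function name, parameters, and coordinates are hypothetical.

```python
from typing import Iterable, List, Tuple


def dense_regions(positions: Iterable[int], eps: int = 10,
                  min_count: int = 3) -> List[Tuple[int, int, int]]:
    """Group mutation positions into dense regions.

    Returns (start, end, n_mutations) for each run of sorted positions in
    which consecutive mutations are at most `eps` apart and the run holds
    at least `min_count` mutations. Region size is not fixed in advance.
    """
    regions = []
    current: List[int] = []
    for pos in sorted(positions):
        # A gap larger than eps closes the current run.
        if current and pos - current[-1] > eps:
            if len(current) >= min_count:
                regions.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_count:
        regions.append((current[0], current[-1], len(current)))
    return regions


# Toy coordinates (synthetic, for illustration only).
MUTATIONS = [100, 103, 105, 108, 500, 900, 905, 907, 2000]
print(dense_regions(MUTATIONS))
```

Because the grouping depends only on genomic coordinates, a sketch like this is annotation-independent in the sense the abstract describes; assessing statistical significance of each region would require a background mutation model that is not shown here.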
A high resolution atlas of gene expression in the domestic sheep (Ovis aries)
Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.
2017-01-01
Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238
A high resolution atlas of gene expression in the domestic sheep (Ovis aries).
Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A
2017-09-01
Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.
Nuclear export of RNA: Different sizes, shapes and functions.
Williams, Tobias; Ngo, Linh H; Wickramasinghe, Vihandha O
2018-03-01
Export of protein-coding and non-coding RNA molecules from the nucleus to the cytoplasm is critical for gene expression. This necessitates the continuous transport of RNA species of different size, shape and function through nuclear pore complexes via export receptors and adaptor proteins. Here, we provide an overview of the major RNA export pathways in humans, highlighting the similarities and differences between each. Its importance is underscored by the growing appreciation that deregulation of RNA export pathways is associated with human diseases like cancer. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C
2015-01-01
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays
Boerner, Susan; McGinnis, Karen M.
2012-01-01
Background Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants. Methodology/Principal Findings To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci. Conclusions/Significance Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
Liu, Yangyang; Han, Xiao; Yuan, Junting; Geng, Tuoyu; Chen, Shihao; Hu, Xuming; Cui, Isabelle H; Cui, Hengmi
2017-04-07
The type II bacterial CRISPR/Cas9 system is a simple, convenient, and powerful tool for targeted gene editing. Here, we describe a CRISPR/Cas9-based approach for inserting a poly(A) transcriptional terminator into both alleles of a targeted gene to silence protein-coding and non-protein-coding genes, which often play key roles in gene regulation but are difficult to silence via insertion or deletion of short DNA fragments. The integration of 225 bp of bovine growth hormone poly(A) signals into either the first intron or the first exon or behind the promoter of target genes caused efficient termination of expression of PPP1R12C, NSUN2 (protein-coding genes), and MALAT1 (non-protein-coding gene). Both NeoR and PuroR were used as markers in the selection of clonal cell lines with biallelic integration of a poly(A) signal. Genotyping analysis indicated that the cell lines displayed the desired biallelic silencing after a brief selection period. These combined results indicate that this CRISPR/Cas9-based approach offers an easy, convenient, and efficient novel technique for gene silencing in cell lines, especially for those in which gene integration is difficult because of a low efficiency of homology-directed repair. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”
Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
RNA-seq based transcriptional map of bovine respiratory disease pathogen "Histophilus somni 2336".
Kumar, Ranjit; Lawrence, Mark L; Watt, James; Cooksey, Amanda M; Burgess, Shane C; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify "novel" genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon (total 730 genes) structures in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic region unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations.
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.
Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong
2009-03-01
Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
Origins of Genes: "Big Bang" or Continuous Creation?
NASA Astrophysics Data System (ADS)
Keese, Paul K.; Gibbs, Adrian
1992-10-01
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinase only after computational studies. Results A systematic in silico study performed on the human genome led to the identification of 5 genes, on chromosome 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases from NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinases may code for alternately spliced forms or be incorrectly annotated. An InterProScan automated analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cysteine propensity to participate in a disulfide bridge. Conclusion The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. 
Structural analysis of kinase protein domains as defined in multiple sources together with transmembrane alpha helices and signal peptide prediction provides hints to function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various levels of magnification up to gene organization on the full chromosome set. PMID:16351747
[Regulation of heat shock gene expression in response to stress].
Garbuz, D G
2017-01-01
Heat shock (HS) genes, or stress genes, code for a number of proteins that collectively form the most ancient and universal stress defense system. The system determines the cell capability of adaptation to various adverse factors and performs a variety of auxiliary functions in normal physiological conditions. Common stress factors, such as higher temperatures, hypoxia, heavy metals, and others, suppress transcription and translation for the majority of genes, while HS genes are upregulated. Transcription of HS genes is controlled by transcription factors of the HS factor (HSF) family. Certain HSFs are activated on exposure to higher temperatures or other adverse factors to ensure stress-induced HS gene expression, while other HSFs are specifically activated at particular developmental stages. The regulation of the main mammalian stress-inducible factor HSF1 and Drosophila melanogaster HSF includes many components, such as a variety of early warning signals indicative of abnormal cell activity (e.g., increases in intracellular ceramide, cytosolic calcium ions, or partly denatured proteins); protein kinases, which phosphorylate HSFs at various Ser residues; acetyltransferases; and regulatory proteins, such as SUMO and HSBP1. Transcription factors other than HSFs are also involved in activating HS gene transcription; the set includes D. melanogaster GAF, mammalian Sp1 and NF-Y, and other factors. Transcription of several stress genes coding for molecular chaperones of the glucose-regulated protein (GRP) family is predominantly regulated by another stress-detecting system, which is known as the unfolded protein response (UPR) system and is activated in response to massive protein misfolding in the endoplasmic reticulum and mitochondrial matrix. A translational fine tuning of HS protein expression occurs via changing the phosphorylation status of several proteins involved in translation initiation. 
In addition, specific signal sequences in the 5'-UTRs of some HS protein mRNAs ensure their preferential translation in stress.
Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi
2017-12-02
The study of codon usage coupled with phylogenetic analysis is an important tool to understand the genetic and evolutionary relationship of a gene. The 13 protein coding genes of human mitochondria are involved in electron transport chain for the generation of energy currency (ATP). However, no work has yet been reported on the codon usage of the mitochondrial protein coding genes across six continents. To understand the patterns of codon usage in mitochondrial genes across six different continents, we used bioinformatic analyses to analyze the protein coding genes. The codon usage bias was low as revealed from high ENC value. Correlation between codon usage and GC3 suggested that all the codons ending with G/C were positively correlated with GC3 but vice versa for A/T ending codons with the exception of ND4L and ND5 genes. Neutrality plot revealed that for the genes ATP6, COI, COIII, CYB, ND4 and ND4L, natural selection might have played a major role while mutation pressure might have played a dominant role in the codon usage bias of ATP8, COII, ND1, ND2, ND3, ND5 and ND6 genes. Phylogenetic analysis indicated that evolutionary relationships in each of 13 protein coding genes of human mitochondria were different across six continents and further suggested that geographical distance was an important factor for the origin and evolution of 13 protein coding genes of human mitochondria. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
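The GC3 statistic used in the codon usage abstract above is the fraction of codons whose third position is G or C. The sketch below is only an illustration of that definition, not the authors' pipeline: the function name `gc3` is hypothetical, the input is assumed to be an in-frame coding sequence, and trailing bases that do not complete a codon are ignored. Whether stop codons should be excluded is left to the caller.

```python
def gc3(cds: str) -> float:
    """Return the fraction of codons whose third base is G or C.

    Assumes `cds` is an in-frame coding sequence (hypothetical input);
    any trailing bases that do not form a full codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # drop an incomplete final codon
    thirds = cds[2:usable:3]              # third base of every codon
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Toy sequence: codons ATG, AAA, TTT have third bases G, A, T.
    print(gc3("ATGAAATTT"))
```

Because GC3 looks only at nucleotide positions, it does not depend on which genetic code (standard or vertebrate mitochondrial) is used for translation.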
Nagarkar-Jaiswal, Sonal; Lee, Pei-Tseng; Campbell, Megan E.; ...
2015-03-31
Here, we document a collection of ~7434 MiMIC (Minos Mediated Integration Cassette) insertions of which 2854 are inserted in coding introns. They allowed us to create a library of 400 GFP-tagged genes. We show that 72% of internally tagged proteins are functional, and that more than 90% can be imaged in unfixed tissues. Moreover, the tagged mRNAs can be knocked down by RNAi against GFP (iGFPi), and the tagged proteins can be efficiently knocked down by deGradFP technology. The phenotypes associated with RNA and protein knockdown typically correspond to severe loss of function or null mutant phenotypes. Finally, we demonstrate reversible, spatial, and temporal knockdown of tagged proteins in larvae and adult flies. This new strategy and collection of strains allows unprecedented in vivo manipulations in flies for many genes. These strategies will likely extend to vertebrates.
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Hashemi, Seirana; Nowzari Dalini, Abbas; Jalali, Adrin; Banaei-Moghaddam, Ali Mohammad; Razaghi-Moghadam, Zahra
2017-08-16
Discriminating driver mutations from the ones that play no role in cancer is a severe bottleneck in elucidating molecular mechanisms underlying cancer development. Since protein domains are representatives of functional regions within proteins, mutations on them may disturb the protein functionality. Therefore, studying mutations at domain level may point researchers to more accurate assessment of the functional impact of the mutations. This article presents a comprehensive study to map mutations from 29 cancer types to both sequence- and structure-based domains. Statistical analysis was performed to identify candidate domains in which mutations occur with high statistical significance. For each cancer type, the corresponding type-specific domains were distinguished among all candidate domains. Subsequently, cancer type-specific domains facilitated the identification of specific proteins for each cancer type. Besides, performing interactome analysis on specific proteins of each cancer type showed high levels of interconnectivity among them, which implies their functional relationship. To evaluate the role of mitochondrial genes, stem cell-specific genes and DNA repair genes in cancer development, their mutation frequency was determined via further analysis. This study has provided researchers with a publicly available data repository for studying both CATH and Pfam domain regions on protein-coding genes. Moreover, the associations between different groups of genes/domains and various cancer types have been clarified. The work is available at http://www.cancerouspdomains.ir.
Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph
2017-01-01
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
DOE Office of Scientific and Technical Information (OSTI.GOV)
Julie Anne Roden, Branids Belt, Jason Barzel Ross, Thomas Tachibana, Joe Vargas, Mary Beth Mudgett
2004-11-23
The bacterial pathogen Xanthomonas campestris pv. vesicatoria (Xcv) uses a type III secretion system (TTSS) to translocate effector proteins into host plant cells. The TTSS is required for Xcv colonization, yet the identity of many proteins translocated through this apparatus is not known. We used a genetic screen to functionally identify Xcv TTSS effectors. A transposon 5 (Tn5)-based transposon construct including the coding sequence for the Xcv AvrBs2 effector devoid of its TTSS signal was randomly inserted into the Xcv genome. Insertion of the avrBs2 reporter gene into Xcv genes coding for proteins containing a functional TTSS signal peptide resulted in the creation of chimeric TTSS effector::AvrBs2 fusion proteins. Xcv strains containing these fusions translocated the AvrBs2 reporter in a TTSS-dependent manner into resistant BS2 pepper cells during infection, activating the avrBs2-dependent hypersensitive response (HR). We isolated seven chimeric fusion proteins and designated the identified TTSS effectors as Xanthomonas outer proteins (Xops). Translocation of each Xop was confirmed by using the calmodulin-dependent adenylate cyclase reporter assay. Three xop genes are Xanthomonas spp.-specific, whereas homologs for the rest are found in other phytopathogenic bacteria. XopF1 and XopF2 define an effector gene family in Xcv. XopN contains a eukaryotic protein fold repeat and is required for full Xcv pathogenicity in pepper and tomato. The translocated effectors identified in this work expand our knowledge of the diversity of proteins that Xcv uses to manipulate its hosts.
Efficient analysis of mouse genome sequences reveal many nonsense variants
Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude
2016-01-01
Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
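The in silico procedure described in the abstract above (substitute the strain's alleles into the reference coding sequence, translate both, and compare the amino acid sequences for a gained stop codon) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' tool: the function names `translate`, `apply_snp` and `gained_stop` are hypothetical, only single-base substitutions are handled (indels are not), positions are 0-based offsets into the coding sequence, and the standard nuclear genetic code is assumed.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_snp(cds: str, pos: int, alt: str) -> str:
    """Return the coding sequence with the base at 0-based `pos` replaced."""
    return cds[:pos] + alt + cds[pos + 1:]


def gained_stop(ref_cds: str, alt_cds: str):
    """Return the 0-based codon index of the first stop codon present in the
    variant protein but not in the reference, or None if there is none."""
    ref_aa, alt_aa = translate(ref_cds), translate(alt_cds)
    for i, (r, a) in enumerate(zip(ref_aa, alt_aa)):
        if a == "*" and r != "*":
            return i
    return None


if __name__ == "__main__":
    reference = "ATGCAATGGTAA"                 # toy CDS: M Q W stop
    variant = apply_snp(reference, 3, "T")     # CAA -> TAA in the second codon
    print(translate(reference), translate(variant), gained_stop(reference, variant))
```

Comparing at the protein level rather than the nucleotide level means synonymous substitutions are ignored automatically, which matches the abstract's focus on variants that truncate the protein.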
Ma, Wei; Gabriel, Tobias Sebastian; Martis, Mihaela Maria; Gursinsky, Torsten; Schubert, Veit; Vrána, Jan; Doležel, Jaroslav; Grundlach, Heidrun; Altschmied, Lothar; Scholz, Uwe; Himmelbach, Axel; Behrens, Sven-Erik; Banaei-Moghaddam, Ali Mohammad; Houben, Andreas
2017-01-01
B chromosomes (Bs) are supernumerary, dispensable parts of the nuclear genome, which appear in many different species of eukaryote. So far, Bs have been considered to be genetically inert elements without any functional genes. Our comparative transcriptome analysis and the detection of active RNA polymerase II (RNAPII) in the proximity of B chromatin demonstrate that the Bs of rye (Secale cereale) contribute to the transcriptome. In total, 1954 and 1218 B-derived transcripts with an open reading frame were expressed in generative and vegetative tissues, respectively. In addition to B-derived transposable element transcripts, a high percentage of short transcripts without detectable similarity to known proteins and gene fragments from A chromosomes (As) were found, suggesting an ongoing gene erosion process. In vitro analysis of the A- and B-encoded AGO4B protein variants demonstrated that both possess RNA slicer activity. These data demonstrate unambiguously the presence of a functional AGO4B gene on Bs and that these Bs carry both functional protein coding genes and pseudogene copies. Thus, B-encoded genes may provide an additional level of gene control and complexity in combination with their related A-located genes. Hence, physiological effects, associated with the presence of Bs, may partly be explained by the activity of B-located (pseudo)genes. © 2016 IPK Gatersleben. New Phytologist © 2016 New Phytologist Trust.
Li, C-Q; Huang, G-W; Wu, Z-Y; Xu, Y-J; Li, X-C; Xue, Y-J; Zhu, Y; Zhao, J-M; Li, M; Zhang, J; Wu, J-Y; Lei, F; Wang, Q-Y; Li, S; Zheng, C-P; Ai, B; Tang, Z-D; Feng, C-C; Liao, L-D; Wang, S-H; Shen, J-H; Liu, Y-J; Bai, X-F; He, J-Z; Cao, H-H; Wu, B-L; Wang, M-R; Lin, D-C; Koeffler, H P; Wang, L-D; Li, X; Li, E-M; Xu, L-Y
2017-02-13
Long non-coding RNAs (lncRNAs) have a critical role in cancer initiation and progression, and thus may mediate oncogenic or tumor suppressing effects, as well as be a new class of cancer therapeutic targets. We performed high-throughput sequencing of RNA (RNA-seq) to investigate the expression level of lncRNAs and protein-coding genes in 30 esophageal samples, comprised of 15 esophageal squamous cell carcinoma (ESCC) samples and their 15 paired non-tumor tissues. We further developed an integrative bioinformatics method, denoted URW-LPE, to identify key functional lncRNAs that regulate expression of downstream protein-coding genes in ESCC. A number of known onco-lncRNA and many putative novel ones were effectively identified by URW-LPE. Importantly, we identified lncRNA625 as a novel regulator of ESCC cell proliferation, invasion and migration. ESCC patients with high lncRNA625 expression had significantly shorter survival time than those with low expression. LncRNA625 also showed specific prognostic value for patients with metastatic ESCC. Finally, we identified E1A-binding protein p300 (EP300) as a downstream executor of lncRNA625-induced transcriptional responses. These findings establish a catalog of novel cancer-associated functional lncRNAs, which will promote our understanding of lncRNA-mediated regulation in this malignancy.
Non-coding functions of alternative pre-mRNA splicing in development
Mockenhaupt, Stefan; Makeyev, Eugene V.
2015-01-01
A majority of messenger RNA precursors (pre-mRNAs) in the higher eukaryotes undergo alternative splicing to generate more than one mature product. By targeting the open reading frame region this process increases diversity of protein isoforms beyond the nominal coding capacity of the genome. However, alternative splicing also frequently controls output levels and spatiotemporal features of cellular and organismal gene expression programs. Here we discuss how these non-coding functions of alternative splicing contribute to development through regulation of mRNA stability, translational efficiency and cellular localization. PMID:26493705
Kikhno, Irina
2014-01-01
Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
Carraro, Nicola; Tisdale-Orr, Tracy Eizabeth; Clouse, Ronald Matthew; Knöller, Anne Sophie; Spicer, Rachel
2012-01-01
Intercellular transport of the plant hormone auxin is mediated by three families of membrane-bound protein carriers, with the PIN and ABCB families coding primarily for efflux proteins and the AUX/LAX family coding for influx proteins. In the last decade our understanding of gene and protein function for these transporters in Arabidopsis has expanded rapidly but very little is known about their role in woody plant development. Here we present a comprehensive account of all three families in the model woody species Populus, including chromosome distribution, protein structure, quantitative gene expression, and evolutionary relationships. The PIN and AUX/LAX gene families in Populus comprise 16 and 8 members respectively and show evidence for the retention of paralogs following a relatively recent whole genome duplication. There is also differential expression across tissues within many gene pairs. The ABCB family is previously undescribed in Populus and includes 20 members, showing a much deeper evolutionary history, including both tandem and whole genome duplication as well as probable gene loss. A striking number of these transporters are expressed in developing Populus stems and we suggest that evolutionary and structural relationships with known auxin transporters in Arabidopsis can point toward candidate genes for further study in Populus. This is especially important for the ABCBs, which is a large family and includes members in Arabidopsis that are able to transport other substrates in addition to auxin. Protein modeling, sequence alignment and expression data all point to ABCB1.1 as a likely auxin transport protein in Populus. Given that basipetal auxin flow through the cambial zone shapes the development of woody stems, it is important that we identify the full complement of genes involved in this process. This work should lay the foundation for studies targeting specific proteins for functional characterization and in situ localization. PMID:22645571
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.
Craig, Catherine L; Riekel, Christian
2002-12-01
The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks (Flag or silk from flagelliform gland in spiders; Fhc or heavy chain fibroin silks produced by Lepidoptera larvae) are made up of hierarchically organized, repetitive arrays of amino acids. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differs. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated in the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
Characterization of the orf1glnKamtB operon of Herbaspirillum seropedicae.
Noindorf, Lilian; Rego, Fabiane G M; Baura, Valter A; Monteiro, Rose A; Wassem, Roseli; Cruz, Leonardo M; Rigo, Liu U; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O; Chubatsu, Leda S
2006-03-01
Herbaspirillum seropedicae is an endophytic nitrogen-fixing bacterium that colonizes economically important grasses. In this organism, the amtB gene is co-transcribed with two other genes: glnK, which codes for a PII-like protein, and orf1, which codes for a probable periplasmic protein of unknown function. The expression of the orf1glnKamtB operon is increased under nitrogen-limiting conditions and is dependent on NtrC. An amtB mutant failed to transport methylammonium. Post-translational control of nitrogenase was also partially impaired in this mutant, since a complete switch-off of nitrogenase after ammonium addition was not observed. This result suggests that the AmtB protein is involved in the signaling pathway for the reversible inactivation of nitrogenase in H. seropedicae.
Zhang, Yu; Yao, Youlin; Jiang, Siyuan; Lu, Yilu; Liu, Yunqiang; Tao, Dachang; Zhang, Sizhong; Ma, Yongxin
2015-04-01
The aim was to identify protein-protein interaction partners of PER1 (period circadian protein homolog 1), a key component of the molecular oscillation system of the circadian rhythm, in tumors using the bacterial two-hybrid system technique. A human cervical carcinoma HeLa cell library was adopted. The recombinant bait plasmid pBT-PER1 and the pTRG cDNA plasmid library were cotransformed into the two-hybrid system reporter strain, which was cultured in a special selective medium. Target clones were screened. After the positive clones were isolated, the target clones were sequenced and analyzed. Fourteen protein-coding genes were identified, 4 of which were found to contain whole coding regions of genes, including optic atrophy 3 protein (OPA3), associated with mitochondrial dynamics, and Homo sapiens cutA divalent cation tolerance homolog of E. coli (CUTA), associated with copper metabolism. There were also proteins related to cellular events, proteins involved in biochemical reactions, and signal transduction-related proteins. Identification of potential interacting proteins with PER1 in tumors may provide new insights into the functions of the circadian clock protein PER1 during tumorigenesis.
A High-Resolution Gene Map of the Chloroplast Genome of the Red Alga Porphyra purpurea.
Reith, M; Munholland, J
1993-01-01
Extensive DNA sequencing of the chloroplast genome of the red alga Porphyra purpurea has resulted in the detection of more than 125 genes. Fifty-eight (approximately 46%) of these genes are not found on the chloroplast genomes of land plants. These include genes encoding 17 photosynthetic proteins, three tRNAs, and nine ribosomal proteins. In addition, nine genes encoding proteins related to biosynthetic functions, six genes encoding proteins involved in gene expression, and at least five genes encoding miscellaneous proteins are among those not known to be located on land plant chloroplast genomes. The increased coding capacity of the P. purpurea chloroplast genome, along with other characteristics such as the absence of introns and the conservation of ancestral operons, demonstrate the primitive nature of the P. purpurea chloroplast genome. In addition, evidence for a monophyletic origin of chloroplasts is suggested by the identification of two groups of genes that are clustered in chloroplast genomes but not in cyanobacteria. PMID:12271072
Systematic screening for mutations in the promoter and the coding region of the 5-HT1A gene
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erdmann, J.; Shimron-Abarbanell, D.; Cichon, S.
1995-10-09
In the present study we sought to identify genetic variation in the 5-HT1A receptor gene which through alteration of protein function or level of expression might contribute to the genetic predisposition to neuropsychiatric diseases. Genomic DNA samples from 159 unrelated subjects (including 45 schizophrenic, 46 bipolar affective, and 43 patients with Tourette's syndrome, as well as 25 healthy controls) were investigated by single-strand conformation analysis. Overlapping PCR (polymerase chain reaction) fragments covered the whole coding sequence as well as the 5′ untranslated region of the 5-HT1A gene. The region upstream to the coding sequence we investigated contains a functional promoter. We found two rare nucleotide sequence variants. Both mutations are located in the coding region of the gene: a coding mutation (A→G) in nucleotide position 82 which leads to an amino acid exchange (Ile→Val) in position 28 of the receptor protein and a silent mutation (C→T) in nucleotide position 549. The occurrence of the Ile-28-Val substitution was studied in an extended sample of patients (n = 352) and controls (n = 210) but was found in similar frequencies in all groups. Thus, this mutation is unlikely to play a significant role in the genetic predisposition to the diseases investigated. In conclusion, our study does not provide evidence that the 5-HT1A gene plays either a major or a minor role in the genetic predisposition to schizophrenia, bipolar affective disorder, or Tourette's syndrome. 29 refs., 4 figs., 1 tab.
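The relation between the nucleotide positions and the amino acid position reported above can be checked with simple integer arithmetic. The sketch below assumes that nucleotide numbering starts at 1 with the first base of the start codon; the abstract does not state the numbering scheme, and the function name is hypothetical.

```python
def codon_position(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, position within codon), both 1-based."""
    codon = (nt_pos - 1) // 3 + 1
    offset = (nt_pos - 1) % 3 + 1
    return codon, offset


coding_variant = codon_position(82)   # first base of codon 28
silent_variant = codon_position(549)  # third base of codon 183
```

Under this numbering, position 82 is the first base of codon 28. All three isoleucine codons (ATT, ATC, ATA) become valine codons (GTT, GTC, GTA) when that first base changes from A to G, which agrees with the reported Ile-28-Val exchange. Position 549 falls on the third base of codon 183, the position where substitutions are most often synonymous, which is consistent with the variant being silent.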
Mu-Like Prophage in Serogroup B Neisseria meningitidis Coding for Surface-Exposed Antigens
Masignani, Vega; Giuliani, Marzia Monica; Tettelin, Hervé; Comanducci, Maurizio; Rappuoli, Rino; Scarlato, Vincenzo
2001-01-01
Sequence analysis of the genome of Neisseria meningitidis serogroup B revealed the presence of an ∼35-kb region inserted within a putative gene coding for an ABC-type transporter. The region contains 46 open reading frames, 29 of which are colinear and homologous to the genes of Escherichia coli Mu phage. Two prophages with similar organizations were also found in serogroup A meningococcus, and one was found in Haemophilus influenzae. Early and late phage functions are well preserved in this family of Mu-like prophages. Several regions of atypical nucleotide content were identified. These likely represent genes acquired by horizontal transfer. Three of the acquired genes are shown to code for surface-associated antigens, and the encoded proteins are able to induce bactericidal antibodies. PMID:11254622
Tornow, J; Santangelo, G M
1994-06-01
A duplicate copy of the RPL37A gene (encoding ribosomal protein L37) was cloned and sequenced. The coding region of RPL37B is very similar to that of RPL37A, with only one conservative amino-acid difference. However, the intron and flanking sequences of the two genes are extremely dissimilar. Disruption experiments indicate that the two loci are not functionally equivalent: disruption of RPL37B was insignificant, but disruption of RPL37A severely impaired the growth rate of the cell. When both RPL37 loci are disrupted, the cell is unable to grow at all, indicating that rpL37 is an essential protein. The functional disparity between the two RPL37 loci could be explained by differential gene expression. The results of two experiments support this idea: gene fusion of RPL37A to a reporter gene resulted in six-fold higher mRNA levels than was generated by the same reporter gene fused to RPL37B, and a modest increase in gene dosage of RPL37B overcame the lack of a functional RPL37A gene.
Darbani, Behrooz; Noeparvar, Shahin; Borg, Søren
2016-01-01
RNA circularization made by head-to-tail back-splicing events is involved in the regulation of gene expression from transcriptional to post-translational levels. By exploiting RNA-Seq data and down-stream analysis, we shed light on the importance of circular RNAs in plants. The results introduce circular RNAs as novel interactors in the regulation of gene expression in plants and imply the comprehensiveness of this regulatory pathway by identifying circular RNAs for a diverse set of genes. These genes are involved in several aspects of cellular metabolism, such as hormonal signaling, intracellular protein sorting, carbohydrate metabolism and cell-wall biogenesis, respiration, amino acid biosynthesis, transcription and translation, and protein ubiquitination. Additionally, these parental loci of circular RNAs, from both nuclear and mitochondrial genomes, encode different transcript classes including protein coding transcripts, microRNA, rRNA, and long non-coding/microprotein coding RNAs. The results shed light on the mitochondrial exonic circular RNAs and imply the importance of circular RNAs for regulation of mitochondrial genes. Importantly, we introduce circular RNAs in barley and elucidate their cellular-level alterations across tissues and in response to the micronutrients iron and zinc. In further support of circular RNAs' functional roles in plants, we report several cases where fluctuations of circRNAs do not correlate with the levels of their parental-loci encoded linear transcripts. PMID:27375638
Genetic evidence for conserved non-coding element function across species–the ears have it
Turner, Eric E.; Cox, Timothy C.
2014-01-01
Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720
Dissecting the chromatin interactome of microRNA genes.
Chen, Dijun; Fu, Liang-Yu; Zhang, Zhao; Li, Guoliang; Zhang, Hang; Jiang, Li; Harrison, Andrew P; Shanahan, Hugh P; Klukas, Christian; Zhang, Hong-Yu; Ruan, Yijun; Chen, Ling-Ling; Chen, Ming
2014-03-01
Our knowledge of the role of higher-order chromatin structures in transcription of microRNA genes (MIRs) is evolving rapidly. Here we investigate the effect of 3D architecture of chromatin on the transcriptional regulation of MIRs. We demonstrate that MIRs have transcriptional features that are similar to protein-coding genes. RNA polymerase II-associated ChIA-PET data reveal that many groups of MIRs and protein-coding genes are organized into functionally compartmentalized chromatin communities and undergo coordinated expression when their genomic loci are spatially colocated. We observe that MIRs display widespread communication in those transcriptionally active communities. Moreover, miRNA-target interactions are significantly enriched among communities with functional homogeneity while depleted from the same community from which they originated, suggesting that MIRs coordinate function-related pathways at the posttranscriptional level. Further investigation demonstrates the existence of spatial MIR-MIR chromatin interacting networks. We show that groups of spatially coordinated MIRs are frequently from the same family and involved in the same disease category. The spatial interaction network possesses both common and cell-specific subnetwork modules that result from the spatial organization of chromatin within different cell types. Together, our study unveils an entirely unexplored layer of MIR regulation throughout the human genome that links the spatial coordination of MIRs to their co-expression and function.
3D RNA and functional interactions from evolutionary couplings
Weinreb, Caleb; Riesselman, Adam; Ingraham, John B.; Gross, Torsten; Sander, Chris; Marks, Debora S.
2016-01-01
Non-coding RNAs are ubiquitous, but the discovery of new RNA gene sequences far outpaces research on their structure and functional interactions. We mine the evolutionary sequence record to derive precise information about function and structure of RNAs and RNA-protein complexes. As in protein structure prediction, we use maximum entropy global probability models of sequence co-variation to infer evolutionarily constrained nucleotide-nucleotide interactions within RNA molecules, and nucleotide-amino acid interactions in RNA-protein complexes. The predicted contacts allow all-atom blinded 3D structure prediction at good accuracy for several known RNA structures and RNA-protein complexes. For unknown structures, we predict contacts in 160 non-coding RNA families. Beyond 3D structure prediction, evolutionary couplings help identify important functional interactions, e.g., at switch points in riboswitches and at a complex nucleation site in HIV. Aided by accelerating sequence accumulation, evolutionary coupling analysis can accelerate the discovery of functional interactions and 3D structures involving RNA. PMID:27087444
Identification of positive selection in disease response genes within members of the Poaceae.
Rech, Gabriel E; Vargas, Walter A; Sukno, Serenella A; Thon, Michael R
2012-12-01
Millions of years of coevolution between plants and pathogens can leave footprints on their genomes, and genes involved in this interaction are expected to show patterns of positive selection in which novel, beneficial alleles are rapidly fixed within the population. Using information about upregulated genes in maize during Colletotrichum graminicola infection and resources available in the Phytozome database, we looked for evidence of positive selection in the Poaceae lineage, acting on protein coding sequences related to plant defense. We found six genes with evidence of positive selection and another eight with sites showing episodic selection. Some of them have already been described as evolving under positive selection, but others are reported here for the first time, including genes encoding isocitrate lyase, dehydrogenases, a multidrug transporter, a protein containing a putative leucine-rich repeat and other proteins with unknown functions. Mapping positively selected residues onto the predicted 3-D structure of proteins showed that most of them are located on the surface, where proteins are in contact with other molecules. We present here a set of Poaceae genes that are likely to be involved in plant defense mechanisms and have evidence of positive selection. These genes are excellent candidates for future functional validation.
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)
Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan
2014-01-01
Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247 696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus mtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Evolutionary relationship analysis suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.
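The lengths quoted above are mutually consistent, which a short calculation makes explicit. The sketch below only restates the figures from the abstract; the variable names are hypothetical, and the reading that the 1722 bp exclude the stop codon is an inference from the arithmetic rather than something the abstract states.

```python
MATURE_CDS_BP = 1722   # coding sequence for the circulating protein (from the abstract)
UPSTREAM_CDS_BP = 207  # coding sequence 5' to the mature amino terminus (from the abstract)
MATURE_PROTEIN_AA = 574

# Three nucleotides per codon; a remainder of zero means a whole number of codons.
mature_codons, mature_remainder = divmod(MATURE_CDS_BP, 3)
upstream_codons, upstream_remainder = divmod(UPSTREAM_CDS_BP, 3)

# Met(-28) is 28 residues before the mature amino terminus, so it must fall
# inside the upstream region if that region holds at least 28 codons.
met_minus_28_in_upstream = 28 <= upstream_codons
```

The 1722 bp correspond to exactly 574 codons, matching the 574-residue protein, and the 207 bp upstream region holds 69 in-frame codons, enough to contain the Met(-28) initiator.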
Cheng, Lixin; Leung, Kwong-Sak
2018-05-16
Moonlighting proteins are a class of proteins having multiple distinct functions, which play essential roles in a variety of cellular and enzymatic functioning systems. Although there have long been calls for computational algorithms for the identification of moonlighting proteins, research on approaches to identify moonlighting long non-coding RNAs (lncRNAs) has never been undertaken. Here, we introduce a novel methodology, MoonFinder, for the identification of moonlighting lncRNAs. MoonFinder is a statistical algorithm identifying moonlighting lncRNAs without a priori knowledge through the integration of protein interactome, RNA-protein interactions, and functional annotation of proteins. We identify 155 moonlighting lncRNA candidates and uncover that they are a distinct class of lncRNAs characterized by specific sequence and cellular localization features. The non-coding genes that transcribe moonlighting lncRNAs tend to have shorter but more exons, and the moonlighting lncRNAs have a variable localization pattern with a high chance of residing in the cytoplasmic compartment in comparison to the other lncRNAs. Moreover, moonlighting lncRNAs and moonlighting proteins are rather mutually exclusive in terms of both their direct interactions and interacting partners. Our results also shed light on how the moonlighting candidates and their interacting proteins are implicated in the formation and development of cancers and other diseases. The code implementing MoonFinder is supplied as an R package in the supplementary material. Supplementary data are available at Bioinformatics online.
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).
Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai
2014-12-01
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. The mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at two ends of the linear mtDNA, while the mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI and terminal sequences at two ends of the linear mtDNA. The COI gene begins with GTG as its start codon, whereas the other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes terminate with TAA as the stop codon.
Selfish DNA in protein-coding genes of Rickettsia.
Ogata, H; Audic, S; Barbe, V; Artiguenave, F; Fournier, P E; Raoult, D; Claverie, J M
2000-10-13
Rickettsia conorii, the aetiological agent of Mediterranean spotted fever, is an intracellular bacterium transmitted by ticks. Preliminary analyses of the nearly complete genome sequence of R. conorii have revealed 44 occurrences of a previously undescribed palindromic repeat (150 base pairs long) throughout the genome. Unexpectedly, this repeat was found inserted in-frame within 19 different R. conorii open reading frames likely to encode functional proteins. We found the same repeat in proteins of other Rickettsia species. The finding of a mobile element inserted in many unrelated genes suggests the potential role of selfish DNA in the creation of new protein sequences.
Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M
2015-05-15
The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar to unicellular opisthokont genomes than to other animal genomes.
Shabalina, Svetlana A.; Ogurtsov, Aleksey Y.; Spiridonov, Nikolay A.; Koonin, Eugene V.
2014-01-01
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5′ and 3′ transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5′-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3′-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns. PMID:24792168
Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred
2014-11-20
Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With the help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria.
Biotin protein ligase from Corynebacterium glutamicum: role for growth and L-lysine production.
Peters-Wendisch, P; Stansen, K C; Götker, S; Wendisch, V F
2012-03-01
Corynebacterium glutamicum is a biotin auxotrophic Gram-positive bacterium that is used for large-scale production of amino acids, especially of L-glutamate and L-lysine. It is known that biotin limitation triggers L-glutamate production and that L-lysine production can be increased by enhancing the activity of pyruvate carboxylase, one of two biotin-dependent proteins of C. glutamicum. The gene cg0814 (accession number YP_225000) has been annotated to code for putative biotin protein ligase BirA, but the protein has not yet been characterized. A discontinuous enzyme assay of biotin protein ligase activity was established using a 105aa peptide corresponding to the carboxyterminus of the biotin carboxylase/biotin carboxyl carrier protein subunit AccBC of the acetyl CoA carboxylase from C. glutamicum as acceptor substrate. Biotinylation of this biotin acceptor peptide was revealed with crude extracts of a strain overexpressing the birA gene and was shown to be ATP dependent. Thus, birA from C. glutamicum codes for a functional biotin protein ligase (EC 6.3.4.15). The gene birA from C. glutamicum was overexpressed and the transcriptome was compared with the control strain revealing no significant gene expression changes of the bio-genes. However, biotin protein ligase overproduction increased the level of the biotin-containing protein pyruvate carboxylase and entailed a significant growth advantage in glucose minimal medium. Moreover, birA overexpression resulted in a twofold higher L-lysine yield on glucose as compared with the control strain.
Network perturbation by recurrent regulatory variants in cancer
Cho, Ara; Lee, Insuk; Choi, Jung Kyoon
2017-01-01
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
Kim, Yoonhee; Zhang, Yinhua; Pang, Kaifang; Kang, Hyojin; Park, Heejoo; Lee, Yeunkum; Lee, Bokyoung; Lee, Heon-Jeong; Kim, Won-Ki; Geum, Dongho
2016-01-01
Bipolar disorder (BD), characterized by recurrent mood swings between depression and mania, is a highly heritable and devastating mental illness with poorly defined pathophysiology. Recent genome-wide molecular genetic studies have identified several protein-coding genes and microRNAs (miRNAs) significantly associated with BD. Notably, some of the proteins expressed from BD-associated genes function in neuronal synapses, suggesting that abnormalities in synaptic function could be one of the key pathogenic mechanisms of BD. In contrast, however, the role of BD-associated miRNAs in disease pathogenesis remains largely unknown, mainly because of a lack of understanding about their target mRNAs and pathways in neurons. To address this problem, in this study, we focused on a recently identified BD-associated but uncharacterized miRNA, miR-1908-5p. We identified and validated its novel target genes including DLGAP4, GRIN1, STX1A, CLSTN1 and GRM4, which all function in neuronal glutamatergic synapses. Moreover, bioinformatic analyses of human brain expression profiles revealed that the expression levels of miR-1908-5p and its synaptic target genes show an inverse-correlation in many brain regions. In our preliminary experiments, the expression of miR-1908-5p was increased after chronic treatment with valproate but not lithium in control human neural progenitor cells. In contrast, it was decreased by valproate in neural progenitor cells derived from dermal fibroblasts of a BD subject. Together, our results provide new insights into the potential role of miR-1908-5p in the pathogenesis of BD and also propose a hypothesis that neuronal synapses could be a key converging pathway of some BD-associated protein-coding genes and miRNAs. PMID:28035180
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2015-10-26
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Robson, James F; Barker, Daniel
2015-10-13
To demonstrate the bioinformatics capabilities of a low-cost computer, the Raspberry Pi, we present a comparison of the protein-coding gene content of two species in phylum Chlamydiae: Chlamydia trachomatis, a common sexually transmitted infection of humans, and Candidatus Protochlamydia amoebophila, a recently discovered amoebal endosymbiont. Identifying species-specific proteins and differences in protein families could provide insights into the unique phenotypes of the two species. Using a Raspberry Pi computer, sequence similarity-based protein families were predicted across the two species, C. trachomatis and P. amoebophila, and their members counted. Examples include nine multi-protein families unique to C. trachomatis, 132 multi-protein families unique to P. amoebophila and one family with multiple copies in both. Most families unique to C. trachomatis were polymorphic outer-membrane proteins. Additionally, multiple protein families lacking functional annotation were found. Predicted functional interactions suggest one of these families is involved with the exodeoxyribonuclease V complex. The Raspberry Pi computer is adequate for a comparative genomics project of this scope. The protein families unique to P. amoebophila may provide a basis for investigating the host-endosymbiont interaction. However, additional species should be included; and further laboratory research is required to identify the functions of unknown or putative proteins. Multiple outer membrane proteins were found in C. trachomatis, suggesting importance for host evasion. The tyrosine transport protein family is shared between both species, with four proteins in C. trachomatis and two in P. amoebophila. Shared protein families could provide a starting point for discovery of wide-spectrum drugs against Chlamydiae.
Liu, Guoyuan; Li, Xue; Guo, Liping; Zhang, Xuexian; Qi, Tingxiang; Wang, Hailin; Tang, Huini; Qiao, Xiuqin; Zhang, Jinfa; Xing, Chaozhu; Wu, Jianyong
2017-01-01
The RNA editing occurring in plant organellar genomes mainly involves the change of cytidine to uridine. This process involves a deamination reaction, with cytidine deaminase as the catalyst. Pentatricopeptide repeat (PPR) proteins with a C-terminal DYW domain are reportedly associated with cytidine deamination, similar to members of the deaminase superfamily. PPR genes are involved in many cellular functions and biological processes including fertility restoration to cytoplasmic male sterility (CMS) in plants. In this study, we identified 227 and 211 DYW deaminase-coding PPR genes for the cultivated tetraploid cotton species G. hirsutum and G. barbadense (2n = 4x = 52), respectively, as well as 126 and 97 DYW deaminase-coding PPR genes in the ancestral diploid species G. raimondii and G. arboreum (2n = 26), respectively. The 227 G. hirsutum PPR genes were predicted to encode 52–2016 amino acids, 203 of which were mapped onto 26 chromosomes. Most DYW deaminase genes lacked introns, and their proteins were predicted to target the mitochondria or chloroplasts. Additionally, the DYW domain differed from the complete DYW deaminase domain, which contained part of the E domain and the entire E+ domain. The types and number of DYW tripeptides may have been influenced by evolutionary processes, with some tripeptides being lost. Furthermore, a gene ontology analysis revealed that DYW deaminase functions were mainly related to binding as well as hydrolase and transferase activities. The G. hirsutum DYW deaminase expression profiles varied among different cotton tissues and developmental stages, and no differentially expressed DYW deaminase-coding PPRs were directly associated with the male sterility and restoration in the CMS-D2 system. 
Our current study provides an important piece of information regarding the structural and evolutionary characteristics of Gossypium DYW-containing PPR genes coding for deaminases and will be useful for characterizing the DYW deaminase gene family in cotton biology and breeding. PMID:28339482
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When located in the 5′ region (flanking region or UTR) of a gene, the sequence is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3′ region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When the double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Long Noncoding RNAs in the Yeast S. cerevisiae.
Niederer, Rachel O; Hass, Evan P; Zappulla, David C
2017-01-01
Long noncoding RNAs have recently been discovered to comprise a sizeable fraction of the RNA World. The scope of their functions, physical organization, and disease relevance remain in the early stages of characterization. Although many thousands of lncRNA transcripts recently have been found to emanate from the expansive DNA between protein-coding genes in animals, there are also hundreds that have been found in simple eukaryotes. Furthermore, lncRNAs have been found in the bacterial and archaeal branches of the tree of life, suggesting they are ubiquitous. In this chapter, we focus primarily on what has been learned so far about lncRNAs from the greatly studied single-celled eukaryote, the yeast Saccharomyces cerevisiae. Most lncRNAs examined in yeast have been implicated in transcriptional regulation of protein-coding genes-often in response to forms of stress-whereas a select few have been ascribed yet other functions. Of those known to be involved in transcriptional regulation of protein-coding genes, the vast majority function in cis. There are also some yeast lncRNAs identified that are not directly involved in regulation of transcription. Examples of these include the telomerase RNA and telomere-encoded transcripts. In addition to its role as a template-encoding telomeric DNA synthesis, telomerase RNA has been shown to function as a flexible scaffold for protein subunits of the RNP holoenzyme. The flexible scaffold model provides a specific mechanistic paradigm that is likely to apply to many other lncRNAs that assemble and orchestrate large RNP complexes, even in humans. Looking to the future, it is clear that considerable fundamental knowledge remains to be obtained about the architecture and functions of lncRNAs. Using genetically tractable unicellular model organisms should facilitate lncRNA characterization. The acquired basic knowledge will ultimately translate to better understanding of the growing list of lncRNAs linked to human maladies.
Wu, Shengru; Liu, Yanli; Guo, Wei; Cheng, Xi; Ren, Xiaochun; Chen, Si; Li, Xueyuan; Duan, Yongle; Sun, Qingzhu; Yang, Xiaojun
2018-06-27
The liver is mainly hematopoietic in the embryo, and converts into a major metabolic organ in the adult. Therefore, it is intensively remodeled after birth to adapt and perform adult functions. Long non-coding RNAs (lncRNAs) are involved in organ development and cell differentiation, and they likely have roles in regulating postnatal liver development. Herein, in order to understand the roles of lncRNAs in postnatal liver maturation, we analyzed the lncRNA and mRNA expression profiles in immature and mature livers from one-day-old and adult (40 weeks of age) breeder roosters by Ribo-Zero RNA-Sequencing. Around 21,939 protein-coding genes and 2220 predicted lncRNAs were expressed in livers of breeder roosters. Compared to protein-coding genes, the identified chicken lncRNAs had fewer exons, shorter transcript lengths, and significantly lower expression levels. Notably, in the comparison between the livers of newborn and adult breeder roosters, a total of 1570 mRNAs and 214 lncRNAs were differentially expressed with the criteria of log2 fold change > 1 or < -1 and P values < 0.05, which were validated by qPCR using five randomly selected mRNAs and five randomly selected lncRNAs. Further GO and KEGG analyses revealed that the differentially expressed mRNAs were involved in hepatic metabolic and immune functional changes, as well as in biological processes and pathways implicated in liver development, including cell proliferation, apoptosis, and the cell cycle. We also investigated the cis- and trans-regulatory effects of differentially expressed lncRNAs on their target genes. GO and KEGG analyses indicated that the neighboring protein-coding genes and trans-regulated genes of these lncRNAs were associated with adaptation to adult hepatic functions, as well as with pathways involved in liver development, such as the cell cycle pathway, Notch signaling pathway, Hedgehog signaling pathway, and Wnt signaling pathway. 
This study provides a catalog of mRNAs and lncRNAs related to postnatal liver maturation in the chicken, and will contribute to a fuller understanding of the biological processes and signaling pathways in which differentially expressed genes and lncRNAs could take part during the significant functional transition of postnatal liver development.
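The selection rule quoted in this abstract (log2 fold change > 1 or < -1 together with P < 0.05) can be illustrated with a minimal Python sketch. The gene names, expression values, and the function `is_differentially_expressed` are hypothetical and chosen only for illustration; the abstract does not state how the fold changes or P values were computed, so they are taken here as given inputs.

```python
import math

def is_differentially_expressed(mean_a, mean_b, p_value,
                                lfc_cutoff=1.0, p_cutoff=0.05):
    """Apply the two thresholds quoted in the abstract.

    mean_a, mean_b: mean expression in the two conditions (assumed > 0).
    Returns True when |log2(mean_b / mean_a)| > lfc_cutoff and p_value < p_cutoff.
    """
    log2_fc = math.log2(mean_b / mean_a)
    return abs(log2_fc) > lfc_cutoff and p_value < p_cutoff

# Hypothetical records: (gene, mean in day-1 liver, mean in adult liver, P value)
records = [
    ("geneA", 10.0, 45.0, 0.001),  # log2FC is about 2.17, significant
    ("geneB", 20.0, 30.0, 0.001),  # log2FC is about 0.58, below the cutoff
    ("geneC", 64.0, 8.0, 0.010),   # log2FC is exactly -3, significant
    ("geneD", 5.0, 50.0, 0.200),   # large change, but P is not below 0.05
]

selected = [gene for gene, a, b, p in records
            if is_differentially_expressed(a, b, p)]
print(selected)
```

Both conditions must hold at once, so a gene with a large fold change but a weak P value (geneD above) is not selected.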
Origins of genes: "big bang" or continuous creation?
Keese, P K; Gibbs, A
1992-01-01
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes. PMID:1329098
Gene cloning and prokaryotic expression of recombinant flagellin A from Vibrio parahaemolyticus
NASA Astrophysics Data System (ADS)
Yuan, Ye; Wang, Xiuli; Guo, Sheping; Liu, Yang; Ge, Hui; Qiu, Xuemei
2010-11-01
The Gram-negative Vibrio parahaemolyticus is a common pathogen in humans and marine animals. Bacterial flagellins play an important role during infection and induction of the host immune response. Thus, flagellin proteins are an ideal target for vaccines. We amplified the complete flagellin subunit gene (flaA) from V. parahaemolyticus ATCC 17802. We then cloned the gene and expressed it in Escherichia coli BL21 (DE3) cells. The gene coded for a protein that was 62.78 kDa. We purified and characterized the protein using Ni-NTA affinity chromatography and Anti-His antibody Western blotting, respectively. Our results provide a basis for further studies into the utility of the FlaA protein as a vaccine candidate against infection by Vibrio parahaemolyticus. In addition, the purified FlaA protein can be used for further functional and structural studies.
Intragenome Diversity of Gene Families Encoding Toxin-like Proteins in Venomous Animals.
Rodríguez de la Vega, Ricardo C; Giraud, Tatiana
2016-11-01
The evolution of venoms is the story of how toxins arise and of the processes that generate and maintain their diversity. For animal venoms these processes include recruitment for expression in the venom gland, neofunctionalization, paralogous expansions, and functional divergence. The systematic study of these processes requires the reliable identification of the venom components involved in antagonistic interactions. High-throughput sequencing has the potential of uncovering the entire set of toxins in a given organism, yet the existence of non-venom toxin paralogs and the misleading effects of a partial census of the molecular diversity of toxins make it necessary to collect complementary evidence to distinguish true toxins from their non-venom paralogs. Here, we analyzed the whole genomes of two scorpions, one spider and one snake, aiming at the identification of the full repertoires of genes encoding toxin-like proteins. We classified the entire set of protein-coding genes into paralogous groups and monotypic genes, identified genes encoding toxin-like proteins based on known toxin families, and quantified their expression in both venom-glands and pooled tissues. Our results confirm that genes encoding toxin-like proteins are part of multigene families, and that these families arise by recruitment events from non-toxin genes followed by limited expansions of the toxin-like protein coding genes. We also show that failing to account for sequence similarity with non-toxin proteins has a considerable misleading effect that can be greatly reduced by comparative transcriptomics. Our study overall contributes to the understanding of the evolutionary dynamics of proteins involved in antagonistic interactions.
Baumgartner, Desiree; Kopf, Matthias; Klähn, Stephan; Steglich, Claudia; Hess, Wolfgang R
2016-11-28
Despite their versatile functions in multimeric protein complexes, in the modification of enzymatic activities, intercellular communication or regulatory processes, proteins shorter than 80 amino acids (μ-proteins) are a systematically underestimated class of gene products in bacteria. Photosynthetic cyanobacteria provide a paradigm for small protein functions due to extensive work on the photosynthetic apparatus that led to the functional characterization of 19 small proteins of less than 50 amino acids. In analogy, previously unstudied small ORFs with similar degrees of conservation might encode small proteins of high relevance also in other functional contexts. Here we used comparative transcriptomic information available for two model cyanobacteria, Synechocystis sp. PCC 6803 and Synechocystis sp. PCC 6714, for the prediction of small ORFs. We found 293 transcriptional units containing candidate small ORFs ≤80 codons in Synechocystis sp. PCC 6803, also including the known mRNAs encoding small proteins of the photosynthetic apparatus. From these transcriptional units, 146 are shared between the two strains, 42 are shared with the higher plant Arabidopsis thaliana and 25 with E. coli. To verify the existence of the respective μ-proteins in vivo, we selected five genes as examples, added a FLAG tag sequence to each, and re-introduced them into Synechocystis sp. PCC 6803. These were the previously annotated gene ssr1169, two newly defined genes norf1 and norf4, as well as nsiR6 (nitrogen stress-induced RNA 6) and hliR1 (high light-inducible RNA 1), which originally were considered non-coding. Upon activation of expression via the Cu2+-responsive petE promoter or from the native promoters, all five proteins were detected in Western blot experiments. 
The distribution and conservation of these five genes as well as their regulation of expression and the physico-chemical properties of the encoded proteins underline the likely great bandwidth of small protein functions in bacteria and make them attractive candidates for functional studies.
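As a rough illustration of what a candidate small ORF of ≤80 codons looks like computationally, the following Python sketch scans the three forward reading frames of a DNA string for ATG-to-stop stretches. This is not the transcriptome-based prediction method used in the study; the function name `find_small_orfs`, the `min_codons` filter, the forward-strand-only scan, and the choice to exclude the stop codon from the codon count are all assumptions made for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_small_orfs(seq, max_codons=80, min_codons=10):
    """Return (start, end, n_codons) for short forward-strand ORFs.

    start/end are 0-based, end-exclusive coordinates that include the stop
    codon; n_codons counts ATG through the last sense codon (stop excluded).
    Only the first ATG after the previous stop is used in each frame.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open an ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3    # sense codons before the stop
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                   # look for the next ORF
    return orfs

# Toy sequences (hypothetical): a 12-codon ORF and a 91-codon ORF
short_orf = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
long_orf = "ATG" + "GCT" * 90 + "TAA"
print(find_small_orfs(short_orf))
print(find_small_orfs(long_orf))
```

A real analysis would also scan the reverse strand, consider alternative start codons, and, as in the abstract, restrict candidates to transcribed and conserved regions.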
NASA Astrophysics Data System (ADS)
Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua
2015-12-01
Agrobacterium tumefaciens strain C58 is a type of pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, the mean accuracy of 99.99% and mean Matthews correlation coefficient value of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and the Funding from the State Key Laboratory of Bioelectronics of Southeast University.
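For readers unfamiliar with the two performance measures quoted above, the following Python sketch computes accuracy and the Matthews correlation coefficient (MCC) from the counts of a binary confusion matrix, using the standard definitions. The counts in the usage lines are hypothetical and are not taken from the study.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical counts: coding = positive class, non-coding = negative class
print(accuracy(90, 5, 3, 2))
print(matthews_corrcoef(90, 5, 3, 2))
print(matthews_corrcoef(50, 50, 0, 0))
```

Unlike accuracy, the MCC stays informative when the two classes are very unequal in size, which is why both values are often reported together for coding versus non-coding classification.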
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calabrese, G.; Sallese, M.; Stornaiuolo, A.
1994-09-01
Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: β-adrenergic receptor kinase (βARK), which phosphorylates the agonist-occupied receptor, and its functional cofactor, β-arrestin. Both βARK and β-arrestin are members of multigene families. The family of G-protein-coupled receptor kinases includes rhodopsin kinase, βARK1, βARK2, IT11-A (GRK4), GRK5, and GRK6. The arrestin/β-arrestin gene family includes arrestin (also known as S-antigen), β-arrestin 1, and β-arrestin 2. Here we report the chromosome mapping of the human genes for arrestin (SAG), β-arrestin 2 (ARRB2), and βARK2 (ADRBK2) by fluorescence in situ hybridization (FISH). FISH results confirmed the assignment of the gene coding for arrestin (SAG) to chromosome 2 and allowed us to refine its localization to band q37. The gene coding for β-arrestin 2 (ARRB2) was mapped to chromosome 17p13 and that coding for βARK2 (ADRBK2) to chromosome 22q11. 17 refs., 1 fig.
Hrdlickova, Barbara; Kumar, Vinod; Kanduri, Kartiek; Zhernakova, Daria V; Tripathi, Subhash; Karjalainen, Juha; Lund, Riikka J; Li, Yang; Ullah, Ubaid; Modderman, Rutger; Abdulahad, Wayel; Lähdesmäki, Harri; Franke, Lude; Lahesmaa, Riitta; Wijmenga, Cisca; Withoff, Sebo
2014-01-01
Although genome-wide association studies (GWAS) have identified hundreds of variants associated with a risk for autoimmune and immune-related disorders (AID), our understanding of the disease mechanisms is still limited. In particular, more than 90% of the risk variants lie in non-coding regions, and almost 10% of these map to long non-coding RNA transcripts (lncRNAs). lncRNAs are known to show more cell-type specificity than protein-coding genes. We aimed to characterize lncRNAs and protein-coding genes located in loci associated with nine AIDs which have been well-defined by Immunochip analysis and by transcriptome analysis across seven populations of peripheral blood leukocytes (granulocytes, monocytes, natural killer (NK) cells, B cells, memory T cells, naive CD4(+) and naive CD8(+) T cells) and four populations of cord blood-derived T-helper cells (precursor, primary, and polarized (Th1, Th2) T-helper cells). We show that lncRNAs mapping to loci shared between AID are significantly enriched in immune cell types compared to lncRNAs from the whole genome (α <0.005). We were not able to prioritize single cell types relevant for specific diseases, but we observed five different cell types enriched (α <0.005) in five AID (NK cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, and psoriasis; memory T and CD8(+) T cells in juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis; Th0 and Th2 cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis). Furthermore, we show that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved. 
The observed enrichment of lncRNA transcripts in AID loci implies lncRNAs play an important role in AID etiology and suggests that lncRNA genes should be studied in more detail to interpret GWAS findings correctly. The co-expression results strongly support a model in which the lncRNA and protein-coding genes function together in the same pathways.
Assignment of the β-arrestin 1 gene (ARRB1) to human chromosome 11q13
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calabrese, G.; Morizio, E.; Palka, G.
1994-11-01
Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: β-adrenergic receptor kinase (βARK), which phosphorylates the agonist-occupied receptor, and its functional cofactor, β-arrestin. βARK is a member of a multigene family, consisting of six known subtypes, which have also been named G-protein-coupled receptor kinases (GRK 1 to 6) due to the apparently unique functional association of such kinases with this receptor family. The gene for βARK1 has been localized to human chromosome 11q13. The four members of the arrestin/β-arrestin gene family identified so far are arrestin, X-arrestin, β-arrestin 1, and β-arrestin 2. Here the authors report the chromosome mapping of the human gene for β-arrestin 1 (ARRB1) to chromosome 11q13 by fluorescence in situ hybridization (FISH). Two-color FISH confirmed that the two genes coding for the functionally related proteins βARK1 and β-arrestin 1 both map to 11q13. 16 refs., 1 fig., 1 tab.
dbCPG: A web resource for cancer predisposition genes.
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-06-21
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown-type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
Akhter, Yusuf; Ehebauer, Matthias T; Mukhopadhyay, Sangita; Hasnain, Seyed E
2012-01-01
The PE/PPE multigene family codes for approximately 10% of the Mycobacterium tuberculosis proteome and is encoded by 176 open reading frames. These proteins possess, and have been named after, the conserved proline-glutamate (PE) or proline-proline-glutamate (PPE) motifs at their N-terminus. Their genes have a conserved structure and repeat motifs that could be a potential source of antigenic variation in M. tuberculosis. PE/PPE genes are scattered throughout the genome and PE/PPE pairs are usually encoded in bicistronic operons although this is not universally so. This gene family has evolved by specific gene duplication events. PE/PPE proteins are either secreted or localized to the cell surface. Several are thought to be virulence factors, which participate in evasion of the host immune response. This review summarizes the current knowledge about the gene family in order to better understand its biological function. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Ezkurdia, Iakes; del Pozo, Angela; Frankish, Adam; Rodriguez, Jose Manuel; Harrow, Jennifer; Ashman, Keith; Valencia, Alfonso; Tress, Michael L.
2012-01-01
Advances in high-throughput mass spectrometry are making proteomics an increasingly important tool in genome annotation projects. Peptides detected in mass spectrometry experiments can be used to validate gene models and verify the translation of putative coding sequences (CDSs). Here, we have identified peptides that cover 35% of the genes annotated by the GENCODE consortium for the human genome as part of a comprehensive analysis of experimental spectra from two large publicly available mass spectrometry databases. We detected the translation to protein of “novel” and “putative” protein-coding transcripts as well as transcripts annotated as pseudogenes and nonsense-mediated decay targets. We provide a detailed overview of the population of alternatively spliced protein isoforms that are detectable by peptide identification methods. We found that 150 genes expressed multiple alternative protein isoforms. This constitutes the largest set of reliably confirmed alternatively spliced proteins yet discovered. Three groups of genes were highly overrepresented. We detected alternative isoforms for 10 of the 25 possible heterogeneous nuclear ribonucleoproteins, proteins with a key role in the splicing process. Alternative isoforms generated from interchangeable homologous exons and from short indels were also significantly enriched, both in human experiments and in parallel analyses of mouse and Drosophila proteomics experiments. Our results show that a surprisingly high proportion (almost 25%) of the detected alternative isoforms are only subtly different from their constitutive counterparts. Many of the alternative splicing events that give rise to these alternative isoforms are conserved in mouse. It was striking that very few of these conserved splicing events broke Pfam functional domains or would damage globular protein structures. 
This evidence of a strong bias toward subtle differences in CDS and likely conserved cellular function and structure is remarkable and strongly suggests that the translation of alternative transcripts may be subject to selective constraints. PMID:22446687
Maier, Lisa-Katharina; Benz, Juliane; Fischer, Susan; Alstetter, Martina; Jaschinski, Katharina; Hilker, Rolf; Becker, Anke; Allers, Thorsten; Soppa, Jörg; Marchfelder, Anita
2015-10-01
Members of the Sm protein family are important for cellular RNA metabolism in all three domains of life. The family includes archaeal and eukaryotic Lsm proteins, eukaryotic Sm proteins and archaeal and bacterial Hfq proteins. While several studies concerning the bacterial and eukaryotic family members have been published, little is known about the archaeal Lsm proteins. Although structures of several archaeal Lsm proteins were solved more than ten years ago, we still do not know much about their biological function; however, one can confidently propose that the archaeal Lsm proteins are also involved in RNA metabolism. Therefore, we investigated this protein in the halophilic archaeon Haloferax volcanii. The Haloferax genome encodes a single Lsm protein; the lsm gene overlaps and is co-transcribed with the gene for the ribosomal L37.eR protein. Here, we show that the reading frame of the lsm gene contains a promoter which regulates expression of the overlapping rpl37R gene. This rpl37R-specific promoter ensures high expression of the rpl37R gene in exponential growth phase. To investigate the biological function of the Lsm protein we generated an lsm deletion mutant that had the coding sequence for the Sm1 motif removed but still contained the internal promoter for the downstream rpl37R gene. The transcriptome of this deletion mutant was compared to the wild type transcriptome, revealing that several genes are down-regulated and many genes are up-regulated in the deletion strain. Northern blot analyses confirmed down-regulation of two genes. In addition, the deletion strain showed a gain of function in swarming, in congruence with the up-regulation of transcripts encoding proteins required for motility. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Zhu, Yafeng; Engström, Pär G; Tellgren-Roth, Christian; Baudo, Charles D; Kennell, John C; Sun, Sheng; Billmyre, R Blake; Schröder, Markus S; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L; Heitman, Joseph; Scheynius, Annika; Lehtiö, Janne
2017-03-17
Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
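The percentage gains quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the gene counts are taken from the abstract, the helper name `percent_increase` is hypothetical, and it assumes that the 9% figure is measured relative to the 4113 genes obtained after adding proteomics evidence.

```python
def percent_increase(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Gene counts as quoted in the abstract.
transcriptomics_only = 3612   # annotation from RNA-seq evidence alone
with_proteomics = 4113        # after integrating proteomics data
after_curation = 4493         # after manual curation

# Gain from adding proteomics, relative to transcriptomics alone.
print(f"{percent_increase(transcriptomics_only, with_proteomics):.1f}%")

# Gain from manual curation (assumed to be relative to 4113, see lead-in).
print(f"{percent_increase(with_proteomics, after_curation):.1f}%")
```

Both values round to the 14% and 9% reported in the abstract under that assumption.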
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bae, Euiyoung; Bingman, Craig A.; Aceti, David J.
LOC79017 (MW 21.0 kDa, residues 1-188) was annotated as a hypothetical protein encoded by Homo sapiens chromosome 7 open reading frame 24. It was selected as a target by the Center for Eukaryotic Structural Genomics (CESG) because it did not share more than 30% sequence identity with any protein for which the three-dimensional structure is known. The biological function of the protein has not been established yet. Parts of LOC79017 were identified as members of uncharacterized Pfam families (residues 1-95 as PB006073 and residues 104-180 as PB031696). BLAST searches revealed homologues of LOC79017 in many eukaryotes, but none of them have been functionally characterized. Here, we report the crystal structure of H. sapiens protein LOC79017 (UniGene code Hs.530024, UniProt code O75223, CESG target number go.35223).
Argonaute: The executor of small RNA function.
Azlan, Azali; Dzaki, Najat; Azzam, Ghows
2016-08-20
The discovery of small non-coding RNAs - microRNA (miRNA), short interfering RNA (siRNA) and PIWI-interacting RNA (piRNA) - represents one of the most exciting frontiers in biology, specifically in the mechanism of gene regulation. In order to execute their functions, these small RNAs require physical interactions with their protein partners, the Argonaute (AGO) family proteins. Over the years, numerous studies have made tremendous progress in understanding the roles of AGO in gene silencing in various organisms. In this review, we summarize recent progress on AGO-mediated gene silencing and other cellular processes in which AGO proteins have been implicated, with a particular focus on progress made in flies and humans, and with other model organisms as a complement. Copyright © 2016 Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Ltd. All rights reserved.
The expanding regulatory universe of p53 in gastrointestinal cancer.
Fesler, Andrew; Zhang, Ning; Ju, Jingfang
2016-01-01
Tumor suppressor gene TP53 is one of the most frequently deleted or mutated genes in gastrointestinal cancers. As a transcription factor, p53 regulates a number of important protein coding genes to control cell cycle, cell death, DNA damage/repair, stemness, differentiation and other key cellular functions. In addition, p53 is also able to activate the expression of a number of small non-coding microRNAs (miRNAs) through direct binding to the promoter region of these miRNAs. Many miRNAs have been identified as potential tumor suppressors that act by regulating key effector target mRNAs. Our understanding of the regulatory network of p53 has recently expanded to include long non-coding RNAs (lncRNAs). Like miRNAs, lncRNAs have been found to play important roles in cancer biology. With our increased understanding of the important functions of these non-coding RNAs and their relationship with p53, we are gaining exciting new insights into the biology and function of cells in response to various growth environment changes. In this review we summarize the current understanding of the ever expanding involvement of non-coding RNAs in the p53 regulatory network and its implications for our understanding of gastrointestinal cancer.
MAISTAS: a tool for automatic structural evaluation of alternative splicing products.
Floris, Matteo; Raimondo, Domenico; Leoni, Guido; Orsini, Massimiliano; Marcatili, Paolo; Tramontano, Anna
2011-06-15
Analysis of the human genome revealed that the amount of transcribed sequence is an order of magnitude greater than the number of predicted and well-characterized genes. A sizeable fraction of these transcripts is related to alternatively spliced forms of known protein coding genes. Inspection of the alternatively spliced transcripts identified in the pilot phase of the ENCODE project has clearly shown that often their structure might substantially differ from that of other isoforms of the same gene, and therefore that they might perform unrelated functions, or that they might even not correspond to a functional protein. Identifying these cases is obviously relevant for the functional assignment of gene products and for the interpretation of the effect of variations in the corresponding proteins. Here we describe a publicly available tool that, given a gene or a protein, retrieves and analyses all its annotated isoforms, provides users with three-dimensional models of the isoform(s) of his/her interest whenever possible and automatically assesses whether homology derived structural models correspond to plausible structures. This information is clearly relevant. When the homology model of some isoforms of a gene does not seem structurally plausible, the implications are that either they assume a structure unrelated to that of the other isoforms of the same gene with presumably significant functional differences, or do not correspond to functional products. We provide indications that the second hypothesis is likely to be true for a substantial fraction of the cases. http://maistas.bioinformatica.crs4.it/.
Non-coding functions of alternative pre-mRNA splicing in development.
Mockenhaupt, Stefan; Makeyev, Eugene V
2015-12-01
A majority of messenger RNA precursors (pre-mRNAs) in the higher eukaryotes undergo alternative splicing to generate more than one mature product. By targeting the open reading frame region this process increases diversity of protein isoforms beyond the nominal coding capacity of the genome. However, alternative splicing also frequently controls output levels and spatiotemporal features of cellular and organismal gene expression programs. Here we discuss how these non-coding functions of alternative splicing contribute to development through regulation of mRNA stability, translational efficiency and cellular localization. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
Quantifying the Effect of DNA Packaging on Gene Expression Level
NASA Astrophysics Data System (ADS)
Kim, Harold
2010-10-01
Gene expression, the process by which the genetic code comes alive in the form of proteins, is one of the most important biological processes in living cells, and begins when transcription factors bind to specific DNA sequences in the promoter region upstream of a gene. The relationship between gene expression output and transcription factor input which is termed the gene regulation function is specific to each promoter, and predicting this gene regulation function from the locations of transcription factor binding sites is one of the challenges in biology. In eukaryotic organisms (for example, animals, plants, fungi etc), DNA is highly compacted into nucleosomes, 147-bp segments of DNA tightly wrapped around histone protein core, and therefore, the accessibility of transcription factor binding sites depends on their locations with respect to nucleosomes - sites inside nucleosomes are less accessible than those outside nucleosomes. To understand how transcription factor binding sites contribute to gene expression in a quantitative manner, we obtain gene regulation functions of promoters with various configurations of transcription factor binding sites by using fluorescent protein reporters to measure transcription factor input and gene expression output in single yeast cells. In this talk, I will show that the affinity of a transcription factor binding site inside and outside the nucleosome controls different aspects of the gene regulation function, and explain this finding based on a mass-action kinetic model that includes competition between nucleosomes and transcription factors.
The spatial distribution of fixed mutations within genes coding for proteins
NASA Technical Reports Server (NTRS)
Holmquist, R.; Goodman, M.; Conroy, T.; Czelusniak, J.
1983-01-01
An examination has been conducted of the extensive amino acid sequence data now available for five protein families - the alpha crystallin A chain, myoglobin, alpha and beta hemoglobin, and the cytochromes c - with the goal of estimating the true spatial distribution of base substitutions within genes that code for proteins. In every case the commonly used Poisson density failed to even approximate the experimental pattern of base substitution. For the 87 species of beta hemoglobin examined, for example, the probability that the observed results were from a Poisson process was the minuscule 10 to the -44th. Analogous results were obtained for the other functional families. All the data were reasonably, but not perfectly, described by the negative binomial density. In particular, most of the data were described by one of the very simple limiting forms of this density, the geometric density. The implications of this for evolutionary inference are discussed. It is evident that most estimates of total base substitutions between genes are badly in need of revision.
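To make the contrast between the densities named in the abstract above concrete, the following sketch compares a Poisson density with the geometric density, which is the negative binomial with shape parameter r = 1, at the same mean. It is a minimal illustration, not the authors' analysis: the mean of 2.0 substitutions per site is a hypothetical value chosen for display, and the function names are introduced here.

```python
import math

def poisson_pmf(k, mean):
    """Probability of k events under a Poisson density with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def negative_binomial_pmf(k, r, p):
    """Probability of k failures before the r-th success (integer r)."""
    return math.comb(k + r - 1, k) * p**r * (1 - p)**k

def geometric_pmf(k, p):
    """Geometric density: the negative binomial with r = 1."""
    return p * (1 - p)**k

def geometric_p_for_mean(mean):
    """Success probability p that gives a geometric density this mean."""
    return 1.0 / (1.0 + mean)

mean = 2.0  # hypothetical mean number of substitutions per site
p = geometric_p_for_mean(mean)
for k in range(0, 11, 2):
    print(f"k={k:2d}  Poisson={poisson_pmf(k, mean):.5f}  "
          f"geometric={geometric_pmf(k, p):.5f}")
```

At equal means the geometric density has variance mean × (1 + mean) rather than mean, so it assigns more probability both to sites with no substitutions and to sites with many. That is the kind of overdispersion a Poisson density cannot capture.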
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.
Borodovsky, M; Rudd, K E; Koonin, E V
1994-01-01
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
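The overlap percentages in the abstract above follow directly from the reported counts. The sketch below recomputes them as a consistency check. The counts come from the abstract, while the variable names and the `percent` helper are hypothetical.

```python
def percent(part, whole):
    """Share of `whole` made up by `part`, in percent, to one decimal."""
    return round(100.0 * part / whole, 1)

# Counts as quoted in the abstract.
genemark_orfs = 354    # putative expressed ORFs predicted by GeneMark
blast_orfs = 208       # ORFs with significant BLASTX/TBLASTN similarity
supported_by_both = 182
conserved_family = 73  # putative genes in ancient conserved families

print(percent(supported_by_both, genemark_orfs))  # share of GeneMark hits
print(percent(supported_by_both, blast_orfs))     # share of BLAST hits
print(percent(conserved_family, genemark_orfs))   # conserved-family share

# ORFs supported by only one of the two methods.
print(genemark_orfs - supported_by_both, blast_orfs - supported_by_both)
```

The three percentages reproduce the 51.4%, 87.5% and 20.6% given in the abstract. The last line shows how many ORFs each method identified without support from the other.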
Decoding the function of nuclear long non-coding RNAs.
Chen, Ling-Ling; Carmichael, Gordon G
2010-06-01
Long non-coding RNAs (lncRNAs) are mRNA-like, non-protein-coding RNAs that are pervasively transcribed throughout eukaryotic genomes. Rather than silently accumulating in the nucleus, many of these are now known or suspected to play important roles in nuclear architecture or in the regulation of gene expression. In this review, we highlight some recent progress in how lncRNAs regulate these important nuclear processes at the molecular level. Copyright 2010 Elsevier Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a mor...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kozbial, Piotr; Xu, Qingping; Chiu, Hsiu-Ju
2009-08-28
To extend the structural coverage of proteins with unknown functions, we targeted a novel protein family (Pfam accession number PF08807, DUF1798) for which we proposed and determined the structures of two representative members. The MW1337R gene of Staphylococcus aureus subsp. aureus Rosenbach (Wood 46) encodes a protein with a molecular weight of 13.8 kDa (residues 1-116) and a calculated isoelectric point of 5.15. The lin2004 gene of the nonspore-forming bacterium Listeria innocua Clip11262 encodes a protein with a molecular weight of 14.6 kDa (residues 1-121) and a calculated isoelectric point of 5.45. MW1337R and lin2004, as well as their homologs, which, so far, have been found only in Bacillus, Staphylococcus, Listeria, and related genera (Geobacillus, Exiguobacterium, and Oceanobacillus), have unknown functions and are annotated as hypothetical proteins. The genomic contexts of MW1337R and lin2004 are similar and conserved in related species. In prokaryotic genomes, most often, functionally interacting proteins are coded by genes, which are colocated in conserved operons. Proteins from the same operon as MW1337R and lin2004 either have unknown functions (i.e., belong to DUF1273, Pfam accession number PF06908) or are similar to ypsB from Bacillus subtilis. The function of ypsB is unclear, although it has a strong similarity to the N-terminal region of DivIVA, which was characterized as a bifunctional protein with distinct roles during vegetative growth and sporulation. In addition, members of the DUF1273 family display distant sequence similarity with the DprA/Smf protein, which acts downstream of the DNA uptake machinery, possibly in conjunction with RecA. The RecA activities in Bacillus subtilis are modulated by RecU Holliday-junction resolvase. In all analyzed cases, the gene coding for RecU is in the vicinity of MW1337R, lin2004, or their orthologs, but on a different operon located in the complementary DNA strand. 
Here, we report the crystal structures of MW1337R and lin2004, which were determined using the semiautomated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG), part of the National Institute of General Medical Sciences Protein Structure Initiative.
A Molecular Portrait of De Novo Genes in Yeasts.
Vakirlis, Nikolaos; Hebert, Alex S; Opulente, Dana A; Achaz, Guillaume; Hittinger, Chris Todd; Fischer, Gilles; Coon, Joshua J; Lafontaine, Ingrid
2018-03-01
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.
Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia
2002-01-01
Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.
Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S
2014-01-01
Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
RNA editing in Drosophila melanogaster: new targets and functional consequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stapleton, Mark; Carlson, Joseph W.; Celniker, Susan E.
2006-09-05
Adenosine deaminases that act on RNA (ADARs) catalyze the site-specific conversion of adenosine to inosine in primary mRNA transcripts. These re-coding events affect coding potential, splice-sites, and stability of mature mRNAs. ADAR is an essential gene and studies in mouse, C. elegans, and Drosophila suggest its primary function is to modify adult behavior by altering signaling components in the nervous system. By comparing the sequence of isogenic cDNAs to genomic DNA, we have identified and experimentally verified 27 new targets of Drosophila ADAR. Our analyses lead us to identify new classes of genes whose transcripts are targets of ADAR including components of the actin cytoskeleton, and genes involved in ion homeostasis and signal transduction. Our results indicate that editing in Drosophila increases the diversity of the proteome, and does so in a manner that has direct functional consequences on protein function.
Lim, Hyoun-Sub; Vaira, Anna Maria; Domier, Leslie L; Lee, Sung Chul; Kim, Hong Gi; Hammond, John
2010-06-20
We have developed plant virus-based vectors for virus-induced gene silencing (VIGS) and protein expression, based on Alternanthera mosaic virus (AltMV), for infection of a wide range of host plants including Nicotiana benthamiana and Arabidopsis thaliana by either mechanical inoculation of in vitro transcripts or via agroinfiltration. In vivo transcripts produced by co-agroinfiltration of bacteriophage T7 RNA polymerase resulted in T7-driven AltMV infection from a binary vector in the absence of the Cauliflower mosaic virus 35S promoter. An artificial bipartite viral vector delivery system was created by separating the AltMV RNA-dependent RNA polymerase and Triple Gene Block (TGB)123-Coat protein (CP) coding regions into two constructs each bearing the AltMV 5' and 3' non-coding regions, which recombined in planta to generate a full-length AltMV genome. Substitution of TGB1 L(88)P, and equivalent changes in other potexvirus TGB1 proteins, affected RNA silencing suppression efficacy and suitability of the vectors from protein expression to VIGS. Published by Elsevier Inc.
Nagarkar-Jaiswal, Sonal; Lee, Pei-Tseng; Campbell, Megan E; Chen, Kuchuan; Anguiano-Zarate, Stephanie; Cantu Gutierrez, Manuel; Busby, Theodore; Lin, Wen-Wen; He, Yuchun; Schulze, Karen L; Booth, Benjamin W; Evans-Holm, Martha; Venken, Koen JT; Levis, Robert W; Spradling, Allan C; Hoskins, Roger A; Bellen, Hugo J
2015-01-01
Here, we document a collection of ∼7434 MiMIC (Minos Mediated Integration Cassette) insertions of which 2854 are inserted in coding introns. They allowed us to create a library of 400 GFP-tagged genes. We show that 72% of internally tagged proteins are functional, and that more than 90% can be imaged in unfixed tissues. Moreover, the tagged mRNAs can be knocked down by RNAi against GFP (iGFPi), and the tagged proteins can be efficiently knocked down by deGradFP technology. The phenotypes associated with RNA and protein knockdown typically correspond to severe loss of function or null mutant phenotypes. Finally, we demonstrate reversible, spatial, and temporal knockdown of tagged proteins in larvae and adult flies. This new strategy and collection of strains allows unprecedented in vivo manipulations in flies for many genes. These strategies will likely extend to vertebrates. DOI: http://dx.doi.org/10.7554/eLife.05338.001 PMID:25824290
Boden, Rich; Hutt, Lee P.; Huntemann, Marcel; ...
2016-09-26
Thermithiobacillus tepidarius DSM 3134T was originally isolated (1983) from the waters of a sulfidic spring entering the Roman Baths (Temple of Sulis-Minerva) at Bath, United Kingdom and is an obligate chemolithoautotroph growing at the expense of reduced sulfur species. This strain has a genome size of 2,958,498 bp. Here we report the genome sequence, annotation and characteristics. The genome comprises 2,902 protein coding and 66 RNA coding genes. Genes responsible for the transaldolase variant of the Calvin-Benson-Bassham cycle were identified along with a biosynthetic horseshoe in lieu of Krebs' cycle sensu stricto. Terminal oxidases were identified, viz. cytochrome c oxidase (cbb3, EC 1.9.3.1) and ubiquinol oxidase (bd, EC 1.10.3.10). Metalloresistance genes involved in pathways of arsenic and cadmium resistance were found. Evidence of horizontal gene transfer accounting for 5.9% of the protein-coding genes was found, including transfer from Thiobacillus spp. and Methylococcus capsulatus Bath, isolated from the same spring. A sox gene cluster was found, similar in structure to those from other Acidithiobacillia - by comparison with Thiobacillus thioparus and Paracoccus denitrificans, an additional gene between soxA and soxB was found, annotated as a DUF302-family protein of unknown function. As the Kelly-Friedrich pathway of thiosulfate oxidation (encoded by sox) is not used in Thermithiobacillus spp., the role of the operon (if any) in this species remains unknown. We speculate that DUF302 and sox genes may have a role in periplasmic trithionate oxidation.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omasits, U.; Quebatte, Maxime; Stekhoven, Daniel J.
2013-11-01
Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ~90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.
Omasits, Ulrich; Quebatte, Maxime; Stekhoven, Daniel J.; Fortes, Claudia; Roschitzki, Bernd; Robinson, Mark D.; Dehio, Christoph; Ahrens, Christian H.
2013-01-01
Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor. PMID:23878158
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodwin, Stephen; McCorison, Cassandra B.; Cavaletto, Jessica R.
Fungi in the class Dothideomycetes often live in extreme environments or have unusual physiology. One of these, the wine cellar mold Zasmidium cellare, produces thick curtains of mycelial growth in cellars with high humidity, and its ability to metabolize volatile organic compounds including alcohols, esters and formaldehyde is thought to improve air quality. It grows slowly but appears to outcompete ordinarily faster-growing species under anaerobic conditions. Whether these abilities have affected its mitochondrial genome is not known. To fill this gap, its mitochondrial genome was assembled as part of a whole-genome shotgun-sequencing project. The circular-mapping mitochondrial genome of Z. cellare, at only 23,743 bp, is the smallest yet reported for a filamentous fungus. It contains the complete set of 14 protein-coding genes seen typically in other filamentous fungi, along with genes for large and small ribosomal RNA subunits, 25 predicted tRNA genes capable of decoding all 20 amino acids, and a single open reading frame potentially coding for a protein of unknown function. The Z. cellare mitochondrial genome had genes encoded on both strands with a single change of direction, different from most other fungi but consistent with the Dothideomycetes. The high synteny among mitochondrial genomes of fungi in the Eurotiomycetes broke down almost completely in the Dothideomycetes. Only a low level of microsynteny was observed among protein-coding and tRNA genes in comparison with Mycosphaerella graminicola (synonym Zymoseptoria tritici), the only other fungus in the order Capnodiales with a sequenced mitochondrial genome, involving the three gene pairs atp8-atp9, nad2-nad3, and nad4L-nad5. However, even this low level of microsynteny did not extend to other fungi in the Dothideomycetes and Eurotiomycetes. Phylogenetic analysis of concatenated protein-coding genes confirmed the relationship between Z. cellare and M. graminicola in the Capnodiales, although conclusions were limited due to low sampling density. Other than its small size, the only unusual feature of the Z. cellare mitochondrial genome was two copies of a 110-bp sequence that were duplicated, inverted and separated by approximately 1 kb. This inverted-repeat sequence confused the assembly program but appears to have no functional significance. The small size of the Z. cellare mitochondrial genome was due to slightly smaller genes, lack of introns and non-essential genes, reduced intergenic spaces and very few ORFs relative to other fungi rather than a loss of essential genes. Whether this reduction facilitates its unusual biology remains unknown.
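The inverted repeat mentioned in this abstract (a sequence duplicated elsewhere as its reverse complement) can be illustrated with a minimal sketch. The function names and the toy sequence are hypothetical, the exact k-mer matching is an assumed simplification, and this is not the method used by the authors or by any assembly program.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, k: int) -> list[tuple[int, int]]:
    """Return (i, j) pairs where the k-mer at i reappears downstream at j
    as its exact reverse complement, without overlapping the first copy."""
    seq = seq.upper()
    positions: dict[str, list[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    hits = []
    for i in range(len(seq) - k + 1):
        rc = reverse_complement(seq[i:i + k])
        for j in positions.get(rc, []):
            if j >= i + k:  # keep only non-overlapping downstream copies
                hits.append((i, j))
    return hits


# Toy example (made-up sequence): a 6-mer, a 4-bp spacer, then its inverted copy.
print(find_inverted_repeats("AAGCTCTTTTGAGCTT", 6))  # [(0, 10)]
```

Because the two arms of an inverted repeat are identical on opposite strands, reads from either copy can be placed at both locations, which is one plausible reason such a feature can confuse an assembler.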
Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P
2015-03-11
The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Tripathi, Kumar Parijat; Evangelista, Daniela; Zuccaro, Antonio; Guarracino, Mario Rosario
2015-01-01
RNA-seq is a new tool to measure RNA transcript counts, using high-throughput sequencing at an extraordinary accuracy. It provides quantitative means to explore the transcriptome of an organism of interest. However, interpreting this extremely large data into biological knowledge is a problem, and biologist-friendly tools are lacking. In our lab, we developed Transcriptator, a web application based on a computational Python pipeline with a user-friendly Java interface. This pipeline uses the web services available for the BLAST (Basic Local Alignment Search Tool), QuickGO and DAVID (Database for Annotation, Visualization and Integrated Discovery) tools. It offers a report on statistical analysis of functional and Gene Ontology (GO) annotation enrichment. It helps users to identify enriched biological themes, particularly GO terms, pathways, domains, gene/protein features and information related to protein-protein interactions. It clusters the transcripts based on functional annotations and generates a tabular report of functional and Gene Ontology annotations for each transcript submitted to the web server. The implementation of QuickGO web services in our pipeline enables users to carry out GO-Slim analysis, whereas the integration of PORTRAIT (Prediction of transcriptomic non coding RNA (ncRNA) by ab initio methods) helps to identify non-coding RNAs and their regulatory role in the transcriptome. In summary, Transcriptator is useful software for both NGS and array data. It helps users to characterize the de novo assembled reads obtained from NGS experiments for non-referenced organisms, and it also performs functional enrichment analysis of differentially expressed transcripts/genes for both RNA-seq and microarray experiments. It generates easy-to-read tables and interactive charts for better understanding of the data. The pipeline is modular in nature, and provides an opportunity to add new plugins in the future. The web application is freely available at: http://www-labgtp.na.icar.cnr.it/Transcriptator.
Adaptive evolution of the matrix extracellular phosphoglycoprotein in mammals
2011-01-01
Background: Matrix extracellular phosphoglycoprotein (MEPE) belongs to a family of small integrin-binding ligand N-linked glycoproteins (SIBLINGs) that play a key role in skeleton development, particularly in mineralization, phosphate regulation and osteogenesis. MEPE associated disorders cause various physiological effects, such as loss of bone mass, tumors and disruption of renal function (hypophosphatemia). The study of this developmental gene from an evolutionary perspective could provide valuable insights into the adaptive diversification of morphological phenotypes in vertebrates. Results: Here we studied the adaptive evolution of the MEPE gene in 26 Eutherian mammals and three birds. The comparative genomic analyses revealed a high degree of evolutionary conservation of some coding and non-coding regions of the MEPE gene across mammals, indicating a possible regulatory or functional role likely related to mineralization and/or phosphate regulation. However, the majority of the coding region had a fast evolutionary rate, particularly within the largest exon (1467 bp). Rodentia and Scandentia had distinct substitution rates with an increased accumulation of both synonymous and non-synonymous mutations compared with other mammalian lineages. Characteristics of the gene (e.g. biochemical, evolutionary rate, and intronic conservation) differed greatly among lineages of the eight mammalian orders. We identified 20 sites with significant positive selection signatures (codon and protein level) outside the main regulatory motifs (dentonin and ASARM) suggestive of an adaptive role. Conversely, we find three sites under selection in the signal peptide and one in the ASARM motif that were supported by at least one selection model. The MEPE protein tends to accumulate amino acids promoting disorder and potential phosphorylation targets. Conclusion: MEPE shows a high number of selection signatures, revealing the crucial role of positive selection in the evolution of this SIBLING member. The selection signatures were found mainly outside the functional motifs, reinforcing the idea that other regions outside the dentonin and the ASARM might be crucial for the function of the protein, and future studies should be undertaken to understand their importance. PMID:22103247
Uemura, Takeshi; Mori, Takuma; Kurihara, Taiga; Kawase, Shiori; Koike, Rie; Satoga, Michiru; Cao, Xueshan; Li, Xue; Yanagawa, Toru; Sakurai, Takayuki; Shindo, Takayuki; Tabuchi, Katsuhiko
2016-01-01
Genome editing is a powerful technique for studying gene functions. CRISPR/Cas9-mediated gene knock-in has recently been applied to various cells and organisms. Here, we successfully knocked in an EGFP coding sequence at the site immediately after the first ATG codon of the β-actin gene in neurons in the brain by the combined use of the CRISPR/Cas9 system and in utero electroporation technique, resulting in the expression of the EGFP-tagged β-actin protein in cortical layer 2/3 pyramidal neurons. We detected EGFP fluorescence signals in the soma and neurites of EGFP knock-in neurons. These signals were particularly abundant in the head of dendritic spines, corresponding to the localization of the endogenous β-actin protein. EGFP knock-in neurons showed no detectable changes in spine density and basic electrophysiological properties. In contrast, exogenously overexpressed EGFP-β-actin showed increased spine density and EPSC frequency, and changed resting membrane potential. Thus, our technique provides a potential tool to elucidate the localization of various endogenous proteins in neurons by epitope tagging without altering neuronal and synaptic functions. This technique can be also useful for introducing a specific mutation into genes to study the function of proteins and genomic elements in brain neurons. PMID:27782168
Uemura, Takeshi; Mori, Takuma; Kurihara, Taiga; Kawase, Shiori; Koike, Rie; Satoga, Michiru; Cao, Xueshan; Li, Xue; Yanagawa, Toru; Sakurai, Takayuki; Shindo, Takayuki; Tabuchi, Katsuhiko
2016-10-26
Genome editing is a powerful technique for studying gene functions. CRISPR/Cas9-mediated gene knock-in has recently been applied to various cells and organisms. Here, we successfully knocked in an EGFP coding sequence at the site immediately after the first ATG codon of the β-actin gene in neurons in the brain by the combined use of the CRISPR/Cas9 system and in utero electroporation technique, resulting in the expression of the EGFP-tagged β-actin protein in cortical layer 2/3 pyramidal neurons. We detected EGFP fluorescence signals in the soma and neurites of EGFP knock-in neurons. These signals were particularly abundant in the head of dendritic spines, corresponding to the localization of the endogenous β-actin protein. EGFP knock-in neurons showed no detectable changes in spine density and basic electrophysiological properties. In contrast, exogenously overexpressed EGFP-β-actin showed increased spine density and EPSC frequency, and changed resting membrane potential. Thus, our technique provides a potential tool to elucidate the localization of various endogenous proteins in neurons by epitope tagging without altering neuronal and synaptic functions. This technique can be also useful for introducing a specific mutation into genes to study the function of proteins and genomic elements in brain neurons.
The standard operating procedure of the DOE-JGI Metagenome Annotation Pipeline (MAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2016-02-24
The DOE-JGI Metagenome Annotation Pipeline (MAP v.4) performs structural and functional annotation for metagenomic sequences that are submitted to the Integrated Microbial Genomes with Microbiomes (IMG/M) system for comparative analysis. The pipeline runs on nucleotide sequences provided via the IMG submission site. Users must first define their analysis projects in GOLD and then submit the associated sequence datasets consisting of scaffolds/contigs with optional coverage information and/or unassembled reads in fasta and fastq file formats. The MAP processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNAs, as well as CRISPR elements. Structural annotation is followed by functional annotation including assignment of protein product names and connection to various protein family databases.
Noncoding RNA Shows Context-Dependent Function | Center for Cancer Research
In addition to well-studied protein coding sequences, it is known that the genomes of higher organisms produce numerous noncoding RNAs (ncRNAs). Important roles for some ncRNAs in cell function have been demonstrated, though usually on a case-by-case basis, leading some scientists to argue that the majority of ncRNA production is just “noise” that results from the imperfect transcription machinery. The fact that many ncRNAs overlap with coding genes has hampered studies of their activities. Thus, a general understanding of whether ncRNA production is functional or not is lacking. To address this issue, Daniel Larson, Ph.D., of CCR’s Laboratory of Receptor Biology and Gene Expression, and his colleagues developed a new approach using single-molecule imaging in living cells. The researchers specifically labeled coding and ncRNAs from the GAL locus in yeast, which regulates the galactose response. Glucose is the preferred source of carbon for yeast, but when it is scarce, genes within the GAL locus, including GAL10 and GAL1, are activated to allow the metabolism of galactose.
Kong, Min; Wang, Fengjuan; Tian, Liuying; Tang, Hui; Zhang, Liping
2017-12-15
Glutathione (GSH) fulfills a variety of metabolic functions, participates in oxidative stress response, and defends against toxic actions of heavy metals and xenobiotics. In this study, GSH was detected in Rhodosporidium diobovatum by high-performance liquid chromatography (HPLC). Then, two novel enzymes from R. diobovatum were characterized that convert glutamate, cysteine, and glycine into GSH. Based on reverse transcription PCR, we obtained the glutathione synthetase gene (GSH2), 1866 bp, coding for a 56.6-kDa protein, and the glutamate cysteine ligase gene (GSH1), 2469 bp, coding for a 90.5-kDa protein. The role of GSH1 and GSH2 in the biosynthesis of GSH in the marine yeast R. diobovatum was determined by deletions using the CRISPR-Cas9 nuclease system and by enzymatic activity. These results also showed that GSH1 and GSH2 were involved in the production of GSH and are thus potentially useful for engineering GSH pathways. Alternatively, pET-GSH constructed using in vitro recombination could be used to detect the function of genes related to GSH biosynthesis. Finally, the fermentation parameters determined in the present study provide a reference for industrial GSH production in R. diobovatum.
NASA Astrophysics Data System (ADS)
Kong, Min; Wang, Fengjuan; Tian, Liuying; Tang, Hui; Zhang, Liping
2018-02-01
Glutathione (GSH) fulfills a variety of metabolic functions, participates in oxidative stress response, and defends against toxic actions of heavy metals and xenobiotics. In this study, GSH was detected in Rhodosporidium diobovatum by high-performance liquid chromatography (HPLC). Then, two novel enzymes from R. diobovatum were characterized that convert glutamate, cysteine, and glycine into GSH. Based on reverse transcription PCR, we obtained the glutathione synthetase gene (GSH2), 1866 bp, coding for a 56.6-kDa protein, and the glutamate cysteine ligase gene (GSH1), 2469 bp, coding for a 90.5-kDa protein. The role of GSH1 and GSH2 in the biosynthesis of GSH in the marine yeast R. diobovatum was determined by deletions using the CRISPR-Cas9 nuclease system and by enzymatic activity. These results also showed that GSH1 and GSH2 were involved in the production of GSH and are thus potentially useful for engineering GSH pathways. Alternatively, pET-GSH constructed using in vitro recombination could be used to detect the function of genes related to GSH biosynthesis. Finally, the fermentation parameters determined in the present study provide a reference for industrial GSH production in R. diobovatum.
Hutchins, James R. A.
2014-01-01
The genomic era has enabled research projects that use approaches including genome-scale screens, microarray analysis, next-generation sequencing, and mass spectrometry–based proteomics to discover genes and proteins involved in biological processes. Such methods generate data sets of gene, transcript, or protein hits that researchers wish to explore to understand their properties and functions and thus their possible roles in biological systems of interest. Recent years have seen a profusion of Internet-based resources to aid this process. This review takes the viewpoint of the curious biologist wishing to explore the properties of protein-coding genes and their products, identified using genome-based technologies. Ten key questions are asked about each hit, addressing functions, phenotypes, expression, evolutionary conservation, disease association, protein structure, interactors, posttranslational modifications, and inhibitors. Answers are provided by presenting the latest publicly available resources, together with methods for hit-specific and data set–wide information retrieval, suited to any genome-based analytical technique and experimental species. The utility of these resources is demonstrated for 20 factors regulating cell proliferation. Results obtained using some of these are discussed in more depth using the p53 tumor suppressor as an example. This flexible and universally applicable approach for characterizing experimental hits helps researchers to maximize the potential of their projects for biological discovery. PMID:24723265
Histone-derived piRNA biogenesis depends on the ping-pong partners Piwi5 and Ago3 in Aedes aegypti
Girardi, Erika; Miesen, Pascal; Pennings, Bas; Frangeul, Lionel; Saleh, Maria-Carla
2017-01-01
The piRNA pathway is of key importance in controlling transposable elements in most animal species. In the vector mosquito Aedes aegypti, the presence of eight PIWI proteins and the accumulation of viral piRNAs upon arbovirus infection suggest additional functions of the piRNA pathway beyond genome defense. To better understand the regulatory potential of this pathway, we analyzed in detail host-derived piRNAs in A. aegypti Aag2 cells. We show that a large repertoire of protein-coding genes and non-retroviral integrated RNA virus elements are processed into genic piRNAs by different combinations of PIWI proteins. Among these, we identify a class of genes that produces piRNAs from coding sequences in an Ago3- and Piwi5-dependent fashion. We demonstrate that the replication-dependent histone gene family is a genic source of ping-pong dependent piRNAs and that histone-derived piRNAs are dynamically expressed throughout the cell cycle, suggesting a role for the piRNA pathway in the regulation of histone gene expression. Moreover, our results establish the Aag2 cell line as an accessible experimental model to study gene-derived piRNAs. PMID:28115625
PaperBLAST: Text Mining Papers for Information about Homologs.
Price, Morgan N; Arkin, Adam P
2017-01-01
Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
Dubey, Bhawna; Meganathan, P R; Haque, Ikramul
2012-07-01
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17258 bp, comprising 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 ribosomal RNA genes) along with duplicate control regions, is described herein. The P. molurus molurus mt genome is similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there is more A than T and more C than G on the positive strand, as revealed by positive AT and CG skews. Comparison of individual protein-coding genes with those of other snake genomes suggests that the ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference for NNC codons over NNG codons in the mt genome of P. molurus. Also, the ratio of non-synonymous to synonymous substitution rates (ka/ks) suggests that most of the protein-coding genes are under purifying selection. The phylogenetic analyses involving the concatenated 13 protein-coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
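The AT and CG skews mentioned in this abstract are simple base-count ratios, and a minimal sketch may make the sign convention concrete. It assumes the common definitions AT skew = (A - T)/(A + T) and CG skew = (C - G)/(C + G); the function name `base_skews` and the toy sequence are illustrative only and are not taken from the paper.

```python
from collections import Counter


def base_skews(seq: str) -> dict:
    """Return AT and CG skews for one strand of a DNA sequence.

    Assumed definitions: AT skew = (A - T) / (A + T),
    CG skew = (C - G) / (C + G). A positive value means the first
    base of the pair is the more frequent one on this strand.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    cg_skew = (c - g) / (c + g) if (c + g) else 0.0
    return {"AT_skew": at_skew, "CG_skew": cg_skew}


# Toy sequence (not real Python molurus data): 6 A, 2 T, 5 C, 1 G.
toy = "AACCACATTCAAGC"
skews = base_skews(toy)
print(skews)
```

Both values are positive for the toy sequence, which is the pattern the abstract reports for the positive strand.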
Greif, Gonzalo; Rodriguez, Matias; Alvarez-Valin, Fernando
2017-01-01
American trypanosomiasis is a chronic and endemic disease which affects millions of people. Trypanosoma cruzi, its causative agent, has a life cycle that involves complex morphological and functional transitions, as well as a variety of environmental conditions. This requires a tight regulation of gene expression, which is achieved mainly by post-transcriptional regulation. In this work we conducted an RNAseq analysis of the three major life cycle stages of T. cruzi: amastigotes, epimastigotes and trypomastigotes. This analysis allowed us to delineate specific transcriptomic profiling for each stage, and also to identify those biological processes of major relevance in each state. Stage specific expression profiling evidenced the plasticity of T. cruzi to adapt quickly to different conditions, with particular focus on membrane remodeling and metabolic shifts along the life cycle. Epimastigotes, which replicate in the gut of insect vectors, showed higher expression of genes related to energy metabolism, mainly Krebs cycle, respiratory chain and oxidative phosphorylation related genes, and anabolism related genes associated to nucleotide and steroid biosynthesis; also, a general down-regulation of surface glycoprotein coding genes was seen at this stage. Trypomastigotes, living extracellularly in the bloodstream of mammals, express a plethora of surface proteins and signaling genes involved in invasion and evasion of immune response. Amastigotes mostly express membrane transporters and genes involved in regulation of cell cycle, and also express a specific subset of surface glycoprotein coding genes. In addition, these results allowed us to improve the annotation of the Dm28c genome, identifying new ORFs and set the stage for construction of networks of co-expression, which can give clues about coded proteins of unknown functions. PMID:28286708
[Long non-coding RNAs in the pathophysiology of atherosclerosis].
Novak, Jan; Vašků, Julie Bienertová; Souček, Miroslav
2018-01-01
The human genome contains about 22 000 protein-coding genes that are transcribed into an even larger number of messenger RNAs (mRNAs). Interestingly, the results of the ENCODE project from 2012 show that, although up to 90 % of our genome is actively transcribed, protein-coding mRNAs make up only 2-3 % of the total transcribed RNA. The remaining RNA transcripts are not translated into proteins, which is why they are referred to as "non-coding RNAs". Non-coding RNA was formerly considered "the dark matter of the genome", or "junk" whose genes had accumulated in our DNA during the course of evolution. Today we know that non-coding RNAs fulfil a variety of regulatory functions in our body: they intervene in epigenetic processes from chromatin remodelling to histone methylation, in transcription itself, and even in post-transcriptional processes. Long non-coding RNAs (lncRNAs) are a class of non-coding RNAs more than 200 nucleotides in length (non-coding RNAs shorter than 200 nucleotides are called small non-coding RNAs). lncRNAs represent a large and widely varied group of molecules with diverse regulatory functions. They can be identified in all conceivable cell types and tissues, and even in the extracellular space, including blood, specifically plasma. Their levels change during organogenesis, they are tissue-specific, and their levels also change with the development of different illnesses, including atherosclerosis. This review article presents the topic of lncRNAs in general and then focuses on some specific representatives in relation to the process of atherosclerosis (i.e. we describe lncRNA involvement in the biology of endothelial cells, vascular smooth muscle cells and immune cells); we further describe the possible clinical potential of lncRNAs in the diagnostics or therapy of atherosclerosis and its clinical manifestations. Key words: atherosclerosis - lincRNA - lncRNA - MALAT - MIAT.
New technologies accelerate the exploration of non-coding RNAs in horticultural plants
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Degao; Mewalal, Ritesh; Hu, Rongbin
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes have been relatively well-annotated in sequenced genomes, accounting for a small portion of the genome space in plants, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated a great momentum for discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants and discuss the application of new technologies, especially the new genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to functional characterization of plant ncRNAs.
New technologies accelerate the exploration of non-coding RNAs in horticultural plants
Liu, Degao; Mewalal, Ritesh; Hu, Rongbin; Tuskan, Gerald A; Yang, Xiaohan
2017-01-01
Non-coding RNAs (ncRNAs), that is, RNAs not translated into proteins, are crucial regulators of a variety of biological processes in plants. While protein-encoding genes have been relatively well-annotated in sequenced genomes, accounting for a small portion of the genome space in plants, the universe of plant ncRNAs is rapidly expanding. Recent advances in experimental and computational technologies have generated a great momentum for discovery and functional characterization of ncRNAs. Here we summarize the classification and known biological functions of plant ncRNAs, review the application of next-generation sequencing (NGS) technology and ribosome profiling technology to ncRNA discovery in horticultural plants and discuss the application of new technologies, especially the new genome-editing tool clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) systems, to functional characterization of plant ncRNAs. PMID:28698797
Robust expression of a bioactive mammalian protein in chlamydomonas chloroplast
Mayfield, Stephen P.
2010-03-16
Methods and compositions are disclosed to engineer chloroplast comprising heterologous mammalian genes via a direct replacement of chloroplast Photosystem II (PSII) reaction center protein coding regions to achieve expression of recombinant protein above 5% of total protein. When algae is used, algal expressed protein is produced predominantly as a soluble protein where the functional activity of the peptide is intact. As the host algae is edible, production of biologics in this organism for oral delivery of proteins/peptides, especially gut active proteins, without purification is disclosed.
Robust expression of a bioactive mammalian protein in Chlamydomonas chloroplast
Mayfield, Stephen P
2015-01-13
Methods and compositions are disclosed to engineer chloroplast comprising heterologous mammalian genes via a direct replacement of chloroplast Photosystem II (PSII) reaction center protein coding regions to achieve expression of recombinant protein above 5% of total protein. When algae is used, algal expressed protein is produced predominantly as a soluble protein where the functional activity of the peptide is intact. As the host algae is edible, production of biologics in this organism for oral delivery of proteins/peptides, especially gut active proteins, without purification is disclosed.
SoyNet: a database of co-functional networks for soybean Glycine max.
Kim, Eiru; Hwang, Sohyun; Lee, Insuk
2017-01-04
Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Suo, Chen; Hrydziuszko, Olga; Lee, Donghwan; Pramana, Setia; Saputra, Dhany; Joshi, Himanshu; Calza, Stefano; Pawitan, Yudi
2015-08-15
Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures. An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer. The documented pipeline including annotated sample scripts can be found in http://fafner.meb.ki.se/biostatwiki/driver-genes/. 
Contact: yudi.pawitan@ki.se. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
PaperBLAST: Text Mining Papers for Information about Homologs
Price, Morgan N.; Arkin, Adam P.
2017-08-15
Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions.
PaperBLAST: Text Mining Papers for Information about Homologs
Arkin, Adam P.
2017-01-01
ABSTRACT Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST’s database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/. IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information as to its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins’ functions. PMID:28845458
Determination of the Core of a Minimal Bacterial Gene Set†
Gil, Rosario; Silva, Francisco J.; Peretó, Juli; Moya, Andrés
2004-01-01
The availability of a large number of complete genome sequences raises the question of how many genes are essential for cellular life. Trying to reconstruct the core of the protein-coding gene set for a hypothetical minimal bacterial cell, we have performed a computational comparative analysis of eight bacterial genomes. Six of the analyzed genomes are very small due to a dramatic genome size reduction process, while the other two, corresponding to free-living relatives, are larger. The available data from several systematic experimental approaches to define all the essential genes in some completely sequenced bacterial genomes were also considered, and a reconstruction of a minimal metabolic machinery necessary to sustain life was carried out. The proposed minimal genome contains 206 protein-coding genes with all the genetic information necessary for self-maintenance and reproduction in the presence of a full complement of essential nutrients and in the absence of environmental stress. The main features of such a minimal gene set, as well as the metabolic functions that must be present in the hypothetical minimal cell, are discussed. PMID:15353568
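The comparative step described in this abstract, finding genes shared by all analyzed genomes, can be sketched as a set intersection over ortholog groups. This is only an illustration under stated assumptions: the ortholog-group identifiers and genome labels below are hypothetical, and the sketch leaves out the experimental essentiality data and metabolic reconstruction that the authors also considered.

```python
def core_gene_set(genomes: dict[str, set[str]]) -> set[str]:
    """Return the ortholog groups present in every genome.

    `genomes` maps a genome label to the set of ortholog-group IDs
    found in it (hypothetical IDs; the source does not list them).
    """
    if not genomes:
        return set()
    # set.intersection accepts any number of sets and returns a new set.
    return set.intersection(*genomes.values())


# Toy input: two reduced genomes and one larger free-living relative.
toy_genomes = {
    "reduced_genome_A": {"OG0001", "OG0002", "OG0003", "OG0004"},
    "reduced_genome_B": {"OG0001", "OG0002", "OG0003"},
    "free_living_relative": {"OG0001", "OG0002", "OG0003", "OG0004", "OG0005"},
}
core = core_gene_set(toy_genomes)
print(sorted(core))
```

A strict intersection is the simplest choice; it drops any function that is carried out by non-orthologous genes in different genomes, which is one reason a purely computational core usually needs manual curation.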
Emdin, Connor A; Khera, Amit V; Chaffin, Mark; Klarin, Derek; Natarajan, Pradeep; Aragam, Krishna; Haas, Mary; Bick, Alexander; Zekavat, Seyedeh M; Nomura, Akihiro; Ardissino, Diego; Wilson, James G; Schunkert, Heribert; McPherson, Ruth; Watkins, Hugh; Elosua, Roberto; Bown, Matthew J; Samani, Nilesh J; Baber, Usman; Erdmann, Jeanette; Gupta, Namrata; Danesh, John; Chasman, Daniel; Ridker, Paul; Denny, Joshua; Bastarache, Lisa; Lichtman, Judith H; D'Onofrio, Gail; Mattera, Jennifer; Spertus, John A; Sheu, Wayne H-H; Taylor, Kent D; Psaty, Bruce M; Rich, Stephen S; Post, Wendy; Rotter, Jerome I; Chen, Yii-Der Ida; Krumholz, Harlan; Saleheen, Danish; Gabriel, Stacey; Kathiresan, Sekar
2018-04-24
Less than 3% of protein-coding genetic variants are predicted to result in loss of protein function through the introduction of a stop codon, frameshift, or the disruption of an essential splice site; however, such predicted loss-of-function (pLOF) variants provide insight into effector transcript and direction of biological effect. In >400,000 UK Biobank participants, we conduct association analyses of 3759 pLOF variants with six metabolic traits, six cardiometabolic diseases, and twelve additional diseases. We identified 18 new low-frequency or rare (allele frequency < 5%) pLOF variant-phenotype associations. pLOF variants in the gene GPR151 protect against obesity and type 2 diabetes, in the gene IL33 against asthma and allergic disease, and in the gene IFIH1 against hypothyroidism. In the gene PDE3B, pLOF variants associate with elevated height, improved body fat distribution and protection from coronary artery disease. Our findings prioritize genes for which pharmacologic mimics of pLOF variants may lower risk for disease.
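The variant selection described in this abstract combines two criteria: a predicted loss-of-function consequence (stop codon, frameshift, or essential splice-site disruption) and an allele frequency below 5%. A minimal sketch of such a filter follows. The consequence labels, the `Variant` record, and the gene names are assumptions made for illustration; real annotation tools use their own vocabularies and the study's actual pipeline is not described here.

```python
from dataclasses import dataclass

# Assumed consequence labels for the three pLOF classes named in the abstract.
PLOF_CONSEQUENCES = {"stop_gained", "frameshift", "essential_splice_site"}


@dataclass(frozen=True)
class Variant:
    variant_id: str          # hypothetical identifier
    gene: str                # hypothetical gene label
    consequence: str         # predicted consequence label
    allele_frequency: float  # frequency of the alternate allele, 0..1


def is_plof(variant: Variant) -> bool:
    """True if the predicted consequence is one of the pLOF classes."""
    return variant.consequence in PLOF_CONSEQUENCES


def low_frequency_plof(variants, max_af: float = 0.05):
    """Keep pLOF variants whose allele frequency is below `max_af`."""
    return [v for v in variants if is_plof(v) and v.allele_frequency < max_af]


toy_variants = [
    Variant("v1", "GENE_A", "stop_gained", 0.008),
    Variant("v2", "GENE_A", "missense", 0.002),            # not pLOF
    Variant("v3", "GENE_B", "frameshift", 0.120),           # too common
    Variant("v4", "GENE_C", "essential_splice_site", 0.03),
]
selected = low_frequency_plof(toy_variants)
print([v.variant_id for v in selected])
```

Only `v1` and `v4` pass both criteria in the toy data.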
Asaf, Sajjad; Khan, Abdul Latif; Khan, Abdur Rahim; Waqas, Muhammad; Kang, Sang-Mo; Khan, Muhammad Aaqil; Shahzad, Raheem; Seo, Chang-Woo; Shin, Jae-Ho; Lee, In-Jung
2016-01-01
Oryza minuta (Poaceae family) is a tetraploid wild relative of cultivated rice with a BBCC genome. O. minuta has the potential to resist various diseases and pests such as bacterial blight (BB), white backed planthopper (WBPH) and brown plant hopper (BPH). Here, we sequenced and annotated the complete mitochondrial genome of O. minuta. The mtDNA genome is 515,022 bp, containing 60 protein coding genes, 31 tRNA genes and two rRNA genes. The mitochondrial genome organization and the gene content at the nucleotide level are highly similar (89%) to that of O. rufipogon. Comparison with other related species revealed that most of the genes with known function are conserved among the Poaceae members. Similarly, the O. minuta mt genome shares 24 protein-coding genes, 15 tRNA genes and 1 ribosomal RNA gene with other rice species (indica and japonica). The evolutionary relationship and phylogenetic analysis revealed that O. minuta is more closely related to O. rufipogon than to any other related species. Such studies are essential to understand the evolutionary divergence among species and analyze common gene pools to combat risks in the current scenario of a changing environment.
Genenames.org: the HGNC and VGNC resources in 2017.
Yates, Bethan; Braschi, Bryony; Gray, Kristian A; Seal, Ruth L; Tweedie, Susan; Bruford, Elspeth A
2017-01-04
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org. In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC) which is responsible for assigning names to vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ethanol production by recombinant hosts
Fowler, David E.; Horton, Philip G.; Ben-Bassat, Arie
1996-01-01
Novel plasmids comprising genes which code for the alcohol dehydrogenase and pyruvate decarboxylase are described. Also described are recombinant hosts which have been transformed with genes coding for alcohol dehydrogenase and pyruvate decarboxylase. By virtue of their transformation with these genes, the recombinant hosts are capable of producing significant amounts of ethanol as a fermentation product. Also disclosed are methods for increasing the growth of recombinant hosts and methods for reducing the accumulation of undesirable metabolic products in the growth medium of these hosts. Also disclosed are recombinant hosts capable of producing significant amounts of ethanol as a fermentation product of oligosaccharides and plasmids comprising genes encoding polysaccharases, in addition to the genes described above which code for the alcohol dehydrogenase and pyruvate decarboxylase. Further, methods are described for producing ethanol from oligomeric feedstock using the recombinant hosts described above. Also provided is a method for enhancing the production of functional proteins in a recombinant host comprising overexpressing an adhB gene in the host. Further provided are process designs for fermenting oligosaccharide-containing biomass to ethanol.
Ethanol production by recombinant hosts
Ingram, Lonnie O.; Beall, David S.; Burchhardt, Gerhard F. H.; Guimaraes, Walter V.; Ohta, Kazuyoshi; Wood, Brent E.; Shanmugam, Keelnatham T.
1995-01-01
Novel plasmids comprising genes which code for the alcohol dehydrogenase and pyruvate decarboxylase are described. Also described are recombinant hosts which have been transformed with genes coding for alcohol dehydrogenase and pyruvate decarboxylase. By virtue of their transformation with these genes, the recombinant hosts are capable of producing significant amounts of ethanol as a fermentation product. Also disclosed are methods for increasing the growth of recombinant hosts and methods for reducing the accumulation of undesirable metabolic products in the growth medium of these hosts. Also disclosed are recombinant hosts capable of producing significant amounts of ethanol as a fermentation product of oligosaccharides and plasmids comprising genes encoding polysaccharases, in addition to the genes described above which code for the alcohol dehydrogenase and pyruvate decarboxylase. Further, methods are described for producing ethanol from oligomeric feedstock using the recombinant hosts described above. Also provided is a method for enhancing the production of functional proteins in a recombinant host comprising overexpressing an adhB gene in the host. Further provided are process designs for fermenting oligosaccharide-containing biomass to ethanol.
Proudhon, D; Wei, J; Briat, J; Theil, E C
1996-03-01
Ferritin, a protein widespread in nature, concentrates iron approximately 10^11- to 10^12-fold above the solubility within a spherical shell of 24 subunits; it derives in plants and animals from a common ancestor (based on sequence) but displays a cytoplasmic location in animals compared to the plastid in contemporary plants. Ferritin gene regulation in plants and animals is altered by development, hormones, and excess iron; iron signals target DNA in plants but mRNA in animals. Evolution has thus conserved the two end points of ferritin gene expression, the physiological signals and the protein structure, while allowing some divergence of the genetic mechanisms. Comparison of ferritin gene organization in plants and animals, made possible by the cloning of a dicot (soybean) ferritin gene presented here and the recent cloning of two monocot (maize) ferritin genes, shows evolutionary divergence in ferritin gene organization between plants and animals but conservation among plants or among animals; divergence in the genetic mechanism for iron regulation is reflected by the absence in all three plant genes of the IRE, a highly conserved, noncoding sequence in vertebrate animal ferritin mRNA. First, in plant ferritin genes, the number of introns (n = 7) is higher than in animals (n = 3). Second, no intron positions are conserved when ferritin genes of plants and animals are compared, although all ferritin gene introns are in the coding region; within kingdoms, the intron positions in ferritin genes are conserved. Finally, secondary protein structure has no apparent relationship to intron/exon boundaries in plant ferritin genes, whereas in animal ferritin genes the correspondence is high. 
The structural differences in introns/exons among phylogenetically related ferritin coding sequences and the high conservation of the gene structure within plant or animal kingdoms suggest that kingdom-specific functional constraints may exist to maintain a particular intron/exon pattern within ferritin genes. In the case of plants, where ferritin gene intron placement is unrelated to triplet codons or protein structure, and where ferritin is targeted to the plastid, the selection pressure on gene organization may relate to RNA function and plastid/nuclear signaling.
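The relationship between intron placement and triplet codons discussed in this abstract is usually expressed as intron phase: phase 0 introns fall between codons, phase 1 introns after the first base of a codon, and phase 2 introns after the second. A minimal sketch of that calculation follows, assuming only the coding lengths of the exons are known; the function name and the exon lengths are illustrative and are not ferritin data.

```python
def intron_phases(cds_exon_lengths: list[int]) -> list[int]:
    """Return the phase of each intron in transcript order.

    `cds_exon_lengths` holds the number of coding bases contributed by
    each exon, 5' to 3'. The phase of the intron that follows an exon is
    the number of coding bases upstream of it, modulo 3.
    """
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:  # no intron follows the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


# Toy gene with three coding exons (illustrative lengths only).
toy_phases = intron_phases([120, 74, 100])
print(toy_phases)
```

In the toy gene the first intron interrupts the reading frame between codons (120 is a multiple of 3), while the second splits a codon after its second base (194 mod 3 = 2).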
Samson, Marie-Laure
2008-01-01
Background The Drosophila gene embryonic lethal abnormal visual system (elav) is the prototype of a gene family present in all metazoans. Its members encode structurally conserved neuronal proteins with three RNA Recognition Motifs (RRM) but they paradoxically act at diverse levels of post-transcriptional regulation. In an attempt to understand the history of this family, we searched for orthologs in eleven completely sequenced genomes, including those of humans, D. melanogaster and C. elegans, for which cDNAs are available. Results We analyzed 23 orthologs/paralogs of elav, and found evidence of gain/loss of gene copy number. For one set of genes, including elav itself, the coding sequences are free of introns and their products most resemble ELAV. The remaining genes show remarkable conservation of their exon organization, and their products most resemble FNE and RBP9, proteins encoded by the two elav paralogs of Drosophila. Remarkably, three of the conserved exon junctions are both close to structural elements, involved respectively in protein-RNA interactions and in the regulation of sub-cellular localization, and in the vicinity of diverse sequence variations. Conclusion The data indicate that the essential elav gene of Drosophila is newly emerged, restricted to dipterans and of retrotransposed origin. We propose that the conserved exon junctions constitute potential sites for sequence/function modifications, and that RRM binding proteins, whose function relies upon plastic RNA-protein interactions, may have played an important role in brain evolution. PMID:18715504
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.
Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D
2017-12-03
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. 
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
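Two ideas from this abstract lend themselves to a small illustration: counting how often the trinucleotides of a code occur in each of the three frames of a sequence, and checking whether a code is self-complementary (closed under reverse complement). The sketch below is only illustrative. The function names are hypothetical, the two-element code and the short sequence are toy inputs and not the circular code X or yeast data, and the authors' actual statistical procedure is not reproduced here.

```python
# Toy sketch: frame-wise trinucleotide counts and a self-complementarity check.
# The code set and sequence below are illustrative assumptions, not the set X.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every trinucleotide is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def count_in_frames(seq, code):
    """Count trinucleotides of `code` in frames 0, 1 and 2 of `seq`.

    Frame 0 is taken to be the reading frame; incomplete trailing
    trinucleotides are ignored.
    """
    counts = []
    for frame in range(3):
        n = sum(
            1
            for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in code
        )
        counts.append(n)
    return counts


toy_code = {"AAC", "GTT"}   # hypothetical two-trinucleotide code
toy_seq = "AACGTTAAC"       # hypothetical sequence in reading frame 0

print(is_self_complementary(toy_code))
print(count_in_frames(toy_seq, toy_code))
```

For the toy inputs, all three codons of the reading frame belong to the code while neither shifted frame contains one, which is the kind of frame preference the abstract describes for X at genome scale.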
Dedrick, Rebekah M; Marinelli, Laura J; Newton, Gerald L; Pogliano, Kit; Pogliano, Joseph; Hatfull, Graham F
2013-05-01
Bacteriophages represent a majority of all life forms, and the vast, dynamic population with early origins is reflected in their enormous genetic diversity. A large number of bacteriophage genomes have been sequenced. They are replete with novel genes without known relatives. We know little about their functions, which genes are required for lytic growth, and how they are expressed. Furthermore, the diversity is such that even genes with required functions - such as virion proteins and repressors - cannot always be recognized. Here we describe a functional genomic dissection of mycobacteriophage Giles, in which the virion proteins are identified, genes required for lytic growth are determined, the repressor is identified, and the transcription patterns determined. We find that although all of the predicted phage genes are expressed either in lysogeny or in lytic growth, 45% of the predicted genes are non-essential for lytic growth. We also describe genes required for DNA replication, show that recombination is required for lytic growth, and that Giles encodes a novel repressor. RNAseq analysis reveals abundant expression of a small non-coding RNA in a lysogen and in late lytic growth, although it is non-essential for lytic growth and does not alter lysogeny. © 2013 Blackwell Publishing Ltd.
Tsai, Yi-Ming; Chang, An; Kuo, Chih-Horng
2018-06-01
Genome reduction is a recurring theme of symbiont evolution. The genus Spiroplasma contains species that are mostly facultative insect symbionts. The typical genome sizes of those species within the Apis clade were estimated to be ∼1.0-1.4 Mb. Intriguingly, Spiroplasma clarkii was found to have a genome size that is >30% larger than the median of other species within the same clade. To investigate the molecular evolution events that led to the genome expansion of this bacterium, we determined its complete genome sequence and inferred the evolutionary origin of each protein-coding gene based on the phylogenetic distribution of homologs. Among the 1,346 annotated protein-coding genes, 641 originated from within the Apis clade while 233 were putatively acquired from outside of the clade (including 91 high-confidence candidates). Additionally, 472 were specific to S. clarkii without homologs in the current database (i.e., their origins remained unknown). The acquisition of protein-coding genes, rather than mobile genetic elements, appeared to be a major contributing factor to genome expansion. Notably, >50% of the high-confidence acquired genes are related to carbohydrate transport and metabolism, suggesting that these acquired genes contributed to the expansion of both genome size and metabolic capability. The findings of this work provide an interesting case against the general evolutionary trend observed among symbiotic bacteria and further demonstrate the flexibility of Spiroplasma genomes. For future studies, investigation of the functional integration of these acquired genes, as well as inference of their contribution to fitness, could improve our knowledge of symbiont evolution.
Kim, Dong Seon; Hahn, Yoonsoo
2012-11-13
Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution.
Liu, Xiuying; Luo, GuanZheng; Bai, Xiujuan; Wang, Xiu-Jie
2009-10-01
MicroRNAs are small non-coding RNAs, approximately 22 nt long, that play important regulatory roles in eukaryotes. The biogenesis and functional processes of microRNAs require the participation of many proteins, of which the well-studied ones are Dicer, Drosha, Argonaute and Exportin 5. To systematically study these four protein families, we screened 11 animal genomes for genes encoding these proteins and identified some new members of each family. Domain analysis revealed that most proteins within the same family share identical or similar domains. Alternatively spliced transcript variants were found for some proteins. We also examined the expression patterns of these proteins in different human tissues and identified other proteins that could potentially interact with them. These findings provide systematic information on the four key proteins involved in microRNA biogenesis and functional pathways in animals, and will shed light on further functional studies of these proteins.
Van Coillie, Samya; Liang, Lunxi; Zhang, Yao; Wang, Huanbin; Fang, Jing-Yuan; Xu, Jie
2016-04-05
High-throughput methods such as co-immunoprecipitation-mass spectrometry (coIP-MS) and yeast two-hybrid (Y2H) screening have suggested a broad range of unannotated protein-protein interactions (PPIs), and interpretation of these PPIs remains a challenging task. Advances in cancer genomics research allow for the inference of "coactivation pairs" in cancer, which may facilitate the identification of PPIs involved in cancer. Here we present OncoBinder as a tool for the assessment of proteomic interaction data based on the functional synergy of oncoproteins in cancer. This decision tree-based method combines gene mutation, copy number and mRNA expression information to infer the functional status of protein-coding genes. We applied OncoBinder to evaluate the potential binders of EGFR and ERK2 proteins based on the gastric cancer dataset of The Cancer Genome Atlas (TCGA). As a result, OncoBinder identified high-confidence interactions (annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) or validated by low-throughput assays) more efficiently than a co-expression-based method. Taken together, our results suggest that evaluation of gene functional synergy in cancer may facilitate the interpretation of proteomic interaction data. The OncoBinder toolbox for Matlab is freely accessible online.
Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh
2018-06-03
Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.
Buchensky, Celeste; Almirón, Paula; Mantilla, Brian Suarez; Silber, Ariel M; Cricco, Julia A
2010-11-01
Trypanosoma cruzi, the etiologic agent of Chagas’ disease, has requirements for several cofactors, one of which is heme. Because this organism is unable to synthesize heme, which serves as a prosthetic group for several heme proteins (including the respiratory chain complexes), heme must be acquired from the environment. Considering this deficiency, it is an open question how heme A, the essential cofactor for eukaryotic CcO enzymes, is acquired by this parasite. In the present work, we provide evidence for the presence and functionality of genes coding for heme O and heme A synthases, which catalyze the synthesis of heme O and its conversion into heme A, respectively. The functions of these T. cruzi proteins were evaluated using yeast complementation assays, and the mRNA levels of their respective genes were analyzed at the different T. cruzi life stages. It was observed that the amount of mRNA coding for these proteins changes during the parasite life cycle, suggesting that this variation could reflect different respiratory requirements in the different parasite life stages. © 2010 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
MicroRNAs in genetic disease: rethinking the dosage.
Henrion-Caude, Alexandra; Girard, Muriel; Amiel, Jeanne
2012-08-01
To date, the general assumption has been that most mutations affect protein-coding genes only. Thus, only a few reports have shown that mutations may occur in non-protein-coding genes such as microRNAs (miRNAs). We thus report progress in delineating their contribution as phenotypic modulators, genetic switches and fine-tuners of gene expression. We reasoned that browsing their contribution to genetic disease may provide a framework for understanding the proper requirements to devise miRNA-based therapy strategies, in particular the restoration of an appropriate dosage. Gain and loss of function of miRNAs call for antagonizing or supplying the miRNAs, respectively. We further categorized human disease according to the different ways in which the miRNA was altered, arising either de novo or inherited, whether as a Mendelian or as an epistatic trait, uncovering its role in epigenetics. We discuss how improving our knowledge of the contribution of miRNAs to genetic disease may be beneficial in devising appropriate gene therapy strategies.
Conserved syntenic clusters of protein coding genes are missing in birds.
Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V
2014-01-01
Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.
2013-01-01
Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite the wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle against non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.
Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis
2014-12-01
Proteogenomics involves the use of MS to refine the annotation of protein-coding genes and discover new genes in a genome. We carried out a comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with the aim of improving annotation for methylotrophs, organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in the ME-AM1 genome. One such novel gene is identified with 75 peptides and lacks a homolog in other methylobacteria, but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved by the discovery of a short gene in the methylotrophy gene island and by redefining a gene important for pyrroquinoline quinone synthesis, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the genome of the model methylotroph ME-AM1. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ntougias, Spyridon; Lapidus, Alla; Han, James; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Klenk, Hans-Peter; Woyke, Tanja; Fasseas, Constantinos; Kyrpides, Nikos C.; Zervakis, Georgios I.
2014-01-01
Olivibacter sitiensis Ntougias et al. 2007 is a member of the family Sphingobacteriaceae, phylum Bacteroidetes. Members of the genus Olivibacter are phylogenetically diverse and of significant interest. They occur in diverse habitats, such as rhizosphere and contaminated soils, viscous wastes, composts, biofilter clean-up facilities on contaminated sites and cave environments, and they are involved in the degradation of complex and toxic compounds. Here we describe the features of O. sitiensis AW-6T, together with the permanent-draft genome sequence and annotation. The organism was sequenced under the Genomic Encyclopedia for Bacteria and Archaea (GEBA) project at the DOE Joint Genome Institute and is the first genome sequence of a species within the genus Olivibacter. The genome is 5,053,571 bp long and is comprised of 110 scaffolds with an average GC content of 44.61%. Of the 4,565 genes predicted, 4,501 were protein-coding genes and 64 were RNA genes. Most protein-coding genes (68.52%) were assigned to a putative function. The identification of 2-keto-4-pentenoate hydratase/2-oxohepta-3-ene-1,7-dioic acid hydratase-coding genes indicates involvement of this organism in the catechol catabolic pathway. In addition, genes encoding for β-1,4-xylanases and β-1,4-xylosidases reveal the xylanolytic action of O. sitiensis. PMID:25197463
A global view of the nonprotein-coding transcriptome in Plasmodium falciparum
Raabe, Carsten A.; Sanchez, Cecilia P.; Randau, Gerrit; Robeck, Thomas; Skryabin, Boris V.; Chinni, Suresh V.; Kube, Michael; Reinhardt, Richard; Ng, Guey Hooi; Manickam, Ravichandran; Kuryshev, Vladimir Y.; Lanzer, Michael; Brosius, Juergen; Tang, Thean Hock; Rozhdestvensky, Timofey S.
2010-01-01
Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense–antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors. PMID:19864253
Montandon, P E; Vasserot, A; Stutz, E
1986-01-01
We retrieved a 1.6 kbp intron separating two exons of the psbC gene, which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of as yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns, with an invariant base A as a possible branch point for lariat formation.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
2010-12-15
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Schwab, Stefan; Ramos, Humberto J; Souza, Emanuel M; Pedrosa, Fábio O; Yates, Marshall G; Chubatsu, Leda S; Rigo, Liu U
2007-05-01
Random mutagenesis using transposons with promoterless reporter genes has been widely used to examine differential gene expression patterns in bacteria. Using this approach, we have identified 26 genes of the endophytic nitrogen-fixing bacterium Herbaspirillum seropedicae regulated in response to ammonium content in the growth medium. These include nine genes involved in the transport of nitrogen compounds, such as the high-affinity ammonium transporter AmtB, and uptake systems for alternative nitrogen sources; nine genes coding for proteins responsible for restoring intracellular ammonium levels through enzymatic reactions, such as nitrogenase, amidase, and arginase; and a third group includes metabolic switch genes, coding for sensor kinases or transcription regulation factors, whose role in metabolism was previously unknown. Also, four genes identified were of unknown function. This paper describes their involvement in response to ammonium limitation. The results provide a preliminary profile of the metabolic response of Herbaspirillum seropedicae to ammonium stress.
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of the sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16,257 bp long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand, except for one protein-coding gene (nad6) and five tRNA genes (tRNA-Ser(UCN), tRNA-Gln, tRNA-Ala, tRNA-Val, tRNA-Asp), which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein-coding genes except nad3 and nad4, which use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the gene order in the mitochondrial DNA of the five holothurians for which sequences are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
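The AT and GC skew values quoted for the heavy strand can be checked against the reported base composition. The sketch below assumes the conventional definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not state which formula the authors used, and the helper name `skew` is illustrative.

```python
def skew(x, y):
    """Return the compositional skew (x - y) / (x + y)."""
    return (x - y) / (x + y)


# Heavy-strand base composition (percent) as reported in the abstract.
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])  # (A - T) / (A + T)
gc_skew = skew(composition["G"], composition["C"])  # (G - C) / (G + C)

print(f"AT skew = {at_skew:.3f}")
print(f"GC skew = {gc_skew:.3f}")
```

Under these assumed definitions the values round to 0.025 and -0.188, in agreement with the figures given in the abstract.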
dbCPG: A web resource for cancer predisposition genes
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-01-01
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human CPGs (724 protein-coding, 23 non-coding, and 80 genes of unknown type), 637 rat CPGs, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the CPG data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes. PMID:27192119
Origins of De Novo Genes in Human and Chimpanzee.
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar
2015-12-01
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Origins of De Novo Genes in Human and Chimpanzee
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M.Mar
2015-01-01
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence remain unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins. PMID:26720152
Natural selection in avian protein-coding genes expressed in brain.
Axelsson, Erik; Hultin-Rosenberg, Lina; Brandström, Mikael; Zwahlén, Martin; Clayton, David F; Ellegren, Hans
2008-06-01
The evolution of birds from theropod dinosaurs took place approximately 150 million years ago, and was associated with a number of specific adaptations that are still evident among extant birds, including feathers, song and extravagant secondary sexual characteristics. Knowledge about the molecular evolutionary background to such adaptations is lacking. Here, we analyse the evolution of > 5000 protein-coding gene sequences expressed in zebra finch brain by comparison to orthologous sequences in chicken. Mean d(N)/d(S) is 0.085 and genes with their maximal expression in the eye and central nervous system have the lowest mean d(N)/d(S) value, while those expressed in digestive and reproductive tissues exhibit the highest. We find that fast-evolving genes (those which have higher than expected rate of nonsynonymous substitution, indicative of adaptive evolution) are enriched for biological functions such as fertilization, muscle contraction, defence response, response to stress, wounding and endogenous stimulus, and cell death. After alignment to mammalian orthologues, we identify a catalogue of 228 genes that show a significantly higher rate of protein evolution in the two bird lineages than in mammals. These accelerated bird genes, representing candidates for avian-specific adaptations, include genes implicated in vocal learning and other cognitive processes. Moreover, colouration genes evolve faster in birds than in mammals, which may have been driven by sexual selection for extravagant plumage characteristics.
Graf, Louis; Kim, Yae Jin; Cho, Ga Youn; Miller, Kathy Ann
2017-01-01
Coccophora langsdorfii (Turner) Greville (Fucales) is an intertidal brown alga that is endemic to Northeast Asia and increasingly endangered by habitat loss and climate change. We sequenced the complete circular plastid and mitochondrial genomes of C. langsdorfii. The circular plastid genome is 124,450 bp and contains 139 protein-coding, 28 tRNA and 6 rRNA genes. The circular mitochondrial genome is 35,660 bp and contains 38 protein-coding, 25 tRNA and 3 rRNA genes. The structure and gene content of the C. langsdorfii plastid genome are similar to those of other species in the Fucales. The plastid genomes of brown algae in other orders share similar gene content but exhibit large structural recombination. The large in-frame insert in the cox2 gene in the mitochondrial genome of C. langsdorfii is typical of other brown algae. We explored the effect of this insertion on the structure and function of the cox2 protein. We estimated the usefulness of 135 plastid genes and 35 mitochondrial genes for developing molecular markers. This study shows that 29 organellar genes will prove efficient for resolving brown algal phylogeny. In addition, we propose a new molecular marker suitable for the study of intraspecific genetic diversity that should be tested in a large survey of populations of C. langsdorfii. PMID:29095864
Mechanisms and consequences of alternative polyadenylation
Di Giammartino, Dafne Campigli; Nishida, Kensei; Manley, James L.
2011-01-01
Alternative polyadenylation (APA) is emerging as a widespread mechanism used to control gene expression. Like alternative splicing, usage of alternative poly(A) sites allows a single gene to encode multiple mRNA transcripts. In some cases, this changes the mRNA coding potential; in other cases, the code remains unchanged but the 3′ UTR length is altered, influencing the fate of mRNAs in several ways, for example, by altering the availability of RNA binding protein sites and microRNA binding sites. The mechanisms governing both global and gene-specific APA are only starting to be deciphered. Here we review what is known about these mechanisms and the functional consequences of alternative polyadenylation. PMID:21925375
Roth, Melissa S; Cokus, Shawn J; Gallaher, Sean D; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A; Merchant, Sabeeha S; Pellegrini, Matteo; Niyogi, Krishna K
2017-05-23
Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis , because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase ( BKT ), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; ...
2017-05-08
Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.
Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A.; Merchant, Sabeeha S.; Pellegrini, Matteo
2017-01-01
Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production. PMID:28484037
2004-12-09
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Buttstedt, Anja; Moritz, Robin Fa; Erler, Silvio
2013-11-27
In the honeybee Apis mellifera, female larvae destined to become a queen are fed with royal jelly, a secretion of the hypopharyngeal glands of young nurse bees that rear the brood. The protein moiety of royal jelly comprises mostly major royal jelly proteins (MRJPs) of which the coding genes (mrjp1-9) have been identified on chromosome 11 in the honeybee's genome. We determined the expression of mrjp1-9 among the honeybee worker caste (nurses, foragers) and the sexuals (queens (unmated, mated) and drones) in various body parts (head, thorax, abdomen). Specific mrjp expression was not only found in brood rearing nurse bees, but also in foragers and the sexuals. The expression of mrjp1 to 7 is characteristic for the heads of worker bees, with an elevated expression of mrjp1-4 and 7 in nurse bees compared to foragers. Mrjp5 and 6 were higher in foragers compared to nurses suggesting functions in addition to those of brood food proteins. Furthermore, the expression of mrjp9 was high in the heads, thoraces and abdomen of almost all female bees, suggesting a function irrespective of body section. This completely different expression profile suggests mrjp9 to code for the most ancestral major royal jelly protein of the honeybee.
Raju, Hemalatha B; Tsinoremas, Nicholas F; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely to occur in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile relating to sciatic nerve (SN) injury, studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes.
While modules of the olfactory receptors were clearly identified in protein-protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches.
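The first step described in the abstract above, selecting protein-coding genes that lie within a fixed genomic distance of an ncRNA locus, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Locus` type, the gene names, the coordinates and the 10 kb window are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def gap(a: Locus, b: Locus) -> int:
    """Distance in bp between two loci on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def neighbors(ncrna: Locus, genes: list[Locus], max_distance: int) -> list[Locus]:
    """Protein-coding genes on the same chromosome within max_distance bp."""
    return [
        g for g in genes
        if g.chrom == ncrna.chrom and gap(ncrna, g) <= max_distance
    ]


# Hypothetical toy annotation, not real rat coordinates.
lincrna = Locus("lincRNA_x", "chr1", 50_000, 52_000)
coding = [
    Locus("geneA", "chr1", 40_000, 45_000),   # 5 kb upstream: kept
    Locus("geneB", "chr1", 90_000, 95_000),   # 38 kb downstream: dropped
    Locus("geneC", "chr2", 50_500, 51_500),   # other chromosome: dropped
]

hits = neighbors(lincrna, coding, max_distance=10_000)
print([g.name for g in hits])  # ['geneA']
```

A linear scan is enough for a toy list; for genome-scale annotations an interval index would be the more usual choice.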
McTavish, H; LaQuier, F; Arciero, D; Logan, M; Mundfrom, G; Fuchs, J A; Hooper, A B
1993-04-01
The genome of Nitrosomonas europaea contains at least three copies each of the genes coding for hydroxylamine oxidoreductase (HAO) and cytochrome c554. A copy of an HAO gene is always located within 2.7 kb of a copy of a cytochrome c554 gene. Cytochrome P-460, a protein that shares very unusual spectral features with HAO, was found to be encoded by a gene separate from the HAO genes.
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Nicolas, Francisco Esteban; Moxon, Simon; de Haro, Juan P.; Calo, Silvia; Grigoriev, Igor V.; Torres-Martínez, Santiago; Moulton, Vincent; Ruiz-Vázquez, Rosa M.; Dalmay, Tamas
2010-01-01
Endogenous short RNAs (esRNAs) play diverse roles in eukaryotes and usually are produced from double-stranded RNA (dsRNA) by Dicer. esRNAs are grouped into different classes based on biogenesis and function but not all classes are present in all three eukaryotic kingdoms. The esRNA register of fungi is poorly described compared to other eukaryotes and it is not clear what esRNA classes are present in this kingdom and whether they regulate the expression of protein coding genes. However, evidence that some dicer mutant fungi display altered phenotypes suggests that esRNAs play an important role in fungi. Here, we show that the basal fungus Mucor circinelloides produces new classes of esRNAs that map to exons and regulate the expression of many protein coding genes. The largest class of these exonic-siRNAs (ex-siRNAs) are generated by RNA-dependent RNA Polymerase 1 (RdRP1) and dicer-like 2 (DCL2) and target the mRNAs of protein coding genes from which they were produced. Our results expand the range of esRNAs in eukaryotes and reveal a new role for esRNAs in fungi. PMID:20427422
Significance of duon mutations in cancer genomes
NASA Astrophysics Data System (ADS)
Yadav, Vinod Kumar; Smith, Kyle S.; Flinders, Colin; Mumenthaler, Shannon M.; de, Subhajyoti
2016-06-01
Functional mutations in coding regions not only affect the structure and function of the protein products, but may also modulate their expression in some cases. This class of mutations, recently dubbed “duon mutations” due to their dual roles, can potentially have major impacts on downstream pathways. However, their significance in diseases such as cancer remains unclear. In a survey covering 4606 samples from 19 cancer types, and integrating allelic expression, overall mRNA expression, regulatory motif perturbation, and chromatin signatures in one composite index called REDACT score, we identified potential duon mutations. Several such mutations are detected in known cancer genes in multiple cancer types. For instance, a potential duon mutation in TP53 is associated with increased expression of the mutant allelic gene copy, thereby possibly amplifying the functional effects on the downstream pathways. Another potential duon mutation in SF3B1 is associated with abnormal splicing and changes in angiogenesis and matrix degradation related pathways. Our findings emphasize the need to interrogate the mutations in coding regions beyond their obvious effects on protein structures.
Kinetic models of gene expression including non-coding RNAs
NASA Astrophysics Data System (ADS)
Zhdanov, Vladimir P.
2011-03-01
In cells, genes are transcribed into mRNAs, and the latter are translated into proteins. Due to the feedbacks between these processes, the kinetics of gene expression may be complex even in the simplest genetic networks. The corresponding models have already been reviewed in the literature. A new avenue in this field is related to the recognition that the conventional scenario of gene expression is fully applicable only to prokaryotes whose genomes consist of tightly packed protein-coding sequences. In eukaryotic cells, in contrast, such sequences are relatively rare, and the rest of the genome includes numerous transcript units representing non-coding RNAs (ncRNAs). During the past decade, it has become clear that such RNAs play a crucial role in gene expression and accordingly influence a multitude of cellular processes both in the normal state and during diseases. The numerous biological functions of ncRNAs are based primarily on their abilities to silence genes via pairing with a target mRNA and subsequently preventing its translation or facilitating degradation of the mRNA-ncRNA complex. Many other abilities of ncRNAs have been discovered as well. Our review is focused on the available kinetic models describing the mRNA, ncRNA and protein interplay. In particular, we systematically present the simplest models without kinetic feedbacks, models containing feedbacks and predicting bistability and oscillations in simple genetic networks, and models describing the effect of ncRNAs on complex genetic networks. Mathematically, the presentation is based primarily on temporal mean-field kinetic equations. The stochastic and spatio-temporal effects are also briefly discussed.
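The kind of mean-field kinetic model without feedbacks that the review above mentions can be sketched as three rate equations for the mRNA (R), ncRNA (N) and protein (P) populations. The specific form used below, dR/dt = k_r - g_r R - k_a R N, dN/dt = k_n - g_n N - k_a R N, dP/dt = k_p R - g_p P, is a commonly used generic form and is an assumption here, not a transcription of the review's equations. All parameter values are hypothetical, and a plain Euler step stands in for a proper ODE solver.

```python
def simulate(k_r=1.0, k_n=0.0, k_p=1.0, g_r=0.1, g_n=0.1, g_p=0.1,
             k_a=1.0, t_end=200.0, dt=0.01):
    """Integrate the assumed mRNA/ncRNA/protein equations with explicit Euler.

    k_r, k_n: synthesis rates of mRNA and ncRNA
    k_p:      translation rate per mRNA
    g_*:      first-order degradation rate constants
    k_a:      rate constant for mRNA-ncRNA pairing (the pair is removed)
    """
    R = N = P = 0.0
    for _ in range(int(t_end / dt)):
        pairing = k_a * R * N
        dR = k_r - g_r * R - pairing
        dN = k_n - g_n * N - pairing
        dP = k_p * R - g_p * P
        R, N, P = R + dt * dR, N + dt * dN, P + dt * dP
    return R, N, P


# Without ncRNA the steady state is R = k_r/g_r and P = k_p*R/g_p.
r0, n0, p0 = simulate(k_n=0.0)
# With ncRNA produced faster than mRNA, most mRNA is titrated away.
r1, n1, p1 = simulate(k_n=2.0)

print(f"no ncRNA:   R={r0:.2f}  P={p0:.2f}")
print(f"with ncRNA: R={r1:.2f}  P={p1:.2f}")
```

In this toy setting the protein level drops sharply once ncRNA synthesis exceeds mRNA synthesis, which is the silencing-by-pairing behaviour described in the abstract.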
The Mediator complex: a central integrator of transcription
Allen, Benjamin L.; Taatjes, Dylan J.
2016-01-01
The RNA polymerase II (pol II) enzyme transcribes all protein-coding and most non-coding RNA genes and is globally regulated by Mediator, a large, conformationally flexible protein complex with variable subunit composition (for example, a four-subunit CDK8 module can reversibly associate). These biochemical characteristics are fundamentally important for Mediator's ability to control various processes important for transcription, including organization of chromatin architecture and regulation of pol II pre-initiation, initiation, re-initiation, pausing, and elongation. Although Mediator exists in all eukaryotes, a variety of Mediator functions appear to be specific to metazoans, indicative of more diverse regulatory requirements. PMID:25693131
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-01-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
In silico pattern-based analysis of the human cytomegalovirus genome.
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-04-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
Lei, Huimeng; Yan, Zhangming; Sun, Xiaohong; Zhang, Yue; Wang, Jianhong; Ma, Caihong; Xu, Qunyuan; Wang, Rui; Jarvis, Erich D; Sun, Zhirong
2017-11-01
Humans and several nonhuman species share the rare ability to modify acoustic and/or syntactic features of the sounds they produce, i.e. vocal learning, which is an important neurobiological and behavioral substrate of human speech/language. This convergent trait was suggested to be associated with significant genomic convergence, best manifested at the ROBO-SLIT axon guidance pathway. Here we verified the significance of such genomic convergence and assessed its functional relevance to human speech/language using human genetic variation data. In normal human populations, we found the affected amino acid sites were well fixed and accompanied by significantly more associated protein-coding SNPs in the same genes than in the remaining genes. Diseased individuals with speech/language disorders have significantly more low-frequency protein-coding SNPs, but these preferentially occurred outside the affected genes. Such patients' SNPs were enriched in several functional categories including two axon guidance pathways (mediated by netrin and semaphorin) that interact with ROBO-SLITs. Four of the six patients have homozygous missense SNPs in the PRAME gene family, one of the youngest gene families in the human lineage, which possibly acts upon retinoic acid receptor signaling, similarly to FOXP2, to modulate axon guidance. Taken together, we suggest the axon guidance pathways (e.g. ROBO-SLIT, PRAME gene family) served as common targets for human speech/language evolution and related disorders.
Saha, Anusree; Das, Shubhajit; Moin, Mazahar; Dutta, Mouboni; Bakshi, Achala; Madhav, M. S.; Kirti, P. B.
2017-01-01
Ribosomal proteins (RPs) are indispensable in ribosome biogenesis and protein synthesis, and play a crucial role in diverse developmental processes. Our previous studies on Ribosomal Protein Large subunit (RPL) genes provided insights into their stress responsive roles in rice. In the present study, we have explored the developmental and stress regulated expression patterns of Ribosomal Protein Small (RPS) subunit genes for their differential expression in a spatiotemporal and stress dependent manner. We have also performed an in silico analysis of gene structure, cis-elements in upstream regulatory regions, protein properties and phylogeny. Expression studies of the 34 RPS genes in 13 different tissues of rice covering major growth and developmental stages revealed that their expression was substantially elevated, mostly in shoots and leaves indicating their possible involvement in the development of vegetative organs. The majority of the RPS genes have manifested significant expression under all abiotic stress treatments with ABA, PEG, NaCl, and H2O2. Infection with important rice pathogens, Xanthomonas oryzae pv. oryzae (Xoo) and Rhizoctonia solani also induced the up-regulation of several of the RPS genes. RPS4, 13a, 18a, and 4a have shown higher transcript levels under all the abiotic stresses, whereas, RPS4 is up-regulated in both the biotic stress treatments. The information obtained from the present investigation would be useful in appreciating the possible stress-regulatory attributes of the genes coding for rice ribosomal small subunit proteins apart from their functions as house-keeping proteins. A detailed functional analysis of independent genes is required to study their roles in stress tolerance and generating stress- tolerant crops. PMID:28966624
Solov'ev, V V; Kel', A E; Kolchanov, N A
1989-01-01
The factors determining the presence of inverted and symmetrical repeats in genes coding for globular proteins have been analysed. The analysis of symmetrical repeats revealed an interesting property of the genetic code: pairs of symmetrical codons correspond to pairs of amino acids with mostly similar physicochemical parameters. This property may explain why symmetrical repeats and palindromes are present only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physicochemical properties occupy symmetrical positions. A stochastic model of the evolution of polynucleotide sequences has been used to analyse inverted repeats. The modelling demonstrated that a constraint on sequences alone (uneven frequencies of codon usage) is enough for nonrandom inverted repeats to arise in genes.
Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis
2008-01-01
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375
Soybean kinome: functional classification and gene expression patterns
Liu, Jinyi; Chen, Nana; Grant, Joshua N.; Cheng, Zong-Ming (Max); Stewart, C. Neal; Hewezi, Tarek
2015-01-01
The protein kinase (PK) gene family is one of the largest and most highly conserved gene families in plants and plays a role in nearly all biological functions. While a large number of genes have been predicted to encode PKs in soybean, a comprehensive functional classification and global analysis of expression patterns of this large gene family is lacking. In this study, we identified the entire soybean PK repertoire or kinome, which comprised 2166 putative PK genes, representing 4.67% of all soybean protein-coding genes. The soybean kinome was classified into 19 groups, 81 families, and 122 subfamilies. The receptor-like kinase (RLK) group was remarkably large, containing 1418 genes. Collinearity analysis indicated that whole-genome segmental duplication events may have played a key role in the expansion of the soybean kinome, whereas tandem duplications might have contributed to the expansion of specific subfamilies. Gene structure, subcellular localization prediction, and gene expression patterns indicated extensive functional divergence of PK subfamilies. Global gene expression analysis of soybean PK subfamilies revealed tissue- and stress-specific expression patterns, implying regulatory functions over a wide range of developmental and physiological processes. In addition, tissue and stress co-expression network analysis uncovered specific subfamilies with narrow or wide interconnected relationships, indicative of their association with particular or broad signalling pathways, respectively. Taken together, our analyses provide a foundation for further functional studies to reveal the biological and molecular functions of PKs in soybean. PMID:25614662
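The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the implied genome-wide gene total is derived from the rounded 4.67% figure (so it is approximate, roughly 46,400), and it is not a number stated in the abstract itself.

```python
# Figures quoted in the abstract above.
pk_genes = 2166          # putative protein kinase genes (the soybean kinome)
pk_fraction = 0.0467     # kinome share of all protein-coding genes (4.67%, rounded)
rlk_genes = 1418         # genes in the receptor-like kinase (RLK) group

# Derived values (not stated in the abstract; approximate because 4.67% is rounded).
implied_total_genes = pk_genes / pk_fraction
rlk_share_of_kinome = rlk_genes / pk_genes

print(round(implied_total_genes))       # 46381
print(f"{rlk_share_of_kinome:.1%}")     # 65.5%
```

On these numbers the RLK group alone accounts for about two thirds of the kinome, which is consistent with the abstract calling it "remarkably large".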
The Human Cell Surfaceome of Breast Tumors
da Cunha, Júlia Pinheiro Chagas; Galante, Pedro Alexandre Favoretto; de Souza, Jorge Estefano Santana; Pieprzyk, Martin; Carraro, Dirce Maria; Old, Lloyd J.; Camargo, Anamaria Aranha; de Souza, Sandro José
2013-01-01
Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be at the human cell surface. Methods. We used a high-throughput qPCR system for the analysis of 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrate different types of datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from the literature. Results. We found that at least 77 genes are overexpressed in primary breast tumors, while at least 2 of them also have a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, a comparison was made between the genes found in this report and other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors. PMID:24195083
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
Purification and identification of a nuclease activity in embryo axes from French bean.
Lambert, Rocío; Quiles, Francisco Antonio; Cabello-Díaz, Juan Miguel; Piedras, Pedro
2014-07-01
Plant nucleases are involved in nucleic acid degradation associated with programmed cell death processes as well as in DNA restriction, repair and recombination processes. However, knowledge about the function of plant nucleases is limited. A major nuclease activity was detected by in-gel assay in whole embryonic axes of common bean using ssDNA or RNA as substrate, whereas this activity was minimal in cotyledons. The enzyme has been purified to electrophoretic homogeneity from embryonic axes. The main biochemical properties of the purified enzyme indicate that it belongs to the S1/P1 family of nucleases. This was corroborated when the protein was excised from the gel after SDS electrophoresis and analysed by MALDI TOF/TOF, which allowed identification of the gene (PVN1) that encodes it. The expression of the PVN1 gene was induced at the specific moment of radicle protrusion. The inclusion of inorganic phosphate in the imbibition medium reduced the level of expression of this gene and the nuclease activity, suggesting a relationship with the phosphorus status of French bean seedlings. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou
2017-01-01
Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Yallowitz, Alisha R.; Gong, Ke-Qin; Swinehart, Ilea T.; Nelson, Lisa T.; Wellik, Deneen M.
2009-01-01
Summary Hox genes control many developmental events along the AP axis, but few target genes have been identified. Whether target genes are activated or repressed, what enhancer elements are required for regulation, and how different domains of the Hox proteins contribute to regulatory specificity is poorly understood. Six2 is genetically downstream of both the Hox11 paralogous genes in the developing mammalian kidney and Hoxa2 in branchial arch and facial mesenchyme. Loss-of-function of Hox11 leads to loss of Six2 expression and loss-of-function of Hoxa2 leads to expanded Six2 expression. Herein we demonstrate that a single enhancer site upstream of the Six2 coding sequence is responsible for both activation by Hox11 proteins in the kidney and repression by Hoxa2 in the branchial arch and facial mesenchyme in vivo. DNA binding activity is required for both activation and repression, but differential activity is not controlled by differences in the homeodomains. Rather, protein domains N- and C-terminal to the homeodomain confer activation versus repression activity. These data support a model in which the DNA binding specificity of Hox proteins in vivo may be similar, consistent with accumulated in vitro data, and that unique functions result mainly from differential interactions mediated by non-homeodomain regions of Hox proteins. PMID:19716816
Inter-individual variation in expression: a missing link in biomarker biology?
Little, Peter F R; Williams, Rohan B H; Wilkins, Marc R
2009-01-01
The past decade has seen an explosion of variation data demonstrating that diversity of both protein-coding sequences and of regulatory elements of protein-coding genes is common and of functional importance. In this article, we argue that genetic diversity can no longer be ignored in studies of human biology, even research projects without explicit genetic experimental design, and that this knowledge can, and must, inform research. By way of illustration, we focus on the potential role of genetic data in case-control studies to identify and validate cancer protein biomarkers. We argue that a consideration of genetics, in conjunction with proteomic biomarker discovery projects, should improve the proportion of biomarkers that can accurately classify patients.
An efficient transgenic system by TA cloning vectors and RNAi for C. elegans
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gengyo-Ando, Keiko; CREST, JST, 4-1-8 Hon-cho, Kawaguchi, Saitama 332-0012; Yoshina, Sawako
2006-11-03
In the nematode, transgenic analyses have been performed by microinjection of DNA from various sources into the syncytial gonad. To expedite these transgenic analyses, we solved two potential problems in this work. First, we constructed an efficient TA-cloning vector system which is useful for any promoter. By amplifying the genomic DNA fragments which contain regulatory sequences with or without the coding region, we could easily construct plasmids expressing fluorescent protein fusions without considering restriction sites. We could dissect motor neurons with three colors in a single animal. Second, we used feeding RNAi to isolate transgenic strains which express a lag-2::venus fusion gene. We found that the fusion protein is toxic when ectopically expressed in embryos but is functional to rescue a loss-of-function mutant in the lag-2 gene. Thus, the transgenic system described here should be useful to examine protein function in the nematode.
Kang, Sung-Hwan; Atallah, Osama O; Sun, Yong-Duo; Folimonova, Svetlana Y
2018-01-15
Viruses from the family Closteroviridae show an example of intra-genome duplications of more than one gene. In addition to the hallmark coat protein gene duplication, several members possess a tandem duplication of papain-like leader proteases. In this study, we demonstrate that domains encoding the L1 and L2 proteases in the Citrus tristeza virus genome underwent a significant functional divergence at the RNA and protein levels. We show that the L1 protease is crucial for viral accumulation and establishment of initial infection, whereas its coding region is vital for virus transport. On the other hand, the second protease is indispensable for virus infection of its natural citrus host, suggesting that L2 has evolved an important adaptive function that mediates virus interaction with the woody host. Copyright © 2017 Elsevier Inc. All rights reserved.
Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Lagoumintzis, George; Poulas, Konstantinos
2017-01-01
During the last few decades, recombinant protein expression has found more and more applications. The cloning of protein-coding genes into expression vectors must be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors.
Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Poulas, Konstantinos
2017-01-01
During the last few decades, recombinant protein expression has found more and more applications. The cloning of protein-coding genes into expression vectors must be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors. PMID:29091919
Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-09-01
To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Death of a dogma: eukaryotic mRNAs can code for more than one protein
Mouilleron, Hélène; Delcourt, Vivian; Roucou, Xavier
2016-01-01
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5′ UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3′ UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma. PMID:26578573
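The idea that one mRNA may carry more than one translatable ORF can be illustrated with a toy scan of the three forward reading frames. This is only a minimal sketch under stated assumptions: the function name `find_orfs`, the `min_codons` threshold and the example sequence are hypothetical, an ORF is defined naively as ATG followed by an in-frame stop codon, and none of this reproduces the mass spectrometry or ribosome profiling analyses the review discusses.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=2):
    """Return (start, end, frame) for every ATG...stop ORF in the three
    forward frames; end is exclusive and includes the stop codon."""
    seq = mrna.upper().replace("U", "T")   # accept RNA or DNA alphabet
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                 # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, frame))
                start = None                # look for the next ORF in this frame
    # Longest ORF first, mirroring the "longest ORF" annotation habit.
    return sorted(orfs, key=lambda o: o[1] - o[0], reverse=True)

# Toy transcript (hypothetical): one ORF in frame 0 and a second in frame 1.
toy = "ATGAAACCCTAG" + "G" + "ATGGGGTTTTAA"
for start, end, frame in find_orfs(toy):
    print(frame, start, end, toy[start:end])
```

A single-CDS annotation would keep only one of the ORFs reported here; the point of the sketch is simply that a plain frame scan already surfaces more than one candidate per transcript.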
Wang, Jinyan; Yang, Yuwen; Jin, Lamei; Ling, Xitie; Liu, Tingli; Chen, Tianzi; Ji, Yinghua; Yu, Wengui; Zhang, Baolong
2018-06-04
Long noncoding RNAs (lncRNAs) are known to be involved in some biological processes, but their roles in plant-virus interactions remain largely unexplored. While circular RNAs (circRNAs) have been studied in animals, there has yet to be extensive research on them in a plant system, especially in the tomato-tomato yellow leaf curl virus (TYLCV) interaction. In this study, RNA transcripts from the susceptible tomato line JS-CT-9210, either infected with TYLCV or untreated, were sequenced in a paired-end, strand-specific manner using the ribo-zero rRNA removal library method. A total of 2056 lncRNAs, including 1767 long intergenic non-coding RNAs (lincRNAs) and 289 long non-coding natural antisense transcripts (lncNATs), were obtained. The expression patterns of lncRNAs were similar in susceptible tomato plants between control check (CK) and TYLCV-infected samples. Our analysis suggested that lncRNAs likely played a role in a variety of functions, including plant hormone signaling, protein processing in the endoplasmic reticulum, RNA transport, ribosome function, photosynthesis, glutathione metabolism, and plant-pathogen interactions. Using virus-induced gene silencing (VIGS) analysis, we found that reduced expression of the lncRNA S-slylnc0957 resulted in enhanced resistance to TYLCV in susceptible tomato plants. Moreover, we identified 184 circRNA candidates using the CircRNA Identifier (CIRI) software, of which 32 circRNAs were specifically expressed in untreated samples and 83 circRNAs in TYLCV samples. Approximately 62% of these circRNAs were derived from exons. We validated the circRNAs by both PCR and Sanger sequencing using divergent primers, and found that most circRNAs were derived from the exons of protein-coding genes. The silencing of the parent genes of these circRNAs resulted in decreased TYLCV virus accumulation.
In this study, we identified novel lncRNAs and circRNAs using bioinformatic approaches and showed that these RNAs function as negative regulators of TYLCV infection. Moreover, the expression patterns of lncRNAs in susceptible tomato plants were different from those of resistant tomato plants, while exonic circRNA expression was positively associated with that of the respective protein-coding genes. This work provides a foundation for elaborating the novel roles of lncRNAs and circRNAs in susceptible tomatoes following TYLCV infection.
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer-walking sequencing methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud
2000-01-01
Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
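The deviant genetic code described in this abstract (UAG read as leucine, UCA read as a stop) can be made concrete by translating the same sequence under both codon tables. The following is a minimal sketch under stated assumptions: the names `STANDARD_CODE`, `SCENEDESMUS_MT_CODE` and `translate` are hypothetical, the example sequence is invented, and only the two reassignments named in the abstract are applied on top of the standard table.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}

# Only the two changes stated in the abstract: UAG -> Leu, UCA -> stop.
SCENEDESMUS_MT_CODE = dict(STANDARD_CODE, TAG="L", TCA="*")

def translate(seq, code):
    """Translate from the first base until the first stop codon."""
    seq = seq.upper().replace("U", "T")    # accept RNA or DNA alphabet
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = code[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

example = "AUGUAGGCUUCAGGG"   # invented sequence: AUG UAG GCU UCA GGG
print(translate(example, STANDARD_CODE))        # M   (UAG terminates)
print(translate(example, SCENEDESMUS_MT_CODE))  # MLA (UAG = Leu, UCA terminates)
```

The same fifteen nucleotides yield a one-residue product under the standard code and a three-residue product under the modified table, which is why gene prediction on such a genome needs the correct table.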
Kirsten, Holger; Al-Hasani, Hoor; Holdt, Lesca; Gross, Arnd; Beutner, Frank; Krohn, Knut; Horn, Katrin; Ahnert, Peter; Burkhardt, Ralph; Reiche, Kristin; Hackermüller, Jörg; Löffler, Markus; Teupser, Daniel; Thiery, Joachim; Scholz, Markus
2015-01-01
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenetic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb to the transcript challenging typically implemented ranges of cis-regulations. Pathway enrichment within regulated genes of GWAS-related eSNPs supported functional relevance of identified eQTLs. We demonstrate that nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data for demonstrating biological effects. Yet, we show for strongly heritable transcripts that still little trans-chromosomal heritability is explained by all identified trans-eSNPs; however, our data suggest that most cis-heritability of these transcripts seems explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes. PMID:26019233
Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins.
Hu, Pingzhao; Janga, Sarath Chandra; Babu, Mohan; Díaz-Mejía, J Javier; Butland, Gareth; Yang, Wenhong; Pogoutse, Oxana; Guo, Xinghua; Phanse, Sadhna; Wong, Peter; Chandran, Shamanta; Christopoulos, Constantine; Nazarians-Armavil, Anaies; Nasseri, Negin Karimi; Musso, Gabriel; Ali, Mehrab; Nazemof, Nazila; Eroukova, Veronika; Golshani, Ashkan; Paccanaro, Alberto; Greenblatt, Jack F; Moreno-Hagelsieb, Gabriel; Emili, Andrew
2009-04-28
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans' biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a "systems-wide" functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhaxybayeva, Olga; Swithers, Kristen S; Foght, Julia
2012-01-01
Here we describe the genome of Mesotoga prima MesG1.Ag4.2, the first genome of a mesophilic Thermotogales bacterium. Mesotoga prima was isolated from a polychlorinated biphenyl (PCB)-dechlorinating enrichment culture from Baltimore Harbor sediments. Its 2.97 Mb genome is considerably larger than any previously sequenced Thermotogales genome; these range between 1.86 and 2.30 Mb. This larger size is due to both higher numbers of protein-coding genes and larger intergenic regions. In particular, the M. prima genome contains more genes for proteins involved in regulatory functions, for instance those involved in regulation of transcription. Together with its closest relative, Kosmotoga olearia, it also encodes different types of proteins involved in environmental and cell-cell interactions as compared with other Thermotogales bacteria. Amino acid composition analysis of M. prima proteins implies that this lineage has inhabited low-temperature environments for a long time. A large fraction of the M. prima genome has been acquired by lateral gene transfer (LGT): a DarkHorse analysis suggests that 766 (32%) of predicted protein-coding genes have been involved in LGT after Mesotoga diverged from the other Thermotogales lineages. A notable example of a lineage-specific LGT event is a reductive dehalogenase gene, which encodes a key enzyme in dehalorespiration, indicating that M. prima may have a more active role in PCB dechlorination than was previously assumed.
Quarello, Paola; Garelli, Emanuela; Brusco, Alfredo; Carando, Adriana; Mancini, Cecilia; Pappi, Patrizia; Vinti, Luciana; Svahn, Johanna; Dianzani, Irma; Ramenghi, Ugo
2012-01-01
Diamond-Blackfan anemia is an autosomal dominant disease due to mutations in nine ribosomal protein encoding genes. Because most mutations are loss of function and detected by direct sequencing of coding exons, we reasoned that part of the approximately 50% mutation negative patients may have carried a copy number variant of ribosomal protein genes. As a proof of concept, we designed a multiplex ligation-dependent probe amplification assay targeted to screen the six genes that are most frequently mutated in Diamond-Blackfan anemia patients: RPS17, RPS19, RPS26, RPL5, RPL11, and RPL35A. Using this assay we showed that deletions represent approximately 20% of all mutations. The combination of sequencing and multiplex ligation-dependent probe amplification analysis of these six genes allows the genetic characterization of approximately 65% of patients, showing that Diamond-Blackfan anemia is indisputably a ribosomopathy. PMID:22689679
Basak, Jolly; Nithin, Chandran
2015-01-01
Non-coding RNAs (ncRNAs) have emerged as versatile master regulators of biological functions in recent years. MicroRNAs (miRNAs) are small endogenous ncRNAs of 18-24 nucleotides in length that originate from long self-complementary precursors. Besides their direct involvement in developmental processes, plant miRNAs play key roles in gene regulatory networks and varied biological processes. In contrast, long ncRNAs (lncRNAs) are a large and diverse class of transcribed ncRNAs whose length exceeds 200 nucleotides. Plant lncRNAs are transcribed by different RNA polymerases and show diverse structural features. Plant lncRNAs are also important regulators of gene expression in diverse biological processes. In the last decade there has been a breakthrough in genome editing technology: the CRISPR-Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9) technology. CRISPR loci are transcribed into ncRNA, which forms a functional complex with Cas9 and guides the complex to cleave complementary invading DNA. The CRISPR-Cas technology has been successfully applied in model plants such as Arabidopsis and tobacco and in important crops such as wheat, maize, and rice. However, all these studies have focused on protein-coding genes, and information about targeting non-coding genes is scarce. Hitherto, the CRISPR-Cas technology has been exclusively used in vertebrate systems to engineer miRNAs/lncRNAs, but it is still relatively unexplored in plants. While briefly covering miRNAs, lncRNAs and applications of the CRISPR-Cas technology in humans and animals, this review essentially elaborates several strategies to overcome the challenges of applying the CRISPR-Cas technology in editing ncRNAs in plants and the future perspective of this field.
Pdsg1 and Pdsg2, Novel Proteins Involved in Developmental Genome Remodelling in Paramecium
Hoehener, Cristina; Singh, Aditi; Swart, Estienne C.; Nowacki, Mariusz
2014-01-01
The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates, such as Paramecium these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization. PMID:25397898
Pdsg1 and Pdsg2, novel proteins involved in developmental genome remodelling in Paramecium.
Arambasic, Miroslav; Sandoval, Pamela Y; Hoehener, Cristina; Singh, Aditi; Swart, Estienne C; Nowacki, Mariusz
2014-01-01
The epigenetic influence of maternal cells on the development of their progeny has long been studied in various eukaryotes. Multicellular organisms usually provide their zygotes not only with nutrients but also with functional elements required for proper development, such as coding and non-coding RNAs. These maternally deposited RNAs exhibit a variety of functions, from regulating gene expression to assuring genome integrity. In ciliates, such as Paramecium these RNAs participate in the programming of large-scale genome reorganization during development, distinguishing germline-limited DNA, which is excised, from somatic-destined DNA. Only a handful of proteins playing roles in this process have been identified so far, including typical RNAi-derived factors such as Dicer-like and Piwi proteins. Here we report and characterize two novel proteins, Pdsg1 and Pdsg2 (Paramecium protein involved in Development of the Somatic Genome 1 and 2), involved in Paramecium genome reorganization. We show that these proteins are necessary for the excision of germline-limited DNA during development and the survival of sexual progeny. Knockdown of PDSG1 and PDSG2 genes affects the populations of small RNAs known to be involved in the programming of DNA elimination (scanRNAs and iesRNAs) and chromatin modification patterns during development. Our results suggest an association between RNA-mediated trans-generational epigenetic signal and chromatin modifications in the process of Paramecium genome reorganization.
BRD4 assists elongation of both coding and enhancer RNAs guided by histone acetylation
Kanno, Tomohiko; Kanno, Yuka; LeRoy, Gary; Campos, Eric; Sun, Hong-Wei; Brooks, Stephen R; Vahedi, Golnaz; Heightman, Tom D; Garcia, Benjamin A; Reinberg, Danny; Siebenlist, Ulrich; O’Shea, John J; Ozato, Keiko
2016-01-01
Small-molecule BET inhibitors interfere with the epigenetic interactions between acetylated histones and the bromodomains of the BET family proteins, including BRD4, and they potently inhibit growth of malignant cells by targeting cancer-promoting genes. BRD4 interacts with the pause-release factor P-TEFb, and has been proposed to release Pol II from promoter-proximal pausing. We show that BRD4 occupied widespread genomic regions in mouse cells, and directly stimulated elongation of both protein-coding transcripts and non-coding enhancer RNAs (eRNAs), dependent on the function of bromodomains. BRD4 interacted physically with elongating Pol II complexes, and assisted Pol II progression through hyper-acetylated nucleosomes by interacting with acetylated histones via bromodomains. On active enhancers, the BET inhibitor JQ1 antagonized BRD4-associated eRNA synthesis. Thus, BRD4 is involved in multiple steps of the transcription hierarchy, primarily by assisting transcript elongation both at enhancers and on gene bodies. PMID:25383670
The Mediator complex and transcription regulation
Poss, Zachary C.; Ebmeier, Christopher C.
2013-01-01
The Mediator complex is a multi-subunit assembly that appears to be required for regulating expression of most RNA polymerase II (pol II) transcripts, which include protein-coding and most non-coding RNA genes. Mediator and pol II function within the pre-initiation complex (PIC), which consists of Mediator, pol II, TFIIA, TFIIB, TFIID, TFIIE, TFIIF and TFIIH and is approximately 4.0 MDa in size. Mediator serves as a central scaffold within the PIC and helps regulate pol II activity in ways that remain poorly understood. Mediator is also generally targeted by sequence-specific, DNA-binding transcription factors (TFs) that work to control gene expression programs in response to developmental or environmental cues. At a basic level, Mediator functions by relaying signals from TFs directly to the pol II enzyme, thereby facilitating TF-dependent regulation of gene expression. Thus, Mediator is essential for converting biological inputs (communicated by TFs) to physiological responses (via changes in gene expression). In this review, we summarize an expansive body of research on the Mediator complex, with an emphasis on yeast and mammalian complexes. We focus on the basics that underlie Mediator function, such as its structure and subunit composition, and describe its broad regulatory influence on gene expression, ranging from chromatin architecture to transcription initiation and elongation, to mRNA processing. We also describe factors that influence Mediator structure and activity, including TFs, non-coding RNAs and the CDK8 module. PMID:24088064
Raju, Hemalatha B.; Tsinoremas, Nicholas F.; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely to occur in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at the transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile of sciatic nerve (SN) injury, studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, which were in turn used to conduct inference at the interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNAs (31 asRNAs in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. 
While modules of the olfactory receptors were clearly identified in protein–protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches. PMID:27803687
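The contextual-analysis step in the abstract above (assigning candidate protein-coding targets by a fixed genomic distance from an ncRNA locus) can be illustrated with a minimal sketch. The `Locus` type, the function names, the coordinates and the 10 kb window are all hypothetical choices for illustration; the abstract does not state the distance or the implementation the authors used.

```python
from typing import List, NamedTuple


class Locus(NamedTuple):
    """A genomic interval; coordinates are illustrative, not real annotations."""
    name: str
    chrom: str
    start: int
    end: int


def gap(a: Locus, b: Locus) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def neighbor_genes(ncrna: Locus, genes: List[Locus], max_distance: int) -> List[str]:
    """Names of genes on the same chromosome within max_distance bp of the ncRNA locus."""
    return [
        g.name
        for g in genes
        if g.chrom == ncrna.chrom and gap(ncrna, g) <= max_distance
    ]


# Toy data (hypothetical loci).
lnc = Locus("lncA", "chr1", 10_000, 12_000)
coding = [
    Locus("geneX", "chr1", 12_500, 15_000),  # 500 bp downstream
    Locus("geneY", "chr1", 50_000, 52_000),  # 38 kb away
    Locus("geneZ", "chr2", 11_000, 11_500),  # different chromosome
]
targets = neighbor_genes(lnc, coding, max_distance=10_000)
```

With a window this small only `geneX` qualifies; widening `max_distance` trades specificity for a larger candidate set.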
Ning, S B; Wang, L; Song, Y C
2000-01-01
Peroxidase plays a key role in plant disease resistance, cold stress and some developmental processes, and cold-regulated protein is required for the response of plants to cold or heat stress. Recent studies showed that these processes in plant cells were involved in programmed cell death (PCD). Using a biotin-labelled in situ hybridization (ISH) technique, we physically mapped the genes px and cld, encoding peroxidase and cold-regulated protein respectively, onto maize chromosomes. Both DAB and fluorescence detection systems gave identical results: the probe uaz235, corresponding to gene px, was localized onto the long arms of chromosomes 2 (2L) and 7 (7L), and csu19, corresponding to gene cld, hybridized onto 4L and 5L. The percentage distances (from the hybridization sites to centromeres) of uaz235 in 2L and 7L were 45.4 +/- 1.3 and 67.4 +/- 3.7 respectively, and those of csu19 in 4L and 5L were 68.6 +/- 2.6 and 58.2 +/- 1.6 respectively. The physical positions of px in 2L and cld in 4L coincide with their positions on the genetic map. The results also show that both of these genes have duplicated sites in the maize genome.
Etebari, Kayvan; Furlong, Michael J.; Asgari, Sassan
2015-01-01
Long non-coding RNAs (lncRNAs) play important roles in genomic imprinting, cancer, differentiation and regulation of gene expression. Here, we identified 3844 long intergenic ncRNAs (lincRNA) in Plutella xylostella, which is a notorious pest of cruciferous plants that has developed field resistance to all classes of insecticides, including Bacillus thuringiensis (Bt) endotoxins. Further, we found that some of those lincRNAs may potentially serve as precursors for the production of small ncRNAs. We found 280 and 350 lincRNAs that are differentially expressed in Chlorpyrifos and Fipronil resistant larvae. A survey on P. xylostella midgut transcriptome data from Bt-resistant populations revealed 59 altered lincRNA in two resistant strains compared with the susceptible population. We validated the transcript levels of a number of putative lincRNAs in deltamethrin-resistant larvae that were exposed to deltamethrin, which indicated that this group of lincRNAs might be involved in the response to xenobiotics in this insect. To functionally characterize DBM lincRNAs, gene ontology (GO) enrichment of their associated protein-coding genes was extracted and showed over representation of protein, DNA and RNA binding GO terms. The data presented here will facilitate future studies to unravel the function of lincRNAs in insecticide resistance or the response to xenobiotics of eukaryotic cells. PMID:26411386
Untangling the Web: The Diverse Functions of the PIWI/piRNA Pathway
Mani, Sneha Ramesh; Juliano, Celina E.
2014-01-01
SUMMARY Small RNAs impact several cellular processes through gene regulation. Argonaute proteins bind small RNAs to form effector complexes that control transcriptional and post-transcriptional gene expression. PIWI proteins belong to the Argonaute protein family, and bind PIWI-interacting RNAs (piRNAs). They are highly abundant in the germline, but are also expressed in some somatic tissues. The PIWI/piRNA pathway has a role in transposon repression in Drosophila, which occurs both by epigenetic regulation and post-transcriptional degradation of transposon mRNAs. These functions are conserved, but clear differences in the extent and mechanism of transposon repression exist between species. Mutations in piwi genes lead to the upregulation of transposon mRNAs. It is hypothesized that this increased transposon mobilization leads to genomic instability and thus sterility, although no causal link has been established between transposon upregulation and genome instability. An alternative scenario could be that piwi mutations directly affect genomic instability, and thus lead to increased transposon expression. We propose that the PIWI/piRNA pathway controls genome stability in several ways: suppression of transposons, direct regulation of chromatin architecture and regulation of genes that control important biological processes related to genome stability. The PIWI/piRNA pathway also regulates at least some, if not many, protein-coding genes, which further lends support to the idea that piwi genes may have broader functions beyond transposon repression. An intriguing possibility is that the PIWI/piRNA pathway is using transposon sequences to coordinate the expression of large groups of genes to regulate cellular function. PMID:23712694
Duellman, Tyler; Warren, Christopher; Yang, Jay
2014-01-01
Microribonucleic acids (miRNAs) work with exquisite specificity and are able to distinguish a target from a non-target based on a single nucleotide mismatch in the core nucleotide domain. We questioned whether miRNA regulation of gene expression could occur in a single nucleotide polymorphism (SNP)-specific manner, manifesting as a post-transcriptional control of expression of genetic polymorphisms. In our recent study of the functional consequences of matrix metalloproteinase (MMP)-9 SNPs, we discovered that expression of a coding exon SNP in the pro-domain of the protein resulted in a profound decrease in the secreted protein. This missense SNP results in the N38S amino acid change and a loss of an N-glycosylation site. A systematic study demonstrated that the loss of secreted protein was due not to the loss of an N-glycosylation site, but rather to SNP-specific targeting by miR-671-3p and miR-657. Bioinformatics analysis identified 41 SNP-specific miRNAs targeting MMP-9 SNPs, mostly in the coding exon, and an extension of the analysis to chromosome 20, where the MMP-9 gene is located, suggested that SNP-specific miRNAs targeting the coding exon are prevalent. This selective post-transcriptional regulation of a target messenger RNA harboring genetic polymorphisms by miRNAs offers an SNP-dependent post-transcriptional regulatory mechanism, allowing for polymorphic-specific differential gene regulation. PMID:24627221
Decoding Mechanisms by which Silent Codon Changes Influence Protein Biogenesis and Function
Bali, Vedrana; Bebok, Zsuzsanna
2015-01-01
Scope Synonymous codon usage has been a focus of investigation since the discovery of the genetic code and its redundancy. The occurrences of synonymous codons vary between species and within genes of the same genome, known as codon usage bias. Today, bioinformatics and experimental data allow us to compose a global view of the mechanisms by which the redundancy of the genetic code contributes to the complexity of biological systems from affecting survival in prokaryotes, to fine tuning the structure and function of proteins in higher eukaryotes. Studies analyzing the consequences of synonymous codon changes in different organisms have revealed that they impact nucleic acid stability, protein levels, structure and function without altering amino acid sequence. As such, synonymous mutations inevitably contribute to the pathogenesis of complex human diseases. Yet, fundamental questions remain unresolved regarding the impact of silent mutations in human disorders. In the present review we describe developments in this area concentrating on mechanisms by which synonymous mutations may affect protein function and human health. Purpose This synopsis illustrates the significance of synonymous mutations in disease pathogenesis. We review the different steps of gene expression affected by silent mutations, and assess the benefits and possible harmful effects of codon optimization applied in the development of therapeutic biologics. Physiological and medical relevance Understanding mechanisms by which synonymous mutations contribute to complex diseases such as cancer, neurodegeneration and genetic disorders, including the limitations of codon-optimized biologics, provides insight concerning interpretation of silent variants and future molecular therapies. PMID:25817479
Möller, André; Xie, Sheila Q.; Hosp, Fabian; Lang, Benjamin; Phatnani, Hemali P.; James, Sonya; Ramirez, Francisco; Collin, Gayle B.; Naggert, Jürgen K.; Babu, M. Madan; Greenleaf, Arno L.; Selbach, Matthias; Pombo, Ana
2012-01-01
RNA polymerase II (RNAPII) transcribes protein-coding genes in eukaryotes and interacts with factors involved in chromatin remodeling, transcriptional activation, elongation, and RNA processing. Here, we present the isolation of native RNAPII complexes using mild extraction conditions and immunoaffinity purification. RNAPII complexes were extracted from mitotic cells, where they exist dissociated from chromatin. The proteomic content of native complexes in total and size-fractionated extracts was determined using highly sensitive LC-MS/MS. Protein associations with RNAPII were validated by high-resolution immunolocalization experiments in both mitotic cells and in interphase nuclei. Functional assays of transcriptional activity were performed after siRNA-mediated knockdown. We identify >400 RNAPII associated proteins in mitosis, among these previously uncharacterized proteins for which we show roles in transcriptional elongation. We also identify, as novel functional RNAPII interactors, two proteins involved in human disease, ALMS1 and TFG, emphasizing the importance of gene regulation for normal development and physiology. PMID:22199231
Foldability of a Natural De Novo Evolved Protein.
Bungard, Dixie; Copple, Jacob S; Yan, Jing; Chhun, Jimmy J; Kumirov, Vlad K; Foy, Scott G; Masel, Joanna; Wysocki, Vicki H; Cordes, Matthew H J
2017-11-07
The de novo evolution of protein-coding genes from noncoding DNA is emerging as a source of molecular innovation in biology. Studies of random sequence libraries, however, suggest that young de novo proteins will not fold into compact, specific structures typical of native globular proteins. Here we show that Bsc4, a functional, natural de novo protein encoded by a gene that evolved recently from noncoding DNA in the yeast S. cerevisiae, folds to a partially specific three-dimensional structure. Bsc4 forms soluble, compact oligomers with high β sheet content and a hydrophobic core, and undergoes cooperative, reversible denaturation. Bsc4 lacks a specific quaternary state, however, existing instead as a continuous distribution of oligomer sizes, and binds dyes indicative of amyloid oligomers or molten globules. The combination of native-like and non-native-like properties suggests a rudimentary fold that could potentially act as a functional intermediate in the emergence of new folded proteins de novo. Copyright © 2017 Elsevier Ltd. All rights reserved.
Family-specific scaling laws in bacterial genomes.
De Lazzari, Eleonora; Grilli, Jacopo; Maslov, Sergei; Cosentino Lagomarsino, Marco
2017-07-27
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size. The sizes of these functional categories change, on average, as power laws in the total number of protein-coding genes. Here, we show that such regularities are not restricted to the overall behavior of high-level functional categories, but also exist systematically at the level of single evolutionary families of protein domains. Specifically, the number of proteins within each family follows family-specific scaling laws with genome size. Functionally similar sets of families tend to follow similar scaling laws, but this is not always the case. To understand this systematically, we provide a comprehensive classification of families based on their scaling properties. Additionally, we develop a quantitative score for the heterogeneity of the scaling of families belonging to a given category or predefined group. Under the common reasonable assumption that selection is driven solely or mainly by biological function, these findings point to fine-tuned and interdependent functional roles of specific protein domains, beyond our current functional annotations. This analysis provides a deeper view on the links between evolutionary expansion of protein families and the functional constraints shaping the gene repertoire of bacterial genomes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
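A power-law scaling of family abundance with genome size, as described in the abstract above, is commonly estimated as a straight-line fit in log-log space. The sketch below is a minimal illustration of that generic technique on synthetic data; the function name, the data and the ordinary least-squares choice are assumptions, not the fitting procedure used by the authors.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(genome_sizes: Sequence[float],
                  family_counts: Sequence[float]) -> Tuple[float, float]:
    """Fit count = a * size**beta by least squares on log-transformed values.

    Returns (a, beta). All inputs must be strictly positive.
    """
    xs = [math.log(n) for n in genome_sizes]
    ys = [math.log(c) for c in family_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                      # slope in log-log space = scaling exponent
    a = math.exp(mean_y - beta * mean_x)  # intercept back-transformed to a prefactor
    return a, beta


# Synthetic example: a hypothetical family that scales with exponent 1.5.
sizes = [1000, 2000, 4000, 8000]
counts = [0.001 * n ** 1.5 for n in sizes]
a, beta = fit_power_law(sizes, counts)
```

On real data, families absent from some genomes have zero counts, which cannot be log-transformed; how to treat them is a modelling decision this sketch leaves out.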
Panganiban, Ronald A; Sun, Maoyun; Dahlin, Amber; Park, Hae-Ryung; Kan, Mengyuan; Himes, Blanca E; Mitchel, Jennifer A; Iribarren, Carlos; Jorgenson, Eric; Randell, Scott H; Israel, Elliot; Tantisira, Kelan; Shore, Stephanie; Park, Jin-Ah; Weiss, Scott T; Wu, Ann Chen; Lu, Quan
2018-01-09
Genetic variants in the chromosomal region 17q21 are consistently associated with asthma. However, mechanistic studies have not yet linked any of the associated variants to a function that could influence asthma, and as a result, the identity of the asthma gene(s) remains elusive. We sought to identify and characterize functional variants in the 17q21 locus. We used the Exome Aggregation Consortium browser to identify coding (amino acid-changing) variants in the 17q21 locus. We obtained asthma association measures for these variants in both the Genetic Epidemiology Research in Adult Health and Aging (GERA) cohort (16,274 cases and 38,269 matched controls) and the EVE Consortium study (5,303 asthma cases and 12,560 individuals). Gene expression and protein localization were determined by quantitative RT-PCR and fluorescence immunostaining, respectively. Molecular and cellular studies were performed to determine the functional effects of coding variants. Two coding variants (rs2305480 and rs11078928) of the gasdermin B (GSDMB) gene in the 17q21 locus were associated with lower asthma risk in both GERA (odds ratio, 0.92; P = 1.01 × 10⁻⁶) and EVE (odds ratio, 0.85; joint P_EVE = 1.31 × 10⁻¹³). In GERA, rs11078928 had a minor allele frequency (MAF) of 0.45 in unaffected (nonasthmatic) controls and 0.43 in asthma cases. For European Americans in EVE, the MAF of rs2305480 was 0.45 for controls and 0.39 for cases; for all EVE subjects, the MAF was 0.32 for controls and 0.27 for cases. GSDMB is highly expressed in differentiated airway epithelial cells, including the ciliated cells. We found that, when the GSDMB protein is cleaved by inflammatory caspase-1 to release its N-terminal fragment, potent pyroptotic cell death is induced. The splice variant rs11078928 deletes the entire exon 6, which encodes 13 amino acids in the critical N-terminus, and abolishes the pyroptotic activity of the GSDMB protein. 
Our study identified a functional asthma variant in the GSDMB gene of the 17q21 locus and implicates GSDMB-mediated epithelial cell pyroptosis in pathogenesis. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
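As a rough consistency check on the figures quoted in the abstract above, a crude allelic odds ratio can be computed directly from the minor allele frequencies in cases and controls. This is a simplified, unadjusted calculation written for illustration (the function name is hypothetical); the published odds ratios presumably come from adjusted association models, so only approximate agreement should be expected.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Crude odds ratio for carrying the minor allele, cases versus controls."""
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# MAFs quoted in the abstract: GERA, rs11078928 (0.43 in cases, 0.45 in controls).
gera_or = allelic_odds_ratio(0.43, 0.45)

# EVE, all subjects, rs2305480 (0.27 in cases, 0.32 in controls).
eve_or = allelic_odds_ratio(0.27, 0.32)
```

For GERA the crude value rounds to the reported 0.92; for EVE the crude value comes out lower than the reported 0.85, which is unsurprising for an unadjusted estimate pooled across populations.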
2012-01-01
Background Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. Results We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Conclusions Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution. PMID:23148531
Genome-wide analysis of alternative splicing during human heart development
NASA Astrophysics Data System (ADS)
Wang, He; Chen, Yanmei; Li, Xinzhong; Chen, Guojun; Zhong, Lintao; Chen, Gangbing; Liao, Yulin; Liao, Wangjun; Bin, Jianping
2016-10-01
Alternative splicing (AS) drives determinative changes during mouse heart development. Recent high-throughput technological advancements have facilitated genome-wide analysis of AS, but such an analysis of the transition from the human foetal heart to the adult stage has not been reported. Here, we present a high-resolution global analysis of AS transitions between human foetal and adult hearts. RNA-sequencing data showed that extensive AS transitions occurred between human foetal and adult hearts, and that AS events occurred more frequently in protein-coding genes than in long non-coding RNA (lncRNA). A significant difference in AS patterns was found between foetal and adult hearts. The predicted difference in AS events was further confirmed using quantitative reverse transcription-polymerase chain reaction analysis of human heart samples. Functional analysis of foetal-specific AS events showed enrichment in cell proliferation-related pathways including cell cycle, whereas adult-specific AS events were associated with protein synthesis. Furthermore, 42.6% of foetal-specific AS events showed significant changes in gene expression levels between foetal and adult hearts. Genes exhibiting both foetal-specific AS and differential expression were highly enriched in cell cycle-associated functions. In conclusion, we provided a genome-wide profiling of AS transitions between foetal and adult hearts and proposed that AS transitions and differential gene expression may play determinative roles in human heart development.
Pervasive transcription: detecting functional RNAs in bacteria.
Lybecker, Meghan; Bilusic, Ivana; Raghavan, Rahul
2014-01-01
Pervasive, or genome-wide, transcription has been reported in all domains of life. In bacteria, most pervasive transcription occurs antisense to protein-coding transcripts, although recently a new class of pervasive RNAs was identified that originates from within annotated genes. Initially considered to be non-functional transcriptional noise, pervasive transcription is increasingly being recognized as important in regulating gene expression. The function of pervasive transcription is an extensively debated question in the field of transcriptomics and regulatory RNA biology. Here, we highlight the most recent contributions addressing the purpose of pervasive transcription in bacteria and discuss their implications.
López-Igual, Rocío; Wilson, Adjélé; Bourcier de Carbon, Céline; Sutter, Markus; Turmo, Aiko
2016-01-01
The photoactive Orange Carotenoid Protein (OCP) is involved in cyanobacterial photoprotection. Its N-terminal domain (NTD) is responsible for interaction with the antenna and induction of excitation energy quenching, while the C-terminal domain is the regulatory domain that senses light and induces photoactivation. In most nitrogen-fixing cyanobacterial strains, there are one to four paralogous genes coding for homologs to the NTD of the OCP. The functions of these proteins are unknown. Here, we study the expression, localization, and function of these genes in Anabaena sp. PCC 7120. We show that the four genes present in the genome are expressed in both vegetative cells and heterocysts but do not seem to have an essential role in heterocyst formation. This study establishes that all four Anabaena NTD-like proteins can bind a carotenoid and the different paralogs have distinct functions. Surprisingly, only one paralog (All4941) was able to interact with the antenna and to induce permanent thermal energy dissipation. Two of the other Anabaena paralogs (All3221 and Alr4783) were shown to be very good singlet oxygen quenchers. The fourth paralog (All1123) does not seem to be involved in photoprotection. Structural homology modeling allowed us to propose specific features responsible for the different functions of these soluble carotenoid-binding proteins. PMID:27208286
Prevalence of transcription promoters within archaeal operons and coding sequences.
Koide, Tie; Reiss, David J; Bare, J Christopher; Pang, Wyming Lee; Facciotti, Marc T; Schmid, Amy K; Pan, Min; Marzolf, Bruz; Van, Phu T; Lo, Fang-Yin; Pratap, Abhishek; Deutsch, Eric W; Peterson, Amelia; Martin, Dan; Baliga, Nitin S
2009-01-01
Despite the knowledge of complex prokaryotic-transcription mechanisms, generalized rules, such as the simplified organization of genes into operons with well-defined promoters and terminators, have had a significant role in systems analysis of regulatory logic in both bacteria and archaea. Here, we have investigated the prevalence of alternate regulatory mechanisms through genome-wide characterization of transcript structures of approximately 64% of all genes, including putative non-coding RNAs in Halobacterium salinarum NRC-1. Our integrative analysis of transcriptome dynamics and protein-DNA interaction data sets showed widespread environment-dependent modulation of operon architectures, transcription initiation and termination inside coding sequences, and extensive overlap in 3' ends of transcripts for many convergently transcribed genes. A significant fraction of these alternate transcriptional events correlate with binding locations of 11 transcription factors and regulators (TFs) inside operons and annotated genes, events usually considered spurious or non-functional. Using experimental validation, we illustrate the prevalence of overlapping genomic signals in archaeal transcription, casting doubt on the general perception of rigid boundaries between coding sequences and regulatory elements.
Dong, Chen; Hu, Huigang; Xie, Jianghui
2016-12-01
DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
APPRIS: annotation of principal and alternative splice isoforms
Rodriguez, Jose Manuel; Maietta, Paolo; Ezkurdia, Iakes; Pietrelli, Alessandro; Wesselink, Jan-Jaap; Lopez, Gonzalo; Valencia, Alfonso; Tress, Michael L.
2013-01-01
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform. PMID:23161672
Complete mitochondrial genome of the agarophyte red alga Gelidium vagum (Gelidiales).
Yang, Eun Chan; Kim, Kyeong Mi; Boo, Ga Hun; Lee, Jung-Hyun; Boo, Sung Min; Yoon, Hwan Su
2014-08-01
We describe the first complete mitochondrial genome of Gelidium vagum (Gelidiales) (24,901 bp, 30.4% GC content), an agar-producing red alga. The circular mitochondrial genome contains 43 genes, including 23 protein-coding, 18 tRNA and 2 rRNA genes. All the protein-coding genes have a typical ATG start codon. No introns were found. Two genes, secY and rps12, overlap by 41 bp.
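The GC content quoted in the abstract above is a simple base-composition statistic. As a minimal sketch of how such a figure is usually computed (the function name `gc_content` and the toy sequence are hypothetical and are not taken from the paper), one could write:

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100.0 * gc / total


# Hypothetical toy sequence for illustration only; it is not G. vagum data.
toy = "ATGAAATTTGCATTAACGTAA"
print(f"{gc_content(toy):.1f}% GC")
```

Ambiguous bases such as N are ignored here, which is one common convention; other tools may count them in the denominator.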
Li, Wan; Zhu, Lina; Huang, Hao; He, Yuehan; Lv, Junjie; Li, Weimin; Chen, Lina; He, Weiming
2017-10-01
Complex chronic diseases are caused by the effects of genetic and environmental factors. Single nucleotide polymorphisms (SNPs), a common type of genetic variation, play vital roles in disease. We hypothesized that disease risk functional SNPs in coding regions and protein interaction network modules were more likely to contribute to the identification of disease susceptible genes for complex chronic diseases. This could help to further reveal the pathogenesis of complex chronic diseases. Disease risk SNPs were first recognized from public SNP data for coronary heart disease (CHD), hypertension (HT) and type 2 diabetes (T2D). SNPs in coding regions that were classified as nonsense or missense by integrating several SNP functional annotation databases were treated as functional SNPs. Then, regions significantly associated with each disease were screened using random permutations for disease risk functional SNPs. Corresponding to these regions, 155, 169 and 173 potential disease susceptible genes were identified for CHD, HT and T2D, respectively. A disease-related gene product interaction network in environmental context was constructed for interacting gene products of both disease genes and potential disease susceptible genes for these diseases. After functional enrichment analysis for disease associated modules, 5 CHD susceptible genes, 7 HT susceptible genes and 3 T2D susceptible genes were finally identified, some of which had pleiotropic effects. Most of these genes were verified in the literature to be related to these diseases. The same held for disease genes identified by a different method proposed by Lee et al. This research could provide novel perspectives for diagnosis and treatment of complex chronic diseases and for the identification of susceptible genes for other diseases. Copyright © 2017 Elsevier Inc. All rights reserved.
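The abstract above mentions screening regions with random permutations but does not give the procedure. The following is a minimal sketch of one generic label-permutation test, assuming each SNP is flagged as inside or outside a candidate region and as risk or non-risk; the function name, parameters, and the add-one p-value estimator are illustrative assumptions, not the authors' method.

```python
import random


def region_permutation_pvalue(in_region, is_risk, n_perm=2000, seed=0):
    """Empirical p-value for enrichment of risk SNPs inside one region.

    in_region: list of bools, True if the SNP lies inside the region.
    is_risk:   list of bools, True if the SNP is a disease risk functional SNP.
    Risk labels are shuffled across SNPs to build the null distribution.
    """
    if len(in_region) != len(is_risk):
        raise ValueError("in_region and is_risk must have the same length")
    rng = random.Random(seed)
    observed = sum(1 for r, k in zip(in_region, is_risk) if r and k)
    labels = list(is_risk)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        count = sum(1 for r, k in zip(in_region, labels) if r and k)
        if count >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: 100 SNPs, 5 in the region, all 5 of them risk SNPs.
region = [True] * 5 + [False] * 95
risk = [True] * 5 + [False] * 95
print(region_permutation_pvalue(region, risk))
```

In practice many regions would be tested, so the resulting p-values would need a multiple-testing correction; that step is omitted here.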
Youngs, Noah; Penfold-Brown, Duncan; Drew, Kevin; Shasha, Dennis; Bonneau, Richard
2013-05-01
Computational biologists have demonstrated the utility of using machine learning methods to predict protein function from an integration of multiple genome-wide data types. Yet, even the best performing function prediction algorithms rely on heuristics for important components of the algorithm, such as choosing negative examples (proteins without a given function) or determining key parameters. The improper choice of negative examples, in particular, can hamper the accuracy of protein function prediction. We present a novel approach for choosing negative examples, using a parameterizable Bayesian prior computed from all observed annotation data, which also generates priors used during function prediction. We incorporate this new method into the GeneMANIA function prediction algorithm and demonstrate improved accuracy of our algorithm over current top-performing function prediction methods on the yeast and mouse proteomes across all metrics tested. Code and Data are available at: http://bonneaulab.bio.nyu.edu/funcprop.html
A DEK Domain-Containing Protein Modulates Chromatin Structure and Function in Arabidopsis
Waidmann, Sascha; Kusenda, Branislav; Mayerhofer, Juliane; Mechtler, Karl; Jonak, Claudia
2014-01-01
Chromatin is a major determinant in the regulation of virtually all DNA-dependent processes. Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. The evolutionarily conserved DEK domain-containing protein is implicated in important chromatin-related processes in animals, but little is known about its DNA targets and protein interaction partners. In plants, the role of DEK has remained elusive. In this work, we identified DEK3 as a chromatin-associated protein in Arabidopsis thaliana. DEK3 specifically binds histones H3 and H4. Purification of other proteins associated with nuclear DEK3 also established DNA topoisomerase 1α and proteins of the cohesion complex as in vivo interaction partners. Genome-wide mapping of DEK3 binding sites by chromatin immunoprecipitation followed by deep sequencing revealed enrichment of DEK3 at protein-coding genes throughout the genome. Using DEK3 knockout and overexpressor lines, we show that DEK3 affects nucleosome occupancy and chromatin accessibility and modulates the expression of DEK3 target genes. Furthermore, functional levels of DEK3 are crucial for stress tolerance. Overall, data indicate that DEK3 contributes to modulation of Arabidopsis chromatin structure and function. PMID:25387881
Design and construction of functional AAV vectors.
Gray, John T; Zolotukhin, Serge
2011-01-01
Using the basic principles of molecular biology and laboratory techniques presented in this chapter, researchers should be able to create a wide variety of AAV vectors for both clinical and basic research applications. Basic vector design concepts are covered for both protein coding gene expression and small non-coding RNA gene expression cassettes. AAV plasmid vector backbones (available via AddGene) are described, along with critical sequence details for a variety of modular expression components that can be inserted as needed for specific applications. Protocols are provided for assembling the various DNA components into AAV vector plasmids in Escherichia coli, as well as for transferring these vector sequences into baculovirus genomes for large-scale production of AAV in the insect cell production system.
Véliz, David; Vega-Retter, Caren; Quezada-Romegialli, Claudio
2016-01-01
The complete sequence of the mitochondrial genome for the Chilean silverside Basilichthys microlepidotus is reported for the first time. The entire mitochondrial genome was 16,544 bp in length (GenBank accession no. KM245937); gene composition and arrangement conformed to those reported for most fishes, with the typical structure of 2 rRNAs, 13 protein-coding genes, 22 tRNAs and a non-coding region. The assembled mitogenome was validated against sequences of COI and Control Region previously sequenced in our lab, functional genes from RNA-Seq data for the same species and the mitogenomes of two other atherinopsid species available in GenBank.
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Genic insights from integrated human proteomics in GeneCards.
Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron
2016-01-01
GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publicly available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. 
The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.
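The abstract above describes a per-gene protein expression vector across tissues and a pairwise proximity metric, without specifying the metric. As a minimal sketch, assuming Pearson correlation as a stand-in (the actual GeneCards metric may differ, and the gene vectors below are hypothetical), expression-based proximity between two genes could be computed like this:

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length expression vectors."""
    if len(u) != len(v) or len(u) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    if den == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return num / den


# Hypothetical normalized abundances in four tissues (the paper uses 69).
gene_a = [0.1, 0.9, 0.3, 0.0]
gene_b = [0.2, 0.8, 0.4, 0.1]
print(round(pearson(gene_a, gene_b), 3))
```

A correlation-style measure compares the shape of the tissue profile rather than absolute abundance, which is why it is a common first choice for this kind of comparison.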
Wu, Yichao; Arumugam, Krithika; Tay, Martin Qi Xiang; Seshan, Hari; Mohanty, Anee; Cao, Bin
2015-04-01
Comamonas testosteroni is an important environmental bacterium capable of degrading a variety of toxic aromatic pollutants and has been demonstrated to be a promising biocatalyst for environmental decontamination. This organism is often found to be among the primary surface colonizers in various natural and engineered ecosystems, suggesting an extraordinary capability of this organism in environmental adaptation and biofilm formation. The goal of this study was to gain genetic insights into the adaption of C. testosteroni to versatile environments and the importance of a biofilm lifestyle. Specifically, a draft genome of C. testosteroni I2 was obtained. The draft genome is 5,778,710 bp in length and comprises 110 contigs. The average G+C content was 61.88 %. A total of 5365 genes with 5263 protein-coding genes were predicted, whereas 4324 (80.60 % of total genes) protein-encoding genes were associated with predicted functions. The catabolic genes responsible for biodegradation of steroid and other aromatic compounds on draft genome were identified. Plasmid pI2 was found to encode a complete pathway for aniline degradation and a partial catabolic pathway for chloroaniline. This organism was found to be equipped with a sophisticated signaling system which helps it find ideal niches and switch between planktonic and biofilm lifestyles. A large number of putative multi-drug-resistant genes coding for abundant outer membrane transporters, chaperones, and heat shock proteins for the protection of cellular function were identified in the genome of strain I2. In addition, the genome of strain I2 was predicted to encode several proteins involved in producing, secreting, and uptaking siderophores under iron-limiting conditions. The genome of strain I2 contains a number of genes responsible for the synthesis and secretion of exopolysaccharides, an extracellular component essential for biofilm formation. 
Overall, our results reveal the genomic features underlying the adaptation of C. testosteroni to versatile environments and highlight the importance of its biofilm lifestyle.
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
Yoon, Sung Ho; Turkarslan, Serdar; Reiss, David J.; Pan, Min; Burn, June A.; Costa, Kyle C.; Lie, Thomas J.; Slagel, Joseph; Moritz, Robert L.; Hackett, Murray; Leigh, John A.; Baliga, Nitin S.
2013-01-01
Methanogens catalyze the critical methane-producing step (called methanogenesis) in the anaerobic decomposition of organic matter. Here, we present the first predictive model of global gene regulation of methanogenesis in a hydrogenotrophic methanogen, Methanococcus maripaludis. We generated a comprehensive list of genes (protein-coding and noncoding) for M. maripaludis through integrated analysis of the transcriptome structure and a newly constructed Peptide Atlas. The environment and gene-regulatory influence network (EGRIN) model of the strain was constructed from a compendium of transcriptome data that was collected over 58 different steady-state and time-course experiments that were performed in chemostats or batch cultures under a spectrum of environmental perturbations that modulated methanogenesis. Analyses of the EGRIN model have revealed novel components of methanogenesis that included at least three additional protein-coding genes of previously unknown function as well as one noncoding RNA. We discovered that at least five regulatory mechanisms act in a combinatorial scheme to intercoordinate key steps of methanogenesis with different processes such as motility, ATP biosynthesis, and carbon assimilation. Through a combination of genetic and environmental perturbation experiments we have validated the EGRIN-predicted role of two novel transcription factors in the regulation of phosphate-dependent repression of formate dehydrogenase—a key enzyme in the methanogenesis pathway. The EGRIN model demonstrates regulatory affiliations within methanogenesis as well as between methanogenesis and other cellular functions. PMID:24089473
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feder, J.N.; Jan, L.Y.; Jan, Y.N.
The Drosophila hairy gene encodes a basic helix-loop-helix protein that functions in at least two steps during Drosophila development: (1) during embryogenesis, when it partakes in the establishment of segments, and (2) during the larval stage, when it functions negatively in determining the pattern of sensory bristles on the adult fly. In the rat, a structurally homologous gene (RHL) behaves as an immediate-early gene in its response to growth factors and can, like that in Drosophila, suppress neuronal differentiation events. Here, the authors report the genomic cloning of the human hairy gene homolog (HRY). The coding region of the gene is contained within four exons. The predicted amino acid sequence reveals only four amino acid differences between the human and rat genes. Analysis of the DNA sequence 5′ to the coding region reveals a putative untranslated exon. To increase the value of the HRY gene as a genetic marker and to assess its potential involvement in genetic disorders, they sublocalized the locus to chromosome 3q28-q29 by fluorescence in situ hybridization. 34 refs., 4 figs., 1 tab.
The Saccharomyces cerevisiae enolase-related regions encode proteins that are active enolases.
Kornblatt, M J; Richard Albert, J; Mattie, S; Zakaib, J; Dayanandan, S; Hanic-Joyce, P J; Joyce, P B M
2013-02-01
In addition to two genes (ENO1 and ENO2) known to code for enolase (EC4.2.1.11), the Saccharomyces cerevisiae genome contains three enolase-related regions (ERR1, ERR2 and ERR3) which could potentially encode proteins with enolase function. Here, we show that products of these genes (Err2p and Err3p) have secondary and quaternary structures similar to those of yeast enolase (Eno1p). In addition, Err2p and Err3p can convert 2-phosphoglycerate to phosphoenolpyruvate, with kinetic parameters similar to those of Eno1p, suggesting that these proteins could function as enolases in vivo. To address this possibility, we overexpressed the ERR2 and ERR3 genes individually in a double-null yeast strain lacking ENO1 and ENO2, and showed that either ERR2 or ERR3 could complement the growth defect in this strain when cells are grown in medium with glucose as the carbon source. Taken together, these data suggest that the ERR genes in Saccharomyces cerevisiae encode a protein that could function in glycolysis as enolase. The presence of these enolase-related regions in Saccharomyces cerevisiae and their absence in other related yeasts suggests that these genes may play some unique role in Saccharomyces cerevisiae. Further experiments will be required to determine whether these functions are related to glycolysis or other cellular processes. Copyright © 2012 John Wiley & Sons, Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rossi, Paolo; Ramelot, Theresa A.; Xiao, Rong
2005-11-01
The product of gene locus BB0938 from Bordetella bronchiseptica (Swiss-Prot ID: Q7WNU7-BORBR; NESG target ID: BoR11; Wunderlich et al., 2004; Pfam ID: PF03476) is a 128-residue protein of unknown function. This broadly conserved protein family is found in eubacteria and eukaryotes. Using triple resonance NMR techniques, we have determined 98% of backbone and 94% of side chain 1H, 13C, and 15N resonance assignments. The chemical shift and 3J(HN-Hα) scalar coupling data reveal a β topology with a seven-residue helical insert, ??????????. BMRB deposit with accession number 6693. Reference: Wunderlich et al. (2004) Proteins, 56, 181-187.
Hill, J; McGraw, P; Tzagoloff, A
1985-03-25
The yeast nuclear gene CBP2 was previously proposed to code for a protein necessary for processing of the terminal intron in the cytochrome b pre-mRNA (McGraw, P., and Tzagoloff, A. (1983) J. Biol. Chem. 258, 9459-9468). In the present study we describe a mitochondrial mutation capable of suppressing the respiratory deficiency of cbp2 mutants. The mitochondrial suppressor mutation has been shown to be the result of a precise excision of the last intervening sequence from the cytochrome b gene. Strains with the altered mitochondrial DNA have normal levels of mature cytochrome b mRNA and of cytochrome b and exhibit wild type growth on glycerol. These results confirm that CBP2 codes for a protein specifically required for splicing of the cytochrome b intron and further suggest that absence of the intervening sequence does not noticeably affect the expression of respiratory function in mitochondria.
A circadian gene expression atlas in mammals: implications for biology and medicine.
Zhang, Ray; Lahens, Nicholas F; Ballance, Heather I; Hughes, Michael E; Hogenesch, John B
2014-11-11
To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional "rush hours" preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.
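The abstract above reports genes with circadian rhythms in transcription but does not name the detection method. One textbook way to describe a rhythm in a time series is a cosinor fit, sketched below as an illustration only; it is not claimed to be the authors' pipeline. The function name, the 24 h period default, and the 2 h sampling over 48 h in the example are assumptions. The closed-form estimates used here are the least-squares solution only when samples are evenly spaced and span a whole number of periods.

```python
import math


def cosinor_fit(times_h, values, period_h=24.0):
    """Fit y = mesor + amplitude * cos(w * (t - peak)) with w = 2*pi/period.

    Assumes evenly spaced samples covering a whole number of periods, in which
    case the sine and cosine regressors are orthogonal and these closed-form
    estimates equal the least-squares fit.
    Returns (mesor, amplitude, peak_time_h).
    """
    n = len(values)
    if n != len(times_h) or n < 3:
        raise ValueError("need at least 3 paired time points and values")
    w = 2.0 * math.pi / period_h
    mesor = sum(values) / n
    a = 2.0 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    b = 2.0 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    amplitude = math.hypot(a, b)
    peak_time_h = (math.atan2(b, a) / w) % period_h
    return mesor, amplitude, peak_time_h


# Synthetic example: a gene peaking 6 h into the cycle, sampled every 2 h for 48 h.
times = list(range(0, 48, 2))
expr = [5.0 + 2.0 * math.cos(2.0 * math.pi / 24.0 * (t - 6.0)) for t in times]
mesor, amplitude, peak = cosinor_fit(times, expr)
print(mesor, amplitude, peak)
```

Real data would also need a significance test for rhythmicity and a multiple-testing correction across genes, both omitted from this sketch.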
Liu, Cui; Yu, Yanbao; Liu, Feng; Wei, Xin; Wrobel, John A.; Gunawardena, Harsha P.; Zhou, Li; Jin, Jian; Chen, Xian
2015-01-01
Immune cells develop endotoxin tolerance (ET) after prolonged stimulation. ET increases the level of the repression mark H3K9me2 in the transcriptionally silent chromatin specifically associated with pro-inflammatory genes. However, it is not clear what proteins are functionally involved in this process. Here we show that a novel chromatin activity based chemoproteomic (ChaC) approach can dissect the functional chromatin protein complexes that regulate ET-associated inflammation. Using UNC0638, which binds the enzymatically active H3K9-specific methyltransferase G9a/GLP, ChaC reveals that G9a is constitutively active at a G9a-dependent mega-dalton repressome in primary endotoxin-tolerant macrophages. G9a/GLP broadly impacts the ET-specific reprogramming of the histone code landscape, chromatin remodeling, and the activities of select transcription factors. We discover that the G9a-dependent epigenetic environment promotes the transcriptional repression activity of c-Myc for gene-specific co-regulation of chronic inflammation. ChaC may also be applicable to dissecting other functional protein complexes in the context of phenotypic chromatin architectures. PMID:25502336
SMN control of RNP assembly: from post-transcriptional gene regulation to motor neuron disease
Li, Darrick K.; Tisdale, Sarah; Lotti, Francesco; Pellizzoni, Livio
2014-01-01
At the post-transcriptional level, expression of protein-coding genes is controlled by a series of RNA regulatory events including nuclear processing of primary transcripts, transport of mature mRNAs to specific cellular compartments, translation and ultimately, turnover. These processes are orchestrated through the dynamic association of mRNAs with RNA binding proteins and ribonucleoprotein (RNP) complexes. Accurate formation of RNPs in vivo is fundamentally important to cellular development and function, and its impairment often leads to human disease. The survival motor neuron (SMN) protein is key to this biological paradigm: SMN is essential for the biogenesis of various RNPs that function in mRNA processing, and genetic mutations leading to SMN deficiency cause the neurodegenerative disease spinal muscular atrophy. Here we review the expanding role of SMN in the regulation of gene expression through its multiple functions in RNP assembly. We discuss advances in our understanding of SMN activity as a chaperone of RNPs and how disruption of SMN-dependent RNA pathways can cause motor neuron disease. PMID:24769255
Transcriptome and ultrastructural changes in dystrophic Epidermolysis bullosa resemble skin aging
Trost, Andrea; Weber, Manuela; Klausegger, Alfred; Gruber, Christina; Bruckner, Daniela; Reitsamer, Herbert A.; Bauer, Johann W.; Breitenbach, Michael
2015-01-01
The aging process of skin has been investigated recently with respect to mitochondrial function and oxidative stress. We have here observed striking phenotypic and clinical similarity between skin aging and recessive dystrophic Epidermolysis bullosa (RDEB), which is caused by recessive mutations in the gene coding for collagen VII, COL7A1. Ultrastructural changes, defects in wound healing, and inflammation markers are in part shared with aged skin. We have here compared the skin transcriptomes of young adults suffering from RDEB with that of sex‐ and age‐matched healthy probands. In parallel we have compared the skin transcriptome of healthy young adults with that of elderly healthy donors. Quite surprisingly, there was a large overlap of the two gene lists that concerned a limited number of functional protein families. Most prominent among the proteins found are a number of proteins of the cornified envelope or proteins mechanistically involved in cornification and other skin proteins. Further, the overlap list contains a large number of genes with a known role in inflammation. We are documenting some of the most prominent ultrastructural and protein changes by immunofluorescence analysis of skin sections from patients, old individuals, and healthy controls. PMID:26143532
Transcriptome and ultrastructural changes in dystrophic Epidermolysis bullosa resemble skin aging.
Breitenbach, Jenny S; Rinnerthaler, Mark; Trost, Andrea; Weber, Manuela; Klausegger, Alfred; Gruber, Christina; Bruckner, Daniela; Reitsamer, Herbert A; Bauer, Johann W; Breitenbach, Michael
2015-06-01
The aging process of skin has been investigated recently with respect to mitochondrial function and oxidative stress. We have here observed striking phenotypic and clinical similarity between skin aging and recessive dystrophic Epidermolysis bullosa (RDEB), which is caused by recessive mutations in the gene coding for collagen VII, COL7A1. Ultrastructural changes, defects in wound healing, and inflammation markers are in part shared with aged skin. We have here compared the skin transcriptomes of young adults suffering from RDEB with that of sex- and age-matched healthy probands. In parallel we have compared the skin transcriptome of healthy young adults with that of elderly healthy donors. Quite surprisingly, there was a large overlap of the two gene lists that concerned a limited number of functional protein families. Most prominent among the proteins found are a number of proteins of the cornified envelope or proteins mechanistically involved in cornification and other skin proteins. Further, the overlap list contains a large number of genes with a known role in inflammation. We are documenting some of the most prominent ultrastructural and protein changes by immunofluorescence analysis of skin sections from patients, old individuals, and healthy controls.
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-10-03
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-01-01
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274
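The abstract above does not spell out how the 45 entropy elements are computed, so the following is only an illustrative sketch under an assumed definition: each element is the Shannon entropy term -p*log2(p), where p is the fraction of a disease's known genes that fall in a given chromosome substructure. The function name, the substructure labels, and the toy gene list are all hypothetical, and only four substructures are used instead of 45.

```python
import math
from collections import Counter

def entropy_terms(gene_locations, substructures):
    """Per-substructure entropy terms -p*log2(p) for one disease.

    gene_locations: the substructure label of each known disease gene.
    substructures: the ordered list of labels that defines the vector.
    Assumed definition for illustration; not taken from the paper.
    """
    counts = Counter(gene_locations)
    total = len(gene_locations)
    terms = []
    for label in substructures:
        p = counts[label] / total if total else 0.0
        # By convention, an empty substructure contributes 0 to the entropy.
        terms.append(-p * math.log2(p) if p > 0 else 0.0)
    return terms

# Hypothetical toy input: four substructures, four disease genes.
substructures = ["1p", "1q", "6p", "21p"]
gene_locations = ["6p", "6p", "1q", "1p"]
vector = entropy_terms(gene_locations, substructures)
print(vector)
```

Summing the elements gives the ordinary Shannon entropy of the gene distribution, so a disease whose genes concentrate in one substructure yields a vector close to zero everywhere.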
Pan-cancer transcriptomic analysis associates long non-coding RNAs with key mutational driver events
Ashouri, Arghavan; Sayin, Volkan I.; Van den Eynden, Jimmy; Singh, Simranjit X.; Papagiannakopoulos, Thales; Larsson, Erik
2016-01-01
Thousands of long non-coding RNAs (lncRNAs) lie interspersed with coding genes across the genome, and a small subset has been implicated as downstream effectors in oncogenic pathways. Here we make use of transcriptome and exome sequencing data from thousands of tumours across 19 cancer types, to identify lncRNAs that are induced or repressed in relation to somatic mutations in key oncogenic driver genes. Our screen confirms known coding and non-coding effectors and also associates many new lncRNAs to relevant pathways. The associations are often highly reproducible across cancer types, and while many lncRNAs are co-expressed with their protein-coding hosts or neighbours, some are intergenic and independent. We highlight lncRNAs with possible functions downstream of the tumour suppressor TP53 and the master antioxidant transcription factor NFE2L2. Our study provides a comprehensive overview of lncRNA transcriptional alterations in relation to key driver mutational events in human cancers. PMID:28959951
Discovering Protein-Coding Genes from the Environment: Time for the Eukaryotes?
Marmeisse, Roland; Kellner, Harald; Fraissinet-Tachet, Laurence; Luis, Patricia
2017-09-01
Eukaryotic microorganisms from diverse environments encompass a large number of taxa, many of them still unknown to science. One strategy to mine these organisms for genes of biotechnological relevance is to use a pool of eukaryotic mRNA directly extracted from environmental samples. Recent reports demonstrate that the resulting metatranscriptomic cDNA libraries can be screened by expression in yeast for a wide range of genes and functions from many of the different eukaryotic taxa. In combination with novel emerging high-throughput technologies, we anticipate that this approach should contribute to exploring the functional diversity of the eukaryotic microbiota. Copyright © 2017 Elsevier Ltd. All rights reserved.
Whitaker, Weston R; Lee, Hanson; Arkin, Adam P; Dueber, John E
2015-03-20
Genetic sequences ported into non-native hosts for synthetic biology applications can gain unexpected properties. In this study, we explored sequences functioning as ribosome binding sites (RBSs) within protein coding DNA sequences (CDSs) that cause internal translation, resulting in truncated proteins. Genome-wide prediction of bacterial RBSs, based on biophysical calculations employed by the RBS calculator, suggests a selection against internal RBSs within CDSs in Escherichia coli, but not those in Saccharomyces cerevisiae. Based on these calculations, silent mutations aimed at removing internal RBSs can effectively reduce truncation products from internal translation. However, a solution for complete elimination of internal translation initiation is not always feasible due to constraints of available coding sequences. Fluorescence assays and Western blot analysis showed that in genes with internal RBSs, increasing the strength of the intended upstream RBS had little influence on the internal translation strength. Another strategy to minimize truncated products from an internal RBS is to increase the relative strength of the upstream RBS with a concomitant reduction in promoter strength to achieve the same protein expression level. Unfortunately, lower transcription levels result in increased noise at the single cell level due to stochasticity in gene expression. At the low expression regimes desired for many synthetic biology applications, this problem becomes particularly pronounced. We found that balancing promoter strengths and upstream RBS strengths to intermediate levels can achieve the target protein concentration while avoiding both excessive noise and truncated protein.
Applications of statistical physics and information theory to the analysis of DNA sequences
NASA Astrophysics Data System (ADS)
Grosse, Ivo
2000-10-01
DNA carries the genetic information of most living organisms, and the goal of genome projects is to uncover that genetic information. One basic task in the analysis of DNA sequences is the recognition of protein coding genes. Powerful computer programs for gene recognition have been developed, but most of them are based on statistical patterns that vary from species to species. In this thesis I address the question of whether there exist universal statistical patterns that are different in coding and noncoding DNA of all living species, regardless of their phylogenetic origin. In search of such species-independent patterns I study the mutual information function of genomic DNA sequences, and find that it shows persistent period-three oscillations. To understand the biological origin of the observed period-three oscillations, I compare the mutual information function of genomic DNA sequences to the mutual information function of stochastic model sequences. I find that the pseudo-exon model is able to reproduce the mutual information function of genomic DNA sequences. Moreover, I find that a generalization of the pseudo-exon model can connect the existence and the functional form of long-range correlations to the presence and the length distributions of coding and noncoding regions. Based on these theoretical studies I am able to find an information-theoretical quantity, the average mutual information (AMI), whose probability distributions are significantly different in coding and noncoding DNA, while they are almost identical in all studied species. These findings show that there exist universal statistical patterns that are different in coding and noncoding DNA of all studied species, and they suggest that the AMI may be used to identify genes in different living species, irrespective of their taxonomic origin.
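The period-three signal described above can be illustrated with a small sketch. It estimates the mutual information I(k) between nucleotides k positions apart from empirical pair frequencies, and applies it to a toy "coding-like" sequence in which only the first codon position is biased. The toy generator and its weights are assumptions made for illustration; this is not the pseudo-exon model or the AMI statistic from the thesis.

```python
import math
import random
from collections import Counter

def mutual_information(seq, k):
    """Mutual information (bits) between symbols k positions apart."""
    pairs = Counter(zip(seq, seq[k:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), count in pairs.items():
        left[a] += count   # marginal of the first symbol
        right[b] += count  # marginal of the second symbol
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), with p_a = left[a]/n and p_b = right[b]/n
        mi += p_ab * math.log2(count * n / (left[a] * right[b]))
    return mi

def toy_coding_sequence(n_codons):
    """Hypothetical toy model: G-rich first codon position, uniform elsewhere."""
    bases = "ACGT"
    out = []
    for _ in range(n_codons):
        out.append(random.choices(bases, weights=[0.1, 0.1, 0.7, 0.1])[0])
        out.append(random.choice(bases))
        out.append(random.choice(bases))
    return "".join(out)

random.seed(0)
seq = toy_coding_sequence(10000)
mi = {k: mutual_information(seq, k) for k in range(1, 10)}
for k, value in mi.items():
    print(k, round(value, 4))
```

In this toy model the values at lags 3, 6 and 9 should stand out above their neighbours, because positions a multiple of three apart share the same codon position and hence the same composition bias.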
Bain, Peter A; Papanicolaou, Alexie; Kumar, Anupama
2015-01-01
Murray-Darling rainbowfish (Melanotaenia fluviatilis [Castelnau, 1878]; Atheriniformes: Melanotaeniidae) is a small-bodied teleost currently under development in Australasia as a test species for aquatic toxicological studies. To date, efforts towards the development of molecular biomarkers of contaminant exposure have been hindered by the lack of available sequence data. To address this, we sequenced messenger RNA from brain, liver and gonads of mature male and female fish and generated a high-quality draft transcriptome using a de novo assembly approach. 149,742 clusters of putative transcripts were obtained, encompassing 43,841 non-redundant protein-coding regions. Deduced amino acid sequences were annotated by functional inference based on similarity with sequences from manually curated protein sequence databases. The draft assembly contained protein-coding regions homologous to 95.7% of the complete cohort of predicted proteins from the taxonomically related species, Oryzias latipes (Japanese medaka). The mean length of rainbowfish protein-coding sequences relative to their medaka homologues was 92.1%, indicating that despite the limited number of tissues sampled a large proportion of the total expected number of protein-coding genes was captured in the study. Because of our interest in the effects of environmental contaminants on endocrine pathways, we manually curated subsets of coding regions for putative nuclear receptors and steroidogenic enzymes in the rainbowfish transcriptome, revealing 61 candidate nuclear receptors encompassing all known subfamilies, and 41 putative steroidogenic enzymes representing all major steroidogenic enzymes occurring in teleosts. The transcriptome presented here will be a valuable resource for researchers interested in biomarker development, protein structure and function, and contaminant-response genomics in Murray-Darling rainbowfish.
Tian, Weidong; Zhang, Lan V; Taşan, Murat; Gibbons, Francis D; King, Oliver D; Park, Julie; Wunderlich, Zeba; Cherry, J Michael; Roth, Frederick P
2008-01-01
Background: Learning the function of genes is a major goal of computational genomics. Methods for inferring gene function have typically fallen into two categories: 'guilt-by-profiling', which exploits correlation between function and other gene characteristics; and 'guilt-by-association', which transfers function from one gene to another via biological relationships. Results: We have developed a strategy ('Funckenstein') that performs guilt-by-profiling and guilt-by-association and combines the results. Using a benchmark set of functional categories and input data for protein-coding genes in Saccharomyces cerevisiae, Funckenstein was compared with a previous combined strategy. Subsequently, we applied Funckenstein to 2,455 Gene Ontology terms. In the process, we developed 2,455 guilt-by-profiling classifiers based on 8,848 gene characteristics and 12 functional linkage graphs based on 23 biological relationships. Conclusion: Funckenstein outperforms a previous combined strategy using a common benchmark dataset. The combination of 'guilt-by-profiling' and 'guilt-by-association' gave significant improvement over the component classifiers, showing the greatest synergy for the most specific functions. Performance was evaluated by cross-validation and by literature examination of the top-scoring novel predictions. These quantitative predictions should help prioritize experimental study of yeast gene functions. PMID:18613951
Death of a dogma: eukaryotic mRNAs can code for more than one protein.
Mouilleron, Hélène; Delcourt, Vivian; Roucou, Xavier
2016-01-08
mRNAs carry the genetic information that is translated by ribosomes. The traditional view of a mature eukaryotic mRNA is a molecule with three main regions, the 5' UTR, the protein coding open reading frame (ORF) or coding sequence (CDS), and the 3' UTR. This concept assumes that ribosomes translate one ORF only, generally the longest one, and produce one protein. As a result, in the early days of genomics and bioinformatics, one CDS was associated with each protein-coding gene. This fundamental concept of a single CDS is being challenged by increasing experimental evidence indicating that annotated proteins are not the only proteins translated from mRNAs. In particular, mass spectrometry (MS)-based proteomics and ribosome profiling have detected productive translation of alternative open reading frames. In several cases, the alternative and annotated proteins interact. Thus, the expression of two or more proteins translated from the same mRNA may offer a mechanism to ensure the co-expression of proteins which have functional interactions. Translational mechanisms already described in eukaryotic cells indicate that the cellular machinery is able to translate different CDSs from a single viral or cellular mRNA. In addition to summarizing data showing that the protein coding potential of eukaryotic mRNAs has been underestimated, this review aims to challenge the single translated CDS dogma. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
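To make the contrast between the "one mRNA, one CDS" view and alternative ORFs concrete, here is a minimal sketch that enumerates every ATG-to-stop open reading frame in the three forward frames of a transcript. The toy transcript, the function name, and the minimum-length cutoff are hypothetical; real alternative-ORF annotation relies on experimental evidence such as ribosome profiling or mass spectrometry, which this sketch does not model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=2):
    """Return (start, end, frame) for each ATG-to-stop ORF on the forward strand.

    The sequence uses the DNA alphabet (T instead of U), as in cDNA records.
    Within a frame, only the first ATG after the previous stop opens an ORF;
    `end` is exclusive and includes the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Hypothetical toy transcript: 3-nt 5' UTR, a frame-0 ORF, an overlapping
# frame-1 ORF, and a 3-nt 3' UTR.
mrna = "GCC" + "ATGCATGGCCCGCTGAACTAA" + "GCC"
orfs = find_orfs(mrna)
longest = max(orfs, key=lambda orf: orf[1] - orf[0])
for start, end, frame in orfs:
    print(frame, start, end, mrna[start:end])
```

Under the traditional convention only `longest` would be annotated as the CDS; the remaining entries are the kind of overlapping or out-of-frame candidates that the review argues can also be translated.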
Meyer, Irmtraud M
2017-05-01
RNA transcripts are the primary products of active genes in any living organism, including many viruses. Their cellular destiny not only depends on primary sequence signals, but can also be determined by RNA structure. Recent experimental evidence shows that many transcripts can be assigned more than a single functional RNA structure throughout their cellular life and that structure formation happens co-transcriptionally, i.e. as the transcript is synthesised in the cell. Moreover, functional RNA structures are not limited to non-coding transcripts, but can also feature in coding transcripts. The picture that now emerges is that RNA structures constitute an additional layer of information that can be encoded in any RNA transcript (and on top of other layers of information such as protein-context) in order to exert a wide range of functional roles. Moreover, different encoded RNA structures can be expressed at different stages of a transcript's life in order to alter the transcript's behaviour depending on its actual cellular context. Similar to the concept of alternative splicing for protein-coding genes, where a single transcript can yield different proteins depending on cellular context, it is thus appropriate to propose the notion of alternative RNA structure expression for any given transcript. This review introduces several computational strategies that my group developed to detect different aspects of RNA structure expression in vivo. Two aspects are of particular interest to us: (1) RNA secondary structure features that emerge during co-transcriptional folding and (2) functional RNA structure features that are expressed at different times of a transcript's life and potentially mutually exclusive. Copyright © 2017. Published by Elsevier Inc.
Yu, Hong; Kong, Lingfeng; Li, Qi
2016-01-01
In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. The character-based method showed some advantages, especially in closely related taxa. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that combining mitochondrial genes did not enhance the efficacy of species identification and that a single mitochondrial gene would be fully competent.
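As a rough illustration of the distance-based approach mentioned above, the sketch below computes the uncorrected p-distance (proportion of differing sites) between aligned sequences and assigns a query barcode to the species of its nearest reference. The sequences, species labels, and function names are hypothetical, and the study may well have used a different distance model or assignment criterion.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nearest_species(query, reference):
    """Return (species, distance) for the reference barcode closest to the query."""
    scored = ((species, p_distance(query, seq)) for species, seq in reference.items())
    return min(scored, key=lambda item: item[1])

# Hypothetical toy reference library of aligned barcode fragments.
reference = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGGAC",
    "species_C": "TTGTACCTAA",
}
query = "ACGTACGTAA"
match = nearest_species(query, reference)
print(match)
```

A species is counted as recovered in this scheme when its nearest neighbour belongs to the same species; character-based methods instead look for diagnostic nucleotides at specific alignment positions.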
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits.
Larsson, John; Nylander, Johan Aa; Bergman, Birgitta
2011-06-30
Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities.
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets.
Genome fluctuations in cyanobacteria reflect evolutionary, developmental and adaptive traits
2011-01-01
Background: Cyanobacteria belong to an ancient group of photosynthetic prokaryotes with pronounced variations in their cellular differentiation strategies, physiological capacities and choice of habitat. Sequencing efforts have shown that genomes within this phylum are equally diverse in terms of size and protein-coding capacity. To increase our understanding of genomic changes in the lineage, the genomes of 58 contemporary cyanobacteria were analysed for shared and unique orthologs. Results: A total of 404 protein families, present in all cyanobacterial genomes, were identified. Two of these are unique to the phylum, corresponding to an AbrB family transcriptional regulator and a gene that escapes functional annotation although its genomic neighbourhood is conserved among the organisms examined. The evolution of cyanobacterial genome sizes involves a mix of gains and losses in the clade encompassing complex cyanobacteria, while a single event of reduction is evident in a clade dominated by unicellular cyanobacteria. Genome sizes and gene family copy numbers evolve at a higher rate in the former clade, and multi-copy genes were predominant in large genomes. Orthologs unique to cyanobacteria exhibiting specific characteristics, such as filament formation, heterocyst differentiation, diazotrophy and symbiotic competence, were also identified. An ancestral character reconstruction suggests that the most recent common ancestor of cyanobacteria had a genome size of approx. 4.5 Mbp and 1678 to 3291 protein-coding genes, 4%-6% of which are unique to cyanobacteria today. Conclusions: The different rates of genome-size evolution and multi-copy gene abundance suggest two routes of genome development in the history of cyanobacteria. The expansion strategy is driven by gene-family enlargement and generates a broad adaptive potential, while the genome streamlining strategy imposes adaptations to highly specific niches, also reflected in their different functional capacities.
A few genomes display extreme proliferation of non-coding nucleotides which is likely to be the result of initial expansion of genomes/gene copy number to gain adaptive potential, followed by a shift to a life-style in a highly specific niche (e.g. symbiosis). This transition results in redundancy of genes and gene families, leading to an increase in junk DNA and eventually to gene loss. A few orthologs can be correlated with specific phenotypes in cyanobacteria, such as filament formation and symbiotic competence; these constitute exciting exploratory targets. PMID:21718514
Connections Underlying Translation and mRNA Stability.
Radhakrishnan, Aditya; Green, Rachel
2016-09-11
Gene expression and regulation in organisms depend minimally on transcription by RNA polymerase and on the stability of the RNA product (for both coding and non-coding RNAs). For coding RNAs, gene expression is further influenced by the amount of translation by the ribosome and by the stability of the protein product. The stabilities of these two classes of RNA, non-coding and coding, vary considerably: tRNAs and rRNAs tend to be long lived while mRNAs tend to be more short lived. Even among mRNAs, however, there is a considerable range in stability (ranging from seconds to hours in bacteria and up to days in metazoans), suggesting a significant role for stability in the regulation of gene expression. Here, we review recent experiments from bacteria, yeast and metazoans indicating that the stability of most mRNAs is broadly impacted by the actions of ribosomes that translate them. Ribosomal recognition of defective mRNAs triggers "mRNA surveillance" pathways that target the mRNA for degradation [Shoemaker and Green (2012)]. More generally, even the stability of perfectly functional mRNAs appears to be dictated by overall rates of translation by the ribosome [Herrick et al. (1990), Presnyak et al. (2015)]. Given that mRNAs are synthesized for the purpose of being translated into proteins, it is reassuring that such intimate connections between mRNA and the ribosome can drive biological regulation. In closing, we consider the likelihood that these connections between protein synthesis and mRNA stability are widespread or whether other modes of regulation dominate the mRNA stability landscape in higher organisms. Copyright © 2016. Published by Elsevier Ltd.
The functional spectrum of low-frequency coding variation.
Marth, Gabor T; Yu, Fuli; Indap, Amit R; Garimella, Kiran; Gravel, Simon; Leong, Wen Fung; Tyler-Smith, Chris; Bainbridge, Matthew; Blackwell, Tom; Zheng-Bradley, Xiangqun; Chen, Yuan; Challis, Danny; Clarke, Laura; Ball, Edward V; Cibulskis, Kristian; Cooper, David N; Fulton, Bob; Hartl, Chris; Koboldt, Dan; Muzny, Donna; Smith, Richard; Sougnez, Carrie; Stewart, Chip; Ward, Alistair; Yu, Jin; Xue, Yali; Altshuler, David; Bustamante, Carlos D; Clark, Andrew G; Daly, Mark; DePristo, Mark; Flicek, Paul; Gabriel, Stacey; Mardis, Elaine; Palotie, Aarno; Gibbs, Richard
2011-09-14
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Thuan, Nguyen Huy; Dhakal, Dipesh; Pokhrel, Anaya Raj; Chu, Luan Luong; Van Pham, Thi Thuy; Shrestha, Anil; Sohng, Jae Kyung
2018-05-01
Streptomyces peucetius ATCC 27952 produces two major anthracyclines, doxorubicin (DXR) and daunorubicin (DNR), which are potent chemotherapeutic agents for the treatment of several cancers. To gain detailed insight into the genetics and biochemistry of the strain, the complete genome was determined and analyzed. The complete sequence contains 7187 protein-coding genes in a total of 8,023,114 bp, with 87% of the genome contributing to the protein-coding region. The genomic sequence included 18 rRNAs, 66 tRNAs, and 3 non-coding RNAs. In silico studies predicted ~68 biosynthetic gene clusters (BGCs) encoding diverse classes of secondary metabolites, including non-ribosomal peptide synthetase (NRPS), polyketide synthase (PKS I, II, and III), terpenes, and others. Detailed analysis of the genome sequence revealed versatile biocatalytic enzymes such as cytochrome P450 (CYP), electron transfer system (ETS) genes, methyltransferase (MT), and glycosyltransferase (GT). In addition, numerous functional genes (transporter gene, SOD, etc.) and regulatory genes (afsR-sp, metK-sp, etc.) involved in the regulation of secondary metabolites were found. This minireview summarizes the genome-based genome mining (GM) of diverse BGCs and genome exploration (GE) of versatile biocatalytic enzymes, and other enzymes involved in maintenance and regulation of metabolism of S. peucetius. The detailed analysis of genome sequence provides critically important knowledge useful in the bioengineering of the strain or harboring catalytically efficient enzymes for biotechnological applications.
Ramesh, S V
2013-09-01
Of late, non-coding RNA (ncRNA)-mediated gene silencing has become an influential tool deliberately deployed to negatively regulate the expression of targeted genes. In addition to the widely employed small interfering RNA (siRNA)-mediated gene silencing approach, other variants like artificial miRNA (amiRNA), miRNA mimics, and artificial trans-acting siRNAs (tasiRNAs) are being explored and successfully deployed in developing non-coding RNA-based genetically modified plants. ncRNA-based gene manipulations are typified by the mobile nature of silencing signals, interference from viral genome-derived suppressor proteins, and the need for meticulous computational analysis to prevent any inadvertent effects. In a broad sense, risk assessment inquiries for genetically modified plants based on the expression of ncRNAs are competently addressed by the environmental risk assessment (ERA) models currently in vogue, which were designed for first-generation transgenic plants based on the expression of heterologous proteins. Nevertheless, transgenic plants based on ncRNAs warrant due attention with respect to their unique attributes, such as off-target or non-target gene silencing effects, persistence of small RNAs (sRNAs), food and feed safety assessments, problems in the detection and tracking of sRNAs in food, the impact of ncRNAs on plant protection measures, and the effect of mutations. The role of recent developments in sequencing techniques such as next generation sequencing (NGS) and the ERA paradigms currently in vogue in different countries are also discussed in the context of ncRNA-based gene manipulations.
Yang, Hai-Ling; Liu, Yan-Jing; Wang, Cai-Ling; Zeng, Qing-Yin
2012-01-01
Trehalose-6-phosphate synthase (TPS) plays important roles in trehalose metabolism and signaling. Plant TPS proteins contain both a TPS and a trehalose-6-phosphate phosphatase (TPP) domain, which are coded by a multi-gene family. The plant TPS gene family has been divided into class I and class II. A previous study showed that the Populus, Arabidopsis, and rice genomes have seven class I and 27 class II TPS genes. In this study, we found that all class I TPS genes had 16 introns within the protein-coding region, whereas class II TPS genes had two introns. A significant sequence difference between the two classes of TPS proteins was observed by pairwise sequence comparisons of the 34 TPS proteins. A phylogenetic analysis revealed that at least seven TPS genes were present in the monocot–dicot common ancestor. Segmental duplications contributed significantly to the expansion of this gene family. At least five and three TPS genes were created by segmental duplication events in the Populus and rice genomes, respectively. Both the TPS and TPP domains of 34 TPS genes have evolved under purifying selection, but the selective constraint on the TPP domain was more relaxed than that on the TPS domain. Among 34 TPS genes from Populus, Arabidopsis, and rice, four class I TPS genes (AtTPS1, OsTPS1, PtTPS1, and PtTPS2) were under stronger purifying selection, whereas three Arabidopsis class I TPS genes (AtTPS2, 3, and 4) apparently evolved under relaxed selective constraint. Additionally, a reverse transcription polymerase chain reaction analysis showed the expression divergence of the TPS gene family in Populus, Arabidopsis, and rice under normal growth conditions and in response to stressors. Our findings provide new insights into the mechanisms of gene family expansion and functional evolution. PMID:22905132
Using a Euclid distance discriminant method to find protein coding genes in the yeast genome.
Zhang, Chun-Ting; Wang, Ju; Zhang, Ren
2002-02-01
The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or eukaryotic genomes with fewer introns. Six-fold cross-validation tests have demonstrated that the accuracy of the algorithm is better than 93%. Based on this, it is found that the total number of protein coding genes in the yeast genome is at most 5579, about 3.8-7.0% less than the currently widely accepted figure of 5800-6000. The base compositions at three codon positions are analyzed in detail using a graphic method. The result shows that the preferred codons adopted by yeast genes are of the RGW type, where R, G and W indicate the bases of purine, non-G and A/T, whereas the 'codons' in the intergenic sequences are of the form NNN, where N denotes any base. This fact constitutes the basis of the algorithm to distinguish between coding and non-coding ORFs in the yeast genome. The names of putative non-coding ORFs are listed here in detail.
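The idea described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each ORF is summarized by a 12-component vector (frequency of A, C, G, T at each of the three codon positions) and is assigned to whichever class centroid is nearer in Euclidean distance. The function names and the nearest-centroid decision rule are assumptions made for illustration only.

```python
from collections import Counter
import math

BASES = "ACGT"

def codon_position_freqs(orf):
    """Return 12 values: frequencies of A, C, G, T at codon positions 1, 2 and 3."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    vec = []
    for pos in range(3):
        counts = Counter(orf[pos:usable:3])
        total = sum(counts[b] for b in BASES) or 1  # avoid division by zero
        vec.extend(counts[b] / total for b in BASES)
    return vec

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclid(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(orf, coding_centroid, noncoding_centroid):
    """Hypothetical decision rule: label by the nearer class centroid."""
    x = codon_position_freqs(orf)
    if euclid(x, coding_centroid) < euclid(x, noncoding_centroid):
        return "coding"
    return "non-coding"
```

In practice the two centroids would be estimated from training sets of known coding ORFs and intergenic sequences; a uniform vector of 0.25 values corresponds to the NNN composition the abstract attributes to intergenic 'codons'.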
Benoit, Joshua B; Attardo, Geoffrey M; Michalkova, Veronika; Krause, Tyler B; Bohova, Jana; Zhang, Qirui; Baumann, Aaron A; Mireji, Paul O; Takáč, Peter; Denlinger, David L; Ribeiro, Jose M; Aksoy, Serap
2014-04-01
In tsetse flies, nutrients for intrauterine larval development are synthesized by the modified accessory gland (milk gland) and provided in mother's milk during lactation. Interference with at least two milk proteins has been shown to extend larval development and reduce fecundity. The goal of this study was to perform a comprehensive characterization of tsetse milk proteins using lactation-specific transcriptome/milk proteome analyses and to define functional role(s) for the milk proteins during lactation. Differential analysis of RNA-seq data from lactating and dry (non-lactating) females revealed enrichment of transcripts coding for protein synthesis machinery, lipid metabolism and secretory proteins during lactation. Among the genes induced during lactation were those encoding the previously identified milk proteins (milk gland proteins 1-3, transferrin and acid sphingomyelinase 1) and seven new genes (mgp4-10). The genes encoding mgp2-10 are organized on a 40 kb syntenic block in the tsetse genome, have similar exon-intron arrangements, and share regions of amino acid sequence similarity. Expression of mgp2-10 is female-specific and high during milk secretion. While knockdown of a single mgp failed to reduce fecundity, simultaneous knockdown of multiple variants reduced milk protein levels and lowered fecundity. The genomic localization, gene structure similarities, and functional redundancy of MGP2-10 suggest that they constitute a novel highly divergent protein family. Our data indicates that MGP2-10 function both as the primary amino acid resource for the developing larva and in the maintenance of milk homeostasis, similar to the function of the mammalian casein family of milk proteins. This study underscores the dynamic nature of the lactation cycle and identifies a novel family of lactation-specific proteins, unique to Glossina sp., that are essential to larval development. 
The specificity of MGP2-10 to tsetse and their critical role during lactation suggest that these proteins may be an excellent target for tsetse-specific population control approaches.
Detecting long tandem duplications in genomic sequences.
Audemard, Eric; Schiex, Thomas; Faraut, Thomas
2012-05-08
Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. ReD Tandem allows large tandem duplications to be identified without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help improve existing annotations.
Rossmassler, Karen; Dietrich, Carsten; Thompson, Claire; Mikaelyan, Aram; Nonoh, James O; Scheffrahn, Rudolf H; Sillam-Dussès, David; Brune, Andreas
2015-11-26
Termites are important contributors to carbon and nitrogen cycling in tropical ecosystems. Higher termites digest lignocellulose in various stages of humification with the help of an entirely prokaryotic microbiota housed in their compartmented intestinal tract. Previous studies revealed fundamental differences in community structure between compartments, but the functional roles of individual lineages in symbiotic digestion are mostly unknown. Here, we conducted a highly resolved analysis of the gut microbiota in six species of higher termites that feed on plant material at different levels of humification. Combining amplicon sequencing and metagenomics, we assessed similarities in community structure and functional potential between the major hindgut compartments (P1, P3, and P4). Cluster analysis of the relative abundances of orthologous gene clusters (COGs) revealed high similarities among wood- and litter-feeding termites and strong differences to humivorous species. However, abundance estimates of bacterial phyla based on 16S rRNA genes greatly differed from those based on protein-coding genes. Community structure and functional potential of the microbiota in individual gut compartments are clearly driven by the digestive strategy of the host. The metagenomics libraries obtained in this study provide the basis for future studies that elucidate the fundamental differences in the symbiont-mediated breakdown of lignocellulose and humus by termites of different feeding groups. The high proportion of uncultured bacterial lineages in all samples calls for a reference-independent approach for the correct taxonomic assignment of protein-coding genes.
Yu, Hua; Jiao, Bingke; Lu, Lu; Wang, Pengfei; Chen, Shuangcheng; Liang, Chengzhi; Liu, Wei
2018-01-01
Accurately reconstructing gene co-expression networks is of great importance for uncovering the genetic architecture underlying complex and various phenotypes. The recent availability of high-throughput RNA-seq sequencing has made genome-wide detection and quantification of novel, rare and low-abundance transcripts practical. However, its potential merits in reconstructing gene co-expression networks have still not been well explored. Using massive-scale RNA-seq samples, we have designed an ensemble pipeline, called NetMiner, for building genome-scale and high-quality Gene Co-expression Networks (GCNs) by integrating three frequently used inference algorithms. We constructed an RNA-seq-based GCN in one species of monocot, rice. The quality of the network obtained by our method was verified and evaluated against curated gene functional association data sets, and it clearly outperformed each single method. In addition, the powerful capability of the network for associating genes with functions and agronomic traits was shown by enrichment analysis and case studies. In particular, we demonstrated the potential value of our proposed method to predict the biological roles of unknown protein-coding genes, long non-coding RNA (lncRNA) genes and circular RNA (circRNA) genes. Our results provide a valuable and highly reliable data source to select key candidate genes for subsequent experimental validation. To facilitate identification of novel genes regulating important biological processes and phenotypes in other plants or animals, we have published the source code of NetMiner, making it freely available at https://github.com/czllab/NetMiner.
The Prx1 limb enhancers: targeted gene expression in developing zebrafish pectoral fins.
Hernández-Vega, Amayra; Minguillón, Carolina
2011-08-01
Limbs represent an excellent model to study the induction, growth, and patterning of several organs. A breakthrough to study gene function in various tissues has been the characterization of regulatory elements that allow tissue-specific interference of gene function. The mouse Prx1 promoter has been used to generate limb-specific mutants and overexpress genes in tetrapod limbs. Although zebrafish possess advantages that favor their use to study limb morphogenesis, there is no driver described suitable for specifically interfering with gene function in developing fins. We report the generation of zebrafish lines that express enhanced green fluorescent protein (EGFP) driven by the mouse Prx1 enhancer in developing pectoral fins. We also describe the expression pattern of the zebrafish prrx1 genes and identify three conserved non-coding elements (CNEs) that we use to generate fin-specific EGFP reporter lines. Finally, we show that the mouse and zebrafish regulatory elements may be used to modify gene function in pectoral fins. Copyright © 2011 Wiley-Liss, Inc.
Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark
2010-01-01
The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI–carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown. PMID:20808891
Olbermann, Patrick; Josenhans, Christine; Moodley, Yoshan; Uhr, Markus; Stamer, Christiana; Vauterin, Marc; Suerbaum, Sebastian; Achtman, Mark; Linz, Bodo
2010-08-19
The Helicobacter pylori cag pathogenicity island (cagPAI) encodes a type IV secretion system. Humans infected with cagPAI-carrying H. pylori are at increased risk for sequelae such as gastric cancer. Housekeeping genes in H. pylori show considerable genetic diversity; but the diversity of virulence factors such as the cagPAI, which transports the bacterial oncogene CagA into host cells, has not been systematically investigated. Here we compared the complete cagPAI sequences for 38 representative isolates from all known H. pylori biogeographic populations. Their gene content and gene order were highly conserved. The phylogeny of most cagPAI genes was similar to that of housekeeping genes, indicating that the cagPAI was probably acquired only once by H. pylori, and its genetic diversity reflects the isolation by distance that has shaped this bacterial species since modern humans migrated out of Africa. Most isolates induced IL-8 release in gastric epithelial cells, indicating that the function of the Cag secretion system has been conserved despite some genetic rearrangements. More than one third of cagPAI genes, in particular those encoding cell-surface exposed proteins, showed signatures of diversifying (Darwinian) selection at more than 5% of codons. Several unknown gene products predicted to be under Darwinian selection are also likely to be secreted proteins (e.g. HP0522, HP0535). One of these, HP0535, is predicted to code for either a new secreted candidate effector protein or a protein which interacts with CagA because it contains two genetic lineages, similar to cagA. Our study provides a resource that can guide future research on the biological roles and host interactions of cagPAI proteins, including several whose function is still unknown.
Greenwald, Scott H.; Kuchenbecker, James A.; Rowlan, Jessica S.; Neitz, Jay; Neitz, Maureen
2017-01-01
Purpose: Human long (L) and middle (M) wavelength cone opsin genes are highly variable due to intermixing. Two L/M cone opsin interchange mutants, designated LIAVA and LVAVA, are associated with clinical diagnoses, including red-green color vision deficiency, blue cone monochromacy, cone degeneration, myopia, and Bornholm Eye Disease. Because the protein and splicing codes are carried by the same nucleotides, intermixing L and M genes can cause disease by affecting protein structure and splicing. Methods: Genetically engineered mice were created to allow investigation of the consequences of altered protein structure alone, and the effects on cone morphology were examined using immunohistochemistry. In humans and mice, cone function was evaluated using the electroretinogram (ERG) under L/M- or short (S) wavelength cone isolating conditions. Effects of LIAVA and LVAVA genes on splicing were evaluated using a minigene assay. Results: ERGs and histology in mice revealed protein toxicity for the LVAVA but not for the LIAVA opsin. Minigene assays showed that the dominant messenger RNA (mRNA) was aberrantly spliced for both variants; however, the LVAVA gene produced a small but significant amount of full-length mRNA and LVAVA subjects had correspondingly reduced ERG amplitudes. In contrast, the LIAVA subject had no L/M cone ERG. Conclusions: Dramatic differences in phenotype can result from seemingly minor differences in genotype through divergent effects on the dual amino acid and splicing codes. Translational Relevance: The mechanism by which individual mutations contribute to clinical phenotypes provides valuable information for diagnosis and prognosis of vision disorders associated with L/M interchange mutations, and it informs strategies for developing therapies. PMID:28516000
Does CTCF mediate between nuclear organization and gene expression?
Ohlsson, Rolf; Lobanenkov, Victor; Klenova, Elena
2010-01-01
The multifunctional zinc-finger protein CCCTC-binding factor (CTCF) is a very strong candidate for the role of coordinating the expression level of coding sequences with their three-dimensional position in the nucleus, apparently responding to a "code" in the DNA itself. Dynamic interactions between chromatin fibers in the context of nuclear architecture have been implicated in various aspects of genome functions. However, the molecular basis of these interactions still remains elusive and is a subject of intense debate. Here we discuss the nature of CTCF-DNA interactions, the CTCF-binding specificity to its binding sites and the relationship between CTCF and chromatin, and we examine data linking CTCF with gene regulation in the three-dimensional nuclear space. We discuss why these features render CTCF a very strong candidate for the role and propose a unifying model, the "CTCF code," explaining the mechanistic basis of how the information encrypted in DNA may be interpreted by CTCF into diverse nuclear functions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilchrist, Michael J.; Sobral, Daniel; Khoueiry, Pierre
Genome-wide resources, such as collections of cDNA clones encoding for complete proteins (full-ORF clones), are crucial tools for studying the evolution of gene function and genetic interactions. Non-model organisms, in particular marine organisms, provide a rich source of functional diversity. Marine organism genomes are, however, frequently highly polymorphic and encode proteins that diverge significantly from those of well-annotated model genomes. The construction of full-ORF clone collections from non-model organisms is hindered by the difficulty of predicting accurately the N-terminal ends of proteins, and distinguishing recent paralogs from highly polymorphic alleles. We also report a computational strategy that overcomes these difficulties, and allows for accurate gene level clustering of transcript data followed by the automated identification of full-ORFs with correct 5'- and 3'-ends. It is robust to polymorphism, includes paralog calling and does not require evolutionary proximity to well annotated model organisms. Here, we developed this pipeline for the ascidian Ciona intestinalis, a highly polymorphic member of the divergent sister group of the vertebrates, emerging as a powerful model organism to study chordate gene function, Gene Regulatory Networks and molecular mechanisms underlying human pathologies. Furthermore, using this pipeline we have generated the first full-ORF collection for a highly polymorphic marine invertebrate. It contains 19,163 full-ORF cDNA clones covering 60% of Ciona coding genes, and full-ORF orthologs for approximately half of curated human disease-associated genes.
Variant discovery in the sheep milk transcriptome using RNA sequencing.
Suárez-Vega, Aroa; Gutiérrez-Gil, Beatriz; Klopp, Christophe; Tosser-Klopp, Gwenola; Arranz, Juan José
2017-02-15
The identification of genetic variation underlying desired phenotypes is one of the main challenges of current livestock genetic research. High-throughput transcriptome sequencing (RNA-Seq) offers new opportunities for the detection of transcriptome variants (SNPs and short indels) in different tissues and species. In this study, we used RNA-Seq on Milk Sheep Somatic Cells (MSCs) with the goal of characterizing the genetic variation within the coding regions of the milk transcriptome in Churra and Assaf sheep, two common dairy sheep breeds farmed in Spain. A total of 216,637 variants were detected in the MSCs transcriptome of the eight ewes analyzed. Among them, a total of 57,795 variants were detected in the regions harboring Quantitative Trait Loci (QTL) for milk yield, protein percentage and fat percentage, of which 21.44% were novel variants. Among the total variants detected, 561 (2.52%) and 1,649 (7.42%) were predicted to produce high or moderate impact changes in the corresponding transcriptional unit, respectively. In the functional enrichment analysis of the genes positioned within selected QTL regions harboring novel relevant functional variants (high and moderate impact), the KEGG pathway with the highest enrichment was "protein processing in endoplasmic reticulum". Additionally, a total of 504 and 1,063 variants were identified in the genes encoding principal milk proteins and molecules involved in the lipid metabolism, respectively. Of these variants, 20 mutations were found to have putative relevant effects on the encoded proteins. We present herein the first transcriptomic approach aimed at identifying genetic variants of the genes expressed in the lactating mammary gland of sheep. Through the transcriptome analysis of variability within regions harboring QTL for milk yield, protein percentage and fat percentage, we have found several pathways and genes that harbor mutations that could affect dairy production traits. 
Moreover, remarkable variants were also found in candidate genes coding for major milk proteins and proteins related to milk fat metabolism. Several of the SNPs found in this study could be included as suitable markers in genotyping platforms or custom SNP arrays to perform association analyses in commercial populations and apply genomic selection protocols in the dairy production industry.
Mohandesan, Elmira; Fitak, Robert R; Corander, Jukka; Yadamsuren, Adiya; Chuluunbat, Battsetseg; Abdelhadi, Omer; Raziq, Abdul; Nagy, Peter; Stalder, Gabrielle; Walzer, Chris; Faye, Bernard; Burger, Pamela A
2017-08-30
The genus Camelus is an interesting model to study adaptive evolution in the mitochondrial genome, as the three extant Old World camel species inhabit hot and low-altitude as well as cold and high-altitude deserts. We sequenced 24 camel mitogenomes and combined them with three previously published sequences to study the role of natural selection under different environmental pressure, and to advance our understanding of the evolutionary history of the genus Camelus. We confirmed the heterogeneity of divergence across different components of the electron transport system. Lineage-specific analysis of mitochondrial protein evolution revealed a significant effect of purifying selection in the concatenated protein-coding genes in domestic Bactrian camels. The estimated dN/dS < 1 in the concatenated protein-coding genes suggested purifying selection as driving force for shaping mitogenome diversity in camels. Additional analyses of the functional divergence in amino acid changes between species-specific lineages indicated fixed substitutions in various genes, with radical effects on the physicochemical properties of the protein products. The evolutionary time estimates revealed a divergence between domestic and wild Bactrian camels around 1.1 [0.58-1.8] million years ago (mya). This has major implications for the conservation and management of the critically endangered wild species, Camelus ferus.
Fellner, Lea; Simon, Svenja; Scherling, Christian; Witting, Michael; Schober, Steffen; Polte, Christine; Schmitt-Kopplin, Philippe; Keim, Daniel A; Scherer, Siegfried; Neuhaus, Klaus
2015-12-18
Gene duplication is believed to be the classical way to form novel genes, but overprinting may be an important alternative. Overprinting allows entirely novel proteins to evolve de novo, i.e., formerly non-coding open reading frames within functional genes become expressed. Only three cases have been described for Escherichia coli. Here, a fourth example is presented. RNA sequencing revealed an open reading frame weakly transcribed in cow dung, coding for 101 residues and embedded completely in the -2 reading frame of citC in enterohemorrhagic E. coli. This gene is designated novel overlapping gene, nog1. The promoter region fused to gfp exhibits specific activities and 5' rapid amplification of cDNA ends indicated the transcriptional start 40-bp upstream of the start codon. nog1 was strand-specifically arrested in translation by a nonsense mutation silent in citC. This Nog1-mutant showed a phenotype in competitive growth against wild type in the presence of MgCl2. Small differences in metabolite concentrations were also found. Bioinformatic analyses propose Nog1 to be inner membrane-bound and to possess at least one membrane-spanning domain. A phylogenetic analysis suggests that the orphan gene nog1 arose by overprinting after Escherichia/Shigella separated from the other γ-proteobacteria. Since nog1 is of recent origin, non-essential, short, weakly expressed and only marginally involved in E. coli's central metabolism, we propose that this gene is in an initial stage of evolution. While we present specific experimental evidence for the existence of a fourth overlapping gene in enterohemorrhagic E. coli, we believe that this may be an initial finding only and overlapping genes in bacteria may be more common than is currently assumed by microbiologists.
Kihara, Takahiro; Hiroe, Ayaka; Ishii-Hyakutake, Manami; Mizuno, Kouhei; Tsuge, Takeharu
2017-08-01
Bacillus cereus and Bacillus megaterium both accumulate polyhydroxyalkanoate (PHA) but their PHA biosynthetic gene (pha) clusters that code for proteins involved in PHA biosynthesis are different. Namely, a gene encoding MaoC-like protein exists in the B. cereus-type pha cluster but not in the B. megaterium-type pha cluster. MaoC-like protein has an R-specific enoyl-CoA hydratase (R-hydratase) activity and is referred to as PhaJ when involved in PHA metabolism. In this study, the pha cluster of B. cereus YB-4 was characterized in terms of PhaJ's function. In an in vitro assay, PhaJ from B. cereus YB-4 (PhaJ_YB4) exhibited hydration activity toward crotonyl-CoA. In an in vivo assay using Escherichia coli as a host for PHA accumulation, the recombinant strain expressing PhaJ_YB4 and PHA synthase led to increased PHA accumulation, suggesting that PhaJ_YB4 functioned as a monomer supplier. The monomer composition of the accumulated PHA reflected the substrate specificity of PhaJ_YB4, which appeared to prefer short chain-length substrates. The pha cluster from B. cereus YB-4 functioned to accumulate PHA in E. coli; however, it did not function when the phaJ_YB4 gene was deleted. The B. cereus-type pha cluster represents a new example of a pha cluster that contains the gene encoding PhaJ.
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator
Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.
2012-01-01
While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kb in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738
2011-01-01
Background: Nucleoside diphosphate kinases (NDPKs) are evolutionarily conserved enzymes present in Bacteria, Archaea and Eukarya, with human Nme1 the most studied representative of the family and the first identified metastasis suppressor. Sponges (Porifera) are simple metazoans without tissues, closest to the common ancestor of all animals. They changed little during evolution and probably provide the best insight into the metazoan ancestor's genomic features. Recent studies show that sponges have a wide repertoire of genes, many of which are involved in diseases in more complex metazoans. The original function of those genes and the way it has evolved in the animal lineage is largely unknown. Here we report new results on the metastasis suppressor gene/protein homolog from the marine sponge Suberites domuncula, NmeGp1Sd. The purpose of this study was to investigate the properties of the sponge Group I Nme gene and protein, and compare it to its human homolog in order to elucidate the evolution of the structure and function of Nme. Results: We found that sponge genes coding for Group I Nme protein are intron-rich. Furthermore, we discovered that the sponge NmeGp1Sd protein has a similar level of kinase activity as its human homolog Nme1, does not cleave negatively supercoiled DNA and shows nonspecific DNA-binding activity. The sponge NmeGp1Sd forms a hexamer, like human Nme1, and all other eukaryotic Nme proteins. NmeGp1Sd interacts with human Nme1 in human cells and exhibits the same subcellular localization. Stable clones expressing sponge NmeGp1Sd inhibited the migratory potential of CAL 27 cells, as already reported for human Nme1, which suggests that Nme's function in migratory processes was engaged long before the composition of true tissues.
Conclusions This study suggests that the ancestor of all animals possessed a NmeGp1 protein with properties and functions similar to evolutionarily recent versions of the protein, even before the appearance of true tissues and the origin of tumors and metastasis. PMID:21457554
Delineating slowly and rapidly evolving fractions of the Drosophila genome.
Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S
2008-05-01
Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.
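As an illustrative aside, the idea of choosing the number of classes with an information criterion can be sketched with the textbook Akaike Information Criterion. This is only a minimal sketch of standard AIC, not the modified criterion the authors describe; the function names and the log-likelihood values are hypothetical and invented for illustration.

```python
def aic(log_likelihood, n_params):
    """Textbook AIC: 2k - 2 ln(L); lower values are preferred."""
    return 2 * n_params - 2 * log_likelihood


def select_num_classes(fits):
    """Return the candidate class count with the lowest AIC.

    fits maps a candidate number of classes to a
    (log_likelihood, n_params) pair for the fitted model.
    """
    return min(fits, key=lambda k: aic(*fits[k]))


# Hypothetical fits, invented for illustration only (not from the paper).
example_fits = {
    1: (-1050.0, 2),
    2: (-1000.0, 5),
    3: (-990.0, 8),
    4: (-989.5, 11),
}
best = select_num_classes(example_fits)
```

With these made-up numbers the fourth class improves the likelihood too little to justify its extra parameters, which is the trade-off any AIC-like criterion is meant to capture.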
Mu, Huawei; Sun, Jin; Heras, Horacio; Chu, Ka Hou; Qiu, Jian-Wen
2017-02-23
Proteins of the egg perivitelline fluid (PVF) that surrounds the embryo are critical for embryonic development in many animals, but little is known about their identities. Using an integrated proteomic and transcriptomic approach, we identified 64 proteins from the PVF of Pomacea maculata, a freshwater snail adopting aerial oviposition. Proteins were classified into eight functional groups: major multifunctional perivitellin subunits, immune response, energy metabolism, protein degradation, oxidation-reduction, signaling and binding, transcription and translation, and others. Comparison of gene expression levels between tissues showed that 22 PVF genes were exclusively expressed in albumen gland, the female organ that secretes PVF. Base substitution analysis of PVF and housekeeping genes between P. maculata and its closely related species Pomacea canaliculata showed that the reproductive proteins had a higher mean evolutionary rate. Predicted 3D structures of selected PVF proteins showed that some nonsynonymous substitutions are located at or near the binding regions that may affect protein function. The proteome and sequence divergence analysis revealed a substantial amount of maternal investment in embryonic nutrition and defense, and higher adaptive selective pressure on PVF protein-coding genes when compared with housekeeping genes, providing insight into the adaptations associated with the unusual reproductive strategy in these mollusks. There has been great interest in studying reproduction-related proteins as such studies may not only answer fundamental questions about speciation and evolution, but also solve practical problems of animal infertility and pest outbreak. 
Our study has demonstrated the effectiveness of an integrated proteomic and transcriptomic approach in understanding the heavy maternal investment of proteins in the eggs of a non-model snail, and how the reproductive proteins may have evolved during the transition from laying underwater eggs to aerial eggs. Copyright © 2017 Elsevier B.V. All rights reserved.
Cai, Kexin; Wang, Jiawen; Wang, Min; Zhang, Hui; Wang, Siming; Zhao, Yu
2016-07-01
To establish an efficient expression system for a fusion protein GST-pgLTP (Lipid Transfer Protein) and to test its antifungal activity. The nucleotide sequence of LTP gene was obtained from Panax ginseng using RT-PCR. The ORF of the cDNA is 363 bp, coding for a protein of 120 amino acids with a calculated MW of 12.09 kDa. The pgLTP gene with a His6-tag at the C-terminus was cloned into the pGEX-6p1 vector to generate a GST-fusion pgLTP protein construct that was expressed in Escherichia coli Rosetta. Following purification by Ni-NTA, the fusion protein exhibited antifungal activity against five fungi found in ginseng. The fusion protein GST-pgLTP has activity against a broad spectrum of phytopathogenic fungi, and can potentially be adapted for production to combat fungal diseases that affect P. ginseng.
Stotz, Henrik U; Harvey, Pascoe J; Haddadi, Parham; Mashanova, Alla; Kukol, Andreas; Larkan, Nicholas J; Borhan, M Hossein; Fitt, Bruce D L
2018-01-01
Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens. These findings demonstrate involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-01-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not yet been determined. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Analysis of protein function in clinical C. albicans isolates
Gerami-Nejad, Maryam; Forche, Anja; McClellan, Mark; Berman, Judith
2012-01-01
Clinical isolates are prototrophic and hence are not amenable to genetic manipulation using nutritional markers. Here we describe a new set of plasmids carrying the NAT1 (nourseothricin) drug resistance marker (Shen et al., 2005) that can be used both in clinical isolates and in laboratory strains. We constructed novel plasmids containing HA-NAT1 or MYC-NAT1 cassettes to facilitate PCR-mediated construction of strains with C-terminal epitope-tagged proteins and a NAT1-pMet3-GFP plasmid to enable conditional expression of proteins with or without the green fluorescent protein fused at the N-terminus. Furthermore, for proteins that require both the endogenous N- and C-termini for function, we have constructed a GF-NAT1-FP cassette carrying truncated alleles that facilitate insertion of an intact, single copy of GFP internal to the coding sequence. In addition, GFP-NAT1, RFP-NAT1, and M-Cherry-NAT1 plasmids were constructed expressing two differently labeled gene products for the study of protein co-expression and co-localization in vivo. Together, these vectors provide a useful set of genetic tools for studying diverse aspects of gene function in C. albicans clinical as well as laboratory strains. PMID:22777821
LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources.
Karchin, Rachel; Diekhans, Mark; Kelly, Libusha; Thomas, Daryl J; Pieper, Ursula; Eswar, Narayanan; Haussler, David; Sali, Andrej
2005-06-15
The NCBI dbSNP database lists over 9 million single nucleotide polymorphisms (SNPs) in the human genome, but currently contains limited annotation information. SNPs that result in amino acid residue changes (nsSNPs) are of critical importance in variation between individuals, including disease and drug sensitivity. We have developed LS-SNP, a genomic scale software pipeline to annotate nsSNPs. LS-SNP comprehensively maps nsSNPs onto protein sequences, functional pathways and comparative protein structure models, and predicts positions where nsSNPs destabilize proteins, interfere with the formation of domain-domain interfaces, have an effect on protein-ligand binding or severely impact human health. It currently annotates 28,043 validated SNPs that produce amino acid residue substitutions in human proteins from the SwissProt/TrEMBL database. Annotations can be viewed via a web interface either in the context of a genomic region or by selecting sets of SNPs, genes, proteins or pathways. These results are useful for identifying candidate functional SNPs within a gene, haplotype or pathway and in probing molecular mechanisms responsible for functional impacts of nsSNPs. Availability: http://www.salilab.org/LS-SNP. Contact: rachelk@salilab.org. Supplementary information: http://salilab.org/LS-SNP/supp-info.pdf.
Tsiagkas, Giannis; Nikolaou, Christoforos; Almirantis, Yannis
2014-12-01
CpG Islands (CGIs) are compositionally defined short genomic stretches, which have been studied in the human, mouse, chicken and later in several other genomes. Initially, they were assigned the role of transcriptional regulation of protein-coding genes, especially the house-keeping ones, while more recently evidence has been found that they are involved in several other functions as well, which might include regulation of the expression of RNA genes, DNA replication, etc. Here, an investigation of their distributional characteristics in a variety of genomes is undertaken for both whole CGI populations and CGI subsets that lie away from known genes (gene-unrelated or "orphan" CGIs). In both cases power-law-like linearity in double logarithmic scale is found. An evolutionary model, initially put forward for the explanation of a similar pattern found in gene populations, is implemented. It includes segmental duplication events and eliminations of most of the duplicated CGIs, while a moderate rate of non-duplicated CGI eliminations is also applied in some cases. Simulations reproduce all the main features of the observed inter-CGI chromosomal size distributions. Our results on power-law-like linearity found in orphan CGI populations suggest that the observed distributional pattern is independent of the analogous pattern that protein coding segments were reported to follow. The power-law-like patterns in the genomic distributions of CGIs described herein are found to be compatible with several other features of the composition, abundance or functional role of CGIs reported in the current literature across several genomes, on the basis of the proposed evolutionary model. Copyright © 2014 Elsevier Ltd. All rights reserved.
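To make "power-law-like linearity in double logarithmic scale" concrete: if counts scale as a power of size, the points fall on a straight line after taking logarithms of both axes, and the slope of that line estimates the exponent. The sketch below fits that line by ordinary least squares. It is a generic illustration, not the authors' procedure; the function name and the synthetic data are hypothetical.

```python
import math


def loglog_slope(sizes, counts):
    """Least-squares line through (log10 size, log10 count).

    Returns (slope, intercept); for count = c * size**a the
    slope estimates a and the intercept estimates log10(c).
    """
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data following count = 100 * size**-2 (illustration only).
sizes = [1, 10, 100, 1000]
counts = [100 * s ** -2 for s in sizes]
slope, intercept = loglog_slope(sizes, counts)
```

On real distance distributions the fit would normally be restricted to the range where the log-log plot actually looks linear, which is why the abstract speaks of power-law-like rather than exact power-law behaviour.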
Ma, Ruijuan; Li, Yan; Lu, Yinghua
2017-10-11
The PII signaling protein is a key protein for controlling nitrogen assimilatory reactions in most organisms, but little information is reported on PII proteins of the green microalga Haematococcus pluvialis. Since H. pluvialis cells can produce a large amount of astaxanthin upon nitrogen starvation, its PII protein may represent an important factor in elevated production of Haematococcus astaxanthin. This study identified and isolated the coding gene (HpGLB1) from this microalga. The full length of HpGLB1 was 1222 bp, including a 621 bp coding sequence (CDS), a 103 bp 5' untranslated region (5' UTR), and a 498 bp 3' untranslated region (3' UTR). The CDS could encode a protein with 206 amino acids (HpPII). Its calculated molecular weight (Mw) was 22.4 kDa and the theoretical isoelectric point was 9.53. When H. pluvialis cells were exposed to nitrogen starvation, HpGLB1 expression increased 2.46 times in 48 h, concomitant with the rise in astaxanthin content. This study also used phylogenetic analysis to show that HpPII was homologous to the PII proteins of other green microalgae. The results form a fundamental basis for future study of HpPII and its potential physiological function in Haematococcus astaxanthin biosynthesis.
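The lengths reported above are mutually consistent, which a short check makes explicit: the three regions sum to the full transcript length, and a 621 bp CDS corresponds to 207 codons, i.e. 206 amino acids once the stop codon is subtracted. The sketch below assumes the reported CDS length includes the stop codon (the abstract does not say so explicitly); the function name is hypothetical.

```python
def protein_length_from_cds(cds_bp, includes_stop=True):
    """Number of amino acids encoded by a CDS of cds_bp nucleotides.

    Assumes the CDS is a whole number of codons; if includes_stop
    is True, the final codon is a stop codon and is not counted.
    """
    if cds_bp % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


# Values taken from the abstract above.
utr5_bp, cds_bp, utr3_bp = 103, 621, 498
transcript_bp = utr5_bp + cds_bp + utr3_bp
n_residues = protein_length_from_cds(cds_bp)
```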
Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan
2009-01-01
We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
Recognition of Protein-coding Genes Based on Z-curve Algorithms
Guo, Feng-Biao; Lin, Yan; Chen, Ling-Ling
2014-01-01
Recognition of protein-coding genes, a classical bioinformatics issue, is an essential step in annotating newly sequenced genomes. The Z-curve algorithm, one of the most effective methods for this task, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027
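For readers unfamiliar with the underlying representation, the basic Z-curve transform maps a DNA sequence to a three-dimensional curve whose coordinates track cumulative purine/pyrimidine (x), amino/keto (y) and weak/strong hydrogen-bond (z) differences. The sketch below implements only that basic transform; it is not the ZCURVE gene finder, whose features and classifier are not described in this abstract, and the function name is hypothetical.

```python
# Per-base steps: x = (A+G)-(C+T), y = (A+C)-(G+T), z = (A+T)-(G+C)
_STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the cumulative (x, y, z) Z-curve points of a DNA sequence.

    Bases other than A, C, G, T (for example N) leave the point unchanged.
    """
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = _STEPS.get(base, (0, 0, 0))
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


curve = z_curve("ACGT")
```

Because the transform is one-to-one for unambiguous sequences, the original bases can be recovered from the step between consecutive points.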
Xiong, Yan; Yue, Feng; Jia, Zhihao; Gao, Yun; Jin, Wen; Hu, Keping; Zhang, Yong; Zhu, Dahai; Yang, Gongshe; Kuang, Shihuan
2018-04-01
The thermogenic activities of brown and beige adipocytes can be exploited to reduce energy surplus and counteract obesity. Recent RNA sequencing studies have uncovered a number of long noncoding RNAs (lncRNAs) uniquely expressed in white and brown adipose tissues (WAT and BAT), but whether and how these lncRNAs function in adipogenesis remain largely unknown. Here, we report the identification of a novel brown adipocyte-enriched LncRNA (AK079912), and its nuclear localization, function and regulation. The expression of AK079912 increases during brown preadipocyte differentiation and in response to cold-stimulated browning of white adipocytes. Knockdown of AK079912 inhibits brown preadipocyte differentiation, manifested by reductions in lipid accumulation and down-regulation of adipogenic and BAT-specific genes. Conversely, ectopic expression of AK079912 in white preadipocytes up-regulates the expression of genes involved in thermogenesis. Mechanistically, inhibition of AK079912 reduces mitochondrial copy number and protein levels of mitochondria electron transport chain (ETC) complexes, whereas AK079912 overexpression increases the levels of ETC proteins. Lastly, reporter and pharmacological assays identify Pparγ as an upstream regulator of AK079912. These results provide new insights into the function of non-coding RNAs in brown adipogenesis and regulating browning of white adipocytes. Copyright © 2018 Elsevier B.V. All rights reserved.
Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi
2006-01-01
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
The PhoP-Dependent ncRNA Mcr7 Modulates the TAT Secretion System in Mycobacterium tuberculosis
Benjak, Andrej; Uplekar, Swapna; Rougemont, Jacques; Guilhot, Christophe; Malaga, Wladimir; Martín, Carlos; Cole, Stewart T.
2014-01-01
The PhoPR two-component system is essential for virulence in Mycobacterium tuberculosis where it controls expression of approximately 2% of the genes, including those for the ESX-1 secretion apparatus, a major virulence determinant. Mutations in phoP lead to compromised production of pathogen-specific cell wall components and attenuation both ex vivo and in vivo. Using antibodies against the native protein in ChIP-seq experiments (chromatin immunoprecipitation followed by high-throughput sequencing) we demonstrated that PhoP binds to at least 35 loci on the M. tuberculosis genome. The PhoP regulon comprises several transcriptional regulators as well as genes for polyketide synthases and PE/PPE proteins. Integration of ChIP-seq results with high-resolution transcriptomic analysis (RNA-seq) revealed that PhoP controls 30 genes directly, whilst regulatory cascades are responsible for signal amplification and downstream effects through proteins like EspR, which controls Esx1 function, via regulation of the espACD operon. The most prominent site of PhoP regulation was located in the intergenic region between rv2395 and PE_PGRS41, where the mcr7 gene codes for a small non-coding RNA (ncRNA). Northern blot experiments confirmed the absence of Mcr7 in an M. tuberculosis phoP mutant as well as low-level expression of the ncRNA in M. tuberculosis complex members other than M. tuberculosis. By means of genetic and proteomic analyses we demonstrated that Mcr7 modulates translation of the tatC mRNA thereby impacting the activity of the Twin Arginine Translocation (Tat) protein secretion apparatus. As a result, secretion of the immunodominant Ag85 complex and the beta-lactamase BlaC is affected, among others. Mcr7, the first ncRNA of M. tuberculosis whose function has been established, therefore represents a missing link between the PhoPR two-component system and the downstream functions necessary for successful infection of the host. PMID:24874799
Kantyka, Tomasz; Rawlings, Neil D.; Potempa, Jan
2010-01-01
In metazoan organisms, protein inhibitors of peptidases are important factors essential for regulation of proteolytic activity. In vertebrates, genes encoding peptidase inhibitors constitute up to 1% of genes, reflecting a need for tight and specific control of proteolysis, especially in extracellular body fluids. In stark contrast, unicellular organisms, both prokaryotic and eukaryotic, consistently contain only a few, if any, genes coding for putative peptidase inhibitors. This may seem perplexing in light of the fact that these organisms produce large numbers of proteases of different catalytic classes, with the genes constituting up to 6% of the total gene count and the average being about 3%. Apparently, however, a unicellular life-style is fully compatible with other mechanisms of regulation of proteolysis and does not require protein inhibitors to control intracellular and extracellular proteolytic activity. So in prokaryotes the occurrence of genes encoding different types of peptidase inhibitors is infrequent and often scattered among phylogenetically distinct orders or even phyla of microbiota. Genes encoding proteins homologous to alpha-2-macroglobulin (family I39), serine carboxypeptidase Y inhibitor (family I51), alpha-1-peptidase inhibitor (family I4) and ecotin (family I11) are the most frequently represented in Bacteria. Although several of these gene products were shown to possess inhibitory activity, with the exception of ecotin and staphostatins, the biological function of microbial inhibitors is unclear. In this review we present the distribution of protein inhibitors from different families among prokaryotes, describe their mode of action and hypothesize on their role in microbial physiology and interactions with hosts and environment. PMID:20558234
SERPINA2 Is a Novel Gene with a Divergent Function from SERPINA1
Martins, Manuella; Figueiredo, Joana; Silva, Diana Isabel; Castro, Patrícia; Morales-Hojas, Ramiro; Simões-Correia, Joana; Seixas, Susana
2013-01-01
Serine protease inhibitors (SERPINs) are a superfamily of highly conserved proteins that play a key role in controlling the activity of proteases in diverse biological processes. The SERPIN cluster located at the 14q32.1 region includes the gene coding for SERPINA1, and a highly homologous sequence, SERPINA2, which was originally thought to be a pseudogene. We have previously shown that SERPINA2 is expressed in different tissues, namely leukocytes and testes, suggesting that it is a functional SERPIN. To investigate the function of SERPINA2, we used HeLa cells stably transduced with the different variants of SERPINA2 and SERPINA1 (M1, S and Z) and leukocytes as the in vivo model. We identified SERPINA2 as a 52 kDa intracellular glycoprotein, which is localized at the endoplasmic reticulum (ER), independently of the variant analyzed. SERPINA2 is not significantly regulated by the proteasome, suggesting that ER localization is not due to misfolding. Specific features of SERPINA2 include the absence of insoluble aggregates and the insignificant response to cell stress, suggesting that it is a non-polymerogenic protein whose activity diverges from that of SERPINA1. Using phylogenetic analysis, we propose an origin of SERPINA2 in the crown of primates, and we unveiled the overall conservation of SERPINA2 and A1. Nonetheless, a few SERPINA2 residues seem to have evolved faster, contributing to the emergence of a new advantageous function, possibly as a chymotrypsin-like SERPIN. Herein, we present evidence that SERPINA2 is an active gene, coding for an ER-resident protein, which may act as substrate or adjuvant of ER-chaperones. PMID:23826168
Chemical Approaches to Control Gene Expression
Gottesfeld, Joel M.; Turner, James M.; Dervan, Peter B.
2000-01-01
A current goal in molecular medicine is the development of new strategies to interfere with gene expression in living cells in the hope that novel therapies for human disease will result from these efforts. This review focuses on small-molecule or chemical approaches to manipulate gene expression by modulating either transcription of messenger RNA-coding genes or protein translation. The molecules under study include natural products, designed ligands, and compounds identified through functional screens of combinatorial libraries. The cellular targets for these molecules include DNA, messenger RNA, and the protein components of the transcription, RNA processing, and translational machinery. Studies with model systems have shown promise in the inhibition of both cellular and viral gene transcription and mRNA utilization. Moreover, strategies for both repression and activation of gene transcription have been described. These studies offer promise for treatment of diseases of pathogenic (viral, bacterial, etc.) and cellular origin (cancer, genetic diseases, etc.). PMID:11097426
Carrasco-Rando, Marta; Tutor, Antonio S.; Prieto-Sánchez, Silvia; González-Pérez, Esther; Barrios, Natalia; Letizia, Annalisa; Martín, Paloma; Campuzano, Sonsoles; Ruiz-Gómez, Mar
2011-01-01
A central issue of myogenesis is the acquisition of identity by individual muscles. In Drosophila, at the time muscle progenitors are singled out, they already express unique combinations of muscle identity genes. This muscle code results from the integration of positional and temporal signalling inputs. Here we identify, by means of loss-of-function and ectopic expression approaches, the Iroquois Complex homeobox genes araucan and caupolican as novel muscle identity genes that confer lateral transverse muscle identity. The acquisition of this fate requires that Araucan/Caupolican repress other muscle identity genes such as slouch and vestigial. In addition, we show that Caupolican-dependent slouch expression depends on the activation state of the Ras/Mitogen Activated Protein Kinase cascade. This provides a comprehensive insight into the way Iroquois genes integrate in muscle progenitors, signalling inputs that modulate gene expression and protein activity. PMID:21811416
Peri, A; Cordella-Miele, E; Miele, L; Mukherjee, A B
1993-01-01
Clara cell 10-kD protein (cc10kD), a secretory phospholipase A2 inhibitor, is suggested to be the human counterpart of rabbit uteroglobin (UG). Because cc10kD is expressed constitutively at a very high level in the human respiratory epithelium, the 5' region of its gene may be useful in achieving organ-specific expression of recombinant DNA in gene therapy of diseases such as cystic fibrosis. However, it is important to establish the tissue-specific expression of this gene before designing gene transfer experiments. Since the UG gene in the rabbit is expressed in many other organs besides the lung and the endometrium, we investigated the organ and tissue specificity of human cc10kD gene expression using polymerase chain reaction, nucleotide sequence analysis, immunofluorescence, and Northern blotting. Our results indicate that, in addition to the lung, cc10kD is expressed in several nonrespiratory organs, with a distribution pattern very similar, if not identical, to that of UG in the rabbit. These results underscore the necessity for more detailed analyses of the 5' region of the human cc10kD gene before its usefulness in gene therapy could be fully assessed. These data also suggest that cc10kD and UG may have similar physiological function(s). PMID:8227325
Sporophyte Formation and Life Cycle Completion in Moss Requires Heterotrimeric G-Proteins
Hackenberg, Dieter; Quatrano, Ralph
2016-01-01
In this study, we report the functional characterization of heterotrimeric G-proteins from a nonvascular plant, the moss Physcomitrella patens. In plants, G-proteins have been characterized from only a few angiosperms to date, where their involvement has been shown during regulation of multiple signaling and developmental pathways affecting overall plant fitness. In addition to its unparalleled evolutionary position in the plant lineages, the P. patens genome also codes for a unique assortment of G-protein components, which includes two copies of Gβ and Gγ genes, but no canonical Gα. Instead, a single gene encoding an extra-large Gα (XLG) protein exists in the P. patens genome. Here, we demonstrate that in P. patens the canonical Gα is biochemically and functionally replaced by an XLG protein, which works in the same genetic pathway as one of the Gβ proteins to control its development. Furthermore, the specific G-protein subunits in P. patens are essential for its life cycle completion. Deletion of the genomic locus of PpXLG or PpGβ2 results in smaller, slower growing gametophores. Normal reproductive structures develop on these gametophores, but they are unable to form any sporophyte, the only diploid stage in the moss life cycle. Finally, the mutant phenotypes of ΔPpXLG and ΔPpGβ2 can be complemented by the homologous genes from Arabidopsis, AtXLG2 and AtAGB1, respectively, suggesting an overall conservation of their function throughout the plant evolution. PMID:27550997
Rather, Irshad Ahmad; Awasthi, Praveen; Mahajan, Vidushi; Bedi, Yashbir S; Vishwakarma, Ram A; Gandhi, Sumit G
2015-03-01
Pathogenesis-related (PR) proteins are involved in biotic and abiotic stress responses of plants and are grouped into 17 families (PR-1 to PR-17). The PR-5 family includes proteins related to thaumatin and osmotin, with several members possessing antimicrobial properties. In this study, a PR-5 gene showing a high degree of homology with osmotin-like protein was isolated from sweet basil (Ocimum basilicum L.). A complete open reading frame consisting of 675 nucleotides, coding for a precursor protein, was obtained by PCR amplification. Based on sequence comparisons with tobacco osmotin and other osmotin-like proteins (OLPs), this protein was named ObOLP. The predicted mature protein is 225 amino acids in length and contains 16 cysteine residues that may potentially form eight disulfide bonds, a signature common to most PR-5 proteins. Among the various abiotic stress treatments tested, including high salt, mechanical wounding and exogenous phytohormone/elicitor treatments, methyl jasmonate (MeJA) and mechanical wounding significantly induced the expression of the ObOLP gene. The coding sequence of ObOLP was cloned and expressed in a bacterial host, resulting in a 25 kDa recombinant His-tagged protein displaying antifungal activity. The ObOLP protein sequence appears to contain an N-terminal signal peptide with signatures of the secretory pathway. Further, our experimental data show that ObOLP expression is regulated transcriptionally, and in silico analysis suggests that it may be post-transcriptionally and post-translationally regulated through microRNAs and post-translational protein modifications, respectively. This study appears to be the first report of isolation and characterization of an osmotin-like protein gene from O. basilicum. Copyright © 2014 Elsevier B.V. All rights reserved.
GENCODE: the reference human genome annotation for The ENCODE Project.
Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J
2012-09-01
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Allen, Michael S.; Hurst, Gregory B.; Lu, Tse-Yuan S.; ...
2015-04-08
Rhodopseudomonas palustris encodes 16 extracytoplasmic function (ECF) σ factors. In this paper, to begin to investigate the regulatory network of one of these ECF σ factors, the whole proteome of R. palustris CGA010 was quantitatively analyzed by tandem mass spectrometry from cultures episomally expressing the ECF σ RPA4225 (ecfT) versus a WT control. Among the proteins with the greatest increase in abundance were catalase KatE, trehalose synthase, a DPS-like protein, and several regulatory proteins. Alignment of the cognate promoter regions driving expression of several upregulated proteins suggested a conserved binding motif in the -35 and -10 regions with the consensus sequence GGAAC-18N-TT. Additionally, the putative anti-σ factor RPA4224, whose gene is contained in the same predicted operon as RPA4225, was identified as interacting directly with the predicted response regulator RPA4223 by mass spectrometry of affinity-isolated protein complexes. Furthermore, another gene (RPA4226) coding for a protein that contains a cytoplasmic histidine kinase domain is located immediately upstream of RPA4225. The genomic organization of orthologs for these four genes is conserved in several other strains of R. palustris as well as in closely related α-Proteobacteria. Finally, taken together, these data suggest that ECF σ RPA4225 and the three additional genes make up a sigma factor mimicry system in R. palustris.
Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O'Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R; Patel, Paresh D; Koltookian, Michele; M Hultman, Christina; Pato, Michele T; Pato, Carlos N; Rasmussen, Steven A; Jenike, Michael A; Hanna, Gregory L; Stewart, S Evelyn; Knowles, James A; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A; Walitza, Susanne; Cath, Daniëlle C; Feng, Guoping; Karlsson, Elinor K; Lindblad-Toh, Kerstin
2017-10-17
Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10⁻¹¹) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited. Obsessive-compulsive disorder (OCD) is a neuropsychiatric disorder with symptoms including intrusive thoughts and time-consuming repetitive behaviors. Here Noh and colleagues identify genes enriched for functional variants associated with increased risk of OCD.
Architecture of the human interactome defines protein communities and disease networks
Huttlin, Edward L.; Bruckner, Raphael J.; Paulo, Joao A.; Cannon, Joe R.; Ting, Lily; Baltier, Kurt; Colby, Greg; Gebreab, Fana; Gygi, Melanie P.; Parzen, Hannah; Szpyt, John; Tam, Stanley; Zarraga, Gabriela; Pontano-Vaites, Laura; Swarup, Sharan; White, Anne E.; Schweppe, Devin K.; Rad, Ramin; Erickson, Brian K.; Obar, Robert A.; Guruharsha, K.G.; Li, Kejie; Artavanis-Tsakonas, Spyros; Gygi, Steven P.; Harper, J. Wade
2017-01-01
The physiology of a cell can be viewed as the product of thousands of proteins acting in concert to shape the cellular response. Coordination is achieved in part through networks of protein-protein interactions that assemble functionally related proteins into complexes, organelles, and signal transduction pathways. Understanding the architecture of the human proteome has the potential to inform cellular, structural, and evolutionary mechanisms and is critical to elucidation of how genome variation contributes to disease [1–3]. Here, we present BioPlex 2.0 (Biophysical Interactions of ORFEOME-derived complexes), which employs robust affinity purification-mass spectrometry (AP-MS) methodology [4] to elucidate protein interaction networks and co-complexes nucleated by more than 25% of protein coding genes from the human genome, and constitutes the largest such network to date. With >56,000 candidate interactions, BioPlex 2.0 contains >29,000 previously unknown co-associations and provides functional insights into hundreds of poorly characterized proteins while enhancing network-based analyses of domain associations, subcellular localization, and co-complex formation. Unsupervised Markov clustering (MCL) [5] of interacting proteins identified more than 1300 protein communities representing diverse cellular activities. Genes essential for cell fitness [6,7] are enriched within 53 communities representing central cellular functions. Moreover, we identified 442 communities associated with more than 2000 disease annotations, placing numerous candidate disease genes into a cellular framework. BioPlex 2.0 exceeds previous experimentally derived interaction networks in depth and breadth, and will be a valuable resource for exploring the biology of incompletely characterized proteins and for elucidating larger-scale patterns of proteome organization. PMID:28514442
Alam, Syed Imteyaz; Dwivedi, Pratistha
2016-10-01
The whole genome sequencing and annotation of Clostridium perfringens strains revealed several genes coding for proteins of unknown function with no significant similarities to genes in other organisms. Our previous studies clearly demonstrated that hypothetical proteins CPF_2500, CPF_1441, CPF_0876, CPF_0093, CPF_2002, CPF_2314, CPF_1179, CPF_1132, CPF_2853, CPF_0552, CPF_2032, CPF_0438, CPF_1440, CPF_2918, CPF_0656, and CPF_2364 are genuine proteins of C. perfringens expressed in high abundance. This study explored the putative role of these hypothetical proteins using bioinformatic tools and evaluated their potential as putative candidates for prophylaxis. Apart from a group of eight hypothetical proteins (HPs), a putative function was predicted for the rest of the hypothetical proteins using one or more of the algorithms used. The phylogenetic analysis did not suggest evidence of a horizontal gene transfer event except for HP CPF_0876. HP CPF_2918 is an abundant extracellular protein, unique to the C. perfringens species with maximum strain coverage, and did not show any significant match in the database. CPF_2918 was cloned, the recombinant protein was purified to near homogeneity, and probing with mouse anti-CPF_2918 serum revealed surface localization of the protein in C. perfringens ATCC13124 cultures. The purified recombinant CPF_2918 protein induced antibody production and a mixed Th1/Th2-type response, and provided partial protection to immunized mice in direct C. perfringens challenge. Copyright © 2016 Elsevier B.V. All rights reserved.
Blaise, Sandra; Ruggieri, Alessia; Dewannieux, Marie; Cosset, François-Loic; Heidmann, Thierry
2004-01-01
A member of the HERV-W family of human endogenous retroviruses (HERV) had previously been demonstrated to encode a functional envelope which can form pseudotypes with human immunodeficiency virus type 1 virions and confer infectivity on the resulting retrovirus particles. Here we show that a second envelope protein sorted out by a systematic search for fusogenic proteins that we made among all the HERV coding envelope genes and belonging to the HERV-FRD family can also make pseudotypes and confer infectivity. We further show that the orthologous envelope genes that were isolated from simians—from New World monkeys to humans—are also functional in the infectivity assay, with one singular exception for the gibbon HERV-FRD gene, which is found to be fusogenic in a cell-cell fusion assay, as observed for the other simian envelopes, but which is not infectious. Sequence comparison of the FRD envelopes revealed a limited number of mutations among simians, and one point mutation—located in the TM subunit—was shown to be responsible for the loss of infectivity of the gibbon envelope. The functional characterization of the identified envelopes is strongly indicative of an ancestral retrovirus infection and endogenization, with some of the envelope functions subsequently retained in evolution. PMID:14694139
Global Regulatory Functions of the Staphylococcus aureus Endoribonuclease III in Gene Expression
Lioliou, Efthimia; Sharma, Cynthia M.; Caldelari, Isabelle; Helfer, Anne-Catherine; Fechter, Pierre; Vandenesch, François; Vogel, Jörg; Romby, Pascale
2012-01-01
RNA turnover plays an important role in both virulence and adaptation to stress in the Gram-positive human pathogen Staphylococcus aureus. However, the molecular players and mechanisms involved in these processes are poorly understood. Here, we explored the functions of S. aureus endoribonuclease III (RNase III), a member of the ubiquitous family of double-strand-specific endoribonucleases. To define genomic transcripts that are bound and processed by RNase III, we performed deep sequencing on cDNA libraries generated from RNAs that were co-immunoprecipitated with wild-type RNase III or two different cleavage-defective mutant variants in vivo. Several newly identified RNase III targets were validated by independent experimental methods. We identified various classes of structured RNAs as RNase III substrates and demonstrated that this enzyme is involved in the maturation of rRNAs and tRNAs, regulates the turnover of mRNAs and non-coding RNAs, and autoregulates its synthesis by cleaving within the coding region of its own mRNA. Moreover, we identified a positive effect of RNase III on protein synthesis based on novel mechanisms. RNase III–mediated cleavage in the 5′ untranslated region (5′UTR) enhanced the stability and translation of cspA mRNA, which encodes the major cold-shock protein. Furthermore, RNase III cleaved overlapping 5′UTRs of divergently transcribed genes to generate leaderless mRNAs, which constitutes a novel way to co-regulate neighboring genes. In agreement with recent findings, low abundance antisense RNAs covering 44% of the annotated genes were captured by co-immunoprecipitation with RNase III mutant proteins. Thus, in addition to gene regulation, RNase III is associated with RNA quality control of pervasive transcription. Overall, this study illustrates the complexity of post-transcriptional regulation mediated by RNase III. PMID:22761586
Characteristics and significance of intergenic polyadenylated RNA transcription in Arabidopsis.
Moghe, Gaurav D; Lehti-Shiu, Melissa D; Seddon, Alex E; Yin, Shan; Chen, Yani; Juntawong, Piyada; Brandizzi, Federica; Bailey-Serres, Julia; Shiu, Shin-Han
2013-01-01
The Arabidopsis (Arabidopsis thaliana) genome is the most well-annotated plant genome. However, transcriptome sequencing in Arabidopsis continues to suggest the presence of polyadenylated (polyA) transcripts originating from presumed intergenic regions. It is not clear whether these transcripts represent novel noncoding or protein-coding genes. To understand the nature of intergenic polyA transcription, we first assessed its abundance using multiple messenger RNA sequencing data sets. We found 6,545 intergenic transcribed fragments (ITFs) occupying 3.6% of Arabidopsis intergenic space. In contrast to transcribed fragments that map to protein-coding and RNA genes, most ITFs are significantly shorter, are expressed at significantly lower levels, and tend to be more data set specific. A surprisingly large number of ITFs (32.1%) may be protein coding based on evidence of translation. However, our results indicate that these "translated" ITFs tend to be close to and are likely associated with known genes. To investigate if ITFs are under selection and are functional, we assessed ITF conservation through cross-species as well as within-species comparisons. Our analysis reveals that 237 ITFs, including 49 with translation evidence, are under strong selective constraint and relatively distant from annotated features. These ITFs are likely parts of novel genes. However, the selective pressure imposed on most ITFs is similar to that of randomly selected, untranscribed intergenic sequences. Our findings indicate that despite the prevalence of ITFs, apart from the possibility of genomic contamination, many may be background or noisy transcripts derived from "junk" DNA, whose production may be inherent to the process of transcription and which, on rare occasions, may act as catalysts for the creation of novel genes.
Mergaert, Peter; Nikovics, Krisztina; Kelemen, Zsolt; Maunoury, Nicolas; Vaubert, Danièle; Kondorosi, Adam; Kondorosi, Eva
2003-01-01
Transcriptome analysis of Medicago truncatula nodules has led to the discovery of a gene family named NCR (nodule-specific cysteine rich) with more than 300 members. The encoded polypeptides were short (60–90 amino acids), carried a conserved signal peptide, and, except for a conserved cysteine motif, displayed otherwise extensive sequence divergence. Family members were found in pea (Pisum sativum), broad bean (Vicia faba), white clover (Trifolium repens), and Galega orientalis but not in other plants, including other legumes, suggesting that the family might be specific for galegoid legumes forming indeterminate nodules. Gene expression of all family members was restricted to nodules except for two, also expressed in mycorrhizal roots. NCR genes exhibited distinct temporal and spatial expression patterns in nodules and, thus, were coupled to different stages of development. The signal peptide targeted the polypeptides in the secretory pathway, as shown by green fluorescent protein fusions expressed in onion (Allium cepa) epidermal cells. Coregulation of certain NCR genes with genes coding for a potentially secreted calmodulin-like protein and for a signal peptide peptidase suggests a concerted action in nodule development. Potential functions of the NCR polypeptides in cell-to-cell signaling and creation of a defense system are discussed. PMID:12746522
AP-2α and AP-2β cooperatively orchestrate homeobox gene expression during branchial arch patterning.
Van Otterloo, Eric; Li, Hong; Jones, Kenneth L; Williams, Trevor
2018-01-25
The evolution of a hinged moveable jaw with variable morphology is considered a major factor behind the successful expansion of the vertebrates. DLX homeobox transcription factors are crucial for establishing the positional code that patterns the mandible, maxilla and intervening hinge domain, but how the genes encoding these proteins are regulated remains unclear. Herein, we demonstrate that the concerted action of the AP-2α and AP-2β transcription factors within the mouse neural crest is essential for jaw patterning. In the absence of these two proteins, the hinge domain is lost and there are alterations in the size and patterning of the jaws correlating with dysregulation of homeobox gene expression, with reduced levels of Emx, Msx and Dlx paralogs accompanied by an expansion of Six1 expression. Moreover, detailed analysis of morphological features and gene expression changes indicate significant overlap with various compound Dlx gene mutants. Together, these findings reveal that the AP-2 genes have a major function in mammalian neural crest development, influencing patterning of the craniofacial skeleton via the DLX code, an effect that has implications for vertebrate facial evolution, as well as for human craniofacial disorders. © 2018. Published by The Company of Biologists Ltd.
Martín, A C; López, R; García, P
1996-06-01
Cp-1, a bacteriophage infecting Streptococcus pneumoniae, has a linear double-stranded DNA genome, with a terminal protein covalently linked to its 5' ends, that replicates by the protein-priming mechanism. We describe here the complete DNA sequence and transcriptional map of the Cp-1 genome. These analyses have led to the firm assignment of 10 genes and the localization of 19 additional open reading frames in the 19,345-bp Cp-1 DNA. Striking similarities and differences between some of these proteins and those of the Bacillus subtilis phage phi 29, a system that also replicates its DNA by the protein-priming mechanism, have been revealed. The genes coding for structural proteins and assembly factors are located in the central part of the Cp-1 genome. Several proteins corresponding to the predicted gene products were identified by in vitro and in vivo expression of the cloned genes. Mature major head protein from the virion particles results from hydrolysis of the primary gene product at the His-49 residue, whereas the phage gene is expressed in Escherichia coli without modification. We have also identified two open reading frames coding for proteins that show high degrees of similarity to the N- and C-terminal regions, respectively, of the single tail protein identified in phi 29. Sequencing and primer extension analysis suggest transcription of a small RNA showing a secondary structure similar to that of the prohead RNA required for the ATP-dependent packaging of phi 29 DNA. On the basis of its temporal expression, transcription of the Cp-1 genome takes place in two stages, early and late. Combined Northern (RNA) blot and primer extension experiments allowed us to map the 5' initiation sites of the transcripts, and we found that only three genes were transcribed from right to left. These analyses reveal that there are also noticeable differences between Cp-1 and phi 29 in transcriptional organization. Considered together, the observations reported here provide new tangible evidence on phylogenetic relationships between B. subtilis and S. pneumoniae.
Mistry, Divya; Wise, Roger P; Dickerson, Julie A
2017-01-01
Identification of central genes and proteins in biomolecular networks provides credible candidates for pathway analysis, functional analysis, and essentiality prediction. The DiffSLC centrality measure predicts central and essential genes and proteins using a protein-protein interaction network. Network centrality measures prioritize nodes and edges based on their importance to the network topology. These measures have helped identify critical genes and proteins in biomolecular networks. The proposed centrality measure, DiffSLC, combines the number of interactions of a protein and the gene coexpression values of the genes from which those proteins were translated, as a weighting factor to bias the identification of essential proteins in a protein interaction network. Potentially essential proteins with low node degree are promoted through eigenvector centrality. Thus, the gene coexpression values are used in conjunction with the eigenvector of the network's adjacency matrix and the edge clustering coefficient to improve essentiality prediction. The outcome of this prediction is shown using three variations: (1) inclusion or exclusion of gene coexpression data, (2) impact of different coexpression measures, and (3) impact of different gene expression data sets. For a total of seven networks, DiffSLC is compared to other centrality measures using Saccharomyces cerevisiae protein interaction networks and gene expression data. Comparisons are also performed for the top-ranked proteins against the known essential genes from the Saccharomyces Gene Deletion Project, which show that DiffSLC detects more essential proteins and has a higher area under the ROC curve than the other compared methods. This makes DiffSLC a stronger alternative to other centrality methods for detecting essential genes using a protein-protein interaction network that obeys the centrality-lethality principle. DiffSLC is implemented using the igraph package in R and the networkx package in Python. The Python package can be obtained from git.io/diffslcpy. The R implementation and the code to reproduce the analysis are available via git.io/diffslc.
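The combination described in the abstract above (interaction count, coexpression weighting, edge clustering coefficient, and eigenvector centrality) can be illustrated with a small sketch. This is not the published DiffSLC formula or the code at git.io/diffslcpy: the function names, the mixing parameters `beta` and `omega`, and the way the terms are normalised and added are all assumptions made for illustration, written in plain Python so that it runs without igraph or networkx.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets; assumes each interaction is listed once."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def eigenvector_centrality(adj, iterations=500, tol=1e-12):
    """Power iteration on (A + I), rescaled so the largest score is 1."""
    scores = {n: 1.0 for n in adj}
    for _ in range(iterations):
        new = {n: scores[n] + sum(scores[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}
        converged = max(abs(new[n] - scores[n]) for n in adj) < tol
        scores = new
        if converged:
            break
    return scores


def edge_clustering_coefficient(adj, u, v):
    """Triangles through edge (u, v) divided by the maximum possible number."""
    shared = len(adj[u] & adj[v])
    possible = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return shared / possible if possible > 0 else 0.0


def combined_centrality(edges, coexpression, beta=0.5, omega=0.5):
    """Hypothetical mix: beta * eigenvector + (1 - beta) * weighted degree.

    Each edge weight is omega * ECC + (1 - omega) * |coexpression|, so the
    weighted degree is the interaction count biased by both quantities.
    """
    adj = build_adjacency(edges)
    eig = eigenvector_centrality(adj)
    weighted = {n: 0.0 for n in adj}
    for u, v in edges:
        ecc = edge_clustering_coefficient(adj, u, v)
        coexp = abs(coexpression.get(frozenset((u, v)), 0.0))
        weight = omega * ecc + (1 - omega) * coexp
        weighted[u] += weight
        weighted[v] += weight
    top = max(weighted.values()) or 1.0
    return {n: beta * eig[n] + (1 - beta) * weighted[n] / top for n in adj}


# Toy network with made-up coexpression values (illustration only).
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
toy_coexpression = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.5,
    frozenset(("B", "C")): 0.4,
    frozenset(("C", "D")): 0.1,
}
toy_scores = combined_centrality(toy_edges, toy_coexpression)
```

In this toy network, protein D has a single weakly coexpressed interaction that is not part of any triangle, so it receives the lowest score, while the eigenvector term keeps its score above zero because its only partner is well connected.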
Liu, Wenjing; Ma, Rui; Yuan, Yuan
2017-01-01
Noncoding RNAs play critical roles in regulating protein-coding genes and comprise two major classes: long noncoding RNAs (lncRNAs) and microRNAs (miRNAs). LncRNAs regulate gene expression at transcriptional, post-transcriptional, and epigenetic levels via multiple action modes. LncRNAs can also function as endogenous competitive RNAs for miRNAs and indirectly regulate gene expression post-transcriptionally. By binding to the 3'-untranslated regions (3'-UTR) of target genes, miRNAs post-transcriptionally regulate gene expression. Herein, we conducted a review of post-transcriptional regulation by lncRNAs and miRNAs of genes associated with biological behaviors of gastric cancer. PMID:29187891
[Novel bidirectional promoter from human genome].
Orekhova, A S; Sverdlova, P S; Spirin, P V; Leonova, O G; Popenko, V I; Prasolov, V S; Rubtsov, P M
2011-01-01
In human and other mammalian genomes, a number of closely linked gene pairs transcribed in opposite directions are found. According to bioinformatic analysis, up to 10% of human genes are arranged in this way. In the present work, we cloned the fragment of the human genome that separates two genes localized at 2p13.1 and oriented "head-to-head", which code for hypothetical proteins of unknown function: CCDC (Coiled Coil Domain Containing) 142 and TTC (TetraTricopeptide repeat Containing) 31. The intergenic CCDC142-TTC31 region overlaps with a CpG island and contains a number of potential binding sites for transcription factors. This fragment functions as a bidirectional promoter in a luciferase reporter gene expression system upon transfection of human embryonic kidney (HEK293) cells. Vectors containing the genes of two fluorescent proteins, green (EGFP) and red (DsRed2), in opposite orientations and separated by the fragment of the CCDC142-TTC31 intergenic region were constructed. In HEK293 cells transfected with these vectors, simultaneous expression of the two fluorescent proteins is observed. Truncated versions of the intergenic region were obtained and their promoter activity was measured. The minimal promoter fragment contains the Inr, BRE, and DPE elements characteristic of TATA-less promoters. Thus, a novel bidirectional promoter was cloned from the human genome that can be used for simultaneous constitutive expression of two genes in human cells.
Heat-inducible hygromycin resistance in transgenic tobacco.
Severin, K; Schöffl, F
1990-12-01
We have constructed a chimaeric gene consisting of the promoter of the soybean heat shock (hs) gene Gmhsp17.6-L, the coding region of a hygromycin phosphotransferase (hpt) gene, and the termination sequence of the nopaline synthase (nos) gene. This gene fusion was introduced into tobacco by Agrobacterium-mediated gene transfer. Heat-inducible synthesis of mRNA was shown by northern hybridization, and translation of this RNA into a functional protein was indicated by plant growth on hygromycin-containing media in a temperature-dependent fashion. A one-hour incubation at 40°C per day, applied for several weeks, was sufficient to express the resistant phenotype in transgenic plants containing the chimaeric hs-hpt gene. These data suggest that the hygromycin resistance gene is functional and faithfully controlled by the soybean hs promoter. The suitability of these transgenic plants for selection of mutations that alter the hs response is discussed.
Jay, Z. J.; Rusch, D. B.; Tringe, S. G.; Bailey, C.; Jennings, R. M.
2014-01-01
High-temperature (>70°C) ecosystems in Yellowstone National Park (YNP) provide an unparalleled opportunity to study chemotrophic archaea and their role in microbial community structure and function under highly constrained geochemical conditions. Acidilobus spp. (order Desulfurococcales) comprise one of the dominant phylotypes in hypoxic geothermal sulfur sediment and Fe(III)-oxide environments along with members of the Thermoproteales and Sulfolobales. Consequently, the primary goals of the current study were to analyze and compare replicate de novo sequence assemblies of Acidilobus-like populations from four different mildly acidic (pH 3.3 to 6.1) high-temperature (72°C to 82°C) environments and to identify metabolic pathways and/or protein-encoding genes that provide a detailed foundation of the potential functional role of these populations in situ. De novo assemblies of the highly similar Acidilobus-like populations (>99% 16S rRNA gene identity) represent near-complete consensus genomes based on an inventory of single-copy genes, deduced metabolic potential, and assembly statistics generated across sites. Functional analysis of coding sequences and confirmation of gene transcription by Acidilobus-like populations provide evidence that they are primarily chemoorganoheterotrophs, generating acetyl coenzyme A (acetyl-CoA) via the degradation of carbohydrates, lipids, and proteins, and auxotrophic with respect to several external vitamins, cofactors, and metabolites. No obvious pathways or protein-encoding genes responsible for the dissimilatory reduction of sulfur were identified. The presence of a formate dehydrogenase (Fdh) and other protein-encoding genes involved in mixed-acid fermentation supports the hypothesis that Acidilobus spp. function as degraders of complex organic constituents in high-temperature, mildly acidic, hypoxic geothermal systems. PMID:24162572
Barta, Andrea; Kalyna, Maria; Reddy, Anireddy S N
2010-09-01
Growing interest in alternative splicing in plants and the extensive sequencing of new plant genomes necessitate more precise definition and classification of genes coding for splicing factors. SR proteins are a family of RNA binding proteins, which function as essential factors for constitutive and alternative splicing. We propose a unified nomenclature for plant SR proteins, taking into account the newly revised nomenclature of the mammalian SR proteins and a number of plant-specific properties of the plant proteins. We identify six subfamilies of SR proteins in Arabidopsis thaliana and rice (Oryza sativa), three of which are plant specific. The proposed subdivision of plant SR proteins into different subfamilies will allow grouping of paralogous proteins and simple assignment of newly discovered SR orthologs from other plant species and will promote functional comparisons in diverse plant species.
Kirsten, Holger; Al-Hasani, Hoor; Holdt, Lesca; Gross, Arnd; Beutner, Frank; Krohn, Knut; Horn, Katrin; Ahnert, Peter; Burkhardt, Ralph; Reiche, Kristin; Hackermüller, Jörg; Löffler, Markus; Teupser, Daniel; Thiery, Joachim; Scholz, Markus
2015-08-15
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenetic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb to the transcript challenging typically implemented ranges of cis-regulations. Pathway enrichment within regulated genes of GWAS-related eSNPs supported functional relevance of identified eQTLs. We demonstrate that nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data for demonstrating biological effects. Yet, we show for strongly heritable transcripts that still little trans-chromosomal heritability is explained by all identified trans-eSNPs; however, our data suggest that most cis-heritability of these transcripts seems explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes. © The Author 2015. Published by Oxford University Press.
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values used to describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review the well-characterized sequence features in eukaryotic genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
Ma, Jinxing; Wang, Zhiwei; Li, Huan; Park, Hee-Deung; Wu, Zhichao
2016-06-01
Metagenomic sequencing was used to investigate the microbial structures, functional potentials, and biofouling-related genes in a membrane bioreactor (MBR). The results showed that the microbial community in the MBR was highly diverse. Notably, function analysis of the dominant genera indicated that common genes from different phylotypes were identified for important functional potentials with the observation of variation of abundances of genes in a certain taxon (e.g., Dechloromonas). Despite maintaining similar metabolic functional potentials with a parallel full-scale conventional activated sludge (CAS) system due to treating the identical wastewater, the MBR had more abundant nitrification-related bacteria and coding genes of ammonia monooxygenase, which could well explain its excellent ammonia removal in the low-temperature period. Furthermore, according to quantification of the genes involved in exopolysaccharide and extracellular polymeric substance (EPS) protein metabolism, the MBR did not show a much different potential in producing EPS compared to the CAS system, and bacteria from the membrane biofilm had lower abundances of genes associated with EPS biosynthesis and transport compared to the activated sludge in the MBR.
Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.
2010-01-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513
Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F
2010-12-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.
Okada, Y
1999-01-01
Early in the development of molecular biology, TMV RNA was widely used as an mRNA that could be purified easily, and it contributed much to research on protein synthesis. Also, in the early stages of elucidation of the genetic code, artificially produced TMV mutants were widely used and provided the first proof that the genetic code was non-overlapping. In 1982, Goelet et al. determined the complete TMV RNA base sequence of 6395 nucleotides. The four genes (130K, 180K, 30K and coat protein) could then be mapped at precise locations in the TMV genome. Furthermore it had become clear, a little earlier, that genes located internally in the genome were expressed via subgenomic mRNAs. The initiation site for assembly of TMV particles was also determined. However, although TMV contributed so much at the beginning of the development of molecular biology, its influence was replaced by that of Escherichia coli and its phages in the next phase. As recombinant DNA technology developed in the 1980s, RNA virus research became more detached from the frontier of molecular biology. To recover from this setback, a gene-manipulation system was needed for RNA viruses. In 1986, two such systems were developed for TMV, using full-length cDNA clones, by Dawson's group and by Okada's group. Thus, reverse genetics could be used to elucidate the basic functions of all proteins encoded by the TMV genome. Identification of the function of the 30K protein was especially important because it was the first evidence that a plant virus possesses a cell-to-cell movement function. Many other plant viruses have since been found to encode comparable 'movement proteins'. TMV thus became the first plant virus for which structures and functions were known for all its genes. At the birth of molecular plant pathology, TMV became a leader again. TMV has also played pioneering roles in many other fields. 
TMV was the first virus for which the amino acid sequence of the coat protein was determined and the first virus for which cotranslational disassembly was demonstrated both in vivo and in vitro. It was the first virus for which activation of a resistance gene in a host plant was related to the molecular specificity of a product of a viral gene. Also, in the field of plant biotechnology, TMV vectors are among the most promising. Thus, for the 100 years since Beijerinck's work, TMV research has consistently played a leading role in opening up new areas of study, not only in plant pathology, but also in virology, biochemistry, molecular biology, RNA genetics and biotechnology. PMID:10212936
Silencing of X-Linked MicroRNAs by Meiotic Sex Chromosome Inactivation
Royo, Hélène; Seitz, Hervé; ElInati, Elias; Peters, Antoine H. F. M.; Stadler, Michael B.; Turner, James M. A.
2015-01-01
During the pachytene stage of meiosis in male mammals, the X and Y chromosomes are transcriptionally silenced by Meiotic Sex Chromosome Inactivation (MSCI). MSCI is conserved in therian mammals and is essential for normal male fertility. Transcriptomics approaches have demonstrated that in mice, most or all protein-coding genes on the X chromosome are subject to MSCI. However, it is unclear whether X-linked non-coding RNAs behave in a similar manner. The X chromosome is enriched in microRNA (miRNA) genes, with many exhibiting testis-biased expression. Importantly, high expression levels of X-linked miRNAs (X-miRNAs) have been reported in pachytene spermatocytes, indicating that these genes may escape MSCI, and perhaps play a role in the XY-silencing process. Here we use RNA FISH to examine X-miRNA expression in the male germ line. We find that, like protein-coding X-genes, X-miRNAs are expressed prior to prophase I and are thereafter silenced during pachynema. X-miRNA silencing does not occur in mouse models with defective MSCI. Furthermore, X-miRNAs are expressed at pachynema when present as autosomally integrated transgenes. Thus, we conclude that silencing of X-miRNAs during pachynema in wild type males is MSCI-dependent. Importantly, misexpression of X-miRNAs during pachynema causes spermatogenic defects. We propose that MSCI represents a chromosomal mechanism by which X-miRNAs, and other potential X-encoded repressors, can be silenced, thereby regulating genes with critical late spermatogenic functions. PMID:26509798
Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume
2017-07-19
Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.
Gudhka, Reema K; Neilan, Brett A; Burns, Brendan P
2015-01-01
Halococcus hamelinensis was the first archaeon isolated from stromatolites. These geomicrobial ecosystems are thought to be some of the earliest known on Earth, yet, despite their evolutionary significance, the role of Archaea in these systems is still not well understood. Detailed here is the genome sequencing and analysis of an archaeon isolated from stromatolites. The genome of H. hamelinensis consisted of 3,133,046 base pairs with an average G+C content of 60.08% and contained 3,150 predicted coding sequences or ORFs, 2,196 (68.67%) of which were protein-coding genes with functional assignments and 954 (29.83%) of which were of unknown function. Codon usage of the H. hamelinensis genome was consistent with a highly acidic proteome, a major adaptive mechanism towards high salinity. Amino acid transport and metabolism, inorganic ion transport and metabolism, energy production and conversion, ribosomal structure, and unknown function COG genes were overrepresented. The genome of H. hamelinensis also revealed characteristics reflecting its survival in its extreme environment, including putative genes/pathways involved in osmoprotection, oxidative stress response, and UV damage repair. Finally, genome analyses indicated the presence of putative transposases as well as positive matches of genes of H. hamelinensis against various genomes of Bacteria, Archaea, and viruses, suggesting the potential for horizontal gene transfer.
Dam, Phuongan; Kataeva, Irina; Yang, Sung-Jae; Zhou, Fengfeng; Yin, Yanbin; Chou, Wenchi; Poole, Farris L.; Westpheling, Janet; Hettich, Robert; Giannone, Richard; Lewis, Derrick L.; Kelly, Robert; Gilbert, Harry J.; Henrissat, Bernard; Xu, Ying; Adams, Michael W. W.
2011-01-01
Caldicellulosiruptor bescii DSM 6725 utilizes various polysaccharides and grows efficiently on untreated high-lignin grasses and hardwood at an optimum temperature of ∼80°C. It is a promising anaerobic bacterium for studying high-temperature biomass conversion. Its genome contains 2666 protein-coding sequences organized into 1209 operons. Expression of 2196 genes (83%) was confirmed experimentally. At least 322 genes appear to have been obtained by lateral gene transfer (LGT). Putative functions were assigned to 364 conserved/hypothetical protein (C/HP) genes. The genome contains 171 and 88 genes related to carbohydrate transport and utilization, respectively. Growth on cellulose led to the up-regulation of 32 carbohydrate-active (CAZy), 61 sugar transport, 25 transcription factor and 234 C/HP genes. Some C/HPs were overproduced on cellulose or xylan, suggesting their involvement in polysaccharide conversion. A unique feature of the genome is enrichment with genes encoding multi-modular, multi-functional CAZy proteins organized into one large cluster, the products of which are proposed to act synergistically on different components of plant cell walls and to aid the ability of C. bescii to convert plant biomass. The high duplication of CAZy domains coupled with the ability to acquire foreign genes by LGT may have allowed the bacterium to rapidly adapt to changing plant biomass-rich environments. PMID:21227922
Chang, Yao-Ming; Liu, Wen-Yu; Shih, Arthur Chun-Chieh; Shen, Meng-Ni; Lu, Chen-Hua; Lu, Mei-Yeh Jade; Yang, Hui-Wen; Wang, Tzi-Yuan; Chen, Sean C-C; Chen, Stella Maris; Li, Wen-Hsiung; Ku, Maurice S B
2012-09-01
To study the regulatory and functional differentiation between the mesophyll (M) and bundle sheath (BS) cells of maize (Zea mays), we isolated large quantities of highly homogeneous M and BS cells from newly matured second leaves for transcriptome profiling by RNA sequencing. A total of 52,421 annotated genes with at least one read were found in the two transcriptomes. Defining a gene with more than one read per kilobase per million mapped reads as expressed, we identified 18,482 expressed genes; 14,972 were expressed in M cells, including 53 M-enriched transcription factor (TF) genes, whereas 17,269 were expressed in BS cells, including 214 BS-enriched TF genes. Interestingly, many TF gene families show a conspicuous BS preference in expression. Pathway analyses reveal differentiation between the two cell types in various functional categories, with the M cells playing more important roles in light reaction, protein synthesis and folding, tetrapyrrole synthesis, and RNA binding, while the BS cells specialize in transport, signaling, protein degradation and posttranslational modification, major carbon, hydrogen, and oxygen metabolism, cell division and organization, and development. Genes coding for several transporters involved in the shuttle of C4 metabolites and BS cell wall development have been identified, to our knowledge, for the first time. This comprehensive data set will be useful for studying M/BS differentiation in regulation and function.
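The expression threshold mentioned in this abstract (more than one read per kilobase per million mapped reads, RPKM) can be illustrated with a minimal Python sketch. The function names and the example counts are hypothetical and are not taken from the study; only the standard RPKM definition is assumed.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def is_expressed(read_count, gene_length_bp, total_mapped_reads, threshold=1.0):
    """Call a gene expressed when its RPKM is strictly above the threshold."""
    return rpkm(read_count, gene_length_bp, total_mapped_reads) > threshold


# Hypothetical example: a 2,000 bp gene in a library of 25 million mapped reads.
print(rpkm(100, 2000, 25_000_000))         # RPKM of a gene with 100 reads
print(is_expressed(100, 2000, 25_000_000))  # above the threshold of 1
print(is_expressed(50, 2000, 25_000_000))   # exactly 1 RPKM, so not "more than one"
```

Because the abstract says "more than one", the comparison in this sketch is strict, so a gene at exactly 1 RPKM is not counted as expressed.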
Foox, Jonathan; Brugler, Mercer; Siddall, Mark Edward; Rodríguez, Estefanía
2016-07-01
Six complete and three partial actiniarian mitochondrial genomes were amplified in two semi-circles using long-range PCR and pyrosequenced in a single run on a 454 GS Junior, doubling the number of complete mitogenomes available within the order. Typical metazoan mtDNA features included circularity, 13 protein-coding genes, 2 ribosomal RNA genes, and length ranging from 17,498 to 19,727 bp. Several typical anthozoan mitochondrial genome features were also observed including the presence of only two transfer RNA genes, elevated A + T richness ranging from 54.9 to 62.4%, large intergenic regions, and group 1 introns interrupting NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit I, the latter of which possesses a homing endonuclease gene. Within the sea anemone Alicia sansibarensis, we report the first mitochondrial gene order rearrangement within the Actiniaria, as well as putative novel non-canonical protein-coding genes. Phylogenetic analyses of all 13 protein-coding and 2 ribosomal genes largely corroborated current hypotheses of sea anemone interrelatedness, with a few lower-level differences.
Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip
2015-01-01
The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Reengineering a transmembrane protein to treat muscular dystrophy using exon skipping.
Gao, Quan Q; Wyatt, Eugene; Goldstein, Jeff A; LoPresti, Peter; Castillo, Lisa M; Gazda, Alec; Petrossian, Natalie; Earley, Judy U; Hadhazy, Michele; Barefield, David Y; Demonbreun, Alexis R; Bönnemann, Carsten; Wolf, Matthew; McNally, Elizabeth M
2015-11-02
Exon skipping uses antisense oligonucleotides as a treatment for genetic diseases. The antisense oligonucleotides used for exon skipping are designed to bypass premature stop codons in the target RNA and restore reading frame disruption. Exon skipping is currently being tested in humans with dystrophin gene mutations who have Duchenne muscular dystrophy. For Duchenne muscular dystrophy, the rationale for exon skipping derived from observations in patients with naturally occurring dystrophin gene mutations that generated internally deleted but partially functional dystrophin proteins. We have now expanded the potential for exon skipping by testing whether an internal, in-frame truncation of a transmembrane protein γ-sarcoglycan is functional. We generated an internally truncated γ-sarcoglycan protein that we have termed Mini-Gamma by deleting a large portion of the extracellular domain. Mini-Gamma provided functional and pathological benefits to correct the loss of γ-sarcoglycan in a Drosophila model, in heterologous cell expression studies, and in transgenic mice lacking γ-sarcoglycan. We generated a cellular model of human muscle disease and showed that multiple exon skipping could be induced in RNA that encodes a mutant human γ-sarcoglycan. Since Mini-Gamma represents removal of 4 of the 7 coding exons in γ-sarcoglycan, this approach provides a viable strategy to treat the majority of patients with γ-sarcoglycan gene mutations.
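A minimal sketch of the reading-frame reasoning behind exon skipping: removing a set of coding exons keeps the downstream frame only if the total removed length is a multiple of three. The exon lengths below are hypothetical and are not the γ-sarcoglycan exon lengths; the helper name is likewise an illustrative assumption, and exons that are only partially coding are ignored.

```python
def skip_preserves_frame(exon_lengths, skipped):
    """Return True if removing the exons at the given indices keeps the reading frame.

    exon_lengths: coding lengths in nucleotides, one per exon (hypothetical values).
    skipped: indices of the exons removed from the mature mRNA.
    """
    removed = sum(exon_lengths[i] for i in skipped)
    # The downstream codons stay in frame only when a whole number of codons is removed.
    return removed % 3 == 0


# Hypothetical coding exon lengths (nt) for a five-exon transcript.
lengths = [120, 88, 73, 118, 201]

print(skip_preserves_frame(lengths, [1]))        # 88 nt removed: frame is shifted
print(skip_preserves_frame(lengths, [1, 2, 3]))  # 279 nt removed: frame is kept
```

In this toy example, skipping a single exon shifts the frame, whereas skipping three adjacent exons together removes a whole number of codons, which is the kind of multi-exon skipping the abstract describes.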
Reengineering a transmembrane protein to treat muscular dystrophy using exon skipping
Gao, Quan Q.; Wyatt, Eugene; Goldstein, Jeff A.; LoPresti, Peter; Castillo, Lisa M.; Gazda, Alec; Petrossian, Natalie; Earley, Judy U.; Hadhazy, Michele; Barefield, David Y.; Demonbreun, Alexis R.; Bönnemann, Carsten; Wolf, Matthew; McNally, Elizabeth M.
2015-01-01
Exon skipping uses antisense oligonucleotides as a treatment for genetic diseases. The antisense oligonucleotides used for exon skipping are designed to bypass premature stop codons in the target RNA and restore reading frame disruption. Exon skipping is currently being tested in humans with dystrophin gene mutations who have Duchenne muscular dystrophy. For Duchenne muscular dystrophy, the rationale for exon skipping derived from observations in patients with naturally occurring dystrophin gene mutations that generated internally deleted but partially functional dystrophin proteins. We have now expanded the potential for exon skipping by testing whether an internal, in-frame truncation of a transmembrane protein γ-sarcoglycan is functional. We generated an internally truncated γ-sarcoglycan protein that we have termed Mini-Gamma by deleting a large portion of the extracellular domain. Mini-Gamma provided functional and pathological benefits to correct the loss of γ-sarcoglycan in a Drosophila model, in heterologous cell expression studies, and in transgenic mice lacking γ-sarcoglycan. We generated a cellular model of human muscle disease and showed that multiple exon skipping could be induced in RNA that encodes a mutant human γ-sarcoglycan. Since Mini-Gamma represents removal of 4 of the 7 coding exons in γ-sarcoglycan, this approach provides a viable strategy to treat the majority of patients with γ-sarcoglycan gene mutations. PMID:26457733
Quach, Tommy; Brooks, Daniel M; Miranda, Hector C
2016-01-01
The complete mitochondrial genome of the Palawan peacock-pheasant Polyplectron napoleonis is 16,710 bp long and contains 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and a control region. All protein-coding genes use the standard ATG start codon, except for cox1, which has a GTG start codon. Seven out of 13 PCGs have TAA stop codons, two have AGG (cox1 and nd6), and three PCGs (nd2, cox2 and nd4) have an incomplete stop codon consisting of just a T nucleotide (T--).
Gene Duplication and Transference of Function in the paleoAP3 Lineage of Floral Organ Identity Genes
Galimba, Kelsey D.; Martínez-Gómez, Jesús; Di Stilio, Verónica S.
2018-01-01
The floral organ identity gene APETALA3 (AP3) is a MADS-box transcription factor involved in stamen and petal identity that belongs to the B-class of the ABC model of flower development. Thalictrum (Ranunculaceae), an emerging model in the non-core eudicots, has AP3 homologs derived from both ancient and recent gene duplications. Prior work has shown that petals have been lost repeatedly and independently in Ranunculaceae in correlation with the loss of a specific AP3 paralog, and Thalictrum represents one of these instances. The main goal of this study was to conduct a functional analysis of the three AP3 orthologs present in Thalictrum thalictroides, representing the paleoAP3 gene lineage, to determine the degree of redundancy versus divergence after gene duplication. Because Thalictrum lacks petals, and has lost the petal-specific AP3, we also asked whether heterotopic expression of the remaining AP3 genes contributes to the partial transference of petal function to the first whorl found in insect-pollinated species. To address these questions, we undertook functional characterization by virus-induced gene silencing (VIGS), protein–protein interaction and binding site analyses. Our results illustrate partial redundancy among Thalictrum AP3s, with deep conservation of B-class function in stamen identity and a novel role in ectopic petaloidy of sepals. Certain aspects of petal function of the lost AP3 locus have apparently been transferred to the other paralogs. A novel result is that the protein products interact not only with each other, but also as homodimers. Evidence presented here also suggests that expression of the different ThtAP3 paralogs is tightly integrated, with an apparent disruption of B function homeostasis upon silencing of one of the paralogs that codes for a truncated protein. 
To explain this result, we propose two testable alternative scenarios: that the truncated protein is a dominant negative mutant or that there is a compensational response as part of a back-up circuit. The evidence for promiscuous protein–protein interactions via yeast two-hybrid combined with the detection of AP3 specific binding motifs in all B-class gene promoters provide partial support for these hypotheses. PMID:29628932
Schulte, W; Töpfer, R; Stracke, R; Schell, J; Martini, N
1997-04-01
Three genes coding for different multifunctional acetyl-CoA carboxylase (ACCase; EC 6.4.1.2) isoenzymes from Brassica napus were isolated and divided into two major classes according to structural features in their 5' regions: class I comprises two genes with an additional coding exon of approximately 300 bp at the 5' end, and class II is represented by one gene carrying an intron of 586 bp in its 5' untranslated region. Fusion of the peptide sequence encoded by the additional first exon of a class I ACCase gene to the jellyfish Aequorea victoria green fluorescent protein (GFP) and transient expression in tobacco protoplasts targeted GFP to the chloroplasts. In contrast to the deduced primary structure of the biotin carboxylase domain encoded by the class I gene, the corresponding amino acid sequence of the class II ACCase shows higher identity with that of the Arabidopsis ACCase, both lacking a transit peptide. The Arabidopsis ACCase has been proposed to be a cytosolic isoenzyme. These observations indicate that the two classes of ACCase genes encode plastidic and cytosolic isoforms of multi-functional, eukaryotic type, respectively, and that B. napus contains at least one multi-functional ACCase besides the multi-subunit, prokaryotic type located in plastids. Southern blot analysis of genomic DNA from B. napus, Brassica rapa, and Brassica oleracea, the ancestors of amphidiploid rapeseed, using a fragment of a multi-functional ACCase gene as a probe revealed that ACCase is encoded by a multi-gene family of at least five members.
Gene Trapping Using Gal4 in Zebrafish
Balciuniene, Jorune; Balciunas, Darius
2013-01-01
Large clutch size and external development of optically transparent embryos make zebrafish an exceptional vertebrate model system for in vivo insertional mutagenesis using fluorescent reporters to tag expression of mutated genes. Several laboratories have constructed and tested enhancer- and gene-trap vectors in zebrafish, using fluorescent proteins, Gal4- and lexA-based transcriptional activators as reporters [1-7]. These vectors had two potential drawbacks: suboptimal stringency (e.g. lack of ability to differentiate between enhancer- and gene-trap events) and low mutagenicity (e.g. integrations into genes rarely produced null alleles). Gene Breaking Transposons (GBTs) were developed to address these drawbacks [8-10]. We have modified one of the first GBT vectors, GBT-R15, for use with Gal4-VP16 as the primary gene trap reporter and added UAS:eGFP as the secondary reporter for direct detection of gene trap events. Application of Gal4-VP16 as the primary gene trap reporter provides two main advantages. First, it increases sensitivity for genes expressed at low expression levels. Second, it enables researchers to use gene trap lines as Gal4 drivers to direct expression of other transgenes in very specific tissues. This is especially pertinent for genes with non-essential or redundant functions, where gene trap integration may not result in overt phenotypes. The disadvantage of using Gal4-VP16 as the primary gene trap reporter is that genes coding for proteins with N-terminal signal sequences are not amenable to trapping, as the resulting Gal4-VP16 fusion proteins are unlikely to be able to enter the nucleus and activate transcription. Importantly, the use of Gal4-VP16 does not pre-select for nuclear proteins: we recovered gene trap mutations in genes encoding proteins which function in the nucleus, the cytoplasm and the plasma membrane. PMID:24121167
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.
The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.
MGDB: a comprehensive database of genes involved in melanoma.
Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng
2015-01-01
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts; to date, the release contains 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
MultitaskProtDB: a database of multitasking proteins.
Hernández, Sergio; Ferragut, Gabriela; Amela, Isaac; Perez-Pons, JosepAntoni; Piñol, Jaume; Mozo-Villarias, Angel; Cedano, Juan; Querol, Enrique
2014-01-01
We have compiled MultitaskProtDB, available online at http://wallace.uab.es/multitask, to provide a repository where the many multitasking proteins found in the literature can be stored. Multitasking or moonlighting is the capability of some proteins to execute two or more biological functions. Usually, multitasking proteins are experimentally revealed by serendipity. This ability of proteins to perform multitasking functions helps us to understand one of the ways used by cells to perform many complex functions with a limited number of genes. Even so, the study of this phenomenon is complex because, among other things, there is no database of moonlighting proteins. The existence of such a tool facilitates the collection and dissemination of these important data. This work reports the database, MultitaskProtDB, which is designed as a user-friendly web page containing >288 multitasking proteins with their NCBI and UniProt accession numbers, canonical and additional biological functions, monomeric/oligomeric states, PDB codes when available and bibliographic references. This database also serves to gain insight into some characteristics of multitasking proteins such as frequencies of the different pairs of functions, phylogenetic conservation and so forth.
Non-coding RNAs—Novel targets in neurotoxicity
Tal, Tamara L.; Tanguay, Robert L.
2012-01-01
Over the past ten years non-coding RNAs (ncRNAs) have emerged as pivotal players in fundamental physiological and cellular processes and have been increasingly implicated in cancer, immune disorders, and cardiovascular, neurodegenerative, and metabolic diseases. MicroRNAs (miRNAs) represent a class of ncRNA molecules that function as negative regulators of post-transcriptional gene expression. miRNAs are predicted to regulate 60% of all human protein-coding genes and as such, play key roles in cellular and developmental processes, human health, and disease. Relative to counterparts that lack binding sites for miRNAs, genes encoding proteins that are post-transcriptionally regulated by miRNAs are twice as likely to be sensitive to environmental chemical exposure. Not surprisingly, miRNAs have been recognized as targets or effectors of nervous system, developmental, hepatic, and carcinogenic toxicants, and have been identified as putative regulators of phase I xenobiotic-metabolizing enzymes. In this review, we give an overview of the types of ncRNAs and highlight their roles in neurodevelopment, neurological disease, activity-dependent signaling, and drug metabolism. We then delve into specific examples that illustrate their importance as mediators, effectors, or adaptive agents of neurotoxicants or neuroactive pharmaceutical compounds. Finally, we identify a number of outstanding questions regarding ncRNAs and neurotoxicity. PMID:22394481
A multitasking Argonaute: exploring the many facets of C. elegans CSR-1.
Wedeles, Christopher J; Wu, Monica Z; Claycomb, Julie M
2013-12-01
While initial studies of small RNA-mediated gene regulatory pathways focused on the cytoplasmic functions of such pathways, identifying roles for Argonaute/small RNA pathways in modulating chromatin and organizing the genome has become a topic of intense research in recent years. Nuclear regulatory mechanisms for Argonaute/small RNA pathways appear to be widespread, in organisms ranging from plants to fission yeast, Caenorhabditis elegans to humans. As the effectors of small RNA-mediated gene regulatory pathways, Argonaute proteins guide the chromatin-directed activities of these pathways. Of particular interest is the C. elegans Argonaute, chromosome segregation and RNAi deficient (CSR-1), which has been implicated in such diverse functions as organizing the holocentromeres of worm chromosomes, modulating germline chromatin, protecting the genome from foreign nucleic acid, regulating histone levels, executing RNAi, and inhibiting translation in conjunction with Pumilio proteins. CSR-1 interacts with small RNAs known as 22G-RNAs, which have complementarity to 25% of the protein coding genes. This peculiar Argonaute is the only essential C. elegans Argonaute out of 24 family members in total. Here, we summarize the current understanding of CSR-1 functions in the worm, with emphasis on the chromatin-directed activities of this ever-intriguing Argonaute.
Exceptionally long 5' UTR short tandem repeats specifically linked to primates.
Namdar-Aligoodarzi, P; Mohammadparast, S; Zaker-Kandjani, B; Talebi Kakroodi, S; Jafari Vesiehsari, M; Ohadi, M
2015-09-10
We have previously reported genome-scale short tandem repeats (STRs) in the core promoter interval (i.e. -120 to +1 to the transcription start site) of protein-coding genes that have evolved identically in primates vs. non-primates. Those STRs may function as evolutionary switch codes for primate speciation. In the current study, we used the Ensembl database to analyze the 5' untranslated region (5' UTR) between +1 and +60 of the transcription start site of the entire human protein-coding genes annotated in the GeneCards database, in order to identify "exceptionally long" STRs (≥5-repeats), which may be of selective/adaptive advantage. The importance of this critical interval is its function as core promoter, and its effect on transcription and translation. In order to minimize ascertainment bias, we analyzed the evolutionary status of the human 5' UTR STRs of ≥5-repeats in several species encompassing six major orders and superorders across mammals, including primates, rodents, Scandentia, Laurasiatheria, Afrotheria, and Xenarthra. We introduce primate-specific STRs, and STRs which have expanded from mouse to primates. Identical co-occurrence of the identified STRs of rare average frequency between 0.006 and 0.0001 in primates supports a role for those motifs in processes that diverged primates from other mammals, such as neuronal differentiation (e.g. APOD and FGF4), and craniofacial development (e.g. FILIP1L). A number of the identified STRs of ≥5-repeats may be human-specific (e.g. ZMYM3 and DAZAP1). Future work is warranted to examine the importance of the listed genes in primate/human evolution, development, and disease. Copyright © 2015 Elsevier B.V. All rights reserved.
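The scan for "exceptionally long" STRs (≥5 repeats) described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `find_long_strs`, the repeat-unit length limit of 1 to 6 bp, and the toy sequence are all assumptions made for illustration, since the abstract specifies neither the unit sizes considered nor the software used.

```python
import re

def find_long_strs(seq, min_copies=5, max_unit=6):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Hypothetical helper: unit lengths of 1..max_unit bp are an assumption,
    not a parameter taken from the abstract.
    """
    if min_copies < 2:
        raise ValueError("min_copies must be at least 2")
    # A lazily matched unit followed by at least (min_copies - 1) further copies,
    # so the shortest possible repeat unit is reported.
    pattern = re.compile(
        r"(?P<unit>[ACGT]{1,%d}?)(?P=unit){%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group("unit")
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits

# Toy sequence (not real data): one GA repeat with five copies.
print(find_long_strs("TTGAGAGAGAGACC"))
```

To restrict the scan to a fixed interval such as +1 to +60, one would pass only that slice of the transcript sequence; positions in the result are then relative to the start of the slice.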
Rioualen, Claire; Da Costa, Quentin; Chetrit, Bernard; Charafe-Jauffret, Emmanuelle; Ginestier, Christophe
2017-01-01
High-throughput RNAi screenings (HTS) allow quantifying the impact of the deletion of each gene in any particular function, from virus-host interactions to cell differentiation. However, there has been less development for functional analysis tools dedicated to RNAi analyses. HTS-Net, a network-based analysis program, was developed to identify gene regulatory modules impacted in high-throughput screenings, by integrating transcription factors-target genes interaction data (regulome) and protein-protein interaction networks (interactome) on top of screening z-scores. HTS-Net produces exhaustive HTML reports for results navigation and exploration. HTS-Net is a new pipeline for RNA interference screening analyses that performs better than simple gene rankings by z-scores, by re-prioritizing genes and placing them back in their biological context, as shown by the three studies that we reanalyzed. Formatted input data for the three studied datasets, source code and web site for testing the system are available from the companion web site at http://htsnet.marseille.inserm.fr/. We also compared our program with existing algorithms (CARD and hotnet2). PMID:28949986
Chu, H W; Rios, C; Huang, C; Wesolowska-Andersen, A; Burchard, E G; O'Connor, B P; Fingerlin, T E; Nichols, D; Reynolds, S D; Seibold, M A
2015-10-01
Targeted knockout of genes in primary human cells using CRISPR-Cas9-mediated genome-editing represents a powerful approach to study gene function and to discern molecular mechanisms underlying complex human diseases. We used lentiviral delivery of CRISPR-Cas9 machinery and conditional reprogramming culture methods to knockout the MUC18 gene in human primary nasal airway epithelial cells (AECs). Massively parallel sequencing technology was used to confirm that the genome of essentially all cells in the edited AEC populations contained coding region insertions and deletions (indels). Correspondingly, we found mRNA expression of MUC18 was greatly reduced and protein expression was absent. Characterization of MUC18 knockout cell populations stimulated with TLR2, 3 and 4 agonists revealed that IL-8 (a proinflammatory chemokine) responses of AECs were greatly reduced in the absence of functional MUC18 protein. Our results show the feasibility of CRISPR-Cas9-mediated gene knockouts in AEC culture (both submerged and polarized), and suggest a proinflammatory role for MUC18 in airway epithelial response to bacterial and viral stimuli.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinkasserer, A.; Koettnitz, K.; Hauber, J.
1995-02-10
The eukaryotic initiation factor 5A (eIF-5A) has been identified as an essential cofactor for the HIV-1 transactivator protein Rev. Rev plays a key role in the complex regulation of HIV-1 gene expression and thereby in the generation of infectious virus particles. Expression of eIF-5A is vital for Rev function, and inhibition of this interaction leads to a block of the viral replication cycle. In humans, four different eIF-5A genes have been identified. One codes for the eIF-5A protein and the other three are pseudogenes. Using a panel of somatic rodent-human cell hybrids in combination with fluorescence in situ hybridization analysis, we show that the four genes map to three different chromosomes. The coding eIF-5A gene (EIF5A) maps to 17p12-p13, and the three pseudogenes EIF5AP1, EIF5AP2, and EIF5AP3 map to 10q23.3, 17q25, and 19q13.2, respectively. This is the first localization report for a eukaryotic cofactor for a regulatory HIV-1 protein. 16 refs., 1 fig.
SIFTER search: a web server for accurate phylogeny-based protein function prediction
Sahraeian, Sayed M.; Luo, Kevin R.; Brenner, Steven E.
2015-05-15
We are awash in proteins discovered through high-throughput sequencing projects. As only a minuscule fraction of these have been experimentally characterized, computational methods are widely used for automated annotation. Here, we introduce a user-friendly web interface for accurate protein function prediction using the SIFTER algorithm. SIFTER is a state-of-the-art sequence-based gene molecular function prediction algorithm that uses a statistical model of function evolution to incorporate annotations throughout the phylogenetic tree. Due to the resources needed by the SIFTER algorithm, running SIFTER locally is not trivial for most users, especially for large-scale problems. The SIFTER web server thus provides access to precomputed predictions on 16 863 537 proteins from 232 403 species. Users can explore SIFTER predictions with queries for proteins, species, functions, and homologs of sequences not in the precomputed prediction set. Lastly, the SIFTER web server is accessible at http://sifter.berkeley.edu/ and the source code can be downloaded.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins called chromatin that compacts the DNA in the nucleus, strongly restricting access to DNA sequences. As a result, regulatory factors only interact with a small subset of their potential binding elements in a given cell to regulate genes. How factors recognize and select sites in chromatin across the genome is not well understood -- but several discoveries in CCR’s Laboratory of Receptor Biology and Gene Expression (LRBGE) have shed light on the mechanisms that direct factors to DNA.
A genome-wide resource for the analysis of protein localisation in Drosophila
Sarov, Mihail; Barz, Christiane; Jambor, Helena; Hein, Marco Y; Schmied, Christopher; Suchold, Dana; Stender, Bettina; Janosch, Stephan; KJ, Vinay Vikas; Krishnan, RT; Krishnamoorthy, Aishwarya; Ferreira, Irene RS; Ejsmont, Radoslaw K; Finkl, Katja; Hasse, Susanne; Kämpfer, Philipp; Plewka, Nicole; Vinis, Elisabeth; Schloissnig, Siegfried; Knust, Elisabeth; Hartenstein, Volker; Mann, Matthias; Ramaswami, Mani; VijayRaghavan, K; Tomancak, Pavel; Schnorrer, Frank
2016-01-01
The Drosophila genome contains >13000 protein-coding genes, the majority of which remain poorly investigated. Important reasons include the lack of antibodies or reporter constructs to visualise these proteins. Here, we present a genome-wide fosmid library of 10000 GFP-tagged clones, comprising tagged genes and most of their regulatory information. For 880 tagged proteins, we created transgenic lines, and for a total of 207 lines, we assessed protein expression and localisation in ovaries, embryos, pupae or adults by stainings and live imaging approaches. Importantly, we visualised many proteins at endogenous expression levels and found a large fraction of them localising to subcellular compartments. By applying genetic complementation tests, we estimate that about two-thirds of the tagged proteins are functional. Moreover, these tagged proteins enable interaction proteomics from developing pupae and adult flies. Taken together, this resource will boost systematic analysis of protein expression and localisation in various cellular and developmental contexts. DOI: http://dx.doi.org/10.7554/eLife.12068.001 PMID:26896675
Structural and functional partitioning of bread wheat chromosome 3B.
Choulet, Frédéric; Alberti, Adriana; Theil, Sébastien; Glover, Natasha; Barbe, Valérie; Daron, Josquin; Pingault, Lise; Sourdille, Pierre; Couloux, Arnaud; Paux, Etienne; Leroy, Philippe; Mangenot, Sophie; Guilhot, Nicolas; Le Gouis, Jacques; Balfourier, Francois; Alaux, Michael; Jamilloux, Véronique; Poulain, Julie; Durand, Céline; Bellec, Arnaud; Gaspin, Christine; Safar, Jan; Dolezel, Jaroslav; Rogers, Jane; Vandepoele, Klaas; Aury, Jean-Marc; Mayer, Klaus; Berges, Hélène; Quesneville, Hadi; Wincker, Patrick; Feuillet, Catherine
2014-07-18
We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits. Copyright © 2014, American Association for the Advancement of Science.
Pesaranghader, Ahmad; Matwin, Stan; Sokolova, Marina; Beiko, Robert G
2016-05-01
Measures of protein functional similarity are essential tools for function prediction, evaluation of protein-protein interactions (PPIs) and other applications. Several existing methods perform comparisons between proteins based on the semantic similarity of their GO terms; however, these measures are highly sensitive to modifications in the topological structure of GO, tend to be focused on specific analytical tasks and concentrate on the GO terms themselves rather than considering their textual definitions. We introduce simDEF, an efficient method for measuring semantic similarity of GO terms using their GO definitions, which is based on the Gloss Vector measure commonly used in natural language processing. The simDEF approach builds optimized definition vectors for all relevant GO terms, and expresses the similarity of a pair of proteins as the cosine of the angle between their definition vectors. Relative to existing similarity measures, when validated on a yeast reference database, simDEF improves correlation with sequence homology by up to 50%, shows a correlation improvement >4% with gene expression in the biological process hierarchy of GO and increases PPI predictability by >2.5% in F1 score for molecular function hierarchy. Datasets, results and source code are available at http://kiwi.cs.dal.ca/Software/simDEF. Contact: ahmad.pgh@dal.ca or beiko@cs.dal.ca. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
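The abstract above expresses similarity as the cosine of the angle between definition vectors. A minimal Python sketch of that final step follows; it is not the simDEF implementation. The function name `cosine_similarity` and the toy vectors are assumptions, and how simDEF builds and optimizes its definition vectors is not shown here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy "definition vectors" (word-feature weights), not real GO data.
vec_a = [1.0, 0.0, 2.0]
vec_b = [0.5, 0.0, 1.0]
print(cosine_similarity(vec_a, vec_b))
```

Because the cosine depends only on direction, two vectors that differ by a positive scale factor, as in the toy example, score as maximally similar.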
Rapid Detection of Positive Selection in Genes and Genomes Through Variation Clusters
Wagner, Andreas
2007-01-01
Positive selection in genes and genomes can point to the evolutionary basis for differences among species and among races within a species. The detection of positive selection can also help identify functionally important protein regions and thus guide protein engineering. Many existing tests for positive selection are excessively conservative, vulnerable to artifacts caused by demographic population history, or computationally very intensive. I here propose a simple and rapid test that is complementary to existing tests and that overcomes some of these problems. It relies on the null hypothesis that neutrally evolving DNA regions should show a Poisson distribution of nucleotide substitutions. The test detects significant deviations from this expectation in the form of variation clusters, highly localized groups of amino acid changes in a coding region. In applying this test to several thousand human–chimpanzee gene orthologs, I show that such variation clusters are not generally caused by relaxed selection. They occur in well-defined domains of a protein's tertiary structure and show a large excess of amino acid replacement over silent substitutions. I also identify multiple new human–chimpanzee orthologs subject to positive selection, among them genes that are involved in reproductive functions, immune defense, and the nervous system. PMID:17603100
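The null hypothesis in the abstract above, that substitutions in a neutrally evolving region follow a Poisson distribution, can be illustrated with a simplified Python sketch. This is not the author's published test: the fixed-window formulation, the function names `poisson_tail` and `window_pvalue`, and the toy positions are assumptions, and no correction for scanning multiple windows is applied.

```python
import math

def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def window_pvalue(positions, seq_len, start, width):
    """Count substitutions in [start, start + width) and return (count, p-value).

    Assumption of this sketch: substitutions are spread uniformly along the
    sequence under the null, so the expected count in the window is
    len(positions) * width / seq_len.
    """
    lam = len(positions) * width / seq_len
    k = sum(1 for p in positions if start <= p < start + width)
    return k, poisson_tail(k, lam)

# Toy data (not real): 5 substitutions in 1000 sites, 4 of them within 10 sites.
count, p = window_pvalue([10, 11, 12, 13, 500], seq_len=1000, start=10, width=10)
print(count, p)
```

A very small p-value for a window flags a candidate variation cluster; a real analysis would also need to account for the number of windows examined.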
Identification of a G protein coupled receptor induced in activated T cells.
Kaplan, M H; Smith, D I; Sundick, R S
1993-07-15
Many genes are induced after T cell activation to make a cell competent for proliferation and ultimately, function. Many of these genes encode surface receptors for growth factors that signal a cell to proliferate. We have cloned a novel gene (clone 6H1) that codes for a member of the G protein-coupled receptor superfamily. This gene was isolated from a chicken activated T cell cDNA library by low level hybridization to mammalian IL-2 cDNA probes. The 308 amino acid open reading frame has seven hydrophobic, presumably transmembrane domains and a consensus site for interaction with G proteins. Tissue distribution studies suggest that gene expression is restricted to activated T cells. The message appears by 1 h after activation and is maintained for at least 45 h. Transcription of 6H1 is induced by a number of T cell stimuli and is inhibited by cyclosporin A, but not by cycloheximide. This is the first description of a member of this superfamily expressed specifically in activated T cells. The gene product may provide a link between T cell growth factors and G protein activation.
Gene-specific cell labeling using MiMIC transposons
Gnerer, Joshua P.; Venken, Koen J. T.; Dierick, Herman A.
2015-01-01
Binary expression systems such as GAL4/UAS, LexA/LexAop and QF/QUAS have greatly enhanced the power of Drosophila as a model organism by allowing spatio-temporal manipulation of gene function as well as cell and neural circuit function. Tissue-specific expression of these heterologous transcription factors relies on random transposon integration near enhancers or promoters that drive the binary transcription factor embedded in the transposon. Alternatively, gene-specific promoter elements are directly fused to the binary factor within the transposon followed by random or site-specific integration. However, such insertions do not consistently recapitulate endogenous expression. We used Minos-Mediated Integration Cassette (MiMIC) transposons to convert host loci into reliable gene-specific binary effectors. MiMIC transposons allow recombinase-mediated cassette exchange to modify the transposon content. We developed novel exchange cassettes to convert coding intronic MiMIC insertions into gene-specific binary factor protein-traps. In addition, we expanded the set of binary factor exchange cassettes available for non-coding intronic MiMIC insertions. We show that binary factor conversions of different insertions in the same locus have indistinguishable expression patterns, suggesting that they reliably reflect endogenous gene expression. We show the efficacy and broad applicability of these new tools by dissecting the cellular expression patterns of the Drosophila serotonin receptor gene family. PMID:25712101
Li, Yongquan; Li, Hongyu
2014-03-01
Studies on Acidithiobacillus ferrooxidans accepting electrons from Fe(II) have previously focused on cytochrome c. However, we have discovered that, besides cytochrome c, type IV pili (Tfp) can transfer electrons. Here, we report conduction by Tfp of A. ferrooxidans analyzed with a conducting-probe atomic force microscope (AFM). The results indicate that the Tfp of A. ferrooxidans are highly conductive. The genome sequence of A. ferrooxidans ATCC 23270 contains two genes, pilV and pilW, which code for pilin domain proteins with the conserved amino acids characteristic of Tfp. Multiple alignment analysis of the PilV and PilW (pilin) proteins indicated that pilV is the adhesin gene while pilW codes for the major protein element of Tfp. The likely function of Tfp is to complete the circuit between the cell surface and Fe(II) oxides. These results indicate that Tfp of A. ferrooxidans might serve as biological nanowires transferring electrons from the surface of Fe(II) oxides to the cell surface. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Blueprint of a Minimal Cell: MiniBacillus
Reuß, Daniel R.; Commichau, Fabian M.; Gundlach, Jan; Zhu, Bingyao
2016-01-01
SUMMARY Bacillus subtilis is one of the best-studied organisms. Due to the broad knowledge and annotation and the well-developed genetic system, this bacterium is an excellent starting point for genome minimization with the aim of constructing a minimal cell. We have analyzed the genome of B. subtilis and selected all genes that are required to allow life in complex medium at 37°C. This selection is based on the known information on essential genes and functions as well as on gene and protein expression data and gene conservation. The list presented here includes 523 and 119 genes coding for proteins and RNAs, respectively. These proteins and RNAs are required for the basic functions of life in information processing (replication and chromosome maintenance, transcription, translation, protein folding, and secretion), metabolism, cell division, and the integrity of the minimal cell. The completeness of the selected metabolic pathways, reactions, and enzymes was verified by the development of a model of metabolism of the minimal cell. A comparison of the MiniBacillus genome to the recently reported designed minimal genome of Mycoplasma mycoides JCVI-syn3.0 indicates excellent agreement in the information-processing pathways, whereas each species has a metabolism that reflects specific evolution and adaptation. The blueprint of MiniBacillus presented here serves as the starting point for a successive reduction of the B. subtilis genome. PMID:27681641
Kim, Chul Min
2016-01-01
Genes encoding ROOT HAIR DEFECTIVE SIX-LIKE (RSL) class I basic helix loop helix proteins are expressed in future root hair cells of the Arabidopsis thaliana root meristem where they positively regulate root hair cell development. Here we show that there are three RSL class I protein coding genes in the Brachypodium distachyon genome, BdRSL1, BdRSL2 and BdRSL3, and each is expressed in developing root hair cells after the asymmetric cell division that forms root hair cells and hairless epidermal cells. Expression of BdRSL class I genes is sufficient for root hair cell development: ectopic overexpression of any of the three RSL class I genes induces the development of root hairs in every cell of the root epidermis. Expression of BdRSL class I genes in root hairless Arabidopsis thaliana root hair defective 6 (Atrhd6) Atrsl1 double mutants, devoid of RSL class I function, restores root hair development indicating that the function of these proteins has been conserved. However, neither AtRSL nor BdRSL class I genes is sufficient for root hair development in A. thaliana. These data demonstrate that the spatial pattern of class I RSL activity can account for the pattern of root hair cell differentiation in B. distachyon. However, the spatial pattern of class I RSL activity cannot account for the spatial pattern of root hair cells in A. thaliana. Taken together these data indicate that the functions of RSL class I proteins have been conserved among most angiosperms (monocots and eudicots) despite the dramatically different patterns of root hair cell development. PMID:27494519
Navarro-García, F; Sánchez, M; Pla, J; Nombela, C
1995-01-01
Mitogen-activated protein (MAP) kinases represent a group of serine/threonine protein kinases playing a central role in signal transduction processes in eukaryotic cells. Using a strategy based on the complementation of the thermosensitive autolytic phenotype of slt2 null mutants, we have isolated a Candida albicans homolog of Saccharomyces cerevisiae MAP kinase gene SLT2 (MPK1), which is involved in the recently outlined PKC1-controlled signalling pathway. The isolated gene, named MKC1 (MAP kinase from C. albicans), coded for a putative protein, Mkc1p, of 58,320 Da that displayed all the characteristic domains of MAP kinases and was 55% identical to S. cerevisiae Slt2p (Mpk1p). The MKC1 gene was deleted in a diploid Candida strain, and heterozygous and homozygous strains, in both Ura+ and Ura- backgrounds, were obtained to facilitate the analysis of the function of the gene. Deletion of the two alleles of the MKC1 gene gave rise to viable cells that grew at 28 and 37 degrees C but, nevertheless, displayed a variety of phenotypic traits under more stringent conditions. These included a low growth yield and a loss of viability in cultures grown at 42 degrees C, a high sensitivity to thermal shocks at 55 degrees C, an enhanced susceptibility to caffeine that was osmotically remediable, and the formation of a weak cell wall with a very low resistance to complex lytic enzyme preparations. The analysis of the functions downstream of the MKC1 gene should contribute to understanding of the connection of growth and morphogenesis in pathogenic fungi. PMID:7891715
Non-coding RNAs in virology: an RNA genomics approach.
Isaac, Christopher; Patel, Trushar R; Zovoilis, Athanasios
2018-04-01
Advances in sequencing technologies and bioinformatic analysis techniques have greatly improved our understanding of various classes of RNAs and their functions. Despite not coding for proteins, non-coding RNAs (ncRNAs) are emerging as essential biomolecules fundamental for cellular functions and cell survival. Interestingly, ncRNAs produced by viruses not only control the expression of viral genes, but also influence host cell regulation and circumvent host innate immune response. Correspondingly, ncRNAs produced by the host genome can play a key role in host-virus interactions. In this article, we will first discuss a number of types of viral and mammalian ncRNAs associated with viral infections. Subsequently, we also describe the new possibilities and opportunities that RNA genomics and next-generation sequencing technologies provide for studying ncRNAs in virology.
Trystuła, M; Żychowska, M; Wilk-Frańczuk, M; Kropotov, J D; Pąchalska, M
2017-02-16
The aim of this study was to evaluate dysregulation of gene expression associated with the cellular stress response in a patient with a post-"warning stroke" depressive disorder confirmed by the presence of a neurophysiological neuromarker through the use of quantitative EEG and event-related potentials. The patient was tested for seven genes associated with the stress reaction: HSPA1A, HSPB1, IL6, IL10, CRP, and HSF-1 along with NF-κB, compared to gene expression in healthy controls. A 54-year-old patient with a past history of schizophrenia (at the age of 20), and of transient ischemic attack (at the age of 53) and depressive disorder confirmed by functional, cognitive, emotional, and affectional diagnostics underwent additional testing for expression of the genes associated with stress response. The expression of genes coding for heat shock protein (HSPA1A, HSPB1), interleukins (IL6, IL10), and C-reactive protein was tested along with factors that regulate their expression. The results of the tests conducted on this patient were compared with 42 healthy control subjects. Diagnostic testing revealed upregulation in expression of these genes, presenting as increased expression of the target genes and of the regulatory genes. A post-"warning stroke" depressive disorder appears to be associated with overexpression of the genes coding for HSP and interleukins. Further research on larger groups of people may provide grounds for treatment modification.
Rossmassler, Karen; Dietrich, Carsten; Thompson, Claire; ...
2015-11-26
Termites are important contributors to carbon and nitrogen cycling in tropical ecosystems. Higher termites digest lignocellulose in various stages of humification with the help of an entirely prokaryotic microbiota housed in their compartmented intestinal tract. Previous studies revealed fundamental differences in community structure between compartments, but the functional roles of individual lineages in symbiotic digestion are mostly unknown. Furthermore, we conducted a highly resolved analysis of the gut microbiota in six species of higher termites that feed on plant material at different levels of humification. Combining amplicon sequencing and metagenomics, we assessed similarities in community structure and functional potential between the major hindgut compartments (P1, P3, and P4). Cluster analysis of the relative abundances of orthologous gene clusters (COGs) revealed high similarities among wood- and litter-feeding termites and strong differences to humivorous species. However, abundance estimates of bacterial phyla based on 16S rRNA genes greatly differed from those based on protein-coding genes. In conclusion, the community structure and functional potential of the microbiota in individual gut compartments are clearly driven by the digestive strategy of the host. The metagenomics libraries obtained in this study provide the basis for future studies that elucidate the fundamental differences in the symbiont-mediated breakdown of lignocellulose and humus by termites of different feeding groups. The high proportion of uncultured bacterial lineages in all samples calls for a reference-independent approach for the correct taxonomic assignment of protein-coding genes.
An Eye on Trafficking Genes: Identification of Four Eye Color Mutations in Drosophila
Grant, Paaqua; Maga, Tara; Loshakov, Anna; Singhal, Rishi; Wali, Aminah; Nwankwo, Jennifer; Baron, Kaitlin; Johnson, Diana
2016-01-01
Genes that code for proteins involved in organelle biogenesis and intracellular trafficking produce products that are critical in normal cell function. Conserved orthologs of these are present in most or all eukaryotes, including Drosophila melanogaster. Some of these genes were originally identified as eye color mutants with decreases in both types of pigments found in the fly eye. Using these criteria to identify such genes, four eye color mutations that are not annotated in the genome sequence (chocolate, maroon, mahogany, and red Malpighian tubules) were molecularly mapped and their genome sequences were evaluated. Mapping was performed using deletion analysis and complementation tests. chocolate is an allele of the VhaAC39-1 gene, which is an ortholog of the Vacuolar H+ ATPase AC39 subunit 1. maroon corresponds to the Vps16A gene and its product is part of the HOPS complex, which participates in transport and organelle fusion. red Malpighian tubules is the CG12207 gene, which encodes a protein of unknown function that includes a LysM domain. mahogany is the CG13646 gene, which is predicted to be an amino acid transporter. The strategy of identifying eye color genes based on perturbations in quantities of both types of eye color pigments has proven useful in identifying proteins involved in trafficking and biogenesis of lysosome-related organelles. Mutants of these genes can form the basis of valuable in vivo models to understand these processes. PMID:27558665
Moreno, Renata; Fonseca, Pilar; Rojo, Fernando
2010-08-06
In Pseudomonas putida, the expression of the pWW0 plasmid genes for the toluene/xylene assimilation pathway (the TOL pathway) is subject to complex regulation in response to environmental and physiological signals. This includes strong inhibition via catabolite repression, elicited by the carbon sources that the cells prefer to hydrocarbons. The Crc protein, a global regulator that controls carbon flow in pseudomonads, has an important role in this inhibition. Crc is a translational repressor that regulates the TOL genes, but how it does this has remained unknown. This study reports that Crc binds to sites located at the translation initiation regions of the mRNAs coding for XylR and XylS, two specific transcription activators of the TOL genes. Unexpectedly, eight additional Crc binding sites were found overlapping the translation initiation sites of genes coding for several enzymes of the pathway, all encoded within two polycistronic mRNAs. Evidence is provided supporting the idea that these sites are functional. This implies that Crc can differentially modulate the expression of particular genes within polycistronic mRNAs. It is proposed that Crc controls TOL genes in two ways. First, Crc inhibits the translation of the XylR and XylS regulators, thereby reducing the transcription of all TOL pathway genes. Second, Crc inhibits the translation of specific structural genes of the pathway, acting mainly on proteins involved in the first steps of toluene assimilation. This ensures a rapid inhibitory response that reduces the expression of the toluene/xylene degradation proteins when preferred carbon sources become available.
Singh, Pratichi; Dass, J Febin Prabhu
2018-05-07
The IFNL3 gene plays a crucial role in immune defense against viruses. It induces the interferon stimulated genes (ISGs) with antiviral properties by activating the JAK-STAT pathway. In this study, we investigated the evolutionary force involved in shaping the IFNL3 gene to perform its downstream function as a regulatory gene in HCV clearance. We selected 25 IFNL3 coding sequences, with the human gene as a reference sequence, and constructed a phylogeny. Furthermore, rate of variation, substitution saturation test, phylogenetic informativeness and differential selection were also analysed. The codon evolution result suggests that nearly neutral mutation is the key pattern in shaping the IFNL3 evolution. The results were validated by comparing the human IFNL3 protein variants with the native protein in a molecular dynamics simulation study. The molecular dynamics simulation clearly depicts the negative impact of the reported variants on the human IFNL3 protein. However, these detrimental mutations (R157Q and R157W) were shown to be negatively selected in the evolutionary study of the mammals. Hence, the variation revealed a mild impact on the IFNL3 function and may be removed from the population through negative selection due to its high functional constraints. In a nutshell, our study may contribute to the overall evidence on phylotyping and on the structural transformation that takes place in the non-synonymous substitutions of IFNL3 protein. Substantially, our obtained theoretical knowledge will lay the path to extend the experimental validation in HCV clearance. Copyright © 2018 Elsevier Ltd. All rights reserved.
Keel, B N; Nonneman, D J; Rohrer, G A
2017-08-01
Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
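The closing estimate in this abstract can be checked with simple arithmetic. The sketch below uses only the two counts stated in the abstract; the variable names are illustrative and are not taken from the study.

```python
# Counts as stated in the abstract; variable names are illustrative only.
total_snps = 22_342_915      # high confidence SNPs identified
functional_snps = 139_000    # approx. loss-of-function, nonsynonymous or regulatory SNPs

# Share of detected SNPs that fall in the "functional" classes.
functional_fraction = functional_snps / total_snps
# Share that could potentially be set aside in future analyses.
ignorable_fraction = 1 - functional_fraction

print(f"functional SNPs: {functional_fraction:.2%}")
print(f"potentially ignorable SNPs: {ignorable_fraction:.2%}")
```

The functional classes account for well under 1% of the detected SNPs, which is consistent with the statement that over 99% of the variation could potentially be ignored.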
Characterization of the Lymantria dispar nucleopolyhedrovirus 25K FP gene
David S. Bischoff; James M. Slavicek
1996-01-01
The Lymantria dispar nucleopolyhedrovirus (LdMNPV) gene encoding the 25K FP protein has been cloned and sequenced. The 25KFP gene codes for a 217 amino acid protein with a predicted molecular mass of 24870 Da. Expression of the 25K FP protein in a rabbit reticulocyte system generated a 27 kDa protein, in close agreement with the...
Rubel, Elisa Terumi; Raittz, Roberto Tadeu; Coimbra, Nilson Antonio da Rocha; Gehlen, Michelly Alves Coutinho; Pedrosa, Fábio de Oliveira
2016-12-15
Azospirillum brasilense is a plant-growth promoting nitrogen-fixing bacterium that is used as bio-fertilizer in agriculture. Since nitrogen fixation has a high-energy demand, the reduction of N2 to NH4+ by nitrogenase occurs only under limiting conditions of NH4+ and O2. Moreover, the synthesis and activity of nitrogenase is highly regulated to prevent energy waste. In A. brasilense nitrogenase activity is regulated by the products of draG and draT. The product of the draB gene, located downstream in the draTGB operon, may be involved in the regulation of nitrogenase activity by an, as yet, unknown mechanism. A deep in silico analysis of the product of draB was undertaken aiming at suggesting its possible function and involvement with DraT and DraG in the regulation of nitrogenase activity in A. brasilense. In this work, we present a new artificial intelligence strategy for protein classification, named ProClaT. The features used by the pattern recognition model were derived from the primary structure of the DraB homologous proteins, calculated by a ProClaT internal algorithm. ProClaT was applied to this case study and the results revealed that the A. brasilense draB gene codes for a protein highly similar to the nitrogenase associated NifO protein of Azotobacter vinelandii. This tool allowed the reclassification of DraB/NifO homologous proteins, hypothetical, conserved hypothetical and those annotated as putative arsenate reductase, ArsC, as NifO-like. An analysis of co-occurrence of draB, draT, draG and of other nif genes was performed, suggesting the involvement of draB (nifO) in nitrogen fixation, although without defining a specific function.
Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease
Kleinjan, Dirk A.; van Heyningen, Veronica
2005-01-01
Transcriptional control is a major mechanism for regulating gene expression. The complex machinery required to effect this control is still emerging from functional and evolutionary analysis of genomic architecture. In addition to the promoter, many other regulatory elements are required for spatiotemporally and quantitatively correct gene expression. Enhancer and repressor elements may reside in introns or up- and downstream of the transcription unit. For some genes with highly complex expression patterns—often those that function as key developmental control genes—the cis-regulatory domain can extend long distances outside the transcription unit. Some of the earliest hints of this came from disease-associated chromosomal breaks positioned well outside the relevant gene. With the availability of wide-ranging genome sequence comparisons, strong conservation of many noncoding regions became obvious. Functional studies have shown many of these conserved sites to be transcriptional regulatory elements that sometimes reside inside unrelated neighboring genes. Such sequence-conserved elements generally harbor sites for tissue-specific DNA-binding proteins. Developmentally variable chromatin conformation can control protein access to these sites and can regulate transcription. Disruption of these finely tuned mechanisms can cause disease. Some regulatory element mutations will be associated with phenotypes distinct from any identified for coding-region mutations. PMID:15549674
Deschamps, Matthieu; Laval, Guillaume; Fagny, Maud; Itan, Yuval; Abel, Laurent; Casanova, Jean-Laurent; Patin, Etienne; Quintana-Murci, Lluis
2016-01-01
Human genes governing innate immunity provide a valuable tool for the study of the selective pressure imposed by microorganisms on host genomes. A comprehensive, genome-wide study of how selective constraints and adaptations have driven the evolution of innate immunity genes is missing. Using full-genome sequence variation from the 1000 Genomes Project, we first show that innate immunity genes have globally evolved under stronger purifying selection than the remainder of protein-coding genes. We identify a gene set under the strongest selective constraints, mutations in which are likely to predispose individuals to life-threatening disease, as illustrated by STAT1 and TRAF3. We then evaluate the occurrence of local adaptation and detect 57 high-scoring signals of positive selection at innate immunity genes, variation in which has been associated with susceptibility to common infectious or autoimmune diseases. Furthermore, we show that most adaptations targeting coding variation have occurred in the last 6,000–13,000 years, the period at which populations shifted from hunting and gathering to farming. Finally, we show that innate immunity genes present higher Neandertal introgression than the remainder of the coding genome. Notably, among the genes presenting the highest Neandertal ancestry, we find the TLR6-TLR1-TLR10 cluster, which also contains functional adaptive variation in Europeans. This study identifies highly constrained genes that fulfill essential, non-redundant functions in host survival and reveals others that are more permissive to change—containing variation acquired from archaic hominins or adaptive variants in specific populations—improving our understanding of the relative biological importance of innate immunity pathways in natural conditions. PMID:26748513
Antisense transcription is pervasive but rarely conserved in enteric bacteria.
Raghavan, Rahul; Sloan, Daniel B; Ochman, Howard
2012-01-01
Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell's transcription machinery. IMPORTANCE Application of high-throughput methods has revealed the expression throughout bacterial genomes of transcripts encoded on the strand complementary to protein-coding genes. Because transcription is costly, it is usually assumed that these transcripts, termed antisense RNAs (asRNAs), serve some function; however, the role of most asRNAs is unclear, raising questions about their relevance in cellular processes. 
Because natural selection conserves functional elements, comparisons between related species provide a method for assessing functionality genome-wide. Applying such an approach, we assayed all transcripts in two closely related bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, and demonstrate that, although the levels of genome-wide antisense transcription are similarly high in both bacteria, only a small fraction of asRNAs are shared across species. Moreover, the promoters associated with asRNAs show no evidence of sequence conservation between, or even within, species. These findings indicate that despite the genome-wide transcription of asRNAs, many of these transcripts are likely nonfunctional.
A Cas9 transgenic Plasmodium yoelii parasite for efficient gene editing.
Qian, Pengge; Wang, Xu; Yang, Zhenke; Li, Zhenkui; Gao, Han; Su, Xin-Zhuan; Cui, Huiting; Yuan, Jing
2018-06-01
The RNA-guided endonuclease Cas9 has been applied as an efficient gene-editing tool in the malaria parasite Plasmodium. However, the size (4.2 kb) of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits its utility for genome editing in parasites that receive Cas9 only via a cas9 plasmid. To establish the endogenous and constitutive expression of Cas9 protein in the rodent malaria parasite P. yoelii, we replaced the coding region of an endogenous gene sera1 with the intact SpCas9 coding sequence using the CRISPR/Cas9-mediated genome editing method, generating the cas9-knockin parasite PyCas9ki. The resulting PyCas9ki parasite displays normal progression during the whole life cycle and expresses the Cas9 protein in the asexual blood stage. By introducing the plasmid (pYCs) containing only sgRNA and homologous template elements, we successfully achieved both deletion and tagging modifications for different endogenous genes in the genome of the PyCas9ki parasite. This cas9-knockin PyCas9ki parasite provides a new platform facilitating the study of gene function in the rodent malaria parasite P. yoelii. Copyright © 2018 Elsevier B.V. All rights reserved.
Balfanz, Sabine; Strünker, Timo; Frings, Stephan; Baumann, Arnd
2005-04-01
In invertebrates, the biogenic-amine octopamine is an important physiological regulator. It controls and modulates neuronal development, circadian rhythm, locomotion, 'fight or flight' responses, as well as learning and memory. Octopamine mediates its effects by activation of different GTP-binding protein (G protein)-coupled receptor types, which induce either cAMP production or Ca2+ release. Here we describe the functional characterization of two genes from Drosophila melanogaster that encode three octopamine receptors. The first gene (Dmoa1) codes for two polypeptides that are generated by alternative splicing. When heterologously expressed, both receptors cause oscillatory increases of the intracellular Ca2+ concentration in response to applying nanomolar concentrations of octopamine. The second gene (Dmoa2) codes for a receptor that specifically activates adenylate cyclase and causes a rise of intracellular cAMP with an EC50 of approximately 3 × 10^-8 M octopamine. Tyramine, the precursor of octopamine biosynthesis, activates all three receptors at ≥100-fold higher concentrations, whereas dopamine and serotonin are non-effective. Developmental expression of Dmoa genes was assessed by RT-PCR. Overlapping but not identical expression patterns were observed for the individual transcripts. The genes characterized in this report encode unique receptors that display signature properties of native octopamine receptors.
Sadhukhan, Ratan; Chowdhury, Priyanka; Ghosh, Sourav; Ghosh, Utpal
2018-06-01
Telomere DNA can form specialized nucleoprotein structure with telomere-associated proteins to hide free DNA ends or G-quadruplex structures under certain conditions especially in presence of G-quadruplex ligand. Telomere DNA is transcribed to form non-coding telomere repeat-containing RNA (TERRA) whose biogenesis and function is poorly understood. Our aim was to find the role of telomere-associated proteins and telomere structures in TERRA transcription. We silenced four [two shelterin (TRF1, TRF2) and two non-shelterin (PARP-1, SLX4)] telomere-associated genes using siRNA and verified depletion in protein level. Knocking down of one gene modulated expression of other telomere-associated genes and increased TERRA from 10q, 15q, XpYp and XqYq chromosomes in A549 cells. Telomere was destabilized or damaged by G-quadruplex ligand pyridostatin (PDS) and bleomycin. Telomere dysfunction-induced foci (TIFs) were observed for each case of depletion of proteins, treatment with PDS or bleomycin. TERRA level was elevated by PDS and bleomycin treatment alone or in combination with depletion of telomere-associated proteins.
MicroRNAs as New Characters in the Plot between Epigenetics and Prostate Cancer.
Paone, Alessio; Galli, Roberta; Fabbri, Muller
2011-01-01
Prostate cancer (PCA) still represents a leading cause of death. An increasing number of studies have documented that microRNAs (miRNAs), a subgroup of non-coding RNAs with gene regulatory functions, are differentially expressed in PCA with respect to the normal tissue counterpart, suggesting their involvement in prostate carcinogenesis and dissemination. Interestingly, it has been shown that miRNAs undergo the same regulatory mechanisms as any protein coding gene, including epigenetic regulation. In turn, miRNAs can also affect the expression of oncogenes and tumor suppressor genes by targeting effectors of the epigenetic machinery, therefore indirectly affecting the epigenetic controls on these genes. Among the genes that undergo this complex regulation is the androgen receptor (AR), a key therapeutic target for PCA. This review will focus on the role of epigenetically regulated and epigenetically regulating miRNAs in PCA and on the fine regulation of AR expression, as mediated by this miRNA-epigenetics interaction.
Alternative splicing and promoter use in TFII-I genes.
Makeyev, Aleksandr V; Bayarsaihan, Dashzeveg
2009-03-15
TFII-I proteins are ubiquitously expressed transcriptional factors involved in both basal transcription and signal transduction activation or repression. TFII-I proteins are detected as early as the two-cell stage, exhibit distinct and dynamic expression patterns in developing embryos, and mark regional variation in the adult mouse brain. Analysis of atypical small and rare chromosomal deletions at 7q11.23 points to TFII-I genes (GTF2I and GTF2IRD1) as the prime candidates responsible for craniofacial and cognitive abnormalities in the Williams-Beuren syndrome. TFII-I genes are often subjected to alternative splicing, which generates isoforms that show different activities and play distinct biological roles. The coding regions of TFII-I genes are composed of more than 30 exons and are well conserved among vertebrates. However, their 5' untranslated regions are not as well conserved and are poorly characterized. In the present work, we analyzed promoter regions of TFII-I genes and described their additional exons, as well as tested tissue specificity of both previously reported and novel alternatively spliced isoforms. Our comprehensive analysis leads to further elucidation of the functional heterogeneity of TFII-I proteins, provides hints for the search for regulatory pathways governing their expression, and opens up possibilities for examining the effect of different haplotypes on their promoter functions.
de Freitas, Michele C R; Resende, Juliana A; Ferreira-Machado, Alessandra B; Saji, Guadalupe D R Q; de Vasconcelos, Ana T R; da Silva, Vânia L; Nicolás, Marisa F; Diniz, Cláudio G
2016-01-01
Bacteroides fragilis, a member of the commensal gut microbiota, is an important pathogen associated with endogenous infections, and metronidazole remains a valuable antibiotic for the treatment of these infections, although bacterial resistance is widely reported. Considering the need for a better understanding of the global mechanisms by which B. fragilis survives metronidazole exposure, we performed an RNA-seq transcriptomic analysis with validation of gene expression results by qPCR. Bacterial strains were selected after in vitro subcultures with a subinhibitory concentration (SIC) of the drug. From wild-type B. fragilis ATCC 43859, four derivative strains were selected: first and fourth subcultures under metronidazole exposure and first and fourth subcultures after drug removal. According to global gene expression analysis, 2,146 protein coding genes were identified, of which a total of 1,618 (77%) were assigned to a Gene Ontology term (GO), indicating that most known cellular functions were represented. Among these 2,146 protein coding genes, 377 were shared among all strains, suggesting that they are critical for B. fragilis survival. In order to identify distinct expression patterns, we also performed a K-means clustering analysis set to 15 groups. This analysis allowed us to detect the major activated or repressed genes encoding enzymes that act in several metabolic pathways involved in metronidazole response, such as drug activation, defense mechanisms against superoxide ions, high expression level of multidrug efflux pumps, and DNA repair. The strains collected after metronidazole removal were functionally more similar to those cultured under drug pressure, reinforcing that drug exposure leads to drastic, persistent changes in the B. fragilis gene expression patterns. These results may help to elucidate the B. fragilis response during metronidazole exposure, mainly at SIC, contributing information about bacterial survival strategies under stress conditions in their environment.
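The K-means step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `kmeans`, the toy two-condition expression profiles, the choice of k = 2 (the study used 15 groups), and the simple "first k points" initialisation are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Assumption: centroids start at the first k points, a deliberately
    simple and deterministic choice for this sketch.
    """
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of each cluster; an empty cluster keeps its centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical expression profiles (two conditions per gene), not real data.
profiles = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1),
            (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

On this toy input the three low-expression profiles end up in one group and the three high-expression profiles in the other. In practice a library implementation with better initialisation and multiple restarts would normally be preferred.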