The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.
Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun
2017-03-27
Long non-coding RNA overlapping with protein-coding gene (lncRNA-coding pair) is a special type of overlapping genes. Protein-coding overlapping genes have been well studied and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and retaining genes in lncRNA-coding pairs were given higher priority than non-overlapping genes. Besides, the preference of overlapping configurations preserved during evolution was based on the origin of lncRNA-coding pairs. Further investigations showed that lncRNAs promoting the splicing of their embedded protein-coding partners was a unilateral interaction, but the existence of overlapping partners improving the gene expression was bidirectional and the effect was decreased with the increased evolutionary age of genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation and the expression correlation was associated with their overlapping configurations, local genomic environment and evolutionary age of genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that the lineage-specific pairs including old protein-coding genes may play an important role in tumorigenesis. This work presents a systematically comprehensive understanding of the evolution and the expression pattern of human lncRNA-coding pairs.
De Novo Origin of Human Protein-Coding Genes
Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping
2011-01-01
The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831
Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu
2011-01-01
In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and the expression levels of their isoforms for human known genes, the conservation and disease association of long noncoding RNAs (ncRNAs) with two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed between brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specially expressed in brain and cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform and some genes only generated protein-coding (or noncoding) RNAs in one sample but not in another. We found 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, and most of those long ncRNAs contain conserved elements across either 46 vertebrates or 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs differentially expressed in human breast cancer or lung cancer, several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, those validated differentially expressed long ncRNAs were found significantly correlated with certain breast cancer or lung cancer related genes, indicating the important biological relevance between long ncRNAs and human cancers. Our findings reveal that the differences of gene expression profile between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level for completely illustrating the intricate transcriptome.
Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi
2017-12-02
The study of codon usage coupled with phylogenetic analysis is an important tool to understand the genetic and evolutionary relationship of a gene. The 13 protein coding genes of human mitochondria are involved in electron transport chain for the generation of energy currency (ATP). However, no work has yet been reported on the codon usage of the mitochondrial protein coding genes across six continents. To understand the patterns of codon usage in mitochondrial genes across six different continents, we used bioinformatic analyses to analyze the protein coding genes. The codon usage bias was low as revealed from high ENC value. Correlation between codon usage and GC3 suggested that all the codons ending with G/C were positively correlated with GC3 but vice versa for A/T ending codons with the exception of ND4L and ND5 genes. Neutrality plot revealed that for the genes ATP6, COI, COIII, CYB, ND4 and ND4L, natural selection might have played a major role while mutation pressure might have played a dominant role in the codon usage bias of ATP8, COII, ND1, ND2, ND3, ND5 and ND6 genes. Phylogenetic analysis indicated that evolutionary relationships in each of 13 protein coding genes of human mitochondria were different across six continents and further suggested that geographical distance was an important factor for the origin and evolution of 13 protein coding genes of human mitochondria. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks
Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.
2012-01-01
While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
Basu, Swaraj; Larsson, Erik
2018-05-31
Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis -regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis -regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis -regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis -regulatory pairs that may serve as a basis for further experimental validation and characterization. Copyright © 2018 Basu and Larsson.
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...
2014-10-02
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui
Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptionalmore » regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.« less
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
Cellular miR-2909 RNomics governs the genes that ensure immune checkpoint regulation.
Kaul, Deepak; Malik, Deepti; Wani, Sameena
2018-06-20
Cross-talk between coding RNAs and regulatory non-coding microRNAs, within human genome, has provided compelling evidence for the existence of flexible checkpoint control of T-Cell activation. The present study attempts to demonstrate that the interplay between miR-2909 and its effector KLF4 gene has the inherent capacity to regulate genes coding for CTLA4, CD28, CD40, CD134, PDL1, CD80, CD86, IL-6 and IL-10 within normal human peripheral blood mononuclear cells (PBMCs). Based upon these findings, we propose a pathway that links miR-2909 RNomics with the genes coding for immune checkpoint regulators required for the maintenance of immune homeostasis.
Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M
2013-10-14
The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.
Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H
2017-12-20
Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.
Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor
2017-08-30
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
Dynamic gene expression response to altered gravity in human T cells.
Thiel, Cora S; Hauschild, Swantje; Huge, Andreas; Tauber, Svantje; Lauber, Beatrice A; Polzer, Jennifer; Paulsen, Katrin; Lier, Hartwin; Engelmann, Frank; Schmitz, Burkhard; Schütte, Andreas; Layer, Liliana E; Ullrich, Oliver
2017-07-12
We investigated the dynamics of immediate and initial gene expression response to different gravitational environments in human Jurkat T lymphocytic cells and compared expression profiles to identify potential gravity-regulated genes and adaptation processes. We used the Affymetrix GeneChip® Human Transcriptome Array 2.0 containing 44,699 protein coding genes and 22,829 non-protein coding genes and performed the experiments during a parabolic flight and a suborbital ballistic rocket mission to cross-validate gravity-regulated gene expression through independent research platforms and different sets of control experiments to exclude other factors than alteration of gravity. We found that gene expression in human T cells rapidly responded to altered gravity in the time frame of 20 s and 5 min. The initial response to microgravity involved mostly regulatory RNAs. We identified three gravity-regulated genes which could be cross-validated in both completely independent experiment missions: ATP6V1A/D, a vacuolar H + -ATPase (V-ATPase) responsible for acidification during bone resorption, IGHD3-3/IGHD3-10, diversity genes of the immunoglobulin heavy-chain locus participating in V(D)J recombination, and LINC00837, a long intergenic non-protein coding RNA. Due to the extensive and rapid alteration of gene expression associated with regulatory RNAs, we conclude that human cells are equipped with a robust and efficient adaptation potential when challenged with altered gravitational environments.
Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.
2013-01-01
Cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite the wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle against non-cell cycle genes. Our results show that both the trans- TF features and the cis- motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
D'Onofrio, Giuseppe; Ghosh, Tapash Chandra
2005-01-17
Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences of the GC(3) levels, the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids were also affected during the compositional transition, showing that there exists a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of isochore organization of the human genome has been discussed on the basis of these results.
The gene coding for the B cell surface protein CD19 is localized on human chromosome 16p11.
Stapleton, P; Kozmik, Z; Weith, A; Busslinger, M
1995-02-01
The CD19 gene codes for one of the earliest markers of the human B cell lineage and is a target for the B lymphoid-specific transcription factor BSAP (Pax-5). The transmembrane protein CD19 has been implicated in controlling proliferation of mature B lymphocytes by modulating signal transduction through the antigen receptor. In this study, we have employed Southern blot and fluorescence in situ hybridization analyses to localize the CD19 gene to human chromosome 16p11.
Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-01-01
Objective The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
Seligmann, Hervé
2013-05-07
GenBank's EST database includes RNAs matching exactly human mitochondrial sequences assuming systematic asymmetric nucleotide exchange-transcription along exchange rules: A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST), no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein coding genes align with numerous GenBank proteins, predicted secondary structures resemble their putative GenBank homologue's. Two independent methods designed to detect overlapping genes (one based on nucleotide contents analyses in relation to replicative deamination gradients at third codon positions, and circular code analyses of codon contents based on frame redundancy), confirm nucleotide-exchange-encrypted overlapping genes. Methods converge on which genes are most probably active, and which not, and this for the various exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) extents that various bioinformatics analyses confirm the protein coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem encoded within regular mitochondrial genes through asymmetric nucleotide exchange, avoiding lengthening genomes. Intersecting evidence between several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
A Plain English Map of the Human Glycolysis Enzymes.
ERIC Educational Resources Information Center
Offner, Susan
1999-01-01
Presents a plain English map of the gene coding for the glycolysis enzymes in humans to be used as a teaching tool. The map can be used to illustrate that every reaction in a cell requires an enzyme, and that every enzyme is a protein coded for by a gene somewhere on the chromosomes. (WRM)
Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard
2014-09-01
To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume
2017-07-19
Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.
A human haploid gene trap collection to study lncRNAs with unusual RNA biology.
Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M
2016-01-01
Many thousand long non-coding (lnc) RNAs are mapped in the human genome. Time consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data shows that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identifies the previously uncharacterized LOC100288798 as a potential gene regulator.
Towards a complete map of the human long non-coding RNA transcriptome.
Uszczynska-Ratajczak, Barbara; Lagarde, Julien; Frankish, Adam; Guigó, Roderic; Johnson, Rory
2018-05-23
Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
Base composition and expression level of human genes.
Arhondakis, Stilianos; Auletta, Fabio; Torelli, Giuseppe; D'Onofrio, Giuseppe
2004-01-21
It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. Also the amino acid frequencies, and the hydrophobicity, of the corresponding encoded proteins are affected by the high GC level of the genes localized in the GC-rich isochores. It was hypothesized that the gene expression level as well is higher in GC-rich compared to GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity were investigated in the present paper. The results support the hypothesis reported above, since all the parameters so far studied converge to the same conclusion, that the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
NASA Technical Reports Server (NTRS)
Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard
2003-01-01
All "minor" components of the human DNA mismatch repair (MMR) system-MSH3, MSH6, PMS2, and the recently discovered MLH3-contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system-MSH2 and MLH1-and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
Midthun, K; Flores, J; Taniguchi, K; Urasawa, S; Kapikian, A Z; Chanock, R M
1987-01-01
Antigenic characterization of human rotaviruses by plaque reduction neutralization assay has revealed four distinct serotypes. The outer capsid protein VP7, coded for by gene 8 or 9, is a major neutralization protein; however, studies of rotaviruses derived from genetic reassortment between two strains have confirmed that another outer capsid protein, VP3, is in some cases equally important in neutralization. In this study, the genetic relatedness of the genes coding for VP7 of human rotaviruses belonging to serotypes 1 through 4 was examined by hybridization of their denatured double-stranded genomic RNAs to labeled single-stranded mRNA probes derived from human-animal rotavirus reassortants containing only the VP7 gene of their human rotavirus parent. A high degree of homology was demonstrated between the VP7 genes of strain D and other serotype 1 human rotaviruses, strain DS-1 and other serotype 2 human rotaviruses, strain P and other serotype 3 human rotaviruses, and strain ST3 and other serotype 4 human rotaviruses. Hybrid bands could not be demonstrated between the VP7 gene of D, DS-1, P, or ST3 and the corresponding gene of human rotaviruses belonging to a different serotype. RNA specimens extracted from the stools of 15 Venezuelan children hospitalized with rotavirus diarrhea were hybridized to each of the reassortant probes representing the four human serotypes. All five viruses with short RNA patterns showed homology with the DS-1 strain VP7 gene; two of these were previously adapted to tissue culture and shown to be serotype 2 strains by tissue culture neutralization. Of the remaining 10 viruses with long RNA patterns, 2 hybridized only to the D strain VP7 gene, 6 hybridized only to the P strain VP7 gene, and 2 hybridized only to the ST3 strain VP7 gene. Hybridization using single human rotavirus gene substitution reassortants as probes may provide an alternative method for identifying the VP7 serotype of field isolates that would circumvent the need for tissue culture adaptation. Images PMID:3038948
Expression Profiling Smackdown: Human Transcriptome Array HTA 2.0 vs. RNA-Seq
Palermo, Meghann; Driscoll, Heather; Tighe, Scott; Dragon, Julie; Bond, Jeff; Shukla, Arti; Vangala, Mahesh; Vincent, James; Hunter, Tim
2014-01-01
The advent of both microarray and massively parallel sequencing have revolutionized high-throughput analysis of the human transcriptome. Due to limitations in microarray technology, detecting and quantifying coding transcript isoforms, in addition to non-coding transcripts, has been challenging. As a result, RNA-Seq has been the preferred method for characterizing the full human transcriptome, until now. A new high-resolution array from Affymetrix, GeneChip Human Transcriptome Array 2.0 (HTA 2.0), has been designed to interrogate all transcript isoforms in the human transcriptome with >6 million probes targeting coding transcripts, exon-exon splice junctions, and non-coding transcripts. Here we compare expression results from GeneChip HTA 2.0 and RNA-Seq data using identical RNA extractions from three samples each of healthy human mesothelial cells in culture, LP9-C1, and healthy mesothelial cells treated with asbestos, LP9-A1. For GeneChip HTA 2.0 sample preparation, we chose to compare two target preparation methods, NuGEN Ovation Pico WTA V2 with the Encore Biotin Module versus Affymetrix's GeneChip WT PLUS with the WT Terminal Labeling Kit, on identical RNA extractions from both untreated and treated samples. These same RNA extractions were used for the RNA-Seq library preparation. All analyses were performed in Partek Genomics Suite 6.6. Expression profiles for control and asbestos-treated mesothelial cells prepared with NuGEN versus Affymetrix target preparation methods (GeneChip HTA 2.0) are compared to each other as well as to RNA-Seq results.
The Human Cell Surfaceome of Breast Tumors
da Cunha, Júlia Pinheiro Chagas; Galante, Pedro Alexandre Favoretto; de Souza, Jorge Estefano Santana; Pieprzyk, Martin; Carraro, Dirce Maria; Old, Lloyd J.; Camargo, Anamaria Aranha; de Souza, Sandro José
2013-01-01
Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be at human cell surface. Methods. We used a high-throuput qPCR system for the analysis of 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrates different type, of the datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from, literature. Results. We found that at least 77 genes are overexpressed in breast primary tumors while at least 2 of them have also a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, a comparison was made between the genes found in this report and other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors. PMID:24195083
Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.
2015-01-01
Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13, methylation-sensitive microRNA were identified in temporal lobe epilepsy including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain. PMID:25552301
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA
Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest
2006-01-01
Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham-Dinh, D.; Gaspera, D.B.; Dautigny, A.
1995-09-20
Myelin/oligodendrocyte glycoprotein (MOG), a special component of the central nervous system localization on the outermost lamellae of mature myelin, is a member of the immunoglobulin superfamily. We report here the organization of the human MOG gene, which spans approximately 17 kb, and the characterization of six MOG mRNA splicing variants. The intron/exon structure of the human MOG gene confirmed the splicing pattern, supporting the hypothesis that mRNA isoforms could arise by alternative splicing of a single gene. In addition to the eight exons coding for the major MOG isoform, the human MOG gene also contains 3` region, a previously unknownmore » alternatively spliced coding exon, VIA. Alternative utilization of two acceptor splicing sites for exon VIII could produce two different C-termini. The nucleotide sequences presented here may be a useful tool to study further possible involvement if the MOG gene in hereditary neurological disorders. 23 refs., 5 figs.« less
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less
IL-TIF/IL-22: genomic organization and mapping of the human and mouse genes.
Dumoutier, L; Van Roost, E; Ameye, G; Michaux, L; Renauld, J C
2000-12-01
IL-TIF is a new cytokine originally identified as a gene induced by IL-9 in murine T lymphocytes, and showing 22% amino acid identity with IL-10. Here, we report the sequence and organization of the mouse and human IL-TIF genes, which both consist of 6 exons spreading over approximately 6 Kb. The IL-TIF gene is a single copy gene in humans, and is located on chromosome 12q15, at 90 Kb from the IFN gamma gene, and at 27 Kb from the AK155 gene, which codes for another IL-10-related cytokine. In the mouse, the IL-TIF gene is located on chromosome 10, also in the same region as the IFN gamma gene. Although it is a single copy gene in BALB/c and DBA/2 mice, the IL-TIF gene is duplicated in other strains such as C57Bl/6, FVB and 129. The two copies, which show 98% nucleotide identity in the coding region, were named IL-TIF alpha and IL-TIF beta. Beside single nucleotide variations, they differ by a 658 nucleotide deletion in IL-TIF beta, including the first non-coding exon and 603 nucleotides from the promoter. A DNA fragment corresponding to this deletion was sufficient to confer IL-9-regulated expression of a luciferase reporter plasmid, suggesting that the IL-TIF beta gene is either differentially regulated, or not expressed at all.
Shao, Renfu; Barker, Stephen C
2011-02-15
The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.
Fu, Wenqing; O'Connor, Timothy D; Jun, Goo; Kang, Hyun Min; Abecasis, Goncalo; Leal, Suzanne M; Gabriel, Stacey; Rieder, Mark J; Altshuler, David; Shendure, Jay; Nickerson, Deborah A; Bamshad, Michael J; Akey, Joshua M
2013-01-10
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
Liu, Zhongliang; Hui, Yi; Shi, Lei; Chen, Zhenyu; Xu, Xiangjie; Chi, Liankai; Fan, Beibei; Fang, Yujiang; Liu, Yang; Ma, Lin; Wang, Yiran; Xiao, Lei; Zhang, Quanbin; Jin, Guohua; Liu, Ling; Zhang, Xiaoqing
2016-09-13
Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methodologies for lesion of genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), and thus makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNA over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies for both coding and non-coding genes in hPSCs. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
McLysaght, Aoife; Guerzoni, Daniele
2015-09-26
The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopisis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an 'RNA-first' or 'ORF-first' pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. © 2015 The Authors.
A comprehensive catalogue of the coding and non-coding transcripts of the human inner ear
Corneveaux, Jason J.; Ohmen, Jeffrey; White, Cory; Allen, April N.; Lusis, Aldons J.; Van Camp, Guy; Huentelman, Matthew J.; Friedman, Rick A.
2015-01-01
The mammalian inner ear consists of the cochlea and the vestibular labyrinth (utricle, saccule, and semicircular canals), which participate in both hearing and balance. Proper development and life-long function of these structures involves a highly complex coordinated system of spatial and temporal gene expression. The characterization of the inner ear transcriptome is likely important for the functional study of auditory and vestibular components, yet, primarily due to tissue unavailability, detailed expression catalogues of the human inner ear remain largely incomplete. We report here, for the first time, comprehensive transcriptome characterization of the adult human cochlea, ampulla, saccule and utricle of the vestibule obtained from patients without hearing abnormalities. Using RNA-Seq, we measured the expression of >50,000 predicted genes corresponding to approximately 200,000 transcripts, in the adult inner ear and compared it to 32 other human tissues. First, we identified genes preferentially expressed in the inner ear, and unique either to the vestibule or cochlea. Next, we examined expression levels of specific groups of potentially interesting RNAs, such as genes implicated in hearing loss, long non-coding RNAs, pseudogenes and transcripts subject to nonsense mediated decay (NMD). We uncover the spatial specificity of expression of these RNAs in the hearing/balance system, and reveal evidence of tissue specific NMD. Lastly, we investigated the non-syndromic deafness loci to which no gene has been mapped, and narrow the list of potential candidates for each locus. These data represent the first high-resolution transcriptome catalogue of the adult human inner ear. A comprehensive identification of coding and non-coding RNAs in the inner ear will enable pathways of auditory and vestibular function to be further defined in the study of hearing and balance. Expression data are freely accessible at https://www.tgen.org/home/research/research-divisions/neurogenomics/supplementary-data/inner-ear-transcriptome.aspx PMID:26341477
Functional annotation of the vlinc class of non-coding RNAs using systems biology approach
Laurent, Georges St.; Vyatkin, Yuri; Antonets, Denis; Ri, Maxim; Qi, Yao; Saik, Olga; Shtokalo, Dmitry; de Hoon, Michiel J.L.; Kawaji, Hideya; Itoh, Masayoshi; Lassmann, Timo; Arner, Erik; Forrest, Alistair R.R.; Nicolas, Estelle; McCaffrey, Timothy A.; Carninci, Piero; Hayashizaki, Yoshihide; Wahlestedt, Claes; Kapranov, Philipp
2016-01-01
Functionality of the non-coding transcripts encoded by the human genome is the coveted goal of the modern genomics research. While commonly relied on the classical methods of forward genetics, integration of different genomics datasets in a global Systems Biology fashion presents a more productive avenue of achieving this very complex aim. Here we report application of a Systems Biology-based approach to dissect functionality of a newly identified vast class of very long intergenic non-coding (vlinc) RNAs. Using highly quantitative FANTOM5 CAGE dataset, we show that these RNAs could be grouped into 1542 novel human genes based on analysis of insulators that we show here indeed function as genomic barrier elements. We show that vlincRNAs genes likely function in cis to activate nearby genes. This effect while most pronounced in closely spaced vlincRNA–gene pairs can be detected over relatively large genomic distances. Furthermore, we identified 101 vlincRNA genes likely involved in early embryogenesis based on patterns of their expression and regulation. We also found another 109 such genes potentially involved in cellular functions also happening at early stages of development such as proliferation, migration and apoptosis. Overall, we show that Systems Biology-based methods have great promise for functional annotation of non-coding RNAs. PMID:27001520
Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng
2017-01-01
CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
Tamori, Akihiro; Yamanishi, Yoshihiro; Kawashima, Shuichi; Kanehisa, Minoru; Enomoto, Masaru; Tanaka, Hiromu; Kubo, Shoji; Shiomi, Susumu; Nishiguchi, Shuhei
2005-08-15
Integration of hepatitis B virus (HBV) DNA into the human genome is one of the most important steps in HBV-related carcinogenesis. This study attempted to find the link between HBV DNA, the adjoining cellular sequence, and altered gene expression in hepatocellular carcinoma (HCC) with integrated HBV DNA. We examined 15 cases of HCC infected with HBV by cassette ligation-mediated PCR. The human DNA adjacent to the integrated HBV DNA was sequenced. Protein coding sequences were searched for in the human sequence. In five cases with HBV DNA integration, from which good quality RNA was extracted, gene expression was examined by cDNA microarray analysis. The human DNA sequence successive to integrated HBV DNA was determined in the 15 HCCs. Eight protein-coding regions were involved: ras-responsive element binding protein 1, calmodulin 1, mixed lineage leukemia 2 (MLL2), FLJ333655, LOC220272, LOC255345, LOC220220, and LOC168991. The MLL2 gene was expressed in three cases with HBV DNA integrated into exon 3 of MLL2 and in one case with HBV DNA integrated into intron 3 of MLL2. Gene expression analysis suggested that two HCCs with HBV integrated into MLL2 had similar patterns of gene expression compared with three HCCs with HBV integrated into other loci of human chromosomes. HBV DNA was integrated at random sites of human DNA, and the MLL2 gene was one of the targets for integration. Our results suggest that HBV DNA might modulate human genes near integration sites, followed by integration site-specific expression of such genes during hepatocarcinogenesis.
Lei, Huimeng; Yan, Zhangming; Sun, Xiaohong; Zhang, Yue; Wang, Jianhong; Ma, Caihong; Xu, Qunyuan; Wang, Rui; Jarvis, Erich D; Sun, Zhirong
2017-11-01
Human and several nonhuman species share the rare ability of modifying acoustic and/or syntactic features of sounds produced, i.e. vocal learning, which is the important neurobiological and behavioral substrate of human speech/language. This convergent trait was suggested to be associated with significant genomic convergence and best manifested at the ROBO-SLIT axon guidance pathway. Here we verified the significance of such genomic convergence and assessed its functional relevance to human speech/language using human genetic variation data. In normal human populations, we found the affected amino acid sites were well fixed and accompanied with significantly more associated protein-coding SNPs in the same genes than the rest genes. Diseased individuals with speech/language disorders have significant more low frequency protein coding SNPs but they preferentially occurred outside the affected genes. Such patients' SNPs were enriched in several functional categories including two axon guidance pathways (mediated by netrin and semaphorin) that interact with ROBO-SLITs. Four of the six patients have homozygous missense SNPs on PRAME gene family, one youngest gene family in human lineage, which possibly acts upon retinoic acid receptor signaling, similarly as FOXP2, to modulate axon guidance. Taken together, we suggest the axon guidance pathways (e.g. ROBO-SLIT, PRAME gene family) served as common targets for human speech/language evolution and related disorders. Copyright © 2017 Elsevier Inc. All rights reserved.
The non-coding RNA landscape of human hematopoiesis and leukemia.
Schwarzer, Adrian; Emmrich, Stephan; Schmidt, Franziska; Beck, Dominik; Ng, Michelle; Reimer, Christina; Adams, Felix Ferdinand; Grasedieck, Sarah; Witte, Damian; Käbler, Sebastian; Wong, Jason W H; Shah, Anushi; Huang, Yizhou; Jammal, Razan; Maroz, Aliaksandra; Jongen-Lavrencic, Mojca; Schambach, Axel; Kuchenbauer, Florian; Pimanda, John E; Reinhardt, Dirk; Heckl, Dirk; Klusmann, Jan-Henning
2017-08-09
Non-coding RNAs have emerged as crucial regulators of gene expression and cell fate decisions. However, their expression patterns and regulatory functions during normal and malignant human hematopoiesis are incompletely understood. Here we present a comprehensive resource defining the non-coding RNA landscape of the human hematopoietic system. Based on highly specific non-coding RNA expression portraits per blood cell population, we identify unique fingerprint non-coding RNAs-such as LINC00173 in granulocytes-and assign these to critical regulatory circuits involved in blood homeostasis. Following the incorporation of acute myeloid leukemia samples into the landscape, we further uncover prognostically relevant non-coding RNA stem cell signatures shared between acute myeloid leukemia blasts and healthy hematopoietic stem cells. Our findings highlight the importance of the non-coding transcriptome in the formation and maintenance of the human blood hierarchy.While micro-RNAs are known regulators of haematopoiesis and leukemogenesis, the role of long non-coding RNAs is less clear. Here the authors provide a non-coding RNA expression landscape of the human hematopoietic system, highlighting their role in the formation and maintenance of the human blood hierarchy.
Silencing of the pentose phosphate pathway genes influences DNA replication in human fibroblasts.
Fornalewicz, Karolina; Wieczorek, Aneta; Węgrzyn, Grzegorz; Łyżeń, Robert
2017-11-30
Previous reports and our recently published data indicated that some enzymes of glycolysis and the tricarboxylic acid cycle can affect the genome replication process by changing either the efficiency or timing of DNA synthesis in human normal cells. Both these pathways are connected with the pentose phosphate pathway (PPP pathway). The PPP pathway supports cell growth by generating energy and precursors for nucleotides and amino acids. Therefore, we asked if silencing of genes coding for enzymes involved in the pentose phosphate pathway may also affect the control of DNA replication in human fibroblasts. Particular genes coding for PPP pathway enzymes were partially silenced with specific siRNAs. Such cells remained viable. We found that silencing of the H6PD, PRPS1, RPE genes caused less efficient enterance to the S phase and decrease in efficiency of DNA synthesis. On the other hand, in cells treated with siRNA against G6PD, RBKS and TALDO genes, the fraction of cells entering the S phase was increased. However, only in the case of G6PD and TALDO, the ratio of BrdU incorporation to DNA was significantly changed. The presented results together with our previously published studies illustrate the complexity of the influence of genes coding for central carbon metabolism on the control of DNA replication in human fibroblasts, and indicate which of them are especially important in this process. Copyright © 2017 Elsevier B.V. All rights reserved.
The impact of rare variation on gene expression across tissues.
Li, Xin; Kim, Yungil; Tsang, Emily K; Davis, Joe R; Damani, Farhan N; Chiang, Colby; Hess, Gaelen T; Zappala, Zachary; Strober, Benjamin J; Scott, Alexandra J; Li, Amy; Ganna, Andrea; Bassik, Michael C; Merker, Jason D; Hall, Ira M; Battle, Alexis; Montgomery, Stephen B
2017-10-11
Rare genetic variants are abundant in humans and are expected to contribute to individual disease risk. While genetic association studies have successfully identified common genetic variants associated with susceptibility, these studies are not practical for identifying rare variants. Efforts to distinguish pathogenic variants from benign rare variants have leveraged the genetic code to identify deleterious protein-coding alleles, but no analogous code exists for non-coding variants. Therefore, ascertaining which rare variants have phenotypic effects remains a major challenge. Rare non-coding variants have been associated with extreme gene expression in studies using single tissues, but their effects across tissues are unknown. Here we identify gene expression outliers, or individuals showing extreme expression levels for a particular gene, across 44 human tissues by using combined analyses of whole genomes and multi-tissue RNA-sequencing data from the Genotype-Tissue Expression (GTEx) project v6p release. We find that 58% of underexpression and 28% of overexpression outliers have nearby conserved rare variants compared to 8% of non-outliers. Additionally, we developed RIVER (RNA-informed variant effect on regulation), a Bayesian statistical model that incorporates expression data to predict a regulatory effect for rare variants with higher accuracy than models using genomic annotations alone. Overall, we demonstrate that rare variants contribute to large gene expression changes across tissues and provide an integrative method for interpretation of rare variants in individual genomes.
The GENCODE exome: sequencing the complete human exome
Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno
2011-01-01
Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju
2018-05-15
Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differential expressed in lungs of chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that differentially expressed lncRNAs associated protein-coding genes were mainly involved in protein processing of endoplasmic reticulum pathway, and taurine and hypotaurine metabolism pathway. The combination of high throughput data analysis and the results of qRT-PCR validation in lungs of chronic CS-induced COPD mouse model, 16HBE cells with CSE treatment and PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD both in mouse and human. In conclusion, our study demonstrated that aberrant expression profiles of lncRNAs and mRNAs existed in lungs of chronic CS-induced COPD mouse model. From animal models perspective, these results might provide further clues to investigate biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.
GENCODE: the reference human genome annotation for The ENCODE Project.
Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J
2012-09-01
The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Many human accelerated regions are developmental enhancers
Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.
2013-01-01
The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Highly efficient Cas9-mediated transcriptional programming
Chavez, Alejandro; Scheiman, Jonathan; Vora, Suhani; ...
2015-03-02
The RNA-guided nuclease Cas9 can be reengineered as a programmable transcription factor. However, modest levels of gene activation have limited potential applications. Here we describe an improved transcriptional regulator through the rational design of a tripartite activator, VP64-p65-Rta (VPR), fused to nuclease-null Cas9. Here, we demonstrate its utility in activating endogenous coding and non-coding genes, targeting several genes simultaneously and stimulating neuronal differentiation of human induced pluripotent stem cells (iPSCs).
Functional annotation of the vlinc class of non-coding RNAs using systems biology approach.
St Laurent, Georges; Vyatkin, Yuri; Antonets, Denis; Ri, Maxim; Qi, Yao; Saik, Olga; Shtokalo, Dmitry; de Hoon, Michiel J L; Kawaji, Hideya; Itoh, Masayoshi; Lassmann, Timo; Arner, Erik; Forrest, Alistair R R; Nicolas, Estelle; McCaffrey, Timothy A; Carninci, Piero; Hayashizaki, Yoshihide; Wahlestedt, Claes; Kapranov, Philipp
2016-04-20
Functionality of the non-coding transcripts encoded by the human genome is the coveted goal of the modern genomics research. While commonly relied on the classical methods of forward genetics, integration of different genomics datasets in a global Systems Biology fashion presents a more productive avenue of achieving this very complex aim. Here we report application of a Systems Biology-based approach to dissect functionality of a newly identified vast class of very long intergenic non-coding (vlinc) RNAs. Using highly quantitative FANTOM5 CAGE dataset, we show that these RNAs could be grouped into 1542 novel human genes based on analysis of insulators that we show here indeed function as genomic barrier elements. We show that vlinc RNAs genes likely function in cisto activate nearby genes. This effect while most pronounced in closely spaced vlinc RNA-gene pairs can be detected over relatively large genomic distances. Furthermore, we identified 101 vlinc RNA genes likely involved in early embryogenesis based on patterns of their expression and regulation. We also found another 109 such genes potentially involved in cellular functions also happening at early stages of development such as proliferation, migration and apoptosis. Overall, we show that Systems Biology-based methods have great promise for functional annotation of non-coding RNAs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Evolution of extensively fragmented mitochondrial genomes in the lice of humans.
Shao, Renfu; Zhu, Xing-Quan; Barker, Stephen C; Herd, Kate
2012-01-01
Bilateral animals are featured by an extremely compact mitochondrial (mt) genome with 37 genes on a single circular chromosome. The human body louse, Pediculus humanus, however, has its mt genes on 20 minichromosomes. We sequenced the mt genomes of two other human lice: the head louse, P. capitis, and the pubic louse, Pthirus pubis. Comparison among the three human lice revealed the presence of fragmented mt genomes in their most recent common ancestor, which lived ∼7 Ma. The head louse has exactly the same set of mt minichromosomes as the body louse, indicating that the number of minichromosomes, and the gene content and gene arrangement in each minichromosome have remained unchanged since the body louse evolved from the head louse ∼107,000 years ago. The pubic louse has the same pattern of one protein-coding or rRNA gene per minichromosome (except one minichromosome with two protein-coding genes, atp6 and atp8) as the head louse and the body louse. This pattern is apparently ancestral to all human lice and has been stable for at least 7 Myr. Most tRNA genes of the pubic louse, however, are on different minichromosomes when compared with their counterparts in the head louse and the body louse. It is evident that rearrangement of four tRNA genes (for leucine, arginine and glycine) was due to gene-identity switch by point mutation at the third anticodon position or by homologous recombination, whereas rearrangement of other tRNA genes was by gene translocation between minichromosomes, likely caused by minichromosome split via gene degeneration and deletion.
D'Antò, Vincenzo; Cantile, Monica; D'Armiento, Maria; Schiavo, Giulia; Spagnuolo, Gianrico; Terracciano, Luigi; Vecchione, Raffaela; Cillo, Clemente
2006-03-01
Homeobox-containing genes play a crucial role in odontogenesis. After the detection of Dlx and Msx genes in overlapping domains along maxillary and mandibular processes, a homeobox odontogenic code has been proposed to explain the interaction between different homeobox genes during dental lamina patterning. No role has so far been assigned to the Hox gene network in the homeobox odontogenic code due to studies on specific Hox genes and evolutionary considerations. Despite its involvement in early patterning during embryonal development, the HOX gene network, the most repeat-poor regions of the human genome, controls the phenotype identity of adult eukaryotic cells. Here, according to our results, the HOX gene network appears to be active in human tooth germs between 18 and 24 weeks of development. The immunohistochemical localization of specific HOX proteins mostly concerns the epithelial tooth germ compartment. Furthermore, only a few genes of the network are active in embryonal retromolar tissues, as well as in ectomesenchymal dental pulp cells (DPC) grown in vitro from adult human molar. Exposure of DPCs to cAMP induces the expression of from three to nine total HOX genes of the network in parallel with phenotype modifications with traits of neuronal differentiation. Our observations suggest that: (i) by combining its component genes, the HOX gene network determines the phenotype identity of epithelial and ectomesenchymal cells interacting in the generation of human tooth germ; (ii) cAMP treatment activates the HOX network and induces, in parallel, a neuronal-like phenotype in human primary ectomesenchymal dental pulp cells. 2005 Wiley-Liss, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feder, J.N.; Jan, L.Y.; Jan, Y.N.
The Drosophila hairy gene encodes a basic helix- loop-helix protein that functions in at least two steps during Drosophila development: (1) during embryogenesis, when it partakes in the establishment of segments, and (2) during the larval stage, when it functions negatively in determining the pattern of sensory bristles on the adult fly. In the rat, a structurally homologous gene (RHL) behaves as an immediate-early gene in its response to growth factors and can, like that in Drosophila, suppress neuronal differentiation events. Here, the authors report the genomic cloning of the human hairy gene homolog (HRY). The coding region of themore » gene is contained within four exons. The predicted amino acid sequence reveals only four amino acid differences between the human and rat genes. Analysis of the DNA sequence 5[prime] to the coding region reveals a putatitve untranslated exon. To increase the value of the HRY gene as a genetic marker and to assess its potential involvement in genetic disorders, they sublocalized the locus to chromosome 3q28-q29 by fluorescence in situ hybridization. 34 refs., 4 figs., 1 tab.« less
Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia
2015-01-01
Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
Durso, Lisa M; Miller, Daniel N; Wienhold, Brian J
2012-01-01
There is concern that antibiotic resistance can potentially be transferred from animals to humans through the food chain. The relationship between specific antibiotic resistant bacteria and the genes they carry remains to be described. Few details are known about the ecology of antibiotic resistant genes and bacteria in food production systems, or how antibiotic resistance genes in food animals compare to antibiotic resistance genes in other ecosystems. Here we report the distribution of antibiotic resistant genes in publicly available agricultural and non-agricultural metagenomic samples and identify which bacteria are likely to be carrying those genes. Antibiotic resistance, as coded for in the genes used in this study, is a process that was associated with all natural, agricultural, and human-impacted ecosystems examined, with between 0.7 to 4.4% of all classified genes in each habitat coding for resistance to antibiotic and toxic compounds (RATC). Agricultural, human, and coastal-marine metagenomes have characteristic distributions of antibiotic resistance genes, and different bacteria that carry the genes. There is a larger percentage of the total genome associated with antibiotic resistance in gastrointestinal-associated and agricultural metagenomes compared to marine and Antarctic samples. Since antibiotic resistance genes are a natural part of both human-impacted and pristine habitats, presence of these resistance genes in any specific habitat is therefore not sufficient to indicate or determine impact of anthropogenic antibiotic use. We recommend that baseline studies and control samples be taken in order to determine natural background levels of antibiotic resistant bacteria and/or antibiotic resistance genes when investigating the impacts of veterinary use of antibiotics on human health. We raise questions regarding whether the underlying biology of each type of bacteria contributes to the likelihood of transfer via the food chain.
Esipov, Roman S; Stepanenko, Vasily N; Gurevich, Alexandr I; Chupova, Larisa A; Miroshnikov, Anatoly I
2006-01-01
Chemico-enzymatic synthesis and cloning in Esherichia coli of an artificial gene coding human glucagon was performed. Recombinant plasmid containing hybrid glucagons gene and intein Ssp dnaB from Synechocestis sp. was designed. Expression of the obtained hybrid gene in E. coli, properties of the formed hybrid protein, and conditions of its autocatalytic cleavage leading to glucagon formation were studied.
Rankin, Carl Robert; Theodorou, Evangelos; Law, Ivy Ka Man; Rowe, Lorraine; Kokkotou, Efi; Pekow, Joel; Wang, Jiafang; Martin, Martin G; Pothoulakis, Charalabos; Padua, David Miguel
2018-06-28
Inflammatory bowel disease (IBD) is a complex disorder that is associated with significant morbidity. While many recent advances have been made with new diagnostic and therapeutic tools, a deeper understanding of its basic pathophysiology is needed to continue this trend towards improving treatments. By utilizing an unbiased, high-throughput transcriptomic analysis of two well-established mouse models of colitis, we set out to uncover novel coding and non-coding RNAs that are differentially expressed in the setting of colonic inflammation. RNA-seq analysis was performed using colonic tissue from two mouse models of colitis, a dextran sodium sulfate induced model and a genetic-induced model in mice lacking IL-10. We identified 81 coding RNAs that were commonly altered in both experimental models. Of these coding RNAs, 12 of the human orthologs were differentially expressed in a transcriptomic analysis of IBD patients. Interestingly, 5 of the 12 of human differentially expressed genes have not been previously identified as IBD-associated genes, including ubiquitin D. Our analysis also identified 15 non-coding RNAs that were differentially expressed in either mouse model. Surprisingly, only three non-coding RNAs were commonly dysregulated in both of these models. The discovery of these new coding and non-coding RNAs expands our transcriptional knowledge of mouse models of IBD and offers additional targets to deepen our understanding of the pathophysiology of IBD.
The structure of the human interferon alpha/beta receptor gene.
Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G
1992-02-05
Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
Decoding the complex genetic causes of heart diseases using systems biology.
Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K
2015-03-01
The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
NCG 4.0: the network of cancer genes in the era of massive mutational screenings of cancer genomes
An, Omer; Pendino, Vera; D’Antonio, Matteo; Ratti, Emanuele; Gentilini, Marco; Ciccarelli, Francesca D.
2014-01-01
NCG 4.0 is the latest update of the Network of Cancer Genes, a web-based repository of systems-level properties of cancer genes. In its current version, the database collects information on 537 known (i.e. experimentally supported) and 1463 candidate (i.e. inferred using statistical methods) cancer genes. Candidate cancer genes derive from the manual revision of 67 original publications describing the mutational screening of 3460 human exomes and genomes in 23 different cancer types. For all 2000 cancer genes, duplicability, evolutionary origin, expression, functional annotation, interaction network with other human proteins and with microRNAs are reported. In addition to providing a substantial update of cancer-related information, NCG 4.0 also introduces two new features. The first is the annotation of possible false-positive cancer drivers, defined as candidate cancer genes inferred from large-scale screenings whose association with cancer is likely to be spurious. The second is the description of the systems-level properties of 64 human microRNAs that are causally involved in cancer progression (oncomiRs). Owing to the manual revision of all information, NCG 4.0 constitutes a complete and reliable resource on human coding and non-coding genes whose deregulation drives cancer onset and/or progression. NCG 4.0 can also be downloaded as a free application for Android smart phones. Database URL: http://bio.ieo.eu/ncg/ PMID:24608173
Detection of human microRNAs across miRNA Array and Next Generation DNA Sequencing Platforms
microRNA (miRNAs) are non-coding RNA molecules between 19 and 30 nucleotides in length that are believed to regulate approximately 30 per cent of all human genes. They act as negative regulators of their gene targets in many biological processes. Recent developments in microar...
Insights into hominid evolution from the gorilla genome sequence
Scally, Aylwyn; Dutheil, Julien Y.; Hillier, LaDeana W.; Jordan, Greg E.; Goodhead, Ian; Herrero, Javier; Hobolth, Asger; Lappalainen, Tuuli; Mailund, Thomas; Marques-Bonet, Tomas; McCarthy, Shane; Montgomery, Stephen H.; Schwalie, Petra C.; Tang, Y. Amy; Ward, Michelle C.; Xue, Yali; Yngvadottir, Bryndis; Alkan, Can; Andersen, Lars N.; Ayub, Qasim; Ball, Edward V.; Beal, Kathryn; Bradley, Brenda J.; Chen, Yuan; Clee, Chris M.; Fitzgerald, Stephen; Graves, Tina A.; Gu, Yong; Heath, Paul; Heger, Andreas; Karakoc, Emre; Kolb-Kokocinski, Anja; Laird, Gavin K.; Lunter, Gerton; Meader, Stephen; Mort, Matthew; Mullikin, James C.; Munch, Kasper; O’Connor, Timothy D.; Phillips, Andrew D.; Prado-Martinez, Javier; Rogers, Anthony S.; Sajjadian, Saba; Schmidt, Dominic; Shaw, Katy; Simpson, Jared T.; Stenson, Peter D.; Turner, Daniel J.; Vigilant, Linda; Vilella, Albert J.; Whitener, Weldon; Zhu, Baoli; Cooper, David N.; de Jong, Pieter; Dermitzakis, Emmanouil T.; Eichler, Evan E.; Flicek, Paul; Goldman, Nick; Mundy, Nicholas I.; Ning, Zemin; Odom, Duncan T.; Ponting, Chris P.; Quail, Michael A.; Ryder, Oliver A.; Searle, Stephen M.; Warren, Wesley C.; Wilson, Richard K.; Schierup, Mikkel H.; Rogers, Jane; Tyler-Smith, Chris; Durbin, Richard
2012-01-01
Summary Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago (Mya). In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution. PMID:22398555
Wildman, Derek E; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I; Goodman, Morris
2003-06-10
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare approximately 90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human-chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human-chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee).
The Sequence and Analysis of Duplication Rich Human Chromosome 16
DOE R&D Accomplishments Database
Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.
2004-01-01
We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.
New steroid 5alpha-reductase type I (SRD5A1) homologous sequences on human chromosomes 6 and 8.
Eminović, I; Liović, M; Prezelj, J; Kocijancic, A; Rozman, D; Komel, R
2001-01-01
To date, two genes encoding 5alpha-reductase isoenzymes are known (type I, type II), and one type I pseudogene. The divergent localization of these genes and the still not fully understood function of the encoded enzymes as well as the perplexing results we obtained after sequencing PCR-amplified SRD5A1 gene fragments (out of genomic DNA), made us assume that, in addition to the known SRD5A1 gene, one or more different human 5alpha-reductase type I coding genes may exist. Our research provide the first evidence for the existence of two new SRD5A1 related, previously unidentified sequences in the human genome. These sequences which were localized to chromosomes 6 and 8 are highly homologous (> 99%) to SRD5A1, and also do not contain any deletions or insertions that are otherwise a characteristic of the SRD5API pseudogene. Our results imply that these sequences may be either coding parts of yet unknown, active SRD5A1 genes, and/or of previously unidentified pseudogenes. These findings additionally support data of Chen et al. who confirmed the existence of various SRD5A1 proteins in cultured human skin cells.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calabrese, G.; Sallese, M.; Stornaiuolo, A.
1994-09-01
Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: {beta}-adrenergic receptor kinase ({beta}ARK), which phosphorylates the agonist-occupied receptor and its functional cofactor, {beta}-arrestin. Both {beta}ARK and {beta}-arrestin are members of multigene families. The family of G-protein-coupled receptor kinases includes rhodopsin kinase, {beta}ARK1, {beta}ARK2, IT11-A (GRK4), GRK5, and GRK6. The arrestin/{beta}-arrestin gene family includes arrestin (also known as S-antigen), {beta}-arrestin 1, and {beta}-arrestin 2. Here we report the chromosome mapping of the human genes for arrestin (SAG), {beta}arrestin 2 (ARRB2), and {beta}ARK2 (ADRBK2) by fluorescence in situ hybridization (FISH). FISH results confirmed the assignment ofmore » the gene coding for arrestin (SAG) to chromosome 2 and allowed us to refine its localization to band q37. The gene coding for {beta}-arrestin 2 (ARRB2) was mapped to chromosome 17p13 and that coding for {beta}ARK2 (ADRBK2) to chromosome 22q11. 17 refs., 1 fig.« less
Henne, Karsten; Li, Jing; Stoneking, Mark; Kessler, Olga; Schilling, Hildegard; Sonanini, Anne; Conrads, Georg; Horz, Hans-Peter
2014-08-22
The genetic diversity of the human microbiome holds great potential for shedding light on the history of our ancestors. Helicobacter pylori is the most prominent example as its analysis allowed a fine-scale resolution of past migration patterns including some that could not be distinguished using human genetic markers. However studies of H. pylori require stomach biopsies, which severely limits the number of samples that can be analysed. By focussing on the house-keeping gene gdh (coding for the glucose-6-phosphate dehydrogenase), on the virulence gene gtf (coding for the glucosyltransferase) of mitis-streptococci and on the 16S-23S rRNA internal transcribed spacer (ITS) region of the Fusobacterium nucleatum/periodonticum-group we here tested the hypothesis that bacterial genes from human saliva have the potential for distinguishing human populations. Analysis of 10 individuals from each of seven geographic regions, encompassing Africa, Asia and Europe, revealed that the genes gdh and ITS exhibited the highest number of polymorphic sites (59% and 79%, respectively) and most OTUs (defined at 99% identity) were unique to a given country. In contrast, the gene gtf had the lowest number of polymorphic sites (21%), and most OTUs were shared among countries. Most of the variation in the gdh and ITS genes was explained by the high clonal diversity within individuals (around 80%) followed by inter-individual variation of around 20%, leaving the geographic region as providing virtually no source of sequence variation. Conversely, for gtf the variation within individuals accounted for 32%, between individuals for 57% and among geographic regions for 11%. This geographic signature persisted upon extension of the analysis to four additional locations from the American continent. Pearson correlation analysis, pairwise Fst-cluster analysis as well as UniFrac analyses consistently supported a tree structure in which the European countries clustered tightly together and branched with American countries and South Africa, to the exclusion of Asian countries and the Congo. This study shows that saliva harbours protein-coding bacterial genes that are geographically structured, and which could potentially be used for addressing previously unresolved human migration events.
Long non-coding RNAs and mRNAs profiling during spleen development in pig.
Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou
2018-01-01
Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
Phillips, Brianne E; Venn-Watson, Stephanie; Archer, Linda L; Nollens, Hendrik H; Wellehan, James F X
2014-10-01
Hemochromatosis (iron storage disease) has been reported in diverse mammals including bottlenose dolphins (Tursiops truncatus). The primary cause of excessive iron storage in humans is hereditary hemochromatosis. Most human hereditary hemochromatosis cases (up to 90%) are caused by a point mutation in the hfe gene, resulting in a C282Y substitution leading to iron accumulation. To evaluate the possibility of a hereditary hemochromatosis-like genetic predisposition in dolphins, we sequenced the bottlenose dolphin hfe gene, using reverse transcriptase-PCR and hfe primers designed from the dolphin genome, from liver of affected and healthy control dolphins. Sample size included two case animals and five control animals. Although isotype diversity was evident, no coding differences were identified in the hfe gene between any of the animals examined. Because our sample size was small, we cannot exclude the possibility that hemochromatosis in dolphins is due to a coding mutation in the hfe gene. Other potential causes of hemochromatosis, including mutations in different genes, diet, primary liver disease, and insulin resistance, should be evaluated.
Kakizaki, Fumihiko; Sonoshita, Masahiro; Miyoshi, Hiroyuki; Itatani, Yoshiro; Ito, Shinji; Kawada, Kenji; Sakai, Yoshiharu; Taketo, M Mark
2016-11-01
We recently found that the product of the AES gene functions as a metastasis suppressor of colorectal cancer (CRC) in both humans and mice. Expression of amino-terminal enhancer of split (AES) protein is significantly decreased in liver metastatic lesions compared with primary colon tumors. To investigate its downregulation mechanism in metastases, we searched for transcriptional regulators of AES in human CRC and found that its expression is reduced mainly by transcriptional dysregulation and, in some cases, by additional haploidization of its coding gene. The AES promoter-enhancer is in a typical CpG island, and contains a Yin-Yang transcription factor recognition sequence (YY element). In human epithelial cells of normal colon and primary tumors, transcription factor YY2, a member of the YY family, binds directly to the YY element, and stimulates expression of AES. In a transplantation mouse model of liver metastases, however, expression of Yy2 (and therefore of Aes) is downregulated. In human CRC metastases to the liver, the levels of AES protein are correlated with those of YY2. In addition, we noticed copy-number reduction for the AES coding gene in chromosome 19p13.3 in 12% (5/42) of human CRC cell lines. We excluded other mechanisms such as point or indel mutations in the coding or regulatory regions of the AES gene, CpG methylation in the AES promoter enhancer, expression of microRNAs, and chromatin histone modifications. These results indicate that Aes may belong to a novel family of metastasis suppressors with a CpG-island promoter enhancer, and it is regulated transcriptionally. © 2016 The Authors. Cancer Science published by John Wiley & Sons Australia, Ltd on behalf of Japanese Cancer Association.
Yang, Xiaofei; Gao, Lin; Guo, Xingli; Shi, Xinghua; Wu, Hao; Song, Fei; Wang, Bingbo
2014-01-01
Increasing evidence has indicated that long non-coding RNAs (lncRNAs) are implicated in and associated with many complex human diseases. Despite of the accumulation of lncRNA-disease associations, only a few studies had studied the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated, and that of the lncRNAs associated with complex diseases. Regarding the fact that protein coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies. PMID:24498199
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.
O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F
2011-07-01
To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosinbinding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Genome-wide analysis of alternative splicing during human heart development
NASA Astrophysics Data System (ADS)
Wang, He; Chen, Yanmei; Li, Xinzhong; Chen, Guojun; Zhong, Lintao; Chen, Gangbing; Liao, Yulin; Liao, Wangjun; Bin, Jianping
2016-10-01
Alternative splicing (AS) drives determinative changes during mouse heart development. Recent high-throughput technological advancements have facilitated genome-wide AS, while its analysis in human foetal heart transition to the adult stage has not been reported. Here, we present a high-resolution global analysis of AS transitions between human foetal and adult hearts. RNA-sequencing data showed extensive AS transitions occurred between human foetal and adult hearts, and AS events occurred more frequently in protein-coding genes than in long non-coding RNA (lncRNA). A significant difference of AS patterns was found between foetal and adult hearts. The predicted difference in AS events was further confirmed using quantitative reverse transcription-polymerase chain reaction analysis of human heart samples. Functional foetal-specific AS event analysis showed enrichment associated with cell proliferation-related pathways including cell cycle, whereas adult-specific AS events were associated with protein synthesis. Furthermore, 42.6% of foetal-specific AS events showed significant changes in gene expression levels between foetal and adult hearts. Genes exhibiting both foetal-specific AS and differential expression were highly enriched in cell cycle-associated functions. In conclusion, we provided a genome-wide profiling of AS transitions between foetal and adult hearts and proposed that AS transitions and deferential gene expression may play determinative roles in human heart development.
Identifying personal microbiomes using metagenomic codes
Franzosa, Eric A.; Huang, Katherine; Meadow, James F.; Gevers, Dirk; Lemon, Katherine P.; Bohannan, Brendan J. M.; Huttenhower, Curtis
2015-01-01
Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among 100s of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30–300 d later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability—a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability. PMID:25964341
Wu, Dong-Dong; Ye, Ling-Qun; Li, Yan; Sun, Yan-Bo; Shao, Yi; Chen, Chunyan; Zhu, Zhu; Zhong, Li; Wang, Lu; Irwin, David M; Zhang, Yong E; Zhang, Ya-Ping
2015-08-01
Next-generation RNA sequencing has been successfully used for identification of transcript assembly, evaluation of gene expression levels, and detection of post-transcriptional modifications. Despite these large-scale studies, additional comprehensive RNA-seq data from different subregions of the human brain are required to fully evaluate the evolutionary patterns experienced by the human brain transcriptome. Here, we provide a total of 6.5 billion RNA-seq reads from different subregions of the human brain. A significant correlation was observed between the levels of alternative splicing and RNA editing, which might be explained by a competition between the molecular machineries responsible for the splicing and editing of RNA. Young human protein-coding genes demonstrate biased expression to the neocortical and non-neocortical regions during evolution on the lineage leading to humans. We also found that a significantly greater number of young human protein-coding genes are expressed in the putamen, a tissue that was also observed to have the highest level of RNA-editing activity. The putamen, which previously received little attention, plays an important role in cognitive ability, and our data suggest a potential contribution of the putamen to human evolution. © The Author (2015). Published by Oxford University Press on behalf of Journal of Molecular Cell Biology, IBCB, SIBS, CAS. All rights reserved.
The gene coding for glial cell line derived neurotrophic factor (GDNF) maps to chromosome 5p12-p13.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schindelhauer, D.; Schuffenhauer, S.; Meitinger, T.
1995-08-10
The gene coding for glial cell line derived neurotrophic factor (GDNF) has biological properties that may have potential as a treatment for Parkinson`s and motoneuron diseases. Using the NIGMS Mapping Panel 2, we have localized the GDNF gene to human chromosome 5p12-p13.1. Large NruI and NotI fragments on chromosome 5 will facilitate the construction of a long-range map of the region. 26 refs., 1 fig., 1 tab.
Leloire, Audrey; Dhennin, Véronique; Coumoul, Xavier; Yengo, Loïc; Froguel, Philippe
2017-01-01
Bisphenol A (BPA) exposure has been suspected to be associated with deleterious effects on health including obesity and metabolically-linked diseases. Although bisphenols F (BPF) and S (BPS) are BPA structural analogs commonly used in many marketed products as a replacement for BPA, only sparse toxicological data are available yet. Our objective was to comprehensively characterize bisphenols gene targets in a human primary adipocyte model, in order to determine whether they may induce cellular dysfunction, using chronic exposure at two concentrations: a “low-dose” similar to the dose usually encountered in human biological fluids and a higher dose. Therefore, BPA, BPF and BPS have been added at 10 nM or 10 μM during the differentiation of human primary adipocytes from subcutaneous fat of three non-diabetic Caucasian female patients. Gene expression (mRNA/lncRNA) arrays and microRNA arrays, have been used to assess coding and non-coding RNA changes. We detected significantly deregulated mRNA/lncRNA and miRNA at low and high doses. Enrichment in “cancer” and “organismal injury and abnormalities” related pathways was found in response to the three products. Some long intergenic non-coding RNAs and small nucleolar RNAs were differentially expressed suggesting that bisphenols may also activate multiple cellular processes and epigenetic modifications. The analysis of upstream regulators of deregulated genes highlighted hormones or hormone-like chemicals suggesting that BPS and BPF can be suspected to interfere, just like BPA, with hormonal regulation and have to be considered as endocrine disruptors. All these results suggest that as BPA, its substitutes BPS and BPF should be used with the same restrictions. PMID:28628672
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umans, L.; Serneels, L.; Hilliker, C.
1994-08-01
The authors have cloned the mouse gene coding for {alpha}{sub 2}-macroglobulin in overlapping {lambda} clones and have analyzed its structure. The gene contains 36 exons, coding for the 4.8-kb cDNA that we cloned previously. Including putative control elements in the 5{prime} flanking region, the gene covers about 45 kb. A region of 3.8 kb, stretching from 835 bases upstream of the cDNA start site to exon 4, including all intervening sequences, was sequenced completely. The analysis demonstrated that the putative promoter region of the mouse A2M gene differed considerably from the known promoter sequences of the human A2M gene andmore » of the rat acute-phas A2M gene. Comparison of the exon-intron structure of all known genes of the A2M family confirmed that the rat acute phase A2M gene is more closely related to the human gene than to the mouse A2M gene. To generate mice with the A2M gene inactivated, an insertion type of construct containing 7.5 kb of genomic DNA of the mouse strain 129/J, encompassing exons 16 to 19, was synthesized. A hygromycin marker gene was embedded in intron 17. After electroporation, 198 hygromycin-resistant ES cell lines were isolated and analyzed by Southern blotting. Five ES cell lines were obtained with one allele of the mouse A2M gene targeted by this insertion construct, demonstrating that the position and the characteristics of the vector served the intended goal.« less
Wildman, Derek E.; Uddin, Monica; Liu, Guozhen; Grossman, Lawrence I.; Goodman, Morris
2003-01-01
What do functionally important DNA sites, those scrutinized and shaped by natural selection, tell us about the place of humans in evolution? Here we compare ≈90 kb of coding DNA nucleotide sequence from 97 human genes to their sequenced chimpanzee counterparts and to available sequenced gorilla, orangutan, and Old World monkey counterparts, and, on a more limited basis, to mouse. The nonsynonymous changes (functionally important), like synonymous changes (functionally much less important), show chimpanzees and humans to be most closely related, sharing 99.4% identity at nonsynonymous sites and 98.4% at synonymous sites. On a time scale, the coding DNA divergencies separate the human–chimpanzee clade from the gorilla clade at between 6 and 7 million years ago and place the most recent common ancestor of humans and chimpanzees at between 5 and 6 million years ago. The evolutionary rate of coding DNA in the catarrhine clade (Old World monkey and ape, including human) is much slower than in the lineage to mouse. Among the genes examined, 30 show evidence of positive selection during descent of catarrhines. Nonsynonymous substitutions by themselves, in this subset of positively selected genes, group humans and chimpanzees closest to each other and have chimpanzees diverge about as much from the common human–chimpanzee ancestor as humans do. This functional DNA evidence supports two previously offered taxonomic proposals: family Hominidae should include all extant apes; and genus Homo should include three extant species and two subgenera, Homo (Homo) sapiens (humankind), Homo (Pan) troglodytes (common chimpanzee), and Homo (Pan) paniscus (bonobo chimpanzee). PMID:12766228
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA
Eden, E.; Brunak, S.
2004-01-01
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
dbCPG: A web resource for cancer predisposition genes.
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-06-21
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rats, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the understanding of the CPGs data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G
2016-05-05
Mammalian chromosomes are comprised of complex chromatin architecture with the specific assembly and configuration of each chromosome influencing gene expression and function in yet undefined ways by varying degrees of heterochromatinization that result in Giemsa (G) negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out morphometric measurements of high-resolution chromosome ideograms for the first time to characterize the total euchromatic and heterochromatic chromosome band length, distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatin and G-positive heterochromatin chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, while the obesity gene distribution pattern did not significantly differ from protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions playing a role in biological processes including neurodevelopment and function, specifically genes associated with ASD.
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene.
Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil
2007-11-29
Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication, all three copies of the gene are active transcriptionally, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constrains.
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene
Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil
2007-01-01
Background Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Results Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication, all three copies of the gene are active transcriptionally, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. Conclusion The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constrains. PMID:18047649
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Structure and chromosomal localization of the human PD-1 gene (PDCD1)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shinohara, T.; Ishida, Y.; Kawaichi, M.
1994-10-01
A cDNA encoding mouse PD-1, a member of the immunoglobulin superfamily, was previously isolated from apoptosis-induced cells by subtractive hybridization. To determine the structure and chromosomal location of the human PD-1 gene, we screened a human T cell cDNA library by mouse PD-1 probe and isolated a cDNA coding for the human PD-1 protein. The deduced amino acid sequence of human PD-1 was 60% identical to the mouse counterpart, and a putative tyrosine kinase-association motif was well conserved. The human PD-1 gene was mapped to 2q37.3 by chromosomal in situ hybridization. 7 refs., 3 figs.
2010-01-01
Background Adenosine to inosine (A-to-I) RNA-editing is an essential post-transcriptional mechanism that occurs in numerous sites in the human transcriptome, mainly within Alu repeats. It has been shown to have consistent levels of editing across individuals in a few targets in the human brain and altered in several human pathologies. However, the variability across human individuals of editing levels in other tissues has not been studied so far. Results Here, we analyzed 32 skin samples, looking at A-to-I editing level in three genes within coding sequences and in the Alu repeats of six different genes. We observed highly consistent editing levels across different individuals as well as across tissues, not only in coding targets but, surprisingly, also in the non evolutionary conserved Alu repeats. Conclusions Our findings suggest that A-to-I RNA-editing of Alu elements is a tightly regulated process and, as such, might have been recruited in the course of primate evolution for post-transcriptional regulatory mechanisms. PMID:21029430
Disruption of long-distance highly conserved noncoding elements in neurocristopathies.
Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas
2010-12-01
One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teumer, J.; Green, H.
1989-02-01
The gene for involucrin, an epidermal protein, has been remodeled in the higher primates. Most of the coding region of the human gene consists of a modern segment of repeats derived from a 10-codon sequence present in the ancestral segment of the gene. The modern segment can be divided into early, middle, and late regions. The authors report here the nucleotide sequence of three alleles of the gorilla involucrin gene. Each possesses a modern segment homologous to that of the human and consisting of 10-codon repeats. The early and middle regions are similar to the corresponding regions of the humanmore » allele and are nearly identical among the different gorilla alleles. The late region consists of recent duplications whose pattern is unique in each of the gorilla alleles and in the human allele. The early region is located in what is now the 3{prime} third of the modern segment, and the late, polymorphic region is located in what is now the 5{prime} third. Therefore, as the modern segment expanded during evolution, its 3{prime} end became stabilized, and continuing duplications became confined to its 5{prime} end. The expansion of the involucrin coding region, which began long before the separation of the gorilla and human, has continued in both species after their separation.« less
Florio, Marta; Heide, Michael; Pinson, Anneline; Brandl, Holger; Albert, Mareike; Winkler, Sylke; Wimberger, Pauline; Huttner, Wieland B; Hiller, Michael
2018-03-21
Understanding the molecular basis that underlies the expansion of the neocortex during primate, and notably human, evolution requires the identification of genes that are particularly active in the neural stem and progenitor cells of the developing neocortex. Here, we have used existing transcriptome datasets to carry out a comprehensive screen for protein-coding genes preferentially expressed in progenitors of fetal human neocortex. We show that 15 human-specific genes exhibit such expression, and many of them evolved distinct neural progenitor cell-type expression profiles and levels compared to their ancestral paralogs. Functional studies on one such gene, NOTCH2NL , demonstrate its ability to promote basal progenitor proliferation in mice. An additional 35 human genes with progenitor-enriched expression are shown to have orthologs only in primates. Our study provides a resource of genes that are promising candidates to exert specific, and novel, roles in neocortical development during primate, and notably human, evolution. © 2018, Florio et al.
Pinson, Anneline; Brandl, Holger; Albert, Mareike; Winkler, Sylke; Wimberger, Pauline
2018-01-01
Understanding the molecular basis that underlies the expansion of the neocortex during primate, and notably human, evolution requires the identification of genes that are particularly active in the neural stem and progenitor cells of the developing neocortex. Here, we have used existing transcriptome datasets to carry out a comprehensive screen for protein-coding genes preferentially expressed in progenitors of fetal human neocortex. We show that 15 human-specific genes exhibit such expression, and many of them evolved distinct neural progenitor cell-type expression profiles and levels compared to their ancestral paralogs. Functional studies on one such gene, NOTCH2NL, demonstrate its ability to promote basal progenitor proliferation in mice. An additional 35 human genes with progenitor-enriched expression are shown to have orthologs only in primates. Our study provides a resource of genes that are promising candidates to exert specific, and novel, roles in neocortical development during primate, and notably human, evolution. PMID:29561261
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-04-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-01-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V
2013-10-16
Mammalian genomes are extensively transcribed producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs are characterized with different genomic architectures in relationship with their associated protein-coding genes. Our study aimed at bridging lncRNA architecture with dynamical patterns of their expression using differentiating human neuroblastoma cells model. LncRNA expression was studied in a 120-hours timecourse of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), the compound used for the treatment of neuroblastoma. A custom microarray chip was utilized to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relatively to protein-coding genes. For each architecture class, dynamics of expression of lncRNAs was studied in association with their protein-coding partners. It allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized with negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.
Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong
2009-03-01
Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
A candidate gene for choanal atresia in alpaca.
Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G
2010-03-01
Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
Zhou, Yanrong; Lin, Yanli; Wu, Xiaojie; Xiong, Fuyin; Lv, Yuemeng; Zheng, Tao; Huang, Peitang; Chen, Hongxing
2012-02-01
Transgene expression for the mammary gland bioreactor aimed at producing recombinant proteins requires optimized expression vector construction. Previously we presented a hybrid gene locus strategy, which was originally tested with human lactoferrin (hLF) as target transgene, and an extremely high-level expression of rhLF ever been achieved as to 29.8 g/l in mice milk. Here to demonstrate the broad application of this strategy, another 38.4 kb mWAP-htPA hybrid gene locus was constructed, in which the 3-kb genomic coding sequence in the 24-kb mouse whey acidic protein (mWAP) gene locus was substituted by the 17.4-kb genomic coding sequence of human tissue plasminogen activator (htPA), exactly from the start codon to the end codon. Corresponding five transgenic mice lines were generated and the highest expression level of rhtPA in the milk attained as to 3.3 g/l. Our strategy will provide a universal way for the large-scale production of pharmaceutical proteins in the mammary gland of transgenic animals.
Progressive changes in non-coding RNA profile in leucocytes with age
Muñoz-Culla, Maider; Irizar, Haritz; Gorostidi, Ana; Alberro, Ainhoa; Osorio-Querejeta, Iñaki; Ruiz-Martínez, Javier; Olascoaga, Javier; de Munain, Adolfo López; Otaegui, David
2017-01-01
It has been observed that immune cell deterioration occurs in the elderly, as well as a chronic low-grade inflammation called inflammaging. These cellular changes must be driven by numerous changes in gene expression and in fact, both protein-coding and non-coding RNA expression alterations have been observed in peripheral blood mononuclear cells from elder people. In the present work we have studied the expression of small non-coding RNA (microRNA and small nucleolar RNA -snoRNA-) from healthy individuals from 24 to 79 years old. We have observed that the expression of 69 non-coding RNAs (56 microRNAs and 13 snoRNAs) changes progressively with chronological age. According to our results, the age range from 47 to 54 is critical given that it is the period when the expression trend (increasing or decreasing) of age-related small non-coding RNAs is more pronounced. Furthermore, age-related miRNAs regulate genes that are involved in immune, cell cycle and cancer-related processes, which had already been associated to human aging. Therefore, human aging could be studied as a result of progressive molecular changes, and different age ranges should be analysed to cover the whole aging process. PMID:28448962
Human coding RNA editing is generally nonadaptive
Xu, Guixia; Zhang, Jianzhi
2014-01-01
Impairment of RNA editing at a handful of coding sites causes severe disorders, prompting the view that coding RNA editing is highly advantageous. Recent genomic studies have expanded the list of human coding RNA editing sites by more than 100 times, raising the question of how common advantageous RNA editing is. Analyzing 1,783 human coding A-to-G editing sites, we show that both the frequency and level of RNA editing decrease as the importance of a site or gene increases; that during evolution, edited As are more likely than unedited As to be replaced with Gs but not with Ts or Cs; and that among nonsynonymously edited As, those that are evolutionarily least conserved exhibit the highest editing levels. These and other observations reveal the overall nonadaptive nature of coding RNA editing, despite the presence of a few sites in which editing is clearly beneficial. We propose that most observed coding RNA editing results from tolerable promiscuous targeting by RNA editing enzymes, the original physiological functions of which remain elusive. PMID:24567376
PHENOstruct: Prediction of human phenotype ontology terms using heterogeneous data sources.
Kahanda, Indika; Funk, Christopher; Verspoor, Karin; Ben-Hur, Asa
2015-01-01
The human phenotype ontology (HPO) was recently developed as a standardized vocabulary for describing the phenotype abnormalities associated with human diseases. At present, only a small fraction of human protein coding genes have HPO annotations. But, researchers believe that a large portion of currently unannotated genes are related to disease phenotypes. Therefore, it is important to predict gene-HPO term associations using accurate computational methods. In this work we demonstrate the performance advantage of the structured SVM approach which was shown to be highly effective for Gene Ontology term prediction in comparison to several baseline methods. Furthermore, we highlight a collection of informative data sources suitable for the problem of predicting gene-HPO associations, including large scale literature mining data.
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter
2017-01-01
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
Structural architecture of the human long non-coding RNA, steroid receptor RNA activator
Novikova, Irina V.; Hennelly, Scott P.; Sanbonmatsu, Karissa Y.
2012-01-01
While functional roles of several long non-coding RNAs (lncRNAs) have been determined, the molecular mechanisms are not well understood. Here, we report the first experimentally derived secondary structure of a human lncRNA, the steroid receptor RNA activator (SRA), 0.87 kB in size. The SRA RNA is a non-coding RNA that coactivates several human sex hormone receptors and is strongly associated with breast cancer. Coding isoforms of SRA are also expressed to produce proteins, making the SRA gene a unique bifunctional system. Our experimental findings (SHAPE, in-line, DMS and RNase V1 probing) reveal that this lncRNA has a complex structural organization, consisting of four domains, with a variety of secondary structure elements. We examine the coevolution of the SRA gene at the RNA structure and protein structure levels using comparative sequence analysis across vertebrates. Rapid evolutionary stabilization of RNA structure, combined with frame-disrupting mutations in conserved regions, suggests that evolutionary pressure preserves the RNA structural core rather than its translational product. We perform similar experiments on alternatively spliced SRA isoforms to assess their structural features. PMID:22362738
RNA editing of non-coding RNA and its role in gene regulation.
Daniel, Chammiran; Lagergren, Jens; Öhman, Marie
2015-10-01
It has for a long time been known that repetitive elements, particularly Alu sequences in human, are edited by the adenosine deaminases acting on RNA, ADAR, family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In human, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, already present knowledge provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem loop structures in the human transcriptome and how it can effect gene expression. Copyright © 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Enzhong; Zhu, Lingyu; Zhao, Lingyun
1996-08-01
The complete 4775-nt cDNA encoding the human serotonin 5-HT{sub 2C} receptor (5-HT{sub 2C}R), a G-protein-coupled receptor, has been isolated. It contains a 1377-nt coding region flanked by a 728-nt 5{prime}-untranslated region and a 2670-nt 3{prime}-untranslated region. By using the cloned 5-HT{sub 2C}R cDNA probe, the complete human gene for this receptor has been isolated and shown to contain six exons and five introns spanning at least 230 kb of DNA. The coding region of the human 5-HT{sub 2C}R gene is interrupted by three introns, and the positions of the intron/exon junctions are conserved between the human and the rodent genes.more » In addition, an alternatively spliced 5-HT{sub 2C}R RNA that contains a 95-nt deletion in the region coding for the second intracellular loop and the fourth transmembrane domain of the receptor has been identified. This deletion leads to a frameshift and premature termination so that the short isoform RNA encodes a putative protein of 248 amino acids. The ratio for the short isoform over the 5-HT{sub 2C}R RNA was found to be higher in choroid plexus tumor than in normal brain tissue, suggesting the possibility of differential regulation of the 5-HT{sub 2C}R gene in different neural tissues or during tumorigenesis. Transcription of the human 5-HT{sub 2C}R gene was found to be initiated at multiple sites. No classical TATA-box sequence was found at the appropriate location, and the 5{prime}-flanking sequence contains many potential transcription factor-binding sites. A 7.3-kb 5{prime}-flanking 5-HT{sub 2C}R DNA directed the efficient expression of a luciferase reported gene in SK-N-SH and IMR32 neuroblastoma cells, indicating that is contains a functional promoter. 69 refs., 8 figs., 1 tab.« less
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Babbar, Anshu; Itzek, Andreas; Pieper, Dietmar H; Nitsche-Schmitz, D Patric
2018-03-12
Streptococcus dysgalactiae subsp. equisimilis (SDSE), belonging to the group C and G streptococci, are human pathogens reported to cause clinical manifestations similar to infections caused by Streptococcus pyogenes. To scrutinize the distribution of gene coding for S. pyogenes virulence factors in SDSE, 255 isolates were collected from humans infected with SDSE in Vellore, a region in southern India, with high incidence of SDSE infections. Initial evaluation indicated SDSE isolates comprising of 82.35% group G and 17.64% group C. A multiplex PCR system was used to detect 21 gene encoding virulence-associated factors of S. pyogenes, like superantigens, DNases, proteinases, and other immune modulatory toxins. As validated by DNA sequencing of the PCR products, sequences homologous to speC, speG, speH, speI, speL, ssa and smeZ of the family of superantigen coding genes and for DNases like sdaD and sdc were detected in the SDSE collection. Furthermore, there was high abundance (48.12% in group G and 86.6% in group C SDSE) of scpA, the gene coding for C5a peptidase in these isolates. Higher abundance of S. pyogenes virulence factor genes was observed in SDSE of Lancefield group C as compared to group G, even though the incidence rates in former were lower. This study not only substantiates detection of S. pyogenes virulence factor genes in whole genome sequenced SDSE but also makes significant contribution towards the understanding of SDSE and its increasing virulence potential.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
2010-12-15
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
APADB: a database for alternative polyadenylation and microRNA regulation events
Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn
2014-01-01
Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, allowing to further improve prediction algorithms. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N
2016-04-01
Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
dbCPG: A web resource for cancer predisposition genes
Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng
2016-01-01
Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown type genes), 637 rats, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the understanding of the CPGs data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes. PMID:27192119
Origins of De Novo Genes in Human and Chimpanzee.
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar
2015-12-01
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species--human, chimpanzee, macaque, and mouse--and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Origins of De Novo Genes in Human and Chimpanzee
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M.Mar
2015-01-01
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species—human, chimpanzee, macaque, and mouse—and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins. PMID:26720152
Wieczorek, Aneta; Fornalewicz, Karolina; Mocarski, Łukasz; Łyżeń, Robert; Węgrzyn, Grzegorz
2018-04-15
Genetic evidence for a link between DNA replication and glycolysis has been demonstrated a decade ago in Bacillus subtilis, where temperature-sensitive mutations in genes coding for replication proteins could be suppressed by mutations in genes of glycolytic enzymes. Then, a strong influence of dysfunctions of particular enzymes from the central carbon metabolism (CCM) on DNA replication and repair in Escherichia coli was reported. Therefore, we asked if such a link occurs only in bacteria or it is a more general phenomenon. Here, we demonstrate that effects of silencing (provoked by siRNA) of expression of genes coding for proteins involved in DNA replication and repair (primase, DNA polymerase ι, ligase IV, and topoisomerase IIIβ) on these processes (less efficient entry into the S phase of the cell cycle and decreased level of DNA synthesis) could be suppressed by silencing of specific genes of enzymes from CMM. Silencing of other pairs of replication/repair and CMM genes resulted in enhancement of the negative effects of lower expression levels of replication/repair genes. We suggest that these results may be proposed as a genetic evidence for the link between DNA replication/repair and CMM in human cells, indicating that it is a common biological phenomenon, occurring from bacteria to humans. Copyright © 2018 Elsevier B.V. All rights reserved.
Chen, Wen; Zhang, Xuan; Li, Jing; Huang, Shulan; Xiang, Shuanglin; Hu, Xiang; Liu, Changning
2018-05-09
Zebrafish is a full-developed model system for studying development processes and human disease. Recent studies of deep sequencing had discovered a large number of long non-coding RNAs (lncRNAs) in zebrafish. However, only few of them had been functionally characterized. Therefore, how to take advantage of the mature zebrafish system to deeply investigate the lncRNAs' function and conservation is really intriguing. We systematically collected and analyzed a series of zebrafish RNA-seq data, then combined them with resources from known database and literatures. As a result, we obtained by far the most complete dataset of zebrafish lncRNAs, containing 13,604 lncRNA genes (21,128 transcripts) in total. Based on that, a co-expression network upon zebrafish coding and lncRNA genes was constructed and analyzed, and used to predict the Gene Ontology (GO) and the KEGG annotation of lncRNA. Meanwhile, we made a conservation analysis on zebrafish lncRNA, identifying 1828 conserved zebrafish lncRNA genes (1890 transcripts) that have their putative mammalian orthologs. We also found that zebrafish lncRNAs play important roles in regulation of the development and function of nervous system; these conserved lncRNAs present a significant sequential and functional conservation, with their mammalian counterparts. By integrative data analysis and construction of coding-lncRNA gene co-expression network, we gained the most comprehensive dataset of zebrafish lncRNAs up to present, as well as their systematic annotations and comprehensive analyses on function and conservation. Our study provides a reliable zebrafish-based platform to deeply explore lncRNA function and mechanism, as well as the lncRNA commonality between zebrafish and human.
Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees.
Chabot, Adrien; Shrit, Ralla A; Blekhman, Ran; Gilad, Yoav
2007-08-01
Most phenotypic differences between human and chimpanzee are likely to result from differences in gene regulation, rather than changes to protein-coding regions. To date, however, only a handful of human-chimpanzee nucleotide differences leading to changes in gene regulation have been identified. To hone in on differences in regulatory elements between human and chimpanzee, we focused on 10 genes that were previously found to be differentially expressed between the two species. We then designed reporter gene assays for the putative human and chimpanzee promoters of the 10 genes. Of seven promoters that we found to be active in human liver cell lines, human and chimpanzee promoters had significantly different activity in four cases, three of which recapitulated the gene expression difference seen in the microarray experiment. For these three genes, we were therefore able to demonstrate that a change in cis influences expression differences between humans and chimpanzees. Moreover, using site-directed mutagenesis on one construct, the promoter for the DDA3 gene, we were able to identify three nucleotides that together lead to a cis regulatory difference between the species. High-throughput application of this approach can provide a map of regulatory element differences between humans and our close evolutionary relatives.
The transfer and transformation of collective network information in gene-matched networks.
Kitsukawa, Takashi; Yagi, Takeshi
2015-10-09
Networks, such as the human society network, social and professional networks, and biological system networks, contain vast amounts of information. Information signals in networks are distributed over nodes and transmitted through intricately wired links, making the transfer and transformation of such information difficult to follow. Here we introduce a novel method for describing network information and its transfer using a model network, the Gene-matched network (GMN), in which nodes (neurons) possess attributes (genes). In the GMN, nodes are connected according to their expression of common genes. Because neurons have multiple genes, the GMN is cluster-rich. We show that, in the GMN, information transfer and transformation were controlled systematically, according to the activity level of the network. Furthermore, information transfer and transformation could be traced numerically with a vector using genes expressed in the activated neurons, the active-gene array, which was used to assess the relative activity among overlapping neuronal groups. Interestingly, this coding style closely resembles the cell-assembly neural coding theory. The method introduced here could be applied to many real-world networks, since many systems, including human society and various biological systems, can be represented as a network of this type.
Deschamps, Matthieu; Laval, Guillaume; Fagny, Maud; Itan, Yuval; Abel, Laurent; Casanova, Jean-Laurent; Patin, Etienne; Quintana-Murci, Lluis
2016-01-01
Human genes governing innate immunity provide a valuable tool for the study of the selective pressure imposed by microorganisms on host genomes. A comprehensive, genome-wide study of how selective constraints and adaptations have driven the evolution of innate immunity genes is missing. Using full-genome sequence variation from the 1000 Genomes Project, we first show that innate immunity genes have globally evolved under stronger purifying selection than the remainder of protein-coding genes. We identify a gene set under the strongest selective constraints, mutations in which are likely to predispose individuals to life-threatening disease, as illustrated by STAT1 and TRAF3. We then evaluate the occurrence of local adaptation and detect 57 high-scoring signals of positive selection at innate immunity genes, variation in which has been associated with susceptibility to common infectious or autoimmune diseases. Furthermore, we show that most adaptations targeting coding variation have occurred in the last 6,000–13,000 years, the period at which populations shifted from hunting and gathering to farming. Finally, we show that innate immunity genes present higher Neandertal introgression than the remainder of the coding genome. Notably, among the genes presenting the highest Neandertal ancestry, we find the TLR6-TLR1-TLR10 cluster, which also contains functional adaptive variation in Europeans. This study identifies highly constrained genes that fulfill essential, non-redundant functions in host survival and reveals others that are more permissive to change—containing variation acquired from archaic hominins or adaptive variants in specific populations—improving our understanding of the relative biological importance of innate immunity pathways in natural conditions. PMID:26748513
Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa
2006-01-01
Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
Human homolog of the mouse sperm receptor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chamberlin, M.E.; Dean, J.
1990-08-01
The human zona pellucida, composed of three glycoproteins (ZP1, ZP2, and ZP3), forms an extracellular matrix that surrounds ovulated eggs and mediates species-specific fertilization. The genes that code for at least two of the zona proteins (ZP2 and ZP3) cross-hybridize with other mammalian DNA. The recently characterized mouse sperm receptor gene (Zp-3) was used to isolate its human homolog. The human homolog spans {approx}18.3 kilobase pairs (kbp) (compared to 8.6 kbp for the mouse gene) and contains eight exons, the sizes of which are strictly conserved between the two species. Four short (8-15 bp) sequences within the first 250 bpmore » of the 5{prime} flanking region in the human Zp-3 homolog are also present upstream of mouse Zp-3. These elements may modulate oocyte-specific gene expression. By using the polymerase chain reaction, a full-length cDNA of human ZP3 was isolated from human ovarian poly(A){sup +} RNA and used to deduce the structure of human ZP3 mRNA. Certain features of the human and mouse ZP3 transcripts are conserved. Both have unusually short 5{prime} and 3{prime} untranslated regions, both contain a single open reading frame that is 74% identical, and both code for 424 amino acid polypeptides that are 67% the same. The similarity between the two proteins may define domains that are important in maintaining the structural integrity of the zona pellucida, while the differences may play a role in mediating the species-specific events of mammalian fertilization.« less
Nithya, Ravichantar; Ahmed, Siti Aminah; Hoe, Chee-Hock; Gopinath, Subash C B; Citartan, Marimuthu; Chinni, Suresh V; Lee, Li Pin; Rozhdestvensky, Timofey S; Tang, Thean-Hock
2015-01-01
Salmonellosis, a communicable disease caused by members of the Salmonella species, transmitted to humans through contaminated food or water. It is of paramount importance, to generate accurate detection methods for discriminating the various Salmonella species that cause severe infection in humans, including S. Typhi and S. Paratyphi A. Here, we formulated a strategy of detection and differentiation of salmonellosis by a multiplex polymerase chain reaction assay using S. Typhi non-protein coding RNA (sRNA) genes. With the designed sequences that specifically detect sRNA genes from S. Typhi and S. Paratyphi A, a detection limit of up to 10 pg was achieved. Moreover, in a stool-seeding experiment with S. Typhi and S. Paratyphi A, we have attained a respective detection limit of 15 and 1.5 CFU/mL. The designed strategy using sRNA genes shown here is comparatively sensitive and specific, suitable for clinical diagnosis and disease surveillance, and sRNAs represent an excellent molecular target for infectious disease.
Sequence divergence of the red and green visual pigments in great apes and humans.
Deeb, S S; Jorgensen, A L; Battisti, L; Iwasaki, L; Motulsky, A G
1994-01-01
We have determined the coding sequences of red and green visual pigment genes of the chimpanzee, gorilla, and orangutan. The deduced amino acid sequences of these pigments are highly homologous to the equivalent human pigments. None of the amino acid differences occurred at sites that were previously shown to influence pigment absorption characteristics. Therefore, we predict the spectra of red and green pigments of the apes to have wavelengths of maximum absorption that differ by < 2 nm from the equivalent human pigments and that color vision in these nonhuman primates will be very similar, if not identical, to that in humans. A total of 14 within-species polymorphisms (6 involving silent substitutions) were observed in the coding sequences of the red and green pigment genes of the great apes. Remarkably, the polymorphisms at 6 of these sites had been observed in human populations, suggesting that they predated the evolution of higher primates. Alleles at polymorphic sites were often shared between the red and green pigment genes. The average synonymous rate of divergence of red from green sequences was approximately 1/10th that estimated for other proteins of higher primates, indicating the involvement of gene conversion in generating these polymorphisms. The high degree of homology and juxtaposition of these two genes on the X chromosome has promoted unequal recombination and/or gene conversion that led to sequence homogenization. However, natural selection operated to maintain the degree of separation in peak absorbance between the red and green pigments that resulted in optimal chromatic discrimination. This represents a unique case of molecular coevolution between two homologous genes that functionally interact at the behavioral level. PMID:8041777
Niu, Ao-lei; Wang, Yin-qiu; Zhang, Hui; Liao, Cheng-hong; Wang, Jin-kai; Zhang, Rui; Che, Jun; Su, Bing
2011-10-12
Homeobox genes are the key regulators during development, and they are in general highly conserved with only a few reported cases of rapid evolution. RHOXF2 is an X-linked homeobox gene in primates. It is highly expressed in the testicle and may play an important role in spermatogenesis. As male reproductive system is often the target of natural and/or sexual selection during evolution, in this study, we aim to dissect the pattern of molecular evolution of RHOXF2 in primates and its potential functional consequence. We studied sequences and copy number variation of RHOXF2 in humans and 16 nonhuman primate species as well as the expression patterns in human, chimpanzee, white-browed gibbon and rhesus macaque. The gene copy number analysis showed that there had been parallel gene duplications/losses in multiple primate lineages. Our evidence suggests that 11 nonhuman primate species have one RHOXF2 copy, and two copies are present in humans and four Old World monkey species, and at least 6 copies in chimpanzees. Further analysis indicated that the gene duplications in primates had likely been mediated by endogenous retrovirus (ERV) sequences flanking the gene regions. In striking contrast to non-human primates, humans appear to have homogenized their two RHOXF2 copies by the ERV-mediated non-allelic recombination mechanism. Coding sequence and phylogenetic analysis suggested multi-lineage strong positive selection on RHOXF2 during primate evolution, especially during the origins of humans and chimpanzees. All the 8 coding region polymorphic sites in human populations are non-synonymous, implying on-going selection. Gene expression analysis demonstrated that besides the preferential expression in the reproductive system, RHOXF2 is also expressed in the brain. The quantitative data suggests expression pattern divergence among primate species. RHOXF2 is a fast-evolving homeobox gene in primates. The rapid evolution and copy number changes of RHOXF2 had been driven by Darwinian positive selection acting on the male reproductive system and possibly also on the central nervous system, which sheds light on understanding the role of homeobox genes in adaptive evolution.
Miyamoto, T; Koh, E; Tsujimura, A; Miyagawa, Y; Saijo, Y; Namiki, M; Sengoku, K
2014-04-01
Genetic mechanisms have been implicated as a cause of some cases of male infertility. Recently, ten novel genes involved in human spermatogenesis, including human LRWD1, have been identified by expression microarray analysis of human testictissue. The human LRWD1 protein mediates the origin recognition complex in chromatin, which is critical for the initiation of pre-replication complex assembly in G1 and chromatin organization in post-G1 cells. The Lrwd1 gene expression is specific to the testis in mice. Therefore, we hypothesized that mutation or polymorphisms of LRWD1 participate in male infertility, especially azoospermia. To investigate whether LRWD1 gene defects are associated with azoospermia caused by SCOS and meiotic arrest (MA), mutational analysis was performed in 100 and 30 Japanese patients by direct sequencing of the coding regions, respectively. Statistical analysis was performed for patients with SCOS and MA and in 100 healthy control men. No mutations were found in LRWD1; however, three coding single-nucleotide polymorphisms (SNP1-SNP3) could be detected in the patients. The genotype and allele frequencies in SNP1 and SNP2 were notably higher in the SCOS group than in the control group (P < 0.05). These results suggest the critical role of LRWD1 in human spermatogenesis. © 2013 Blackwell Verlag GmbH.
Jiang, Xiaoou; Yu, Han; Teo, Cui Rong; Tan, Genim Siu Xian; Goh, Sok Chin; Patel, Parasvi; Chua, Yiqiang Kevin; Hameed, Nasirah Banu Sahul; Bertoletti, Antonio; Patzel, Volker
2016-09-01
Dumbbell-shaped DNA minimal vectors lacking nontherapeutic genes and bacterial sequences are considered a stable, safe alternative to viral, nonviral, and naked plasmid-based gene-transfer systems. We investigated novel molecular features of dumbbell vectors aiming to reduce vector size and to improve the expression of noncoding or coding RNA. We minimized small hairpin RNA (shRNA) or microRNA (miRNA) expressing dumbbell vectors in size down to 130 bp generating the smallest genetic expression vectors reported. This was achieved by using a minimal H1 promoter with integrated transcriptional terminator transcribing the RNA hairpin structure around the dumbbell loop. Such vectors were generated with high conversion yields using a novel protocol. Minimized shRNA-expressing dumbbells showed accelerated kinetics of delivery and transcription leading to enhanced gene silencing in human tissue culture cells. In primary human T cells, minimized miRNA-expressing dumbbells revealed higher stability and triggered stronger target gene suppression as compared with plasmids and miRNA mimics. Dumbbell-driven gene expression was enhanced up to 56- or 160-fold by implementation of an intron and the SV40 enhancer compared with control dumbbells or plasmids. Advanced dumbbell vectors may represent one option to close the gap between durable expression that is achievable with integrating viral vectors and short-term effects triggered by naked RNA.
Xie, G.; Chain, P.S.G.; Lo, C.; Liu, K-L.; Gans, J.; Merritt, J.; Qi, F.
2010-01-01
SUMMARY Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~ 2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. PMID:21040513
Xie, G; Chain, P S G; Lo, C-C; Liu, K-L; Gans, J; Merritt, J; Qi, F
2010-12-01
Human dental plaque is a complex microbial community containing an estimated 700 to 19,000 species/phylotypes. Despite numerous studies analysing species richness in healthy and diseased human subjects, the true genomic composition of the human dental plaque microbiota remains unknown. Here we report a metagenomic analysis of a healthy human plaque sample using a combination of second-generation sequencing platforms. A total of 860 million base pairs of non-human sequences were generated. Various analysis tools revealed the presence of 12 well-characterized phyla, members of the TM-7 and BRC1 clade, and sequences that could not be classified. Both pathogens and opportunistic pathogens were identified, supporting the ecological plaque hypothesis for oral diseases. Mapping the metagenomic reads to sequenced reference genomes demonstrated that 4% of the reads could be assigned to the sequenced species. Preliminary annotation identified genes belonging to all known functional categories. Interestingly, although 73% of the total assembled contig sequences were predicted to code for proteins, only 51% of them could be assigned a functional role. Furthermore, ~2.8% of the total predicted genes coded for proteins involved in resistance to antibiotics and toxic compounds, suggesting that the oral cavity is an important reservoir for antimicrobial resistance. © 2010 John Wiley & Sons A/S.
HERC1 polymorphisms: population-specific variations in haplotype composition.
Yuasa, Isao; Umetsu, Kazuo; Nishimukai, Hiroaki; Fukumori, Yasuo; Harihara, Shinji; Saitou, Naruya; Jin, Feng; Chattopadhyay, Prasanta K; Henke, Lotte; Henke, Jürgen
2009-08-01
Human HERC1 is one of six HERC proteins and may play an important role in intracellular membrane trafficking. The human HERC1 gene is suggested to have been affected by local positive selection. To assess the global frequency distributions of coding and non-coding single nucleotide polymorphisms (SNPs) in the HERC1 gene, we developed a new simultaneous genotyping method for four SNPs, and applied this method to investigate 1213 individuals from 12 global populations. The results confirmed remarked differences in the allele and haplotype frequencies between East Asian and non-East Asian populations. One of the three common haplotypes observed was found to be characteristic of East Asians, who showed a relatively uniform distribution of haplotypes. Information on haplotypes would be useful for testing the function of polymorphisms in the HERC1 gene. This is the first study to investigate the distribution of HERC1 polymorphisms in various populations. (c) 2009 John Wiley & Sons, Ltd.
A new mutation identified in SPATA16 in two globozoospermic patients.
ElInati, Elias; Fossard, Camille; Okutman, Ozlem; Ghédir, Houda; Ibala-Romdhane, Samira; Ray, Pierre F; Saad, Ali; Hennebicq, Sylvianne; Viville, Stéphane
2016-06-01
The aim of this study is to identify potential genes involved in human globozoopsermia. Nineteen globozoospermic patients (previously screened for DPY19L2 mutations with no causative mutation) were recruited in this study and screened for mutations in genes implicated in human globozoospermia SPATA16 and PICK1. Using the candidate gene approach and the determination of Spata16 partners by Glutathione S-transferase (GST) pull-down four genes were also selected and screened for mutations. We identified a novel mutation of SPATA16: deletion of 22.6 Kb encompassing the first coding exon in two unrelated Tunisian patients who presented the same deletion breakpoints. The two patients shared the same haplotype, suggesting a possible ancestral founder effect for this new deletion. Four genes were selected using the candidate gene approach and the GST pull-down (GOPC, PICK1, AGFG1 and IRGC) and were screened for mutation, but no variation was identified. The present study confirms the pathogenicity of the SPATA16 mutations. The fact that no variation was detected in the coding sequence of AFGF1, GOPC, PICK1 and IRGC does not mean that they are not involved in human globozoospermia. A larger globozoospermic cohort must be studied in order to accelerate the process of identifying new genes involved in such phenotypes. Until sufficient numbers of patients have been screened, AFGF1, GOPC, PICK1 and IRGC should still be considered as candidate genes.
Toyoda, N; Kleinhaus, N; Larsen, P R
1996-06-01
We analyzed the exon-intron structure of the human type 1 deiodinase gene (dio1) and compared it with that of a patient with suspected congenital type 1 deiodinase (D1) deficiency. The hdio1 gene is identical in exon-intron arrangement to the mouse gene, with coding sequences and a selenocysteine insertion sequence (SECIS) element contained in four exons. There were no mutations in the sequences of exons 1-4 of the patient's genomic DNA. Functional studies by transient expression techniques showed no difference in basal promoter activity or T3 responsiveness between the patient's and the normal dio1 gene. A structural abnormality in the dio1 gene is not a likely explanation for this patient's D1-deficient phenotype.
An expanding universe of the non-coding genome in cancer biology.
Xue, Bin; He, Lin
2014-06-01
Neoplastic transformation is caused by accumulation of genetic and epigenetic alterations that ultimately convert normal cells into tumor cells with uncontrolled proliferation and survival, unlimited replicative potential and invasive growth [Hanahan,D. et al. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646-674]. Although the majority of the cancer studies have focused on the functions of protein-coding genes, emerging evidence has started to reveal the importance of the vast non-coding genome, which constitutes more than 98% of the human genome. A number of non-coding RNAs (ncRNAs) derived from the 'dark matter' of the human genome exhibit cancer-specific differential expression and/or genomic alterations, and it is increasingly clear that ncRNAs, including small ncRNAs and long ncRNAs (lncRNAs), play an important role in cancer development by regulating protein-coding gene expression through diverse mechanisms. In addition to ncRNAs, nearly half of the mammalian genomes consist of transposable elements, particularly retrotransposons. Once depicted as selfish genomic parasites that propagate at the expense of host fitness, retrotransposon elements could also confer regulatory complexity to the host genomes during development and disease. Reactivation of retrotransposons in cancer, while capable of causing insertional mutagenesis and genome rearrangements to promote oncogenesis, could also alter host gene expression networks to favor tumor development. Taken together, the functional significance of non-coding genome in tumorigenesis has been previously underestimated, and diverse transcripts derived from the non-coding genome could act as integral functional components of the oncogene and tumor suppressor network. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Pecker, I; Avraham, K B; Gilbert, D J; Savitsky, K; Rotman, G; Harnik, R; Fukao, T; Schröck, E; Hirotsune, S; Tagle, D A; Collins, F S; Wynshaw-Boris, A; Ried, T; Copeland, N G; Jenkins, N A; Shiloh, Y; Ziv, Y
1996-07-01
Atm, the mouse homolog of the human ATM gene defective in ataxia-telangiectasia (A-T), has been identified. The entire coding sequence of the Atm transcript was cloned and found to contain an open reading frame encoding a protein of 3066 amino acids with 84% overall identity and 91% similarity to the human ATM protein. Variable levels of expression of Atm were observed in different tissues. Fluorescence in situ hybridization and linkage analysis located the Atm gene on mouse chromosome 9, band 9C, in a region homologous to the ATM region on human chromosome 11q22-q23.
Chan, Wen-Ling; Yang, Wen-Kuang; Huang, Hsien-Da; Chang, Jan-Gowth
2013-01-01
RNA interference (RNAi) is a gene silencing process within living cells, which is controlled by the RNA-induced silencing complex with a sequence-specific manner. In flies and mice, the pseudogene transcripts can be processed into short interfering RNAs (siRNAs) that regulate protein-coding genes through the RNAi pathway. Following these findings, we construct an innovative and comprehensive database to elucidate siRNA-mediated mechanism in human transcribed pseudogenes (TPGs). To investigate TPG producing siRNAs that regulate protein-coding genes, we mapped the TPGs to small RNAs (sRNAs) that were supported by publicly deep sequencing data from various sRNA libraries and constructed the TPG-derived siRNA-target interactions. In addition, we also presented that TPGs can act as a target for miRNAs that actually regulate the parental gene. To enable the systematic compilation and updating of these results and additional information, we have developed a database, pseudoMap, capturing various types of information, including sequence data, TPG and cognate annotation, deep sequencing data, RNA-folding structure, gene expression profiles, miRNA annotation and target prediction. As our knowledge, pseudoMap is the first database to demonstrate two mechanisms of human TPGs: encoding siRNAs and decoying miRNAs that target the parental gene. pseudoMap is freely accessible at http://pseudomap.mbc.nctu.edu.tw/. Database URL: http://pseudomap.mbc.nctu.edu.tw/
SIN3A mutations are rare in men with azoospermia.
Miyamoto, T; Koh, E; Tsujimura, A; Miyagawa, Y; Minase, G; Ueda, Y; Namiki, M; Sengoku, K
2015-11-01
A loss of function of the murine Sin3A gene resulted in male infertility with Sertoli cell-only syndrome (SCOS) phenotype in mice. Here, we investigated the relevance of this gene to human male infertility with azoospermia caused by SCOS. Mutation analysis of SIN3A in the coding region was performed on 80 Japanese patients. However, no variants could be detected. This study suggests a lack of association of SIN3A gene sequence variants with azoospermia caused by SCOS in humans. © 2014 Blackwell Verlag GmbH.
The functional spectrum of low-frequency coding variation.
Marth, Gabor T; Yu, Fuli; Indap, Amit R; Garimella, Kiran; Gravel, Simon; Leong, Wen Fung; Tyler-Smith, Chris; Bainbridge, Matthew; Blackwell, Tom; Zheng-Bradley, Xiangqun; Chen, Yuan; Challis, Danny; Clarke, Laura; Ball, Edward V; Cibulskis, Kristian; Cooper, David N; Fulton, Bob; Hartl, Chris; Koboldt, Dan; Muzny, Donna; Smith, Richard; Sougnez, Carrie; Stewart, Chip; Ward, Alistair; Yu, Jin; Xue, Yali; Altshuler, David; Bustamante, Carlos D; Clark, Andrew G; Daly, Mark; DePristo, Mark; Flicek, Paul; Gabriel, Stacey; Mardis, Elaine; Palotie, Aarno; Gibbs, Richard
2011-09-14
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
Mutation detection in the human HSP70B′ gene by denaturing high-performance liquid chromatography
Hecker, Karl H.; Asea, Alexzander; Kobayashi, Kaoru; Green, Stacy; Tang, Dan; Calderwood, Stuart K.
2000-01-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B′ gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKERTM software. Four overlapping amplicons, which span the complete coding region of the HSP70B′ gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B′ gene on the WAVE® Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed. PMID:11189446
Mutation detection in the human HSP7OB' gene by denaturing high-performance liquid chromatography.
Hecker, K H; Asea, A; Kobayashi, K; Green, S; Tang, D; Calderwood, S K
2000-11-01
Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B' gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER software. Four overlapping amplicons, which span the complete coding region of the HSP70B' gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B' gene on the WAVE Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed.
Exceptionally long 5' UTR short tandem repeats specifically linked to primates.
Namdar-Aligoodarzi, P; Mohammadparast, S; Zaker-Kandjani, B; Talebi Kakroodi, S; Jafari Vesiehsari, M; Ohadi, M
2015-09-10
We have previously reported genome-scale short tandem repeats (STRs) in the core promoter interval (i.e. -120 to +1 to the transcription start site) of protein-coding genes that have evolved identically in primates vs. non-primates. Those STRs may function as evolutionary switch codes for primate speciation. In the current study, we used the Ensembl database to analyze the 5' untranslated region (5' UTR) between +1 and +60 of the transcription start site of the entire human protein-coding genes annotated in the GeneCards database, in order to identify "exceptionally long" STRs (≥5-repeats), which may be of selective/adaptive advantage. The importance of this critical interval is its function as core promoter, and its effect on transcription and translation. In order to minimize ascertainment bias, we analyzed the evolutionary status of the human 5' UTR STRs of ≥5-repeats in several species encompassing six major orders and superorders across mammals, including primates, rodents, Scandentia, Laurasiatheria, Afrotheria, and Xenarthra. We introduce primate-specific STRs, and STRs which have expanded from mouse to primates. Identical co-occurrence of the identified STRs of rare average frequency between 0.006 and 0.0001 in primates supports a role for those motifs in processes that diverged primates from other mammals, such as neuronal differentiation (e.g. APOD and FGF4), and craniofacial development (e.g. FILIP1L). A number of the identified STRs of ≥5-repeats may be human-specific (e.g. ZMYM3 and DAZAP1). Future work is warranted to examine the importance of the listed genes in primate/human evolution, development, and disease. Copyright © 2015 Elsevier B.V. All rights reserved.
Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M
2017-01-01
Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences, analogous to the genetic code, that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed the set of introns within a gene usually share the same splicing codes, thus arguing that one sub-type of splicesosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-Man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H-H; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B; Adair, Linda S; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, Xiao-Ou; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars G; Nielsen, Jonas Bille; Tse, Hung-Fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Kathiresan, Sekar; Mohlke, Karen L; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-12-01
Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.
Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J.; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N.; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H.-H.; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B.; Adair, Linda S.; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; da Chen, Yii-Der I; Shu, XiaoOu; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K.; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars; Nielsen, Jonas Bille; Tse, Hung-fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y. Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Consortium, GLGC; Kathiresan, Sekar; Mohlke, Karen L.; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J
2017-01-01
Most genome-wide association studies have been conducted in European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we examined protein-coding genetic variants in 47,532 East Asian individuals using an exome array. We identified 255 variants at 41 loci reaching chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After meta-analysis with > 300,000 European samples, we identified an additional 9 novel loci. The same 16 genes were identified by the protein-altering variants in both East Asians and Europeans, likely pointing to the functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population-specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci. PMID:29083407
The Diversity Present in 5140 Human Mitochondrial Genomes
Pereira, Luísa; Freitas, Fernando; Fernandes, Verónica; Pereira, Joana B.; Costa, Marta D.; Costa, Stephanie; Máximo, Valdemar; Macaulay, Vincent; Rocha, Ricardo; Samuels, David C.
2009-01-01
We analyzed the current status (as of the end of August 2008) of human mitochondrial genomes deposited in GenBank, amounting to 5140 complete or coding-region sequences, in order to present an overall picture of the diversity present in the mitochondrial DNA of the global human population. To perform this task, we developed mtDNA-GeneSyn, a computer tool that identifies and exhaustedly classifies the diversity present in large genetic data sets. The diversity observed in the 5140 human mitochondrial genomes was compared with all possible transitions and transversions from the standard human mitochondrial reference genome. This comparison showed that tRNA and rRNA secondary structures have a large effect in limiting the diversity of the human mitochondrial sequences, whereas for the protein-coding genes there is a bias toward less variation at the second codon positions. The analysis of the observed amino acid variations showed a tolerance of variations that convert between the amino acids V, I, A, M, and T. This defines a group of amino acids with similar chemical properties that can interconvert by a single transition. PMID:19426953
Miyakawa, Hiroe; Miyamoto, Toshinobu; Koh, Eitetsu; Tsujimura, Akira; Miyagawa, Yasushi; Saijo, Yasuaki; Namiki, Mikio; Sengoku, Kazuo
2012-01-01
Genetic mechanisms have been implicated as a cause of some cases of male infertility. Recently, 10 novel genes involved in human spermatogenesis, including human SEPTIN12, were identified by expression microarray analysis of human testicular tissue. Septin12 is a member of the septin family of conserved cytoskeletal GTPases that form heteropolymeric filamentous structures in interphase cells. It is expressed specifically in the testis. Therefore, we hypothesized that mutation or polymorphisms of SEPTIN12 participate in male infertility, especially Sertoli cell-only syndrome (SCOS). To investigate whether SEPTIN12 gene defects are associated with azoospermia caused by SCOS, mutational analysis was performed in 100 Japanese patients by direct sequencing of coding regions. Statistical analysis was performed in patients with SCOS and in 140 healthy control men. No mutations were found in SEPTIN12 ; however, 8 coding single-nucleotide polymorphisms (SNP1-SNP8) could be detected in the patients with SCOS. The genotype and allele frequencies in SNP3, SNP4, and SNP6 were notably higher in the SCOS group than in the control group (P < .001). These results suggest that SEPTIN12 might play a critical role in human spermatogenesis.
The truth about mouse, human, worms and yeast
2004-01-01
Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years. PMID:15601543
The truth about mouse, human, worms and yeast.
Nelson, David R; Nebert, Daniel W
2004-01-01
Genome comparisons are behind the powerful new annotation methods being developed to find all human genes, as well as genes from other genomes. Genomes are now frequently being studied in pairs to provide cross-comparison datasets. This 'Noah's Ark' approach often reveals unsuspected genes and may support the deletion of false-positive predictions. Joining mouse and human as the cross-comparison dataset for the first two mammals are: two Drosophila species, D. melanogaster and D. pseudoobscura; two sea squirts, Ciona intestinalis and Ciona savignyi; four yeast (Saccharomyces) species; two nematodes, Caenorhabditis elegans and Caenorhabditis briggsae; and two pufferfish (Takefugu rubripes and Tetraodon nigroviridis). Even genomes like yeast and C. elegans, which have been known for more than five years, are now being significantly improved. Methods developed for yeast or nematodes will now be applied to mouse and human, and soon to additional mammals such as rat and dog, to identify all the mammalian protein-coding genes. Current large disparities between human Unigene predictions (127,835 genes) and gene-scanning methods (45,000 genes) still need to be resolved. This will be the challenge during the next few years.
2018-01-01
FAM230C, a long intergenic non-coding RNA (lincRNA) gene in human chromosome 13 (chr13) is a member of lincRNA genes termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine properties of FAM230C and its related genes. Results reveal that the DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry the TBTA sequences. These genes were formed from a large segment of the 3’ half of the FAM230C sequence duplicated in chr22, and are specifically in regions of low copy repeats (LCR22)s, in or close to the 22q.11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present in other chromosomes do not carry the TBTA motif and were formed from the 5’ half region of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722
The zebrafish reference genome sequence and its relationship to the human genome.
Howe, Kerstin; Clark, Matthew D; Torroja, Carlos F; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T; Guerra-Assunção, José A; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F; Laird, Gavin K; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Elliot, David; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Begum, Sharmin; Mortimore, Beverley; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Lloyd, Christine; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James D; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Lanz, Christa; Raddatz, Günter; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Schuster, Stephan C; Carter, Nigel P; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M J; Enright, Anton; Geisler, Robert; Plasterk, Ronald H A; Lee, Charles; Westerfield, Monte; de Jong, Pieter J; Zon, Leonard I; Postlethwait, John H; Nüsslein-Volhard, Christiane; Hubbard, Tim J P; Roest Crollius, Hugues; Rogers, Jane; Stemple, Derek L
2013-04-25
Zebrafish have become a popular organism for the study of vertebrate gene function. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
The zebrafish reference genome sequence and its relationship to the human genome
Howe, Kerstin; Clark, Matthew D.; Torroja, Carlos F.; Torrance, James; Berthelot, Camille; Muffato, Matthieu; Collins, John E.; Humphray, Sean; McLaren, Karen; Matthews, Lucy; McLaren, Stuart; Sealy, Ian; Caccamo, Mario; Churcher, Carol; Scott, Carol; Barrett, Jeffrey C.; Koch, Romke; Rauch, Gerd-Jörg; White, Simon; Chow, William; Kilian, Britt; Quintais, Leonor T.; Guerra-Assunção, José A.; Zhou, Yi; Gu, Yong; Yen, Jennifer; Vogel, Jan-Hinnerk; Eyre, Tina; Redmond, Seth; Banerjee, Ruby; Chi, Jianxiang; Fu, Beiyuan; Langley, Elizabeth; Maguire, Sean F.; Laird, Gavin K.; Lloyd, David; Kenyon, Emma; Donaldson, Sarah; Sehra, Harminder; Almeida-King, Jeff; Loveland, Jane; Trevanion, Stephen; Jones, Matt; Quail, Mike; Willey, Dave; Hunt, Adrienne; Burton, John; Sims, Sarah; McLay, Kirsten; Plumb, Bob; Davis, Joy; Clee, Chris; Oliver, Karen; Clark, Richard; Riddle, Clare; Eliott, David; Threadgold, Glen; Harden, Glenn; Ware, Darren; Mortimer, Beverly; Kerry, Giselle; Heath, Paul; Phillimore, Benjamin; Tracey, Alan; Corby, Nicole; Dunn, Matthew; Johnson, Christopher; Wood, Jonathan; Clark, Susan; Pelan, Sarah; Griffiths, Guy; Smith, Michelle; Glithero, Rebecca; Howden, Philip; Barker, Nicholas; Stevens, Christopher; Harley, Joanna; Holt, Karen; Panagiotidis, Georgios; Lovell, Jamieson; Beasley, Helen; Henderson, Carl; Gordon, Daria; Auger, Katherine; Wright, Deborah; Collins, Joanna; Raisen, Claire; Dyer, Lauren; Leung, Kenric; Robertson, Lauren; Ambridge, Kirsty; Leongamornlert, Daniel; McGuire, Sarah; Gilderthorp, Ruth; Griffiths, Coline; Manthravadi, Deepa; Nichol, Sarah; Barker, Gary; Whitehead, Siobhan; Kay, Michael; Brown, Jacqueline; Murnane, Clare; Gray, Emma; Humphries, Matthew; Sycamore, Neil; Barker, Darren; Saunders, David; Wallis, Justene; Babbage, Anne; Hammond, Sian; Mashreghi-Mohammadi, Maryam; Barr, Lucy; Martin, Sancha; Wray, Paul; Ellington, Andrew; Matthews, Nicholas; Ellwood, Matthew; Woodmansey, Rebecca; Clark, Graham; Cooper, James; Tromans, Anthony; Grafham, Darren; Skuce, Carl; Pandian, Richard; Andrews, Robert; Harrison, Elliot; Kimberley, Andrew; Garnett, Jane; Fosker, Nigel; Hall, Rebekah; Garner, Patrick; Kelly, Daniel; Bird, Christine; Palmer, Sophie; Gehring, Ines; Berger, Andrea; Dooley, Christopher M.; Ersan-Ürün, Zübeyde; Eser, Cigdem; Geiger, Horst; Geisler, Maria; Karotki, Lena; Kirn, Anette; Konantz, Judith; Konantz, Martina; Oberländer, Martina; Rudolph-Geiger, Silke; Teucke, Mathias; Osoegawa, Kazutoyo; Zhu, Baoli; Rapp, Amanda; Widaa, Sara; Langford, Cordelia; Yang, Fengtang; Carter, Nigel P.; Harrow, Jennifer; Ning, Zemin; Herrero, Javier; Searle, Steve M. J.; Enright, Anton; Geisler, Robert; Plasterk, Ronald H. A.; Lee, Charles; Westerfield, Monte; de Jong, Pieter J.; Zon, Leonard I.; Postlethwait, John H.; Nüsslein-Volhard, Christiane; Hubbard, Tim J. P.; Crollius, Hugues Roest; Rogers, Jane; Stemple, Derek L.
2013-01-01
Zebrafish have become a popular organism for the study of vertebrate gene function1,2. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and increasingly, the study of human genetic disease3–5. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes6, the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination. PMID:23594743
Frech, Christian; Chen, Nansheng
2011-01-01
Genes underlying important phenotypic differences between Plasmodium species, the causative agents of malaria, are frequently found in only a subset of species and cluster at dynamically evolving subtelomeric regions of chromosomes. We hypothesized that chromosome-internal regions of Plasmodium genomes harbour additional species subset-specific genes that underlie differences in human pathogenicity, human-to-human transmissibility, and human virulence. We combined sequence similarity searches with synteny block analyses to identify species subset-specific genes in chromosome-internal regions of six published Plasmodium genomes, including Plasmodium falciparum, Plasmodium vivax, Plasmodium knowlesi, Plasmodium yoelii, Plasmodium berghei, and Plasmodium chabaudi. To improve comparative analysis, we first revised incorrectly annotated gene models using homology-based gene finders and examined putative subset-specific genes within syntenic contexts. Confirmed subset-specific genes were then analyzed for their role in biological pathways and examined for molecular functions using publicly available databases. We identified 16 genes that are well conserved in the three primate parasites but not found in rodent parasites, including three key enzymes of the thiamine (vitamin B1) biosynthesis pathway. Thirteen genes were found to be present in both human parasites but absent in the monkey parasite P. knowlesi, including genes specifically upregulated in sporozoites or gametocytes that could be linked to parasite transmission success between humans. Furthermore, we propose 15 chromosome-internal P. falciparum-specific genes as new candidate genes underlying increased human virulence and detected a currently uncharacterized cluster of P. vivax-specific genes on chromosome 6 likely involved in erythrocyte invasion. In conclusion, Plasmodium species harbour many chromosome-internal differences in the form of protein-coding genes, some of which are potentially linked to human disease and thus promising leads for future laboratory research. PMID:22215999
Beamer, B A; Negri, C; Yen, C J; Gavrilova, O; Rumberger, J M; Durcan, M J; Yarnall, D P; Hawkins, A L; Griffin, C A; Burns, D K; Roth, J; Reitman, M; Shuldiner, A R
1997-04-28
We determined the chromosomal localization and partial genomic structure of the coding region of the human PPAR gamma gene (hPPAR gamma), a nuclear receptor important for adipocyte differentiation and function. Sequence analysis and long PCR of human genomic DNA with primers that span putative introns revealed that intron positions and sizes of hPPAR gamma are similar to those previously determined for the mouse PPAR gamma gene[13]. Fluorescent in situ hybridization localized hPPAR gamma to chromosome 3, band 3p25. Radiation hybrid mapping with two independent primer pairs was consistent with hPPAR gamma being within 1.5 Mb of marker D3S1263 on 3p25-p24.2. These sequences of the intron/exon junctions of the 6 coding exons shared by hPPAR gamma 1 and hPPAR gamma 2 will facilitate screening for possible mutations. Furthermore, D3S1263 is a suitable polymorphic marker for linkage analysis to evaluate PPAR gamma's potential contribution to genetic susceptibility to obesity, lipoatrophy, insulin resistance, and diabetes.
MouSensor: A Versatile Genetic Platform to Create Super Sniffer Mice for Studying Human Odor Coding.
D'Hulst, Charlotte; Mina, Raena B; Gershon, Zachary; Jamet, Sophie; Cerullo, Antonio; Tomoiaga, Delia; Bai, Li; Belluscio, Leonardo; Rogers, Matthew E; Sirotin, Yevgeniy; Feinstein, Paul
2016-07-26
Typically, ∼0.1% of the total number of olfactory sensory neurons (OSNs) in the main olfactory epithelium express the same odorant receptor (OR) in a singular fashion and their axons coalesce into homotypic glomeruli in the olfactory bulb. Here, we have dramatically increased the total number of OSNs expressing specific cloned OR coding sequences by multimerizing a 21-bp sequence encompassing the predicted homeodomain binding site sequence, TAATGA, known to be essential in OR gene choice. Singular gene choice is maintained in these "MouSensors." In vivo synaptopHluorin imaging of odor-induced responses by known M71 ligands shows functional glomerular activation in an M71 MouSensor. Moreover, a behavioral avoidance task demonstrates that specific odor detection thresholds are significantly decreased in multiple transgenic lines, expressing mouse or human ORs. We have developed a versatile platform to study gene choice and axon identity, to create biosensors with great translational potential, and to finally decode human olfaction. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Retrieval of Enterobacteriaceae drug targets using singular value decomposition.
Silvério-Machado, Rita; Couto, Bráulio R G M; Dos Santos, Marcos A
2015-04-15
The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases. A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E.coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, therefore exhibiting the advantage of causing no adverse effects or at least no known adverse effects on humans. rita_silverio@hotmail.com. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Genetic evidence for conserved non-coding element function across species–the ears have it
Turner, Eric E.; Cox, Timothy C.
2014-01-01
Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720
Pan-cancer transcriptomic analysis associates long non-coding RNAs with key mutational driver events
Ashouri, Arghavan; Sayin, Volkan I.; Van den Eynden, Jimmy; Singh, Simranjit X.; Papagiannakopoulos, Thales; Larsson, Erik
2016-01-01
Thousands of long non-coding RNAs (lncRNAs) lie interspersed with coding genes across the genome, and a small subset has been implicated as downstream effectors in oncogenic pathways. Here we make use of transcriptome and exome sequencing data from thousands of tumours across 19 cancer types, to identify lncRNAs that are induced or repressed in relation to somatic mutations in key oncogenic driver genes. Our screen confirms known coding and non-coding effectors and also associates many new lncRNAs to relevant pathways. The associations are often highly reproducible across cancer types, and while many lncRNAs are co-expressed with their protein-coding hosts or neighbours, some are intergenic and independent. We highlight lncRNAs with possible functions downstream of the tumour suppressor TP53 and the master antioxidant transcription factor NFE2L2. Our study provides a comprehensive overview of lncRNA transcriptional alterations in relation to key driver mutational events in human cancers. PMID:28959951
DOE Office of Scientific and Technical Information (OSTI.GOV)
Proia, R.L.
1988-03-01
Lysosomal {beta}-hexosaminidase is composed of two structurally similar chains, {alpha} and {beta}, that are the products of different genes. Mutations in either gene causing {beta}-hexosaminidase deficiency result in the lysosomal storage disease GM2-gangliosidosis. To enable the investigation of the molecular lesions in this disorder and to study the evolutionary relationship between the {alpha} and {beta} chains, the {beta}-chain gene was isolated, and its organization was characterized. The {beta}-chain coding region is divided into 14 exons distributed over {approx}40 kilobases of DNA. Comparison with the {alpha}-chain gene revealed that 12 of the 13 introns interrupt the coding regions at homologous positions.more » This extensive sharing of intron placement demonstrates that the {alpha} and {beta} chains evolved by way of the duplication of a common ancestor.« less
Evolution of coding and non-coding genes in HOX clusters of a marsupial.
Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B
2012-06-18
The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial
2012-01-01
Background The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and non-protein coding genes including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS that play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified and one new candidate microRNA with typical hairpin precursor structure that is expressed in both fibroblasts and testes was found. The prediction of microRNA target analysis showed that several known microRNA targets, such as miR-10, miR-414 and miR-464, were found in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predate the marsupial-eutherian divergence 160 Ma ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
Raymond, Frédéric; Boisvert, Sébastien; Roy, Gaétan; Ritt, Jean-François; Légaré, Danielle; Isnard, Amandine; Stanke, Mario; Olivier, Martin; Tremblay, Michel J.; Papadopoulou, Barbara; Ouellette, Marc; Corbeil, Jacques
2012-01-01
The Leishmania tarentolae Parrot-TarII strain genome sequence was resolved to an average 16-fold mean coverage by next-generation DNA sequencing technologies. This is the first non-pathogenic to humans kinetoplastid protozoan genome to be described thus providing an opportunity for comparison with the completed genomes of pathogenic Leishmania species. A high synteny was observed between all sequenced Leishmania species. A limited number of chromosomal regions diverged between L. tarentolae and L. infantum, while remaining syntenic to L. major. Globally, >90% of the L. tarentolae gene content was shared with the other Leishmania species. We identified 95 predicted coding sequences unique to L. tarentolae and 250 genes that were absent from L. tarentolae. Interestingly, many of the latter genes were expressed in the intracellular amastigote stage of pathogenic species. In addition, genes coding for products involved in antioxidant defence or participating in vesicular-mediated protein transport were underrepresented in L. tarentolae. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae, namely the zinc metallo-peptidase surface glycoprotein GP63 and the promastigote surface antigen PSA31C. Overall, L. tarentolae's gene content appears better adapted to the promastigote insect stage rather than the amastigote mammalian stage. PMID:21998295
The complete mitochondrial genome of the endangered spotback skate, Atlantoraja castelnaui.
Duckett, Drew J L; Naylor, Gavin J P
2016-05-01
Chondrichthyes are a highly threatened class of organisms, largely due to overfishing and other human activities. The present study describes the complete mitochondrial genome (16,750 bp) of the endangered spotback skate, Atlantoraja castelnaui. The mitogenome is arranged in a typical vertebrate fashion, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 control region.
Decoding the disease-associated proteins encoded in the human chromosome 4.
Chen, Lien-Chin; Liu, Mei-Ying; Hsiao, Yung-Chin; Choong, Wai-Kok; Wu, Hsin-Yi; Hsu, Wen-Lian; Liao, Pao-Chi; Sung, Ting-Yi; Tsai, Shih-Feng; Yu, Jau-Song; Chen, Yu-Ju
2013-01-04
Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.g., hepatocellular carcinoma) and biomedical research (cardiac system, aging, metabolic disorders, immune system, cancer and stem cell) related genes (e.g., oncogenes, growth factors). As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive the biomarker discovery and validation study in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/shedable proteins. Additionally, we also identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successful selected or multiple reaction monitoring (SRM/MRM) assays developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins while 27.9% of the 757 proteins do not have any experimental evidence at the protein-level. In summary, the analysis revealed that the chromosome 4 is a rich resource for cancer-associated proteins for biomarker verification projects and for drug target discovery projects.
Target genes discovery through copy number alteration analysis in human hepatocellular carcinoma.
Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng
2013-12-21
High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analyses of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.
Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A
2010-02-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer
Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.
2010-01-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
2008-01-01
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy
2018-05-09
Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 Eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.
Insights into HLA-G Genetics Provided by Worldwide Haplotype Diversity
Castelli, Erick C.; Ramalho, Jaqueline; Porto, Iane O. P.; Lima, Thálitta H. A.; Felício, Leandro P.; Sabbagh, Audrey; Donadi, Eduardo A.; Mendes-Junior, Celso T.
2014-01-01
Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC). HLA-G has been the target of most recent research regarding the function of class I non-classical genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane bound and soluble isoforms, (c) short cytoplasmic tail, (d) modulation of immune response (immune tolerance), and (e) restricted expression to certain tissues. In the present work, we describe the HLA-G gene structure and address the HLA-G variability and haplotype diversity among several populations around the world, considering each of its major segments [promoter, coding, and 3′ untranslated region (UTR)]. For this purpose, we developed a pipeline to reevaluate the 1000Genomes data and recover miscalled or missing genotypes and haplotypes. It became clear that the overall structure of the HLA-G molecule has been maintained during the evolutionary process and that most of the variation sites found in the HLA-G coding region are either coding synonymous or intronic mutations. In addition, only a few frequent and divergent extended haplotypes are found when the promoter, coding, and 3′UTRs are evaluated together. The divergence is particularly evident for the regulatory regions. The population comparisons confirmed that most of the HLA-G variability has originated before human dispersion from Africa and that the allele and haplotype frequencies have probably been shaped by strong selective pressures. PMID:25339953
Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R
2008-01-01
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
Potassium bromate (KBr03) is a rat renal carcinogen and a major drinking water disinfection by-product from ozonization. While KBr03 is a human nephro- and neuro-toxicant, its carcinogenicity in humans is unknown. Clear cell renal tumors, the common form of human renal carcinomas...
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-01-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
In silico pattern-based analysis of the human cytomegalovirus genome.
Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas
2003-04-01
More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
Porcine insulin receptor substrate 4 (IRS4) gene: cloning, polymorphism and association study
USDA-ARS?s Scientific Manuscript database
Using PCR and IPCR techniques we obtained a 4498 bp nucleotide sequence FN424076 encompassing the complete coding sequence of the porcine IRS4 gene and its proximal promoter. The 1269-amino acid porcine protein deduced from the nucleotide sequence shares 92% identity with the human IRS4 and possesse...
Afouda, Pamela; Durand, Guillaume A; Lagier, Jean-Christophe; Labas, Noémie; Cadoret, Fréderic; Armstrong, Nicholas; Raoult, Didier; Dubourg, Grégory
2018-04-14
Intestinimonas massiliensis sp. nov strain GD2 T is a new species of the genus Intestinimonas (the second, following Intestinimonas butyriciproducens gen. nov., sp. nov). First isolated from the gut microbiota of a healthy subject of French origin using a culturomics approach combined with taxono-genomics, it is strictly anaerobic, nonspore-forming, rod-shaped, with catalase- and oxidase-negative reactions. Its growth was observed after preincubation in an anaerobic blood culture enriched with sheep blood (5%) and rumen fluid (5%), incubated at 37°C. Its phenotypic and genotypic descriptions are presented in this paper with a full annotation of its genome sequence. This genome consists of 3,104,261 bp in length and contains 3,074 predicted genes, including 3,012 protein-coding genes and 62 RNA-coding genes. Strain GD2 T significantly produces butyrate and is frequently found among available 16S rRNA gene amplicon datasets, which leads consideration of Intestinimonas massiliensis as an important human gut commensal. © 2018 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
Zorc, Minja; Kunej, Tanja
2016-05-01
MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5 %) contain polymorphisms; among them, 35 contain more than three polymorphisms The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region. Identified miRNA-related genomic hotspots in chicken can serve researchers as a starting point for further functional studies and association studies with poultry production and health traits and the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P
2015-04-23
With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
2014-01-01
Background Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. (‘intermediate form’) is unclear. Methods Single specimens inferred to represent Fasciola sp. (‘intermediate form’; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). Results The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. Conclusions The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries. PMID:24685294
Kirsten, Holger; Al-Hasani, Hoor; Holdt, Lesca; Gross, Arnd; Beutner, Frank; Krohn, Knut; Horn, Katrin; Ahnert, Peter; Burkhardt, Ralph; Reiche, Kristin; Hackermüller, Jörg; Löffler, Markus; Teupser, Daniel; Thiery, Joachim; Scholz, Markus
2015-01-01
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenetic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb to the transcript challenging typically implemented ranges of cis-regulations. Pathway enrichment within regulated genes of GWAS-related eSNPs supported functional relevance of identified eQTLs. We demonstrate that nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data for demonstrating biological effects. Yet, we show for strongly heritable transcripts that still little trans-chromosomal heritability is explained by all identified trans-eSNPs; however, our data suggest that most cis-heritability of these transcripts seems explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes. PMID:26019233
Liu, Guo-Hua; Gasser, Robin B; Young, Neil D; Song, Hui-Qun; Ai, Lin; Zhu, Xing-Quan
2014-03-31
Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. ('intermediate form') is unclear. Single specimens inferred to represent Fasciola sp. ('intermediate form'; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries.
Izuogu, Osagie G; Alhasan, Abd A; Mellough, Carla; Collin, Joseph; Gallon, Richard; Hyslop, Jonathon; Mastrorosa, Francesco K; Ehrmann, Ingrid; Lako, Majlinda; Elliott, David J; Santibanez-Koref, Mauro; Jackson, Michael S
2018-04-20
Circular RNAs (circRNAs) are predominantly derived from protein coding genes, and some can act as microRNA sponges or transcriptional regulators. Changes in circRNA levels have been identified during human development which may be functionally important, but lineage-specific analyses are currently lacking. To address this, we performed RNAseq analysis of human embryonic stem (ES) cells differentiated for 90 days towards 3D laminated retina. A transcriptome-wide increase in circRNA expression, size, and exon count was observed, with circRNA levels reaching a plateau by day 45. Parallel statistical analyses, controlling for sample and locus specific effects, identified 239 circRNAs with expression changes distinct from the transcriptome-wide pattern, but these all also increased in abundance over time. Surprisingly, circRNAs derived from long non-coding RNAs (lncRNAs) were found to account for a significantly larger proportion of transcripts from their loci of origin than circRNAs from coding genes. The most abundant, circRMST:E12-E6, showed a > 100X increase during differentiation accompanied by an isoform switch, and accounts for > 99% of RMST transcripts in many adult tissues. The second most abundant, circFIRRE:E10-E5, accounts for > 98% of FIRRE transcripts in differentiating human ES cells, and is one of 39 FIRRE circRNAs, many of which include multiple unannotated exons. Our results suggest that during human ES cell differentiation, changes in circRNA levels are primarily globally controlled. They also suggest that RMST and FIRRE, genes with established roles in neurogenesis and topological organisation of chromosomal domains respectively, are processed as circular lncRNAs with only minor linear species.
Hofstra, R M; Osinga, J; Tan-Sindhunata, G; Wu, Y; Kamsteeg, E J; Stulp, R P; van Ravenswaaij-Arts, C; Majoor-Krakauer, D; Angrist, M; Chakravarti, A; Meijers, C; Buys, C H
1996-04-01
Hirschsprung disease (HSCR) or colonic aganglionosis is a congenital disorder characterized by an absence of intramural ganglia along variable lengths of the colon resulting in intestinal obstruction. The incidence of HSCR is 1 in 5,000 live births. Mutations in the RET gene, which codes for a receptor tyrosine kinase, and in EDNRB which codes for the endothelin-B receptor, have been shown to be associated with HSCR in humans. The lethal-spotted mouse which has pigment abnormalities, but also colonic aganglionosis, carries a mutation in the gene coding for endothelin 3 (Edn3), the ligand for the receptor protein encoded by EDNRB. Here, we describe a mutation of the human gene for endothelin 3 (EDN3), homozygously present in a patient with a combined Waardenburg syndrome type 2 (WS2) and HSCR phenotype (Shah-Waardenburg syndrome). The mutation, Cys159Phe, in exon 3 in the ET-3 like domain of EDN3, presumably affects the proteolytic processing of the preproendothelin to the mature peptide EDN3. The patient's parents were first cousins. A previous child in this family had been diagnosed with a similar combination of HSCR, depigmentation and deafness. Depigmentation and deafness were present in other relatives. Moreover, we present a further indication for the involvement of EDNRB in HSCR by reporting a novel mutation detected in one of 40 unselected HSCR patients.
Conserved syntenic clusters of protein coding genes are missing in birds.
Lovell, Peter V; Wirthlin, Morgan; Wilhelm, Larry; Minx, Patrick; Lazar, Nathan H; Carbone, Lucia; Warren, Wesley C; Mello, Claudio V
2014-01-01
Birds are one of the most highly successful and diverse groups of vertebrates, having evolved a number of distinct characteristics, including feathers and wings, a sturdy lightweight skeleton and unique respiratory and urinary/excretion systems. However, the genetic basis of these traits is poorly understood. Using comparative genomics based on extensive searches of 60 avian genomes, we have found that birds lack approximately 274 protein coding genes that are present in the genomes of most vertebrate lineages and are for the most part organized in conserved syntenic clusters in non-avian sauropsids and in humans. These genes are located in regions associated with chromosomal rearrangements, and are largely present in crocodiles, suggesting that their loss occurred subsequent to the split of dinosaurs/birds from crocodilians. Many of these genes are associated with lethality in rodents, human genetic disorders, or biological functions targeting various tissues. Functional enrichment analysis combined with orthogroup analysis and paralog searches revealed enrichments that were shared by non-avian species, present only in birds, or shared between all species. Together these results provide a clearer definition of the genetic background of extant birds, extend the findings of previous studies on missing avian genes, and provide clues about molecular events that shaped avian evolution. They also have implications for fields that largely benefit from avian studies, including development, immune system, oncogenesis, and brain function and cognition. With regards to the missing genes, birds can be considered ‘natural knockouts’ that may become invaluable model organisms for several human diseases.
Variation in conserved non-coding sequences on chromosome 5q andsusceptibility to asthma and atopy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Donfack, Joseph; Schneider, Daniel H.; Tan, Zheng
2005-09-10
Background: Evolutionarily conserved sequences likely havebiological function. Methods: To determine whether variation in conservedsequences in non-coding DNA contributes to risk for human disease, westudied six conserved non-coding elements in the Th2 cytokine cluster onhuman chromosome 5q31 in a large Hutterite pedigree and in samples ofoutbred European American and African American asthma cases and controls.Results: Among six conserved non-coding elements (>100 bp,>70percent identity; human-mouse comparison), we identified one singlenucleotide polymorphism (SNP) in each of two conserved elements and sixSNPs in the flanking regions of three conserved elements. We genotypedour samples for four of these SNPs and an additional three SNPs eachmore » inthe IL13 and IL4 genes. While there was only modest evidence forassociation with single SNPs in the Hutterite and European Americansamples (P<0.05), there were highly significant associations inEuropean Americans between asthma and haplotypes comprised of SNPs in theIL4 gene (P<0.001), including a SNP in a conserved non-codingelement. Furthermore, variation in the IL13 gene was strongly associatedwith total IgE (P = 0.00022) and allergic sensitization to mold allergens(P = 0.00076) in the Hutterites, and more modestly associated withsensitization to molds in the European Americans and African Americans (P<0.01). Conclusion: These results indicate that there is overalllittle variation in the conserved non-coding elements on 5q31, butvariation in IL4 and IL13, including possibly one SNP in a conservedelement, influence asthma and atopic phenotypes in diversepopulations.« less
Xiong, Haoyu; Barker, Stephen C.; Burger, Thomas D.; Raoult, Didier; Shao, Renfu
2013-01-01
The typical mitochondrial (mt) genomes of bilateral animals consist of 37 genes on a single circular chromosome. The mt genomes of the human body louse, Pediculus humanus, and the human head louse, Pediculus capitis, however, are extensively fragmented and contain 20 minichromosomes, with one to three genes on each minichromosome. Heteroplasmy, i.e. nucleotide polymorphisms in the mt genome within individuals, has been shown to be significantly higher in the mt cox1 gene of human lice than in humans and other animals that have the typical mt genomes. To understand whether the extent of heteroplasmy in human lice is associated with mt genome fragmentation, we sequenced the entire coding regions of all of the mt minichromosomes of six human body lice and six human head lice from Ethiopia, China and France with an Illumina HiSeq platform. For comparison, we also sequenced the entire coding regions of the mt genomes of seven species of ticks, which have the typical mitochondrial genome organization of bilateral animals. We found that the level of heteroplasmy varies significantly both among the human lice and among the ticks. The human lice from Ethiopia have significantly higher level of heteroplasmy than those from China and France (Pt<0.05). The tick, Amblyomma cajennense, has significantly higher level of heteroplasmy than other ticks (Pt<0.05). Our results indicate that heteroplasmy level can be substantially variable within a species and among closely related species, and does not appear to be determined by single factors such as genome fragmentation. PMID:24058467
Xiong, Haoyu; Barker, Stephen C; Burger, Thomas D; Raoult, Didier; Shao, Renfu
2013-01-01
The typical mitochondrial (mt) genomes of bilateral animals consist of 37 genes on a single circular chromosome. The mt genomes of the human body louse, Pediculus humanus, and the human head louse, Pediculus capitis, however, are extensively fragmented and contain 20 minichromosomes, with one to three genes on each minichromosome. Heteroplasmy, i.e. nucleotide polymorphisms in the mt genome within individuals, has been shown to be significantly higher in the mt cox1 gene of human lice than in humans and other animals that have the typical mt genomes. To understand whether the extent of heteroplasmy in human lice is associated with mt genome fragmentation, we sequenced the entire coding regions of all of the mt minichromosomes of six human body lice and six human head lice from Ethiopia, China and France with an Illumina HiSeq platform. For comparison, we also sequenced the entire coding regions of the mt genomes of seven species of ticks, which have the typical mitochondrial genome organization of bilateral animals. We found that the level of heteroplasmy varies significantly both among the human lice and among the ticks. The human lice from Ethiopia have significantly higher level of heteroplasmy than those from China and France (Pt<0.05). The tick, Amblyomma cajennense, has significantly higher level of heteroplasmy than other ticks (Pt<0.05). Our results indicate that heteroplasmy level can be substantially variable within a species and among closely related species, and does not appear to be determined by single factors such as genome fragmentation.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.
Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen
2016-02-02
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Intriguing Balancing Selection on the Intron 5 Region of LMBR1 in Human Population
He, Fang; Wu, Dong-Dong; Kong, Qing-Peng; Zhang, Ya-Ping
2008-01-01
Background The intron 5 of gene LMBR1 is the cis-acting regulatory module for the sonic hedgehog (SHH) gene. Mutation in this non-coding region is associated with preaxial polydactyly, and may play crucial roles in the evolution of limb and skeletal system. Methodology/Principal Findings We sequenced a region of the LMBR1 gene intron 5 in East Asian human population, and found a significant deviation of Tajima's D statistics from neutrality taking human population growth into account. Data from HapMap also demonstrated extended linkage disequilibrium in the region in East Asian and European population, and significantly low degree of genetic differentiation among human populations. Conclusion/Significance We proposed that the intron 5 of LMBR1 was presumably subject to balancing selection during the evolution of modern human. PMID:18698406
Lipovich, Leonard; Hou, Zhuo-Cheng; Jia, Hui; Sinkler, Christopher; McGowen, Michael; Sterner, Kirstin N; Weckle, Amy; Sugalski, Amara B; Pipes, Lenore; Gatti, Domenico L; Mason, Christopher E; Sherwood, Chet C; Hof, Patrick R; Kuzawa, Christopher W; Grossman, Lawrence I; Goodman, Morris; Wildman, Derek E
2016-02-01
The human brain and human cognitive abilities are strikingly different from those of other great apes despite relatively modest genome sequence divergence. However, little is presently known about the interspecies divergence in gene structure and transcription that might contribute to these phenotypic differences. To date, most comparative studies of gene structure in the brain have examined humans, chimpanzees, and macaque monkeys. To add to this body of knowledge, we analyze here the brain transcriptome of the western lowland gorilla (Gorilla gorilla gorilla), an African great ape species that is phylogenetically closely related to humans, but with a brain that is approximately one-third the size. Manual transcriptome curation from a sample of the planum temporale region of the neocortex revealed 12 protein-coding genes and one noncoding-RNA gene with exons in the gorilla unmatched by public transcriptome data from the orthologous human loci. These interspecies gene structure differences accounted for a total of 134 amino acids in proteins found in the gorilla that were absent from protein products of the orthologous human genes. Proteins varying in structure between human and gorilla were involved in immunity and energy metabolism, suggesting their relevance to phenotypic differences. This gorilla neocortical transcriptome comprises an empirical, not homology- or prediction-driven, resource for orthologous gene comparisons between human and gorilla. These findings provide a unique repository of the sequences and structures of thousands of genes transcribed in the gorilla brain, pointing to candidate genes that may contribute to the traits distinguishing humans from other closely related great apes. © 2015 Wiley Periodicals, Inc.
Gene expression analysis upon lncRNA DDSR1 knockdown in human fibroblasts
Jia, Li; Sun, Zhonghe; Wu, Xiaolin; Misteli, Tom; Sharma, Vivek
2015-01-01
Long non-coding RNAs (lncRNAs) play important roles in regulating diverse biological processes including DNA damage and repair. We have recently reported that the DNA damage inducible lncRNA DNA damage-sensitive RNA1 (DDSR1) regulates DNA repair by homologous recombination (HR). Since lncRNAs also modulate gene expression, we identified gene expression changes upon DDSR1 knockdown in human fibroblast cells. Gene expression analysis after RNAi treatment targeted against DDSR1 revealed 119 genes that show differential expression. Here we provide a detailed description of the microarray data (NCBI GEO accession number GSE67048) and the data analysis procedure associated with the publication by Sharma et al., 2015 in EMBO Reports [1]. PMID:26697398
Production of recombinant adenovirus containing human interlukin-4 gene.
Mojarrad, Majid; Abdolazimi, Yassan; Hajati, Jamshid; Modarressi, Mohammad Hossein
2011-11-01
Recombinant adenoviruses are currently used for a variety of purposes, including in vitro gene transfer, in vivo vaccination, and gene therapy. Ability to infect many cell types, high efficiency in gene transfer, entering both dividing and non dividing cells, and growing to high titers make this virus a good choice for using in various experiments. In the present experiment, a recombinant adenovirus containing human IL-4 coding sequence was made. IL-4 has several characteristics that made it a good choice for using in cancer gene therapy, controlling inflammatory diseases, and studies on autoimmune diseases. In brief, IL-4 coding sequence was amplified by and cloned in pAd-Track-CMV. Then, by means of homologous recombination between recombinant pAd-Track-CMV and Adeasy-1 plasmid in bacteria, recombinant adenovirus complete genome was made and IL-4 containing shuttle vector was incorporated into the viral backbone. After linearization, for virus packaging, viral genome was transfected into HEK-293 cell line. Viral production was conveniently followed with the aid of green fluorescent protein. Recombinant adenovirus produced here, was capable to infecting cell lines and express interlukin-4 in cell. This system can be used as a powerful, easy, and cost benefit tool in various studies on cancer gene therapy and also studies on immunogenetics.
Cheng, Tian; Liu, Guo-Hua; Song, Hui-Qun; Lin, Rui-Qing; Zhu, Xing-Quan
2016-03-01
Hymenolepis nana, commonly known as the dwarf tapeworm, is one of the most common tapeworms of humans and rodents and can cause hymenolepiasis. Although this zoonotic tapeworm is of socio-economic significance in many countries of the world, its genetics, systematics, epidemiology, and biology are poorly understood. In the present study, we sequenced and characterized the complete mitochondrial (mt) genome of H. nana. The mt genome is 13,764 bp in size and encodes 36 genes, including 12 protein-coding genes, 2 ribosomal RNA, and 22 transfer RNA genes. All genes are transcribed in the same direction. The gene order and genome content are completely identical with their congener Hymenolepis diminuta. Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes by Bayesian inference, Maximum likelihood, and Maximum parsimony showed the division of class Cestoda into two orders, supported the monophylies of both the orders Cyclophyllidea and Pseudophyllidea. Analyses of mt genome sequences also support the monophylies of the three families Taeniidae, Hymenolepididae, and Diphyllobothriidae. This novel mt genome provides a useful genetic marker for studying the molecular epidemiology, systematics, and population genetics of the dwarf tapeworm and should have implications for the diagnosis, prevention, and control of hymenolepiasis in humans.
The pig X and Y Chromosomes: structure, sequence, and evolution
Skinner, Benjamin M.; Sargent, Carole A.; Churcher, Carol; Hunt, Toby; Herrero, Javier; Loveland, Jane E.; Dunn, Matt; Louzada, Sandra; Fu, Beiyuan; Chow, William; Gilbert, James; Austin-Guest, Siobhan; Beal, Kathryn; Carvalho-Silva, Denise; Cheng, William; Gordon, Daria; Grafham, Darren; Hardy, Matt; Harley, Jo; Hauser, Heidi; Howden, Philip; Howe, Kerstin; Lachani, Kim; Ellis, Peter J.I.; Kelly, Daniel; Kerry, Giselle; Kerwin, James; Ng, Bee Ling; Threadgold, Glen; Wileman, Thomas; Wood, Jonathan M.D.; Yang, Fengtang; Harrow, Jen; Affara, Nabeel A.; Tyler-Smith, Chris
2016-01-01
We have generated an improved assembly and gene annotation of the pig X Chromosome, and a first draft assembly of the pig Y Chromosome, by sequencing BAC and fosmid clones from Duroc animals and incorporating information from optical mapping and fiber-FISH. The X Chromosome carries 1033 annotated genes, 690 of which are protein coding. Gene order closely matches that found in primates (including humans) and carnivores (including cats and dogs), which is inferred to be ancestral. Nevertheless, several protein-coding genes present on the human X Chromosome were absent from the pig, and 38 pig-specific X-chromosomal genes were annotated, 22 of which were olfactory receptors. The pig Y-specific Chromosome sequence generated here comprises 30 megabases (Mb). A 15-Mb subset of this sequence was assembled, revealing two clusters of male-specific low copy number genes, separated by an ampliconic region including the HSFY gene family, which together make up most of the short arm. Both clusters contain palindromes with high sequence identity, presumably maintained by gene conversion. Many of the ancestral X-related genes previously reported in at least one mammalian Y Chromosome are represented either as active genes or partial sequences. This sequencing project has allowed us to identify genes—both single copy and amplified—on the pig Y Chromosome, to compare the pig X and Y Chromosomes for homologous sequences, and thereby to reveal mechanisms underlying pig X and Y Chromosome evolution. PMID:26560630
Lopez, M; Eberlé, F; Mattei, M G; Gabert, J; Birg, F; Bardin, F; Maroc, C; Dubreuil, P
1995-04-03
The human poliovirus (PV) receptor (PVR) is a member of the immunoglobulin (Ig) superfamily with unknown cellular function. We have isolated a human PVR-related (PRR) cDNA. The deduced amino acid (aa) sequence of PRR showed, in the extracellular region, 51.7 and 54.3% similarity with human PVR and with the murine PVR homolog, respectively. The cDNA coding sequence is 1.6-kb long and encodes a deduced 57-kDa protein; this protein has a structural organization analogous to that of PVR, that is, one V- and two C-set Ig domains, with a conserved number of aa. Northern blot analysis indicated that a major 5.9-kb transcript is present in all normal human tissues tested. In situ hybridization showed that the PRR gene is located at bands q23-q24 of human chromosome 11.
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments
Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic
2001-01-01
Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
The CIDEA gene V115F polymorphism is associated with obesity in Swedish subjects.
Dahlman, Ingrid; Kaaman, Maria; Jiao, Hong; Kere, Juha; Laakso, Markku; Arner, Peter
2005-10-01
The cell death-inducing DFFA (DNA fragmentation factor-alpha)-like effector A (CIDEA) gene is implicated as an important regulator of body weight in mice and humans and is therefore a candidate gene for human obesity. Here, we characterize common CIDEA gene polymorphisms and investigate them for association with obesity in two independent Swedish samples; the first comprised 981 women and the second 582 men. Both samples display a large variation in BMI. The only detected coding polymorphism encodes an exon 4 V115F amino acid substitution, which is associated with BMI in both sexes (P = 0.021 for women, P = 0.023 for men, and P = 0.0015 for joint analysis). These results support a role for CIDEA alleles in human obesity. CIDEA-deficient mice display higher metabolic rate, and the gene cross-talks with tumor necrosis factor-alpha (TNF-alpha) in fat cells. We hypothesize that CIDEA alleles regulate human obesity through impact on basal metabolic rate and adipocyte TNF-alpha signaling.
Non-coding RNAs in lung cancer
Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio
2014-01-01
The discovery that protein-coding genes represent less than 2% of all human genome, and the evidence that more than 90% of it is actively transcribed, changed the classical point of view of the central dogma of molecular biology, which was always based on the assumption that RNA functions mainly as an intermediate bridge between DNA sequences and protein synthesis machinery. Accumulating data indicates that non-coding RNAs are involved in different physiological processes, providing for the maintenance of cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes of cancer cells, this evidence and the prospect of using these molecules as diagnostic markers and therapeutic targets, make currently non-coding RNAs among the most relevant molecules in cancer research. In this paper we will provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burkhardt E.; Adham, I.M.; Brosig, B.
1994-03-01
Leydig insulin-like protein (LEY I-L) is a member of the insulin-like hormone superfamily. The LEY I-L gene (designated INSL3) is expressed exclusively in prenatal and postnatal Leydig cells. The authors report here the cloning and nucleotide sequence of porcine and human LEY I-L genes including the 5[prime] regions. Both genes consist of two exons and one intron. The organization of the LEY I-L gene is similar to that of insulin and relaxin. The transcription start site in the porcine and human LEY I-L gene is localized 13 and 14 bp upstream of the translation start site, respectively. Alignment of themore » 5[prime] flanking regions of both genes reveals that the first 107 nucleotides upstream of the transcription start site exhibit an overall sequence similarity of 80%. This conserved region contains a consensus TATAA box, a CAAT-like element (GAAT), and a consensus SP1 sequence (GGGCGG) at equivalent positions in both genes and therefore may play a role in regulation of expression of the LEY I-L gene. The porcine and human genome contains a single copy of the LEY I-L gene. By in situ hybridization, the human gene was assigned to bands p13.2-p12 of the short arm of chromosome 19. 25 refs., 6 figs.« less
Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni
2005-01-01
Background Protein kinases are a well defined family of proteins, characterized by the presence of a common kinase catalytic domain and playing a significant role in many important cellular processes, such as proliferation, maintenance of cell shape, apoptosys. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome, and although many of them represent well known genes, a larger number of genes code for proteins of more recent identification, or for unknown proteins identified as kinase only after computational studies. Results A systematic in silico study performed on the human genome, led to the identification of 5 genes, on chromosome 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X; some of these genes are reported as kinases from NCBI but are absent in other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis, aimed at identifying unannotated exons, indicates that a large number of kinase may code for alternately spliced forms or be incorrectly annotated. An InterProScan automated analysis was perfomed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices, and the cystein propensity to participate into a disulfide bridge. Conclusion The predicted human kinome was extended by identifiying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level. Structural analysis of kinase proteins domains as defined in multiple sources together with transmembrane alpha helices and signal peptide prediction provides hints to function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet, where all results from the comparative analysis and the gene structure annotation are made available, alongside the domain information. Kinases may be searched by domain combinations and the relative genes may be viewed in a graphic browser at various level of magnification up to gene organization on the full chromosome set. PMID:16351747
An Evolutionary Genomic Approach to Identify Genes Involved in Human Birth Timing
Orabona, Guilherme; Morgan, Thomas; Haataja, Ritva; Hallman, Mikko; Puttonen, Hilkka; Menon, Ramkumar; Kuczynski, Edward; Norwitz, Errol; Snegovskikh, Victoria; Palotie, Aarno; Fellman, Vineta; DeFranco, Emily A.; Chaudhari, Bimal P.; McGregor, Tracy L.; McElroy, Jude J.; Oetjens, Matthew T.; Teramo, Kari; Borecki, Ingrid; Fay, Justin; Muglia, Louis
2011-01-01
Coordination of fetal maturation with birth timing is essential for mammalian reproduction. In humans, preterm birth is a disorder of profound global health significance. The signals initiating parturition in humans have remained elusive, due to divergence in physiological mechanisms between humans and model organisms typically studied. Because of relatively large human head size and narrow birth canal cross-sectional area compared to other primates, we hypothesized that genes involved in parturition would display accelerated evolution along the human and/or higher primate phylogenetic lineages to decrease the length of gestation and promote delivery of a smaller fetus that transits the birth canal more readily. Further, we tested whether current variation in such accelerated genes contributes to preterm birth risk. Evidence from allometric scaling of gestational age suggests human gestation has been shortened relative to other primates. Consistent with our hypothesis, many genes involved in reproduction show human acceleration in their coding or adjacent noncoding regions. We screened >8,400 SNPs in 150 human accelerated genes in 165 Finnish preterm and 163 control mothers for association with preterm birth. In this cohort, the most significant association was in FSHR, and 8 of the 10 most significant SNPs were in this gene. Further evidence for association of a linkage disequilibrium block of SNPs in FSHR, rs11686474, rs11680730, rs12473870, and rs1247381 was found in African Americans. By considering human acceleration, we identified a novel gene that may be associated with preterm birth, FSHR. We anticipate other human accelerated genes will similarly be associated with preterm birth risk and elucidate essential pathways for human parturition. PMID:21533219
From Genomes to Protein Models and Back
NASA Astrophysics Data System (ADS)
Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico
2007-12-01
The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
Studying Functions of All Yeast Genes Simultaneously
NASA Technical Reports Server (NTRS)
Stolc, Viktor; Eason, Robert G.; Poumand, Nader; Herman, Zelek S.; Davis, Ronald W.; Anthony Kevin; Jejelowo, Olufisayo
2006-01-01
A method of studying the functions of all the genes of a given species of microorganism simultaneously has been developed in experiments on Saccharomyces cerevisiae (commonly known as baker's or brewer's yeast). It is already known that many yeast genes perform functions similar to those of corresponding human genes; therefore, by facilitating understanding of yeast genes, the method may ultimately also contribute to the knowledge needed to treat some diseases in humans. Because of the complexity of the method and the highly specialized nature of the underlying knowledge, it is possible to give only a brief and sketchy summary here. The method involves the use of unique synthetic deoxyribonucleic acid (DNA) sequences that are denoted as DNA bar codes because of their utility as molecular labels. The method also involves the disruption of gene functions through deletion of genes. Saccharomyces cerevisiae is a particularly powerful experimental system in that multiple deletion strains easily can be pooled for parallel growth assays. Individual deletion strains recently have been created for 5,918 open reading frames, representing nearly all of the estimated 6,000 genetic loci of Saccharomyces cerevisiae. Tagging of each deletion strain with one or two unique 20-nucleotide sequences enables identification of genes affected by specific growth conditions, without prior knowledge of gene functions. Hybridization of bar-code DNA to oligonucleotide arrays can be used to measure the growth rate of each strain over several cell-division generations. The growth rate thus measured serves as an index of the fitness of the strain.
Non-coding RNAs—Novel targets in neurotoxicity
Tal, Tamara L.; Tanguay, Robert L.
2012-01-01
Over the past ten years non-coding RNAs (ncRNAs) have emerged as pivotal players in fundamental physiological and cellular processes and have been increasingly implicated in cancer, immune disorders, and cardiovascular, neurodegenerative, and metabolic diseases. MicroRNAs (miRNAs) represent a class of ncRNA molecules that function as negative regulators of post-transcriptional gene expression. miRNAs are predicted to regulate 60% of all human protein-coding genes and as such, play key roles in cellular and developmental processes, human health, and disease. Relative to counterparts that lack bindings sites for miRNAs, genes encoding proteins that are post-transcriptionally regulated by miRNAs are twice as likely to be sensitive to environmental chemical exposure. Not surprisingly, miRNAs have been recognized as targets or effectors of nervous system, developmental, hepatic, and carcinogenic toxicants, and have been identified as putative regulators of phase I xenobiotic-metabolizing enzymes. In this review, we give an overview of the types of ncRNAs and highlight their roles in neurodevelopment, neurological disease, activity-dependent signaling, and drug metabolism. We then delve into specific examples that illustrate their importance as mediators, effectors, or adaptive agents of neurotoxicants or neuroactive pharmaceutical compounds. Finally, we identify a number of outstanding questions regarding ncRNAs and neurotoxicity. PMID:22394481
MGDB: a comprehensive database of genes involved in melanoma.
Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng
2015-01-01
The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts, which contains cumulative to date 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references have also been provided to support the inclusion of the gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hoped MGDB will enrich our knowledge about melanoma genetics and serve as a useful complement to the existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
The importance of immune gene variability (MHC) in evolutionary ecology and conservation
Sommer, Simone
2005-01-01
Genetic studies have typically inferred the effects of human impact by documenting patterns of genetic differentiation and levels of genetic diversity among potentially isolated populations using selective neutral markers such as mitochondrial control region sequences, microsatellites or single nucleotide polymorphism (SNPs). However, evolutionary relevant and adaptive processes within and between populations can only be reflected by coding genes. In vertebrates, growing evidence suggests that genetic diversity is particularly important at the level of the major histocompatibility complex (MHC). MHC variants influence many important biological traits, including immune recognition, susceptibility to infectious and autoimmune diseases, individual odours, mating preferences, kin recognition, cooperation and pregnancy outcome. These diverse functions and characteristics place genes of the MHC among the best candidates for studies of mechanisms and significance of molecular adaptation in vertebrates. MHC variability is believed to be maintained by pathogen-driven selection, mediated either through heterozygote advantage or frequency-dependent selection. Up to now, most of our knowledge has derived from studies in humans or from model organisms under experimental, laboratory conditions. Empirical support for selective mechanisms in free-ranging animal populations in their natural environment is rare. In this review, I first introduce general information about the structure and function of MHC genes, as well as current hypotheses and concepts concerning the role of selection in the maintenance of MHC polymorphism. The evolutionary forces acting on the genetic diversity in coding and non-coding markers are compared. Then, I summarise empirical support for the functional importance of MHC variability in parasite resistance with emphasis on the evidence derived from free-ranging animal populations investigated in their natural habitat. Finally, I discuss the importance of adaptive genetic variability with respect to human impact and conservation, and implications for future studies. PMID:16242022
Identification of a Novel GJA8 (Cx50) Point Mutation Causes Human Dominant Congenital Cataracts
NASA Astrophysics Data System (ADS)
Ge, Xiang-Lian; Zhang, Yilan; Wu, Yaming; Lv, Jineng; Zhang, Wei; Jin, Zi-Bing; Qu, Jia; Gu, Feng
2014-02-01
Hereditary cataracts are clinically and genetically heterogeneous lens diseases that cause a significant proportion of visual impairment and blindness in children. Human cataracts have been linked with mutations in two genes, GJA3 and GJA8, respectively. To identify the causative mutation in a family with hereditary cataracts, family members were screened for mutations by PCR for both genes. Sequencing the coding regions of GJA8, coding for connexin 50, revealed a C > A transversion at nucleotide 264, which caused p.P88T mutation. To dissect the molecular consequences of this mutation, plasmids carrying wild-type and mutant mouse ORFs of Gja8 were generated and ectopically expressed in HEK293 cells and human lens epithelial cells, respectively. The recombinant proteins were assessed by confocal microscopy and Western blotting. The results demonstrate that the molecular consequences of the p.P88T mutation in GJA8 include changes in connexin 50 protein localization patterns, accumulation of mutant protein, and increased cell growth.
Zhang, Xun; Gejman, Roger; Mahta, Ali; Zhong, Ying; Rice, Kimberley A.; Zhou, Yunli; Cheunsuchon, Pornsuk; Louis, David N.; Klibanski, Anne
2010-01-01
Meningiomas are common tumors, representing 15-25% of all central nervous system tumors. NF2 gene inactivation on chromosome 22 has been shown as an early event in tumorigenesis; however, few factors underlying tumor growth and progression have been identified. Chromosomal abnormalities of 14q32 are often associated with meningioma pathogenesis and progression; therefore it has been proposed that an as yet unidentified tumor suppressor is present at this locus. MEG3 is an imprinted gene located at 14q32 that encodes a non-coding RNA with an anti-proliferative function. We found that MEG3 mRNA is highly expressed in normal arachnoidal cells. However, MEG3 is not expressed in the majority of human meningiomas or the human meningioma cell lines IOMM-Lee and CH157-MN. There is a strong association between loss of MEG3 expression and tumor grade. Allelic loss at the MEG3 locus is also observed in meningiomas, with increasing prevalence in higher grade tumors. In addition, there is an increase in CpG methylation within the promoter and the imprinting control region of MEG3 gene in meningiomas. Functionally, MEG3 suppresses DNA synthesis in both IOMM-Lee and CH157-MN cells by approximately 60% in BrdU incorporation assays. Colony-forming efficiency assays show that MEG3 inhibits colony formation in CH157-MN cells by approximately 80%. Furthermore, MEG3 stimulates p53-mediated transactivation in these cell lines. Therefore, these data are consistent with the hypothesis that MEG3, which encodes a non-coding RNA, may be a tumor suppressor gene at chromosome 14q32 involved in meningioma progression via a novel mechanism. PMID:20179190
Effects of Nickel Treatment on H3K4 Trimethylation and Gene Expression
Tchou-Wong, Kam-Meng; Kluz, Thomas; Arita, Adriana; Smith, Phillip R.; Brown, Stuart; Costa, Max
2011-01-01
Occupational exposure to nickel compounds has been associated with lung and nasal cancers. We have previously shown that exposure of the human lung adenocarcinoma A549 cells to NiCl2 for 24 hr significantly increased global levels of trimethylated H3K4 (H3K4me3), a transcriptional activating mark that maps to the promoters of transcribed genes. To further understand the potential epigenetic mechanism(s) underlying nickel carcinogenesis, we performed genome-wide mapping of H3K4me3 by chromatin immunoprecipitation and direct genome sequencing (ChIP-seq) and correlated with transcriptome genome-wide mapping of RNA transcripts by massive parallel sequencing of cDNA (RNA-seq). The effect of NiCl2 treatment on H3K4me3 peaks within 5,000 bp of transcription start sites (TSSs) on a set of genes highly induced by nickel in both A549 cells and human peripheral blood mononuclear cells were analyzed. Nickel exposure increased the level of H3K4 trimethylation in both the promoters and coding regions of several genes including CA9 and NDRG1 that were increased in expression in A549 cells. We have also compared the extent of the H3K4 trimethylation in the absence and presence of formaldehyde crosslinking and observed that crosslinking of chromatin was required to observe H3K4 trimethylation in the coding regions immediately downstream of TSSs of some nickel-induced genes including ADM and IGFBP3. This is the first genome-wide mapping of trimethylated H3K4 in the promoter and coding regions of genes induced after exposure to NiCl2. This study may provide insights into the epigenetic mechanism(s) underlying the carcinogenicity of nickel compounds. PMID:21455298
Turcot, Valérie; Lu, Yingchang; Highland, Heather M; Schurmann, Claudia; Justice, Anne E; Fine, Rebecca S; Bradfield, Jonathan P; Esko, Tõnu; Giri, Ayush; Graff, Mariaelisa; Guo, Xiuqing; Hendricks, Audrey E; Karaderi, Tugce; Lempradl, Adelheid; Locke, Adam E; Mahajan, Anubha; Marouli, Eirini; Sivapalaratnam, Suthesh; Young, Kristin L; Alfred, Tamuno; Feitosa, Mary F; Masca, Nicholas GD; Manning, Alisa K; Medina-Gomez, Carolina; Mudgal, Poorva; Ng, Maggie CY; Reiner, Alex P; Vedantam, Sailaja; Willems, Sara M; Winkler, Thomas W; Abecasis, Goncalo; Aben, Katja K; Alam, Dewan S; Alharthi, Sameer E; Allison, Matthew; Amouyel, Philippe; Asselbergs, Folkert W; Auer, Paul L; Balkau, Beverley; Bang, Lia E; Barroso, Inês; Bastarache, Lisa; Benn, Marianne; Bergmann, Sven; Bielak, Lawrence F; Blüher, Matthias; Boehnke, Michael; Boeing, Heiner; Boerwinkle, Eric; Böger, Carsten A; Bork-Jensen, Jette; Bots, Michiel L; Bottinger, Erwin P; Bowden, Donald W; Brandslund, Ivan; Breen, Gerome; Brilliant, Murray H; Broer, Linda; Brumat, Marco; Burt, Amber A; Butterworth, Adam S; Campbell, Peter T; Cappellani, Stefania; Carey, David J; Catamo, Eulalia; Caulfield, Mark J; Chambers, John C; Chasman, Daniel I; Chen, Yii-Der Ida; Chowdhury, Rajiv; Christensen, Cramer; Chu, Audrey Y; Cocca, Massimiliano; Collins, Francis S; Cook, James P; Corley, Janie; Galbany, Jordi Corominas; Cox, Amanda J; Crosslin, David S; Cuellar-Partida, Gabriel; D'Eustacchio, Angela; Danesh, John; Davies, Gail; de Bakker, Paul IW; de Groot, Mark CH; de Mutsert, Renée; Deary, Ian J; Dedoussis, George; Demerath, Ellen W; den Heijer, Martin; den Hollander, Anneke I; den Ruijter, Hester M; Dennis, Joe G; Denny, Josh C; Di Angelantonio, Emanuele; Drenos, Fotios; Du, Mengmeng; Dubé, Marie-Pierre; Dunning, Alison M; Easton, Douglas F; Edwards, Todd L; Ellinghaus, David; Ellinor, Patrick T; Elliott, Paul; Evangelou, Evangelos; Farmaki, Aliki-Eleni; Farooqi, I. Sadaf; Faul, Jessica D; Fauser, Sascha; Feng, Shuang; Ferrannini, Ele; Ferrieres, Jean; Florez, Jose C; Ford, Ian; Fornage, Myriam; Franco, Oscar H; Franke, Andre; Franks, Paul W; Friedrich, Nele; Frikke-Schmidt, Ruth; Galesloot, Tessel E.; Gan, Wei; Gandin, Ilaria; Gasparini, Paolo; Gibson, Jane; Giedraitis, Vilmantas; Gjesing, Anette P; Gordon-Larsen, Penny; Gorski, Mathias; Grabe, Hans-Jörgen; Grant, Struan FA; Grarup, Niels; Griffiths, Helen L; Grove, Megan L; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeff; Hakonarson, Hakon; Hammerschlag, Anke R; Hansen, Torben; Harris, Kathleen Mullan; Harris, Tamara B; Hattersley, Andrew T; Have, Christian T; Hayward, Caroline; He, Liang; Heard-Costa, Nancy L; Heath, Andrew C; Heid, Iris M; Helgeland, Øyvind; Hernesniemi, Jussi; Hewitt, Alex W; Holmen, Oddgeir L; Hovingh, G Kees; Howson, Joanna MM; Hu, Yao; Huang, Paul L; Huffman, Jennifer E; Ikram, M Arfan; Ingelsson, Erik; Jackson, Anne U; Jansson, Jan-Håkan; Jarvik, Gail P; Jensen, Gorm B; Jia, Yucheng; Johansson, Stefan; Jørgensen, Marit E; Jørgensen, Torben; Jukema, J Wouter; Kahali, Bratati; Kahn, René S; Kähönen, Mika; Kamstrup, Pia R; Kanoni, Stavroula; Kaprio, Jaakko; Karaleftheri, Maria; Kardia, Sharon LR; Karpe, Fredrik; Kathiresan, Sekar; Kee, Frank; Kiemeney, Lambertus A; Kim, Eric; Kitajima, Hidetoshi; Komulainen, Pirjo; Kooner, Jaspal S; Kooperberg, Charles; Korhonen, Tellervo; Kovacs, Peter; Kuivaniemi, Helena; Kutalik, Zoltán; Kuulasmaa, Kari; Kuusisto, Johanna; Laakso, Markku; Lakka, Timo A; Lamparter, David; Lange, Ethan M; Lange, Leslie A; Langenberg, Claudia; Larson, Eric B; Lee, Nanette R; Lehtimäki, Terho; Lewis, Cora E; Li, Huaixing; Li, Jin; Li-Gao, Ruifang; Lin, Honghuang; Lin, Keng-Hung; Lin, Li-An; Lin, Xu; Lind, Lars; Lindström, Jaana; Linneberg, Allan; Liu, Ching-Ti; Liu, Dajiang J; Liu, Yongmei; Lo, Ken Sin; Lophatananon, Artitaya; Lotery, Andrew J; Loukola, Anu; Luan, Jian'an; Lubitz, Steven A; Lyytikäinen, Leo-Pekka; Männistö, Satu; Marenne, Gaëlle; Mazul, Angela L; McCarthy, Mark I; McKean-Cowdin, Roberta; Medland, Sarah E; Meidtner, Karina; Milani, Lili; Mistry, Vanisha; Mitchell, Paul; Mohlke, Karen L; Moilanen, Leena; Moitry, Marie; Montgomery, Grant W; Mook-Kanamori, Dennis O; Moore, Carmel; Mori, Trevor A; Morris, Andrew D; Morris, Andrew P; Müller-Nurasyid, Martina; Munroe, Patricia B; Nalls, Mike A; Narisu, Narisu; Nelson, Christopher P; Neville, Matt; Nielsen, Sune F; Nikus, Kjell; Njølstad, Pål R; Nordestgaard, Børge G; Nyholt, Dale R; O'Connel, Jeffrey R; O’Donoghue, Michelle L.; Olde Loohuis, Loes M; Ophoff, Roel A; Owen, Katharine R; Packard, Chris J; Padmanabhan, Sandosh; Palmer, Colin NA; Palmer, Nicholette D; Pasterkamp, Gerard; Patel, Aniruddh P; Pattie, Alison; Pedersen, Oluf; Peissig, Peggy L; Peloso, Gina M; Pennell, Craig E; Perola, Markus; Perry, James A; Perry, John RB; Pers, Tune H; Person, Thomas N; Peters, Annette; Petersen, Eva RB; Peyser, Patricia A; Pirie, Ailith; Polasek, Ozren; Polderman, Tinca J; Puolijoki, Hannu; Raitakari, Olli T; Rasheed, Asif; Rauramaa, Rainer; Reilly, Dermot F; Renström, Frida; Rheinberger, Myriam; Ridker, Paul M; Rioux, John D; Rivas, Manuel A; Roberts, David J; Robertson, Neil R; Robino, Antonietta; Rolandsson, Olov; Rudan, Igor; Ruth, Katherine S; Saleheen, Danish; Salomaa, Veikko; Samani, Nilesh J; Sapkota, Yadav; Sattar, Naveed; Schoen, Robert E; Schreiner, Pamela J; Schulze, Matthias B; Scott, Robert A; Segura-Lepe, Marcelo P; Shah, Svati H; Sheu, Wayne H-H; Sim, Xueling; Slater, Andrew J; Small, Kerrin S; Smith, Albert Vernon; Southam, Lorraine; Spector, Timothy D; Speliotes, Elizabeth K; Starr, John M; Stefansson, Kari; Steinthorsdottir, Valgerdur; Stirrups, Kathleen E; Strauch, Konstantin; Stringham, Heather M; Stumvoll, Michael; Sun, Liang; Surendran, Praveen; Swift, Amy J; Tada, Hayato; Tansey, Katherine E; Tardif, Jean-Claude; Taylor, Kent D; Teumer, Alexander; Thompson, Deborah J; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Thuesen, Betina H; Tönjes, Anke; Tromp, Gerard; Trompet, Stella; Tsafantakis, Emmanouil; Tuomilehto, Jaakko; Tybjaerg-Hansen, Anne; Tyrer, Jonathan P; Uher, Rudolf; Uitterlinden, André G; Uusitupa, Matti; van der Laan, Sander W; van Duijn, Cornelia M; van Leeuwen, Nienke; van Setten, Jessica; Vanhala, Mauno; Varbo, Anette; Varga, Tibor V; Varma, Rohit; Velez Edwards, Digna R; Vermeulen, Sita H; Veronesi, Giovanni; Vestergaard, Henrik; Vitart, Veronique; Vogt, Thomas F; Völker, Uwe; Vuckovic, Dragana; Wagenknecht, Lynne E; Walker, Mark; Wallentin, Lars; Wang, Feijie; Wang, Carol A; Wang, Shuai; Wang, Yiqin; Ware, Erin B; Wareham, Nicholas J; Warren, Helen R; Waterworth, Dawn M; Wessel, Jennifer; White, Harvey D; Willer, Cristen J; Wilson, James G; Witte, Daniel R; Wood, Andrew R; Wu, Ying; Yaghootkar, Hanieh; Yao, Jie; Yao, Pang; Yerges-Armstrong, Laura M; Young, Robin; Zeggini, Eleftheria; Zhan, Xiaowei; Zhang, Weihua; Zhao, Jing Hua; Zhao, Wei; Zhao, Wei; Zhou, Wei; Zondervan, Krina T; Rotter, Jerome I; Pospisilik, John A; Rivadeneira, Fernando; Borecki, Ingrid B; Deloukas, Panos; Frayling, Timothy M; Lettre, Guillaume; North, Kari E; Lindgren, Cecilia M; Hirschhorn, Joel N; Loos, Ruth JF
2018-01-01
Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, non-coding variants from which pinpointing causal genes remains challenging. Here, we combined data from 718,734 individuals to discover rare and low-frequency (MAF<5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which eight in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2, ZNF169) newly implicated in human obesity, two (MC4R, KSR2) previously observed in extreme obesity, and two variants in GIPR. Effect sizes of rare variants are ~10 times larger than of common variants, with the largest effect observed in carriers of an MC4R stop-codon (p.Tyr35Ter, MAF=0.01%), weighing ~7kg more than non-carriers. Pathway analyses confirmed enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically-supported therapeutic targets to treat obesity. PMID:29273807
New genetic variants of LATS1 detected in urinary bladder and colon cancer.
Saadeldin, Mona K; Shawer, Heba; Mostafa, Ahmed; Kassem, Neemat M; Amleh, Asma; Siam, Rania
2014-01-01
LATS1, the large tumor suppressor 1 gene, encodes for a serine/threonine kinase protein and is implicated in cell cycle progression. LATS1 is down-regulated in various human cancers, such as breast cancer, and astrocytoma. Point mutations in LATS1 were reported in human sarcomas. Additionally, loss of heterozygosity of LATS1 chromosomal region predisposes to breast, ovarian, and cervical tumors. In the current study, we investigated LATS1 genetic variations including single nucleotide polymorphisms (SNPs), in 28 Egyptian patients with either urinary bladder or colon cancers. The LATS1 gene was amplified and sequenced and the expression of LATS1 at the RNA level was assessed in 12 urinary bladder cancer samples. We report, the identification of a total of 29 variants including previously identified SNPs within LATS1 coding and non-coding sequences. A total of 18 variants were novel. Majority of the novel variants, 13, were mapped to intronic sequences and un-translated regions of the gene. Four of the five novel variants located in the coding region of the gene, represented missense mutations within the serine/threonine kinase catalytic domain. Interestingly, LATS1 RNA steady state levels was lost in urinary bladder cancerous tissue harboring four specific SNPs (16045 + 41736 + 34614 + 56177) positioned in the 5'UTR, intron 6, and two silent mutations within exon 4 and exon 8, respectively. This study identifies novel single-base-sequence alterations in the LATS1 gene. These newly identified variants could potentially be used as novel diagnostic or prognostic tools in cancer.
[Novel bidirectional promoter from human genome].
Orekhova, A S; Sverdlova, P S; Spirin, P V; Leonova, O G; Popenko, V I; Prasolov, V S; Rubtsov, P M
2011-01-01
In human and other mammalian genomes a number of closely linked gene pairs transcribed in opposite directions are found. According to bioinformatic analysis up to 10% of human genes are arranged in this way. In present work the fragment of human genome was cloned that separates genes localized at 2p13.1 and oriented "head-to-head", coding for hypothetical proteins with unknown functions--CCDC (Coiled Coil Domain Containing) 142 and TTC (TetraTricopeptide repeat Containing) 31. Intergenic CCDC142-TTC31 region overlaps with CpG-island and contains a number of potential binding sites for transcription factors. This fragment functions as bidirectional promoter in the system ofluciferase reporter gene expression upon transfection of human embryonic kidney (HEK293) cells. The vectors containing genes of two fluorescent proteins--green (EGFP) and red (DsRed2) in opposite orientations separated by the fragment of CCDC142-TTC31 intergenic region were constructed. In HEK293 cells transfected with these vectors simultaneous expression of two fluorescent proteins is observed. Truncated versions of intergenic region were obtained and their promoter activity measured. Minimal promoter fragment contains elements Inr, BRE, DPE characteristic for TATA-less promoters. Thus, from the human genome the novel bidirectional promoter was cloned that can be used for simultaneous constitutive expression of two genes in human cells.
A novel TBP-TAF complex on RNA polymerase II-transcribed snRNA genes.
Zaborowska, Justyna; Taylor, Alice; Roeder, Robert G; Murphy, Shona
2012-01-01
Initiation of transcription of most human genes transcribed by RNA polymerase II (RNAP II) requires the formation of a preinitiation complex comprising TFIIA, B, D, E, F, H and RNAP II. The general transcription factor TFIID is composed of the TATA-binding protein and up to 13 TBP-associated factors. During transcription of snRNA genes, RNAP II does not appear to make the transition to long-range productive elongation, as happens during transcription of protein-coding genes. In addition, recognition of the snRNA gene-type specific 3' box RNA processing element requires initiation from an snRNA gene promoter. These characteristics may, at least in part, be driven by factors recruited to the promoter. For example, differences in the complement of TAFs might result in differential recruitment of elongation and RNA processing factors. As precedent, it already has been shown that the promoters of some protein-coding genes do not recruit all the TAFs found in TFIID. Although TAF5 has been shown to be associated with RNAP II-transcribed snRNA genes, the full complement of TAFs associated with these genes has remained unclear. Here we show, using a ChIP and siRNA-mediated approach, that the TBP/TAF complex on snRNA genes differs from that found on protein-coding genes. Interestingly, the largest TAF, TAF1, and the core TAFs, TAF10 and TAF4, are not detected on snRNA genes. We propose that this snRNA gene-specific TAF subset plays a key role in gene type-specific control of expression.
Rates of genomic divergence in humans, chimpanzees and their lice.
Johnson, Kevin P; Allen, Julie M; Olds, Brett P; Mugisha, Lawrence; Reed, David L; Paige, Ken N; Pittendrigh, Barry R
2014-02-22
The rate of DNA mutation and divergence is highly variable across the tree of life. However, the reasons underlying this variation are not well understood. Comparing the rates of genetic changes between hosts and parasite lineages that diverged at the same time is one way to begin to understand differences in genetic mutation and substitution rates. Such studies have indicated that the rate of genetic divergence in parasites is often faster than that of their hosts when comparing single genes. However, the variation in this relative rate of molecular evolution across different genes in the genome is unknown. We compared the rate of DNA sequence divergence between humans, chimpanzees and their ectoparasitic lice for 1534 protein-coding genes across their genomes. The rate of DNA substitution in these orthologous genes was on average 14 times faster for lice than for humans and chimpanzees. In addition, these rates were positively correlated across genes. Because this correlation only occurred for substitutions that changed the amino acid, this pattern is probably produced by similar functional constraints across the same genes in humans, chimpanzees and their ectoparasites.
Rates of genomic divergence in humans, chimpanzees and their lice
Johnson, Kevin P.; Allen, Julie M.; Olds, Brett P.; Mugisha, Lawrence; Reed, David L.; Paige, Ken N.; Pittendrigh, Barry R.
2014-01-01
The rate of DNA mutation and divergence is highly variable across the tree of life. However, the reasons underlying this variation are not well understood. Comparing the rates of genetic changes between hosts and parasite lineages that diverged at the same time is one way to begin to understand differences in genetic mutation and substitution rates. Such studies have indicated that the rate of genetic divergence in parasites is often faster than that of their hosts when comparing single genes. However, the variation in this relative rate of molecular evolution across different genes in the genome is unknown. We compared the rate of DNA sequence divergence between humans, chimpanzees and their ectoparasitic lice for 1534 protein-coding genes across their genomes. The rate of DNA substitution in these orthologous genes was on average 14 times faster for lice than for humans and chimpanzees. In addition, these rates were positively correlated across genes. Because this correlation only occurred for substitutions that changed the amino acid, this pattern is probably produced by similar functional constraints across the same genes in humans, chimpanzees and their ectoparasites. PMID:24403325
Prospecting for pig single nucleotide polymorphisms in the human genome: have we struck gold?
Grapes, L; Rudd, S; Fernando, R L; Megy, K; Rocha, D; Rothschild, M F
2006-06-01
Gene-to-gene variation in the frequency of single nucleotide polymorphisms (SNPs) has been observed in humans, mice, rats, primates and pigs, but a relationship across species in this variation has not been described. Here, the frequency of porcine coding SNPs (cSNPs) identified by in silico methods, and the frequency of murine cSNPs, were compared with the frequency of human cSNPs across homologous genes. From 150,000 porcine expressed sequence tag (EST) sequences, a total of 452 SNP-containing sequence clusters were found, totalling 1394 putative SNPs. All the clustered porcine EST annotations and SNP data have been made publicly available at http://sputnik.btk.fi/project?name=swine. Human and murine cSNPs were identified from dbSNP and were characterized as either validated or total number of cSNPs (validated plus non-validated) for comparison purposes. The correlation between in silico pig cSNP and validated human cSNP densities was found to be 0.77 (p < 0.00001) for a set of 25 homologous genes, while a correlation of 0.48 (p < 0.0005) was found for a primarily random sample of 50 homologous human and mouse genes. This is the first evidence of conserved gene-to-gene variability in cSNP frequency across species and indicates that site-directed screening of porcine genes that are homologous to cSNP-rich human genes may rapidly advance cSNP discovery in pigs.
Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning
2006-01-01
Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371
Trotta, Edoardo
2016-05-17
The three stop codons UAA, UAG, and UGA signal the termination of mRNA translation. As a result of a mechanism that is not adequately understood, they are normally used with unequal frequencies. In this work, we showed that selective forces and mutational biases drive stop codon usage in the human genome. We found that, in respect to sense codons, stop codon usage was affected by stronger selective forces but was less influenced by neutral mutational biases. UGA is the most frequent termination codon in human genome. However, UAA was the preferred stop codon in genes with high breadth of expression, high level of expression, AT-rich coding sequences, housekeeping functions, and in gene ontology categories with the largest deviation from expected stop codon usage. Selective forces associated with the breadth and the level of expression favoured AT-rich sequences in the mRNA region including the stop site and its proximal 3'-UTR, but acted with scarce effects on sense codons, generating two regions, upstream and downstream of the stop codon, with strongly different base composition. By favouring low levels of GC-content, selection promoted labile local secondary structures at the stop site and its proximal 3'-UTR. The compositional and structural context favoured by selection was surprisingly emphasized in the class of ribosomal proteins and was consistent with sequence elements that increase the efficiency of translational termination. Stop codons were also heterogeneously distributed among chromosomes by a mechanism that was strongly correlated with the GC-content of coding sequences. In human genome, the nucleotide composition and the thermodynamic stability of stop codon site and its proximal 3'-UTR are correlated with the GC-content of coding sequences and with the breadth and the level of gene expression. In highly expressed genes stop codon usage is compositionally and structurally consistent with highly efficient translation termination signals.
Raju, Hemalatha B.; Tsinoremas, Nicholas F.; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. While modules of the olfactory receptors were clearly identified in protein–protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches. PMID:27803687
Raju, Hemalatha B; Tsinoremas, Nicholas F; Capobianco, Enrico
2016-01-01
Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. While modules of the olfactory receptors were clearly identified in protein-protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches.
lncRScan-SVM: A Tool for Predicting Long Non-Coding RNAs Using Support Vector Machine.
Sun, Lei; Liu, Hui; Zhang, Lin; Meng, Jia
2015-01-01
Functional long non-coding RNAs (lncRNAs) have been bringing novel insight into biological study, however it is still not trivial to accurately distinguish the lncRNA transcripts (LNCTs) from the protein coding ones (PCTs). As various information and data about lncRNAs are preserved by previous studies, it is appealing to develop novel methods to identify the lncRNAs more accurately. Our method lncRScan-SVM aims at classifying PCTs and LNCTs using support vector machine (SVM). The gold-standard datasets for lncRScan-SVM model training, lncRNA prediction and method comparison were constructed according to the GENCODE gene annotations of human and mouse respectively. By integrating features derived from gene structure, transcript sequence, potential codon sequence and conservation, lncRScan-SVM outperforms other approaches, which is evaluated by several criteria such as sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC) and area under curve (AUC). In addition, several known human lncRNA datasets were assessed using lncRScan-SVM. LncRScan-SVM is an efficient tool for predicting the lncRNAs, and it is quite useful for current lncRNA study.
Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan
2015-07-01
In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Cipriano, Andrea; Ballarino, Monica
2018-01-01
The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried
2016-02-24
Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
genenames.org: the HGNC resources in 2011
Seal, Ruth L.; Gordon, Susan M.; Lush, Michael J.; Wright, Mathew W.; Bruford, Elspeth A.
2011-01-01
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique gene symbol and name to every human gene. The HGNC database currently contains almost 30 000 approved gene symbols, over 19 000 of which represent protein-coding genes. The public website, www.genenames.org, displays all approved nomenclature within Symbol Reports that contain data curated by HGNC editors and links to related genomic, phenotypic and proteomic information. Here we describe improvements to our resources, including a new Quick Gene Search, a new List Search, an integrated HGNC BioMart and a new Statistics and Downloads facility. PMID:20929869
Kim, Dong Seon; Hahn, Yoonsoo
2012-11-13
Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells
Borel, Christelle; Mudge, Jonathan M.; Howald, Cédric; Foissac, Sylvain; Ucla, Catherine; Chrast, Jacqueline; Ribeca, Paolo; Martin, David; Murray, Ryan R.; Yang, Xinping; Ghamsari, Lila; Lin, Chenwei; Bell, Ian; Dumais, Erica; Drenkow, Jorg; Tress, Michael L.; Gelpí, Josep Lluís; Orozco, Modesto; Valencia, Alfonso; van Berkum, Nynke L.; Lajoie, Bryan R.; Vidal, Marc; Stamatoyannopoulos, John; Batut, Philippe; Dobin, Alex; Harrow, Jennifer; Hubbard, Tim; Dekker, Job; Frankish, Adam; Salehi-Ashtiani, Kourosh; Reymond, Alexandre; Antonarakis, Stylianos E.; Guigó, Roderic; Gingeras, Thomas R.
2012-01-01
The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of the complexity of the transcriptomes found in yeast to human has blurred the definition and physical boundaries of genes. Using multiple analysis approaches we have characterized individual gene boundaries mapping on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein coding genes revealed that for 85% of these genes the boundaries extend beyond the current annotated termini, most often connecting with exons of transcripts from other well annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordination of the expression of connected genes and (4) the close in vivo and three dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connection of the genes involved suggest that chimeric transcripts should not be studied in isolation, but together, as an RNA network. PMID:22238572
Rappaport, Noa; Fishilevich, Simon; Nudel, Ron; Twik, Michal; Belinky, Frida; Plaschkes, Inbar; Stein, Tsippi Iny; Cohen, Dana; Oz-Levi, Danit; Safran, Marilyn; Lancet, Doron
2017-08-18
A key challenge in the realm of human disease research is next generation sequencing (NGS) interpretation, whereby identified filtered variant-harboring genes are associated with a patient's disease phenotypes. This necessitates bioinformatics tools linked to comprehensive knowledgebases. The GeneCards suite databases, which include GeneCards (human genes), MalaCards (human diseases) and PathCards (human pathways) together with additional tools, are presented with the focus on MalaCards utility for NGS interpretation as well as for large scale bioinformatic analyses. VarElect, our NGS interpretation tool, leverages the broad information in the GeneCards suite databases. MalaCards algorithms unify disease-related terms and annotations from 69 sources. Further, MalaCards defines hierarchical relatedness-aliases, disease families, a related diseases network, categories and ontological classifications. GeneCards and MalaCards delineate and share a multi-tiered, scored gene-disease network, with stringency levels, including the definition of elite status-high quality gene-disease pairs, coming from manually curated trustworthy sources, that includes 4500 genes for 8000 diseases. This unique resource is key to NGS interpretation by VarElect. VarElect, a comprehensive search tool that helps infer both direct and indirect links between genes and user-supplied disease/phenotype terms, is robustly strengthened by the information found in MalaCards. The indirect mode benefits from GeneCards' diverse gene-to-gene relationships, including SuperPaths-integrated biological pathways from 12 information sources. We are currently adding an important information layer in the form of "disease SuperPaths", generated from the gene-disease matrix by an algorithm similar to that previously employed for biological pathway unification. This allows the discovery of novel gene-disease and disease-disease relationships. The advent of whole genome sequencing necessitates capacities to go beyond protein coding genes. GeneCards is highly useful in this respect, as it also addresses 101,976 non-protein-coding RNA genes. In a more recent development, we are currently adding an inclusive map of regulatory elements and their inferred target genes, generated by integration from 4 resources. MalaCards provides a rich big-data scaffold for in silico biomedical discovery within the gene-disease universe. VarElect, which depends significantly on both GeneCards and MalaCards power, is a potent tool for supporting the interpretation of wet-lab experiments, notably NGS analyses of disease. The GeneCards suite has thus transcended its 2-decade role in biomedical research, maturing into a key player in clinical investigation.
The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity
Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H.; Allen, Andrew S.; Goldstein, David B.
2015-01-01
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease. PMID:26332131
The DNA Methylome of Human Peripheral Blood Mononuclear Cells
Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing
2010-01-01
DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693
Nuclear export of RNA: Different sizes, shapes and functions.
Williams, Tobias; Ngo, Linh H; Wickramasinghe, Vihandha O
2018-03-01
Export of protein-coding and non-coding RNA molecules from the nucleus to the cytoplasm is critical for gene expression. This necessitates the continuous transport of RNA species of different size, shape and function through nuclear pore complexes via export receptors and adaptor proteins. Here, we provide an overview of the major RNA export pathways in humans, highlighting the similarities and differences between each. Its importance is underscored by the growing appreciation that deregulation of RNA export pathways is associated with human diseases like cancer. Crown Copyright © 2017. Published by Elsevier Ltd. All rights reserved.
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.
Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed
2016-01-01
The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode for proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with more than 83-100 amino acids ORF, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates the lncRNA rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
Decoding the non-coding RNAs in Alzheimer's disease.
Schonrock, Nicole; Götz, Jürgen
2012-11-01
Non-coding RNAs (ncRNAs) are integral components of biological networks with fundamental roles in regulating gene expression. They can integrate sequence information from the DNA code, epigenetic regulation and functions of multimeric protein complexes to potentially determine the epigenetic status and transcriptional network in any given cell. Humans potentially contain more ncRNAs than any other species, especially in the brain, where they may well play a significant role in human development and cognitive ability. This review discusses their emerging role in Alzheimer's disease (AD), a human pathological condition characterized by the progressive impairment of cognitive functions. We discuss the complexity of the ncRNA world and how this is reflected in the regulation of the amyloid precursor protein and Tau, two proteins with central functions in AD. By understanding this intricate regulatory network, there is hope for a better understanding of disease mechanisms and ultimately developing diagnostic and therapeutic tools.
Microprocessor mediates transcriptional termination in long noncoding microRNA genes
Dhir, Ashish; Dhir, Somdutta; Proudfoot, Nick J.; Jopling, Catherine L.
2015-01-01
MicroRNA (miRNA) play a major role in the post-transcriptional regulation of gene expression. Mammalian miRNA biogenesis begins with co-transcriptional cleavage of RNA polymerase II (Pol II) transcripts by the Microprocessor complex. While most miRNA are located within introns of protein coding genes, a substantial minority of miRNA originate from long non coding (lnc) RNA where transcript processing is largely uncharacterized. We show, by detailed characterization of liver-specific lnc-pri-miR-122 and genome-wide analysis in human cell lines, that most lnc-pri-miRNA do not use the canonical cleavage and polyadenylation (CPA) pathway, but instead use Microprocessor cleavage to terminate transcription. This Microprocessor inactivation leads to extensive transcriptional readthrough of lnc-pri-miRNA and transcriptional interference with downstream genes. Consequently we define a novel RNase III-mediated, polyadenylation-independent mechanism of Pol II transcription termination in mammalian cells. PMID:25730776
The Genome of the Western Clawed Frog Xenopus tropicalis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.
2009-10-01
The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with humanmore » and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.« less
The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.
Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe
2004-02-01
The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago.
Simard, Frédéric; Licht, Monica; Besansky, Nora J.; Lehmann, Tovi
2007-01-01
Genetic variation in defensin, a gene encoding a major effector molecule of insects immune response was analyzed within and between populations of three members of the Anopheles gambiae complex. The species selected included the two anthropophilic species, An. gambiae and An. arabiensis and the most zoophilic species of the complex, An. quadriannulatus. The first species was represented by four populations spanning its extreme genetic and geographical ranges, whereas each of the other two species was represented by a single population. We found (i) reduced overall polymorphism in the mature peptide region and in the total coding region, together with specific reductions in rare and moderately frequent mutations (sites) in the coding region compared with non coding regions, (ii) markedly reduced rate of nonsynonymous diversity compared with synonymous variation in the mature peptide and virtually identical mature peptide across the three species, and (iii) increased divergence between species in the mature peptide together with reduced differentiation between populations of An. gambiae in the same DNA region. These patterns suggest a strong purifying selection on the mature peptide and probably the whole coding region. Because An. quadriannulatus is not exposed to human pathogens, identical mature peptide and similar pattern of polymorphism across species implies that human pathogens played no role as selective agents on this peptide. PMID:17161659
Production of Recombinant Adenovirus Containing Human Interlukin-4 Gene
Mojarrad, Majid; Abdolazimi, Yassan; Hajati, Jamshid; Modarressi, Mohammad Hossein
2011-01-01
Objective(s) Recombinant adenoviruses are currently used for a variety of purposes, including in vitro gene transfer, in vivo vaccination, and gene therapy. Ability to infect many cell types, high efficiency in gene transfer, entering both dividing and non dividing cells, and growing to high titers make this virus a good choice for using in various experiments. In the present experiment, a recombinant adenovirus containing human IL-4 coding sequence was made. IL-4 has several characteristics that made it a good choice for using in cancer gene therapy, controlling inflammatory diseases, and studies on autoimmune diseases. Materials and Methods In brief, IL-4 coding sequence was amplified by and cloned in pAd-Track-CMV. Then, by means of homologous recombination between recombinant pAd-Track-CMV and Adeasy-1 plasmid in bacteria, recombinant adenovirus complete genome was made and IL-4 containing shuttle vector was incorporated into the viral backbone. After linearization, for virus packaging, viral genome was transfected into HEK-293 cell line. Viral production was conveniently followed with the aid of green fluorescent protein. Results Recombinant adenovirus produced here, was capable to infecting cell lines and express interlukin-4 in cell. Conclusion This system can be used as a powerful, easy, and cost benefit tool in various studies on cancer gene therapy and also studies on immunogenetics. PMID:23493491
Peri, A; Cordella-Miele, E; Miele, L; Mukherjee, A B
1993-01-01
Clara cell 10-kD protein (cc10kD), a secretory phospholipase A2 inhibitor, is suggested to be the human counterpart of rabbit uteroglobin (UG). Because cc10kD is expressed constitutively at a very high level in the human respiratory epithelium, the 5' region of its gene may be useful in achieving organ-specific expression of recombinant DNA in gene therapy of diseases such as cystic fibrosis. However, it is important to establish the tissue-specific expression of this gene before designing gene transfer experiments. Since the UG gene in the rabbit is expressed in many other organs besides the lung and the endometrium, we investigated the organ and tissue specificity of human cc10kD gene expression using polymerase chain reaction, nucleotide sequence analysis, immunofluorescence, and Northern blotting. Our results indicate that, in addition to the lung, cc10kD is expressed in several nonrespiratory organs, with a distribution pattern very similar, if not identical, to that of UG in the rabbit. These results underscore the necessity for more detailed analyses of the 5' region of the human cc10kD gene before its usefulness in gene therapy could be fully assessed. These data also suggest that cc10kD and UG may have similar physiological function(s). Images PMID:8227325
Genic insights from integrated human proteomics in GeneCards.
Fishilevich, Simon; Zimmerman, Shahar; Kohn, Asher; Iny Stein, Tsippi; Olender, Tsviya; Kolker, Eugene; Safran, Marilyn; Lancet, Doron
2016-01-01
GeneCards is a one-stop shop for searchable human gene annotations (http://www.genecards.org/). Data are automatically mined from ∼120 sources and presented in an integrated web card for every human gene. We report the application of recent advances in proteomics to enhance gene annotation and classification in GeneCards. First, we constructed the Human Integrated Protein Expression Database (HIPED), a unified database of protein abundance in human tissues, based on the publically available mass spectrometry (MS)-based proteomics sources ProteomicsDB, Multi-Omics Profiling Expression Database, Protein Abundance Across Organisms and The MaxQuant DataBase. The integrated database, residing within GeneCards, compares favourably with its individual sources, covering nearly 90% of human protein-coding genes. For gene annotation and comparisons, we first defined a protein expression vector for each gene, based on normalized abundances in 69 normal human tissues. This vector is portrayed in the GeneCards expression section as a bar graph, allowing visual inspection and comparison. These data are juxtaposed with transcriptome bar graphs. Using the protein expression vectors, we further defined a pairwise metric that helps assess expression-based pairwise proximity. This new metric for finding functional partners complements eight others, including sharing of pathways, gene ontology (GO) terms and domains, implemented in the GeneCards Suite. In parallel, we calculated proteome-based differential expression, highlighting a subset of tissues that overexpress a gene and subserving gene classification. This textual annotation allows users of VarElect, the suite's next-generation phenotyper, to more effectively discover causative disease variants. Finally, we define the protein-RNA expression ratio and correlation as yet another attribute of every gene in each tissue, adding further annotative information. The results constitute a significant enhancement of several GeneCards sections and help promote and organize the genome-wide structural and functional knowledge of the human proteome. Database URL:http://www.genecards.org/. © The Author(s) 2016. Published by Oxford University Press.
A global view of the nonprotein-coding transcriptome in Plasmodium falciparum
Raabe, Carsten A.; Sanchez, Cecilia P.; Randau, Gerrit; Robeck, Thomas; Skryabin, Boris V.; Chinni, Suresh V.; Kube, Michael; Reinhardt, Richard; Ng, Guey Hooi; Manickam, Ravichandran; Kuryshev, Vladimir Y.; Lanzer, Michael; Brosius, Juergen; Tang, Thean Hock; Rozhdestvensky, Timofey S.
2010-01-01
Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense–antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors. PMID:19864253
A global view of the nonprotein-coding transcriptome in Plasmodium falciparum.
Raabe, Carsten A; Sanchez, Cecilia P; Randau, Gerrit; Robeck, Thomas; Skryabin, Boris V; Chinni, Suresh V; Kube, Michael; Reinhardt, Richard; Ng, Guey Hooi; Manickam, Ravichandran; Kuryshev, Vladimir Y; Lanzer, Michael; Brosius, Juergen; Tang, Thean Hock; Rozhdestvensky, Timofey S
2010-01-01
Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense-antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors.
Medical Sequencing at the extremes of Human Body Mass
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy
2006-09-01
Body weight is a quantitative trait with significantheritability in humans. To identify potential genetic contributors tothis phenotype, we resequenced the coding exons and splice junctions of58 genes in 379 obese and 378 lean individuals. Our 96Mb survey included21 genes associated with monogenic forms of obesity in humans or mice, aswell as 37 genes that function in body weight-related pathways. We foundthat the monogenic obesity-associated gene group was enriched for rarenonsynonymous variants unique to the obese (n=46) versus lean (n=26)populations. Computational analysis further predicted a significantlygreater fraction of deleterious variants within the obese cohort.Consistent with the complex inheritance of body weight,more » we did notobserve obvious familial segregation in the majority of the 28 availablekindreds. Taken together, these data suggest that multiple rare alleleswith variable penetrance contribute to obesity in the population andprovide a deep medical sequencing based approach to detectthem.« less
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome
Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen
2016-01-01
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814
[Efficient genome editing in human pluripotent stem cells through CRISPR/Cas9].
Liu, Gai-gai; Li, Shuang; Wei, Yu-da; Zhang, Yong-xian; Ding, Qiu-rong
2015-11-01
The RNA-guided CRISPR (clustered regularly interspaced short palindromic repeat)-associated Cas9 nuclease has offered a new platform for genome editing with high efficiency. Here, we report the use of CRISPR/Cas9 technology to target a specific genomic region in human pluripotent stem cells. We show that CRISPR/Cas9 can be used to disrupt a gene by introducing frameshift mutations to gene coding region; to knock in specific sequences (e.g. FLAG tag DNA sequence) to targeted genomic locus via homology directed repair; to induce large genomic deletion through dual-guide multiplex. Our results demonstrate the versatile application of CRISPR/Cas9 in stem cell genome editing, which can be widely utilized for functional studies of genes or genome loci in human pluripotent stem cells.
Distinct polyadenylation landscapes of diverse human tissues revealed by a modified PA-seq strategy
2013-01-01
Background Polyadenylation is a key regulatory step in eukaryotic gene expression and one of the major contributors of transcriptome diversity. Aberrant polyadenylation often associates with expression defects and leads to human diseases. Results To better understand global polyadenylation regulation, we have developed a polyadenylation sequencing (PA-seq) approach. By profiling polyadenylation events in 13 human tissues, we found that alternative cleavage and polyadenylation (APA) is prevalent in both protein-coding and noncoding genes. In addition, APA usage, similar to gene expression profiling, exhibits tissue-specific signatures and is sufficient for determining tissue origin. A 3′ untranslated region shortening index (USI) was further developed for genes with tandem APA sites. Strikingly, the results showed that different tissues exhibit distinct patterns of shortening and/or lengthening of 3′ untranslated regions, suggesting the intimate involvement of APA in establishing tissue or cell identity. Conclusions This study provides a comprehensive resource to uncover regulated polyadenylation events in human tissues and to characterize the underlying regulatory mechanism. PMID:24025092
Chu, H W; Rios, C; Huang, C; Wesolowska-Andersen, A; Burchard, E G; O'Connor, B P; Fingerlin, T E; Nichols, D; Reynolds, S D; Seibold, M A
2015-10-01
Targeted knockout of genes in primary human cells using CRISPR-Cas9-mediated genome-editing represents a powerful approach to study gene function and to discern molecular mechanisms underlying complex human diseases. We used lentiviral delivery of CRISPR-Cas9 machinery and conditional reprogramming culture methods to knockout the MUC18 gene in human primary nasal airway epithelial cells (AECs). Massively parallel sequencing technology was used to confirm that the genome of essentially all cells in the edited AEC populations contained coding region insertions and deletions (indels). Correspondingly, we found mRNA expression of MUC18 was greatly reduced and protein expression was absent. Characterization of MUC18 knockout cell populations stimulated with TLR2, 3 and 4 agonists revealed that IL-8 (a proinflammatory chemokine) responses of AECs were greatly reduced in the absence of functional MUC18 protein. Our results show the feasibility of CRISPR-Cas9-mediated gene knockouts in AEC culture (both submerged and polarized), and suggest a proinflammatory role for MUC18 in airway epithelial response to bacterial and viral stimuli.
Schyth, Brian Dall; Bela-ong, Dennis Berbulla; Jalali, Seyed Amir Hossein; Kristensen, Lasse Bøgelund Juel; Einer-Jensen, Katja; Pedersen, Finn Skou; Lorenzen, Niels
2015-01-01
MicroRNAs (miRNAs) are ~22 base pair-long non-coding RNAs which regulate gene expression in the cytoplasm of eukaryotic cells by binding to specific target regions in mRNAs to mediate transcriptional blocking or mRNA cleavage. Through their fundamental roles in cellular pathways, gene regulation mediated by miRNAs has been shown to be involved in almost all biological phenomena, including development, metabolism, cell cycle, tumor formation, and host-pathogen interactions. To address the latter in a primitive vertebrate host, we here used an array platform to analyze the miRNA response in rainbow trout (Oncorhynchus mykiss) following inoculation with the virulent fish rhabdovirus Viral hemorrhagic septicaemia virus. Two clustered miRNAs, miR-462 and miR-731 (herein referred to as miR-462 cluster), described only in teleost fishes, were found to be strongly upregulated, indicating their involvement in fish-virus interactions. We searched for homologues of the two teleost miRNAs in other vertebrate species and investigated whether findings related to ours have been reported for these homologues. Gene synteny analysis along with gene sequence conservation suggested that the teleost fish miR-462 and miR-731 had evolved from the ancestral miR-191 and miR-425 (herein called miR-191 cluster), respectively. Whereas the miR-462 cluster locus is found between two protein-coding genes (intergenic) in teleost fish genomes, the miR-191 cluster locus is found within an intron of a protein-coding gene (intragenic) in the human genome. Interferon (IFN)-inducible and immune-related promoter elements found upstream of the teleost miR-462 cluster locus suggested roles in immune responses to viral pathogens in fish, while in humans, the miR-191 cluster functionally associated with cell cycle regulation. Stimulation of fish cell cultures with the IFN inducer poly I:C accordingly upregulated the expression of miR-462 and miR-731, while no stimulatory effect on miR-191 and miR-425 expression was observed in human cell lines. Despite high sequence conservation, evolution has thus resulted in different regulation and presumably also different functional roles of these orthologous miRNA clusters in different vertebrate lineages. PMID:26207374
DGEM--a microarray gene expression database for primary human disease tissues.
Xia, Yuni; Campen, Andrew; Rigsby, Dan; Guo, Ying; Feng, Xingdong; Su, Eric W; Palakal, Mathew; Li, Shuyu
2007-01-01
Gene expression patterns can reflect gene regulations in human tissues under normal or pathologic conditions. Gene expression profiling data from studies of primary human disease samples are particularly valuable since these studies often span many years in order to collect patient clinical information and achieve a large sample size. Disease-to-Gene Expression Mapper (DGEM) provides a beneficial community resource to access and analyze these data; it currently includes Affymetrix oligonucleotide array datasets for more than 40 human diseases and 1400 samples. The data are normalized to the same scale and stored in a relational database. A statistical-analysis pipeline was implemented to identify genes abnormally expressed in disease tissues or genes whose expressions are associated with clinical parameters such as cancer patient survival. Data-mining results can be queried through a web-based interface at http://dgem.dhcp.iupui.edu/. The query tool enables dynamic generation of graphs and tables that are further linked to major gene and pathway resources that connect the data to relevant biology, including Entrez Gene and Kyoto Encyclopedia of Genes and Genomes (KEGG). In summary, DGEM provides scientists and physicians a valuable tool to study disease mechanisms, to discover potential disease biomarkers for diagnosis and prognosis, and to identify novel gene targets for drug discovery. The source code is freely available for non-profit use, on request to the authors.
Birikh, K R; Lebedenko, E N; Boni, I V; Berlin, Y A
1995-10-27
Synthetic intronless genes, coding for human interleukin 1 alpha (IL 1 alpha) and interleukin 1 receptor antagonist (IL1ra), have been expressed efficiently in a specially designed prokaryotic vector, pGMCE (a pGEM1 derivative), where the target gene forms the second part of a two-cistron system. The first part of the system is a translation enhancer-containing mini-cistron, whose termination codon overlaps the start codon of the target gene. In the case of the IL1 alpha gene, the high expression level is largely due to the direct efficient translation initiation at the second cistron, whereas with the IL1ra gene in the same system, the proximal translation initiation region (TIR) provides a high level of coupled expression of the target gene. Thus, pGMCE is a potentially versatile vector for direct prokaryotic expression.
EGASP: the human ENCODE Genome Annotation Assessment Project
Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G
2006-01-01
Background We present the results of EGASP, a community experiment to assess the state-of-the-art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: the assessment of the accuracy of computational methods to predict protein coding genes; and the overall assessment of the completeness of the current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups prior to the submission deadline, so that their predictions were blind and an external advisory committee could perform a fair assessment. Results The best methods had at least one gene transcript correctly predicted for close to 70% of the annotated genes. Nevertheless, the multiple transcript accuracy, taking into account alternative splicing, reached only approximately 40% to 50% accuracy. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation shows that only a very small percentage (3.2%) of the selected 221 computationally predicted exons outside of the existing annotation could be verified. Conclusion This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836
Long noncoding RNAs as enhancers of gene expression.
Ørom, U A; Derrien, T; Guigo, R; Shiekhattar, R
2010-01-01
The human genome contains thousands of long noncoding RNAs (ncRNAs) transcribed from diverse genomic locations. A large set of long ncRNAs is transcribed independent of protein-coding genes. We have used the GENCODE annotation of the human genome to identify 3019 long ncRNAs expressed in various human cell lines and tissue. This set of long ncRNAs responds to differentiation signals in primary human keratinocytes and is coexpressed with important regulators of keratinocyte development. Depletion of a number of these long ncRNAs leads to the repression of specific genes in their surrounding locus, supportive of an activating function for ncRNAs. Using reporter assays, we confirmed such activating function and show that such transcriptional enhancement is mediated through the long ncRNA transcripts. Our studies show that long ncRNAs exhibit functions similar to classically defined enhancers, through an RNA-dependent mechanism.
Kirsten, Holger; Al-Hasani, Hoor; Holdt, Lesca; Gross, Arnd; Beutner, Frank; Krohn, Knut; Horn, Katrin; Ahnert, Peter; Burkhardt, Ralph; Reiche, Kristin; Hackermüller, Jörg; Löffler, Markus; Teupser, Daniel; Thiery, Joachim; Scholz, Markus
2015-08-15
Genetics of gene expression (eQTLs or expression QTLs) has proved an indispensable tool for understanding biological pathways and pathomechanisms of trait-associated SNPs. However, power of most genome-wide eQTL studies is still limited. We performed a large eQTL study in peripheral blood mononuclear cells of 2112 individuals increasing the power to detect trans-effects genome-wide. Going beyond univariate SNP-transcript associations, we analyse relations of eQTLs to biological pathways, polygenetic effects of expression regulation, trans-clusters and enrichment of co-localized functional elements. We found eQTLs for about 85% of analysed genes, and 18% of genes were trans-regulated. Local eSNPs were enriched up to a distance of 5 Mb to the transcript challenging typically implemented ranges of cis-regulations. Pathway enrichment within regulated genes of GWAS-related eSNPs supported functional relevance of identified eQTLs. We demonstrate that nearest genes of GWAS-SNPs might frequently be misleading functional candidates. We identified novel trans-clusters of potential functional relevance for GWAS-SNPs of several phenotypes including obesity-related traits, HDL-cholesterol levels and haematological phenotypes. We used chromatin immunoprecipitation data for demonstrating biological effects. Yet, we show for strongly heritable transcripts that still little trans-chromosomal heritability is explained by all identified trans-eSNPs; however, our data suggest that most cis-heritability of these transcripts seems explained. Dissection of co-localized functional elements indicated a prominent role of SNPs in loci of pseudogenes and non-coding RNAs for the regulation of coding genes. In summary, our study substantially increases the catalogue of human eQTLs and improves our understanding of the complex genetic regulation of gene expression, pathways and disease-related processes. © The Author 2015. Published by Oxford University Press.
Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu
2017-01-04
The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) is contributed to various biological processes and linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represent about 20 times expansion when comparing to the previous released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed 'Regulator' module to predict hundreds of TFs and histone modifications that were involved in or affected transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulations of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genenames.org: the HGNC and VGNC resources in 2017.
Yates, Bethan; Braschi, Bryony; Gray, Kristian A; Seal, Ruth L; Tweedie, Susan; Bruford, Elspeth A
2017-01-04
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC) which is responsible for assigning names to vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
2004-12-09
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Samuels, David C.; Boys, Richard J.; Henderson, Daniel A.; Chinnery, Patrick F.
2003-01-01
We applied a hidden Markov model segmentation method to the human mitochondrial genome to identify patterns in the sequence, to compare these patterns to the gene structure of mtDNA and to see whether these patterns reveal additional characteristics important for our understanding of genome evolution, structure and function. Our analysis identified three segmentation categories based upon the sequence transition probabilities. Category 2 segments corresponded to the tRNA and rRNA genes, with a greater strand-symmetry in these segments. Category 1 and 3 segments covered the protein- coding genes and almost all of the non-coding D-loop. Compared to category 1, the mtDNA segments assigned to category 3 had much lower guanine abundance. A comparison to two independent databases of mitochondrial mutations and polymorphisms showed that the high substitution rate of guanine in human mtDNA is largest in the category 3 segments. Analysis of synonymous mutations showed the same pattern. This suggests that this heterogeneity in the mutation rate is partly independent of respiratory chain function and is a direct property of the genome sequence itself. This has important implications for our understanding of mtDNA evolution and its use as a ‘molecular clock’ to determine the rate of population and species divergence. PMID:14530452
A fresh look at the male-specific region of the human Y chromosome.
Jangravi, Zohreh; Alikhani, Mehdi; Arefnezhad, Babak; Sharifi Tabar, Mehdi; Taleahmad, Sara; Karamzadeh, Razieh; Jadaliha, Mahdieh; Mousavi, Seyed Ahmad; Ahmadi Rastegar, Diba; Parsamatin, Pouria; Vakilian, Haghighat; Mirshahvaladi, Shahab; Sabbaghian, Marjan; Mohseni Meybodi, Anahita; Mirzaei, Mehdi; Shahhoseini, Maryam; Ebrahimi, Marzieh; Piryaei, Abbas; Moosavi-Movahedi, Ali Akbar; Haynes, Paul A; Goodchild, Ann K; Nasr-Esfahani, Mohammad Hossein; Jabbari, Esmaiel; Baharvand, Hossein; Sedighi Gilani, Mohammad Ali; Gourabi, Hamid; Salekdeh, Ghasem Hosseini
2013-01-04
The Chromosome-centric Human Proteome Project (C-HPP) aims to systematically map the entire human proteome with the intent to enhance our understanding of human biology at the cellular level. This project attempts simultaneously to establish a sound basis for the development of diagnostic, prognostic, therapeutic, and preventive medical applications. In Iran, current efforts focus on mapping the proteome of the human Y chromosome. The male-specific region of the Y chromosome (MSY) is unique in many aspects and comprises 95% of the chromosome's length. The MSY continually retains its haploid state and is full of repeated sequences. It is responsible for important biological roles such as sex determination and male fertility. Here, we present the most recent update of MSY protein-encoding genes and their association with various traits and diseases including sex determination and reversal, spermatogenesis and male infertility, cancers such as prostate cancers, sex-specific effects on the brain and behavior, and graft-versus-host disease. We also present information available from RNA sequencing, protein-protein interaction, post-translational modification of MSY protein-coding genes and their implications in biological systems. An overview of Human Y chromosome Proteome Project is presented and a systematic approach is suggested to ensure that at least one of each predicted protein-coding gene's major representative proteins will be characterized in the context of its major anatomical sites of expression, its abundance, and its functional relevance in a biological and/or medical context. There are many technical and biological issues that will need to be overcome in order to accomplish the full scale mapping.
Nehme, A; Zibara, K; Cerutti, C; Bricca, G
2015-06-01
The implication of the renin-angiotensin-aldosterone system (RAAS) in atheroma development is well described. However, a complete view of the local RAAS in atheroma is still missing. In this study we aimed to reveal the organization of RAAS in atheroma at the transcriptomic level and identify the transcriptional regulators behind it. Extended RAAS (extRAAS) was defined as the set of 37 genes coding for classical and novel RAAS participants (Figure 1). Five microarray datasets containing overall 590 samples representing carotid and peripheral atheroma were downloaded from the GEO database. Correlation-based hierarchical clustering (R software) of extRAAS genes within each dataset allowed the identification of modules of co-expressed genes. Reproducible co-expression modules across datasets were then extracted. Transcription factors (TFs) having common binding sites (TFBSs) in the promoters of coordinated genes were identified using the Genomatix database tools and analyzed for their correlation with extRAAS genes in the microarray datasets. Expression data revealed the expressed extRAAS components and their relative abundance displaying the favored pathways in atheroma. Three co-expression modules with more than 80% reproducibility across datasets were extracted. Two of them (M1 and M2) contained genes coding for angiotensin metabolizing enzymes involved in different pathways: M1 included ACE, MME, RNPEP, and DPP3, in addition to 7 other genes; and M2 included CMA1, CTSG, and CPA3. The third module (M3) contained genes coding for receptors known to be implicated in atheroma (AGTR1, MR, GR, LNPEP, EGFR and GPER). M1 and M3 were negatively correlated in 3 of 5 datasets. We identified 19 TFs that have enriched TFBSs in the promoters of genes of M1, and two for M3, but none was found for M2. Among the extracted TFs, ELF1, MAX, and IRF5 showed significant positive correlations with peptidase-coding genes from M1 and negative correlations with receptors-coding genes from M3 (p < 0.05). The identified co-expression modules display the transcriptional organization of local extRAAS in human carotid atheroma. The identification of several TFs potentially associated to extRAAS genes may provide a frame for the discovery of atheroma-specific modulators of extRAAS activity.(Figure is included in full-text article.).
MHC class I-associated peptides derive from selective regions of the human genome.
Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude
2016-12-01
MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology.
MHC class I–associated peptides derive from selective regions of the human genome
Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre
2016-01-01
MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
An extended set of yeast-based functional assays accurately identifies human disease mutations
Sun, Song; Yang, Fan; Tan, Guihong; Costanzo, Michael; Oughtred, Rose; Hirschman, Jodi; Theesfeld, Chandra L.; Bansal, Pritpal; Sahni, Nidhi; Yi, Song; Yu, Analyn; Tyagi, Tanya; Tie, Cathy; Hill, David E.; Vidal, Marc; Andrews, Brenda J.; Boone, Charles; Dolinski, Kara; Roth, Frederick P.
2016-01-01
We can now routinely identify coding variants within individual human genomes. A pressing challenge is to determine which variants disrupt the function of disease-associated genes. Both experimental and computational methods exist to predict pathogenicity of human genetic variation. However, a systematic performance comparison between them has been lacking. Therefore, we developed and exploited a panel of 26 yeast-based functional complementation assays to measure the impact of 179 variants (101 disease- and 78 non-disease-associated variants) from 22 human disease genes. Using the resulting reference standard, we show that experimental functional assays in a 1-billion-year diverged model organism can identify pathogenic alleles with significantly higher precision and specificity than current computational methods. PMID:26975778
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Hanne; Schwemmer, M.; Tessmann, D.
1996-03-01
The mitochondrial capsule selenoprotein (MCS) (HGMW-approved symbol MCSP) is one of three proteins that are important for the maintenance and stabilization of the crescent structure of the sperm mitochondria. We describe here the isolation of a cDNA, the exon-intron organization, the expression, and the chromosomal localization of the human MCS gene. Nucleotide sequence analysis of the human and mouse MCS cDNAs reveals that the 5{prime}- and 3{prime}-untranslated sequences are more conserved (71%) than the coding sequences (59%). The open reading frame encodes a 116-amino-acid protein and lacks the UGA codons, which have been reported to encode the selenocysteines in themore » N-terminal of the deduced mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein. The deduced human protein shows a low degree of amino acid sequence identity to the mouse protein (39%). The most striking homology lies in the dicysteine motifs. Northern and Southern zooblot analyses reveal that the MCS gene in human, baboon, and bovine is more conserved than its counterparts in mouse and rat. The single intron in the human MCS gene is approximately 6 kb and interrupts the 5{prime}-untranslated region at a position equivalent to that in the mouse and rat genes. Northern blot and in situ hybridization experiments demonstrate that the expression of the human MCS gene is restricted to haploid spermatids. The human gene was assigned to q21 of chromosome 1. 30 refs., 9 figs.« less
2012-01-01
Background Evolution of splice sites is a well-known phenomenon that results in transcript diversity during human evolution. Many novel splice sites are derived from repetitive elements and may not contribute to protein products. Here, we analyzed annotated human protein-coding exons and identified human-specific splice sites that arose after the human-chimpanzee divergence. Results We analyzed multiple alignments of the annotated human protein-coding exons and their respective orthologous mammalian genome sequences to identify 85 novel splice sites (50 splice acceptors and 35 donors) in the human genome. The novel protein-coding exons, which are expressed either constitutively or alternatively, produce novel protein isoforms by insertion, deletion, or frameshift. We found three cases in which the human-specific isoform conferred novel molecular function in the human cells: the human-specific IMUP protein isoform induces apoptosis of the trophoblast and is implicated in pre-eclampsia; the intronization of a part of SMOX gene exon produces inactive spermine oxidase; the human-specific NUB1 isoform shows reduced interaction with ubiquitin-like proteins, possibly affecting ubiquitin pathways. Conclusions Although the generation of novel protein isoforms does not equate to adaptive evolution, we propose that these cases are useful candidates for a molecular functional study to identify proteomic changes that might bring about novel phenotypes during human evolution. PMID:23148531
Goto, Hiroki; Peng, Lei; Makova, Kateryna D
2009-02-01
Compared with the X chromosome, the mammalian Y chromosome is considerably diminished in size and has lost most of its ancestral genes during evolution. Interestingly, for the X-degenerate region on the Y chromosome, human has retained all 16 genes, while chimpanzee has lost 4 of the 16 genes since the divergence of the two species. To uncover the evolutionary forces governing ape Y chromosome degeneration, we determined the complete sequences of the coding exons and splice sites for 16 gorilla Y chromosome genes of the X-degenerate region. We discovered that all studied reading frames and splice sites were intact, and thus, this genomic region experienced no gene loss in the gorilla lineage. Higher nucleotide divergence was observed in the chimpanzee than the human lineage, particularly for genes with disruptive mutations, suggesting a lack of functional constraints for these genes in chimpanzee. Surprisingly, our results indicate that the human and gorilla orthologues of the genes disrupted in chimpanzee evolve under relaxed functional constraints and might not be essential. Taking mating patterns and effective population sizes of ape species into account, we conclude that genetic hitchhiking associated with positive selection due to sperm competition might explain the rapid decline in the Y chromosome gene number in chimpanzee. As we found no evidence of positive selection acting on the X-degenerate genes, such selection likely targets other genes on the chimpanzee Y chromosome.
Li, Hu; Liu, Hui; Shi, Aimin; Štys, Pavel; Zhou, Xuguo; Cai, Wanzhi
2012-01-01
Many of true bugs are important insect pests to cultivated crops and some are important vectors of human diseases, but few cladistic analyses have addressed relationships among the seven infraorders of Heteroptera. The Enicocephalomorpha and Nepomorpha are consider the basal groups of Heteroptera, but the basal-most lineage remains unresolved. Here we report the mitochondrial genome of the unique-headed bug Stenopirates sp., the first mitochondrial genome sequenced from Enicocephalomorpha. The Stenopirates sp. mitochondrial genome is a typical circular DNA molecule of 15, 384 bp in length, and contains 37 genes and a large non-coding fragment. The gene order differs substantially from other known insect mitochondrial genomes, with rearrangements of both tRNA genes and protein-coding genes. The overall AT content (82.5%) of Stenopirates sp. is the highest among all the known heteropteran mitochondrial genomes. The strand bias is consistent with other true bugs with negative GC-skew and positive AT-skew for the J-strand. The heteropteran mitochondrial atp8 exhibits the highest evolutionary rate, whereas cox1 appears to have the lowest rate. Furthermore, a negative correlation was observed between the variation of nucleotide substitutions and the GC content of each protein-coding gene. A microsatellite was identified in the putative control region. Finally, phylogenetic reconstruction suggests that Enicocephalomorpha is the sister group to all the remaining Heteroptera. PMID:22235294
Singh, Vineet K; Ring, Robert P; Aswani, Vijay; Stemper, Mary E; Kislow, Jennifer; Ye, Zhan; Shukla, Sanjay K
2017-12-01
Staphylococcus aureus is an opportunistic human pathogen that can cause serious infections in humans. A plethora of known and putative virulence factors are produced by staphylococci that collectively orchestrate pathogenesis. Ear protein (Escherichia coli ampicillin resistance) in S. aureus is an exoprotein in COL strain, predicted to be a superantigen, and speculated to play roles in antibiotic resistance and virulence. The goal of this study was to determine if expression of ear is modulated by single nucleotide polymorphisms in its promoter and coding sequences and whether this gene plays roles in antibiotic resistance and virulence. Promoter, coding sequences and expression of the ear gene in clinical and carriage S. aureus strains with distinct genetic backgrounds were analysed. The JE2 strain and its isogenic ear mutant were used in a systemic infection mouse model to determine the competiveness of the ear mutant.Results/Key findings. The ear gene showed a variable expression, with USA300FPR3757 showing a high-level expression compared to many of the other strains tested including some showing negligible expression. Higher expression was associated with agr type 1 but not correlated with phylogenetic relatedness of the ear gene based upon single nucleotide polymorphisms in the promoter or coding regions suggesting a complex regulation. An isogenic JE2 (USA300 background) ear mutant showed no significant difference in its growth, antibiotic susceptibility or virulence in a mouse model. Our data suggests that despite being highly expressed in a USA300 genetic background, Ear is not a significant contributor to virulence in that strain.
Regulation of neural macroRNAs by the transcriptional repressor REST
Johnson, Rory; Teh, Christina Hui-Leng; Jia, Hui; Vanisri, Ravi Raj; Pandey, Tridansh; Lu, Zhong-Hao; Buckley, Noel J.; Stanton, Lawrence W.; Lipovich, Leonard
2009-01-01
The essential transcriptional repressor REST (repressor element 1-silencing transcription factor) plays central roles in development and human disease by regulating a large cohort of neural genes. These have conventionally fallen into the class of known, protein-coding genes; recently, however, several noncoding microRNA genes were identified as REST targets. Given the widespread transcription of messenger RNA-like, noncoding RNAs (“macroRNAs”), some of which are functional and implicated in disease in mammalian genomes, we sought to determine whether this class of noncoding RNAs can also be regulated by REST. By applying a new, unbiased target gene annotation pipeline to computationally discovered REST binding sites, we find that 23% of mammalian REST genomic binding sites are within 10 kb of a macroRNA gene. These putative target genes were overlooked by previous studies. Focusing on a set of 18 candidate macroRNA targets from mouse, we experimentally demonstrate that two are regulated by REST in neural stem cells. Flanking protein-coding genes are, at most, weakly repressed, suggesting specific targeting of the macroRNAs by REST. Similar to the majority of known REST target genes, both of these macroRNAs are induced during nervous system development and have neurally restricted expression profiles in adult mouse. We observe a similar phenomenon in human: the DiGeorge syndrome-associated noncoding RNA, DGCR5, is repressed by REST through a proximal upstream binding site. Therefore neural macroRNAs represent an additional component of the REST regulatory network. These macroRNAs are new candidates for understanding the role of REST in neuronal development, neurodegeneration, and cancer. PMID:19050060
Regulation of neural macroRNAs by the transcriptional repressor REST.
Johnson, Rory; Teh, Christina Hui-Leng; Jia, Hui; Vanisri, Ravi Raj; Pandey, Tridansh; Lu, Zhong-Hao; Buckley, Noel J; Stanton, Lawrence W; Lipovich, Leonard
2009-01-01
The essential transcriptional repressor REST (repressor element 1-silencing transcription factor) plays central roles in development and human disease by regulating a large cohort of neural genes. These have conventionally fallen into the class of known, protein-coding genes; recently, however, several noncoding microRNA genes were identified as REST targets. Given the widespread transcription of messenger RNA-like, noncoding RNAs ("macroRNAs"), some of which are functional and implicated in disease in mammalian genomes, we sought to determine whether this class of noncoding RNAs can also be regulated by REST. By applying a new, unbiased target gene annotation pipeline to computationally discovered REST binding sites, we find that 23% of mammalian REST genomic binding sites are within 10 kb of a macroRNA gene. These putative target genes were overlooked by previous studies. Focusing on a set of 18 candidate macroRNA targets from mouse, we experimentally demonstrate that two are regulated by REST in neural stem cells. Flanking protein-coding genes are, at most, weakly repressed, suggesting specific targeting of the macroRNAs by REST. Similar to the majority of known REST target genes, both of these macroRNAs are induced during nervous system development and have neurally restricted expression profiles in adult mouse. We observe a similar phenomenon in human: the DiGeorge syndrome-associated noncoding RNA, DGCR5, is repressed by REST through a proximal upstream binding site. Therefore neural macroRNAs represent an additional component of the REST regulatory network. These macroRNAs are new candidates for understanding the role of REST in neuronal development, neurodegeneration, and cancer.
Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex.
Konermann, Silvana; Brigham, Mark D; Trevino, Alexandro E; Joung, Julia; Abudayyeh, Omar O; Barcena, Clea; Hsu, Patrick D; Habib, Naomi; Gootenberg, Jonathan S; Nishimasu, Hiroshi; Nureki, Osamu; Zhang, Feng
2015-01-29
Systematic interrogation of gene function requires the ability to perturb gene expression in a robust and generalizable manner. Here we describe structure-guided engineering of a CRISPR-Cas9 complex to mediate efficient transcriptional activation at endogenous genomic loci. We used these engineered Cas9 activation complexes to investigate single-guide RNA (sgRNA) targeting rules for effective transcriptional activation, to demonstrate multiplexed activation of ten genes simultaneously, and to upregulate long intergenic non-coding RNA (lincRNA) transcripts. We also synthesized a library consisting of 70,290 guides targeting all human RefSeq coding isoforms to screen for genes that, upon activation, confer resistance to a BRAF inhibitor. The top hits included genes previously shown to be able to confer resistance, and novel candidates were validated using individual sgRNA and complementary DNA overexpression. A gene expression signature based on the top screening hits correlated with markers of BRAF inhibitor resistance in cell lines and patient-derived samples. These results collectively demonstrate the potential of Cas9-based activators as a powerful genetic perturbation technology.
Marra, M A; Prasad, S S; Baillie, D L
1993-01-01
A previous study of genomic organization described the identification of nine potential coding regions in 150 kb of genomic DNA from the unc-22(IV) region of Caenorhabditis elegans. In this study, we focus on the genomic organization of a small interval of 0.1 map unit bordered on the right by unc-22 and on the left by the left-hand breakpoints of the deficiencies sDf9, sDf19 and sDf65. This small interval at present contains a single mutagenically defined locus, the essential gene let-56. The cosmid C11F2 has previously been used to rescue let-56. Therefore, at least some of C11F2 must reside in the interval. In this paper, we report the characterization of two coding elements that reside on C11F2. Analysis of nucleotide sequence data obtained from cDNAs and cosmid subclones revealed that one of the coding elements closely resembles aromatic amino acid decarboxylases from several species. The other of these coding elements was found to closely resemble a human growth factor activatable Na+/H+ antiporter. Paris of oligonucleotide primers, predicted from both coding elements, have been used in PCR experiments to position these coding elements between the left breakpoint of sDf19 and the left breakpoint of sDf65, between the essential genes let-653 and let-56.
Splicing regulation and dysregulation of cholinergic genes expressed at the neuromuscular junction.
Ohno, Kinji; Rahman, Mohammad Alinoor; Nazim, Mohammad; Nasrin, Farhana; Lin, Yingni; Takeda, Jun-Ichi; Masuda, Akio
2017-08-01
We humans have evolved by acquiring diversity of alternative RNA metabolisms including alternative means of splicing and transcribing non-coding genes, and not by acquiring new coding genes. Tissue-specific and developmental stage-specific alternative RNA splicing is achieved by tightly regulated spatiotemporal regulation of expressions and activations of RNA-binding proteins that recognize their cognate splicing cis-elements on nascent RNA transcripts. Genes expressed at the neuromuscular junction are also alternatively spliced. In addition, germline mutations provoke aberrant splicing by compromising binding of RNA-binding proteins, and cause congenital myasthenic syndromes (CMS). We present physiological splicing mechanisms of genes for agrin (AGRN), acetylcholinesterase (ACHE), MuSK (MUSK), acetylcholine receptor (AChR) α1 subunit (CHRNA1), and collagen Q (COLQ) in human, and their aberration in diseases. Splicing isoforms of AChE T , AChE H , and AChE R are generated by hnRNP H/F. Skipping of MUSK exon 10 makes a Wnt-insensitive MuSK isoform, which is unique to human. Skipping of exon 10 is achieved by coordinated binding of hnRNP C, YB-1, and hnRNP L to exon 10. Exon P3A of CHRNA1 is alternatively included to generate a non-functional AChR α1 subunit in human. Molecular dissection of splicing mutations in patients with CMS reveals that exon P3A is alternatively skipped by hnRNP H, polypyrimidine tract-binding protein 1, and hnRNP L. Similarly, analysis of an exonic mutation in COLQ exon 16 in a CMS patient discloses that constitutive splicing of exon 16 requires binding of serine arginine-rich splicing factor 1. Intronic and exonic splicing mutations in CMS enable us to dissect molecular mechanisms underlying alternative and constitutive splicing of genes expressed at the neuromuscular junction. This is an article for the special issue XVth International Symposium on Cholinergic Mechanisms. © 2017 International Society for Neurochemistry.
Jongsma, A P; Burgerhout, W G
1977-01-01
Regional localization studies of genes coding for human PGD, PPH1, PGM1, UGPP, GuK1, Pep-C, and FH, which have been assigned to chromosome 1, were performed with man-Chinese hamster somatic cell hybrids, Informative hybrids that retained fragments of the human chromosome 1 were produced by fusion of hamster cells with human cells carrying reciprocal translocations involving chromosome 1. Analysis of the hybrids that retained one of the translocation chromosomes or de novo rearrangements involving the human 1 revealed the following gene positions: PGD and PPH1 in 1pter leads to 1p32, PGM1 in 1p32 leads to 1p22, UGPP and GuK1 in 1q21 leads to 1q42, FH in 1qter leads to 1q42, and Pep-C probably in 1q42.
The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons
Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.
2016-01-01
To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095
Cătană, Cristina- Sorina; Pichler, Martin; Giannelli, Gianluigi; Mader, Robert M; Berindan-Neagoe, Ioana
2017-04-25
In a continuous and mutual exchange of information, cancer cells are invariably exposed to microenvironment transformation. This continuous alteration of the genetic, molecular and cellular peritumoral stroma background has become as critical as the management of primary tumor progression events in cancer cells. The communication between stroma and tumor cells within the extracellular matrix is one of the triggers in colon and liver carcinogenesis. All non- codingRNAs including long non-coding RNAs, microRNAs and ultraconserved genes play a critical role in almost all cancers and are responsible for the modulation of the tumor microenvironment in several malignant processes such as initiation, progression and dissemination. This review details the involvement of non codingRNAs in the evolution of human colorectal carcinoma and hepatocellular carcinoma in relationship with the microenvironment. Recent research has shown that a considerable number of dysregulated non- codingRNAs could be valuable diagnostic and prognostic biomarkers in cancer. Therefore, more in-depth knowledge of the role non- codingRNAs play in stroma-tumor communication and of the complex regulatory mechanisms between ultraconserved genes and microRNAs supports the validation of future effective therapeutic targets in patients suffering from hepatocellular and colorectal carcinoma, two distinctive entities which share quite a lot common non-coding RNAs.
Cătană, Cristina- Sorina; Pichler, Martin; Giannelli, Gianluigi; Mader, Robert M.; Berindan-Neagoe, Ioana
2017-01-01
In a continuous and mutual exchange of information, cancer cells are invariably exposed to microenvironment transformation. This continuous alteration of the genetic, molecular and cellular peritumoral stroma background has become as critical as the management of primary tumor progression events in cancer cells. The communication between stroma and tumor cells within the extracellular matrix is one of the triggers in colon and liver carcinogenesis. All non- codingRNAs including long non-coding RNAs, microRNAs and ultraconserved genes play a critical role in almost all cancers and are responsible for the modulation of the tumor microenvironment in several malignant processes such as initiation, progression and dissemination. This review details the involvement of non codingRNAs in the evolution of human colorectal carcinoma and hepatocellular carcinoma in relationship with the microenvironment. Recent research has shown that a considerable number of dysregulated non- codingRNAs could be valuable diagnostic and prognostic biomarkers in cancer. Therefore, more in-depth knowledge of the role non- codingRNAs play in stroma-tumor communication and of the complex regulatory mechanisms between ultraconserved genes and microRNAs supports the validation of future effective therapeutic targets in patients suffering from hepatocellular and colorectal carcinoma, two distinctive entities which share quite a lot common non-coding RNAs. PMID:28392501
Fisher, S E; van Bakel, I; Lloyd, S E; Pearce, S H; Thakker, R V; Craig, I W
1995-10-10
Dent disease, an X-linked familial renal tubular disorder, is a form of Fanconi syndrome associated with proteinuria, hypercalciuria, nephrocalcinosis, kidney stones, and eventual renal failure. We have previously used positional cloning to identify the 3' part of a novel kidney-specific gene (initially termed hClC-K2, but now referred to as CLCN5), which is deleted in patients from one pedigree segregating Dent disease. Mutations that disrupt this gene have been identified in other patients with this disorder. Here we describe the isolation and characterization of the complete open reading frame of the human CLCN5 gene, which is predicted to encode a protein of 746 amino acids, with significant homology to all known members of the ClC family of voltage-gated chloride channels. CLCN5 belongs to a distinct branch of this family, which also includes the recently identified genes CLCN3 and CLCN4. We have shown that the coding region of CLCN5 is organized into 12 exons, spanning 25-30 kb of genomic DNA, and have determined the sequence of each exon-intron boundary. The elucidation of the coding sequence and exon-intron organization of CLCN5 will both expedite the evaluation of structure/function relationships of these ion channels and facilitate the screening of other patients with renal tubular dysfunction for mutations at this locus.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steinkasserer, A.; Koettnitz, K.; Hauber, J.
1995-02-10
The eukaryotic initiation factor 5A (eIF-5A) has been identified as an essential cofactor for the HIV-1 transactivator protein Rev. Rev plays a key role in the complex regulation of HIV-1 gene expression and thereby in the generation of infectious virus particles. Expression of eIF-5A is vital for Rev function, and inhibition of this interaction leads to a block of the viral replication cycle. In humans, four different eIF-5A genes have been identified. One codes for the eIF-5A protein and the other three are pseudogenes. Using a panel of somatic rodent-human cell hybrids in combination with fluorescence in situ hybridization analysis,more » we show that the four genes map to three different chromosomes. The coding eIF-5A gene (EIF5A) maps to 17p12-p13, and the three pseudogenes EIF5AP1, EIF5AP2, and EIF5AP3 map to 10q23.3, 17q25, and 19q13.2, respectively. This is the first localization report for a eukaryotic cofactor for a regulatory HIV-1 protein. 16 refs., 1 fig.« less
Musante, Luciana; Bartsch, Oliver; Ropers, Hans-Hilger; Kalscheuer, Vera M
2004-05-12
Characterization of a balanced t(2;12)(q37;q24) translocation in a patient with suspicion of Noonan syndrome revealed that the chromosome 12 breakpoint lies in the vicinity of a novel human gene, thyroid hormone receptor-associated protein 2 (THRAP2). We therefore characterized this gene and its mouse counterpart in more detail. Human and mouse THRAP2/Thrap2 span a genomic region of about 310 and >170 kilobases (kb), and both contain 31 exons. Corresponding transcripts are approximately 9.5 kb long. Their open reading frames code for proteins of 2210 and 2203 amino acids, which are 93% identical. By northern blot analysis, human and mouse THRAP2/Thrap2 genes showed ubiquitous expression. Transcripts were most abundant in human skeletal muscle and in mouse heart. THRAP2 protein is 56% identical to human TRAP240, which belongs to the thyroid hormone receptor associated protein (TRAP) complex and is evolutionary conserved up to yeast. This complex is involved in transcriptional regulation and is believed to serve as adapting interface between regulatory proteins bound to specific DNA sequences and RNA polymerase II.
miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA
Hansen, Thomas B; Wiklund, Erik D; Bramsen, Jesper B; Villadsen, Sune B; Statham, Aaron L; Clark, Susan J; Kjems, Jørgen
2011-01-01
MicroRNAs (miRNAs) are ∼22 nt non-coding RNAs that typically bind to the 3′ UTR of target mRNAs in the cytoplasm, resulting in mRNA destabilization and translational repression. Here, we report that miRNAs can also regulate gene expression by targeting non-coding antisense transcripts in human cells. Specifically, we show that miR-671 directs cleavage of a circular antisense transcript of the Cerebellar Degeneration-Related protein 1 (CDR1) locus in an Ago2-slicer-dependent manner. The resulting downregulation of circular antisense has a concomitant decrease in CDR1 mRNA levels, independently of heterochromatin formation. This study provides the first evidence for non-coding antisense transcripts as functional miRNA targets, and a novel regulatory mechanism involving a positive correlation between mRNA and antisense circular RNA levels. PMID:21964070
Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans
Moretti, S.; Davydov, I.I.; Excoffier, L.
2017-01-01
Abstract Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim at detecting biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
Evaluation and Design of Genome-Wide CRISPR/SpCas9 Knockout Screens
Hart, Traver; Tong, Amy Hin Yan; Chan, Katie; Van Leeuwen, Jolanda; Seetharaman, Ashwin; Aregger, Michael; Chandrashekhar, Megha; Hustedt, Nicole; Seth, Sahil; Noonan, Avery; Habsid, Andrea; Sizova, Olga; Nedyalkova, Lyudmila; Climie, Ryan; Tworzyanski, Leanne; Lawson, Keith; Sartori, Maria Augusta; Alibeh, Sabriyeh; Tieu, David; Masud, Sanna; Mero, Patricia; Weiss, Alexander; Brown, Kevin R.; Usaj, Matej; Billmann, Maximilian; Rahman, Mahfuzur; Costanzo, Michael; Myers, Chad L.; Andrews, Brenda J.; Boone, Charles; Durocher, Daniel; Moffat, Jason
2017-01-01
The adaptation of CRISPR/SpCas9 technology to mammalian cell lines is transforming the study of human functional genomics. Pooled libraries of CRISPR guide RNAs (gRNAs) targeting human protein-coding genes and encoded in viral vectors have been used to systematically create gene knockouts in a variety of human cancer and immortalized cell lines, in an effort to identify whether these knockouts cause cellular fitness defects. Previous work has shown that CRISPR screens are more sensitive and specific than pooled-library shRNA screens in similar assays, but currently there exists significant variability across CRISPR library designs and experimental protocols. In this study, we reanalyze 17 genome-scale knockout screens in human cell lines from three research groups, using three different genome-scale gRNA libraries. Using the Bayesian Analysis of Gene Essentiality algorithm to identify essential genes, we refine and expand our previously defined set of human core essential genes from 360 to 684 genes. We use this expanded set of reference core essential genes, CEG2, plus empirical data from six CRISPR knockout screens to guide the design of a sequence-optimized gRNA library, the Toronto KnockOut version 3.0 (TKOv3) library. We then demonstrate the high effectiveness of the library relative to reference sets of essential and nonessential genes, as well as other screens using similar approaches. The optimized TKOv3 library, combined with the CEG2 reference set, provide an efficient, highly optimized platform for performing and assessing gene knockout screens in human cell lines. PMID:28655737
Smet, Annemieke; Martel, An; Persoons, Davy; Dewulf, Jeroen; Heyndrickx, Marc; Herman, Lieve; Haesebrouck, Freddy; Butaye, Patrick
2010-05-01
Broad-spectrum β-lactamase genes (coding for extended-spectrum β-lactamases and AmpC β-lactamases) have been frequently demonstrated in the microbiota of food-producing animals. This may pose a human health hazard as these genes may be present in zoonotic bacteria, which would cause a direct problem. They can also be present in commensals, which may act as a reservoir of resistance genes for pathogens causing disease both in humans and in animals. Broad-spectrum β-lactamase genes are frequently located on mobile genetic elements, such as plasmids, transposons and integrons, which often also carry additional resistance genes. This could limit treatment options for infections caused by broad-spectrum β-lactam-resistant microorganisms. This review addresses the growing burden of broad-spectrum β-lactam resistance among Enterobacteriaceae isolated from food, companion and wild animals worldwide. To explore the human health hazard, the diversity of broad-spectrum β-lactamases among Enterobacteriaceae derived from animals is compared with respect to their presence in human bacteria. Furthermore, the possibilities of the exchange of genes encoding broad-spectrum β-lactamases - including the exchange of the transposons and plasmids that serve as vehicles for these genes - between different ecosystems (human and animal) are discussed. © 2009 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Hu, Min; Chilton, Neil B; Gasser, Robin B
2002-02-01
The complete mitochondrial genome sequences were determined for two species of human hookworms, Ancylostoma duodenale (13,721 bp) and Necator americanus (13,604 bp). The circular hookworm genomes are amongst the smallest reported to date for any metazoan organism. Their relatively small size relates mainly to a reduced length in the AT-rich region. Both hookworm genomes encode 12 protein, two ribosomal RNA and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with three other species of Secernentea studied to date. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. For both hookworm species, genes were arranged in the same order as for Caenorhabditis elegans, except for the presence of a non-coding region between genes nad3 and nad5. In A. duodenale, this non-coding region is predicted to form a stem-and-loop structure which is not present in N. americanus. The mitochondrial genome structure for both hookworms differs from Ascaris suum only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, including four gene or gene-block translocations and the positions of some transfer RNA genes and the AT-rich region. Based on genome organisation and amino acid sequence identity, A. duodenale and N. americanus were more closely related to C. elegans than to A. suum or O. volvulus (all secernentean nematodes), consistent with a previous phylogenetic study using ribosomal DNA sequence data. Determination of the complete mitochondrial genome sequences for two human hookworms (the first members of the order Strongylida ever sequenced) provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
Tang, Guo-Qing; Maxwell, E. Stuart
2008-01-01
The amphibian Xenopus provides a model organism for investigating microRNA expression during vertebrate embryogenesis and development. Searching available Xenopus genome databases using known human pre-miRNAs as query sequences, more than 300 genes encoding 142 Xenopus tropicalis miRNAs were identified. Analysis of Xenopus tropicalis miRNA genes revealed a predominate positioning within introns of protein-coding and nonprotein-coding RNA Pol II-transcribed genes. MiRNA genes were also located in pre-mRNA exons and positioned intergenically between known protein-coding genes. Many miRNA species were found in multiple locations and in more than one genomic context. MiRNA genes were also clustered throughout the genome, indicating the potential for the cotranscription and coordinate expression of miRNAs located in a given cluster. Northern blot analysis confirmed the expression of many identified miRNAs in both X. tropicalis and X. laevis. Comparison of X. tropicalis and X. laevis blots revealed comparable expression profiles, although several miRNAs exhibited species-specific expression in different tissues. More detailed analysis revealed that for some miRNAs, the tissue-specific expression profile of the pri-miRNA precursor was distinctly different from that of the mature miRNA profile. Differential miRNA precursor processing in both the nucleus and cytoplasm was implicated in the observed tissue-specific differences. These observations indicated that post-transcriptional processing plays an important role in regulating miRNA expression in the amphibian Xenopus. PMID:18032731
Predicting Gene Structure Changes Resulting from Genetic Variants via Exon Definition Features.
Majoros, William H; Holt, Carson; Campbell, Michael S; Ware, Doreen; Yandell, Mark; Reddy, Timothy E
2018-04-25
Genetic variation that disrupts gene function by altering gene splicing between individuals can substantially influence traits and disease. In those cases, accurately predicting the effects of genetic variation on splicing can be highly valuable for investigating the mechanisms underlying those traits and diseases. While methods have been developed to generate high quality computational predictions of gene structures in reference genomes, the same methods perform poorly when used to predict the potentially deleterious effects of genetic changes that alter gene splicing between individuals. Underlying that discrepancy in predictive ability are the common assumptions by reference gene finding algorithms that genes are conserved, well-formed, and produce functional proteins. We describe a probabilistic approach for predicting recent changes to gene structure that may or may not conserve function. The model is applicable to both coding and noncoding genes, and can be trained on existing gene annotations without requiring curated examples of aberrant splicing. We apply this model to the problem of predicting altered splicing patterns in the genomes of individual humans, and we demonstrate that performing gene-structure prediction without relying on conserved coding features is feasible. The model predicts an unexpected abundance of variants that create de novo splice sites, an observation supported by both simulations and empirical data from RNA-seq experiments. While these de novo splice variants are commonly misinterpreted by other tools as coding or noncoding variants of little or no effect, we find that in some cases they can have large effects on splicing activity and protein products, and we propose that they may commonly act as cryptic factors in disease. The software is available from geneprediction.org/SGRF. bmajoros@duke.edu. Supplementary information is available at Bioinformatics online.
Regional localization of the human integrin {beta}{sub 6} gene (ITGB6) to chromosome 2q24-q31
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fernandez-Ruiz, E.; Sanchez-Madrid, F.
The heterodimer {alpha}{sub v}{beta}{sub 6} acts as a fibronectin receptor for human carcinoma cells. The authors report here the regional localization of the {beta}{sub 6} gene to 2q24-q31 by fluorescence in situ hybridization coupled with GTG-banding. This gene is located close to the region to which genes coding for the {alpha} subunits of the integrins VLA-4 and vitronectin receptor (ITGA4 and ITGAV, respectively) have been previously mapped (2q31-q32). These data suggest a proximal position of the integrin {beta}{sub 6} locus (ITGB6) on this integrin gene cluster. Futhermore, double-labeling in situ hybridization experiments performed with {alpha}{sub 4} and {alpha}{sub v} probesmore » indicated a telomeric position of ITGAV with respect to ITGA4. 22 refs., 2 figs.« less
Reuther, Peter; Göpfert, Kristina; Dudek, Alexandra H.; Heiner, Monika; Herold, Susanne; Schwemmle, Martin
2015-01-01
Influenza A viruses (IAV) pose a constant threat to the human population and therefore a better understanding of their fundamental biology and identification of novel therapeutics is of upmost importance. Various reporter-encoding IAV were generated to achieve these goals, however, one recurring difficulty was the genetic instability especially of larger reporter genes. We employed the viral NS segment coding for the non-structural protein 1 (NS1) and nuclear export protein (NEP) for stable expression of diverse reporter proteins. This was achieved by converting the NS segment into a single open reading frame (ORF) coding for NS1, the respective reporter and NEP. To allow expression of individual proteins, the reporter genes were flanked by two porcine Teschovirus-1 2A peptide (PTV-1 2A)-coding sequences. The resulting viruses encoding luciferases, fluorescent proteins or a Cre recombinase are characterized by a high genetic stability in vitro and in mice and can be readily employed for antiviral compound screenings, visualization of infected cells or cells that survived acute infection. PMID:26068081
Integrating non-coding RNAs in JAK-STAT regulatory networks
Witte, Steven; Muljo, Stefan A
2014-01-01
Being a well-characterized pathway, JAK-STAT signaling serves as a valuable paradigm for studying the architecture of gene regulatory networks. The discovery of untranslated or non-coding RNAs, namely microRNAs and long non-coding RNAs, provides an opportunity to elucidate their roles in such networks. In principle, these regulatory RNAs can act as downstream effectors of the JAK-STAT pathway and/or affect signaling by regulating the expression of JAK-STAT components. Examples of interactions between signaling pathways and non-coding RNAs have already emerged in basic cell biology and human diseases such as cancer, and can potentially guide the identification of novel biomarkers or drug targets for medicine. PMID:24778925
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer
Hunt, C; Morimoto, R I
1985-01-01
We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
Orden, José A; Horcajo, Pilar; de la Fuente, Ricardo; Ruiz-Santa-Quiteria, José A; Domínguez-Bernal, Gustavo; Carrión, Javier
2011-12-01
Subtilase cytotoxin (SubAB) from verotoxin (VT)-producing Escherichia coli (VTEC) strains was first described in the 98NK2 strain and has been associated with human disease. However, SubAB has recently been found in two VT-negative E. coli strains (ED 591 and ED 32). SubAB is encoded by two closely linked, cotranscribed genes (subA and subB). In this study, we investigated the presence of subAB genes in 52 VTEC strains isolated from cattle and 209 strains from small ruminants, using PCR. Most (91.9%) VTEC strains from sheep and goats and 25% of the strains from healthy cattle possessed subAB genes. The presence of subAB in a high percentage of the VTEC strains from small ruminants might increase the pathogenicity of these strains for human beings. Some differences in the results of PCRs and in the association with some virulence genes suggested the existence of different variants of subAB. We therefore sequenced the subA gene in 12 strains and showed that the subA gene in most of the subAB-positive VTEC strains from cattle was almost identical (about 99%) to that in the 98NK2 strain, while the subA gene in most of the subAB-positive VTEC strains from small ruminants was almost identical to that in the ED 591 strain. We propose the terms subAB1 to describe the SubAB-coding genes resembling that in the 98NK2 strain and subAB2 to describe those resembling that in the ED 591 strain.
Tong, Pin; Monahan, Jack; Prendergast, James G D
2017-03-01
Large-scale gene expression datasets are providing an increasing understanding of the location of cis-eQTLs in the human genome and their role in disease. However, little is currently known regarding the extent of regulatory site-sharing between genes. This is despite it having potentially wide-ranging implications, from the determination of the way in which genetic variants may shape multiple phenotypes to the understanding of the evolution of human gene order. By first identifying the location of non-redundant cis-eQTLs, we show that regulatory site-sharing is a relatively common phenomenon in the human genome, with over 10% of non-redundant regulatory variants linked to the expression of multiple nearby genes. We show that these shared, local regulatory sites are linked to high levels of chromatin looping between the regulatory sites and their associated genes. In addition, these co-regulated gene modules are found to be strongly conserved across mammalian species, suggesting that shared regulatory sites have played an important role in shaping human gene order. The association of these shared cis-eQTLs with multiple genes means they also appear to be unusually important in understanding the genetics of human phenotypes and pleiotropy, with shared regulatory sites more often linked to multiple human phenotypes than other regulatory variants. This study shows that regulatory site-sharing is likely an underappreciated aspect of gene regulation and has important implications for the understanding of various biological phenomena, including how the two and three dimensional structures of the genome have been shaped and the potential causes of disease pleiotropy outside coding regions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gladwin, A.J.; Dixon, J.; Loftus, S.K.
Heparan sulfate-N-deacetylase/N-sulfotransferase (HSST) catalyzes both the N-deacetylation and the N-sulfation of heparan sulfate. Previous studies have resulted in the isolation of the human HSST gene from within the Treacher Collins syndrome locus (TCOF1) critical region on 5q. In the present study, the genomic organization of the HSST gene has been elucidated, and the 14 exons identified have been tested for TCOF1-specific mutations. As a result of these studies, mutations within the coding sequence and adjacent splice junctions of HSST can be excluded from a causative role in the pathogenesis of Treacher Collins syndrome. 13 refs., 1 fig., 2 tabs.
Maruyama, Atsushi; Mimura, Junsei; Itoh, Ken
2014-01-01
Recent studies have disclosed the function of enhancer RNAs (eRNAs), which are long non-coding RNAs transcribed from gene enhancer regions, in transcriptional regulation. However, it remains unclear whether eRNAs are involved in the regulation of human heme oxygenase-1 gene (HO-1) induction. Here, we report that multiple nuclear-enriched eRNAs are transcribed from the regions adjacent to two human HO-1 enhancers (i.e. the distal E2 and proximal E1 enhancers), and some of these eRNAs are induced by the oxidative stress-causing reagent diethyl maleate (DEM). We demonstrated that the expression of one forward direction (5′ to 3′) eRNA transcribed from the human HO-1 E2 enhancer region (named human HO-1enhancer RNA E2-3; hereafter called eRNA E2-3) was induced by DEM in an NRF2-dependent manner in HeLa cells. Conversely, knockdown of BACH1, a repressor of HO-1 transcription, further increased DEM-inducible eRNA E2-3 transcription as well as HO-1 expression. In addition, we showed that knockdown of eRNA E2-3 selectively down-regulated DEM-induced HO-1 expression. Furthermore, eRNA E2-3 knockdown attenuated DEM-induced Pol II binding to the promoter and E2 enhancer regions of HO-1 without affecting NRF2 recruitment to the E2 enhancer. These findings indicate that eRNAE2-3 is functional and is required for HO-1 induction. PMID:25404134
Wilks, A; Black, S M; Miller, W L; Ortiz de Montellano, P R
1995-04-04
A human heme oxygenase (hHO-1) gene without the sequence coding for the last 23 amino acids has been expressed in Escherichia coli behind the pho A promoter. The truncated enzyme is obtained in high yields as a soluble, catalytically-active protein, making it available for the first time for detailed mechanistic studies. The purified, truncated hHO-1/heme complex is spectroscopically indistinguishable from that of the rat enzyme and converts heme to biliverdin when reconstituted with rat liver cytochrome P450 reductase. A self-sufficient heme oxygenase system has been obtained by fusing the truncated hHO-1 gene to the gene for human cytochrome P450 reductase without the sequence coding for the 20 amino acid membrane binding domain. Expression of the fusion protein in pCWori+ yields a protein that only requires NADPH for catalytic turnover. The failure of exogenous cytochrome P450 reductase to stimulate turnover and the insensitivity of the catalytic rate toward changes in ionic strength establish that electrons are transferred intramolecularly between the reductase and heme oxygenase domains of the fusion protein. The Vmax for the fusion protein is 2.5 times higher than that for the reconstituted system. Therefore, either the covalent tether does not interfere with normal docking and electron transfer between the flavin and heme domains or alternative but equally efficient electron transfer pathways are available that do not require specific docking.
An atlas of gene expression and gene co-regulation in the human retina.
Pinelli, Michele; Carissimo, Annamaria; Cutillo, Luisa; Lai, Ching-Hung; Mutarelli, Margherita; Moretti, Maria Nicoletta; Singh, Marwah Veer; Karali, Marianthi; Carrella, Diego; Pizzo, Mariateresa; Russo, Francesco; Ferrari, Stefano; Ponzin, Diego; Angelini, Claudia; Banfi, Sandro; di Bernardo, Diego
2016-07-08
The human retina is a specialized tissue involved in light stimulus transduction. Despite its unique biology, an accurate reference transcriptome is still missing. Here, we performed gene expression analysis (RNA-seq) of 50 retinal samples from non-visually impaired post-mortem donors. We identified novel transcripts with high confidence (Observed Transcriptome (ObsT)) and quantified the expression level of known transcripts (Reference Transcriptome (RefT)). The ObsT included 77 623 transcripts (23 960 genes) covering 137 Mb (35 Mb new transcribed genome). Most of the transcripts (92%) were multi-exonic: 81% with known isoforms, 16% with new isoforms and 3% belonging to new genes. The RefT included 13 792 genes across 94 521 known transcripts. Mitochondrial genes were among the most highly expressed, accounting for about 10% of the reads. Of all the protein-coding genes in Gencode, 65% are expressed in the retina. We exploited inter-individual variability in gene expression to infer a gene co-expression network and to identify genes specifically expressed in photoreceptor cells. We experimentally validated the photoreceptors localization of three genes in human retina that had not been previously reported. RNA-seq data and the gene co-expression network are available online (http://retina.tigem.it). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Characterization and mapping of the mouse NDP (Norrie disease) locus (Ndp).
Battinelli, E M; Boyd, Y; Craig, I W; Breakefield, X O; Chen, Z Y
1996-02-01
Norrie disease is a severe X-linked recessive neurological disorder characterized by congenital blindness with progressive loss of hearing. Over half of Norrie patients also manifest different degrees of mental retardation. The gene for Norrie disease (NDP) has recently been cloned and characterized. With the human NDP cDNA, mouse genomic phage libraries were screened for the homolog of the gene. Comparison between mouse and human genomic DNA blots hybridized with the NDP cDNA, as well as analysis of phage clones, shows that the mouse NDP gene is 29 kb in size (28 kb for the human gene). The organization in the two species is very similar. Both have three exons with similar-sized introns and identical exon-intron boundaries between exon 2 and 3. The mouse open reading frame is 393 bp and, like the human coding sequence, is encoded in exons 2 and 3. The absence of six nucleotides in the second mouse exon results in the encoded protein being two amino acids smaller than its human counterpart. The overall homology between the human and mouse NDP protein is 95% and is particularly high (99%) in exon 3, consistent with the apparent functional importance of this region. Analysis of transcription initiation sites suggests the presence of multiple start sites associated with expression of the mouse NDP gene. Pedigree analysis of an interspecific mouse backcross localizes the mouse NDP gene close to Maoa in the conserved segment, which runs from CYBB to PFC in both human and mouse.
Structure and genomic organization of the human B1 receptor gene for kinins (BDKRB1).
Bachvarov, D R; Hess, J F; Menke, J G; Larrivée, J F; Marceau, F
1996-05-01
Two subtypes of mammalian bradykinin receptors, B1 and B2 (BDKRB1 and BDKRB2), have been defined based on their pharmacological properties. The B1 type kinin receptors have weak affinity for intact BK or Lys-BK but strong affinity for kinin metabolites without the C-terminal arginine (e.g., des-Arg9-BK and Lys-des-Arg9-BK, also called des-Arg10-kallidin), which are generated by kininase I. The B1 receptor expression is up-regulated following tissue injury and inflammation (hyperemia, exudation, hyperalgesia, etc.). In the present study, we have cloned and sequenced the gene encoding human B1 receptor from a human genomic library. The human B1 receptor gene contains three exons separated by two introns. The first and the second exon are noncoding, while the coding region and the 3'-flanking region are located entirely on the third exon. The exon-intron arrangement of the human B1 receptor gene shows significant similarity with the genes encoding the B2 receptor subtype in human, mouse, and rat. Sequence analysis of the 5'-flanking region revealed the presence of a consensus TATA box and of numerous candidate transcription factor binding sequences. Primer extension experiments have shown the existence of multiple transcription initiation sites situated downstream and upstream from the consensus TATA box. Genomic Southern blot analysis indicated that the human B1 receptor is encoded by a single-copy gene.
Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert
2011-10-01
MicroRNAs are small, non-coding RNA molecules that can complementarily bind to the mRNA 3'-UTR region to regulate the gene expression by transcriptional repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites, information on all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed by using a newly developed algorithm named 'miRWalk' as well as with eight already established programs for putative miRNA binding sites. An automated and extensive text-mining search was performed on PubMed database to extract validated information on miRNAs. Combined information was put into a MySQL database. miRWalk presents predicted and validated information on miRNA-target interaction. Such a resource enables researchers to validate new targets of miRNA not only on 3'-UTR, but also on the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/. Copyright © 2011 Elsevier Inc. All rights reserved.
Imprinted and X-linked non-coding RNAs as potential regulators of human placental function
Buckberry, Sam; Bianco-Miotto, Tina; Roberts, Claire T
2014-01-01
Pregnancy outcome is inextricably linked to placental development, which is strictly controlled temporally and spatially through mechanisms that are only partially understood. However, increasing evidence suggests non-coding RNAs (ncRNAs) direct and regulate a considerable number of biological processes and therefore may constitute a previously hidden layer of regulatory information in the placenta. Many ncRNAs, including both microRNAs and long non-coding transcripts, show almost exclusive or predominant expression in the placenta compared with other somatic tissues and display altered expression patterns in placentas from complicated pregnancies. In this review, we explore the results of recent genome-scale and single gene expression studies using human placental tissue, but include studies in the mouse where human data are lacking. Our review focuses on the ncRNAs epigenetically regulated through genomic imprinting or X-chromosome inactivation and includes recent evidence surrounding the H19 lincRNA, the imprinted C19MC cluster microRNAs, and X-linked miRNAs associated with pregnancy complications. PMID:24081302
Genomic assessment of the evolution of the prion protein gene family in vertebrates.
Harrison, Paul M; Khachane, Amit; Kumar, Manish
2010-05-01
Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified, that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF), in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel, was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage, since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents, is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long, interspersed elements. These annotations and evolutionary analyses will be of further use for functional characterisation of the PrP-GF, and will be updatable in a semi-automated fashion as more genomes accumulate. Copyright 2010 Elsevier Inc. All rights reserved.
Brancaccio, Mariarita; Coretti, Lorena; Florio, Ermanno; Pezone, Antonio; Calabrò, Viola; Falco, Geppino; Keller, Simona; Lembo, Francesca; Avvedimento, Vittorio Enrico; Chiariotti, Lorenzo
2016-01-01
Bacterial lipopolysaccharide (LPS) induces release of inflammatory mediators both in immune and epithelial cells. We investigated whether changes of epigenetic marks, including selected histone modification and DNA methylation, may drive or accompany the activation of COX-2 gene in HT-29 human intestinal epithelial cells upon exposure to LPS. Here we describe cyclical histone acetylation (H3), methylation (H3K4, H3K9, H3K27) and DNA methylation changes occurring at COX-2 gene promoter overtime after LPS stimulation. Histone K27 methylation changes are carried out by the H3 demethylase JMJD3 and are essential for COX-2 induction by LPS. The changes of the histone code are associated with cyclical methylation signatures at the promoter and gene body of COX-2 gene. PMID:27253528
Weil, D; Levy, G; Sahly, I; Levi-Acobas, F; Blanchard, S; El-Amraoui, A; Crozet, F; Philippe, H; Abitbol, M; Petit, C
1996-04-16
The gene encoding human myosin VIIA is responsible for Usher syndrome type III (USH1B), a disease which associates profound congenital sensorineural deafness, vestibular dysfunction, and retinitis pigmentosa. The reconstituted cDNA sequence presented here predicts a 2215 amino acid protein with a typical unconventional myosin structure. This protein is expected to dimerize into a two-headed molecule. The C terminus of its tail shares homology with the membrane-binding domain of the band 4.1 protein superfamily. The gene consists of 48 coding exons. It encodes several alternatively spliced forms. In situ hybridization analysis in human embryos demonstrates that the myosin VIIA gene is expressed in the pigment epithelium and the photoreceptor cells of the retina, thus indicating that both cell types may be involved in the USH1B retinal degenerative process. In addition, the gene is expressed in the human embryonic cochlear and vestibular neuroepithelia. We suggest that deafness and vestibular dysfunction in USH1B patients result from a defect in the morphogenesis of the inner ear sensory cell stereocilia.
Liu, Guo-Hua; Li, Sheng; Zou, Feng-Cai; Wang, Chun-Ren; Zhu, Xing-Quan
2016-01-01
Passalurus ambiguus (Nematda: Oxyuridae) is a common pinworm which parasitizes in the caecum and colon of rabbits. Despite its significance as a pathogen, the epidemiology, genetics, systematics, and biology of this pinworm remain poorly understood. In the present study, we sequenced the complete mitochondrial (mt) genome of P. ambiguus. The circular mt genome is 14,023 bp in size and encodes of 36 genes, including 12 protein-coding, two ribosomal RNA, and 22 transfer RNA genes. The mt gene order of P. ambiguus is the same as that of Wellcomia siamensis, but distinct from that of Enterobius vermicularis. Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes by Bayesian inference (BI) showed that P. ambiguus was more closely related to W. siamensis than to E. vermicularis. This mt genome provides novel genetic markers for studying the molecular epidemiology, population genetics, systematics of pinworm of animals and humans, and should have implications for the diagnosis, prevention, and control of passaluriasis in rabbits and other animals.
Javierre, Biola M; Burren, Oliver S; Wilder, Steven P; Kreuzhuber, Roman; Hill, Steven M; Sewitz, Sven; Cairns, Jonathan; Wingett, Steven W; Várnai, Csilla; Thiecke, Michiel J; Burden, Frances; Farrow, Samantha; Cutler, Antony J; Rehnström, Karola; Downes, Kate; Grassi, Luigi; Kostadima, Myrto; Freire-Pritchett, Paula; Wang, Fan; Stunnenberg, Hendrik G; Todd, John A; Zerbino, Daniel R; Stegle, Oliver; Ouwehand, Willem H; Frontini, Mattia; Wallace, Chris; Spivakov, Mikhail; Fraser, Peter
2016-11-17
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
MicroRNAs in genetic disease: rethinking the dosage.
Henrion-Caude, Alexandra; Girard, Muriel; Amiel, Jeanne
2012-08-01
To date, the general assumption was that most mutations interested protein-coding genes only. Thus, only few illustrations have mentioned here that mutations may occur in non-protein coding genes such as microRNAs (miRNAs). We thus report progress in delineating their contribution as phenotypic modulators, genetic switches and fine-tuners of gene expression. We reasoned that browsing their contribution to genetic disease may provide a framework for understanding the proper requirements to devise miRNA-based therapy strategies, in particular the relief of an appropriate dosage. Gain and loss of function of miRNA enforce the need to respectively antagonize or supply the miRNAs. We further categorized human disease according to the different ways in which the miRNA was altered arising either de novo, or inherited whether as a mendelian or as an epistatic trait, uncovering its role in epigenetics. We discuss how improving our knowledge on the contribution of miRNAs to genetic disease may be beneficial to devise appropriate gene therapy strategies.
Roberts, Thomas; Roiser, Jonathan P
2010-11-01
The human leukocyte antigen (HLA) is the most polymorphic region of the genome, coding for proteins that mediate human immune response. This polymorphism may be maintained by balancing selection and certain populations show deviations from expected gene frequencies. Supporting this hypothesis, studies into olfactory preferences have suggested that females prefer the scent of males with dissimilar HLA to their own. However, it has also been proposed that androstenones play a role in female mate choice, and as these molecules inhibit the immune system, this has implications for the theory of HLA-driven mate preference. This review will critically analyze the findings of studies investigating olfactory preference in humans, and their implications for these two contrasting theories of mate choice.
Transfer of carbohydrate-active enzymes from marine bacteria to Japanese gut microbiota.
Hehemann, Jan-Hendrik; Correc, Gaëlle; Barbeyron, Tristan; Helbert, William; Czjzek, Mirjam; Michel, Gurvan
2010-04-08
Gut microbes supply the human body with energy from dietary polysaccharides through carbohydrate active enzymes, or CAZymes, which are absent in the human genome. These enzymes target polysaccharides from terrestrial plants that dominated diet throughout human evolution. The array of CAZymes in gut microbes is highly diverse, exemplified by the human gut symbiont Bacteroides thetaiotaomicron, which contains 261 glycoside hydrolases and polysaccharide lyases, as well as 208 homologues of susC and susD-genes coding for two outer membrane proteins involved in starch utilization. A fundamental question that, to our knowledge, has yet to be addressed is how this diversity evolved by acquiring new genes from microbes living outside the gut. Here we characterize the first porphyranases from a member of the marine Bacteroidetes, Zobellia galactanivorans, active on the sulphated polysaccharide porphyran from marine red algae of the genus Porphyra. Furthermore, we show that genes coding for these porphyranases, agarases and associated proteins have been transferred to the gut bacterium Bacteroides plebeius isolated from Japanese individuals. Our comparative gut metagenome analyses show that porphyranases and agarases are frequent in the Japanese population and that they are absent in metagenome data from North American individuals. Seaweeds make an important contribution to the daily diet in Japan (14.2 g per person per day), and Porphyra spp. (nori) is the most important nutritional seaweed, traditionally used to prepare sushi. This indicates that seaweeds with associated marine bacteria may have been the route by which these novel CAZymes were acquired in human gut bacteria, and that contact with non-sterile food may be a general factor in CAZyme diversity in human gut microbes.
GENETICALLY MODIFIED FOODS: TECHNOLOGICAL BREAKTHROUGH OR ECOLOGICAL NIGHMARE?
Fifty years ago, Wastson and Crick described the structure of DNA, setting the stage for the past decade's biotechnology revolution. Scientists have now broken the code of the entire human genome, and delineated the function of multiple genes; similar strides are being taken with...
Population-specific variation in haplotype composition and heterozygosity at the POLB locus.
Yamtich, Jennifer; Speed, William C; Straka, Eva; Kidd, Judith R; Sweasy, Joann B; Kidd, Kenneth K
2009-05-01
DNA polymerase beta plays a central role in base excision repair (BER), which removes large numbers of endogenous DNA lesions from each cell on a daily basis. Little is currently known about germline polymorphisms within the POLB locus, making it difficult to study the association of variants at this locus with human diseases such as cancer. Yet, approximately thirty percent of human tumor types show variants of DNA polymerase beta. We have assessed the global frequency distributions of coding and common non-coding SNPs in and flanking the POLB gene for a total of 14 sites typed in approximately 2400 individuals from anthropologically defined human populations worldwide. We have found a marked difference between haplotype frequencies in African populations and in non-African populations.
Wan, Jijun; Yourshaw, Michael; Mamsa, Hafsa; Rudnik-Schöneborn, Sabine; Menezes, Manoj P.; Hong, Ji Eun; Leong, Derek W.; Senderek, Jan; Salman, Michael S.; Chitayat, David; Seeman, Pavel; von Moers, Arpad; Graul-Neumann, Luitgard; Kornberg, Andrew J.; Castro-Gago, Manuel; Sobrido, María-Jesús; Sanefuji, Masafumi; Shieh, Perry B.; Salamon, Noriko; Kim, Ronald C.; Vinters, Harry V.; Chen, Zugen; Zerres, Klaus; Ryan, Monique M.; Nelson, Stanley F.; Jen, Joanna C.
2012-01-01
RNA exosomes are multi-subunit complexes conserved throughout evolution1 and emerging as the major cellular machinery for processing, surveillance, and turnover of a diverse spectrum of coding and non-coding RNA substrates essential for viability2. By exome sequencing, we discovered recessive mutations in exosome component 3 (EXOSC3) in four siblings with infantile spinal motor neuron disease, cerebellar atrophy, progressive microcephaly, and profound global developmental delay, consistent with pontocerebellar hypoplasia type 1 [PCH1; OMIM 607596]3–6. We identified mutations in EXOSC3 in an additional 8 of 12 families with PCH1. Morpholino knockdown of exosc3 in zebrafish embryos caused embryonic maldevelopment with small brain and poor motility, reminiscent of human clinical features and largely rescued by coinjected wildtype but not mutant exosc3 mRNA. These findings represent the first example of an RNA exosome gene responsible for a human disease and further implicate dysregulation of RNA processing in cerebellar and spinal motor neuron maldevelopment and degeneration. PMID:22544365
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pech, M.; Gazit, A.; Arnstein, P.
1989-04-01
A retrovirus containing the entire human platelet-derived growth factor B-chain (PDGF-B) gene was constructed in order to investigate the in vivo biological activity of its encoded growth factor. When this virus was introduced into newborn mice, it reproducibly generated fibrosarcomas at the site of inoculation. Proviruses in each fibrosarcoma analyzed had lost 149 nucleotides downstream of the PDGF-B coding region. This deletion originated from an alternative or aberrant splice event that occurred within exon 7 of the PDGF-B gene and mimicked the v-sis oncogene. Thus, deletion of this region may be necessary for efficient retrovirus replication or for more potentmore » transforming function. Evidence that the normal growth factor coding sequence was unaltered derived from RNase protection studies and immunoprecipitation analysis. Tumors were generally polyclonal but demonstrated clonal subpopulations. Moreover, tumor-derived cell lines became monoclonal within a few tissue culture passages and rapidly formed tumors in vivo. These findings argue that overexpression of the normal human PDGF-B gene product under retrovirus control can induce the fully malignant phenotype.« less
Epigenetic regulation in human melanoma: past and future.
Sarkar, Debina; Leung, Euphemia Y; Baguley, Bruce C; Finlay, Graeme J; Askarian-Amiri, Marjan E
2015-01-01
The development and progression of melanoma have been attributed to independent or combined genetic and epigenetic events. There has been remarkable progress in understanding melanoma pathogenesis in terms of genetic alterations. However, recent studies have revealed a complex involvement of epigenetic mechanisms in the regulation of gene expression, including methylation, chromatin modification and remodeling, and the diverse activities of non-coding RNAs. The roles of gene methylation and miRNAs have been relatively well studied in melanoma, but other studies have shown that changes in chromatin status and in the differential expression of long non-coding RNAs can lead to altered regulation of key genes. Taken together, they affect the functioning of signaling pathways that influence each other, intersect, and form networks in which local perturbations disturb the activity of the whole system. Here, we focus on how epigenetic events intertwine with these pathways and contribute to the molecular pathogenesis of melanoma.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.
The human protein S locus on chromosome 3 consists of two protein S genes, PS{alpha} and PS{beta}. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PS{alpha} gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5{prime}-noncoding region. Both transcripts contain a 5{prime}-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing ofmore » an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PS{alpha} gene fully supports the hypothesis that the protein S gene is the product of an evolutional assembling process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PS{beta} gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall the two genes PS{alpha} and PS{beta} show between their exonic sequences 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene has occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implicates that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.« less
Staphylococcus aureus Transcriptome Architecture: From Laboratory to Infection-Mimicking Conditions
Depke, Maren; Pané-Farré, Jan; Debarbouille, Michel; van der Kooi-Pol, Magdalena M.; Guérin, Cyprien; Dérozier, Sandra; Hiron, Aurelia; Jarmer, Hanne; Leduc, Aurélie; Michalik, Stephan; Reilman, Ewoud; Schaffer, Marc; Schmidt, Frank; Bessières, Philippe; Noirot, Philippe; Hecker, Michael; Msadek, Tarek; Völker, Uwe; van Dijl, Jan Maarten
2016-01-01
Staphylococcus aureus is a major pathogen that colonizes about 20% of the human population. Intriguingly, this Gram-positive bacterium can survive and thrive under a wide range of different conditions, both inside and outside the human body. Here, we investigated the transcriptional adaptation of S. aureus HG001, a derivative of strain NCTC 8325, across experimental conditions ranging from optimal growth in vitro to intracellular growth in host cells. These data establish an extensive repertoire of transcription units and non-coding RNAs, a classification of 1412 promoters according to their dependence on the RNA polymerase sigma factors SigA or SigB, and allow identification of new potential targets for several known transcription factors. In particular, this study revealed a relatively low abundance of antisense RNAs in S. aureus, where they overlap only 6% of the coding genes, and only 19 antisense RNAs not co-transcribed with other genes were found. Promoter analysis and comparison with Bacillus subtilis links the small number of antisense RNAs to a less profound impact of alternative sigma factors in S. aureus. Furthermore, we revealed that Rho-dependent transcription termination suppresses pervasive antisense transcription, presumably originating from abundant spurious transcription initiation in this A+T-rich genome, which would otherwise affect expression of the overlapped genes. In summary, our study provides genome-wide information on transcriptional regulation and non-coding RNAs in S. aureus as well as new insights into the biological function of Rho and the implications of spurious transcription in bacteria. PMID:27035918
Bacterium induces cryptic meroterpenoid pathway in the pathogenic fungus Aspergillus fumigatus.
König, Claudia C; Scherlach, Kirstin; Schroeckh, Volker; Horn, Fabian; Nietzsche, Sandor; Brakhage, Axel A; Hertweck, Christian
2013-05-27
Stimulating encounter: The intimate, physical interaction between the soil-derived bacterium Streptomyces rapamycinicus and the human pathogenic fungus Aspergillus fumigatus led to the activation of an otherwise silent polyketide synthase (PKS) gene cluster coding for an unusual prenylated polyphenol (fumicycline A). The meroterpenoid pathway is regulated by a pathway-specific activator gene as well as by epigenetic factors. Copyright © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Samusik, Nikolay; Krukovskaya, Larisa; Meln, Irina; Shilov, Evgeny; Kozlov, Andrey P.
2013-01-01
PBOV1 is a known human protein-coding gene with an uncharacterized function. We have previously found that PBOV1 lacks orthologs in non-primate genomes and is expressed in a wide range of tumor types. Here we report that PBOV1 protein-coding sequence is human-specific and has originated de novo in the primate evolution through a series of frame-shift and stop codon mutations. We profiled PBOV1 expression in multiple cancer and normal tissue samples and found that it was expressed in 19 out of 34 tumors of various origins but completely lacked expression in any of the normal adult or fetal human tissues. We found that, unlike the cancer/testis antigens that are typically controlled by CpG island-containing promoters, PBOV1 was expressed from a GC-poor TATA-containing promoter which was not influenced by CpG demethylation and was inactive in testis. Our analysis of public microarray data suggests that PBOV1 activation in tumors could be dependent on the Hedgehog signaling pathway. Despite the recent de novo origin and the lack of identifiable functional signatures, a missense SNP in the PBOV1 coding sequence has been previously associated with an increased risk of breast cancer. Using publicly available microarray datasets, we found that high levels of PBOV1 expression in breast cancer and glioma samples were significantly associated with a positive outcome of the cancer disease. We also found that PBOV1 was highly expressed in primary but not in recurrent high-grade gliomas, suggesting the presence of a negative selection against PBOV1-expressing cancer cells. Our findings could contribute to the understanding of the mechanisms behind de novo gene origin and the possible role of tumors in this process. PMID:23418531
Assignment of the {beta}-arrestin 1 gene (ARRB1) to human chromosome 11q13
DOE Office of Scientific and Technical Information (OSTI.GOV)
Calabrese, G.; Morizio, E.; Palka, G.
1994-11-01
Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: {beta}-adrenergic receptor kinase ({beta}ARK), which phosphorylates the agonist-occupied receptor, and its functional cofactor, {beta}-arrestin. {beta}ARK is a member of a multigene family, consisting of six known subtypes, which have also been named G-protein-coupled receptor kinases (GRK 1 to 6) due to the apparently unique functional association of such kinases with this receptor family. The gene for {beta}ARK1 has been localized to human chromosome 11q13. The four members of the arrestin/{beta}-arrestin gene family identified so far are arrestin, X-arrestin, {beta}-arrestin 1, and {beta}-arrestin 2. Here themore » authors report the chromosome mapping of the human gene for {beta}-arrestin 1 (ARRB1) to chromosome 11q13 by fluorescence in situ hybridization (FISH). Two-color FISH confirmed that the two genes coding for the functionally related proteins {beta}ARK1 and {beta}arrestin 1 both map to 11q13. 16 refs., 1 fig., 1 tab.« less
An Integrated Encyclopedia of DNA Elements in the Human Genome
2012-01-01
Summary The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall the project provides new insights into the organization and regulation of our genes and genome, and an expansive resource of functional annotations for biomedical research. PMID:22955616
Moqtaderi, Zarmik; Wang, Jie; Raha, Debasish; White, Robert J.; Snyder, Michael; Weng, Zhiping; Struhl, Kevin
2012-01-01
Genome-wide occupancy profiles of five components of the RNA Polymerase III (Pol III) machinery in human cells identified the expected tRNA and non-coding RNA targets and revealed many additional Pol III-associated loci, mostly near SINEs. Several genes are targets of an alternative TFIIIB containing Brf2 instead of Brf1 and have extremely low levels of TFIIIC. Strikingly, expressed Pol III genes, unlike non-expressed Pol III genes, are situated in regions with a pattern of histone modifications associated with functional Pol II promoters. TFIIIC alone associates with numerous ETC loci, via the B box or a novel motif. ETCs are often near CTCF binding sites, suggesting a potential role in chromosome organization. Our results suggest that human Pol III complexes associate preferentially with regions near functional Pol II promoters and that TFIIIC-mediated recruitment of TFIIIB is regulated in a locus-specific manner. PMID:20418883
Gene expression regulation by upstream open reading frames and human disease.
Barbosa, Cristina; Peixeiro, Isabel; Romão, Luísa
2013-01-01
Upstream open reading frames (uORFs) are major gene expression regulatory elements. In many eukaryotic mRNAs, one or more uORFs precede the initiation codon of the main coding region. Indeed, several studies have revealed that almost half of human transcripts present uORFs. Very interesting examples have shown that these uORFs can impact gene expression of the downstream main ORF by triggering mRNA decay or by regulating translation. Also, evidence from recent genetic and bioinformatic studies implicates disturbed uORF-mediated translational control in the etiology of many human diseases, including malignancies, metabolic or neurologic disorders, and inherited syndromes. In this review, we will briefly present the mechanisms through which uORFs regulate gene expression and how they can impact on the organism's response to different cell stress conditions. Then, we will emphasize the importance of these structures by illustrating, with specific examples, how disturbed uORF-mediated translational control can be involved in the etiology of human diseases, giving special importance to genotype-phenotype correlations. Identifying and studying more cases of uORF-altering mutations will help us to understand and establish genotype-phenotype associations, leading to advancements in diagnosis, prognosis, and treatment of many human disorders.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gantz, I.; Yamada, Tadataka; Tashiro, Takao
1994-01-15
[alpha]-Melanocyte stimulating hormone ([alpha]-MSH), a hormone originally named for its ability to regulate pigmentation of melanocytes, is a 13-amino-acid post-translational product of the pro-opiomelanocortin (POMC) gene. [alpha]-MSH and the other products of POMC processing, which share the core heptapeptide amino acid sequence Met-Glu (Gly)-His-Phe-Arg-Trp-Gly (Asp), the adrenocorticotropic hormone (ACTH), [beta]-MSH, and [gamma]-MSH, are collectively referred to as melanocortins. While best known for their effects on the melanocyte (pigmentation) and adrenal cortical cells (steroidogenesis), melanocortins have been postulated to function in diverse activities, including enhancement of learning and memory, control of the cardiovascular system, analgesia, thermoregulation, immunomodulation, parturition, and neurotrophism. Tomore » identify the chromosomal band encoding the human melanocortin-1 receptor gene, 1 [mu]g of an EMBL clone coding region of the human MC1R and approximately 15 kb of surrounding DNA was labeled with biotin and hybridized to human metaphase chromosomes as previously described. The results indicate that the human MC1R gene is localized to 16q24.3. 15 refs., 1 fig.« less
Chang, Anne Lynn S; Bitter, Patrick H; Qu, Kun; Lin, Meihong; Rapicavoli, Nicole A; Chang, Howard Y
2013-01-01
Studies in model organisms suggest that aged cells can be functionally rejuvenated, but whether this concept applies to human skin is unclear. Here we apply 3′-end sequencing for expression quantification (“3-seq”) to discover the gene expression program associated with human photoaging and intrinsic skin aging (collectively termed “skin aging”), and the impact of broadband light (BBL) treatment. We find that skin aging was associated with a significantly altered expression level of 2,265 coding and noncoding RNAs, of which 1,293 became “rejuvenated” after BBL treatment; i.e., they became more similar to their expression level in youthful skin. Rejuvenated genes (RGs) included several known key regulators of organismal longevity and their proximal long noncoding RNAs. Skin aging is not associated with systematic changes in 3′-end mRNA processing. Hence, BBL treatment can restore gene expression pattern of photoaged and intrinsically aged human skin to resemble young skin. In addition, our data reveal, to our knowledge, a previously unreported set of targets that may lead to new insights into the human skin aging process. PMID:22931923
Researchers led by Ashish Lal, Ph.D., Investigator in the Genetics Branch, have shown that when the DNA in human colon cancer cells is damaged, a long non-coding RNA (lncRNA) regulates the expression of genes that halt growth, which allows the cells to repair the damage and promote survival. Their findings suggest an important pro-survival function of a lncRNA in cancer cells. Read more...
Discovery of stimulation-responsive immune enhancers with CRISPR activation
Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander
2017-01-01
The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues1–3. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption4–6, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa)7 to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs. PMID:28854172
Discovery of stimulation-responsive immune enhancers with CRISPR activation.
Simeonov, Dimitre R; Gowen, Benjamin G; Boontanrart, Mandy; Roth, Theodore L; Gagnon, John D; Mumbach, Maxwell R; Satpathy, Ansuman T; Lee, Youjin; Bray, Nicolas L; Chan, Alice Y; Lituiev, Dmytro S; Nguyen, Michelle L; Gate, Rachel E; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M; Mitros, Therese; Ray, Graham J; Curie, Gemma L; Naddaf, Nicki; Chu, Julia S; Ma, Hong; Boyer, Eric; Van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R; Schumann, Kathrin; Daly, Mark J; Farh, Kyle K; Ansel, K Mark; Ye, Chun J; Greenleaf, William J; Anderson, Mark S; Bluestone, Jeffrey A; Chang, Howard Y; Corn, Jacob E; Marson, Alexander
2017-09-07
The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (T H 17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Greenwald, Scott H.; Kuchenbecker, James A.; Rowlan, Jessica S.; Neitz, Jay; Neitz, Maureen
2017-01-01
Purpose Human long (L) and middle (M) wavelength cone opsin genes are highly variable due to intermixing. Two L/M cone opsin interchange mutants, designated LIAVA and LVAVA, are associated with clinical diagnoses, including red-green color vision deficiency, blue cone monochromacy, cone degeneration, myopia, and Bornholm Eye Disease. Because the protein and splicing codes are carried by the same nucleotides, intermixing L and M genes can cause disease by affecting protein structure and splicing. Methods Genetically engineered mice were created to allow investigation of the consequences of altered protein structure alone, and the effects on cone morphology were examined using immunohistochemistry. In humans and mice, cone function was evaluated using the electroretinogram (ERG) under L/M- or short (S) wavelength cone isolating conditions. Effects of LIAVA and LVAVA genes on splicing were evaluated using a minigene assay. Results ERGs and histology in mice revealed protein toxicity for the LVAVA but not for the LIAVA opsin. Minigene assays showed that the dominant messenger RNA (mRNA) was aberrantly spliced for both variants; however, the LVAVA gene produced a small but significant amount of full-length mRNA and LVAVA subjects had correspondingly reduced ERG amplitudes. In contrast, the LIAVA subject had no L/M cone ERG. Conclusions Dramatic differences in phenotype can result from seemingly minor differences in genotype through divergent effects on the dual amino acid and splicing codes. Translational Relevance The mechanism by which individual mutations contribute to clinical phenotypes provides valuable information for diagnosis and prognosis of vision disorders associated with L/M interchange mutations, and it informs strategies for developing therapies. PMID:28516000
Discovery of stimulation-responsive immune enhancers with CRISPR activation
NASA Astrophysics Data System (ADS)
Simeonov, Dimitre R.; Gowen, Benjamin G.; Boontanrart, Mandy; Roth, Theodore L.; Gagnon, John D.; Mumbach, Maxwell R.; Satpathy, Ansuman T.; Lee, Youjin; Bray, Nicolas L.; Chan, Alice Y.; Lituiev, Dmytro S.; Nguyen, Michelle L.; Gate, Rachel E.; Subramaniam, Meena; Li, Zhongmei; Woo, Jonathan M.; Mitros, Therese; Ray, Graham J.; Curie, Gemma L.; Naddaf, Nicki; Chu, Julia S.; Ma, Hong; Boyer, Eric; van Gool, Frederic; Huang, Hailiang; Liu, Ruize; Tobin, Victoria R.; Schumann, Kathrin; Daly, Mark J.; Farh, Kyle K.; Ansel, K. Mark; Ye, Chun J.; Greenleaf, William J.; Anderson, Mark S.; Bluestone, Jeffrey A.; Chang, Howard Y.; Corn, Jacob E.; Marson, Alexander
2017-09-01
The majority of genetic variants associated with common human diseases map to enhancers, non-coding elements that shape cell-type-specific transcriptional programs and responses to extracellular cues. Systematic mapping of functional enhancers and their biological contexts is required to understand the mechanisms by which variation in non-coding genetic sequences contributes to disease. Functional enhancers can be mapped by genomic sequence disruption, but this approach is limited to the subset of enhancers that are necessary in the particular cellular context being studied. We hypothesized that recruitment of a strong transcriptional activator to an enhancer would be sufficient to drive target gene expression, even if that enhancer was not currently active in the assayed cells. Here we describe a discovery platform that can identify stimulus-responsive enhancers for a target gene independent of stimulus exposure. We used tiled CRISPR activation (CRISPRa) to synthetically recruit a transcriptional activator to sites across large genomic regions (more than 100 kilobases) surrounding two key autoimmunity risk loci, CD69 and IL2RA. We identified several CRISPRa-responsive elements with chromatin features of stimulus-responsive enhancers, including an IL2RA enhancer that harbours an autoimmunity risk variant. Using engineered mouse models, we found that sequence perturbation of the disease-associated Il2ra enhancer did not entirely block Il2ra expression, but rather delayed the timing of gene activation in response to specific extracellular signals. Enhancer deletion skewed polarization of naive T cells towards a pro-inflammatory T helper (TH17) cell state and away from a regulatory T cell state. This integrated approach identifies functional enhancers and reveals how non-coding variation associated with human immune dysfunction alters context-specific gene programs.
Genome assembly and geospatial phylogenomics of the bed bug Cimex lectularius.
Rosenfeld, Jeffrey A; Reeves, Darryl; Brugler, Mercer R; Narechania, Apurva; Simon, Sabrina; Durrett, Russell; Foox, Jonathan; Shianna, Kevin; Schatz, Michael C; Gandara, Jorge; Afshinnekoo, Ebrahim; Lam, Ernest T; Hastie, Alex R; Chan, Saki; Cao, Han; Saghbini, Michael; Kentsis, Alex; Planet, Paul J; Kholodovych, Vladyslav; Tessler, Michael; Baker, Richard; DeSalle, Rob; Sorkin, Louis N; Kolokotronis, Sergios-Orestis; Siddall, Mark E; Amato, George; Mason, Christopher E
2016-02-02
The common bed bug (Cimex lectularius) has been a persistent pest of humans for thousands of years, yet the genetic basis of the bed bug's basic biology and adaptation to dense human environments is largely unknown. Here we report the assembly, annotation and phylogenetic mapping of the 697.9-Mb Cimex lectularius genome, with an N50 of 971 kb, using both long and short read technologies. A RNA-seq time course across all five developmental stages and male and female adults generated 36,985 coding and noncoding gene models. The most pronounced change in gene expression during the life cycle occurs after feeding on human blood and included genes from the Wolbachia endosymbiont, which shows a simultaneous and coordinated host/commensal response to haematophagous activity. These data provide a rich genetic resource for mapping activity and density of C. lectularius across human hosts and cities, which can help track, manage and control bed bug infestations.
Genome assembly and geospatial phylogenomics of the bed bug Cimex lectularius
Rosenfeld, Jeffrey A.; Reeves, Darryl; Brugler, Mercer R.; Narechania, Apurva; Simon, Sabrina; Durrett, Russell; Foox, Jonathan; Shianna, Kevin; Schatz, Michael C.; Gandara, Jorge; Afshinnekoo, Ebrahim; Lam, Ernest T.; Hastie, Alex R.; Chan, Saki; Cao, Han; Saghbini, Michael; Kentsis, Alex; Planet, Paul J.; Kholodovych, Vladyslav; Tessler, Michael; Baker, Richard; DeSalle, Rob; Sorkin, Louis N.; Kolokotronis, Sergios-Orestis; Siddall, Mark E.; Amato, George; Mason, Christopher E.
2016-01-01
The common bed bug (Cimex lectularius) has been a persistent pest of humans for thousands of years, yet the genetic basis of the bed bug's basic biology and adaptation to dense human environments is largely unknown. Here we report the assembly, annotation and phylogenetic mapping of the 697.9-Mb Cimex lectularius genome, with an N50 of 971 kb, using both long and short read technologies. A RNA-seq time course across all five developmental stages and male and female adults generated 36,985 coding and noncoding gene models. The most pronounced change in gene expression during the life cycle occurs after feeding on human blood and included genes from the Wolbachia endosymbiont, which shows a simultaneous and coordinated host/commensal response to haematophagous activity. These data provide a rich genetic resource for mapping activity and density of C. lectularius across human hosts and cities, which can help track, manage and control bed bug infestations. PMID:26836631
The developmental transcriptome of Drosophila melanogaster
DOE Office of Scientific and Technical Information (OSTI.GOV)
University of Connecticut; Graveley, Brenton R.; Brooks, Angela N.
Drosophila melanogaster is one of the most well studied genetic model organisms; nonetheless, its genome still contains unannotated coding and non-coding genes, transcripts, exons and RNA editing sites. Full discovery and annotation are pre-requisites for understanding how the regulation of transcription, splicing and RNA editing directs the development of this complex organism. Here we used RNA-Seq, tiling microarrays and cDNA sequencing to explore the transcriptome in 30 distinct developmental stages. We identified 111,195 new elements, including thousands of genes, coding and non-coding transcripts, exons, splicing and editing events, and inferred protein isoforms that previously eluded discovery using established experimental, predictionmore » and conservation-based approaches. These data substantially expand the number of known transcribed elements in the Drosophila genome and provide a high-resolution view of transcriptome dynamics throughout development. Drosophila melanogaster is an important non-mammalian model system that has had a critical role in basic biological discoveries, such as identifying chromosomes as the carriers of genetic information and uncovering the role of genes in development. Because it shares a substantial genic content with humans, Drosophila is increasingly used as a translational model for human development, homeostasis and disease. High-quality maps are needed for all functional genomic elements. Previous studies demonstrated that a rich collection of genes is deployed during the life cycle of the fly. Although expression profiling using microarrays has revealed the expression of, 13,000 annotated genes, it is difficult to map splice junctions and individual base modifications generated by RNA editing using such approaches. Single-base resolution is essential to define precisely the elements that comprise the Drosophila transcriptome. Estimates of the number of transcript isoforms are less accurate than estimates of the number of genes. Whereas, 20% of Drosophila genes are annotated as encoding alternatively spliced premRNAs, splice-junction microarray experiments indicate that this number is at least 40% (ref. 7). Determining the diversity of mRNAs generated by alternative promoters, alternative splicing and RNA editing will substantially increase the inferred protein repertoire. Non-coding RNA genes (ncRNAs) including short interfering RNAs (siRNAs) and microRNAS (miRNAs) (reviewed in ref. 10), and longer ncRNAs such as bxd (ref. 11) and rox (ref. 12), have important roles in gene regulation, whereas others such as small nucleolar RNAs (snoRNAs)and small nuclear RNAs (snRNAs) are important components of macromolecular machines such as the ribosome and spliceosome. The transcription and processing of these ncRNAs must also be fully documented and mapped. As part of the modENCODE project to annotate the functional elements of the D. melanogaster and Caenorhabditis elegans genomes, we used RNA-Seq and tiling microarrays to sample the Drosophila transcriptome at unprecedented depth throughout development from early embryo to ageing male and female adults. We report on a high-resolution view of the discovery, structure and dynamic expression of the D. melanogaster transcriptome.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gault, J.; Zonana, J.; Zeltinger, J.
A conserved mouse genomic clone was used to identify a homologous human genomic clone (the DXS732E locus), which was subsequently employed to isolate cDNAs from a human fetal brain library. Nine unique overlapping cDNAs were isolated, and sequences analysis of 3.9 kb identified a putative 1 kb ORF. GRAIL analysis of the sequence supported the hypothesis that the putative ORF was coding sequence, and Prosite analysis of the putative ORF identified potential glycosylation and phosphorylation sites. The 5{prime} end of the gene maps within a CpG island, and comparison of cDNA sequences indicate the gene is alternatively spliced at itsmore » 3{prime} end. Northern analysis and RT-PCR indicate that two different sized messages appear to be expressed with the gene expressed in human fetal kidney, intestine, brain, and muscle. The gene is expressed in 77 day human skin, a time when hair follicle formation occurs. Anhidrotic ectodermal dysplasia (EDA) results in the abnormal morphogenesis of hair, teeth and eccrine sweat glands. A positional cloning strategy towards cloning the EDA gene had been used, and deletion and X-autosome translocation patients have been useful in further delimiting the EDA region. The present gene at the DXS732E locus is partially deleted in one EDA patient who does not have other apparent abnormalities. No rearrangements of the gene have been detected in two female X-autosome translocation EDA patients, nor in four additional male patients with submicroscopic molecular deletions.« less
Jasinska, Anna J.; Zelaya, Ivette; Service, Susan K.; Peterson, Christine B.; Cantor, Rita M.; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A.; Fears, Scott; Furterer, Allison E.; Huang, Yu S.; Ramensky, Vasily; Schmitt, Christopher A.; Svardal, Hannes; Jorgensen, Matthew J.; Kaplan, Jay R.; Villar, Diego; Aken, Bronwen L.; Flicek, Paul; Nag, Rishi; Wong, Emily S.; Blangero, John; Dyer, Thomas D.; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M.; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K.; Jentsch, J. David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P.; Freimer, Nelson B.
2017-01-01
By analyzing multi-tissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalogue of expression quantitative trait loci (eQTLs) in a non-human primate model. This catalogue contains more genome-wide significant eQTLs, per sample, than comparable human resources, and reveals sex and age-related expression patterns. Findings include a master regulatory locus that likely plays a role in immune function, and a locus regulating hippocampal long non-coding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders. PMID:29083405
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791
Epitopes of human testis-specific lactate dehydrogenase deduced from a cDNA sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Millan, J.L.; Driscoll, C.E.; LeVan, K.M.
The sequence and structure of human testis-specific L-lactate dehydrogenase (LDHC/sub 4/, LDHX; (L)-lactate:NAD/sup +/ oxidoreductase, EC 1.1.1.27) has been derived from analysis of a complementary DNA (cDNA) clone comprising the complete protein coding region of the enzyme. From the deduced amino acid sequence, human LDHC/sub 4/ is as different from rodent LDHC/sub 4/ (73% homology) as it is from human LDHA/sub 4/ (76% homology) and porcine LDHB/sub 4/ (68% homology). Subunit homologies are consistent with the conclusion that the LDHC gene arose by at least two independent duplication events. Furthermore, the lower degree of homology between mouse and human LDHC/submore » 4/ and the appearance of this isozyme late in evolution suggests a higher rate of mutation in the mammalian LDHC genes than in the LDHA and -B genes. Comparison of exposed amino acid residues of discrete anti-genic determinants of mouse and human LDHC/sub 4/ reveals significant differences. Knowledge of the human LDHC/sub 4/ sequence will help design human-specific peptides useful in the development of a contraceptive vaccine.« less
Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi
2006-01-01
We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants. PMID:16914452
Michel, Christian J
2017-04-18
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to its two shifted frames. Furthermore, this set X has an interesting mathematical property as X is a maximal C 3 self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend here this definition at the gene level. This new statistical approach considers all the genes, i.e., of large and small lengths, with the same weight for searching the circular code X . As a consequence, the concept of circular code, in particular the reading frame retrieval, is directly associated to each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X . Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
A circadian gene expression atlas in mammals: implications for biology and medicine.
Zhang, Ray; Lahens, Nicholas F; Ballance, Heather I; Hughes, Michael E; Hogenesch, John B
2014-11-11
To characterize the role of the circadian clock in mouse physiology and behavior, we used RNA-seq and DNA arrays to quantify the transcriptomes of 12 mouse organs over time. We found 43% of all protein coding genes showed circadian rhythms in transcription somewhere in the body, largely in an organ-specific manner. In most organs, we noticed the expression of many oscillating genes peaked during transcriptional "rush hours" preceding dawn and dusk. Looking at the genomic landscape of rhythmic genes, we saw that they clustered together, were longer, and had more spliceforms than nonoscillating genes. Systems-level analysis revealed intricate rhythmic orchestration of gene pathways throughout the body. We also found oscillations in the expression of more than 1,000 known and novel noncoding RNAs (ncRNAs). Supporting their potential role in mediating clock function, ncRNAs conserved between mouse and human showed rhythmic expression in similar proportions as protein coding genes. Importantly, we also found that the majority of best-selling drugs and World Health Organization essential medicines directly target the products of rhythmic genes. Many of these drugs have short half-lives and may benefit from timed dosage. In sum, this study highlights critical, systemic, and surprising roles of the mammalian circadian clock and provides a blueprint for advancement in chronotherapy.
Whole-exome/genome sequencing and genomics.
Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne
2013-12-01
As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.
Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.
2011-01-01
Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing the three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y
2000-01-04
In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, while the most frequent is the most downstream. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.
Prestes, Priscilla R; Marques, Francine Z; Lopez-Campos, Guillermo; Lewandowski, Paul; Delbridge, Lea M D; Charchar, Fadi J; Harrap, Stephen B
2018-05-18
Hypertrophic cardiomyopathy thickens heart muscles reducing functionality and increasing risk of cardiac disease and morbidity. Genetic factors are involved, but their contribution is poorly understood. We used the hypertrophic heart rat (HHR), a unique normotensive polygenic model of cardiac hypertrophy and heart failure to investigate the role of genes associated with monogenic human cardiomyopathy. We selected 42 genes involved in monogenic human cardiomyopathies to study: 1) DNA variants, by sequencing the whole-genome of 13-week old HHR and age-matched normal heart rat (NHR), its genetic control strain; 2) mRNA expression, by targeted RNA-sequencing in left ventricles of HHR and NHR at five ages (2-days old, 4-, 13-, 33- and 50-weeks old) compared to human idiopathic dilated data; and 3) microRNA expression, with rat microRNA microarrays in left ventricles of 2-days old HHR and age-matched NHR. We also investigated experimentally validated microRNA-mRNA interactions. Whole-genome sequencing revealed unique variants mostly located in non-coding regions of HHR and NHR. We found 29 genes differentially expressed in at least one age. Genes encoding desmoglein 2 (Dsg2) and transthyretin (Ttr) were significantly differentially expressed at all ages in the HHR, but only Ttr was also differentially expressed in human idiopathic cardiomyopathy. Lastly, only two microRNAs differentially expressed in the HHR were present in our comparison of validated microRNA-mRNA interactions. These two microRNAs interact with five of the genes studied. Our study shows that genes involved in monogenic forms of human cardiomyopathies may also influence polygenic forms of the disease.
Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes.
Hsu, Jacob Shujui; Kwan, Johnny S H; Pan, Zhicheng; Garcia-Barcelo, Maria-Mercè; Sham, Pak Chung; Li, Miaoxin
2016-10-15
Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, the identification of disease causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilize this information. We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with autosomal dominant (AD) inheritance mode are more haploinsufficiency and de novo mutation sensitive, whereas those autosomal recessive (AR) genes have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. As a result, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigned to each annotated protein-coding gene (N = 18 859) three pathogenic scores according to the inheritance mode (AD, AR and XL). This inheritance mode-specific framework achieved higher accuracy (area under curve = 0.84) in XL mode. The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics. ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from (https://github.com/jacobhsu35/ISPP.git). mxli@hku.hkSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Can a few non‐coding mutations make a human brain?
Franchini, Lucía F.
2015-01-01
The recent finding that the human version of a neurodevelopmental enhancer of the Wnt receptor Frizzled 8 (FZD8) gene alters neural progenitor cell cycle timing and brain size is a step forward to understanding human brain evolution. The human brain is distinctive in terms of its cognitive abilities as well as its susceptibility to neurological disease. Identifying which of the millions of genomic changes that occurred during human evolution led to these and other uniquely human traits is extremely challenging. Recent studies have demonstrated that many of the fastest evolving regions of the human genome function as gene regulatory enhancers during embryonic development and that the human‐specific mutations in them might alter expression patterns. However, elucidating molecular and cellular effects of sequence or expression pattern changes is a major obstacle to discovering the genetic bases of the evolution of our species. There is much work to do before human‐specific genetic and genomic changes are linked to complex human traits. Also watch the Video Abstract. PMID:26350501
The evolution and expression of the snaR family of small non-coding RNAs
Parrott, Andrew M.; Tsai, Michael; Batchu, Priyanka; Ryan, Karen; Ozer, Harvey L.; Tian, Bin; Mathews, Michael B.
2011-01-01
We recently identified the snaR family of small non-coding RNAs that associate in vivo with the nuclear factor 90 (NF90/ILF3) protein. The major human species, snaR-A, is an RNA polymerase III transcript with restricted tissue distribution and orthologs in chimpanzee but not rhesus macaque or mouse. We report their expression in human tissues and their evolution in primates. snaR genes are exclusively in African Great Apes and some are unique to humans. Two novel families of snaR-related genetic elements were found in primates: CAS (catarrhine ancestor of snaR), limited to Old World Monkeys and apes; and ASR (Alu/snaR-related), present in all monkeys and apes. ASR and CAS appear to have spread by retrotransposition, whereas most snaR genes have spread by segmental duplication. snaR-A and snaR-G2 are differentially expressed in discrete regions of the human brain and other tissues, notably including testis. snaR-A is up-regulated in transformed and immortalized human cells, and is stably bound to ribosomes in HeLa cells. We infer that snaR evolved from the left monomer of the primate-specific Alu SINE family via ASR and CAS in conjunction with major primate speciation events, and suggest that snaRs participate in tissue- and species-specific regulation of cell growth and translation. PMID:20935053
MERP1: a mammalian ependymin-related protein gene differentially expressed in hematopoietic cells.
Gregorio-King, Claudia C; McLeod, Janet L; Collier, Fiona McL; Collier, Gregory R; Bolton, Karyn A; Van Der Meer, Gavin J; Apostolopoulos, Jim; Kirkland, Mark A
2002-03-20
We have utilized differential display polymerase chain reaction to investigate the gene expression of hematopoietic progenitor cells from adult bone marrow and umbilical cord blood. A differentially expressed gene was identified in CD34+ hematopoietic progenitor cells, with low expression in CD34- cells. We have obtained the full coding sequence of this gene which we designated human mammalian ependymin-related protein 1 (MERP1). Expression of MERP1 was found in a variety of normal human tissues, and is 4- and 10-fold higher in adult bone marrow and umbilical cord blood CD34+ cells, respectively, compared to CD34- cells. Additionally, MERP1 expression in a hematopoietic stem cell enriched population was down-regulated with proliferation and differentiation. Conceptual translation of the MERP1 open reading frame reveals significant homology to two families of glycoprotein calcium-dependant cell adhesion molecules: ependymins and protocadherins.
Regulation of Epidermal Growth Factor Receptor Expression by PML in Human Breast Cancer.
1996-08-01
Characterization of a zinc finger gene disrupted by the t(15; 17) in acute promyelocytic leukemia. Science, 254:1371-1374. (32) Fagioli , M., Alcalay, M...DISTRIBUTION CODE Approved for public release; distribution unlimited 13. ABSTRACT (Maximum 200 We have determined that PML is a novel growth suppressor that...was found to be translocated from chromosome 15 and fused with the retinoic acid receptor- a gene on chromosome 17 (t(15; 17) in acute promyelogenous
Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.
Kanai, T H; Ueda, S; Nakamura, T
2000-01-31
Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to human Cgamma1 sequence (76.9 and 77.0%) than to mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both the feline Cgamma1 genes, designated as Cgamma1a and Cgamma1b, were similar to that of human Cgamma1 gene, for instance, as to the size of constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the CH2 and C(H)3 domain coding regions. In 12 domestic short-hair cats used in this study, the frequency of Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).
Hyatt, Sam; Cheung, Kat; Skelton, Andrew J.; Xu, Yaobo; Clark, Ian M.
2017-01-01
Long non-coding RNAs (lncRNAs) are expressed in a highly tissue-specific manner and function in various aspects of cell biology, often as key regulators of gene expression. In this study, we established a role for lncRNAs in chondrocyte differentiation. Using RNA sequencing we identified a human articular chondrocyte repertoire of lncRNAs from normal hip cartilage donated by neck of femur fracture patients. Of particular interest are lncRNAs upstream of the master chondrocyte transcription factor SOX9 locus. SOX9 is an HMG-box transcription factor that plays an essential role in chondrocyte development by directing the expression of chondrocyte-specific genes. Two of these lncRNAs are upregulated during chondrogenic differentiation of mesenchymal stem cells (MSCs). Depletion of one of these lncRNAs, LOC102723505, which we termed ROCR (regulator of chondrogenesis RNA), by RNA interference disrupted MSC chondrogenesis, concomitant with reduced cartilage-specific gene expression and incomplete matrix component production, indicating an important role in chondrocyte biology. Specifically, SOX9 induction was significantly ablated in the absence of ROCR, and overexpression of SOX9 rescued the differentiation of MSCs into chondrocytes. Our work sheds further light on chondrocyte-specific SOX9 expression and highlights a novel method of chondrocyte gene regulation involving a lncRNA. PMID:29084806
cncRNAs: Bi-functional RNAs with protein coding and non-coding functions
Kumari, Pooja; Sampath, Karuna
2015-01-01
For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed as non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term as ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remains unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
Expression profiling of G-protein-coupled receptors in human urothelium and related cell lines.
Ochodnický, Peter; Humphreys, Sian; Eccles, Rachel; Poljakovic, Mirjana; Wiklund, Peter; Michel, Martin C
2012-09-01
What's known on the subject? and What does the study add? Urothelium emerged as a crucial integrator of sensory inputs and outputs in the bladder wall, and urothelial G-protein-coupled receptors (GPCRs) may represent plausible targets for treatment of various bladder pathologies. Urothelial cell lines provide a useful tool to study urothelial receptor function, but their validity as models for native human urothelium remains unclear. We characterize the mRNA expression of genes coding for GPCRs in human freshly isolated urothelium and compare the expression pattern with those in human urothelial cell lines. To characterize the mRNA expression pattern of genes coding for G-protein-coupled receptors (GPCRs) in human freshly isolated urothelium. To compare GPCR expression in human urothelium-derived cell lines to explore the suitability of these cell lines as model systems to study urothelial function. Native human urothelium (commercially sourced) and human urothelium-derived non-cancer (UROtsa and TERT-NHUC) and cancer (J82) cell lines were used. For mRNA expression profiling we used custom-designed real-time polymerase chain reaction array for 40 receptors and several related genes. Native urothelium expressed a wide variety of GPCRs, including α(1A), α(1D) and all subtypes of α(2) and β adrenoceptors. In addition, M(2) and M(3) cholinergic muscarinic receptors, angiotensin II AT(1) receptor, serotonin 5-HT(2A) receptor and all subtypes of bradykinin, endothelin, cannabinoid, tachykinin and sphingosine-1-phosphate receptors were detected. Nerve growth factor and both its low- and high-affinity receptors were also expressed in urothelium. In all cell lines expression of most GPCRs was markedly downregulated, with few exceptions. In UROtsa cells, but much less in other cell lines, the expression of β(2) adrenoceptors, M(3) muscarinic receptors, B(1) and B(2) bradykinin receptors, ET(B) endothelin receptors and several subtypes of sphingosine-1-phosphate receptors was largely retained. Human urothelium expresses a wide range of receptors which enables sensing and integration of various extracellular signals. Human urothelium-derived cell lines, especially UROtsa cells, show comparable mRNA expression to native tissue for several physiologically relevant GPCRs, but lose expression of many other receptors. The use of cell lines as model systems of human urothelium requires careful validation of suitability for the genes of interest. © 2012 BJU INTERNATIONAL.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eipers, P.G.
1992-01-01
The gene for the human p58[sup clk[minus]1] protein kinase, a cell division control-related gene, has been mapped by somatic cell hybrid analyses, in situ localization with the chromosomal gene, and nested polymerase chain reaction amplification of microdissected chromosomes. These studies indicate that the expressed p58[sup clk[minus]1] chromosomal gene maps to 1p36, while a highly related p58[sup clk[minus]1] sequence of unknown nature maps to chromosome 15. Assignment of a p34[sup cdc2]-related gene to 1p36 region, including neuroblastoma, ductal carcinoma of the breast, malignant melanoma, Merkel cell carcinoma and endocrine neoplasia among others. Aberrant expression of this protein kinase negatively regulates normalmore » cellular growth. The p58[sup clk[minus]1] protein contains a central domain of 299 amino acids that is 46% identical to human p34[sup cdc2], the master mitotic protein kinase. This dissertation details the complete structure of the p58[sup clk[minus]1] chromosomal gene, including its putative promoter region, transcriptional start sites, exonic sequences, and intron/exon boundary sequences. The gene is 10 kb in size and contains 12 exons and 11 introns. Interestingly, the rather large 2.0 kb 3[prime] untranslated region is interrupted by an intron that separates a region containing numerous AUUUA destabilization motifs from the coding region. Furthermore, the expression of this gene in normal human tissues, as well as several human tumor cell samples and lines, is examined. The origin of multiple human transcripts from the same chromosomal gene, and the possible differential stability of these various transcripts, is discussed with regard to the transcriptional and post-transcriptional regulation of this gene. This is the first report of the chromosomal gene structure of a member of the p34[sup cdc2] supergene family.« less
Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei
2016-01-01
Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators.
Coppieters, Frauke; Todeschini, Anne Laure; Fujimaki, Takuro; Baert, Annelot; De Bruyne, Marieke; Van Cauwenbergh, Caroline; Verdin, Hannah; Bauwens, Miriam; Ongenaert, Maté; Kondo, Mineo; Meire, Françoise; Murakami, Akira; Veitia, Reiner A; Leroy, Bart P; De Baere, Elfride
2015-12-01
Leber congenital amaurosis (LCA) is a severe autosomal-recessive retinal dystrophy leading to congenital blindness. A recently identified LCA gene is NMNAT1, located in the LCA9 locus. Although most mutations in blindness genes are coding variations, there is accumulating evidence for hidden noncoding defects or structural variations (SVs). The starting point of this study was an LCA9-associated consanguineous family in which no coding mutations were found in the LCA9 region. Exploring the untranslated regions of NMNAT1 revealed a novel homozygous 5'UTR variant, c.-70A>T. Moreover, an adjacent 5'UTR variant, c.-69C>T, was identified in a second consanguineous family displaying a similar phenotype. Both 5'UTR variants resulted in decreased NMNAT1 mRNA abundance in patients' lymphocytes, and caused decreased luciferase activity in human retinal pigment epithelial RPE-1 cells. Second, we unraveled pseudohomozygosity of a coding NMNAT1 mutation in two unrelated LCA patients by the identification of two distinct heterozygous partial NMNAT1 deletions. Molecular characterization of the breakpoint junctions revealed a complex Alu-rich genomic architecture. Our study uncovered hidden genetic variation in NMNAT1-associated LCA and emphasized a shift from coding to noncoding regulatory mutations and repeat-mediated SVs in the molecular pathogenesis of heterogeneous recessive disorders such as hereditary blindness. © 2015 The Authors. **Human Mutation published by Wiley Periodicals, Inc.
The Evolution of Human Cells in Terms of Protein Innovation
Sardar, Adam J.; Oates, Matt E.; Fang, Hai; Forrest, Alistair R.R.; Kawaji, Hideya; Gough, Julian; Rackham, Owen J.L.
2014-01-01
Humans are composed of hundreds of cell types. As the genomic DNA of each somatic cell is identical, cell type is determined by what is expressed and when. Until recently, little has been reported about the determinants of human cell identity, particularly from the joint perspective of gene evolution and expression. Here, we chart the evolutionary past of all documented human cell types via the collective histories of proteins, the principal product of gene expression. FANTOM5 data provide cell-type–specific digital expression of human protein-coding genes and the SUPERFAMILY resource is used to provide protein domain annotation. The evolutionary epoch in which each protein was created is inferred by comparison with domain annotation of all other completely sequenced genomes. Studying the distribution across epochs of genes expressed in each cell type reveals insights into human cellular evolution in terms of protein innovation. For each cell type, its history of protein innovation is charted based on the genes it expresses. Combining the histories of all cell types enables us to create a timeline of cell evolution. This timeline identifies the possibility that our common ancestor Coelomata (cavity-forming animals) provided the innovation required for the innate immune system, whereas cells which now form the brain of human have followed a trajectory of continually accumulating novel proteins since Opisthokonta (boundary of animals and fungi). We conclude that exaptation of existing domain architectures into new contexts is the dominant source of cell-type–specific domain architectures. PMID:24692656
Michel, Christian J.
2017-01-01
In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes which has on average the highest occurrence in reading frame compared to its two shifted frames. Furthermore, this set X has an interesting mathematical property as X is a maximal C3 self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. The method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. We extend here this definition at the gene level. This new statistical approach considers all the genes, i.e., of large and small lengths, with the same weight for searching the circular code X. As a consequence, the concept of circular code, in particular the reading frame retrieval, is directly associated to each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes. PMID:28420220
Functional analysis and transcriptional output of the Göttingen minipig genome.
Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich
2015-11-14
In the past decade the Göttingen minipig has gained increasing recognition as animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models thereby improving drug attrition which is an important challenge in the process of drug development. Here we present a new chromosome level based version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten-times less pseudogenized genes compared to other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs. We underline this relationship for minipig and human liver where we could demonstrate similar expression levels for most phase I drug-metabolizing enzymes. Higher expression levels and metabolic activities were found for FMO1, AKR/CRs and for phase II drug metabolizing enzymes in minipig as compared to human. The variability of gene expression in equivalent human and minipig tissues is considerably higher in minipig organs, which is important for study design in case a human target belongs to this variable category in the minipig. The first analysis of gene expression in multiple tissues during development from young to adult shows that the majority of transcriptional programs are concluded four weeks after birth. This finding is in line with the advanced state of human postnatal organ development at comparative age categories and further supports the minipig as model for pediatric drug safety studies. Genome based assessment of sequence conservation combined with gene expression data in several tissues improves the translational value of the minipig for human drug development. The genome and gene expression data presented here are important resources for researchers using the minipig as model for biomedical research or commercial breeding. Potential impact of our data for comparative genomics, translational research, and experimental medicine are discussed.
Tang, Clara S; Zhang, He; Cheung, Chloe Y Y; Xu, Ming; Ho, Jenny C Y; Zhou, Wei; Cherny, Stacey S; Zhang, Yan; Holmen, Oddgeir; Au, Ka-Wing; Yu, Haiyi; Xu, Lin; Jia, Jia; Porsch, Robert M; Sun, Lijie; Xu, Weixian; Zheng, Huiping; Wong, Lai-Yung; Mu, Yiming; Dou, Jingtao; Fong, Carol H Y; Wang, Shuyu; Hong, Xueyu; Dong, Liguang; Liao, Yanhua; Wang, Jiansong; Lam, Levina S M; Su, Xi; Yan, Hua; Yang, Min-Lee; Chen, Jin; Siu, Chung-Wah; Xie, Gaoqiang; Woo, Yu-Cho; Wu, Yangfeng; Tan, Kathryn C B; Hveem, Kristian; Cheung, Bernard M Y; Zöllner, Sebastian; Xu, Aimin; Eugene Chen, Y; Jiang, Chao Qiang; Zhang, Youyi; Lam, Tai-Hing; Ganesh, Santhi K; Huo, Yong; Sham, Pak C; Lam, Karen S L; Willer, Cristen J; Tse, Hung-Fat; Gao, Wei
2015-12-22
Blood lipids are important risk factors for coronary artery disease (CAD). Here we perform an exome-wide association study by genotyping 12,685 Chinese, using a custom Illumina HumanExome BeadChip, to identify additional loci influencing lipid levels. Single-variant association analysis on 65,671 single nucleotide polymorphisms reveals 19 loci associated with lipids at exome-wide significance (P<2.69 × 10(-7)), including three Asian-specific coding variants in known genes (CETP p.Asp459Gly, PCSK9 p.Arg93Cys and LDLR p.Arg257Trp). Furthermore, missense variants at two novel loci-PNPLA3 p.Ile148Met and PKD1L3 p.Thr429Ser-also influence levels of triglycerides and low-density lipoprotein cholesterol, respectively. Another novel gene, TEAD2, is found to be associated with high-density lipoprotein cholesterol through gene-based association analysis. Most of these newly identified coding variants show suggestive association (P<0.05) with CAD. These findings demonstrate that exome-wide genotyping on samples of non-European ancestry can identify additional population-specific possible causal variants, shedding light on novel lipid biology and CAD.
Genome-wide compendium and functional assessment of in vivo heart enhancers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dickel, Diane E.; Barozzi, Iros; Zhu, Yiwen
Whole-genome sequencing is identifying growing numbers of non-coding variants in human disease studies, but the lack of accurate functional annotations prevents their interpretation. We describe the genome-wide landscape of distant-acting enhancers active in the developing and adult human heart, an organ whose impairment is a predominant cause of mortality and morbidity. Using integrative analysis of > 35 epigenomic data sets from mouse and human pre-and postnatal hearts we created a comprehensive reference of > 80,000 putative human heart enhancers. To illustrate the importance of enhancers in the regulation of genes involved in heart disease, we deleted the mouse orthologs ofmore » two human enhancers near cardiac myosin genes. In both cases, we observe in vivo expression changes and cardiac phenotypes consistent with human heart disease. Our study provides a comprehensive catalogue of human heart enhancers for use in clinical whole-genome sequencing studies and highlights the importance of enhancers for cardiac function.« less
Genome-wide compendium and functional assessment of in vivo heart enhancers
Dickel, Diane E.; Barozzi, Iros; Zhu, Yiwen; ...
2016-10-05
Whole-genome sequencing is identifying growing numbers of non-coding variants in human disease studies, but the lack of accurate functional annotations prevents their interpretation. We describe the genome-wide landscape of distant-acting enhancers active in the developing and adult human heart, an organ whose impairment is a predominant cause of mortality and morbidity. Using integrative analysis of > 35 epigenomic data sets from mouse and human pre-and postnatal hearts we created a comprehensive reference of > 80,000 putative human heart enhancers. To illustrate the importance of enhancers in the regulation of genes involved in heart disease, we deleted the mouse orthologs ofmore » two human enhancers near cardiac myosin genes. In both cases, we observe in vivo expression changes and cardiac phenotypes consistent with human heart disease. Our study provides a comprehensive catalogue of human heart enhancers for use in clinical whole-genome sequencing studies and highlights the importance of enhancers for cardiac function.« less
Genome-wide compendium and functional assessment of in vivo heart enhancers
Dickel, Diane E.; Barozzi, Iros; Zhu, Yiwen; Fukuda-Yuzawa, Yoko; Osterwalder, Marco; Mannion, Brandon J.; May, Dalit; Spurrell, Cailyn H.; Plajzer-Frick, Ingrid; Pickle, Catherine S.; Lee, Elizabeth; Garvin, Tyler H.; Kato, Momoe; Akiyama, Jennifer A.; Afzal, Veena; Lee, Ah Young; Gorkin, David U.; Ren, Bing; Rubin, Edward M.; Visel, Axel; Pennacchio, Len A.
2016-01-01
Whole-genome sequencing is identifying growing numbers of non-coding variants in human disease studies, but the lack of accurate functional annotations prevents their interpretation. We describe the genome-wide landscape of distant-acting enhancers active in the developing and adult human heart, an organ whose impairment is a predominant cause of mortality and morbidity. Using integrative analysis of >35 epigenomic data sets from mouse and human pre- and postnatal hearts we created a comprehensive reference of >80,000 putative human heart enhancers. To illustrate the importance of enhancers in the regulation of genes involved in heart disease, we deleted the mouse orthologs of two human enhancers near cardiac myosin genes. In both cases, we observe in vivo expression changes and cardiac phenotypes consistent with human heart disease. Our study provides a comprehensive catalogue of human heart enhancers for use in clinical whole-genome sequencing studies and highlights the importance of enhancers for cardiac function. PMID:27703156
Organization and transient expression of the gene for human U11 snRNA
Clemens, Suter-Crazzolara; Walter, Keller
1991-01-01
The nucleotide sequence of U11 small nuclear RNA, a minor U RNA from HeLa cells, was determined. Computer analysis of the sequence (135 residues) predicts two strong hairpin loops which are separated by seventeen nucleotides containing an Sm binding site (AAUUUUUUGG). A synthetic gene was constructed in which the coding region of U11 RNA is under the control of a T7 promoter. This vector can be used to produce U11 RNA in vitro. Southern hybridization and PCR analysis of HeLa genomic DNA suggest that U11 RNA is encoded by a single copy gene, and that at least three genomic regions could be U11 RNA pseudogenes. A HeLa genomic copy of a U11 gene was isolated by inverted PCR. This gene contains the U11 RNA coding sequence and several sequence elements unique for the U RNA genes. These include a Distal Sequence Element (DSE, ATTTGCATA) present between positions −215 and −223 relative to the start of transcription; a Proximal Sequence Element (PSE, TTCACCTTTACCAAAAATG) located between positions −43 and −63 ; and a 3′box (GTTAGGCGAAATATTA) between positions +150 and +166. Transfection of HeLa cells with this gene revealed that it is functioning in vivo and can produce U11 RNA. PMID:1820214
Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun
2010-01-01
A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finish sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs of a joint length 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. Besides, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using neighbor-join algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the most related species to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.
Novel numerical and graphical representation of DNA sequences and proteins.
Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D
2006-12-01
We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Long non-coding RNAs involved in autophagy regulation
Yang, Lixian; Wang, Hanying; Shen, Qi; Feng, Lifeng; Jin, Hongchuan
2017-01-01
Autophagy degrades non-functioning or damaged proteins and organelles to maintain cellular homeostasis in a physiological or pathological context. Autophagy can be protective or detrimental, depending on its activation status and other conditions. Therefore, autophagy has a crucial role in a myriad of pathophysiological processes. From the perspective of autophagy-related (ATG) genes, the molecular dissection of autophagy process and the regulation of its level have been largely unraveled. However, the discovery of long non-coding RNAs (lncRNAs) provides a new paradigm of gene regulation in almost all important biological processes, including autophagy. In this review, we highlight recent advances in autophagy-associated lncRNAs and their specific autophagic targets, as well as their relevance to human diseases such as cancer, cardiovascular disease, diabetes and cerebral ischemic stroke. PMID:28981093
Behind the curtain of non-coding RNAs; long non-coding RNAs regulating hepatocarcinogenesis
El Khodiry, Aya; Afify, Menna; El Tayebi, Hend M
2018-01-01
Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers worldwide. HCC is the fifth common malignancy in the world and the second leading cause of cancer death in Asia. Long non-coding RNAs (lncRNAs) are RNAs with a length greater than 200 nucleotides that do not encode proteins. lncRNAs can regulate gene expression and protein synthesis in several ways by interacting with DNA, RNA and proteins in a sequence specific manner. They could regulate cellular and developmental processes through either gene inhibition or gene activation. Many studies have shown that dysregulation of lncRNAs is related to many human diseases such as cardiovascular diseases, genetic disorders, neurological diseases, immune mediated disorders and cancers. However, the study of lncRNAs is challenging as they are poorly conserved between species, their expression levels aren’t as high as that of mRNAs and have great interpatient variations. The study of lncRNAs expression in cancers have been a breakthrough as it unveils potential biomarkers and drug targets for cancer therapy and helps understand the mechanism of pathogenesis. This review discusses many long non-coding RNAs and their contribution in HCC, their role in development, metastasis, and prognosis of HCC and how to regulate and target these lncRNAs as a therapeutic tool in HCC treatment in the future. PMID:29434445
Protein Kinases in Mammary Gland Development and Carcinogenesis
1999-09-01
studies identical at the amino acid level to calcium/calmodulin-dependent may provide insight into mechanisms of growth control and DNA protein kinase I...human homologues of these kinases(19, 20 ). Amino acid conservation in the coding region between mouse and human Hunk is greater than 90% identical. While...genes (13, 14). Over the past 4 years , several of the mRNA and protein levels (39-46). These findings clearly dem- these breast cancer susceptibility
Bandara, G; Mueller, G M; Galea-Lauri, J; Tindal, M H; Georgescu, H I; Suchanek, M K; Hung, G L; Glorioso, J C; Robbins, P D; Evans, C H
1993-01-01
Gene therapy offers a radical different approach to the treatment of arthritis. Here we have demonstrated that two marker genes (lacZ and neo) and cDNA coding for a potentially therapeutic protein (human interleukin 1-receptor-antagonist protein; IRAP or IL-1ra) can be delivered, by ex vivo techniques, to the synovial lining of joints; intraarticular expression of IRAP inhibited intraarticular responses to interleukin 1. To achieve this, lapine synoviocytes were first transduced in culture by retroviral infection. The genetically modified synovial cells were then transplanted by intraarticular injection into the knee joints of rabbits, where they efficiently colonized the synovium. Assay of joint lavages confirmed the in vivo expression of biologically active human IRAP. With allografted cells, IRAP expression was lost by 12 days after transfer. In contrast, autografted synoviocytes continued to express IRAP for approximately 5 weeks. Knee joints expressing human IRAP were protected from the leukocytosis that otherwise follows the intraarticular injection of recombinant human interleukin 1 beta. Thus, we report the intraarticular expression and activity of a potentially therapeutic protein by gene-transfer technology; these experiments demonstrate the feasibility of treating arthritis and other joint disorders with gene therapy. Images Fig. 1 Fig. 2 PMID:8248169
Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content.
Hughes, Jennifer F; Skaletsky, Helen; Pyntikova, Tatyana; Graves, Tina A; van Daalen, Saskia K M; Minx, Patrick J; Fulton, Robert S; McGrath, Sean D; Locke, Devin P; Friedman, Cynthia; Trask, Barbara J; Mardis, Elaine R; Warren, Wesley C; Repping, Sjoerd; Rozen, Steve; Wilson, Richard K; Page, David C
2010-01-28
The human Y chromosome began to evolve from an autosome hundreds of millions of years ago, acquiring a sex-determining function and undergoing a series of inversions that suppressed crossing over with the X chromosome. Little is known about the recent evolution of the Y chromosome because only the human Y chromosome has been fully sequenced. Prevailing theories hold that Y chromosomes evolve by gene loss, the pace of which slows over time, eventually leading to a paucity of genes, and stasis. These theories have been buttressed by partial sequence data from newly emergent plant and animal Y chromosomes, but they have not been tested in older, highly evolved Y chromosomes such as that of humans. Here we finished sequencing of the male-specific region of the Y chromosome (MSY) in our closest living relative, the chimpanzee, achieving levels of accuracy and completion previously reached for the human MSY. By comparing the MSYs of the two species we show that they differ radically in sequence structure and gene content, indicating rapid evolution during the past 6 million years. The chimpanzee MSY contains twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor. We suggest that the extraordinary divergence of the chimpanzee and human MSYs was driven by four synergistic factors: the prominent role of the MSY in sperm production, 'genetic hitchhiking' effects in the absence of meiotic crossing over, frequent ectopic recombination within the MSY, and species differences in mating behaviour. Although genetic decay may be the principal dynamic in the evolution of newly emergent Y chromosomes, wholesale renovation is the paramount theme in the continuing evolution of chimpanzee, human and perhaps other older MSYs.
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome
Camargo, Anamaria A.; Samaia, Helena P. B.; Dias-Neto, Emmanuel; Simão, Daniel F.; Migotto, Italo A.; Briones, Marcelo R. S.; Costa, Fernando F.; Aparecida Nagai, Maria; Verjovski-Almeida, Sergio; Zago, Marco A.; Andrade, Luis Eduardo C.; Carrer, Helaine; El-Dorry, Hamza F. A.; Espreafico, Enilza M.; Habr-Gama, Angelita; Giannella-Neto, Daniel; Goldman, Gustavo H.; Gruber, Arthur; Hackel, Christine; Kimura, Edna T.; Maciel, Rui M. B.; Marie, Suely K. N.; Martins, Elizabeth A. L.; Nóbrega, Marina P.; Paçó-Larson, Maria Luisa; Pardini, Maria Inês M. C.; Pereira, Gonçalo G.; Pesquero, João Bosco; Rodrigues, Vanderlei; Rogatto, Silvia R.; da Silva, Ismael D. C. G.; Sogayar, Mari C.; Sonati, Maria de Fátima; Tajara, Eloiza H.; Valentini, Sandro R.; Alberto, Fernando L.; Amaral, Maria Elisabete J.; Aneas, Ivy; Arnaldi, Liliane A. T.; de Assis, Angela M.; Bengtson, Mário Henrique; Bergamo, Nadia Aparecida; Bombonato, Vanessa; de Camargo, Maria E. R.; Canevari, Renata A.; Carraro, Dirce M.; Cerutti, Janete M.; Corrêa, Maria Lucia C.; Corrêa, Rosana F. R.; Costa, Maria Cristina R.; Curcio, Cyntia; Hokama, Paula O. M.; Ferreira, Ari J. S.; Furuzawa, Gilberto K.; Gushiken, Tsieko; Ho, Paulo L.; Kimura, Elza; Krieger, José E.; Leite, Luciana C. C.; Majumder, Paromita; Marins, Mozart; Marques, Everaldo R.; Melo, Analy S. A.; Melo, Monica; Mestriner, Carlos Alberto; Miracca, Elisabete C.; Miranda, Daniela C.; Nascimento, Ana Lucia T. O.; Nóbrega, Francisco G.; Ojopi, Élida P. B.; Pandolfi, José Rodrigo C.; Pessoa, Luciana G.; Prevedel, Aline C.; Rahal, Paula; Rainho, Claudia A.; Reis, Eduardo M. R.; Ribeiro, Marcelo L.; da Rós, Nancy; de Sá, Renata G.; Sales, Magaly M.; Sant'anna, Simone Cristina; dos Santos, Mariana L.; da Silva, Aline M.; da Silva, Neusa P.; Silva, Wilson A.; da Silveira, Rosana A.; Sousa, Josane F.; Stecconi, Daniella; Tsukumo, Fernando; Valente, Valéria; Soares, Fernando; Moreira, Eloisa S.; Nunes, Diana N.; Correa, Ricardo G.; Zalcberg, Heloisa; Carvalho, Alex F.; Reis, Luis F. L.; Brentani, Ricardo R.; Simpson, Andrew J. G.; de Souza, Sandro J.
2001-01-01
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription–PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning. PMID:11593022
The contribution of 700,000 ORF sequence tags to the definition of the human transcriptome.
Camargo, A A; Samaia, H P; Dias-Neto, E; Simão, D F; Migotto, I A; Briones, M R; Costa, F F; Nagai, M A; Verjovski-Almeida, S; Zago, M A; Andrade, L E; Carrer, H; El-Dorry, H F; Espreafico, E M; Habr-Gama, A; Giannella-Neto, D; Goldman, G H; Gruber, A; Hackel, C; Kimura, E T; Maciel, R M; Marie, S K; Martins, E A; Nobrega, M P; Paco-Larson, M L; Pardini, M I; Pereira, G G; Pesquero, J B; Rodrigues, V; Rogatto, S R; da Silva, I D; Sogayar, M C; Sonati, M F; Tajara, E H; Valentini, S R; Alberto, F L; Amaral, M E; Aneas, I; Arnaldi, L A; de Assis, A M; Bengtson, M H; Bergamo, N A; Bombonato, V; de Camargo, M E; Canevari, R A; Carraro, D M; Cerutti, J M; Correa, M L; Correa, R F; Costa, M C; Curcio, C; Hokama, P O; Ferreira, A J; Furuzawa, G K; Gushiken, T; Ho, P L; Kimura, E; Krieger, J E; Leite, L C; Majumder, P; Marins, M; Marques, E R; Melo, A S; Melo, M B; Mestriner, C A; Miracca, E C; Miranda, D C; Nascimento, A L; Nobrega, F G; Ojopi, E P; Pandolfi, J R; Pessoa, L G; Prevedel, A C; Rahal, P; Rainho, C A; Reis, E M; Ribeiro, M L; da Ros, N; de Sa, R G; Sales, M M; Sant'anna, S C; dos Santos, M L; da Silva, A M; da Silva, N P; Silva, W A; da Silveira, R A; Sousa, J F; Stecconi, D; Tsukumo, F; Valente, V; Soares, F; Moreira, E S; Nunes, D N; Correa, R G; Zalcberg, H; Carvalho, A F; Reis, L F; Brentani, R R; Simpson, A J; de Souza, S J; Melo, M
2001-10-09
Open reading frame expressed sequences tags (ORESTES) differ from conventional ESTs by providing sequence data from the central protein coding portion of transcripts. We generated a total of 696,745 ORESTES sequences from 24 human tissues and used a subset of the data that correspond to a set of 15,095 full-length mRNAs as a means of assessing the efficiency of the strategy and its potential contribution to the definition of the human transcriptome. We estimate that ORESTES sampled over 80% of all highly and moderately expressed, and between 40% and 50% of rarely expressed, human genes. In our most thoroughly sequenced tissue, the breast, the 130,000 ORESTES generated are derived from transcripts from an estimated 70% of all genes expressed in that tissue, with an equally efficient representation of both highly and poorly expressed genes. In this respect, we find that the capacity of the ORESTES strategy both for gene discovery and shotgun transcript sequence generation significantly exceeds that of conventional ESTs. The distribution of ORESTES is such that many human transcripts are now represented by a scaffold of partial sequences distributed along the length of each gene product. The experimental joining of the scaffold components, by reverse transcription-PCR, represents a direct route to transcript finishing that may represent a useful alternative to full-length cDNA cloning.
LncRNA-DANCR: A valuable cancer related long non-coding RNA for human cancers.
Thin, Khaing Zar; Liu, Xuefang; Feng, Xiaobo; Raveendran, Sudheesh; Tu, Jian Cheng
2018-06-01
Long noncoding RNAs (lncRNA) are a type of noncoding RNA that comprise of longer than 200 nucleotides sequences. They can regulate chromosome structure, gene expression and play an essential role in the pathophysiology of human diseases, especially in tumorigenesis and progression. Nowadays, they are being targeted as potential biomarkers for various cancer types. And many research studies have proven that lncRNAs might bring a new era to cancer diagnosis and support treatment management. The purpose of this review was to inspect the molecular mechanism and clinical significance of long non-coding RNA- differentiation antagonizing nonprotein coding RNA(DANCR) in various types of human cancers. In this review, we summarize and figure out recent research studies concerning the expression and biological mechanisms of lncRNA-DANCR in tumour development. The related studies were obtained through a systematic search of PubMed, Embase and Cochrane Library. Long non-coding RNAs-DANCR is a valuable cancer-related lncRNA that its dysregulated expression was found in a variety of malignancies, including hepatocellular carcinoma, breast cancer, glioma, colorectal cancer, gastric cancer, and lung cancer. The aberrant expressions of DANCR have been shown to contribute to proliferation, migration and invasion of cancer cells. Long non-coding RNAs-DANCR likely serves as a useful disease biomarker or therapeutic cancer target. Copyright © 2018 Elsevier GmbH. All rights reserved.
Chamber Specific Gene Expression Landscape of the Zebrafish Heart
Singh, Angom Ramcharan; Sivadas, Ambily; Sabharwal, Ankit; Vellarikal, Shamsudheen Karuthedath; Jayarajan, Rijith; Verma, Ankit; Kapoor, Shruti; Joshi, Adita; Scaria, Vinod; Sivasubbu, Sridhar
2016-01-01
The organization of structure and function of cardiac chambers in vertebrates is defined by chamber-specific distinct gene expression. This peculiarity and uniqueness of the genetic signatures demonstrates functional resolution attributed to the different chambers of the heart. Altered expression of the cardiac chamber genes can lead to individual chamber related dysfunctions and disease patho-physiologies. Information on transcriptional repertoire of cardiac compartments is important to understand the spectrum of chamber specific anomalies. We have carried out a genome wide transcriptome profiling study of the three cardiac chambers in the zebrafish heart using RNA sequencing. We have captured the gene expression patterns of 13,396 protein coding genes in the three cardiac chambers—atrium, ventricle and bulbus arteriosus. Of these, 7,260 known protein coding genes are highly expressed (≥10 FPKM) in the zebrafish heart. Thus, this study represents nearly an all-inclusive information on the zebrafish cardiac transcriptome. In this study, a total of 96 differentially expressed genes across the three cardiac chambers in zebrafish were identified. The atrium, ventricle and bulbus arteriosus displayed 20, 32 and 44 uniquely expressing genes respectively. We validated the expression of predicted chamber-restricted genes using independent semi-quantitative and qualitative experimental techniques. In addition, we identified 23 putative novel protein coding genes that are specifically restricted to the ventricle and not in the atrium or bulbus arteriosus. In our knowledge, these 23 novel genes have either not been investigated in detail or are sparsely studied. The transcriptome identified in this study includes 68 differentially expressing zebrafish cardiac chamber genes that have a human ortholog. We also carried out spatiotemporal gene expression profiling of the 96 differentially expressed genes throughout the three cardiac chambers in 11 developmental stages and 6 tissue types of zebrafish. We hypothesize that clustering the differentially expressed genes with both known and unknown functions will deliver detailed insights on fundamental gene networks that are important for the development and specification of the cardiac chambers. It is also postulated that this transcriptome atlas will help utilize zebrafish in a better way as a model for studying cardiac development and to explore functional role of gene networks in cardiac disease pathogenesis. PMID:26815362
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When the 5` region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3` region (UTR), the sequence is most frequently in the +/-orientation with respect to the coding DNA strand. In two genes, the element is split into two parts;more » however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double- stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated as additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA is a single-stranded orientation for transcription or, alternatively that this element is important in the transcription-coupled DNA repair process.« less
SinEx DB: a database for single exon coding sequences in mammalian genomes.
Jorquera, Roddy; Ortiz, Rodrigo; Ossandon, F; Cárdenas, Juan Pablo; Sepúlveda, Rene; González, Carolina; Holmes, David S
2016-01-01
Eukaryotic genes are typically interrupted by intragenic, noncoding sequences termed introns. However, some genes lack introns in their coding sequence (CDS) and are generally known as 'single exon genes' (SEGs). In this work, a SEG is defined as a nuclear, protein-coding gene that lacks introns in its CDS. Whereas, many public databases of Eukaryotic multi-exon genes are available, there are only two specialized databases for SEGs. The present work addresses the need for a more extensive and diverse database by creating SinEx DB, a publicly available, searchable database of predicted SEGs from 10 completely sequenced mammalian genomes including human. SinEx DB houses the DNA and protein sequence information of these SEGs and includes their functional predictions (KOG) and the relative distribution of these functions within species. The information is stored in a relational database built with My SQL Server 5.1.33 and the complete dataset of SEG sequences and their functional predictions are available for downloading. SinEx DB can be interrogated by: (i) a browsable phylogenetic schema, (ii) carrying out BLAST searches to the in-house SinEx DB of SEGs and (iii) via an advanced search mode in which the database can be searched by key words and any combination of searches by species and predicted functions. SinEx DB provides a rich source of information for advancing our understanding of the evolution and function of SEGs.Database URL: www.sinex.cl. © The Author(s) 2016. Published by Oxford University Press.
Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C
1999-08-05
The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16613bp, is presented in this report. The genome is not a specific length because of the presence of the variable numbers of tandem repeats, 5'-CGTGCGTACA in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than that between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
HOTAIR: An Oncogenic Long Non-Coding RNA in Human Cancer.
Tang, Qing; Hann, Swei Sunny
2018-05-24
Long non-coding RNAs (LncRNAs) represent a novel class of noncoding RNAs that are longer than 200 nucleotides without protein-coding potential and function as novel master regulators in various human diseases, including cancer. Accumulating evidence shows that lncRNAs are dysregulated and implicated in various aspects of cellular homeostasis, such as proliferation, apoptosis, mobility, invasion, metastasis, chromatin remodeling, gene transcription, and post-transcriptional processing. However, the mechanisms by which lncRNAs regulate various biological functions in human diseases have yet to be determined. HOX antisense intergenic RNA (HOTAIR) is a recently discovered lncRNA and plays a critical role in various areas of cancer, such as proliferation, survival, migration, drug resistance, and genomic stability. In this review, we briefly introduce the concept, identification, and biological functions of HOTAIR. We then describe the involvement of HOTAIR that has been associated with tumorigenesis, growth, invasion, cancer stem cell differentiation, metastasis, and drug resistance in cancer. We also discuss emerging insights into the role of HOTAIR as potential biomarkers and therapeutic targets for novel treatment paradigms in cancer. © 2018 The Author(s). Published by S. Karger AG, Basel.
Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P
2017-03-01
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5 ' proximal- i ntron- m inus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N 1 -methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N 1 -methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klebig, M.L.; Woychik, R.P.; Wilkinson, J.E.
1994-09-01
The lethal yellow (A{sup y/-}) and viable yellow (A{sup vy/-}) mouse agouti mutants have a predominantly yellow pelage and display a complex syndrome that includes obesity, hyperinsulinemia, and insulin resistance, hallmark features of obesity-associated noninsulin-dependent diabetes mellitus (NIDDM) in humans. A new dominant agouti allele, A{sup iapy}, has recently been identified; like the A{sup vy} allele, it is homozygous viable and confers obesity and yellow fur in heterozygotes. The agouti gene was cloned and characterized at the molecular level. The gene is expressed in the skin during hair growth and is predicted to encode a 131 amino acid protein, thatmore » is likely to be a secreted factor. In both Ay/- and A{sup iapy}/- mice, the obesity and other dominant pleiotropic effects are associated with an ectopic expression of agouti in many tissues where the gene product is normally not produced. In Ay, a 170-kb deletion has occurred that causes an upstream promoter to drive the ectopic expression of the wild-type agouti coding exons. In A{sup iapy}, the coding region of the gene is expressed from a cryptic promoter within the LTR of an intracisternal A-particle (IAP), which has integrated within the region just upstream of the first agouti coding exon. Transgenic mice ubiquitously expressing the cloned agouti gene under the influence of the beta-actin and phosphoglycerate kinase promoters display obesity, hyperinsulinemia, and yellow coat color. This demonstrates unequivocally that ectopic expression of agouti is responsible for the yellow obese syndrome.« less
Serum amyloid A1: Structure, function and gene polymorphism
Sun, Lei; Ye, Richard D.
2017-01-01
Inducible expression of serum amyloid A (SAA) is a hallmark of the acute-phase response, which is a conserved reaction of vertebrates to environmental challenges such as tissue injury, infection and surgery. Human SAA1 is encoded by one of the four SAA genes and is the best-characterized SAA protein. Initially known as a major precursor of amyloid A (AA), SAA1 has been found to play an important role in lipid metabolism and contributes to bacterial clearance, the regulation of inflammation and tumor pathogenesis. SAA1 has five polymorphic coding alleles (SAA1.1 – SAA1.5) that encode distinct proteins with minor amino acid substitutions. Single nucleotide polymorphism (SNP) has been identified in both the coding and non-coding regions of human SAA1. Despite high levels of sequence homology among these variants, SAA1 polymorphisms have been reported as risk factors of cardiovascular diseases and several types of cancer. A recently solved crystal structure of SAA1.1 reveals a hexameric bundle with each of the SAA1 subunits assuming a 4-helix structure stabilized by the C-terminal tail. Analysis of the native SAA1.1 structure has led to the identification of a competing site for high-density lipoprotein (HDL) and heparin, thus providing the structural basis for a role of heparin and heparan sulfate in the conversion of SAA1 to AA. In this brief review, we compares human SAA1 with other forms of human and mouse SAAs, and discuss how structural and genetic studies of SAA1 have advanced our understanding of the physiological functions of the SAA proteins. PMID:26945629
Chung, H Y; Choi, Y C; Park, H N
2015-05-18
We investigated the phylogenetic relationships between pig breeds, compared the genetic similarity between humans and pigs, and provided basic genetic information on Korean native pigs (KNPs), using genetic variants of the swine leukocyte antigen 3 (SLA-3) gene. Primers were based on sequences from GenBank (accession Nos. AF464010 and AF464009). Polymerase chain reaction analysis amplified approximately 1727 bp of segments, which contained 1086 bp of coding regions and 641 bp of the 3'- and 5'-untranslated regions. Bacterial artificial chromosome clones of miniature pigs were used for sequencing the SLA-3 genomic region, which was 3114 bp in total length, including the coding (1086 bp) and non-coding (2028 bp) regions. Sequence analysis detected 53 single nucleotide polymorphisms (SNPs), based on a minor allele frequency greater than 0.01, which is low compared with other pig breeds, and the results suggest that there is low genetic variability in KNPs. Comparative analysis revealed that humans possess approximately three times more genetic variation than do pigs. Approximately 71% of SNPs in exons 2 and 3 were detected in KNPs, and exon 5 in humans is a highly polymorphic region. Newly identified sequences of SLA-3 using KNPs were submitted to GenBank (accession No. DQ992512-18). Cluster analysis revealed that KNPs were grouped according to three major alleles: SLA-3*0502 (DQ992518), SLA-3*0302 (DQ992513 and DQ992516), and SLA-3*0303 (DQ992512, DQ992514, DQ992515, and DQ992517). Alignments revealed that humans have a relatively close genetic relationship with pigs and chimpanzees. The information provided by this study may be useful in KNP management.
Human active X-specific DNA methylation events showing stability across time and tissues
Joo, Jihoon Eric; Novakovic, Boris; Cruickshank, Mark; Doyle, Lex W; Craig, Jeffrey M; Saffery, Richard
2014-01-01
The phenomenon of X chromosome inactivation in female mammals is well characterised and remains the archetypal example of dosage compensation via monoallelic expression. The temporal series of events that culminates in inactive X-specific gene silencing by DNA methylation has revealed a ‘patchwork' of gene inactivation along the chromosome, with approximately 15% of genes escaping. Such genes are therefore potentially subject to sex-specific imbalance between males and females. Aside from XIST, the non-coding RNA on the X chromosome destined to be inactivated, very little is known about the extent of loci that may be selectively silenced on the active X chromosome (Xa). Using longitudinal array-based DNA methylation profiling of two human tissues, we have identified specific and widespread active X-specific DNA methylation showing stability over time and across tissues of disparate origin. Our panel of X-chromosome loci subject to methylation on Xa reflects a potentially novel mechanism for controlling female-specific X inactivation and sex-specific dimorphisms in humans. Further work is needed to investigate these phenomena. PMID:24713664
Yang, Jian-Hua; Zhang, Xiao-Chen; Huang, Zhan-Peng; Zhou, Hui; Huang, Mian-Bo; Zhang, Shu; Chen, Yue-Qin; Qu, Liang-Hu
2006-01-01
Small nucleolar RNAs (snoRNAs) represent an abundant group of non-coding RNAs in eukaryotes. They can be divided into guide and orphan snoRNAs according to the presence or absence of antisense sequence to rRNAs or snRNAs. Current snoRNA-searching programs, which are essentially based on sequence complementarity to rRNAs or snRNAs, exist only for the screening of guide snoRNAs. In this study, we have developed an advanced computational package, snoSeeker, which includes CDseeker and ACAseeker programs, for the highly efficient and specific screening of both guide and orphan snoRNA genes in mammalian genomes. By using these programs, we have systematically scanned four human-mammal whole-genome alignment (WGA) sequences and identified 54 novel candidates including 26 orphan candidates as well as 266 known snoRNA genes. Eighteen novel snoRNAs were further experimentally confirmed with four snoRNAs exhibiting a tissue-specific or restricted expression pattern. The results of this study provide the most comprehensive listing of two families of snoRNA genes in the human genome till date.
Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min; Na, Seong-In; Yoon, Seok-Hwan; Chun, Jongsik
2017-06-01
Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.
Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay
2015-12-01
A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of genetic repertoire) of microbial species is crucial in understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen - a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of the existing pan-genomic analysis tools, and develops an integrated annotation-structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, analyzing an example-set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars - Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics study. Copyright © 2015 Elsevier Inc. All rights reserved.
Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang
2012-05-01
The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and analyze the subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) length of Xuhuai goat H-FABP gene was 402 bp, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarity were 89%, 76%, 85%, 84%, 93%, 91%, 70%, respectively. For the corresponding amino acid sequences, the similarity were 90%, 79%, 88%, 97%, 95%, 94%, 72%, respectively. This study did not find the signal peptide region in the H-FABP protein; it revealed that H-FABP protein might be a nonsecreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
van Nierop, Pim; Vormer, Tinke L.; Foijer, Floris; Verheij, Joanne; Lodder, Johannes C.; Andersen, Jesper B.; Mansvelder, Huibert D.; te Riele, Hein
2018-01-01
To identify coding and non-coding suppressor genes of anchorage-independent proliferation by efficient loss-of-function screening, we have developed a method for enzymatic production of low complexity shRNA libraries from subtracted transcriptomes. We produced and screened two LEGO (Low-complexity by Enrichment for Genes shut Off) shRNA libraries that were enriched for shRNA vectors targeting coding and non-coding polyadenylated transcripts that were reduced in transformed Mouse Embryonic Fibroblasts (MEFs). The LEGO shRNA libraries included ~25 shRNA vectors per transcript which limited off-target artifacts. Our method identified 79 coding and non-coding suppressor transcripts. We found that taurine-responsive GABAA receptor subunits, including GABRA5 and GABRB3, were induced during the arrest of non-transformed anchor-deprived MEFs and prevented anchorless proliferation. We show that taurine activates chloride currents through GABAA receptors on MEFs, causing seclusion of cell volume in large membrane protrusions. Volume seclusion from cells by taurine correlated with reduced proliferation and, conversely, suppression of this pathway allowed anchorage-independent proliferation. In human cholangiocarcinomas, we found that several proteins involved in taurine signaling via GABAA receptors were repressed. Low GABRA5 expression typified hyperproliferative tumors, and loss of taurine signaling correlated with reduced patient survival, suggesting this tumor suppressive mechanism operates in vivo. PMID:29787571
Epigenetics: a new frontier in dentistry.
Williams, S D; Hughes, T E; Adler, C J; Brook, A H; Townsend, G C
2014-06-01
In 2007, only four years after the completion of the Human Genome Project, the journal Science announced that epigenetics was the 'breakthrough of the year'. Time magazine placed it second in the top 10 discoveries of 2009. While our genetic code (i.e. our DNA) contains all of the information to produce the elements we require to function, our epigenetic code determines when and where genes in the genetic code are expressed. Without the epigenetic code, the genetic code is like an orchestra without a conductor. Although there is now a substantial amount of published research on epigenetics in medicine and biology, epigenetics in dental research is in its infancy. However, epigenetics promises to become increasingly relevant to dentistry because of the role it plays in gene expression during development and subsequently potentially influencing oral disease susceptibility. This paper provides a review of the field of epigenetics aimed specifically at oral health professionals. It defines epigenetics, addresses the underlying concepts and provides details about specific epigenetic molecular mechanisms. Further, we discuss some of the key areas where epigenetics is implicated, and review the literature on epigenetics research in dentistry, including its relevance to clinical disciplines. This review considers some implications of epigenetics for the future of dental practice, including a 'personalized medicine' approach to the management of common oral diseases. © 2014 Australian Dental Association.
Carrion, Katrina; Dyo, Jeffrey; Patel, Vishal; Sasik, Roman; Mohamed, Salah A; Hardiman, Gary; Nigam, Vishal
2014-01-01
Aortic valve calcification is a significant and serious clinical problem for which there are no effective medical treatments. Individuals born with bicuspid aortic valves, 1-2% of the population, are at the highest risk of developing aortic valve calcification. Aortic valve calcification involves increased expression of calcification and inflammatory genes. Bicuspid aortic valve leaflets experience increased biomechanical strain as compared to normal tricuspid aortic valves. The molecular pathogenesis involved in the calcification of BAVs are not well understood, especially the molecular response to mechanical stretch. HOTAIR is a long non-coding RNA (lncRNA) that has been implicated with cancer but has not been studied in cardiac disease. We have found that HOTAIR levels are decreased in BAVs and in human aortic interstitial cells (AVICs) exposed to cyclic stretch. Reducing HOTAIR levels via siRNA in AVICs results in increased expression of calcification genes. Our data suggest that β-catenin is a stretch responsive signaling pathway that represses HOTAIR. This is the first report demonstrating that HOTAIR is mechanoresponsive and repressed by WNT β-catenin signaling. These findings provide novel evidence that HOTAIR is involved in aortic valve calcification.
Itoh, S; Yanagimoto, T; Tagawa, S; Hashimoto, H; Kitamura, R; Nakajima, Y; Okochi, T; Fujimoto, S; Uchino, J; Kamataki, T
1992-03-24
P-450IIIA7 is a form of cytochrome P-450 which was isolated from human fetal livers and termed P-450HFLa. This form has been clarified to be expressed during fetal life specifically (Komori, M., Nishio, K., Kitada, M., Shiramatsu, K., Muroya, K., Soma, M., Nagashima, K. and Kamataki, T. (1990) Biochemistry 29, 4430-4433). In the present study, we isolated five independent clones which probably corresponded to the human P-450IIIA7 gene. These clones were completely sequenced, all exons, exon-intron junctions and the 5' flanking region from the cap site to-869. Although the sequences in the coding region were completely identical to P-450IIIA7, it is possible that genomic fragments sequenced in this study encode portions of other P-450IIIA7-related genes since we could not obtain a complete overlapping set of genomic clones. Within its 5' flanking sequence, the putative binding sites of several transcriptional regulatory factors existed. Among them, it was shown that a basic transcription element binding factor (BTEB) actually interacted with the 5' flanking region of this gene.
The contribution of de novo coding mutations to autism spectrum disorder.
Iossifov, Ivan; O'Roak, Brian J; Sanders, Stephan J; Ronemus, Michael; Krumm, Niklas; Levy, Dan; Stessman, Holly A; Witherspoon, Kali T; Vives, Laura; Patterson, Karynne E; Smith, Joshua D; Paeper, Bryan; Nickerson, Deborah A; Dea, Jeanselle; Dong, Shan; Gonzalez, Luis E; Mandell, Jeffrey D; Mane, Shrikant M; Murtha, Michael T; Sullivan, Catherine A; Walker, Michael F; Waqar, Zainulabedin; Wei, Liping; Willsey, A Jeremy; Yamrom, Boris; Lee, Yoon-ha; Grabowska, Ewa; Dalkic, Ertugrul; Wang, Zihua; Marks, Steven; Andrews, Peter; Leotta, Anthony; Kendall, Jude; Hakker, Inessa; Rosenbaum, Julie; Ma, Beicong; Rodgers, Linda; Troge, Jennifer; Narzisi, Giuseppe; Yoon, Seungtai; Schatz, Michael C; Ye, Kenny; McCombie, W Richard; Shendure, Jay; Eichler, Evan E; State, Matthew W; Wigler, Michael
2014-11-13
Whole exome sequencing has proven to be a powerful tool for understanding the genetic architecture of human disease. Here we apply it to more than 2,500 simplex families, each having a child with an autistic spectrum disorder. By comparing affected to unaffected siblings, we show that 13% of de novo missense mutations and 43% of de novo likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding de novo mutations contribute to about 30% of all simplex and 45% of female diagnoses. Almost all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower intelligence quotient (IQ), but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to contributory missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Most of the significance for the latter comes from affected females.
Rozhdestvensky, Timofey S; Robeck, Thomas; Galiveti, Chenna R; Raabe, Carsten A; Seeger, Birte; Wolters, Anna; Gubar, Leonid V; Brosius, Jürgen; Skryabin, Boris V
2016-02-05
Prader-Willi syndrome (PWS) is a neurogenetic disorder caused by loss of paternally expressed genes on chromosome 15q11-q13. The PWS-critical region (PWScr) contains an array of non-protein coding IPW-A exons hosting intronic SNORD116 snoRNA genes. Deletion of PWScr is associated with PWS in humans and growth retardation in mice exhibiting ~15% postnatal lethality in C57BL/6 background. Here we analysed a knock-in mouse containing a 5'HPRT-LoxP-Neo(R) cassette (5'LoxP) inserted upstream of the PWScr. When the insertion was inherited maternally in a paternal PWScr-deletion mouse model (PWScr(p-/m5'LoxP)), we observed compensation of growth retardation and postnatal lethality. Genomic methylation pattern and expression of protein-coding genes remained unaltered at the PWS-locus of PWScr(p-/m5'LoxP) mice. Interestingly, ubiquitous Snord116 and IPW-A exon transcription from the originally silent maternal chromosome was detected. In situ hybridization indicated that PWScr(p-/m5'LoxP) mice expressed Snord116 in brain areas similar to wild type animals. Our results suggest that the lack of PWScr RNA expression in certain brain areas could be a primary cause of the growth retardation phenotype in mice. We propose that activation of disease-associated genes on imprinted regions could lead to general therapeutic strategies in associated diseases.
Expression and Role of Gonadotropin-Releasing Hormone 2 and Its Receptor in Mammals
Desaulniers, Amy T.; Cederberg, Rebecca A.; Lents, Clay A.; White, Brett R.
2017-01-01
Gonadotropin-releasing hormone 1 (GnRH1) and its receptor (GnRHR1) drive mammalian reproduction via regulation of the gonadotropins. Yet, a second form of GnRH (GnRH2) and its receptor (GnRHR2) also exist in mammals. GnRH2 has been completely conserved throughout 500 million years of evolution, signifying high selection pressure and a critical biological role. However, the GnRH2 gene is absent (e.g., rat) or inactivated (e.g., cow and sheep) in some species but retained in others (e.g., human, horse, and pig). Likewise, many species (e.g., human, chimpanzee, cow, and sheep) retain the GnRHR2 gene but lack the appropriate coding sequence to produce a full-length protein due to gene coding errors; although production of GnRHR2 in humans remains controversial. Certain mammals lack the GnRHR2 gene (e.g., mouse) or most exons entirely (e.g., rat). In contrast, old world monkeys, musk shrews, and pigs maintain the coding sequence required to produce a functional GnRHR2. Like GnRHR1, GnRHR2 is a 7-transmembrane, G protein-coupled receptor that interacts with Gαq/11 to mediate cell signaling. However, GnRHR2 retains a cytoplasmic tail and is only 40% homologous to GnRHR1. A role for GnRH2 and its receptor in mammals has been elusive, likely because common laboratory models lack both the ligand and receptor. Uniquely, both GnRH2 and GnRHR2 are ubiquitously expressed; transcript levels are abundant in peripheral tissues and scarcely found in regions of the brain associated with gonadotropin secretion, suggesting a divergent role from GnRH1/GnRHR1. Indeed, GnRH2 and its receptor are not physiological modulators of gonadotropin secretion in mammals. Instead, GnRH2 and GnRHR2 coordinate the interaction between nutritional status and sexual behavior in the female brain. Within peripheral tissues, GnRH2 and its receptor are novel regulators of reproductive organs. GnRH2 and GnRHR2 directly stimulate steroidogenesis within the porcine testis. In the female, GnRH2 and its receptor may help mediate placental function, implantation, and ovarian steroidogenesis. Furthermore, both the GnRH2 and GnRHR2 genes are expressed in human reproductive tumors and represent emerging targets for cancer treatment. Thus, GnRH2 and GnRHR2 have diverse functions in mammals which remain largely unexplored. PMID:29312140
Kamm, Gretel B.; López-Leal, Rodrigo; Lorenzo, Juan R.; Franchini, Lucía F.
2013-01-01
The developmental brain gene NPAS3 stands out as a hot spot in human evolution because it contains the largest number of human-specific, fast-evolving, conserved, non-coding elements. In this paper we studied 2xHAR142, one of these elements that is located in the fifth intron of NPAS3. Using transgenic mice, we show that the mouse and chimp 2xHAR142 orthologues behave as transcriptional enhancers driving expression of the reporter gene lacZ to a similar NPAS3 expression subdomain in the mouse central nervous system. Interestingly, the human 2xHAR142 orthologue drives lacZ expression to an extended expression pattern in the nervous system. Thus, molecular evolution of 2xHAR142 provides the first documented example of human-specific heterotopy in the forebrain promoted by a transcriptional enhancer and suggests that it may have contributed to assemble the unique properties of the human brain. PMID:24218632
Origin of the polymorphism of the involucrin gene in Asians.
Djian, P; Delhomme, B; Green, H
1995-01-01
The involucrin gene, encoding a protein of the terminally differentiated keratinocyte, is polymorphic in the human. There is polymorphism of marker nucleotides a two positions in the coding region, and there are over eight polymorphic forms based on the number and kind of 10-codon tandem repeats in that part of the coding region most recently added in the human lineage. The involucrin alleles of Caucasians and Africans differ in both nucleotides and repeat patterns. We show that the involucrin alleles of East Asians (Chinese and Japanese) can be divided into two populations according to whether they possess the two marker nucleotides typical of Africans or Caucasians. The Asian population bearing Caucasian-type marker nucleotides has repeat patterns similar to those of Caucasians, whereas Asians bearing African-type marker nucleotides have repeat patterns that resemble those of Africans more than those of Caucasians. The existence of two populations of East Asian involucrin alleles gives support for the existence of a Eurasian stem lineage from which Caucasians and a part of the Asian population originated. PMID:7762559
Rare and low-frequency coding variants alter human adult height
Marouli, Eirini; Graff, Mariaelisa; Medina-Gomez, Carolina; Lo, Ken Sin; Wood, Andrew R; Kjaer, Troels R; Fine, Rebecca S; Lu, Yingchang; Schurmann, Claudia; Highland, Heather M; Rüeger, Sina; Thorleifsson, Gudmar; Justice, Anne E; Lamparter, David; Stirrups, Kathleen E; Turcot, Valérie; Young, Kristin L; Winkler, Thomas W; Esko, Tõnu; Karaderi, Tugce; Locke, Adam E; Masca, Nicholas GD; Ng, Maggie CY; Mudgal, Poorva; Rivas, Manuel A; Vedantam, Sailaja; Mahajan, Anubha; Guo, Xiuqing; Abecasis, Goncalo; Aben, Katja K; Adair, Linda S; Alam, Dewan S; Albrecht, Eva; Allin, Kristine H; Allison, Matthew; Amouyel, Philippe; Appel, Emil V; Arveiler, Dominique; Asselbergs, Folkert W; Auer, Paul L; Balkau, Beverley; Banas, Bernhard; Bang, Lia E; Benn, Marianne; Bergmann, Sven; Bielak, Lawrence F; Blüher, Matthias; Boeing, Heiner; Boerwinkle, Eric; Böger, Carsten A; Bonnycastle, Lori L; Bork-Jensen, Jette; Bots, Michiel L; Bottinger, Erwin P; Bowden, Donald W; Brandslund, Ivan; Breen, Gerome; Brilliant, Murray H; Broer, Linda; Burt, Amber A; Butterworth, Adam S; Carey, David J; Caulfield, Mark J; Chambers, John C; Chasman, Daniel I; Chen, Yii-Der Ida; Chowdhury, Rajiv; Christensen, Cramer; Chu, Audrey Y; Cocca, Massimiliano; Collins, Francis S; Cook, James P; Corley, Janie; Galbany, Jordi Corominas; Cox, Amanda J; Cuellar-Partida, Gabriel; Danesh, John; Davies, Gail; de Bakker, Paul IW; de Borst, Gert J.; de Denus, Simon; de Groot, Mark CH; de Mutsert, Renée; Deary, Ian J; Dedoussis, George; Demerath, Ellen W; den Hollander, Anneke I; Dennis, Joe G; Di Angelantonio, Emanuele; Drenos, Fotios; Du, Mengmeng; Dunning, Alison M; Easton, Douglas F; Ebeling, Tapani; Edwards, Todd L; Ellinor, Patrick T; Elliott, Paul; Evangelou, Evangelos; Farmaki, Aliki-Eleni; Faul, Jessica D; Feitosa, Mary F; Feng, Shuang; Ferrannini, Ele; Ferrario, Marco M; Ferrieres, Jean; Florez, Jose C; Ford, Ian; Fornage, Myriam; Franks, Paul W; Frikke-Schmidt, Ruth; Galesloot, Tessel E; Gan, Wei; Gandin, Ilaria; Gasparini, Paolo; Giedraitis, Vilmantas; Giri, Ayush; Girotto, Giorgia; Gordon, Scott D; Gordon-Larsen, Penny; Gorski, Mathias; Grarup, Niels; Grove, Megan L.; Gudnason, Vilmundur; Gustafsson, Stefan; Hansen, Torben; Harris, Kathleen Mullan; Harris, Tamara B; Hattersley, Andrew T; Hayward, Caroline; He, Liang; Heid, Iris M; Heikkilä, Kauko; Helgeland, Øyvind; Hernesniemi, Jussi; Hewitt, Alex W; Hocking, Lynne J; Hollensted, Mette; Holmen, Oddgeir L; Hovingh, G. Kees; Howson, Joanna MM; Hoyng, Carel B; Huang, Paul L; Hveem, Kristian; Ikram, M. Arfan; Ingelsson, Erik; Jackson, Anne U; Jansson, Jan-Håkan; Jarvik, Gail P; Jensen, Gorm B; Jhun, Min A; Jia, Yucheng; Jiang, Xuejuan; Johansson, Stefan; Jørgensen, Marit E; Jørgensen, Torben; Jousilahti, Pekka; Jukema, J Wouter; Kahali, Bratati; Kahn, René S; Kähönen, Mika; Kamstrup, Pia R; Kanoni, Stavroula; Kaprio, Jaakko; Karaleftheri, Maria; Kardia, Sharon LR; Karpe, Fredrik; Kee, Frank; Keeman, Renske; Kiemeney, Lambertus A; Kitajima, Hidetoshi; Kluivers, Kirsten B; Kocher, Thomas; Komulainen, Pirjo; Kontto, Jukka; Kooner, Jaspal S; Kooperberg, Charles; Kovacs, Peter; Kriebel, Jennifer; Kuivaniemi, Helena; Küry, Sébastien; Kuusisto, Johanna; La Bianca, Martina; Laakso, Markku; Lakka, Timo A; Lange, Ethan M; Lange, Leslie A; Langefeld, Carl D; Langenberg, Claudia; Larson, Eric B; Lee, I-Te; Lehtimäki, Terho; Lewis, Cora E; Li, Huaixing; Li, Jin; Li-Gao, Ruifang; Lin, Honghuang; Lin, Li-An; Lin, Xu; Lind, Lars; Lindström, Jaana; Linneberg, Allan; Liu, Yeheng; Liu, Yongmei; Lophatananon, Artitaya; Luan, Jian'an; Lubitz, Steven A; Lyytikäinen, Leo-Pekka; Mackey, David A; Madden, Pamela AF; Manning, Alisa K; Männistö, Satu; Marenne, Gaëlle; Marten, Jonathan; Martin, Nicholas G; Mazul, Angela L; Meidtner, Karina; Metspalu, Andres; Mitchell, Paul; Mohlke, Karen L; Mook-Kanamori, Dennis O; Morgan, Anna; Morris, Andrew D; Morris, Andrew P; Müller-Nurasyid, Martina; Munroe, Patricia B; Nalls, Mike A; Nauck, Matthias; Nelson, Christopher P; Neville, Matt; Nielsen, Sune F; Nikus, Kjell; Njølstad, Pål R; Nordestgaard, Børge G; Ntalla, Ioanna; O'Connel, Jeffrey R; Oksa, Heikki; Loohuis, Loes M Olde; Ophoff, Roel A; Owen, Katharine R; Packard, Chris J; Padmanabhan, Sandosh; Palmer, Colin NA; Pasterkamp, Gerard; Patel, Aniruddh P; Pattie, Alison; Pedersen, Oluf; Peissig, Peggy L; Peloso, Gina M; Pennell, Craig E; Perola, Markus; Perry, James A; Perry, John R.B.; Person, Thomas N; Pirie, Ailith; Polasek, Ozren; Posthuma, Danielle; Raitakari, Olli T; Rasheed, Asif; Rauramaa, Rainer; Reilly, Dermot F; Reiner, Alex P; Renström, Frida; Ridker, Paul M; Rioux, John D; Robertson, Neil; Robino, Antonietta; Rolandsson, Olov; Rudan, Igor; Ruth, Katherine S; Saleheen, Danish; Salomaa, Veikko; Samani, Nilesh J; Sandow, Kevin; Sapkota, Yadav; Sattar, Naveed; Schmidt, Marjanka K; Schreiner, Pamela J; Schulze, Matthias B; Scott, Robert A; Segura-Lepe, Marcelo P; Shah, Svati; Sim, Xueling; Sivapalaratnam, Suthesh; Small, Kerrin S; Smith, Albert Vernon; Smith, Jennifer A; Southam, Lorraine; Spector, Timothy D; Speliotes, Elizabeth K; Starr, John M; Steinthorsdottir, Valgerdur; Stringham, Heather M; Stumvoll, Michael; Surendran, Praveen; Hart, Leen M ‘t; Tansey, Katherine E; Tardif, Jean-Claude; Taylor, Kent D; Teumer, Alexander; Thompson, Deborah J; Thorsteinsdottir, Unnur; Thuesen, Betina H; Tönjes, Anke; Tromp, Gerard; Trompet, Stella; Tsafantakis, Emmanouil; Tuomilehto, Jaakko; Tybjaerg-Hansen, Anne; Tyrer, Jonathan P; Uher, Rudolf; Uitterlinden, André G; Ulivi, Sheila; van der Laan, Sander W; Van Der Leij, Andries R; van Duijn, Cornelia M; van Schoor, Natasja M; van Setten, Jessica; Varbo, Anette; Varga, Tibor V; Varma, Rohit; Edwards, Digna R Velez; Vermeulen, Sita H; Vestergaard, Henrik; Vitart, Veronique; Vogt, Thomas F; Vozzi, Diego; Walker, Mark; Wang, Feijie; Wang, Carol A; Wang, Shuai; Wang, Yiqin; Wareham, Nicholas J; Warren, Helen R; Wessel, Jennifer; Willems, Sara M; Wilson, James G; Witte, Daniel R; Woods, Michael O; Wu, Ying; Yaghootkar, Hanieh; Yao, Jie; Yao, Pang; Yerges-Armstrong, Laura M; Young, Robin; Zeggini, Eleftheria; Zhan, Xiaowei; Zhang, Weihua; Zhao, Jing Hua; Zhao, Wei; Zhao, Wei; Zheng, He; Zhou, Wei; Rotter, Jerome I; Boehnke, Michael; Kathiresan, Sekar; McCarthy, Mark I; Willer, Cristen J; Stefansson, Kari; Borecki, Ingrid B; Liu, Dajiang J; North, Kari E; Heard-Costa, Nancy L; Pers, Tune H; Lindgren, Cecilia M; Oxvig, Claus; Kutalik, Zoltán; Rivadeneira, Fernando; Loos, Ruth JF; Frayling, Timothy M; Hirschhorn, Joel N; Deloukas, Panos; Lettre, Guillaume
2016-01-01
Summary Height is a highly heritable, classic polygenic trait with ∼700 common associated variants identified so far through genome-wide association studies. Here, we report 83 height-associated coding variants with lower minor allele frequencies (range of 0.1-4.8%) and effects of up to 2 cm/allele (e.g. in IHH, STC2, AR and CRISPLD2), >10 times the average effect of common variants. In functional follow-up studies, rare height-increasing alleles of STC2 (+1-2 cm/allele) compromised proteolytic inhibition of PAPP-A and increased cleavage of IGFBP-4 in vitro, resulting in higher bioavailability of insulin-like growth factors. These 83 height-associated variants overlap genes mutated in monogenic growth disorders and highlight new biological candidates (e.g. ADAMTS3, IL11RA, NOX4) and pathways (e.g. proteoglycan/glycosaminoglycan synthesis) involved in growth. Our results demonstrate that sufficiently large sample sizes can uncover rare and low-frequency variants of moderate to large effect associated with polygenic human phenotypes, and that these variants implicate relevant genes and pathways. PMID:28146470
A Mouse Model for the Metabolic Effects of the Human Fat Mass and Obesity Associated FTO Gene
Church, Chris; Deacon, Robert; Gerken, Thomas; Lee, Angela; Moir, Lee; Mecinović, Jasmin; Quwailid, Mohamed M.; Schofield, Christopher J.; Ashcroft, Frances M.; Cox, Roger D.
2009-01-01
Human FTO gene variants are associated with body mass index and type 2 diabetes. Because the obesity-associated SNPs are intronic, it is unclear whether changes in FTO expression or splicing are the cause of obesity or if regulatory elements within intron 1 influence upstream or downstream genes. We tested the idea that FTO itself is involved in obesity. We show that a dominant point mutation in the mouse Fto gene results in reduced fat mass, increased energy expenditure, and unchanged physical activity. Exposure to a high-fat diet enhances lean mass and lowers fat mass relative to control mice. Biochemical studies suggest the mutation occurs in a structurally novel domain and modifies FTO function, possibly by altering its dimerisation state. Gene expression profiling revealed increased expression of some fat and carbohydrate metabolism genes and an improved inflammatory profile in white adipose tissue of mutant mice. These data provide direct functional evidence that FTO is a causal gene underlying obesity. Compared to the reported mouse FTO knockout, our model more accurately reflects the effect of human FTO variants; we observe a heterozygous as well as homozygous phenotype, a smaller difference in weight and adiposity, and our mice do not show perinatal lethality or an age-related reduction in size and length. Our model suggests that a search for human coding mutations in FTO may be informative and that inhibition of FTO activity is a possible target for the treatment of morbid obesity. PMID:19680540
Santo, Evan E; Paik, Jihye
2018-06-17
The rapid development of CRISPR technology is revolutionizing molecular approaches to the dissection of complex biological phenomena. Here we describe an alternative generally applicable implementation of the CRISPR-Cas9 system that allows for selective knockdown of extremely homologous genes. This strategy employs the lentiviral delivery of paired sgRNAs and nickase Cas9 (Cas9D10A) to achieve targeted deletion of splice junctions. This general strategy offers several advantages over standard single-guide exon-targeting CRISPR-Cas9 such as greatly reduced off-target effects, more restricted genomic editing, routine disruption of target gene mRNA expression and the ability to differentiate between closely related genes. Here we demonstrate the utility of this strategy by achieving selective knockdown of the highly homologous human genes FOXO3A and suspected pseudogene FOXO3B. We find the spJCRISPR strategy to efficiently and selectively disrupt FOXO3A and FOXO3B mRNA and protein expression; thus revealing that the human FOXO3B locus encodes a bona fide human gene. Unlike FOXO3A, we find the FOXO3B protein to be cytosolically localized in both the presence and absence of active Akt. The ability to selectively target and efficiently disrupt the expression of the closely-related FOXO3A and FOXO3B genes demonstrates the efficacy of the spJCRISPR approach. Copyright © 2018. Published by Elsevier B.V.
Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T
2015-04-23
Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Li, Hu; Barker, Stephen C.
2017-01-01
Fragmented mitochondrial (mt) genomes have been reported in 11 species of sucking lice (suborder Anoplura) that infest humans, chimpanzees, pigs, horses, and rodents. There is substantial variation among these lice in mt karyotype: the number of minichromosomes of a species ranges from 9 to 20; the number of genes in a minichromosome ranges from 1 to 8; gene arrangement in a minichromosome differs between species, even in the same genus. We sequenced the mt genome of the guanaco louse, Microthoracius praelongiceps, to help establish the ancestral mt karyotype for sucking lice and understand how fragmented mt genomes evolved. The guanaco louse has 12 mt minichromosomes; each minichromosome has 2–5 genes and a non-coding region. The guanaco louse shares many features with rodent lice in mt karyotype, more than with other sucking lice. The guanaco louse, however, is more closely related phylogenetically to human lice, chimpanzee lice, pig lice, and horse lice than to rodent lice. By parsimony analysis of shared features in mt karyotype, we infer that the most recent common ancestor of sucking lice, which lived ∼75 Ma, had 11 minichromosomes; each minichromosome had 1–6 genes and a non-coding region. As sucking lice diverged, split of mt minichromosomes occurred many times in the lineages leading to the lice of humans, chimpanzees, and rodents whereas merger of minichromosomes occurred in the lineage leading to the lice of pigs and horses. Together, splits and mergers of minichromosomes created a very complex and dynamic mt genome organization in the sucking lice. PMID:28164215
Genome medicine: gene therapy for the millennium, 30 September-3 October 2001, Rome, Italy.
Gruenert, D C; Novelli, G; Dallapiccola, B; Colosimo, A
2002-06-01
The recent surge of DNA sequence information resulting from the efforts of agencies interested in deciphering the human genetic code has facilitated technological developments that have been critical in the identification of genes associated with numerous disease pathologies. In addition, these efforts have opened the door to the opportunity to develop novel genetic therapies to treat a broad range of inherited disorders. Through a joint effort by the University of Vermont, the University of Rome, Tor Vergata, University of Rome, La Sapienza, and the CSS Mendel Institute, Rome, an international meeting, 'Genome Medicine: Gene Therapy for the Millennium' was organized. This meeting provided a forum for the discussion of scientific and clinical advances stimulated by the explosion of sequence information generated by the Human Genome Project and the implications these advances have for gene therapy. The meeting had six sessions that focused on the functional evaluation of specific genes via biochemical analysis and through animal models, the development of novel therapeutic strategies involving gene targeting, artificial chromsomes, DNA delivery systems and non-embryonic stem cells, and on the ethical and social implications of these advances.
Characterization of a gene coding for a type IIo bacterial IgG-binding protein.
Boyle, M D; Weber-Heynemann, J; Raeder, R; Podbielski, A
1995-06-01
Two antigenic classes of non-immune IgG-binding proteins can be expressed by group A streptococci. One antigenic group of proteins is recognized by an antibody prepared against the product of a cloned fcrA gene (anti-FcRA). In this study, the immunogen used to prepare the antibody that defines the second antigenic class was shown to be the product of the emm-like (emmL) gene of M serotype 55 group A isolate, A928. The emmL55 gene expressed in E. coli produced an M(r) approximately 58,000 molecule which bound human IgG1, IgG2, IgG3 and IgG4, as well as horse, rabbit and pig IgG in a non-immune fashion. These properties are characteristic of the previously described type IIo IgG-binding protein isolated from this strain. In addition, the recombinant protein was reactive with human serum albumin and fibrinogen. The emmL 55 gene sequence was analysed and found to have the organization and sequence characteristics of a typical class I emm-like gene.
Purrello, M; Di Pietro, C; Rapisarda, A; Viola, A; Corsaro, C; Motta, S; Grzeschik, K H; Sichel, G
1996-01-01
Dr1 is a nuclear protein of 19 kDa that exists in the nucleoplasm as a homotetramer. By binding to TBP (the DNA-binding subunit of TFIID, and also a subunit of SL1 and TFIIIB), the protein blocks class II and class III preinitiation complex assembly, thus repressing the activity of the corresponding promoters. Since transcription of class I genes is unaffected by Dr1. it has been proposed that the protein may coordinate the expression of class I, class II and class III genes. By somatic cell genetics and fluorescence in situ hybridization, we have localized the gene (DR1), present in the genome of higher eukaryotes as a single copy, to human chromosome region 1p21-->p13. The nucleotide sequence conservation of the coding segment of the gene, as determined by Noah's ark blot analysis, and its ubiquitous transcription suggest that Dr1 has an important biological role, which could be related to the negative control of cell proliferation.
Wen, Dong-Yue; Lin, Peng; Pang, Yu-Yan; Chen, Gang; He, Yun; Dang, Yi-Wu; Yang, Hong
2018-05-05
BACKGROUND Long non-coding RNAs (lncRNAs) have a role in physiological and pathological processes, including cancer. The aim of this study was to investigate the expression of the long intergenic non-protein coding RNA 665 (LINC00665) gene and the cell cycle in hepatocellular carcinoma (HCC) using database analysis including The Cancer Genome Atlas (TCGA), the Gene Expression Omnibus (GEO), and quantitative real-time polymerase chain reaction (qPCR). MATERIAL AND METHODS Expression levels of LINC00665 were compared between human tissue samples of HCC and adjacent normal liver, clinicopathological correlations were made using TCGA and the GEO, and qPCR was performed to validate the findings. Other public databases were searched for other genes associated with LINC00665 expression, including The Atlas of Noncoding RNAs in Cancer (TANRIC), the Multi Experiment Matrix (MEM), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interaction (PPI) networks. RESULTS Overexpression of LINC00665 in patients with HCC was significantly associated with gender, tumor grade, stage, and tumor cell type. Overexpression of LINC00665 in patients with HCC was significantly associated with overall survival (OS) (HR=1.47795%; CI: 1.046-2.086). Bioinformatics analysis identified 469 related genes and further analysis supported a hypothesis that LINC00665 regulates pathways in the cell cycle to facilitate the development and progression of HCC through ten identified core genes: CDK1, BUB1B, BUB1, PLK1, CCNB2, CCNB1, CDC20, ESPL1, MAD2L1, and CCNA2. CONCLUSIONS Overexpression of the lncRNA, LINC00665 may be involved in the regulation of cell cycle pathways in HCC through ten identified hub genes.
Kim, Yoonhee; Zhang, Yinhua; Pang, Kaifang; Kang, Hyojin; Park, Heejoo; Lee, Yeunkum; Lee, Bokyoung; Lee, Heon-Jeong; Kim, Won-Ki; Geum, Dongho
2016-01-01
Bipolar disorder (BD), characterized by recurrent mood swings between depression and mania, is a highly heritable and devastating mental illness with poorly defined pathophysiology. Recent genome-wide molecular genetic studies have identified several protein-coding genes and microRNAs (miRNAs) significantly associated with BD. Notably, some of the proteins expressed from BD-associated genes function in neuronal synapses, suggesting that abnormalities in synaptic function could be one of the key pathogenic mechanisms of BD. In contrast, however, the role of BD-associated miRNAs in disease pathogenesis remains largely unknown, mainly because of a lack of understanding about their target mRNAs and pathways in neurons. To address this problem, in this study, we focused on a recently identified BD-associated but uncharacterized miRNA, miR-1908-5p. We identified and validated its novel target genes including DLGAP4, GRIN1, STX1A, CLSTN1 and GRM4, which all function in neuronal glutamatergic synapses. Moreover, bioinformatic analyses of human brain expression profiles revealed that the expression levels of miR-1908-5p and its synaptic target genes show an inverse-correlation in many brain regions. In our preliminary experiments, the expression of miR-1908-5p was increased after chronic treatment with valproate but not lithium in control human neural progenitor cells. In contrast, it was decreased by valproate in neural progenitor cells derived from dermal fibroblasts of a BD subject. Together, our results provide new insights into the potential role of miR-1908-5p in the pathogenesis of BD and also propose a hypothesis that neuronal synapses could be a key converging pathway of some BD-associated protein-coding genes and miRNAs. PMID:28035180
Expression of glutathione peroxidase I gene in selenium-deficient rats.
Reddy, A P; Hsu, B L; Reddy, P S; Li, N Q; Thyagaraju, K; Reddy, C C; Tam, M F; Tu, C P
1988-01-01
We have characterized a cDNA pGPX1211 encoding rat glutathione peroxidase I. The selenocysteine in the protein corresponded to a TGA codon in the coding region of the cDNA, similar to earlier findings in mouse and human genes, and a gene encoding the formate dehydrogenase from E. coli, another selenoenzyme. The rat GSH peroxidase I has a calculated subunit molecular weight of 22,155 daltons and shares 95% and 86% sequence homology with the mouse and human subunits, respectively. The 3'-noncoding sequence (greater than 930 bp) in pGPX1211 is much longer than that of the human sequences. We found that glutathione peroxidase I mRNA, but not the polypeptide, was expressed under nutritional stress of selenium deficiency where no glutathione peroxidase I activity can be detected. The failure of detecting any apoprotein for the glutathione peroxidase I under selenium deficiency and results published from other laboratories supports the proposal that selenium may be incorporated into the glutathione peroxidase I co-translationally. Images PMID:2838821
Genomic insights into the Ixodes scapularis tick vector of Lyme disease
Gulia-Nuss, Monika; Nuss, Andrew B.; Meyer, Jason M.; Sonenshine, Daniel E.; Roe, R. Michael; Waterhouse, Robert M.; Sattelle, David B.; de la Fuente, José; Ribeiro, Jose M.; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R.; Walenz, Brian P.; Koren, Sergey; Hostetler, Jessica B.; Thiagarajan, Mathangi; Joardar, Vinita S.; Hannick, Linda I.; Bidwell, Shelby; Hammond, Martin P.; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L.; Almeida, Francisca C.; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W.; Bonzon-Kulichenko, Elena; Buckingham, Steven D.; Caffrey, Daniel R.; Caimano, Melissa J.; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J.; Giraldo-Calderón, Gloria I.; Grabowski, Jeffrey M.; Jiang, David; Khalil, Sayed M. S.; Kim, Donghun; Kocan, Katherine M.; Koči, Juraj; Kuhn, Richard J.; Kurtti, Timothy J.; Lees, Kristin; Lang, Emma G.; Kennedy, Ryan C.; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin; Radolf, Justin D.; Sakamoto, Joyce M.; Sánchez-Gracia, Alejandro; Severo, Maiara S.; Silverman, Neal; Šimo, Ladislav; Tojo, Marta; Tornador, Cristian; Van Zee, Janice P.; Vázquez, Jesús; Vieira, Filipe G.; Villar, Margarita; Wespiser, Adam R.; Yang, Yunlong; Zhu, Jiwei; Arensburger, Peter; Pietrantonio, Patricia V.; Barker, Stephen C.; Shao, Renfu; Zdobnov, Evgeny M.; Hauser, Frank; Grimmelikhuijzen, Cornelis J. P.; Park, Yoonseong; Rozas, Julio; Benton, Richard; Pedra, Joao H. F.; Nelson, David R.; Unger, Maria F.; Tubio, Jose M. C.; Tu, Zhijian; Robertson, Hugh M.; Shumway, Martin; Sutton, Granger; Wortman, Jennifer R.; Lawson, Daniel; Wikel, Stephen K.; Nene, Vishvanath M.; Fraser, Claire M.; Collins, Frank H.; Birren, Bruce; Nelson, Karen E.; Caler, Elisabet; Hill, Catherine A.
2016-01-01
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome, reveals 20,486 protein-coding genes and expansions of gene families associated with tick–host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host ‘questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent. PMID:26856261
Genomic insights into the Ixodes scapularis tick vector of Lyme disease.
Gulia-Nuss, Monika; Nuss, Andrew B; Meyer, Jason M; Sonenshine, Daniel E; Roe, R Michael; Waterhouse, Robert M; Sattelle, David B; de la Fuente, José; Ribeiro, Jose M; Megy, Karine; Thimmapuram, Jyothi; Miller, Jason R; Walenz, Brian P; Koren, Sergey; Hostetler, Jessica B; Thiagarajan, Mathangi; Joardar, Vinita S; Hannick, Linda I; Bidwell, Shelby; Hammond, Martin P; Young, Sarah; Zeng, Qiandong; Abrudan, Jenica L; Almeida, Francisca C; Ayllón, Nieves; Bhide, Ketaki; Bissinger, Brooke W; Bonzon-Kulichenko, Elena; Buckingham, Steven D; Caffrey, Daniel R; Caimano, Melissa J; Croset, Vincent; Driscoll, Timothy; Gilbert, Don; Gillespie, Joseph J; Giraldo-Calderón, Gloria I; Grabowski, Jeffrey M; Jiang, David; Khalil, Sayed M S; Kim, Donghun; Kocan, Katherine M; Koči, Juraj; Kuhn, Richard J; Kurtti, Timothy J; Lees, Kristin; Lang, Emma G; Kennedy, Ryan C; Kwon, Hyeogsun; Perera, Rushika; Qi, Yumin; Radolf, Justin D; Sakamoto, Joyce M; Sánchez-Gracia, Alejandro; Severo, Maiara S; Silverman, Neal; Šimo, Ladislav; Tojo, Marta; Tornador, Cristian; Van Zee, Janice P; Vázquez, Jesús; Vieira, Filipe G; Villar, Margarita; Wespiser, Adam R; Yang, Yunlong; Zhu, Jiwei; Arensburger, Peter; Pietrantonio, Patricia V; Barker, Stephen C; Shao, Renfu; Zdobnov, Evgeny M; Hauser, Frank; Grimmelikhuijzen, Cornelis J P; Park, Yoonseong; Rozas, Julio; Benton, Richard; Pedra, Joao H F; Nelson, David R; Unger, Maria F; Tubio, Jose M C; Tu, Zhijian; Robertson, Hugh M; Shumway, Martin; Sutton, Granger; Wortman, Jennifer R; Lawson, Daniel; Wikel, Stephen K; Nene, Vishvanath M; Fraser, Claire M; Collins, Frank H; Birren, Bruce; Nelson, Karen E; Caler, Elisabet; Hill, Catherine A
2016-02-09
Ticks transmit more pathogens to humans and animals than any other arthropod. We describe the 2.1 Gbp nuclear genome of the tick, Ixodes scapularis (Say), which vectors pathogens that cause Lyme disease, human granulocytic anaplasmosis, babesiosis and other diseases. The large genome reflects accumulation of repetitive DNA, new lineages of retro-transposons, and gene architecture patterns resembling ancient metazoans rather than pancrustaceans. Annotation of scaffolds representing ∼57% of the genome, reveals 20,486 protein-coding genes and expansions of gene families associated with tick-host interactions. We report insights from genome analyses into parasitic processes unique to ticks, including host 'questing', prolonged feeding, cuticle synthesis, blood meal concentration, novel methods of haemoglobin digestion, haem detoxification, vitellogenesis and prolonged off-host survival. We identify proteins associated with the agent of human granulocytic anaplasmosis, an emerging disease, and the encephalitis-causing Langat virus, and a population structure correlated to life-history traits and transmission of the Lyme disease agent.
Janson, Christopher; McPhee, Scott; Bilaniuk, Larissa; Haselgrove, John; Testaiuti, Mark; Freese, Andrew; Wang, Dah-Jyuu; Shera, David; Hurh, Peter; Rupin, Joan; Saslow, Elizabeth; Goldfarb, Olga; Goldberg, Michael; Larijani, Ghassem; Sharrar, William; Liouterman, Larisa; Camp, Angelique; Kolodny, Edwin; Samulski, Jude; Leone, Paola
2002-07-20
This clinical protocol describes virus-based gene transfer for Canavan disease, a childhood leukodystrophy. Canavan disease, also known as Van Bogaert-Bertrand disease, is a monogeneic, autosomal recessive disease in which the gene coding for the enzyme aspartoacylase (ASPA) is defective. The lack of functional enzyme leads to an increase in the central nervous system of the substrate molecule, N-acetyl-aspartate (NAA), which impairs normal myelination and results in spongiform degeneration of the brain. No effective treatment currently exists; however, virus-based gene transfer has the potential to arrest or reverse the course of this otherwise fatal condition. This procedure involves neurosurgical administration of approximately 900 billion genomic particles (approximately 10 billion infectious particles) of recombinant adeno-associated virus (AAV) containing the aspartoacylase gene (ASPA) directly to affected regions of the brain in each of 21 patients with Canavan disease. Pre- and post-delivery assessments include a battery of noninvasive biochemical, radiological, and neurological tests. This gene transfer study represents the first clinical use of AAV in the human brain and the first instance of viral gene transfer for a neurodegenerative disease.
A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions
Glusman, Gustavo; Qin, Shizhen; El-Gewely, M. Raafat; Siegel, Andrew F; Roach, Jared C; Hood, Leroy; Smit, Arian F. A
2006-01-01
The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.” PMID:16543943
Kemmerer, Marina; Finkernagel, Florian; Cavalcante, Marcela Frota; Abdalla, Dulcineia Saes Parra; Müller, Rolf; Brüne, Bernhard; Namgaladze, Dmitry
2015-01-01
AMP-activated protein kinase (AMPK) maintains energy homeostasis by suppressing cellular ATP-consuming processes and activating catabolic, ATP-producing pathways such as fatty acid oxidation (FAO). The transcription factor peroxisome proliferator-activated receptor δ (PPARδ) also affects fatty acid metabolism, stimulating the expression of genes involved in FAO. To question the interplay of AMPK and PPARδ in human macrophages we transduced primary human macrophages with lentiviral particles encoding for the constitutively active AMPKα1 catalytic subunit, followed by microarray expression analysis after treatment with the PPARδ agonist GW501516. Microarray analysis showed that co-activation of AMPK and PPARδ increased expression of FAO genes, which were validated by quantitative PCR. Induction of these FAO-associated genes was also observed upon infecting macrophages with an adenovirus coding for AMPKγ1 regulatory subunit carrying an activating R70Q mutation. The pharmacological AMPK activator A-769662 increased expression of several FAO genes in a PPARδ- and AMPK-dependent manner. Although GW501516 significantly increased FAO and reduced the triglyceride amount in very low density lipoproteins (VLDL)-loaded foam cells, AMPK activation failed to potentiate this effect, suggesting that increased expression of fatty acid catabolic genes alone may be not sufficient to prevent macrophage lipid overload.
Kemmerer, Marina; Finkernagel, Florian; Cavalcante, Marcela Frota; Abdalla, Dulcineia Saes Parra; Müller, Rolf; Brüne, Bernhard; Namgaladze, Dmitry
2015-01-01
AMP-activated protein kinase (AMPK) maintains energy homeostasis by suppressing cellular ATP-consuming processes and activating catabolic, ATP-producing pathways such as fatty acid oxidation (FAO). The transcription factor peroxisome proliferator-activated receptor δ (PPARδ) also affects fatty acid metabolism, stimulating the expression of genes involved in FAO. To question the interplay of AMPK and PPARδ in human macrophages we transduced primary human macrophages with lentiviral particles encoding for the constitutively active AMPKα1 catalytic subunit, followed by microarray expression analysis after treatment with the PPARδ agonist GW501516. Microarray analysis showed that co-activation of AMPK and PPARδ increased expression of FAO genes, which were validated by quantitative PCR. Induction of these FAO-associated genes was also observed upon infecting macrophages with an adenovirus coding for AMPKγ1 regulatory subunit carrying an activating R70Q mutation. The pharmacological AMPK activator A-769662 increased expression of several FAO genes in a PPARδ- and AMPK-dependent manner. Although GW501516 significantly increased FAO and reduced the triglyceride amount in very low density lipoproteins (VLDL)-loaded foam cells, AMPK activation failed to potentiate this effect, suggesting that increased expression of fatty acid catabolic genes alone may be not sufficient to prevent macrophage lipid overload. PMID:26098914
Human X-Linked genes regionally mapped utilizing X-autosome translocations and somatic cell hybrids.
Shows, T B; Brown, J A
1975-01-01
Human genes coding for hypoxanthine phosphoribosyltransferase (HPRT, EC 2.4.2.8; IMP:pyrophosphate phosphoribosyltransferase), glucose-6-phosphate dehydrogenase (G6PD, EC 1.1.1.49; D-glucose-6-phosphate:NADP+ 1-oxidoreductase), and phosphoglycerate kinase (PGK, EC 2.7.2.3; ATP:3-phospho-D-glycerate 1-phosphotransferase) have been assigned to specific regions on the long arm of the X chromosome by somatic cell gentic techniques. Gene assignment and linear order were determined by employing human somatic cells possessing an X/9 translocation or an X/22 translocation in man-mouse cell hybridization studies. The X/9 translocation involved the majority of the X long arm translocated to chromosome 9 and the X/22 translocation involved the distal half of the X long arm translocated to 22. In each case these rearrangements appeared to be reciprocal. Concordant segregation of X-linked enzymes and segments of the X chromosome generated by the translocations indicated assignment of the PGK gene to a proximal long arm region (q12-q22) and the HPRT and G6PD genes to the distal half (q22-qter) of the X long arm. Further evidence suggests a gene order on the X long arm of centromere-PGK-HPRT-G6PD. Images PMID:1056018
Changes in gene expression associated with reproductive maturation in wild female baboons.
Babbitt, Courtney C; Tung, Jenny; Wray, Gregory A; Alberts, Susan C
2012-01-01
Changes in gene expression during development play an important role in shaping morphological and behavioral differences, including between humans and nonhuman primates. Although many of the most striking developmental changes occur during early development, reproductive maturation represents another critical window in primate life history. However, this process is difficult to study at the molecular level in natural primate populations. Here, we took advantage of ovarian samples made available through an unusual episode of human-wildlife conflict to identify genes that are important in this process. Specifically, we used RNA sequencing (RNA-Seq) to compare genome-wide gene expression patterns in the ovarian tissue of juvenile and adult female baboons from Amboseli National Park, Kenya. We combined this information with prior evidence of selection occurring on two primate lineages (human and chimpanzee). We found that in cases in which genes were both differentially expressed over the course of ovarian maturation and also linked to lineage-specific selection this selective signature was much more likely to occur in regulatory regions than in coding regions. These results suggest that adaptive change in the development of the primate ovary may be largely driven at the mechanistic level by selection on gene regulation, potentially in relationship to the physiology or timing of female reproductive maturation.
Diverse point mutations in the human gene for polymorphic N-acetyltransferase
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsis, K.P.; Martell, K.J.; Weber, W.W.
1991-07-15
Classification of humans as rapid or slow acetylators is based on hereditary differences in rates of N-acetylation of therapeutic and carcinogenic agents, but N-acetylation of certain arylamine drugs displays no genetic variation. Two highly homologous human genes for N-acetyltransferase NAT1 and NAT2, presumably code for the genetically invariant and variant NAT proteins, respectively. In the present investigation, 1.9-kilobase human genomic EcoRI fragments encoding NAT2 were generated by the polymerase chain reaction with liver and leukocyte DNA from seven subjects phenotyped as homozygous and heterozygous acetylators. Direct sequencing revealed multiple point mutations in the coding region of two distinct NAT2 variants.more » One of these was derived from leukocytes of a slow acetylator and was distinguished by a silent mutation (coden 94) and a separate G {r arrow} A transition (position 590) leading to replacement of Arg-197 by Gln; the mutated guanine was part of a CpG dinucleotide and a Taq I site. The second NAT2 variant originated from liver with low N-acetylation activity. It was characterized by three nucleotide transitions giving rise to a silent mutation (codon 161), accompanied by obliteration of the sole Kpn I site, and two amino acid substitutions. The results show conclusively that the genetically variant NAT is encoded by NAT2.« less
Liu, Xuewu; Wang, Yuanyuan; Liang, Jiao; Wang, Luojun; Qin, Na; Zhao, Ya; Zhao, Gang
2018-05-02
Plasmodium falciparum is the most virulent malaria parasite capable of parasitizing human erythrocytes. The identification of genes related to this capability can enhance our understanding of the molecular mechanisms underlying human malaria and lead to the development of new therapeutic strategies for malaria control. With the availability of several malaria parasite genome sequences, performing computational analysis is now a practical strategy to identify genes contributing to this disease. Here, we developed and used a virtual genome method to assign 33,314 genes from three human malaria parasites, namely, P. falciparum, P. knowlesi and P. vivax, and three rodent malaria parasites, namely, P. berghei, P. chabaudi and P. yoelii, to 4605 clusters. Each cluster consisted of genes whose protein sequences were significantly similar and was considered as a virtual gene. Comparing the enriched values of all clusters in human malaria parasites with those in rodent malaria parasites revealed 115 P. falciparum genes putatively responsible for parasitizing human erythrocytes. These genes are mainly located in the chromosome internal regions and participate in many biological processes, including membrane protein trafficking and thiamine biosynthesis. Meanwhile, 289 P. berghei genes were included in the rodent parasite-enriched clusters. Most are located in subtelomeric regions and encode erythrocyte surface proteins. Comparing cluster values in P. falciparum with those in P. vivax and P. knowlesi revealed 493 candidate genes linked to virulence. Some of them encode proteins present on the erythrocyte surface and participate in cytoadhesion, virulence factor trafficking, or erythrocyte invasion, but many genes with unknown function were also identified. Cerebral malaria is characterized by accumulation of infected erythrocytes at trophozoite stage in brain microvascular. To discover cerebral malaria-related genes, fast Fourier transformation (FFT) was introduced to extract genes highly transcribed at the trophozoite stage. Finally, 55 candidate genes were identified. Considering that parasite-infected erythrocyte surface protein 2 (PIESP2) contains gap-junction-related Neuromodulin_N domain and that anti-PIESP2 might provide protection against malaria, we chose PIESP2 for further experimental study. Our analysis revealed a limited number of genes linked to human disease in P. falciparum genome. These genes could be interesting targets for further functional characterization.
AP-2α and AP-2β cooperatively orchestrate homeobox gene expression during branchial arch patterning.
Van Otterloo, Eric; Li, Hong; Jones, Kenneth L; Williams, Trevor
2018-01-25
The evolution of a hinged moveable jaw with variable morphology is considered a major factor behind the successful expansion of the vertebrates. DLX homeobox transcription factors are crucial for establishing the positional code that patterns the mandible, maxilla and intervening hinge domain, but how the genes encoding these proteins are regulated remains unclear. Herein, we demonstrate that the concerted action of the AP-2α and AP-2β transcription factors within the mouse neural crest is essential for jaw patterning. In the absence of these two proteins, the hinge domain is lost and there are alterations in the size and patterning of the jaws correlating with dysregulation of homeobox gene expression, with reduced levels of Emx, Msx and Dlx paralogs accompanied by an expansion of Six1 expression. Moreover, detailed analysis of morphological features and gene expression changes indicate significant overlap with various compound Dlx gene mutants. Together, these findings reveal that the AP-2 genes have a major function in mammalian neural crest development, influencing patterning of the craniofacial skeleton via the DLX code, an effect that has implications for vertebrate facial evolution, as well as for human craniofacial disorders. © 2018. Published by The Company of Biologists Ltd.
Brankatschk, Kerstin; Kamber, Tim; Pothier, Joël F; Duffy, Brion; Smits, Theo H M
2014-11-01
Sprouted seeds represent a great risk for infection by human enteric pathogens because of favourable growth conditions for pathogens during their germination. The aim of this study was to identify mechanisms of interactions of Salmonella enterica subsp. enterica Weltevreden with alfalfa sprouts. RNA-seq analysis of S. Weltevreden grown with sprouts in comparison with M9-glucose medium showed that among a total of 4158 annotated coding sequences, 177 genes (4.3%) and 345 genes (8.3%) were transcribed at higher levels with sprouts and in minimal medium respectively. Genes that were higher transcribed with sprouts are coding for proteins involved in mechanisms known to be important for attachment, motility and biofilm formation. Besides gene expression required for phenotypic adaption, genes involved in sulphate acquisition were higher transcribed, suggesting that the surface on alfalfa sprouts may be poor in sulphate. Genes encoding structural and effector proteins of Salmonella pathogenicity island 2, involved in survival within macrophages during infection of animal tissue, were higher transcribed with sprouts possibly as a response to environmental conditions. This study provides insight on additional mechanisms that may be important for pathogen interactions with sprouts. © 2013 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
SWIM: a computational tool to unveiling crucial nodes in complex biological networks.
Paci, Paola; Colombo, Teresa; Fiscon, Giulia; Gurtner, Aymone; Pavesi, Giulio; Farina, Lorenzo
2017-03-20
SWItchMiner (SWIM) is a wizard-like software implementation of a procedure, previously described, able to extract information contained in complex networks. Specifically, SWIM allows unearthing the existence of a new class of hubs, called "fight-club hubs", characterized by a marked negative correlation with their first nearest neighbors. Among them, a special subset of genes, called "switch genes", appears to be characterized by an unusual pattern of intra- and inter-module connections that confers them a crucial topological role, interestingly mirrored by the evidence of their clinic-biological relevance. Here, we applied SWIM to a large panel of cancer datasets from The Cancer Genome Atlas, in order to highlight switch genes that could be critically associated with the drastic changes in the physiological state of cells or tissues induced by the cancer development. We discovered that switch genes are found in all cancers we studied and they encompass protein coding genes and non-coding RNAs, recovering many known key cancer players but also many new potential biomarkers not yet characterized in cancer context. Furthermore, SWIM is amenable to detect switch genes in different organisms and cell conditions, with the potential to uncover important players in biologically relevant scenarios, including but not limited to human cancer.
Transcriptional regulation of the human mitochondrial peptide deformylase (PDF).
Pereira-Castro, Isabel; Costa, Luís Teixeira da; Amorim, António; Azevedo, Luisa
2012-05-18
The last years of research have been particularly dynamic in establishing the importance of peptide deformylase (PDF), a protein of the N-terminal methionine excision (NME) pathway that removes formyl-methionine from mitochondrial-encoded proteins. The genomic sequence of the human PDF gene is shared with the COG8 gene, which encodes a component of the oligomeric golgi complex, a very unusual case in Eukaryotic genomes. Since PDF is crucial in maintaining mitochondrial function and given the atypical short distance between the end of COG8 coding sequence and the PDF initiation codon, we investigated whether the regulation of the human PDF is affected by the COG8 overlapping partner. Our data reveals that PDF has several transcription start sites, the most important of which only 18 bp from the initiation codon. Furthermore, luciferase-activation assays using differently-sized fragments defined a 97 bp minimal promoter region for human PDF, which is capable of very strong transcriptional activity. This fragment contains a potential Sp1 binding site highly conserved in mammalian species. We show that this binding site, whose mutation significantly reduces transcription activation, is a target for the Sp1 transcription factor, and possibly of other members of the Sp family. Importantly, the entire minimal promoter region is located after the end of COG8's coding region, strongly suggesting that the human PDF preserves an independent regulation from its overlapping partner. Copyright © 2012 Elsevier Inc. All rights reserved.
Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates
McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg
2009-01-01
Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110
Huang, Yao-Ting; Cheng, Jan-Fang; Chen, Shi-Yu; Hong, Yu-Kai; Wu, Zong-Yen; Liu, Po-Yu
2018-06-19
Shewanella algae is an environmental marine bacteria and an emerging opportunistic human pathogen. Moreover, there are increasing reports of strains showing multi-drug resistance, particularly carbapenem-resistant isolates. Although S. algae have been found in bivalve shellfish aquaculture, there is very little genome-wide data on resistant determinants in S. algae from shellfish. In the study, we aimed to determine the whole genome sequence of carbapenem-resistant S. algae strain AC isolated from small abalone in Taiwan. Genome DNA was sequenced using an Illumina MiSeq platform using 250bp paired-end reads. De novo genome assembly was performed using Velvet v1.2.07. The whole genome was annotated and several candidate genes for antimicrobial resistance were identified. The genome size was calculated at 4,751,156bp, with a mean G+C content of 53.09%. A total of 4,164 protein-coding sequences, 7 rRNAs, 85 tRNAs, and 5 non-coding RNAs were identified. The genome contains genes associated with resistance to β-lactams, trimethoprim, tetracycline, colistin, and quinolone resistance. Multiple efflux pump genes were also detected. Small abalone is a potential source of foodborne drug resistant S. algae. The genome sequence of a carbapenem-resistant S. algae strain AC isolated from small abalone will provide valuable information for further study of the dissemination of resistance genes at the human-animal interface. Copyright © 2018. Published by Elsevier Ltd.
Transcriptome profiling analysis of Vibrio vulnificus during human infection.
Bisharat, Naiel; Bronstein, Michal; Korner, Mira; Schnitzer, Temima; Koton, Yael
2013-09-01
Vibrio vulnificus is a waterborne pathogen that was responsible for an outbreak of severe soft-tissue infections among fish farmers and fish consumers in Israel. Several factors have been shown to be associated with virulence. However, the transcriptome profile of the pathogen during human infection has not been determined yet. We compared the transcriptome profile, using RNA sequencing, of a human-pathogenic strain harvested directly from tissue of a patient suffering from severe soft-tissue infection with necrotizing fasciitis, with the same strain and three other environmental strains grown in vitro. The five sequenced libraries were aligned to the reference genomes of V. vulnificus strains CMCP6 and YJ016. Approximately 47.8 to 62.3 million paired-end raw reads were generated from the five runs. Nearly 84 % of the genome was covered by reads from at least one of the five runs, suggesting that nearly 16 % of the genome is not transcribed or is transcribed at low levels. We identified 123 genes that were differentially expressed during the acute phase of infection. Sixty-three genes were mapped to the large chromosome, 47 genes mapped to the small chromosome and 13 genes mapped to the YJ016 plasmid. The 123 genes fell into a variety of functional categories including transcription, signal transduction, cell motility, carbohydrate metabolism, intracellular trafficking and cell envelope biogenesis. Among the genes differentially expressed during human infection we identified genes encoding bacterial toxin (RtxA1) and genes involved in flagellar components, Flp-coding region, GGDEF family protein, iron acquisition system and sialic acid metabolism.
Gene-culture coevolution in the age of genomics
Richerson, Peter J.; Boyd, Robert; Henrich, Joseph
2010-01-01
The use of socially learned information (culture) is central to human adaptations. We investigate the hypothesis that the process of cultural evolution has played an active, leading role in the evolution of genes. Culture normally evolves more rapidly than genes, creating novel environments that expose genes to new selective pressures. Many human genes that have been shown to be under recent or current selection are changing as a result of new environments created by cultural innovations. Some changed in response to the development of agricultural subsistence systems in the Early and Middle Holocene. Alleles coding for adaptations to diets rich in plant starch (e.g., amylase copy number) and to epidemic diseases evolved as human populations expanded (e.g., sickle cell and G6PD deficiency alleles that provide protection against malaria). Large-scale scans using patterns of linkage disequilibrium to detect recent selection suggest that many more genes evolved in response to agriculture. Genetic change in response to the novel social environment of contemporary modern societies is also likely to be occurring. The functional effects of most of the alleles under selection during the last 10,000 years are currently unknown. Also unknown is the role of paleoenvironmental change in regulating the tempo of hominin evolution. Although the full extent of culture-driven gene-culture coevolution is thus far unknown for the deeper history of the human lineage, theory and some evidence suggest that such effects were profound. Genomic methods promise to have a major impact on our understanding of gene-culture coevolution over the span of hominin evolutionary history. PMID:20445092
Chromosomal localization and cDNA cloning of the human DBP and TEF genes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khatib, Z.A.; Inaba, T.; Valentine, M.
1994-09-15
The authors have isolated cDNA and genomic clones and determined the human chromosome positions of two genes encoding transcription factors expressed in the liver and the pituitary gland: albumin D-site-binding protein (DBP) and thyrotroph embryonic factor (TEF). Both proteins have been identified as members of the PAR (proline and acidic amino acid-rich) subfamily of bZIP transcription factors in the rat, but human homologues have not been characterized. Using a fluorescence in situ hybridization technique, the DBP locus was assigned to chromosome 19q13, and TEF to chromosome 22q13. Each assignment was confirmed by means of human chromosome segregation in somatic cellmore » hybrids. Coding sequences of DBP and TEF, extending beyond the bZIP domain to the PAR region, were highly conserved in both human-human and interspecies comparisons. Conservation of the exon-intron boundaries of each bZIP domain-encoding exon suggested derivation from a common ancestral gene. DBP and TEF mRNAs were expressed in all tissues and cell lines examined, including brain, lung, liver, spleen, and kidney. Knowledge of the human chromosome locations of these PAR proteins will facilitate studies to assess their involvement in carcinogenesis and other fundamental biological processes. 37 refs., 5 figs., 1 tab.« less
Comprehensive identification and analysis of human accelerated regulatory DNA
Gittelman, Rachel M.; Hun, Enna; Ay, Ferhat; Madeoy, Jennifer; Pennacchio, Len; Noble, William S.; Hawkins, R. David; Akey, Joshua M.
2015-01-01
It has long been hypothesized that changes in gene regulation have played an important role in human evolution, but regulatory DNA has been much more difficult to study compared with protein-coding regions. Recent large-scale studies have created genome-scale catalogs of DNase I hypersensitive sites (DHSs), which demark potentially functional regulatory DNA. To better define regulatory DNA that has been subject to human-specific adaptive evolution, we performed comprehensive evolutionary and population genetics analyses on over 18 million DHSs discovered in 130 cell types. We identified 524 DHSs that are conserved in nonhuman primates but accelerated in the human lineage (haDHS), and estimate that 70% of substitutions in haDHSs are attributable to positive selection. Through extensive computational and experimental analyses, we demonstrate that haDHSs are often active in brain or neuronal cell types; play an important role in regulating the expression of developmentally important genes, including many transcription factors such as SOX6, POU3F2, and HOX genes; and identify striking examples of adaptive regulatory evolution that may have contributed to human-specific phenotypes. More generally, our results reveal new insights into conserved and adaptive regulatory DNA in humans and refine the set of genomic substrates that distinguish humans from their closest living primate relatives. PMID:26104583
Evolution and genomics of the human brain.
Rosales-Reynoso, M A; Juárez-Vázquez, C I; Barros-Núñez, P
2018-05-01
Most living beings are able to perform actions that can be considered intelligent or, at the very least, the result of an appropriate reaction to changing circumstances in their environment. However, the intelligence or intellectual processes of humans are vastly superior to those achieved by all other species. The adult human brain is a highly complex organ weighing approximately 1500g, which accounts for only 2% of the total body weight but consumes an amount of energy equal to that required by all skeletal muscle at rest. Although the human brain displays a typical primate structure, it can be identified by its specific distinguishing features. The process of evolution and humanisation of the Homo sapiens brain resulted in a unique and distinct organ with the largest relative volume of any animal species. It also permitted structural reorganization of tissues and circuits in specific segments and regions. These steps explain the remarkable cognitive abilities of modern humans compared not only with other species in our genus, but also with older members of our own species. Brain evolution required the coexistence of two adaptation mechanisms. The first involves genetic changes that occur at the species level, and the second occurs at the individual level and involves changes in chromatin organisation or epigenetic changes. The genetic mechanisms include: a) genetic changes in coding regions that lead to changes in the sequence and activity of existing proteins; b) duplication and deletion of previously existing genes; c) changes in gene expression through changes in the regulatory sequences of different genes; and d) synthesis of non-coding RNAs. Lastly, this review describes some of the main documented chromosomal differences between humans and great apes. These differences have also contributed to the evolution and humanisation process of the H. sapiens brain. Copyright © 2014 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.
FOXP2 variation in great ape populations offers insight into the evolution of communication skills.
Staes, Nicky; Sherwood, Chet C; Wright, Katharine; de Manuel, Marc; Guevara, Elaine E; Marques-Bonet, Tomas; Krützen, Michael; Massiah, Michael; Hopkins, William D; Ely, John J; Bradley, Brenda J
2017-12-04
The gene coding for the forkhead box protein P2 (FOXP2) is associated with human language disorders. Evolutionary changes in this gene are hypothesized to have contributed to the emergence of speech and language in the human lineage. Although FOXP2 is highly conserved across most mammals, humans differ at two functional amino acid substitutions from chimpanzees, bonobos and gorillas, with an additional fixed substitution found in orangutans. However, FOXP2 has been characterized in only a small number of apes and no publication to date has examined the degree of natural variation in large samples of unrelated great apes. Here, we analyzed the genetic variation in the FOXP2 coding sequence in 63 chimpanzees, 11 bonobos, 48 gorillas, 37 orangutans and 2 gibbons and observed undescribed variation in great apes. We identified two variable polyglutamine microsatellites in chimpanzees and orangutans and found three nonsynonymous single nucleotide polymorphisms, one in chimpanzees, one in gorillas and one in orangutans with derived allele frequencies of 0.01, 0.26 and 0.29, respectively. Structural and functional protein modeling indicate a biochemical effect of the substitution in orangutans, and because of its presence solely in the Sumatran orangutan species, the mutation may be associated with reported population differences in vocalizations.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ozaki, N.; Lappalainen, J.; Linnoila, M.
Serotonin (5-HT){sub ID} receptors are 5-HT release-regulating autoreceptors in the human brain. Abnormalities in brain 5-HT function have been hypothesized in the pathophysiology of various psychiatric disorders, including obsessive-compulsive disorder, autism, mood disorders, eating disorders, impulsive violent behavior, and alcoholism. Thus, mutations occurring in 5-HT autoreceptors may cause or increase the vulnerability to any of these conditions. 5-HT{sub 1D{alpha}} and 5-HT{sub 1D{Beta}} subtypes have been previously localized to chromosomes 1p36.3-p34.3 and 6q13, respectively, using rodent-human hybrids and in situ localization. In this communication, we report the detection of a 5-HT{sub 1D{alpha}} receptor gene polymorphism by single strand conformation polymorphism (SSCP)more » analysis of the coding sequence. The polymorphism was used for fine scale linkage mapping of 5-HT{sub 1D{alpha}} on chromosome 1. This polymorphism should also be useful for linkage studies in populations and in families. Our analysis also demonstrates that functionally significant coding sequence variants of the 5-HT{sub 1D{alpha}} are probably not abundant either among alcoholics or in the general population. 14 refs., 1 fig., 1 tab.« less
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-10-03
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan
2017-01-01
Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data for a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors, one is composed of 45 elements, characterizing the information entropies of the disease genes distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease gene enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements with the first one to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274
Gaube, Friedemann; Wolfl, Stefan; Pusch, Larissa; Kroll, Torsten C; Hamburger, Matthias
2007-01-01
Background Extracts from the rhizome of Cimicifuga racemosa (black cohosh) are increasingly popular as herbal alternative to hormone replacement therapy (HRT) for the alleviation of postmenopausal disorders. However, the molecular mode of action and the active principles are presently not clear. Previously published data have been largely contradictory. We, therefore, investigated the effects of a lipophilic black cohosh rhizome extract and cycloartane-type triterpenoids on the estrogen receptor positive human breast cancer cell line MCF-7. Results Both extract and purified compounds clearly inhibited cellular proliferation. Gene expression profiling with the extract allowed us to identify 431 regulated genes with high significance. The extract induced expression pattern differed from those of 17β-estradiol or the estrogen receptor antagonist tamoxifen. We observed a significant enrichment of genes in an anti-proliferative and apoptosis-sensitizing manner, as well as an increase of mRNAs coding for gene products involved in several stress response pathways. These functional groups were highly overrepresented among all regulated genes. Also several transcripts coding for oxidoreductases were induced, as for example the cytochrome P450 family members 1A1 and 1B1. In addition, some transcripts associated with antitumor but also tumor-promoting activity were regulated. Real-Time RT-PCR analysis of 13 selected genes was conducted after treatment with purified compounds – the cycloartane-type triterpene glycoside actein and triterpene aglycons – showing similar expression levels compared to the extract. Conclusion No estrogenic but antiproliferative and proapoptotic gene expression was shown for black cohosh in MCF-7 cells at the transcriptional level. The effects may be results of the activation of different pathways. The cycloartane glycosides and – for the first time – their aglycons could be identified as an active principle in black cohosh. PMID:17880733
Cis-encoded non-coding antisense RNAs in streptococci and other low GC Gram (+) bacterial pathogens
Cho, Kyu Hong; Kim, Jeong-Ho
2015-01-01
Due to recent advances of bioinformatics and high throughput sequencing technology, discovery of regulatory non-coding RNAs in bacteria has been increased to a great extent. Based on this bandwagon, many studies searching for trans-acting small non-coding RNAs in streptococci have been performed intensively, especially in the important human pathogen, group A and B streptococci. However, studies for cis-encoded non-coding antisense RNAs in streptococci have been scarce. A recent study shows antisense RNAs are involved in virulence gene regulation in group B streptococcus, S. agalactiae. This suggests antisense RNAs could have important roles in the pathogenesis of streptococcal pathogens. In this review, we describe recent discoveries of chromosomal cis-encoded antisense RNAs in streptococcal pathogens and other low GC Gram (+) bacteria to provide a guide for future studies. PMID:25859258
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Human heavy chain disease protein WIS: implications for the organization of immunoglobulin genes.
Franklin, E C; Prelli, F; Frangione, B
1979-01-01
Protein WIS is a human gamma3 heavy (H) chain disease immunoglobulin variant whose amino acid sequence is most readily interpreted by postulating that three residues of the amino terminus are followed by a deletion of most of the variable (VH) domain, which ends at the variable-constant (VC) joining region. Then there is a stretch of eight residues, three of which are unusual, while the other five have striking homology to the VC junction sequence. This is followed by a second deletion, which ends at the beginning of the quadruplicated hinge region. These findings are consistent with mutations resulting in deletions of most of the gene coding for the V region and CH1 domain followed by splicing at the VC joining region and at the hinge. These structural features fit well the notion of genetic discontinuity between V and C genes and also suggest similar mechanisms of excision and splicing in the interdomain regions of the C gene of the heavy chain. PMID:106391
Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E
1996-10-03
We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Pyrococcus furiosus (Pf), a hyperthermophillic archeon. Sequence analysis of the Pf gene indicated an open reading frame specifying a protein of 485 amino acids (aa) with a calculated M(r) of 52900. Canonical Archaea promoter elements, Box A and Box B, are located -49 and -17 nucleotides (nt), respectively, upstream of the putative start codon. The sequence of the putative active-site region conforms to the IMPDH signature motif and contains a putative active-site cysteine. Phylogenetic relationships derived by using all available IMPDH sequences are consistent with trees developed for other molecules; they do not precisely resolve the history of Pf IMPDH but indicate a close similarity to bacterial IMPDH proteins. The phylogenetic analysis indicates that a gene duplication occurred prior to the division between rodents and humans, accounting for the Type I and II isoforms identified in mice and humans.
Delzenne, N; Ferré, P; Beylot, M; Daubioul, C; Declercq, B; Diraison, F; Dugail, I; Foufelle, F; Foretz, M; Mace, K; Reimer, R; Palmer, G; Rutter, G; Tavare, J; Van Loo, J; Vidal, H
2001-08-01
Dietary digestible carbohydrates are able to modulate lipogenesis, by modifying the expression of genes coding for key lipogenic enzymes, like fatty acid synthase. The overall objective of the Nutrigene project (FAIR-CT97-3011) was to study the efficiency of various carbohydrates to modulate the lipogenic capacity and relevant gene expression in rat and human species (control and obese subjects) and to understand the underlying molecular mechanisms involved in the regulation of lipogenic genes by carbohydrates. Key cellular mediators (namely SREBP-1c and 2, AMP activated protein kinase, cholesterol content) of the regulation of lipogenic gene expression by glucose and/or insulin were identified and constitute new putative targets in the development of plurimetabolic syndrome associated with obesity. In humans, hepatic lipogenesis and triglyceride synthesis, assessed in vivo by the use of stable isotopes, was promoted by a high-carbohydrate diet in non obese subjects, and in non alcoholic steatotic patients, but was not modified in the adipose tissue of obese subjects. Non digestible/fermentable carbohydrates, such as fructans, were shown to decrease hepatic lipogenesis in non obese rats, and to lessen hepatic steatosis and body weight in obese Zucker rats. If confirmed in obese humans, this would allow the development of functional food able to counteract the metabolic disturbances linked to obesity.
Fedrigo, Olivier; Babbitt, Courtney C.; Wortham, Matthew; Tewari, Alok K.; London, Darin; Song, Lingyun; Lee, Bum-Kyu; Iyer, Vishwanath R.; Parker, Stephen C. J.; Margulies, Elliott H.; Wray, Gregory A.; Furey, Terrence S.; Crawford, Gregory E.
2012-01-01
Understanding the molecular basis for phenotypic differences between humans and other primates remains an outstanding challenge. Mutations in non-coding regulatory DNA that alter gene expression have been hypothesized as a key driver of these phenotypic differences. This has been supported by differential gene expression analyses in general, but not by the identification of specific regulatory elements responsible for changes in transcription and phenotype. To identify the genetic source of regulatory differences, we mapped DNaseI hypersensitive (DHS) sites, which mark all types of active gene regulatory elements, genome-wide in the same cell type isolated from human, chimpanzee, and macaque. Most DHS sites were conserved among all three species, as expected based on their central role in regulating transcription. However, we found evidence that several hundred DHS sites were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific DHS site gains are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within these DHS sites are linked with species-specific changes in chromatin accessibility. Together, these indicate that the regulatory elements identified here are genetic contributors to transcriptional and phenotypic differences among primate species. PMID:22761590
A comparative study of disease genes and drug targets in the human protein interactome
2015-01-01
Background Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. Results In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. Conclusions The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing. PMID:25861037
Crosley, E J; Elliot, M G; Christians, J K; Crespi, B J
2013-02-01
Recent evidence from chimpanzees and gorillas has raised doubts that preeclampsia is a uniquely human disease. The deep extravillous trophoblast (EVT) invasion and spiral artery remodeling that characterizes our placenta (and is abnormal in preeclampsia) is shared within great apes, setting Homininae apart from Hylobatidae and Old World Monkeys, which show much shallower trophoblast invasion and limited spiral artery remodeling. We hypothesize that the evolution of a more invasive placenta in the lineage ancestral to the great apes involved positive selection on genes crucial to EVT invasion and spiral artery remodeling. Furthermore, identification of placentally-expressed genes under selection in this lineage may identify novel genes involved in placental development. We tested for positive selection in approximately 18,000 genes using the ratio of non-synonymous to synonymous amino acid substitution for protein-coding DNA. DAVID Bioinformatics Resources identified biological processes enriched in positively selected genes, including processes related to EVT invasion and spiral artery remodeling. Analyses revealed 295 and 264 genes under significant positive selection on the branches ancestral to Hominidae (Human, Chimp, Gorilla, Orangutan) and Homininae (Human, Chimp, Gorilla), respectively. Gene ontology analysis of these gene sets demonstrated significant enrichments for several functional gene clusters relevant to preeclampsia risk, and sets of placentally-expressed genes that have been linked with preeclampsia and/or trophoblast invasion in other studies. Our study represents a novel approach to the identification of candidate genes and amino acid residues involved in placental pathologies by implicating them in the evolution of highly-invasive placenta. Copyright © 2012 Elsevier Ltd. All rights reserved.
A comparative study of disease genes and drug targets in the human protein interactome.
Sun, Jingchun; Zhu, Kevin; Zheng, W; Xu, Hua
2015-01-01
Disease genes cause or contribute genetically to the development of the most complex diseases. Drugs are the major approaches to treat the complex disease through interacting with their targets. Thus, drug targets are critical for treatment efficacy. However, the interrelationship between the disease genes and drug targets is not clear. In this study, we comprehensively compared the network properties of disease genes and drug targets for five major disease categories (cancer, cardiovascular disease, immune system disease, metabolic disease, and nervous system disease). We first collected disease genes from genome-wide association studies (GWAS) for five disease categories and collected their corresponding drugs based on drugs' Anatomical Therapeutic Chemical (ATC) classification. Then, we obtained the drug targets for these five different disease categories. We found that, though the intersections between disease genes and drug targets were small, disease genes were significantly enriched in targets compared to their enrichment in human protein-coding genes. We further compared network properties of the proteins encoded by disease genes and drug targets in human protein-protein interaction networks (interactome). The results showed that the drug targets tended to have higher degree, higher betweenness, and lower clustering coefficient in cancer Furthermore, we observed a clear fraction increase of disease proteins or drug targets in the near neighborhood compared with the randomized genes. The study presents the first comprehensive comparison of the disease genes and drug targets in the context of interactome. The results provide some foundational network characteristics for further designing computational strategies to predict novel drug targets and drug repurposing.
Basak, Jolly; Nithin, Chandran
2015-01-01
Non-coding RNAs (ncRNAs) have emerged as versatile master regulator of biological functions in recent years. MicroRNAs (miRNAs) are small endogenous ncRNAs of 18-24 nucleotides in length that originates from long self-complementary precursors. Besides their direct involvement in developmental processes, plant miRNAs play key roles in gene regulatory networks and varied biological processes. Alternatively, long ncRNAs (lncRNAs) are a large and diverse class of transcribed ncRNAs whose length exceed that of 200 nucleotides. Plant lncRNAs are transcribed by different RNA polymerases, showing diverse structural features. Plant lncRNAs also are important regulators of gene expression in diverse biological processes. There has been a breakthrough in the technology of genome editing, the CRISPR-Cas9 (clustered regulatory interspaced short palindromic repeats/CRISPR-associated protein 9) technology, in the last decade. CRISPR loci are transcribed into ncRNA and eventually form a functional complex with Cas9 and further guide the complex to cleave complementary invading DNA. The CRISPR-Cas technology has been successfully applied in model plants such as Arabidopsis and tobacco and important crops like wheat, maize, and rice. However, all these studies are focused on protein coding genes. Information about targeting non-coding genes is scarce. Hitherto, the CRISPR-Cas technology has been exclusively used in vertebrate systems to engineer miRNA/lncRNAs, but it is still relatively unexplored in plants. While briefing miRNAs, lncRNAs and applications of the CRISPR-Cas technology in human and animals, this review essentially elaborates several strategies to overcome the challenges of applying the CRISPR-Cas technology in editing ncRNAs in plants and the future perspective of this field.
CVD-associated non-coding RNA, ANRIL, modulates expression of atherogenic pathways in VSMC
DOE Office of Scientific and Technical Information (OSTI.GOV)
Congrains, Ada; Kamide, Kei; Katsuya, Tomohiro
Highlights: Black-Right-Pointing-Pointer ANRIL maps in the strongest susceptibility locus for cardiovascular disease. Black-Right-Pointing-Pointer Silencing of ANRIL leads to altered expression of tissue remodeling-related genes. Black-Right-Pointing-Pointer The effects of ANRIL on gene expression are splicing variant specific. Black-Right-Pointing-Pointer ANRIL affects progression of cardiovascular disease by regulating proliferation and apoptosis pathways. -- Abstract: ANRIL is a newly discovered non-coding RNA lying on the strongest genetic susceptibility locus for cardiovascular disease (CVD) in the chromosome 9p21 region. Genome-wide association studies have been linking polymorphisms in this locus with CVD and several other major diseases such as diabetes and cancer. The role of thismore » non-coding RNA in atherosclerosis progression is still poorly understood. In this study, we investigated the implication of ANRIL in the modulation of gene sets directly involved in atherosclerosis. We designed and tested siRNA sequences to selectively target two exons (exon 1 and exon 19) of the transcript and successfully knocked down expression of ANRIL in human aortic vascular smooth muscle cells (HuAoVSMC). We used a pathway-focused RT-PCR array to profile gene expression changes caused by ANRIL knock down. Notably, the genes affected by each of the siRNAs were different, suggesting that different splicing variants of ANRIL might have distinct roles in cell physiology. Our results suggest that ANRIL splicing variants play a role in coordinating tissue remodeling, by modulating the expression of genes involved in cell proliferation, apoptosis, extra-cellular matrix remodeling and inflammatory response to finally impact in the risk of cardiovascular disease and other pathologies.« less
Coexpression of the KCNA3B gene product with Kv1.5 leads to a novel A-type potassium channel.
Leicher, T; Bähring, R; Isbrandt, D; Pongs, O
1998-12-25
Shaker-related voltage-gated potassium (Kv) channels may be heterooligomers consisting of membrane-integral alpha-subunits associated with auxiliary cytoplasmic beta-subunits. In this study we have cloned the human Kvbeta3.1 subunit and the corresponding KCNA3B gene. Identification of sequence-tagged sites in the gene mapped KCNA3B to band p13.1 of human chromosome 17. Comparison of the KCNA1B, KCNA2B, and KCNA3B gene structures showed that the three Kvbeta genes have very disparate lengths varying from >/=350 kb (KCNA1B) to approximately 7 kb (KCNA3B). Yet, the exon patterns of the three genes, which code for the seven known mammalian Kvbeta subunits, are very similar. The Kvbeta1 and Kvbeta2 splice variants are generated by alternative use of 5'-exons. Mouse Kvbeta4, a potential splice variant of Kvbeta3, is a read-through product where the open reading frame starts within the sequence intervening between Kvbeta3 exons 7 and 8. The human KCNA3B sequence does not contain a mouse Kvbeta4-like open reading frame. Human Kvbeta3 mRNA is specifically expressed in the brain, where it is predominantly detected in the cerebellum. The heterologous coexpression of human Kv1.5 and Kvbeta3.1 subunits in Chinese hamster ovary cells yielded a novel Kv channel mediating very fast inactivating (A-type) outward currents upon depolarization. Thus, the expression of Kvbeta3.1 subunits potentially extends the possibilities to express diverse A-type Kv channels in the human brain.
Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.
2012-01-01
An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661
Jing, Zhao; Zou, Hai-Zhou; Xu, Fang
2012-09-01
To study the molecular mechanisms of Curcuma Wenyujin extract-mediated inhibitory effects on human esophageal carcinoma cells. The Curcuma Wenyujin extract was obtained by supercritical carbon dioxide extraction. TE-1 cells were divided into 4 groups after adherence. 100 microL RMPI-1640 culture medium containing 0.1% DMSO was added in Group 1 as the control group. 100 microL 25, 50, and 100 mg/L Curcuma Wenyujin extract complete culture medium was respectively added in the rest 3 groups as the low, middle, and high dose Curcuma Wenyujin extract groups. The effects of different doses of Curcuma Wenyujin extract (25, 50, and 100 mg/L) on the proliferation of human esophageal carcinoma cell line TE-1 in vitro were analyzed by MTT assay. The gene expression profile was identified by cDNA microarrays in esophageal carcinoma TE-1 cells exposed to Curcuma Wenyujin extract for 48 h. The differential expression genes were further analyzed by Gene Ontology function analysis. Compared with the control group, MTT results showed that Curcuma Wenyujin extract significantly inhibited the proliferation of TE-1 cells in a dose-dependent manner (P<0.05). The expression level of 88 genes changed with significance, including 66 up-regulation genes and 22 down-regulation genes. Gene Ontology analysis indicated the genes coding for proteins was involved in signal transduction (6), cell cycle (8), apoptosis (14), and cell differentiation (10). The Curcuma Wenyujin extract could inhibit the growth of human esophageal carcinoma cell line TE-1 in vitro. The molecular mechanisms might be associated with regulating genes expressions at multi-levels.
Definition of RNA Polymerase II CoTC Terminator Elements in the Human Genome
Nojima, Takayuki; Dienstbier, Martin; Murphy, Shona; Proudfoot, Nicholas J.; Dye, Michael J.
2013-01-01
Summary Mammalian RNA polymerase II (Pol II) transcription termination is an essential step in protein-coding gene expression that is mediated by pre-mRNA processing activities and DNA-encoded terminator elements. Although much is known about the role of pre-mRNA processing in termination, our understanding of the characteristics and generality of terminator elements is limited. Whereas promoter databases list up to 40,000 known and potential Pol II promoter sequences, fewer than ten Pol II terminator sequences have been described. Using our knowledge of the human β-globin terminator mechanism, we have developed a selection strategy for mapping mammalian Pol II terminator elements. We report the identification of 78 cotranscriptional cleavage (CoTC)-type terminator elements at endogenous gene loci. The results of this analysis pave the way for the full understanding of Pol II termination pathways and their roles in gene expression. PMID:23562152
Transcriptional diversity during lineage commitment of human blood progenitors.
Chen, Lu; Kostadima, Myrto; Martens, Joost H A; Canu, Giovanni; Garcia, Sara P; Turro, Ernest; Downes, Kate; Macaulay, Iain C; Bielczyk-Maczynska, Ewa; Coe, Sophia; Farrow, Samantha; Poudel, Pawan; Burden, Frances; Jansen, Sjoert B G; Astle, William J; Attwood, Antony; Bariana, Tadbir; de Bono, Bernard; Breschi, Alessandra; Chambers, John C; Consortium, Bridge; Choudry, Fizzah A; Clarke, Laura; Coupland, Paul; van der Ent, Martijn; Erber, Wendy N; Jansen, Joop H; Favier, Rémi; Fenech, Matthew E; Foad, Nicola; Freson, Kathleen; van Geet, Chris; Gomez, Keith; Guigo, Roderic; Hampshire, Daniel; Kelly, Anne M; Kerstens, Hindrik H D; Kooner, Jaspal S; Laffan, Michael; Lentaigne, Claire; Labalette, Charlotte; Martin, Tiphaine; Meacham, Stuart; Mumford, Andrew; Nürnberg, Sylvia; Palumbo, Emilio; van der Reijden, Bert A; Richardson, David; Sammut, Stephen J; Slodkowicz, Greg; Tamuri, Asif U; Vasquez, Louella; Voss, Katrin; Watt, Stephen; Westbury, Sarah; Flicek, Paul; Loos, Remco; Goldman, Nick; Bertone, Paul; Read, Randy J; Richardson, Sylvia; Cvejic, Ana; Soranzo, Nicole; Ouwehand, Willem H; Stunnenberg, Hendrik G; Frontini, Mattia; Rendon, Augusto
2014-09-26
Blood cells derive from hematopoietic stem cells through stepwise fating events. To characterize gene expression programs driving lineage choice, we sequenced RNA from eight primary human hematopoietic progenitor populations representing the major myeloid commitment stages and the main lymphoid stage. We identified extensive cell type-specific expression changes: 6711 genes and 10,724 transcripts, enriched in non-protein-coding elements at early stages of differentiation. In addition, we found 7881 novel splice junctions and 2301 differentially used alternative splicing events, enriched in genes involved in regulatory processes. We demonstrated experimentally cell-specific isoform usage, identifying nuclear factor I/B (NFIB) as a regulator of megakaryocyte maturation-the platelet precursor. Our data highlight the complexity of fating events in closely related progenitor populations, the understanding of which is essential for the advancement of transplantation and regenerative medicine. Copyright © 2014, American Association for the Advancement of Science.
Evolution of the Iga Heavy Chain Gene in the Genus Mus
Osborne, B. A.; Golde, T. E.; Schwartz, R. L.; Rudikoff, S.
1988-01-01
To examine questions of immunoglobulin gene evolution, the IgA α heavy chain gene from Mus pahari, an evolutionarily distant relative to Mus musculus domesticus, was cloned and sequenced. The sequence, when compared to the IgA gene of BALB/c or human, demonstrated that the IgA gene is evolving in a mosaic fashion with the hinge region accumulating mutations most rapidly and the third domain at a considerably lower frequency. In spite of this pronounced accumulation of mutations, the hinge region appears to maintain the conformation of a random coil. A marked propensity to accumulate replacement over silent site changes in the coding regions was noted, as was a definite codon bias. The possibility that these two phenomena are interrelated is discussed. PMID:2842228
Zhuo, Chuanjun; Hou, Weihong; Hu, Lirong; Lin, Chongguang; Chen, Ce; Lin, Xiaodong
2017-01-01
Schizophrenia is a genetically related mental illness, in which the majority of genetic alterations occur in the non-coding regions of the human genome. In the past decade, a growing number of regulatory non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been identified to be strongly associated with schizophrenia. However, the studies of these ncRNAs in the pathophysiology of schizophrenia and the reverting of their genetic defects in restoration of the normal phenotype have been hampered by insufficient technology to manipulate these ncRNA genes effectively as well as a lack of appropriate animal models. Most recently, a revolutionary gene editing technology known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9; CRISPR/Cas9) has been developed that enable researchers to overcome these challenges. In this review article, we mainly focus on the schizophrenia-related ncRNAs and the use of CRISPR/Cas9-mediated editing on the non-coding regions of the genomic DNA in proving causal relationship between the genetic defects and the pathophysiology of schizophrenia. We subsequently discuss the potential of translating this advanced technology into a clinical therapy for schizophrenia, although the CRISPR/Cas9 technology is currently still in its infancy and immature to put into use in the treatment of diseases. Furthermore, we suggest strategies to accelerate the pace from the bench to the bedside. This review describes the application of the powerful and feasible CRISPR/Cas9 technology to manipulate schizophrenia-associated ncRNA genes. This technology could help researchers tackle this complex health problem and perhaps other genetically related mental disorders due to the overlapping genetic alterations of schizophrenia with other mental illnesses. PMID:28217082
Khabirova, Eleonora; Moloney, Aileen; Marciniak, Stefan J; Williams, Julie; Lomas, David A; Oliver, Stephen G; Favrin, Giorgio; Sattelle, David B; Crowther, Damian C
2014-01-01
The human Aβ peptide causes progressive paralysis when expressed in the muscles of the nematode worm, C. elegans. We have exploited this model of Aβ toxicity by carrying out an RNAi screen to identify genes whose reduced expression modifies the severity of this locomotor phenotype. Our initial finding was that none of the human orthologues of these worm genes is identical with the genome-wide significant GWAS genes reported to date (the "white zone"); moreover there was no identity between worm screen hits and the longer list of GWAS genes which included those with borderline levels of significance (the "grey zone"). This indicates that Aβ toxicity should not be considered as equivalent to sporadic AD. To increase the sensitivity of our analysis, we then considered the physical interactors (+1 interactome) of the products of the genes in both the worm and the white+grey zone lists. When we consider these worm and GWAS gene lists we find that 4 of the 60 worm genes have a +1 interactome overlap that is larger than expected by chance. Two of these genes form a chaperonin complex, the third is closely associated with this complex and the fourth gene codes for actin, the major substrate of the same chaperonin.
Using Zebrafish to Test the Genetic Basis of Human Craniofacial Diseases.
Machado, R Grecco; Eames, B Frank
2017-10-01
Genome-wide association studies (GWASs) opened an innovative and productive avenue to investigate the molecular basis of human craniofacial disease. However, GWASs identify candidate genes only; they do not prove that any particular one is the functional villain underlying disease or just an unlucky genomic bystander. Genetic manipulation of animal models is the best approach to reveal which genetic loci identified from human GWASs are functionally related to specific diseases. The purpose of this review is to discuss the potential of zebrafish to resolve which candidate genetic loci are mechanistic drivers of craniofacial diseases. Many anatomic, embryonic, and genetic features of craniofacial development are conserved among zebrafish and mammals, making zebrafish a good model of craniofacial diseases. Also, the ability to manipulate gene function in zebrafish was greatly expanded over the past 20 y, enabling systems such as Gateway Tol2 and CRISPR-Cas9 to test gain- and loss-of-function alleles identified from human GWASs in coding and noncoding regions of DNA. With the optimization of genetic editing methods, large numbers of candidate genes can be efficiently interrogated. Finding the functional villains that underlie diseases will permit new treatments and prevention strategies and will increase understanding of how gene pathways operate during normal development.
Identification of the human homolog of the imprinted mouse Air non-coding RNA
Yotova, Iveta Y.; Vlatkovic, Irena M.; Pauler, Florian M.; Warczok, Katarzyna E.; Ambros, Peter F.; Oshimura, Mitsuo; Theussl, Hans-Christian; Gessler, Manfred; Wagner, Erwin F.; Barlow, Denise P.
2010-01-01
Genomic imprinting is widely conserved amongst placental mammals. Imprinted expression of IGF2R, however, differs between mice and humans. In mice, Igf2r imprinted expression is seen in all fetal and adult tissues. In humans, adult tissues lack IGF2R imprinted expression, but it is found in fetal tissues and Wilms' tumors where it is polymorphic and only seen in a small proportion of tested samples. Mouse Igf2r imprinted expression is controlled by the Air (Airn) ncRNA whose promoter lies in an intronic maternally-methylated CpG island. The human IGF2R gene carries a homologous intronic maternally-methylated CpG island of unknown function. Here, we use transfection and transgenic studies to show that the human IGF2R intronic CpG island is a ncRNA promoter. We also identify the same ncRNA at the endogenous human locus in 16–40% of Wilms' tumors. Thus, the human IGF2R gene shows evolutionary conservation of key features that control imprinted expression in the mouse. PMID:18789384
Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.
2013-01-01
How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Morley, Laura; McNally, Alan; Paszkiewicz, Konrad; Corander, Jukka; Méric, Guillaume; Sheppard, Samuel K.; Blom, Jochen
2015-01-01
Campylobacter jejuni is a highly diverse species of bacteria commonly associated with infectious intestinal disease of humans and zoonotic carriage in poultry, cattle, pigs, and other animals. The species contains a large number of distinct clonal complexes that vary from host generalist lineages commonly found in poultry, livestock, and human disease cases to host-adapted specialized lineages primarily associated with livestock or poultry. Here, we present novel data on the ST403 clonal complex of C. jejuni, a lineage that has not been reported in avian hosts. Our data show that the lineage exhibits a distinctive pattern of intralineage recombination that is accompanied by the presence of lineage-specific restriction-modification systems. Furthermore, we show that the ST403 complex has undergone gene decay at a number of loci. Our data provide a putative link between the lack of association with avian hosts of C. jejuni ST403 and both gene gain and gene loss through nonsense mutations in coding sequences of genes, resulting in pseudogene formation. PMID:25795671
Ziegler, Andreas; Dohr, Gotrfried; Uchanska-Ziegler, Barbara
2002-07-01
Polymorphic genes of the human major histocompatibility complex [MHC; human leukocyte antigen (HLA)] are probably important in determining resistance to parasites and avoidance of inbreeding. We investigated whether HLA-associated sexual selection could also involve HLA-linked olfactory receptor (OR) genes, which might not only participate in olfaction-guided mate choice, but also in selection processes within the testis. The testicular expression status of HLA class I molecules (by immunohistology) and HLA-linked OR genes (by transcriptional analysis) was determined. Various HLA class I heavy chains, but not beta2-microglobulin (beta2m), were expressed, mainly at the spermatocyte I stage. Of 17 HLA-linked OR genes analyzed, eight were found to be transcribed in the testis. They exhibited varying numbers of 5'- or 3'-non-coding exons as well as differential splicing. We suggest that testis-expressed polymorphic HLA and OR proteins are functionally connected and serve the selection of spermatozoa, enabling them to distinguish 'self from 'non-self [the sperm-receptor-selection (SRS) hypothesis].
Pleiotropic Effects of Variants in Dementia Genes in Parkinson Disease.
Ibanez, Laura; Dube, Umber; Davis, Albert A; Fernandez, Maria V; Budde, John; Cooper, Breanna; Diez-Fairen, Monica; Ortega-Cubero, Sara; Pastor, Pau; Perlmutter, Joel S; Cruchaga, Carlos; Benitez, Bruno A
2018-01-01
Background: The prevalence of dementia in Parkinson disease (PD) increases dramatically with advancing age, approaching 80% in patients who survive 20 years with the disease. Increasing evidence suggests clinical, pathological and genetic overlap between Alzheimer disease, dementia with Lewy bodies and frontotemporal dementia with PD. However, the contribution of the dementia-causing genes to PD risk, cognitive impairment and dementia in PD is not fully established. Objective: To assess the contribution of coding variants in Mendelian dementia-causing genes on the risk of developing PD and the effect on cognitive performance of PD patients. Methods: We analyzed the coding regions of the amyloid-beta precursor protein ( APP ), Presenilin 1 and 2 ( PSEN1, PSEN2 ), and Granulin ( GRN ) genes from 1,374 PD cases and 973 controls using pooled-DNA targeted sequence, human exome-chip and whole-exome sequencing (WES) data by single variant and gene base (SKAT-O and burden tests) analyses. Global cognitive function was assessed using the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). The effect of coding variants in dementia-causing genes on cognitive performance was tested by multiple regression analysis adjusting for gender, disease duration, age at dementia assessment, study site and APOE carrier status. Results: Known AD pathogenic mutations in the PSEN1 (p.A79V) and PSEN2 (p.V148I) genes were found in 0.3% of all PD patients. There was a significant burden of rare, likely damaging variants in the GRN and PSEN1 genes in PD patients when compared with frequencies in the European population from the ExAC database. Multiple regression analysis revealed that PD patients carrying rare variants in the APP, PSEN1, PSEN2 , and GRN genes exhibit lower cognitive tests scores than non-carrier PD patients ( p = 2.0 × 10 -4 ), independent of age at PD diagnosis, age at evaluation, APOE status or recruitment site. Conclusions: Pathogenic mutations in the Alzheimer disease-causing genes ( PSEN1 and PSEN2) are found in sporadic PD patients. PD patients with cognitive decline carry rare variants in dementia-causing genes. Variants in genes causing Mendelian neurodegenerative diseases exhibit pleiotropic effects.
MetaRanker 2.0: a web server for prioritization of genetic variation data
Pers, Tune H.; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren
2013-01-01
MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein–protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0. PMID:23703204
MetaRanker 2.0: a web server for prioritization of genetic variation data.
Pers, Tune H; Dworzyński, Piotr; Thomas, Cecilia Engel; Lage, Kasper; Brunak, Søren
2013-07-01
MetaRanker 2.0 is a web server for prioritization of common and rare frequency genetic variation data. Based on heterogeneous data sets including genetic association data, protein-protein interactions, large-scale text-mining data, copy number variation data and gene expression experiments, MetaRanker 2.0 prioritizes the protein-coding part of the human genome to shortlist candidate genes for targeted follow-up studies. MetaRanker 2.0 is made freely available at www.cbs.dtu.dk/services/MetaRanker-2.0.
Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D
2018-01-01
Abstract The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
Phaneuf, D; Labelle, Y; Bérubé, D; Arden, K; Cavenee, W; Gagné, R; Tanguay, R M
1991-01-01
Type 1 hereditary tyrosinemia (HT) is an autosomal recessive disease characterized by a deficiency of the enzyme fumarylacetoacetate hydrolase (FAH; E.C.3.7.1.2). We have isolated human FAH cDNA clones by screening a liver cDNA expression library using specific antibodies and plaque hybridization with a rat FAH cDNA probe. A 1,477-bp cDNA was sequenced and shown to code for FAH by an in vitro transcription-translation assay and sequence homology with tryptic fragments of purified FAH. Transient expression of this FAH cDNA in transfected CV-1 mammalian cells resulted in the synthesis of an immunoreactive protein comigrating with purified human liver FAH on SDS-PAGE and having enzymatic activity as shown by the hydrolysis of the natural substrate fumarylacetoacetate. This indicates that the single polypeptide chain encoded by the FAH gene contains all the genetic information required for functional activity, suggesting that the dimer found in vivo is a homodimer. The human FAH cDNA was used as a probe to determine the gene's chromosomal localization using somatic cell hybrids and in situ hybridization. The human FAH gene maps to the long arm of chromosome 15 in the region q23-q25. Images Figure 1 Figure 3 Figure 4 Figure 6 Figure 8 PMID:1998338
Hart, Elizabeth A; Caccamo, Mario; Harrow, Jennifer L; Humphray, Sean J; Gilbert, James GR; Trevanion, Steve; Hubbard, Tim; Rogers, Jane; Rothschild, Max F
2007-01-01
Background We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. Results Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. Conclusion We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS. PMID:17705864
Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.
García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús
2017-08-10
Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.
Moraes, Fernanda; Góes, Andréa
2016-05-06
The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016. © 2016 The International Union of Biochemistry and Molecular Biology.
MicroRNA genes are frequently located near mouse cancer susceptibility loci
Sevignani, Cinzia; Calin, George A.; Nnadi, Stephanie C.; Shimizu, Masayoshi; Davuluri, Ramana V.; Hyslop, Terry; Demant, Peter; Croce, Carlo M.; Siracusa, Linda D.
2007-01-01
MicroRNAs (miRNAs) are short 19- to 24-nt RNA molecules that have been shown to regulate the expression of other genes in a variety of eukaryotic systems. Abnormal expression of miRNAs has been observed in several human cancers, and furthermore, germ-line and somatic mutations in human miRNAs were recently identified in patients with chronic lymphocytic leukemia. Thus, human miRNAs can act as tumor suppressor genes or oncogenes, where mutations, deletions, or amplifications can underlie the development of certain types of leukemia. In addition, previous studies have shown that miRNA expression profiles can distinguish among human solid tumors from different organs. Because a single miRNA can simultaneously influence the expression of two or more protein-coding genes, we hypothesized that miRNAs could be candidate genes for cancer risk. Research in complex trait genetics has demonstrated that genetic background determines cancer susceptibility or resistance in various tissues, such as colon and lung, of different inbred mouse strains. We compared the genome positions of mouse tumor susceptibility loci with those of mouse miRNAs. Here, we report a statistically significant association between the chromosomal location of miRNAs and those of mouse cancer susceptibility loci that influence the development of solid tumors. Furthermore, we identified distinct patterns of flanking DNA sequences for several miRNAs located at or near susceptibility loci in inbred strains with different tumor susceptibilities. These data provide a catalog of miRNA genes in inbred strains that could represent genes involved in the development and penetrance of solid tumors. PMID:17470785
Glinsky, Gennadi V.
2015-01-01
Despite significant progress in the structural and functional characterization of the human genome, understanding of the mechanisms underlying the genetic basis of human phenotypic uniqueness remains limited. Here, I report that transposable element-derived sequences, most notably LTR7/HERV-H, LTR5_Hs, and L1HS, harbor 99.8% of the candidate human-specific regulatory loci (HSRL) with putative transcription factor-binding sites in the genome of human embryonic stem cells (hESC). A total of 4,094 candidate HSRL display selective and site-specific binding of critical regulators (NANOG [Nanog homeobox], POU5F1 [POU class 5 homeobox 1], CCCTC-binding factor [CTCF], Lamin B1), and are preferentially located within the matrix of transcriptionally active DNA segments that are hypermethylated in hESC. hESC-specific NANOG-binding sites are enriched near the protein-coding genes regulating brain size, pluripotency long noncoding RNAs, hESC enhancers, and 5-hydroxymethylcytosine-harboring regions immediately adjacent to binding sites. Sequences of only 4.3% of hESC-specific NANOG-binding sites are present in Neanderthals’ genome, suggesting that a majority of these regulatory elements emerged in Modern Humans. Comparisons of estimated creation rates of novel TF-binding sites revealed that there was 49.7-fold acceleration of creation rates of NANOG-binding sites in genomes of Chimpanzees compared with the mouse genomes and further 5.7-fold acceleration in genomes of Modern Humans compared with the Chimpanzees genomes. Preliminary estimates suggest that emergence of one novel NANOG-binding site detectable in hESC required 466 years of evolution. Pathway analysis of coding genes that have hESC-specific NANOG-binding sites within gene bodies or near gene boundaries revealed their association with physiological development and functions of nervous and cardiovascular systems, embryonic development, behavior, as well as development of a diverse spectrum of pathological conditions such as cancer, diseases of cardiovascular and reproductive systems, metabolic diseases, multiple neurological and psychological disorders. A proximity placement model is proposed explaining how a 33–47% excess of NANOG, CTCF, and POU5F1 proteins immobilized on a DNA scaffold may play a functional role at distal regulatory elements. PMID:25956794
Network perturbation by recurrent regulatory variants in cancer
Cho, Ara; Lee, Insuk; Choi, Jung Kyoon
2017-01-01
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
Compositional Gene Landscapes in Vertebrates
Cruveiller, Stéphane; Jabbari, Kamel; Clay, Oliver; Bernardi, Giorgio
2004-01-01
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%–30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them. PMID:15123586
Compositional gene landscapes in vertebrates.
Cruveiller, Stéphane; Jabbari, Kamel; Clay, Oliver; Bernardi, Giorgio
2004-05-01
The existence of a well conserved linear relationship between GC levels of genes' second and third codon positions (GC2, GC3) prompted us to focus on the landscape, or joint distribution, spanned by these two variables. In human, well curated coding sequences now cover at least 15%-30% of the estimated total gene set. Our analysis of the landscape defined by this gene set revealed not only the well documented linear crest, but also the presence of several peaks and valleys along that crest, a property that was also indicated in two other warm-blooded vertebrates represented by large gene databases, that is, mouse and chicken. GC2 is the sum of eight amino acid frequencies, whereas GC3 is linearly related to the GC level of the chromosomal region containing the gene. The landscapes therefore portray relations between proteins and the DNA environments of the genes that encode them.
Janova, Eva; Matiasovic, Jan; Vahala, Jiri; Vodicka, Roman; Van Dyk, Enette; Horin, Petr
2009-07-01
The major histocompatibility complex genes coding for antigen binding and presenting molecules are the most polymorphic genes in the vertebrate genome. We studied the DRA and DQA gene polymorphism of the family Equidae. In addition to 11 previously reported DRA and 24 DQA alleles, six new DRA sequences and 13 new DQA alleles were identified in the genus Equus. Phylogenetic analysis of both DRA and DQA sequences provided evidence for trans-species polymorphism in the family Equidae. The phylogenetic trees differed from species relationships defined by standard taxonomy of Equidae and from trees based on mitochondrial or neutral gene sequence data. Analysis of selection showed differences between the less variable DRA and more variable DQA genes. DRA alleles were more often shared by more species. The DQA sequences analysed showed strong amongst-species positive selection; the selected amino acid positions mostly corresponded to selected positions in rodent and human DQA genes.
Role of non-coding RNAs in non-aging-related neurological disorders.
Vieira, A S; Dogini, D B; Lopes-Cendes, I
2018-06-11
Protein coding sequences represent only 2% of the human genome. Recent advances have demonstrated that a significant portion of the genome is actively transcribed as non-coding RNA molecules. These non-coding RNAs are emerging as key players in the regulation of biological processes, and act as "fine-tuners" of gene expression. Neurological disorders are caused by a wide range of genetic mutations, epigenetic and environmental factors, and the exact pathophysiology of many of these conditions is still unknown. It is currently recognized that dysregulations in the expression of non-coding RNAs are present in many neurological disorders and may be relevant in the mechanisms leading to disease. In addition, circulating non-coding RNAs are emerging as potential biomarkers with great potential impact in clinical practice. In this review, we discuss mainly the role of microRNAs and long non-coding RNAs in several neurological disorders, such as epilepsy, Huntington disease, fragile X-associated ataxia, spinocerebellar ataxias, amyotrophic lateral sclerosis (ALS), and pain. In addition, we give information about the conditions where microRNAs have demonstrated to be potential biomarkers such as in epilepsy, pain, and ALS.
Huang, Lin; Lange, Miles D.; Zhang, Zhixin
2014-01-01
VH replacement occurs through RAG-mediated secondary recombination between a rearranged VH gene and an upstream unrearranged VH gene. Due to the location of the cryptic recombination signal sequence (cRSS, TACTGTG) at the 3′ end of VH gene coding region, a short stretch of nucleotides from the previous rearranged VH gene can be retained in the newly formed VH–DH junction as a “footprint” of VH replacement. Such footprints can be used as markers to identify Ig heavy chain (IgH) genes potentially generated through VH replacement. To explore the contribution of VH replacement products to the antibody repertoire, we developed a Java-based computer program, VH replacement footprint analyzer-I (VHRFA-I), to analyze published or newly obtained IgH genes from human or mouse. The VHRFA-1 program has multiple functional modules: it first uses service provided by the IMGT/V-QUEST program to assign potential VH, DH, and JH germline genes; then, it searches for VH replacement footprint motifs within the VH–DH junction (N1) regions of IgH gene sequences to identify potential VH replacement products; it can also analyze the frequencies of VH replacement products in correlation with publications, keywords, or VH, DH, and JH gene usages, and mutation status; it can further analyze the amino acid usages encoded by the identified VH replacement footprints. In summary, this program provides a useful computation tool for exploring the biological significance of VH replacement products in human and mouse. PMID:24575092
Fatakia, Sarosh N.; Mehta, Ishita S.; Rao, Basuthkar J.
2016-01-01
Forty-six chromosome territories (CTs) are positioned uniquely in human interphase nuclei, wherein each of their positions can range from the centre of the nucleus to its periphery. A non-empirical basis for their non-random arrangement remains unreported. Here, we derive a suprachromosomal basis of that overall arrangement (which we refer to as a CT constellation), and report a hierarchical nature of the same. Using matrix algebra, we unify intrinsic chromosomal parameters (e.g., chromosomal length, gene density, the number of genes per chromosome), to derive an extrinsic effective gene density matrix, the hierarchy of which is dominated largely by extrinsic mathematical coupling of HSA19, followed by HSA17 (human chromosome 19 and 17, both preferentially interior CTs) with all CTs. We corroborate predicted constellations and effective gene density hierarchy with published reports from fluorescent in situ hybridization based microscopy and Hi-C techniques, and delineate analogous hierarchy in disparate vertebrates. Our theory accurately predicts CTs localised to the nuclear interior, which interestingly share conserved synteny with HSA19 and/or HSA17. Finally, the effective gene density hierarchy dictates how permutations among CT position represents the plasticity within its constellations, based on which we suggest that a differential mix of coding with noncoding genome modulates the same. PMID:27845379
Genomic interval engineering of mice identified a novel modulator of triglyceride production
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Y.; Jong, M.C.; Frazer, K.A.
1999-10-01
To accelerate the biological annotation of novel genes discovered in sequenced of mammalian genomes, we are creating large deletions in the mouse genome targeted to include clusters of such genes. Here we describe the targeted deletion of a 450 kb region on mouse chromosome 11 which, based on computational analysis of the deleted murine sequences and human 5q orthologous sequences, codes for nine putative genes. Mice homozygous for the deletion had a variety of abnormalities including severe hypertriglyceridemia, hepatic and cardiac enlargement, growth retardation and premature mortality. Analysis of triglyceride metabolism in these animals demonstrated a several-fold increase in hepaticmore » very-low density lipoprotein (VLDL) triglyceride secretion, the most prevalent mechanism responsible for hypertriglyceridemia in humans. A series of mouse BAC and human YAC transgenes covering different intervals of the 450 kb deleted region were assessed for their ability to complement the deletion induced abnormalities. These studies revealed that OCTN2, a gene recently shown to play a role in carnitine transport, was able to correct the triglyceride abnormalities. The discovery of this previously unappreciated relationship between OCTN2, carnitine and hepatic triglyceride production is of particular importance due to the clinical consequence of hypertriglyceridemia and the paucity of genes known to modulate triglyceride secretion.« less
Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis
2008-01-01
Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375
Rozhdestvensky, Timofey S.; Robeck, Thomas; Galiveti, Chenna R.; Raabe, Carsten A.; Seeger, Birte; Wolters, Anna; Gubar, Leonid V.; Brosius, Jürgen; Skryabin, Boris V.
2016-01-01
Prader-Willi syndrome (PWS) is a neurogenetic disorder caused by loss of paternally expressed genes on chromosome 15q11-q13. The PWS-critical region (PWScr) contains an array of non-protein coding IPW-A exons hosting intronic SNORD116 snoRNA genes. Deletion of PWScr is associated with PWS in humans and growth retardation in mice exhibiting ~15% postnatal lethality in C57BL/6 background. Here we analysed a knock-in mouse containing a 5′HPRT-LoxP-NeoR cassette (5′LoxP) inserted upstream of the PWScr. When the insertion was inherited maternally in a paternal PWScr-deletion mouse model (PWScrp−/m5′LoxP), we observed compensation of growth retardation and postnatal lethality. Genomic methylation pattern and expression of protein-coding genes remained unaltered at the PWS-locus of PWScrp−/m5′LoxP mice. Interestingly, ubiquitous Snord116 and IPW-A exon transcription from the originally silent maternal chromosome was detected. In situ hybridization indicated that PWScrp−/m5′LoxP mice expressed Snord116 in brain areas similar to wild type animals. Our results suggest that the lack of PWScr RNA expression in certain brain areas could be a primary cause of the growth retardation phenotype in mice. We propose that activation of disease-associated genes on imprinted regions could lead to general therapeutic strategies in associated diseases. PMID:26848093
Forrest, Megan E; Saiakhova, Alina; Beard, Lydia; Buchner, David A; Scacheri, Peter C; LaFramboise, Thomas; Markowitz, Sanford; Khalil, Ahmad M
2018-05-09
Long non-coding RNAs (lncRNAs) are frequently dysregulated in many human cancers. We sought to identify candidate oncogenic lncRNAs in human colon tumors by utilizing RNA sequencing data from 22 colon tumors and 22 adjacent normal colon samples from The Cancer Genome Atlas (TCGA). The analysis led to the identification of ~200 differentially expressed lncRNAs. Validation in an independent cohort of normal colon and patient-derived colon cancer cell lines identified a novel lncRNA, lincDUSP, as a potential candidate oncogene. Knockdown of lincDUSP in patient-derived colon tumor cell lines resulted in significantly decreased cell proliferation and clonogenic potential, and increased susceptibility to apoptosis. The knockdown of lincDUSP affects the expression of ~800 genes, and NCI pathway analysis showed enrichment of DNA damage response and cell cycle control pathways. Further, identification of lincDUSP chromatin occupancy sites by ChIRP-Seq demonstrated association with genes involved in the replication-associated DNA damage response and cell cycle control. Consistent with these findings, lincDUSP knockdown in colon tumor cell lines increased both the accumulation of cells in early S-phase and γH2AX foci formation, indicating increased DNA damage response induction. Taken together, these results demonstrate a key role of lincDUSP in the regulation of important pathways in colon cancer.
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
POM-ZP3, a bipartite transcript derived from human ZP3 and a POM121 homologue
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kipersztok, S.; Osawa, G.A.; Liang, L.F.
1995-01-20
Human POM-ZP3 is a novel bipartite RNA transcript that is derived from a gene homologous to rat POM121 (a nuclear pore membrane protein) and ZP3 (a sperm receptor ligand in the zona pellucida). The 5{prime} region is 77% identical to the 5{prime} end of the coding region of rat POM121 and appears to represent a partial duplication of a gene encoding a human homologue of this rodent gene. The 3{prime} end of the POM-ZP3 transcript is 99% identical to ZP3 and appears to have arisen from a duplication of the last four exons (exons 5-8) of ZP3. Using Northern blotsmore » and RT-PCR, POM-ZP3 transcripts were detected in human ovaries, testes, spleen, thymus, lymphocytes, prostate, and intestines. The longest open reading frame encodes a conceptual protein of 210 amino acids, the first 76 of which are 83% identical to residues 241-315 of rat POM121. The next 125 amino acids are 98% identical to residues 239-363 of the 424-amino-acid human ZP3 protein. By fluorescence in situ hybridization, genomic fragments of ZP3 and a human homologue of POM121 were localized to chromosome 7q11.23. Taken together, these data suggest that partial duplications of human ZP3 and a POM121-like gene have resulted in a fusion transcript, POM-ZP3, that is expressed in multiple human tissues. 24 refs., 5 figs.« less
Population genetic implications from sequence variation in four Y chromosome genes.
Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J
2000-06-20
Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 x 10(-5), 5. 07 x 10(-5), and 8.54 x 10(-5) for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9. 16 x 10(-5) to 14.2 x 10(-5) for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
Noh, Hyun Ji; Tang, Ruqi; Flannick, Jason; O'Dushlaine, Colm; Swofford, Ross; Howrigan, Daniel; Genereux, Diane P; Johnson, Jeremy; van Grootheest, Gerard; Grünblatt, Edna; Andersson, Erik; Djurfeldt, Diana R; Patel, Paresh D; Koltookian, Michele; M Hultman, Christina; Pato, Michele T; Pato, Carlos N; Rasmussen, Steven A; Jenike, Michael A; Hanna, Gregory L; Stewart, S Evelyn; Knowles, James A; Ruhrmann, Stephan; Grabe, Hans-Jörgen; Wagner, Michael; Rück, Christian; Mathews, Carol A; Walitza, Susanne; Cath, Daniëlle C; Feng, Guoping; Karlsson, Elinor K; Lindblad-Toh, Kerstin
2017-10-17
Obsessive-compulsive disorder is a severe psychiatric disorder linked to abnormalities in glutamate signaling and the cortico-striatal circuit. We sequenced coding and regulatory elements for 608 genes potentially involved in obsessive-compulsive disorder in human, dog, and mouse. Using a new method that prioritizes likely functional variants, we compared 592 cases to 560 controls and found four strongly associated genes, validated in a larger cohort. NRXN1 and HTR2A are enriched for coding variants altering postsynaptic protein-binding domains. CTTNBP2 (synapse maintenance) and REEP3 (vesicle trafficking) are enriched for regulatory variants, of which at least six (35%) alter transcription factor-DNA binding in neuroblastoma cells. NRXN1 achieves genome-wide significance (p = 6.37 × 10 -11 ) when we include 33,370 population-matched controls. Our findings suggest synaptic adhesion as a key component in compulsive behaviors, and show that targeted sequencing plus functional annotation can identify potentially causative variants, even when genomic data are limited.Obsessive-compulsive disorder (OCD) is a neuropsychiatric disorder with symptoms including intrusive thoughts and time-consuming repetitive behaviors. Here Noh and colleagues identify genes enriched for functional variants associated with increased risk of OCD.
SoyNet: a database of co-functional networks for soybean Glycine max.
Kim, Eiru; Hwang, Sohyun; Lee, Insuk
2017-01-04
Soybean (Glycine max) is a legume crop with substantial economic value, providing a source of oil and protein for humans and livestock. More than 50% of edible oils consumed globally are derived from this crop. Soybean plants are also important for soil fertility, as they fix atmospheric nitrogen by symbiosis with microorganisms. The latest soybean genome annotation (version 2.0) lists 56 044 coding genes, yet their functional contributions to crop traits remain mostly unknown. Co-functional networks have proven useful for identifying genes that are involved in a particular pathway or phenotype with various network algorithms. Here, we present SoyNet (available at www.inetbio.org/soynet), a database of co-functional networks for G. max and a companion web server for network-based functional predictions. SoyNet maps 1 940 284 co-functional links between 40 812 soybean genes (72.8% of the coding genome), which were inferred from 21 distinct types of genomics data including 734 microarrays and 290 RNA-seq samples from soybean. SoyNet provides a new route to functional investigation of the soybean genome, elucidating genes and pathways of agricultural importance. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred
2014-11-20
Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Gene-Auto: Automatic Software Code Generation for Real-Time Embedded Systems
NASA Astrophysics Data System (ADS)
Rugina, A.-E.; Thomas, D.; Olive, X.; Veran, G.
2008-08-01
This paper gives an overview of the Gene-Auto ITEA European project, which aims at building a qualified C code generator from mathematical models under Matlab-Simulink and Scilab-Scicos. The project is driven by major European industry partners, active in the real-time embedded systems domains. The Gene- Auto code generator will significantly improve the current development processes in such domains by shortening the time to market and by guaranteeing the quality of the generated code through the use of formal methods. The first version of the Gene-Auto code generator has already been released and has gone thought a validation phase on real-life case studies defined by each project partner. The validation results are taken into account in the implementation of the second version of the code generator. The partners aim at introducing the Gene-Auto results into industrial development by 2010.
Turcot, Valérie; Lu, Yingchang; Highland, Heather M; Schurmann, Claudia; Justice, Anne E; Fine, Rebecca S; Bradfield, Jonathan P; Esko, Tõnu; Giri, Ayush; Graff, Mariaelisa; Guo, Xiuqing; Hendricks, Audrey E; Karaderi, Tugce; Lempradl, Adelheid; Locke, Adam E; Mahajan, Anubha; Marouli, Eirini; Sivapalaratnam, Suthesh; Young, Kristin L; Alfred, Tamuno; Feitosa, Mary F; Masca, Nicholas G D; Manning, Alisa K; Medina-Gomez, Carolina; Mudgal, Poorva; Ng, Maggie C Y; Reiner, Alex P; Vedantam, Sailaja; Willems, Sara M; Winkler, Thomas W; Abecasis, Gonçalo; Aben, Katja K; Alam, Dewan S; Alharthi, Sameer E; Allison, Matthew; Amouyel, Philippe; Asselbergs, Folkert W; Auer, Paul L; Balkau, Beverley; Bang, Lia E; Barroso, Inês; Bastarache, Lisa; Benn, Marianne; Bergmann, Sven; Bielak, Lawrence F; Blüher, Matthias; Boehnke, Michael; Boeing, Heiner; Boerwinkle, Eric; Böger, Carsten A; Bork-Jensen, Jette; Bots, Michiel L; Bottinger, Erwin P; Bowden, Donald W; Brandslund, Ivan; Breen, Gerome; Brilliant, Murray H; Broer, Linda; Brumat, Marco; Burt, Amber A; Butterworth, Adam S; Campbell, Peter T; Cappellani, Stefania; Carey, David J; Catamo, Eulalia; Caulfield, Mark J; Chambers, John C; Chasman, Daniel I; Chen, Yii-Der I; Chowdhury, Rajiv; Christensen, Cramer; Chu, Audrey Y; Cocca, Massimiliano; Collins, Francis S; Cook, James P; Corley, Janie; Corominas Galbany, Jordi; Cox, Amanda J; Crosslin, David S; Cuellar-Partida, Gabriel; D'Eustacchio, Angela; Danesh, John; Davies, Gail; Bakker, Paul I W; Groot, Mark C H; Mutsert, Renée; Deary, Ian J; Dedoussis, George; Demerath, Ellen W; Heijer, Martin; Hollander, Anneke I; Ruijter, Hester M; Dennis, Joe G; Denny, Josh C; Di Angelantonio, Emanuele; Drenos, Fotios; Du, Mengmeng; Dubé, Marie-Pierre; Dunning, Alison M; Easton, Douglas F; Edwards, Todd L; Ellinghaus, David; Ellinor, Patrick T; Elliott, Paul; Evangelou, Evangelos; Farmaki, Aliki-Eleni; Farooqi, I Sadaf; Faul, Jessica D; Fauser, Sascha; Feng, Shuang; Ferrannini, Ele; Ferrieres, Jean; Florez, Jose C; Ford, Ian; Fornage, Myriam; Franco, Oscar H; Franke, Andre; Franks, Paul W; Friedrich, Nele; Frikke-Schmidt, Ruth; Galesloot, Tessel E; Gan, Wei; Gandin, Ilaria; Gasparini, Paolo; Gibson, Jane; Giedraitis, Vilmantas; Gjesing, Anette P; Gordon-Larsen, Penny; Gorski, Mathias; Grabe, Hans-Jörgen; Grant, Struan F A; Grarup, Niels; Griffiths, Helen L; Grove, Megan L; Gudnason, Vilmundur; Gustafsson, Stefan; Haessler, Jeff; Hakonarson, Hakon; Hammerschlag, Anke R; Hansen, Torben; Harris, Kathleen Mullan; Harris, Tamara B; Hattersley, Andrew T; Have, Christian T; Hayward, Caroline; He, Liang; Heard-Costa, Nancy L; Heath, Andrew C; Heid, Iris M; Helgeland, Øyvind; Hernesniemi, Jussi; Hewitt, Alex W; Holmen, Oddgeir L; Hovingh, G Kees; Howson, Joanna M M; Hu, Yao; Huang, Paul L; Huffman, Jennifer E; Ikram, M Arfan; Ingelsson, Erik; Jackson, Anne U; Jansson, Jan-Håkan; Jarvik, Gail P; Jensen, Gorm B; Jia, Yucheng; Johansson, Stefan; Jørgensen, Marit E; Jørgensen, Torben; Jukema, J Wouter; Kahali, Bratati; Kahn, René S; Kähönen, Mika; Kamstrup, Pia R; Kanoni, Stavroula; Kaprio, Jaakko; Karaleftheri, Maria; Kardia, Sharon L R; Karpe, Fredrik; Kathiresan, Sekar; Kee, Frank; Kiemeney, Lambertus A; Kim, Eric; Kitajima, Hidetoshi; Komulainen, Pirjo; Kooner, Jaspal S; Kooperberg, Charles; Korhonen, Tellervo; Kovacs, Peter; Kuivaniemi, Helena; Kutalik, Zoltán; Kuulasmaa, Kari; Kuusisto, Johanna; Laakso, Markku; Lakka, Timo A; Lamparter, David; Lange, Ethan M; Lange, Leslie A; Langenberg, Claudia; Larson, Eric B; Lee, Nanette R; Lehtimäki, Terho; Lewis, Cora E; Li, Huaixing; Li, Jin; Li-Gao, Ruifang; Lin, Honghuang; Lin, Keng-Hung; Lin, Li-An; Lin, Xu; Lind, Lars; Lindström, Jaana; Linneberg, Allan; Liu, Ching-Ti; Liu, Dajiang J; Liu, Yongmei; Lo, Ken S; Lophatananon, Artitaya; Lotery, Andrew J; Loukola, Anu; Luan, Jian'an; Lubitz, Steven A; Lyytikäinen, Leo-Pekka; Männistö, Satu; Marenne, Gaëlle; Mazul, Angela L; McCarthy, Mark I; McKean-Cowdin, Roberta; Medland, Sarah E; Meidtner, Karina; Milani, Lili; Mistry, Vanisha; Mitchell, Paul; Mohlke, Karen L; Moilanen, Leena; Moitry, Marie; Montgomery, Grant W; Mook-Kanamori, Dennis O; Moore, Carmel; Mori, Trevor A; Morris, Andrew D; Morris, Andrew P; Müller-Nurasyid, Martina; Munroe, Patricia B; Nalls, Mike A; Narisu, Narisu; Nelson, Christopher P; Neville, Matt; Nielsen, Sune F; Nikus, Kjell; Njølstad, Pål R; Nordestgaard, Børge G; Nyholt, Dale R; O'Connel, Jeffrey R; O'Donoghue, Michelle L; Olde Loohuis, Loes M; Ophoff, Roel A; Owen, Katharine R; Packard, Chris J; Padmanabhan, Sandosh; Palmer, Colin N A; Palmer, Nicholette D; Pasterkamp, Gerard; Patel, Aniruddh P; Pattie, Alison; Pedersen, Oluf; Peissig, Peggy L; Peloso, Gina M; Pennell, Craig E; Perola, Markus; Perry, James A; Perry, John R B; Pers, Tune H; Person, Thomas N; Peters, Annette; Petersen, Eva R B; Peyser, Patricia A; Pirie, Ailith; Polasek, Ozren; Polderman, Tinca J; Puolijoki, Hannu; Raitakari, Olli T; Rasheed, Asif; Rauramaa, Rainer; Reilly, Dermot F; Renström, Frida; Rheinberger, Myriam; Ridker, Paul M; Rioux, John D; Rivas, Manuel A; Roberts, David J; Robertson, Neil R; Robino, Antonietta; Rolandsson, Olov; Rudan, Igor; Ruth, Katherine S; Saleheen, Danish; Salomaa, Veikko; Samani, Nilesh J; Sapkota, Yadav; Sattar, Naveed; Schoen, Robert E; Schreiner, Pamela J; Schulze, Matthias B; Scott, Robert A; Segura-Lepe, Marcelo P; Shah, Svati H; Sheu, Wayne H-H; Sim, Xueling; Slater, Andrew J; Small, Kerrin S; Smith, Albert V; Southam, Lorraine; Spector, Timothy D; Speliotes, Elizabeth K; Starr, John M; Stefansson, Kari; Steinthorsdottir, Valgerdur; Stirrups, Kathleen E; Strauch, Konstantin; Stringham, Heather M; Stumvoll, Michael; Sun, Liang; Surendran, Praveen; Swift, Amy J; Tada, Hayato; Tansey, Katherine E; Tardif, Jean-Claude; Taylor, Kent D; Teumer, Alexander; Thompson, Deborah J; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Thuesen, Betina H; Tönjes, Anke; Tromp, Gerard; Trompet, Stella; Tsafantakis, Emmanouil; Tuomilehto, Jaakko; Tybjaerg-Hansen, Anne; Tyrer, Jonathan P; Uher, Rudolf; Uitterlinden, André G; Uusitupa, Matti; Laan, Sander W; Duijn, Cornelia M; Leeuwen, Nienke; van Setten, Jessica; Vanhala, Mauno; Varbo, Anette; Varga, Tibor V; Varma, Rohit; Velez Edwards, Digna R; Vermeulen, Sita H; Veronesi, Giovanni; Vestergaard, Henrik; Vitart, Veronique; Vogt, Thomas F; Völker, Uwe; Vuckovic, Dragana; Wagenknecht, Lynne E; Walker, Mark; Wallentin, Lars; Wang, Feijie; Wang, Carol A; Wang, Shuai; Wang, Yiqin; Ware, Erin B; Wareham, Nicholas J; Warren, Helen R; Waterworth, Dawn M; Wessel, Jennifer; White, Harvey D; Willer, Cristen J; Wilson, James G; Witte, Daniel R; Wood, Andrew R; Wu, Ying; Yaghootkar, Hanieh; Yao, Jie; Yao, Pang; Yerges-Armstrong, Laura M; Young, Robin; Zeggini, Eleftheria; Zhan, Xiaowei; Zhang, Weihua; Zhao, Jing Hua; Zhao, Wei; Zhao, Wei; Zhou, Wei; Zondervan, Krina T; Rotter, Jerome I; Pospisilik, John A; Rivadeneira, Fernando; Borecki, Ingrid B; Deloukas, Panos; Frayling, Timothy M; Lettre, Guillaume; North, Kari E; Lindgren, Cecilia M; Hirschhorn, Joel N; Loos, Ruth J F
2018-01-01
Genome-wide association studies (GWAS) have identified >250 loci for body mass index (BMI), implicating pathways related to neuronal biology. Most GWAS loci represent clusters of common, noncoding variants from which pinpointing causal genes remains challenging. Here we combined data from 718,734 individuals to discover rare and low-frequency (minor allele frequency (MAF) < 5%) coding variants associated with BMI. We identified 14 coding variants in 13 genes, of which 8 variants were in genes (ZBTB7B, ACHE, RAPGEF3, RAB21, ZFHX3, ENTPD6, ZFR2 and ZNF169) newly implicated in human obesity, 2 variants were in genes (MC4R and KSR2) previously observed to be mutated in extreme obesity and 2 variants were in GIPR. The effect sizes of rare variants are ~10 times larger than those of common variants, with the largest effect observed in carriers of an MC4R mutation introducing a stop codon (p.Tyr35Ter, MAF = 0.01%), who weighed ~7 kg more than non-carriers. Pathway analyses based on the variants associated with BMI confirm enrichment of neuronal genes and provide new evidence for adipocyte and energy expenditure biology, widening the potential of genetically supported therapeutic targets in obesity.
SWIM: a computational tool to unveiling crucial nodes in complex biological networks
Paci, Paola; Colombo, Teresa; Fiscon, Giulia; Gurtner, Aymone; Pavesi, Giulio; Farina, Lorenzo
2017-01-01
SWItchMiner (SWIM) is a wizard-like software implementation of a procedure, previously described, able to extract information contained in complex networks. Specifically, SWIM allows unearthing the existence of a new class of hubs, called “fight-club hubs”, characterized by a marked negative correlation with their first nearest neighbors. Among them, a special subset of genes, called “switch genes”, appears to be characterized by an unusual pattern of intra- and inter-module connections that confers them a crucial topological role, interestingly mirrored by the evidence of their clinic-biological relevance. Here, we applied SWIM to a large panel of cancer datasets from The Cancer Genome Atlas, in order to highlight switch genes that could be critically associated with the drastic changes in the physiological state of cells or tissues induced by the cancer development. We discovered that switch genes are found in all cancers we studied and they encompass protein coding genes and non-coding RNAs, recovering many known key cancer players but also many new potential biomarkers not yet characterized in cancer context. Furthermore, SWIM is amenable to detect switch genes in different organisms and cell conditions, with the potential to uncover important players in biologically relevant scenarios, including but not limited to human cancer. PMID:28317894
Gomes, S L; Gober, J W; Shapiro, L
1990-01-01
Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. Images PMID:2345134
Zhang, Q; Baldwin, V J; Acland, G M; Parshall, C J; Haskel, J; Aguirre, G D; Ray, K
1999-01-01
Photoreceptor dysplasia (pd) is one of a group of at least six distinct autosomal and one X-linked retinal disorders identified in dogs which are collectively known as progressive retinal atrophy (PRA). It is an early onset retinal disease identified in miniature schnauzer dogs, and pedigree analysis and breeding studies have established autosomal recessive inheritance of the disease. Using a gene-based approach, a number of retina-expressed genes, including some members of the phototransduction pathway, have been causally implicated in retinal diseases of humans and other animals. Here we examined seven such potential candidate genes (opsin, RDS/peripherin, ROM1, rod cGMP-gated cation channel alpha-subunit, and three subunits of transducin) for their causal association with the pd locus by testing segregation of intragenic markers with the disease locus, or, in the absence of informative polymorphisms, sequencing of the coding regions of the genes. Based on these results, we have conclusively excluded four photoreceptor-specific genes as candidates for pd by linkage analysis. For three other photoreceptor-specific genes, we did not find any mutation in the coding sequences of the genes and have excluded them provisionally. Formal exclusion would require investigation of the levels of expression of the candidate genes in pd-affected dogs relative to age-matched controls. At present we are building suitable informative pedigrees for the disease locus with a sufficient number of meiosis to be useful for genomewide screening. This should identify markers linked to the disease locus and eventually permit progress toward the identification of the photoreceptor dysplasia gene and the disease-causing mutation.
Fu, Lijuan; Shi, Zhimin; Luo, Guanzheng; Tu, Weihong; Wang, XiuJie; Fang, Zhide; Li, XiaoChing
2014-10-01
Mutations in the human FOXP2 gene cause speech and language impairments. The FOXP2 protein is a transcription factor that regulates the expression of many downstream genes, which may have important roles in nervous system development and function. An adequate amount of functional FOXP2 protein is thought to be critical for the proper development of the neural circuitry underlying speech and language. However, how FOXP2 gene expression is regulated is not clearly understood. The FOXP2 mRNA has an approximately 4-kb-long 3' untranslated region (3' UTR), twice as long as its protein coding region, indicating that FOXP2 can be regulated by microRNAs (miRNAs). We identified multiple miRNAs that regulate the expression of the human FOXP2 gene using sequence analysis and in vitro cell systems. Focusing on let-7a, miR-9, and miR-129-5p, three brain-enriched miRNAs, we show that these miRNAs regulate human FOXP2 expression in a dosage-dependent manner and target specific sequences in the FOXP2 3' UTR. We further show that these three miRNAs are expressed in the cerebellum of the human fetal brain, where FOXP2 is known to be expressed. Our results reveal novel regulatory functions of the human FOXP2 3' UTR sequence and regulatory interactions between multiple miRNAs and the human FOXP2 gene. The expression of let-7a, miR-9, and miR-129-5p in the human fetal cerebellum is consistent with their roles in regulating FOXP2 expression during early cerebellum development. These results suggest that various genetic and environmental factors may contribute to speech and language development and related neural developmental disorders via the miRNA-FOXP2 regulatory network.
Chimpanzee and human Y chromosomes are remarkably divergent in structure and gene content
Hughes, Jennifer F.; Skaletsky, Helen; Pyntikova, Tatyana; Graves, Tina A.; van Daalen, Saskia K. M.; Minx, Patrick J.; Fulton, Robert S.; McGrath, Sean D.; Locke, Devin P.; Friedman, Cynthia; Trask, Barbara J.; Mardis, Elaine R.; Warren, Wesley C.; Repping, Sjoerd; Rozen, Steve; Wilson, Richard K.; Page, David C.
2013-01-01
The human Y chromosome began to evolve from an autosome hundreds of millions of years ago, acquiring a sex-determining function and undergoing a series of inversions that suppressed crossing over with the X chromosome1,2. Little is known about the Y chromosome’s recent evolution because only the human Y chromosome has been fully sequenced. Prevailing theories hold that Y chromosomes evolve by gene loss, the pace of which slows over time, eventually leading to a paucity of genes, and stasis3,4. These theories have been buttressed by partial sequence data from newly emergent plant and animal Y chromosomes5-8, but they have not been tested in older, highly evolved Y chromosomes like that of humans. We therefore finished sequencing the male-specific region of the Y chromosome (MSY) in our closest living relative, the chimpanzee, achieving levels of accuracy and completion previously reached for the human MSY. We then compared the MSYs of the two species and found that they differ radically in sequence structure and gene content, implying rapid evolution during the past 6 million years. The chimpanzee MSY harbors twice as many massive palindromes as the human MSY, yet it has lost large fractions of the MSY protein-coding genes and gene families present in the last common ancestor. We suggest that the extraordinary divergence of the chimpanzee and human MSYs was driven by four synergistic factors: the MSY’s prominent role in sperm production, genetic hitchhiking effects in the absence of meiotic crossing over, frequent ectopic recombination within the MSY, and species differences in mating behavior. While genetic decay may be the principal dynamic in the evolution of newly emergent Y chromosomes, wholesale renovation is the paramount theme in the ongoing evolution of chimpanzee, human, and perhaps other older MSYs. PMID:20072128
MicroRNAs in large herpesvirus DNA genomes: recent advances.
Sorel, Océane; Dewals, Benjamin G
2016-08-01
MicroRNAs (miRNAs) are small non-coding RNAs (ncRNAs) that regulate gene expression. They alter mRNA translation through base-pair complementarity, leading to regulation of genes during both physiological and pathological processes. Viruses have evolved mechanisms to take advantage of the host cells to multiply and/or persist over the lifetime of the host. Herpesviridae are a large family of double-stranded DNA viruses that are associated with a number of important diseases, including lymphoproliferative diseases. Herpesviruses establish lifelong latent infections through modulation of the interface between the virus and its host. A number of reports have identified miRNAs in a very large number of human and animal herpesviruses suggesting that these short non-coding transcripts could play essential roles in herpesvirus biology. This review will specifically focus on the recent advances on the functions of herpesvirus miRNAs in infection and pathogenesis.
Polymorphisms of 20 regulatory proteins between Mycobacterium tuberculosis and Mycobacterium bovis.
Bigi, María M; Blanco, Federico Carlos; Araújo, Flabio R; Thacker, Tyler C; Zumárraga, Martín J; Cataldi, Angel A; Soria, Marcelo A; Bigi, Fabiana
2016-08-01
Mycobacterium tuberculosis and Mycobacterium bovis are responsible for tuberculosis in humans and animals, respectively. Both species are closely related and belong to the Mycobacterium tuberculosis complex (MTC). M. tuberculosis is the most ancient species from which M. bovis and other members of the MTC evolved. The genome of M. bovis is over >99.95% identical to that of M. tuberculosis but with seven deletions ranging in size from 1 to 12.7 kb. In addition, 1200 single nucleotide mutations in coding regions distinguish M. bovis from M. tuberculosis. In the present study, we assessed 75 M. tuberculosis genomes and 23 M. bovis genomes to identify non-synonymous mutations in 202 coding sequences of regulatory genes between both species. We identified species-specific variants in 20 regulatory proteins and confirmed differential expression of hypoxia-related genes between M. bovis and M. tuberculosis. © 2016 The Societies and John Wiley & Sons Australia, Ltd.
Heavy ion mutagenesis: linear energy transfer effects and genetic linkage
NASA Technical Reports Server (NTRS)
Kronenberg, A.; Gauny, S.; Criddle, K.; Vannais, D.; Ueno, A.; Kraemer, S.; Waldren, C. A.; Chatterjee, A. (Principal Investigator)
1995-01-01
We have characterized a series of 69 independent mutants at the endogenous hprt locus of human TK6 lymphoblasts and over 200 independent S1-deficient mutants of the human x hamster hybrid cell line AL arising spontaneously or following low-fluence exposures to densely ionizing Fe ions (600 MeV/amu, linear energy transfer = 190 keV/microns). We find that large deletions are common. The entire hprt gene (> 44 kb) was missing in 19/39 Fe-induced mutants, while only 2/30 spontaneous mutants lost the entire hprt coding sequence. When the gene of interest (S1 locus = M1C1 gene) is located on a nonessential human chromosome 11, multilocus deletions of several million base pairs are observed frequently. The S1 mutation frequency is more than 50-fold greater than the frequency of hprt mutants in the same cells. Taken together, these results suggest that low-fluence exposures to Fe ions are often cytotoxic due to their ability to create multilocus deletions that may often include the loss of essential genes. In addition, the tumorigenic potential of these HZE heavy ions may be due to the high potential for loss of tumor suppressor genes. The relative insensitivity of the hprt locus to mutation is likely due to tight linkage to a gene that is required for viability.
Kwon, Deug-Nam; Chang, Byung-Soo; Kim, Jin-Hoi
2014-01-01
Background N-glycolylneuraminic acid (Neu5Gc) is generated by hydroxylation of CMP-Neu5Ac to CMP-Neu5Gc, catalyzed by CMP-Neu5Ac hydroxylase (CMAH). However, humans lack this common mammalian cell surface molecule, Neu5Gc, due to inactivation of the CMAH gene during evolution. CMAH is one of several human-specific genes whose function has been lost by disruption or deletion of the coding frame. It has been suggested that CMAH inactivation has resulted in biochemical or physiological characteristics that have resulted in human-specific diseases. Methodology/Principal Findings To identify differential gene expression profiles associated with the loss of Neu5Gc expression, we performed microarray analysis using Illumina MouseRef-8 v2 Expression BeadChip, using the main tissues (lung, kidney, and heart) from control mice and CMP-Neu5Ac hydroxylase (Cmah) gene knock-out mice, respectively. Out of a total of 25,697 genes, 204, 162, and 147 genes were found to be significantly modulated in the lung, kidney, and heart tissues of the Cmah null mouse, respectively. In this study, we examined the gene expression profiles, using three commercial pathway analysis software packages: Ingenuity Pathways Analysis, Kyoto Encyclopedia of Genes and Genomes analysis, and Pathway Studio. The gene ontology analysis revealed that the top 6 biological processes of these genes included protein metabolism and modification, signal transduction, lipid, fatty acid, and steroid metabolism, nucleoside, nucleotide and nucleic acid metabolism, immunity and defense, and carbohydrate metabolism. Gene interaction network analysis showed a common network that was common to the different tissues of the Cmah null mouse. However, the expression of most sialytransferase mRNAs of Hanganutziu-Deicher antigen, sialy-Tn antigen, Forssman antigen, and Tn antigen was significantly down-regulated in the liver tissue of Cmah null mice. Conclusions/Significance Mice bearing a human-like deletion of the Cmah gene serve as an important model for the study of abnormal pathogenesis and/or metabolism caused by the evolutionary loss of Neu5Gc synthesis in humans. PMID:25229777
Shen, Hua; McHale, Cliona M.; Smith, Martyn T; Zhang, Luoping
2015-01-01
Characterizing variability in the extent and nature of responses to environmental exposures is a critical aspect of human health risk assessment. Chemical toxicants act by many different mechanisms, however, and the genes involved in adverse outcome pathways (AOPs) and AOP networks are not yet characterized. Functional genomic approaches can reveal both toxicity pathways and susceptibility genes, through knockdown or knockout of all non-essential genes in a cell of interest, and identification of genes associated with a toxicity phenotype following toxicant exposure. Screening approaches in yeast and human near-haploid leukemic KBM7 cells, have identified roles for genes and pathways involved in response to many toxicants but are limited by partial homology among yeast and human genes and limited relevance to normal diploid cells. RNA interference (RNAi) suppresses mRNA expression level but is limited by off-target effects (OTEs) and incomplete knockdown. The recently developed gene editing approach called clustered regularly interspaced short palindrome repeats-associated nuclease (CRISPR)-Cas9, can precisely knock-out most regions of the genome at the DNA level with fewer OTEs than RNAi, in multiple human cell types, thus overcoming the limitations of the other approaches. It has been used to identify genes involved in the response to chemical and microbial toxicants in several human cell types and could readily be extended to the systematic screening of large numbers of environmental chemicals. CRISPR-Cas9 can also repress and activate gene expression, including that of non-coding RNA, with near-saturation, thus offering the potential to more fully characterize AOPs and AOP networks. Finally, CRISPR-Cas9 can generate complex animal models in which to conduct preclinical toxicity testing at the level of individual genotypes or haplotypes. Therefore, CRISPR-Cas9 is a powerful and flexible functional genomic screening approach that can be harnessed to provide unprecedented mechanistic insight in the field of modern toxicology. PMID:26041264