multiple genes coding: Topics by Science.gov

Sample records for multiple genes coding

EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
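The abstract mentions a "coding/non-coding probabilistic model" but does not say what form it takes. One common way to build such a model is to train two Markov chains, one on coding and one on non-coding sequence, and score a window by its log-likelihood ratio. The Python sketch below shows that idea with a first-order chain. It is an assumption for illustration, not EUGÈNE'HOM's actual model, and the toy training sequences are hypothetical. Gene finders usually use higher-order models that are periodic over the three codon positions.

```python
import math

ALPHABET = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next | prev), with pseudocounts."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        seq = seq.upper()
        for prev, nxt in zip(seq, seq[1:]):
            if prev in counts and nxt in counts:  # skip N and other ambiguity codes
                counts[prev][nxt] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()} for a, row in counts.items()}


def log_odds(seq, coding_model, noncoding_model):
    """Sum of log(P_coding / P_noncoding) over transitions; a positive score favours coding."""
    seq = seq.upper()
    score = 0.0
    for prev, nxt in zip(seq, seq[1:]):
        if prev in coding_model and nxt in coding_model:
            score += math.log(coding_model[prev][nxt] / noncoding_model[prev][nxt])
    return score


# Hypothetical toy training sets, for illustration only.
coding = train_markov(["ATGGCGGCGGCGGCCTAA"])
noncoding = train_markov(["ATATTATAATATTTAA"])
print(log_odds("GCGGCG", coding, noncoding) > 0)  # True
print(log_odds("ATATAT", coding, noncoding) < 0)  # True
```

In a full gene finder, a score like this would be only one signal. It would be combined with similarity evidence (here, TBLASTX hits) and with splice-site and start-codon predictions.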
Genomic and Epigenomic Insights into Nutrition and Brain Disorders

PubMed Central

Dauncey, Margaret Joy

2013-01-01

Considerable evidence links many neuropsychiatric, neurodevelopmental and neurodegenerative disorders with multiple complex interactions between genetics and environmental factors such as nutrition. Mental health problems, autism, eating disorders, Alzheimer’s disease, schizophrenia, Parkinson’s disease and brain tumours are related to individual variability in numerous protein-coding and non-coding regions of the genome. However, genotype does not necessarily determine neurological phenotype because the epigenome modulates gene expression in response to endogenous and exogenous regulators, throughout the life-cycle. Studies using both genome-wide analysis of multiple genes and comprehensive analysis of specific genes are providing new insights into genetic and epigenetic mechanisms underlying nutrition and neuroscience. This review provides a critical evaluation of the following related areas: (1) recent advances in genomic and epigenomic technologies, and their relevance to brain disorders; (2) the emerging role of non-coding RNAs as key regulators of transcription, epigenetic processes and gene silencing; (3) novel approaches to nutrition, epigenetics and neuroscience; (4) gene-environment interactions, especially in the serotonergic system, as a paradigm of the multiple signalling pathways affected in neuropsychiatric and neurological disorders. Current and future advances in these four areas should contribute significantly to the prevention, amelioration and treatment of multiple devastating brain disorders. PMID:23503168
EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

PubMed Central

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-01-01

EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408
Multiple copies of genes coding for electron transport proteins in the bacterium Nitrosomonas europaea.

PubMed

McTavish, H; LaQuier, F; Arciero, D; Logan, M; Mundfrom, G; Fuchs, J A; Hooper, A B

1993-04-01

The genome of Nitrosomonas europaea contains at least three copies each of the genes coding for hydroxylamine oxidoreductase (HAO) and cytochrome c554. A copy of an HAO gene is always located within 2.7 kb of a copy of a cytochrome c554 gene. Cytochrome P-460, a protein that shares very unusual spectral features with HAO, was found to be encoded by a gene separate from the HAO genes.
Metabolic Coevolution in the Bacterial Symbiosis of Whiteflies and Related Plant Sap-Feeding Insects.

PubMed

Luan, Jun-Bo; Chen, Wenbo; Hasegawa, Daniel K; Simmons, Alvin M; Wintermantel, William M; Ling, Kai-Shu; Fei, Zhangjun; Liu, Shu-Sheng; Douglas, Angela E

2015-09-15

Genomic decay is a common feature of intracellular bacteria that have entered into symbiosis with plant sap-feeding insects. This study of the whitefly Bemisia tabaci and two bacteria (Portiera aleyrodidarum and Hamiltonella defensa) cohoused in each host cell investigated whether the decay of Portiera metabolism genes is complemented by host and Hamiltonella genes, and compared the metabolic traits of the whitefly symbiosis with other sap-feeding insects (aphids, psyllids, and mealybugs). Parallel genomic and transcriptomic analysis revealed that the host genome contributes multiple metabolic reactions that complement or duplicate Portiera function, and that Hamiltonella may contribute multiple cofactors and one essential amino acid, lysine. Homologs of the Bemisia metabolism genes of insect origin have also been implicated in essential amino acid synthesis in other sap-feeding insect hosts, indicative of parallel coevolution of shared metabolic pathways across multiple symbioses. Further metabolism genes coded in the Bemisia genome are of bacterial origin, but phylogenetically distinct from Portiera, Hamiltonella and horizontally transferred genes identified in other sap-feeding insects. Overall, 75% of the metabolism genes of bacterial origin are functionally unique to one symbiosis, indicating that the evolutionary history of metabolic integration in these symbioses is strongly contingent on the pattern of horizontally acquired genes. Our analysis, further, shows that bacteria with genomic decay enable host acquisition of complex metabolic pathways by multiple independent horizontal gene transfers from exogenous bacteria. Specifically, each horizontally acquired gene can function with other genes in the pathway coded by the symbiont, while facilitating the decay of the symbiont gene coding the same reaction. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

PubMed Central

Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

2015-01-01

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets covering 6,560 individuals in total and representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
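The abstract says the combinatorial effects are "realized by enrichment analysis" but does not name the statistical test. A one-sided hypergeometric test is a common choice for GO/KEGG enrichment, so the Python sketch below assumes it. It asks how likely it is to see at least k annotated genes among n co-expressed genes when K of the N background genes carry the annotation. The function name and parameters are hypothetical and are not Co-LncRNA's interface.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: co-expressed genes that carry the annotation (GO term or KEGG pathway)
    n: co-expressed genes in total
    K: background genes that carry the annotation
    N: background genes in total
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Toy numbers: 10 background genes, 3 annotated, 2 co-expressed, both annotated.
print(round(hypergeom_enrichment_p(2, 2, 3, 10), 4))  # 0.0667
```

Testing many annotations at once would also call for a multiple-testing correction, such as Benjamini-Hochberg, before reporting enriched terms.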
Different small, acid-soluble proteins of the alpha/beta type have interchangeable roles in the heat and UV radiation resistance of Bacillus subtilis spores.

PubMed Central

Mason, J M; Setlow, P

1987-01-01

Spores of Bacillus subtilis strains which carry deletion mutations in one gene (sspA) or two genes (sspA and sspB) which code for major alpha/beta-type small, acid-soluble spore proteins (SASP) are known to be much more sensitive to heat and UV radiation than wild-type spores. This heat- and UV-sensitive phenotype was cured completely or in part by introduction into these mutant strains of one or more copies of the sspA or sspB genes themselves; multiple copies of the B. subtilis sspD gene, which codes for a minor alpha/beta-type SASP; or multiple copies of the SASP-C gene, which codes for a major alpha/beta-type SASP of Bacillus megaterium. These findings suggest that alpha/beta-type SASP play interchangeable roles in the heat and UV radiation resistance of bacterial spores. PMID:3112127
Multiple Site-Directed and Saturation Mutagenesis by the Patch Cloning Method.

PubMed

Taniguchi, Naohiro; Murakami, Hiroshi

2017-01-01

Constructing protein-coding genes with desired mutations is a basic step for protein engineering. Herein, we describe a multiple site-directed and saturation mutagenesis method, termed MUPAC. This method has been used to introduce multiple site-directed mutations in the green fluorescent protein gene and in the Moloney murine leukemia virus reverse transcriptase gene. Moreover, this method was also successfully used to introduce randomized codons at five desired positions in the green fluorescent protein gene, and for simple DNA assembly for cloning.
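Randomizing codons at five positions produces a large library, and the size of that library sets how many clones need to be screened. The abstract does not say which degenerate codon MUPAC uses. The Python sketch below assumes NNK codons (32 codons per position). It also assumes that every variant is equally likely, and it uses the standard approximation that sampling n clones from V variants leaves about exp(-n/V) of them unseen.

```python
import math


def codon_library_size(positions, codons_per_position=32):
    """Distinct DNA variants when each position uses a degenerate codon (32 assumes NNK)."""
    return codons_per_position ** positions


def clones_for_coverage(variants, coverage=0.95):
    """Clones needed so the expected fraction of variants sampled at least once is `coverage`.

    Assumes all variants are equally represented: P(miss) ~ exp(-n / V).
    """
    return math.ceil(-variants * math.log(1 - coverage))


size = codon_library_size(5)          # five randomized positions, as in the abstract
print(size)                           # 33554432
print(clones_for_coverage(size))      # roughly 3 x library size for 95% coverage
```

At the protein level the diversity is smaller, because NNK encodes 20 amino acids plus one stop codon. The same formula then applies with the smaller V.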
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.

PubMed

Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H

2017-12-20

Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein-coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein-coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
dbCPG: A web resource for cancer predisposition genes.

PubMed

Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

2016-06-21

Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown-type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the understanding of the CPGs data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes.
dbCPG: A web resource for cancer predisposition genes

PubMed Central

Wei, Ran; Yao, Yao; Yang, Wu; Zheng, Chun-Hou; Zhao, Min; Xia, Junfeng

2016-01-01

Cancer predisposition genes (CPGs) are genes in which inherited mutations confer highly or moderately increased risks of developing cancer. Identification of these genes and understanding the biological mechanisms that underlie them is crucial for the prevention, early diagnosis, and optimized management of cancer. Over the past decades, great efforts have been made to identify CPGs through multiple strategies. However, information on these CPGs and their molecular functions is scattered. To address this issue and provide a comprehensive resource for researchers, we developed the Cancer Predisposition Gene Database (dbCPG, Database URL: http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), the first literature-based gene resource for exploring human CPGs. It contains 827 human (724 protein-coding, 23 non-coding, and 80 unknown-type genes), 637 rat, and 658 mouse CPGs. Furthermore, data mining was performed to gain insights into the understanding of the CPGs data, including functional annotation, gene prioritization, network analysis of prioritized genes and overlap analysis across multiple cancer types. A user-friendly web interface with multiple browse, search, and upload functions was also developed to facilitate access to the latest information on CPGs. Taken together, the dbCPG database provides a comprehensive data resource for further studies of cancer predisposition genes. PMID:27192119
Strategies for comparing gene expression profiles from different microarray platforms: application to a case-control experiment.

PubMed

Severgnini, Marco; Bicciato, Silvio; Mangano, Eleonora; Scarlatti, Francesca; Mezzelani, Alessandra; Mattioli, Michela; Ghidoni, Riccardo; Peano, Clelia; Bonnal, Raoul; Viti, Federica; Milanesi, Luciano; De Bellis, Gianluca; Battaglia, Cristina

2006-06-01

Meta-analysis of microarray data is increasingly important, considering both the availability of multiple platforms using disparate technologies and the accumulation in public repositories of data sets from different laboratories. We addressed the issue of comparing gene expression profiles from two microarray platforms by devising a standardized investigative strategy. We tested this procedure by studying MDA-MB-231 cells, which undergo apoptosis on treatment with resveratrol. Gene expression profiles were obtained using high-density, short-oligonucleotide, single-color microarray platforms: GeneChip (Affymetrix) and CodeLink (Amersham). Interplatform analyses were carried out on 8414 common transcripts represented on both platforms, as identified by LocusLink ID, representing 70.8% and 88.6% of annotated GeneChip and CodeLink features, respectively. We identified 105 differentially expressed genes (DEGs) on CodeLink and 42 DEGs on GeneChip. Among them, only 9 DEGs were commonly identified by both platforms. Multiple analyses (BLAST alignment of probes with target sequences, gene ontology, literature mining, and quantitative real-time PCR) permitted us to investigate the factors contributing to the generation of platform-dependent results in single-color microarray experiments. An effective approach to cross-platform comparison involves microarrays of similar technologies, samples prepared by identical methods, and a standardized battery of bioinformatic and statistical analyses.
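The DEG counts above can be turned into two simple agreement measures: the Jaccard index of the two DEG lists, and the overlap expected by chance if the lists were independent. The Python sketch below assumes that all 105 CodeLink and 42 GeneChip DEGs fall within the 8414 common transcripts. The abstract implies this but does not state it. The helper name is hypothetical.

```python
def overlap_stats(n_a, n_b, n_common, universe):
    """Agreement between two DEG lists drawn from a shared universe of transcripts."""
    jaccard = n_common / (n_a + n_b - n_common)   # shared / union
    expected = n_a * n_b / universe               # expected shared DEGs under independence
    return jaccard, expected


# Figures from the abstract: 105 CodeLink DEGs, 42 GeneChip DEGs, 9 shared, 8414 common transcripts.
jaccard, expected = overlap_stats(105, 42, 9, 8414)
print(f"{jaccard:.3f} {expected:.2f}")  # 0.065 0.52
```

Under that assumption, 9 shared DEGs is more than the roughly 0.5 expected by chance. A Jaccard index of about 0.065 still means the two platforms agree on only a small fraction of their combined calls.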
Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data.

PubMed

Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

2015-01-01

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets covering 6,560 individuals in total and representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
Prediction of plant lncRNA by ensemble machine learning classifiers.

PubMed

Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

2018-05-02

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set composed only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein.
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.
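The classifiers above work on "transcript sequence features", but the abstract does not list which ones. Longest open reading frame (ORF) length and GC content are features commonly used to separate coding from long non-coding transcripts. The Python sketch below computes them so they could feed base learners such as gradient boosting and random forests, whose outputs a stacking meta-learner would then combine. The feature set and function names are assumptions. For simplicity the sketch scans only the forward strand and ignores ORFs that run off the end of the transcript.

```python
STOPS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in the transcript."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_orf(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                best = max(best, i + 3 - start)
                start = None
    return best


def transcript_features(seq):
    """Hypothetical feature vector for one transcript."""
    orf = longest_orf(seq)
    return {
        "length": len(seq),
        "gc": gc_content(seq),
        "orf_len": orf,
        "orf_coverage": orf / len(seq) if seq else 0.0,  # short, sparse ORFs suggest non-coding
    }


print(transcript_features("CCCATGTTTTAA"))
```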
Cap 'n' collar C regulates genes responsible for imidacloprid resistance in the Colorado potato beetle, Leptinotarsa decemlineata.

PubMed

Gaddelapati, Sharath Chandra; Kalsi, Megha; Roy, Amit; Palli, Subba Reddy

2018-08-01

The Colorado potato beetle (CPB), Leptinotarsa decemlineata, developed resistance to imidacloprid after exposure to this insecticide for multiple generations. Our previous studies showed that the xenobiotic transcription factor cap 'n' collar isoform C (CncC) regulates the expression of multiple cytochrome P450 genes, which play essential roles in resistance to plant allelochemicals and insecticides. In this study, we sought to obtain a comprehensive picture of the genes regulated by CncC in imidacloprid-resistant CPB. We sequenced RNA isolated from imidacloprid-resistant CPB treated with dsRNA targeting CncC or the gene coding for green fluorescent protein (control). Comparative transcriptome analysis showed that CncC regulated the expression of 1798 genes, of which 1499 were downregulated in CncC knockdown beetles. Interestingly, expression of 79% of imidacloprid-induced P450 genes requires CncC. We performed quantitative real-time PCR to verify the reduction in the expression of 20 genes, including those coding for detoxification enzymes (P450s, glutathione S-transferases, and esterases) and ABC transporters. The genes coding for ABC transporters are induced in insecticide-resistant CPB and require CncC for their expression. Knockdown of genes coding for ABC transporters, simultaneously or individually, caused an increase in imidacloprid-induced mortality in resistant beetles, confirming their contribution to insecticide resistance. These studies identified CncC as a transcription factor involved in the regulation of genes responsible for imidacloprid resistance. Small molecule inhibitors of CncC or suppression of CncC by RNAi could provide effective synergists for pest control or management of insecticide resistance. Copyright © 2018 Elsevier Ltd. All rights reserved.
GeneMachine: gene prediction and sequence annotation.

PubMed

Makalowska, I; Ryan, J F; Baxevanis, A D

2001-09-01

A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
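The argument for running several predictors is that their union is less likely to miss a potential coding region than any single program. The Python sketch below merges predicted exon intervals from several programs and records which programs support each merged region. The program names come from the abstract. The coordinates and the merge function are hypothetical, and this is not GeneMachine's Perl code, which writes ASN.1 for viewing in Sequin.

```python
def merge_predictions(predictions):
    """Union of predicted exon intervals from several gene finders.

    predictions: dict mapping program name -> list of (start, end) 1-based inclusive intervals.
    Overlapping or directly adjacent intervals are merged.
    Returns a list of (start, end, frozenset_of_supporting_programs).
    """
    events = sorted((s, e, prog) for prog, ivs in predictions.items() for s, e in ivs)
    merged = []
    for s, e, prog in events:
        if merged and s <= merged[-1][1] + 1:   # overlaps or touches the previous region
            merged[-1][1] = max(merged[-1][1], e)
            merged[-1][2].add(prog)
        else:
            merged.append([s, e, {prog}])
    return [(s, e, frozenset(progs)) for s, e, progs in merged]


# Hypothetical predictions on one query sequence.
preds = {"GENSCAN": [(100, 200), (500, 600)], "FGENES": [(150, 250)], "MZEF": [(800, 900)]}
for start, end, progs in merge_predictions(preds):
    print(start, end, sorted(progs))
```

Regions supported by several programs are stronger candidates. Regions called by only one program are still kept, which matches the abstract's goal of not missing any potential coding region.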
Circular RNA profiling reveals that circular RNAs from ANXA2 can be used as new biomarkers for multiple sclerosis.

PubMed

Iparraguirre, Leire; Muñoz-Culla, Maider; Prada-Luengo, Iñigo; Castillo-Triviño, Tamara; Olascoaga, Javier; Otaegui, David

2017-09-15

Multiple sclerosis is an autoimmune disease, with higher prevalence in women, in which the immune system is dysregulated. This dysregulation has been shown to correlate with changes in transcriptome expression as well as in gene-expression regulators, such as non-coding RNAs (e.g. microRNAs). Indeed, some of these have been suggested as biomarkers for multiple sclerosis, even though few biomarkers have reached clinical practice. Recently, a novel family of non-coding RNAs, circular RNAs, has emerged as a new player in the complex network of gene-expression regulation. Two functions have been proposed for circular RNAs: regulation of microRNAs through a 'sponge system' and regulation of RNA splicing. This regulatory role, together with their high stability in biofluids, makes them good candidate biomarkers. Given the dysregulation of both the protein-coding and non-coding transcriptome reported in multiple sclerosis patients, we hypothesised that circular RNA expression may also be altered. We therefore profiled the expression of 13,617 circular RNAs in peripheral blood leucocytes from multiple sclerosis patients and healthy controls, finding 406 differentially expressed (P-value < 0.05, fold change > 1.5), and demonstrated after validation that circ_0005402 and circ_0035560 are underexpressed in multiple sclerosis patients and could be used as biomarkers of the disease. © The Author 2017. Published by Oxford University Press. All rights reserved.
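The profiling step above calls a circular RNA differentially expressed when P < 0.05 and fold change > 1.5. The Python sketch below applies those two thresholds to a table of results. The abstract does not say how downregulation is handled. Because the validated circRNAs were underexpressed, the sketch assumes the fold-change cut-off applies in both directions (at least 1.5-fold up, or at most 1/1.5 down). The record IDs and values are hypothetical.

```python
def differentially_expressed(results, p_cutoff=0.05, fc_cutoff=1.5):
    """Return (id, direction) for records passing both thresholds.

    results: iterable of (circ_id, p_value, fold_change), fold_change = patient / control.
    A record passes if p < p_cutoff and the change is at least fc_cutoff-fold in either direction.
    """
    hits = []
    for circ_id, p, fc in results:
        if p < p_cutoff and (fc > fc_cutoff or fc < 1 / fc_cutoff):
            hits.append((circ_id, "up" if fc > 1 else "down"))
    return hits


# Hypothetical records, not data from the study.
table = [("circ_A", 0.01, 0.5), ("circ_B", 0.20, 3.0), ("circ_C", 0.03, 1.2), ("circ_D", 0.001, 2.0)]
print(differentially_expressed(table))  # [('circ_A', 'down'), ('circ_D', 'up')]
```

With about 13,600 circRNAs tested, an unadjusted P < 0.05 cut-off will also let through false positives. This is one reason the candidates were validated before being proposed as biomarkers.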
Interplay between cardiac transcription factors and non-coding RNAs in predisposing to atrial fibrillation.

PubMed

Mikhailov, Alexander T; Torrado, Mario

2018-05-12

There is growing evidence that putative gene regulatory networks including cardio-enriched transcription factors, such as PITX2, TBX5, ZFHX3, and SHOX2, and their effector/target genes along with downstream non-coding RNAs can play a potentially important role in the process of adaptive and maladaptive atrial rhythm remodeling. In turn, expression of atrial fibrillation-associated transcription factors is under the control of upstream regulatory non-coding RNAs. This review broadly explores gene regulatory mechanisms associated with susceptibility to atrial fibrillation, with key examples from both animal models and patients, within the context of both cardiac transcription factors and non-coding RNAs. These two systems appear to have multiple levels of cross-regulation and act coordinately to achieve effective control of atrial rhythm effector gene expression. Perturbations of a dynamic expression balance between transcription factors and corresponding non-coding RNAs can provoke the development or promote the progression of atrial fibrillation. We also outline deficiencies in current models and discuss ongoing studies to clarify remaining mechanistic questions. An understanding of the function of transcription factors and non-coding RNAs in gene regulatory networks associated with atrial fibrillation risk will enable the development of innovative therapeutic strategies.
DLRS: gene tree evolution in light of a species tree.

PubMed

Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens

2012-11-15

PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).
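A Bayesian MCMC framework draws samples whose distribution approaches the posterior. The Python sketch below illustrates only that core idea, using a random-walk Metropolis-Hastings sampler for a single parameter. It is not DLRS, which also proposes changes to gene-tree topologies, reconciliations and relaxed-clock rates. To keep the result checkable, the sketch assumes a hypothetical event rate with an Exponential(1) prior and Poisson-distributed counts. Under those assumptions the exact posterior is Gamma(1 + sum, 1 + n), so its mean is known.

```python
import math
import random


def log_posterior(rate, counts):
    """Exponential(1) prior times Poisson likelihood, up to an additive constant."""
    if rate <= 0:
        return float("-inf")
    return -rate + sum(c * math.log(rate) - rate for c in counts)


def metropolis(counts, n_samples=20000, step=0.5, seed=1):
    """Random-walk Metropolis-Hastings sampler for a single rate parameter."""
    rng = random.Random(seed)
    rate = 1.0
    current = log_posterior(rate, counts)
    samples = []
    for _ in range(n_samples):
        proposal = rate + rng.gauss(0.0, step)          # symmetric proposal
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio); -inf gives probability 0.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            rate, current = proposal, candidate
        samples.append(rate)
    return samples


counts = [2, 3, 1, 4, 2]                 # hypothetical per-family event counts
kept = metropolis(counts)[2000:]         # discard burn-in
print(sum(kept) / len(kept))             # should be close to 13/6 ~ 2.17
```

Discarding burn-in and summarizing the remaining samples is how the posterior over model parameters is read off. DLRS does the same over a much larger state space that includes gene trees.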
Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

PubMed Central

2012-01-01

Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. 
Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
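Strand composition bias of this kind is usually summarized with two skew statistics computed on the major coding strand: AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The sketch below assumes those standard definitions. It treats a strand with positive AT skew and negative GC skew as CA-biased, which is the arthropod norm described in the abstract, and the reverse as GT-biased. The function names and toy sequences are illustrative and do not come from the study.

```python
def strand_skew(seq: str) -> tuple[float, float]:
    """Return (AT skew, GC skew) for one strand of a DNA sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    """
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


def bias_label(seq: str) -> str:
    """Classify a strand as 'CA', 'GT' or 'mixed' from the signs of its skews."""
    at_skew, gc_skew = strand_skew(seq)
    if at_skew > 0 and gc_skew < 0:
        return "CA"  # excess A over T and C over G
    if at_skew < 0 and gc_skew > 0:
        return "GT"  # excess T over A and G over C
    return "mixed"


print(bias_label("ACACCATACA"))  # CA (toy CA-rich strand)
```

In practice the counts would be taken over the concatenated genes on the major coding strand rather than over a short toy string.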

Multiple independent insertions of 5S rRNA genes in the spliced-leader gene family of trypanosome species.

PubMed

Beauparlant, Marc A; Drouin, Guy

2014-02-01

Analyses of the 5S rRNA genes found in the spliced-leader (SL) gene repeat units of numerous trypanosome species suggest that such linkages were not inherited from a common ancestor, but were the result of independent 5S rRNA gene insertions. In trypanosomes, 5S rRNA genes are found either in the tandemly repeated units coding for SL genes or in independent tandemly repeated units. Given that trypanosome species in which 5S rRNA genes are within the tandemly repeated units coding for SL genes are phylogenetically related, one might hypothesize that this arrangement is the result of an ancestral insertion of 5S rRNA genes into the tandemly repeated SL gene family of trypanosomes. Here, we use the types of 5S rRNA genes found associated with SL genes, the flanking regions of the inserted 5S rRNA genes and the position of these insertions to show that most of the 5S rRNA genes found within SL gene repeat units of trypanosome species were not acquired from a common ancestor but are the result of independent insertions. These multiple 5S rRNA gene insertion events in trypanosomes are likely the result of frequent founder events in different hosts and/or geographical locations in species having short generation times.
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

PubMed Central

Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

2008-01-01

Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
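The core idea of comparing gene neighborhoods fits in a few lines. This is not PSAT's implementation or scoring. It is a minimal illustration that assumes each genome is a simple linear list of gene IDs and that a precomputed map from reference genes to their homologs is available (for example, best BLAST hits, weak ones included). The hypothetical `conserved_neighbors` function reports which neighbors of a selected reference gene have homologs that also lie near the selected gene's homolog.

```python
def neighborhood(order: list[str], gene: str, window: int) -> list[str]:
    """Genes within `window` positions of `gene` in a linear gene order (gene excluded)."""
    i = order.index(gene)
    lo, hi = max(0, i - window), min(len(order), i + window + 1)
    return [g for g in order[lo:hi] if g != gene]


def conserved_neighbors(ref_order, cmp_order, homologs, gene, window=3):
    """Reference neighbors of `gene` whose homologs sit near the homolog of `gene`.

    `homologs` maps reference gene IDs to comparison gene IDs; genes without
    a homolog are skipped, and alignment scores are deliberately not used.
    """
    if gene not in homologs:
        return []
    cmp_hood = set(neighborhood(cmp_order, homologs[gene], window))
    return [
        g for g in neighborhood(ref_order, gene, window)
        if homologs.get(g) in cmp_hood
    ]


# Hypothetical genomes: lowercase IDs in the reference, uppercase homologs elsewhere.
ref = ["a", "b", "c", "d", "e", "f", "g"]
comparison = ["x", "B", "C", "D", "y", "z", "E"]
homologs = {"b": "B", "c": "C", "d": "D", "e": "E"}
print(conserved_neighbors(ref, comparison, homologs, "c", window=2))  # ['b', 'd']
```

Because homologs are kept regardless of alignment score, a weak hit still counts when its position agrees with the surrounding gene order.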
Dietary Intervention by Phytochemicals and Their Role in Modulating Coding and Non-Coding Genes in Cancer

PubMed Central

Budisan, Liviuta; Gulei, Diana; Zanoaga, Oana Mihaela; Irimie, Alexandra Iulia; Chira, Sergiu; Braicu, Cornelia; Gherman, Claudia Diana; Berindan-Neagoe, Ioana

2017-01-01

Phytochemicals are natural compounds synthesized as secondary metabolites in plants, representing an important source of molecules with a wide range of therapeutic applications. These natural agents are important regulators of key pathological processes/conditions, including cancer, as they are able to modulate the expression of coding and non-coding transcripts with an oncogenic or tumour suppressor role. These natural agents are currently exploited for the development of therapeutic strategies alone or in tandem with conventional treatments for cancer. The aim of this paper is to review the recent studies regarding the role of these natural phytochemicals in different processes related to cancer inhibition, including apoptosis activation, angiogenesis and metastasis suppression. From the large palette of phytochemicals we selected epigallocatechin gallate (EGCG), caffeic acid phenethyl ester (CAPE), genistein, morin and kaempferol, due to their increased activity in modulating multiple coding and non-coding genes, targeting the main hallmarks of cancer. PMID:28587155
Dietary Intervention by Phytochemicals and Their Role in Modulating Coding and Non-Coding Genes in Cancer.

PubMed

Budisan, Liviuta; Gulei, Diana; Zanoaga, Oana Mihaela; Irimie, Alexandra Iulia; Chira, Sergiu; Braicu, Cornelia; Gherman, Claudia Diana; Berindan-Neagoe, Ioana

2017-06-01

Phytochemicals are natural compounds synthesized as secondary metabolites in plants, representing an important source of molecules with a wide range of therapeutic applications. These natural agents are important regulators of key pathological processes/conditions, including cancer, as they are able to modulate the expression of coding and non-coding transcripts with an oncogenic or tumour suppressor role. These natural agents are currently exploited for the development of therapeutic strategies alone or in tandem with conventional treatments for cancer. The aim of this paper is to review the recent studies regarding the role of these natural phytochemicals in different processes related to cancer inhibition, including apoptosis activation, angiogenesis and metastasis suppression. From the large palette of phytochemicals we selected epigallocatechin gallate (EGCG), caffeic acid phenethyl ester (CAPE), genistein, morin and kaempferol, due to their increased activity in modulating multiple coding and non-coding genes, targeting the main hallmarks of cancer.
Efficient CRISPR/Cas9-Mediated Versatile, Predictable, and Donor-Free Gene Knockout in Human Pluripotent Stem Cells.

PubMed

Liu, Zhongliang; Hui, Yi; Shi, Lei; Chen, Zhenyu; Xu, Xiangjie; Chi, Liankai; Fan, Beibei; Fang, Yujiang; Liu, Yang; Ma, Lin; Wang, Yiran; Xiao, Lei; Zhang, Quanbin; Jin, Guohua; Liu, Ling; Zhang, Xiaoqing

2016-09-13

Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methods for disrupting genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), which makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNAs over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies of both coding and non-coding genes in hPSCs. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
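A small sketch shows why precise ligation makes the paired-KO product predictable. It assumes SpCas9 with an NGG PAM, a blunt cut 3 bp upstream of the PAM, both 20-nt protospacers on the top strand, and perfect rejoining of the outer ends. Guides on the reverse strand and the minority of repairs with insertions or deletions are ignored. Function names and sequences are hypothetical.

```python
def cas9_cut_site(seq: str, protospacer: str) -> int:
    """Index of the blunt SpCas9 cut for a protospacer on the given (top) strand.

    Assumes an NGG PAM immediately 3' of the protospacer; the cut falls 3 bp
    upstream of the PAM. Only the first occurrence of the protospacer is used.
    """
    start = seq.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on the top strand")
    end = start + len(protospacer)
    pam = seq[end:end + 3]
    if len(pam) < 3 or pam[1:] != "GG":
        raise ValueError(f"no NGG PAM after protospacer (found {pam!r})")
    return end - 3


def paired_ko_product(seq: str, guide1: str, guide2: str) -> str:
    """Predicted allele if the two outer cut ends are rejoined by precise ligation."""
    cut1, cut2 = sorted((cas9_cut_site(seq, guide1), cas9_cut_site(seq, guide2)))
    return seq[:cut1] + seq[cut2:]


# Hypothetical locus: two guides with TGG / AGG PAMs flanking a 10-bp spacer.
g1 = "GACGTTACCGATAGCTTGCA"
g2 = "CCTAGGATCCATTCGAGTCA"
locus = "TTTT" + g1 + "TGG" + "A" * 10 + g2 + "AGG" + "CCCC"
print(paired_ko_product(locus, g1, g2))  # the 33-bp segment between the cuts is removed
```

Because the product depends only on the two cut positions, it can be predicted, and genotyping primers designed, before any cells are edited.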
Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

PubMed

Ohno, S

1984-01-01

Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frameshift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the downstream amino acid sequence, and even a base change causing premature chain termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
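Ohno's three-frame argument can be checked directly. The sketch below assumes the standard genetic code, and its function names and repeat units are illustrative. It translates a tandem repeat of an oligomer whose length n is not a multiple of 3 in all three frames. Because 3 and n are coprime, each frame visits every offset within the unit. The three peptides therefore have period n and are cyclic rotations of one another. As a result, a stop codon present in one frame appears in all three, and a frameshift only moves translation onto a rotation of the same periodic chain. When n is a multiple of 3, the frames are unrelated.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate `seq` from offset `frame` (0, 1 or 2), dropping a trailing partial codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def frames_of_repeat(unit: str, copies: int = 6) -> list[str]:
    """Peptides read in all three frames of a tandem repeat of `unit`."""
    seq = unit * copies
    return [translate(seq, f) for f in range(3)]


def frames_are_rotations(unit: str, copies: int = 6) -> bool:
    """True if one period (n residues) of each frame is a rotation of frame 0's period."""
    n = len(unit)
    first = [p[:n] for p in frames_of_repeat(unit, copies)]
    return all(len(p) == n and p in first[0] * 2 for p in first)


print(frames_are_rotations("GCAGCTGAAC"))  # True: n = 10 is not a multiple of 3
print(frames_are_rotations("ATGAAATAA"))   # False: n = 9
```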
Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs.

PubMed

Ricaño-Ponce, Isis; Zhernakova, Daria V; Deelen, Patrick; Luo, Oscar; Li, Xingwang; Isaacs, Aaron; Karjalainen, Juha; Di Tommaso, Jennifer; Borek, Zuzanna Agnieszka; Zorro, Maria M; Gutierrez-Achury, Javier; Uitterlinden, Andre G; Hofman, Albert; van Meurs, Joyce; Netea, Mihai G; Jonkers, Iris H; Withoff, Sebo; van Duijn, Cornelia M; Li, Yang; Ruan, Yijun; Franke, Lude; Wijmenga, Cisca; Kumar, Vinod

2016-04-01

Genome-wide association and fine-mapping studies in 14 autoimmune diseases (AID) have implicated more than 250 loci in one or more of these diseases. As more than 90% of AID-associated SNPs are intergenic or intronic, pinpointing the causal genes is challenging. We performed a systematic analysis to link 460 SNPs that are associated with 14 AID to causal genes using transcriptomic data from 629 blood samples. We were able to link 71 (39%) of the AID-SNPs to two or more nearby genes, providing evidence that multiple causal genes exist for some of the AID loci. While 54 of the AID loci are shared by one or more AID, 17% of them do not share candidate causal genes. In addition to finding novel genes such as ULK3, we also implicate novel disease mechanisms and pathways, such as autophagy in celiac disease pathogenesis. Furthermore, 42 of the AID SNPs specifically affected the expression of 53 non-coding RNA genes. To further understand how the non-coding genome contributes to AID, the SNPs were linked to functional regulatory elements, which suggests a model in which AID genes are regulated by a network of chromatin-looping/non-coding RNA interactions. The looping model also explains why a causal candidate gene is not necessarily the gene closest to the AID SNP, which was the case in nearly 50% of cases. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Positive Selection of Plasmodium falciparum Parasites With Multiple var2csa-Type PfEMP1 Genes During the Course of Infection in Pregnant Women

PubMed Central

Salanti, Ali; Lavstsen, Thomas; Nielsen, Morten A.; Theander, Thor G.; Leke, Rose G. F.; Lo, Yeung Y.; Bobbili, Naveen; Arnot, David E.; Taylor, Diane W.

2011-01-01

Placental malaria infections are caused by Plasmodium falciparum–infected red blood cells sequestering in the placenta by binding to chondroitin sulfate A, mediated by VAR2CSA, a variant of the PfEMP1 family of adhesion antigens. Recent studies have shown that many P. falciparum genomes have multiple genes coding for different VAR2CSA proteins, and parasites with >1 var2csa gene appear to be more common in pregnant women with placental malaria than in nonpregnant individuals. We present evidence that, in pregnant women, parasites containing multiple var2csa-type genes possess a selective advantage over parasites with a single var2csa gene. Accumulation of parasites with multiple copies of the var2csa gene during the course of pregnancy was also correlated with the development of antibodies involved in blocking VAR2CSA adhesion. The data suggest that multiplicity of var2csa-type genes enables P. falciparum parasites to persist for a longer period of time during placental infections, probably because of their greater capacity for antigenic variation and evasion of variant-specific immune responses. PMID:21592998
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
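The scale of the multiple-testing burden can be made concrete. This sketch assumes a Bonferroni-style correction at a family-wise α of 0.05 over the 4,123 effective tests reported above. The study's procedure for estimating the number of effective tests is not reproduced here. Under that assumption, a category would need a p-value of roughly 1.2 × 10^-5 to count as significant. The helper names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


def survives_correction(p_values, alpha=0.05, n_tests=4123):
    """Flag which p-values pass a Bonferroni correction over `n_tests` effective tests."""
    cutoff = bonferroni_threshold(alpha, n_tests)
    return [p <= cutoff for p in p_values]


print(bonferroni_threshold(0.05, 4123))           # about 1.21e-05
print(survives_correction([1e-6, 1e-4, 0.01]))    # [True, False, False]
```

A nominal p-value of 1e-4, which looks convincing in isolation, does not pass this cutoff.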
Diversity of Antisense and Other Non-Coding RNAs in Archaea Revealed by Comparative Small RNA Sequencing in Four Pyrobaculum Species

PubMed Central

Bernick, David L.; Dennis, Patrick P.; Lui, Lauren M.; Lowe, Todd M.

2012-01-01

A great diversity of small, non-coding RNA (ncRNA) molecules with roles in gene regulation and RNA processing have been intensely studied in eukaryotic and bacterial model organisms, yet our knowledge of possible parallel roles for small RNAs (sRNA) in archaea is limited. We employed RNA-seq to identify novel sRNA across multiple species of the hyperthermophilic genus Pyrobaculum, known for unusual RNA gene characteristics. By comparing transcriptional data collected in parallel among four species, we were able to identify conserved RNA genes fitting into known and novel families. Among our findings, we highlight three novel cis-antisense sRNAs encoded opposite to key regulatory (ferric uptake regulator), metabolic (triose-phosphate isomerase), and core transcriptional apparatus genes (transcription factor B). We also found a large increase in the number of conserved C/D box sRNA genes over what had been previously recognized; many of these genes are encoded antisense to protein coding genes. The conserved opposition to orthologous genes across the Pyrobaculum genus suggests similarities to other cis-antisense regulatory systems. Furthermore, the genus-specific nature of these sRNAs indicates they are relatively recent, stable adaptations. PMID:22783241
GenePattern | Informatics Technology for Cancer Research (ITCR)

Cancer.gov

GenePattern is a genomic analysis platform that provides access to hundreds of tools for the analysis and visualization of multiple data types. A web-based interface provides easy access to these tools and allows the creation of multi-step analysis pipelines that enable reproducible in silico research. A new GenePattern Notebook environment allows users to combine GenePattern analyses with text, graphics, and code to create complete reproducible research narratives.
Mechanisms and consequences of alternative polyadenylation

PubMed Central

Di Giammartino, Dafne Campigli; Nishida, Kensei; Manley, James L.

2011-01-01

Summary Alternative polyadenylation (APA) is emerging as a widespread mechanism used to control gene expression. Like alternative splicing, usage of alternative poly(A) sites allows a single gene to encode multiple mRNA transcripts. In some cases, this changes the mRNA coding potential; in other cases, the coding sequence remains unchanged but the 3’UTR length is altered, influencing the fate of mRNAs in several ways, for example, by altering the availability of RNA binding protein sites and microRNA binding sites. The mechanisms governing both global and gene-specific APA are only starting to be deciphered. Here we review what is known about these mechanisms and the functional consequences of alternative polyadenylation. PMID:21925375
A series of vectors to construct lacZ fusions for the study of gene expression in Schizosaccharomyces pombe.

PubMed

Lafuente, M J; Petit, T; Gancedo, C

1997-12-22

We have constructed a series of plasmids to facilitate the fusion of promoters with or without coding regions of genes of Schizosaccharomyces pombe to the lacZ gene of Escherichia coli. These vectors carry a multiple cloning region in which fission yeast DNA may be inserted in three different reading frames with respect to the coding region of lacZ. The plasmids were constructed with the ura4+ or the his3+ marker of S. pombe. Functionality of the plasmids was tested by measuring, in parallel, the expression of fructose 1,6-bisphosphatase and beta-galactosidase under the control of the fbp1+ promoter in different conditions.
Non-coding RNAs as regulators of gene expression and epigenetics

PubMed Central

Kaikkonen, Minna U.; Lam, Michael T.Y.; Glass, Christopher K.

2011-01-01

Genome-wide studies have revealed that mammalian genomes are pervasively transcribed. This has led to the identification and isolation of novel classes of non-coding RNAs (ncRNAs) that influence gene expression by a variety of mechanisms. Here we review the characteristics and functions of regulatory ncRNAs in chromatin remodelling and at multiple levels of transcriptional and post-transcriptional regulation. We also describe the potential roles of ncRNAs in vascular biology and in mediating epigenetic modifications that might play roles in cardiovascular disease susceptibility. The emerging recognition of the diverse functions of ncRNAs in regulation of gene expression suggests that they may represent new targets for therapeutic intervention. PMID:21558279
FunGene: the functional gene pipeline and repository.

PubMed

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
Analysis of bHLH coding genes using gene co-expression network approach.

PubMed

Srivastava, Swati; Sanchita; Singh, Garima; Singh, Noopur; Srivastava, Gaurava; Sharma, Ashok

2016-07-01

Network analysis provides a powerful framework for the interpretation of data. It uses novel reference network-based metrics for module evolution, which can be used to identify modules of highly connected genes showing variation in a co-expression network. In this study, a co-expression network-based approach was used for analyzing genes from microarray data. Our approach consists of a simple but robust rank-based network construction. Publicly available gene expression data of Solanum tuberosum under cold and heat stresses were used to create and analyze a gene co-expression network. The analysis identified a highly co-expressed module of bHLH coding genes based on correlation values. Our approach was to analyze the variation of gene expression according to the time period of stress through a co-expression network approach. As a result, seed genes were identified that showed multiple connections with other genes in the same cluster. The seed genes were found to vary across different time periods of stress. These seed genes may be utilized further as marker genes for developing stress-tolerant plant species.
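One way to build a rank-based co-expression network and pick its seed genes is sketched below. It assumes that "rank-based" means Spearman correlation across time points. It also assumes an edge threshold on absolute correlation and defines seed genes as the most highly connected nodes. The authors' exact metrics may differ, and the gene names and expression values are made up for illustration.

```python
from itertools import combinations


def ranks(values):
    """Average ranks (1-based) of `values`; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_network(expr, threshold=0.8):
    """Adjacency sets linking genes whose |Spearman rho| across samples is >= threshold."""
    adj = {g: set() for g in expr}
    for g1, g2 in combinations(expr, 2):
        if abs(spearman(expr[g1], expr[g2])) >= threshold:
            adj[g1].add(g2)
            adj[g2].add(g1)
    return adj


def seed_genes(adj, top=1):
    """Most highly connected genes (largest degree); ties broken by name."""
    return sorted(adj, key=lambda g: (-len(adj[g]), g))[:top]


# Hypothetical expression over five stress time points.
expr = {
    "bHLH1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bHLH2": [2.1, 1.9, 3.5, 4.2, 6.0],
    "bHLH3": [0.5, 1.5, 2.5, 4.5, 3.5],
    "other": [9.0, 1.0, 7.0, 3.0, 5.0],
}
adj = coexpression_network(expr, threshold=0.85)
print(seed_genes(adj))  # ['bHLH1']
```

Running the same procedure on each stress time window separately would show how the seed genes shift over time.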
CMCpy: Genetic Code-Message Coevolution Models in Python

PubMed Central

Becich, Peter J.; Stark, Brian P.; Bhat, Harish S.; Ardell, David H.

2013-01-01

Code-message coevolution (CMC) models represent coevolution of a genetic code and a population of protein-coding genes (“messages”). Formally, CMC models are sets of quasispecies coupled together for fitness through a shared genetic code. Although CMC models display plausible explanations for the origin of multiple genetic code traits by natural selection, useful modern implementations of CMC models are not currently available. To meet this need we present CMCpy, an object-oriented Python API and command-line executable front-end that can reproduce all published results of CMC models. CMCpy implements multiple solvers for leading eigenpairs of quasispecies models. We also present novel analytical results that extend and generalize applications of perturbation theory to quasispecies models and pioneer the application of a homotopy method for quasispecies with non-unique maximally fit genotypes. Our results therefore facilitate the computational and analytical study of a variety of evolutionary systems. CMCpy is free open-source software available from http://pypi.python.org/pypi/CMCpy/. PMID:23532367
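CMCpy's solvers target the leading eigenpairs of quasispecies models. The sketch below illustrates that kind of problem and does not use CMCpy's API. In a simple quasispecies model, the equilibrium population is the dominant eigenvector of W = Q·diag(f). Here f holds genotype fitnesses and Q[i][j] is the probability that genotype j replicates into genotype i. The matching eigenvalue is the equilibrium mean fitness. Plain power iteration finds this eigenpair when W is positive. The two-genotype parameters are arbitrary.

```python
def quasispecies_matrix(fitness, mutation):
    """W[i][j] = Q[i][j] * f[j]: rate at which genotype j gives rise to genotype i."""
    n = len(fitness)
    return [[mutation[i][j] * fitness[j] for j in range(n)] for i in range(n)]


def leading_eigenpair(w, tol=1e-12, max_iter=10_000):
    """Dominant eigenvalue and eigenvector (normalized to sum 1) by power iteration."""
    n = len(w)
    v = [1.0 / n] * n
    for _ in range(max_iter):
        u = [sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(u)  # equals the eigenvalue once v (which sums to 1) has converged
        u = [x / lam for x in u]
        if max(abs(a - b) for a, b in zip(u, v)) < tol:
            return lam, u
        v = u
    raise RuntimeError("power iteration did not converge")


# Two genotypes: a fitter master sequence and a mutant, 10% mutation per replication.
W = quasispecies_matrix([2.0, 1.0], [[0.9, 0.1], [0.1, 0.9]])
mean_fitness, frequencies = leading_eigenpair(W)
print(mean_fitness, frequencies)
```

Power iteration converges slowly when the top two eigenvalues are close, and it can fail in degenerate cases. This is one reason to have several solvers available.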
Molecular basis of the polydispersity of mucins: implications for the generation of saccharide diversity.

PubMed

Bhavanandan, V P; Gupta, D; Woitach, J; Guo, X; Jiang, W

1999-06-01

Secreted epithelial mucins are large macromolecules which exhibit extreme polydispersity, the molecular basis of which is not fully understood. We have obtained partial sequences of two genes (BSM1 and BSM2) coding for two distinct molecules. This is the first time that such closely-related genes have been identified for any mucin from an animal. We propose that a combination of multiple homologous genes, alternative splicing, differential glycosylation, and additional post-translational processing all contribute to the extreme polydispersity of mucins. The multiple domain structure and non-identical tandem repeats are also very important for the generation of the saccharide diversities of mucins.
Could age modify the effect of genetic variants in IL6 and TNF-α genes in multiple myeloma?

PubMed

Martino, Alessandro; Buda, Gabriele; Maggini, Valentina; Lapi, Francesco; Lupia, Antonella; Di Bello, Domenica; Orciuolo, Enrico; Galimberti, Sara; Barale, Roberto; Petrini, Mario; Rossi, Anna Maria

2012-05-01

Cytokines play a central role in multiple myeloma (MM) pathogenesis; thus, genetic variation within cytokine-coding genes could influence MM susceptibility and therapy outcome. We investigated the impact of 8 SNPs in these genes in 202 MM cases and 235 controls, also evaluating their impact on therapy outcome in a subset of 91 patients. Despite the overall negative findings, we found a significant age-modified effect of IL6 and TNF-α SNPs on MM risk and therapy outcome, respectively. This observation suggests that genetic variation in inflammation-related genes could be an important mediator of the complex interplay between ageing and cancer. Copyright © 2012 Elsevier Ltd. All rights reserved.
Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability.

PubMed

Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume

2017-07-19

Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.

Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

PubMed

Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-09-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

PubMed Central

Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-01-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
A high resolution atlas of gene expression in the domestic sheep (Ovis aries)

PubMed Central

Farquhar, Iseabail L.; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G.; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C. Bruce; Freeman, Tom C.; Archibald, Alan L.; Hume, David A.

2017-01-01

Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of ‘guilt by association’ was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages. PMID:28915238
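The atlas applied "guilt by association" through network-based cluster analysis. The sketch below swaps in a simpler stand-in: an uncharacterised gene takes the most common function label among annotated genes whose expression profiles across tissues correlate with it at Pearson r ≥ `r_min`. The gene names, labels, and the 0.9 threshold are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def infer_function(query_profile, annotated, r_min=0.9):
    """Guess a function for an uncharacterised gene from co-expressed annotated genes.

    `annotated` maps gene -> (expression profile across tissues, function label).
    Returns the most common label among genes with r >= r_min, or None.
    """
    votes = Counter(
        label
        for profile, label in annotated.values()
        if pearson(query_profile, profile) >= r_min
    )
    return votes.most_common(1)[0][0] if votes else None


annotated = {
    "G1": ([1, 2, 3, 4, 5], "immune"),
    "G2": ([2, 4, 6, 8, 10], "immune"),
    "G3": ([5, 4, 3, 2, 1], "muscle"),
}
print(infer_function([1, 2, 3, 4, 6], annotated))
```

Clustering the whole correlation network, as the atlas did, groups genes jointly. It is less sensitive to any single noisy neighbour than this per-gene vote.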
A high resolution atlas of gene expression in the domestic sheep (Ovis aries).

PubMed

Clark, Emily L; Bush, Stephen J; McCulloch, Mary E B; Farquhar, Iseabail L; Young, Rachel; Lefevre, Lucas; Pridans, Clare; Tsang, Hiu G; Wu, Chunlei; Afrasiabi, Cyrus; Watson, Mick; Whitelaw, C Bruce; Freeman, Tom C; Summers, Kim M; Archibald, Alan L; Hume, David A

2017-09-01

Sheep are a key source of meat, milk and fibre for the global livestock sector, and an important biomedical model. Global analysis of gene expression across multiple tissues has aided genome annotation and supported functional annotation of mammalian genes. We present a large-scale RNA-Seq dataset representing all the major organ systems from adult sheep and from several juvenile, neonatal and prenatal developmental time points. The Ovis aries reference genome (Oar v3.1) includes 27,504 genes (20,921 protein coding), of which 25,350 (19,921 protein coding) had detectable expression in at least one tissue in the sheep gene expression atlas dataset. Network-based cluster analysis of this dataset grouped genes according to their expression pattern. The principle of 'guilt by association' was used to infer the function of uncharacterised genes from their co-expression with genes of known function. We describe the overall transcriptional signatures present in the sheep gene expression atlas and assign those signatures, where possible, to specific cell populations or pathways. The findings are related to innate immunity by focusing on clusters with an immune signature, and to the advantages of cross-breeding by examining the patterns of genes exhibiting the greatest expression differences between purebred and crossbred animals. This high-resolution gene expression atlas for sheep is, to our knowledge, the largest transcriptomic dataset from any livestock species to date. It provides a resource to improve the annotation of the current reference genome for sheep, presenting a model transcriptome for ruminants and insight into gene, cell and tissue function at multiple developmental stages.
Unraveling transcriptional control and cis-regulatory codes using the software suite GeneACT

PubMed Central

Cheung, Tom Hiu; Kwan, Yin Lam; Hamady, Micah; Liu, Xuedong

2006-01-01

Deciphering gene regulatory networks requires the systematic identification of functional cis-acting regulatory elements. We present a suite of web-based bioinformatics tools, called GeneACT, that can rapidly detect evolutionarily conserved transcription factor binding sites or microRNA target sites that are either unique or over-represented in differentially expressed genes from DNA microarray data. GeneACT provides graphic visualization and extraction of common regulatory sequence elements in the promoters and 3'-untranslated regions that are conserved across multiple mammalian species. PMID:17064417
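The abstract does not state which statistic GeneACT uses to call a site "over-represented". A common choice, assumed here, is a one-sided hypergeometric test. It asks how likely it is to see at least the observed number of motif-carrying genes among the differentially expressed set if that set were drawn at random from the background. The function name and arguments are illustrative.

```python
from math import comb


def enrichment_p_value(hits_in_set, set_size, hits_in_background, background_size):
    """One-sided hypergeometric p-value that a motif is over-represented.

    hits_in_set: differentially expressed genes whose promoter/3'UTR contains the motif
    set_size: number of differentially expressed genes examined
    hits_in_background: genes in the whole background carrying the motif
    background_size: all genes in the background (must include the set)
    Returns P(X >= hits_in_set) when drawing set_size genes without replacement.
    """
    total = comb(background_size, set_size)
    upper = min(set_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k) * comb(background_size - hits_in_background, set_size - k)
        for k in range(hits_in_set, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 12 of 40 regulated genes carry a motif found in 300 of 10,000 genes.
print(enrichment_p_value(12, 40, 300, 10_000))
```

With many motifs tested at once, the resulting p-values would still need a multiple-testing correction.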
Egg Case Silk Gene Sequences from Argiope Spiders: Evidence for Multiple Loci and a Loss of Function Between Paralogs

PubMed Central

Chaw, R. Crystal; Collin, Matthew; Wimmer, Marjorie; Helmrick, Kara-Leigh; Hayashi, Cheryl Y.

2017-01-01

Spiders swathe their eggs with silk to protect developing embryos and hatchlings. Egg case silks, like other fibrous spider silks, are primarily composed of proteins called spidroins (spidroin = spider-fibroin). Silks, and thus spidroins, are important throughout the lives of spiders, yet the evolution of spidroin genes has been relatively understudied. Spidroin genes are notoriously difficult to sequence because they are typically very long (≥ 10 kb of coding sequence) and highly repetitive. Here, we investigate the evolution of spider silk genes through long-read sequencing of Bacterial Artificial Chromosome (BAC) clones. We demonstrate that the silver garden spider Argiope argentata has multiple egg case spidroin loci with a loss of function at one locus. We also use degenerate PCR primers to search the genomic DNA of congeneric species and find evidence for multiple egg case spidroin loci in other Argiope spiders. Comparative analyses show that these multiple loci are more similar at the nucleotide level within a species than between species. This pattern is consistent with concerted evolution homogenizing gene copies within a genome. More complicated explanations include convergent evolution or recent independent gene duplications within each species. PMID:29127108
MGDB: a comprehensive database of genes involved in melanoma.

PubMed

Zhang, Di; Zhu, Rongrong; Zhang, Hanqian; Zheng, Chun-Hou; Xia, Junfeng

2015-01-01

The Melanoma Gene Database (MGDB) is a manually curated catalog of molecular genetic data relating to genes involved in melanoma. The main purpose of this database is to establish a network of melanoma-related genes and to facilitate the mechanistic study of melanoma tumorigenesis. The entries describing the relationships between melanoma and genes in the current release were manually extracted from PubMed abstracts and to date comprise 527 human melanoma genes (422 protein-coding and 105 non-coding genes). Each melanoma gene was annotated in seven different aspects (General Information, Expression, Methylation, Mutation, Interaction, Pathway and Drug). In addition, manually curated literature references are provided to support the inclusion of each gene in MGDB and establish its association with melanoma. MGDB has a user-friendly web interface with multiple browse and search functions. We hope MGDB will enrich our knowledge of melanoma genetics and serve as a useful complement to existing public resources. Database URL: http://bioinfo.ahu.edu.cn:8080/Melanoma/index.jsp. © The Author(s) 2015. Published by Oxford University Press.
Pleiotropic Effects of Variants in Dementia Genes in Parkinson Disease.

PubMed

Ibanez, Laura; Dube, Umber; Davis, Albert A; Fernandez, Maria V; Budde, John; Cooper, Breanna; Diez-Fairen, Monica; Ortega-Cubero, Sara; Pastor, Pau; Perlmutter, Joel S; Cruchaga, Carlos; Benitez, Bruno A

2018-01-01

Background: The prevalence of dementia in Parkinson disease (PD) increases dramatically with advancing age, approaching 80% in patients who survive 20 years with the disease. Increasing evidence suggests clinical, pathological and genetic overlap of Alzheimer disease, dementia with Lewy bodies and frontotemporal dementia with PD. However, the contribution of the dementia-causing genes to PD risk, cognitive impairment and dementia in PD is not fully established. Objective: To assess the contribution of coding variants in Mendelian dementia-causing genes to the risk of developing PD and their effect on the cognitive performance of PD patients. Methods: We analyzed the coding regions of the amyloid-beta precursor protein (APP), presenilin 1 and 2 (PSEN1, PSEN2), and granulin (GRN) genes in 1,374 PD cases and 973 controls using pooled-DNA targeted sequencing, human exome-chip and whole-exome sequencing (WES) data, with single-variant and gene-based (SKAT-O and burden tests) analyses. Global cognitive function was assessed using the Mini-Mental State Examination (MMSE) or the Montreal Cognitive Assessment (MoCA). The effect of coding variants in dementia-causing genes on cognitive performance was tested by multiple regression analysis adjusting for gender, disease duration, age at dementia assessment, study site and APOE carrier status. Results: Known AD pathogenic mutations in the PSEN1 (p.A79V) and PSEN2 (p.V148I) genes were found in 0.3% of all PD patients. There was a significant burden of rare, likely damaging variants in the GRN and PSEN1 genes in PD patients when compared with frequencies in the European population from the ExAC database. Multiple regression analysis revealed that PD patients carrying rare variants in the APP, PSEN1, PSEN2, and GRN genes exhibit lower cognitive test scores than non-carrier PD patients (p = 2.0 × 10⁻⁴), independent of age at PD diagnosis, age at evaluation, APOE status or recruitment site.
Conclusions: Pathogenic mutations in the Alzheimer disease-causing genes (PSEN1 and PSEN2) are found in sporadic PD patients. PD patients with cognitive decline carry rare variants in dementia-causing genes. Variants in genes causing Mendelian neurodegenerative diseases exhibit pleiotropic effects.
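The gene-based analyses above include burden tests. The sketch below shows the simplest collapsing form of one. Each sample is reduced to "carries at least one qualifying rare variant in the gene" or not, and a one-sided Fisher exact test asks whether carriers are enriched among cases. It is not SKAT-O, which combines burden and variance-component statistics. The input layout (per-sample lists of allele counts) is an assumption for illustration.

```python
from math import comb


def carrier_status(genotypes):
    """Collapse per-variant allele counts (0/1/2) to 1 if the sample carries any qualifying variant."""
    return int(any(count > 0 for count in genotypes))


def burden_fisher_one_sided(case_genotypes, control_genotypes):
    """Collapsing burden test: are carriers over-represented among cases?

    Each argument is a list of samples; each sample is a list of allele counts for
    the rare, likely damaging variants in one gene. Returns the one-sided Fisher
    exact p-value for the 2 x 2 table of carrier status by case/control status.
    """
    a = sum(carrier_status(g) for g in case_genotypes)     # carriers among cases
    b = sum(carrier_status(g) for g in control_genotypes)  # carriers among controls
    n_cases, n_controls = len(case_genotypes), len(control_genotypes)
    carriers, total = a + b, n_cases + n_controls
    denominator = comb(total, n_cases)
    # Sum hypergeometric probabilities of tables with at least `a` case carriers.
    return sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(a, min(carriers, n_cases) + 1)
    ) / denominator


cases = [[1, 0], [0, 1], [1, 0]]
controls = [[0, 0], [0, 0], [0, 0]]
print(burden_fisher_one_sided(cases, controls))
```

Collapsing to carrier status assumes that all qualifying variants act in the same direction. SKAT-O relaxes that assumption.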
Structure of the human myelin/oligodendrocyte glycoprotein gene and multiple alternative spliced isoforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pham-Dinh, D.; Gaspera, D.B.; Dautigny, A.

1995-09-20

Myelin/oligodendrocyte glycoprotein (MOG), a specific component of the central nervous system localized on the outermost lamellae of mature myelin, is a member of the immunoglobulin superfamily. We report here the organization of the human MOG gene, which spans approximately 17 kb, and the characterization of six MOG mRNA splicing variants. The intron/exon structure of the human MOG gene confirmed the splicing pattern, supporting the hypothesis that the mRNA isoforms could arise by alternative splicing of a single gene. In addition to the eight exons coding for the major MOG isoform, the human MOG gene also contains, in its 3' region, a previously unknown alternatively spliced coding exon, VIA. Alternative use of two acceptor splice sites for exon VIII could produce two different C-termini. The nucleotide sequences presented here may be a useful tool for further study of the possible involvement of the MOG gene in hereditary neurological disorders. 23 refs., 5 figs.
Current and future implications of basic and translational research on amyloid-β peptide production and removal pathways

PubMed Central

Bohm, C.; Chen, F.; Sevalle, J.; Qamar, S.; Dodd, R.; Li, Y.; Schmitt-Ulms, G.; Fraser, P.E.; St George-Hyslop, P.H.

2015-01-01

Inherited variants in multiple different genes are associated with increased risk for Alzheimer's disease (AD). In many of these genes, the inherited variants alter some aspect of the production or clearance of the neurotoxic amyloid β-peptide (Aβ). Thus missense, splice site or duplication mutants in the presenilin 1 (PS1), presenilin 2 (PS2) or the amyloid precursor protein (APP) genes, which alter the levels or shift the balance of Aβ produced, are associated with rare, highly penetrant autosomal dominant forms of Familial Alzheimer's Disease (FAD). Similarly, the more prevalent late-onset forms of AD are associated with both coding and non-coding variants in genes such as SORL1, PICALM and ABCA7 that affect the production and clearance of Aβ. This review summarises some of the recent molecular and structural work on the role of these genes, and of the proteins they encode, in the biology of Aβ. We also briefly outline how the emerging knowledge about the pathways involved in Aβ generation and clearance could potentially be targeted therapeutically. This article is part of a Special Issue entitled "Neuronal Protein". PMID:25748120
Molecular Dissection of a Major Gene Effect on a Quantitative Trait: The Level of Alcohol Dehydrogenase Expression in Drosophila Melanogaster

PubMed Central

Stam, L. F.; Laurie, C. C.

1996-01-01

A molecular mapping experiment shows that a major gene effect on a quantitative trait, the level of alcohol dehydrogenase expression in Drosophila melanogaster, is due to multiple polymorphisms within the Adh gene. These polymorphisms are located in an intron, the coding sequence, and the 3' untranslated region. Because of nonrandom associations among polymorphisms at different sites, the individual effects combine (in some cases epistatically) to produce "superalleles" with large effect. These results have implications for the interpretation of major gene effects detected by quantitative trait locus mapping methods. They show that large effects due to a single locus may be due to multiple associated polymorphisms (or sequential fixations in isolated populations) rather than individual mutations of large effect. PMID:8978044
TA-GC cloning: A new simple and versatile technique for the directional cloning of PCR products for recombinant protein expression.

PubMed

Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Lagoumintzis, George; Poulas, Konstantinos

2017-01-01

Over the last few decades, recombinant protein expression has found more and more applications. Cloning of protein-coding genes into expression vectors needs to be directional for proper expression, and versatile in order to facilitate gene insertion into multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method offers several advantages over existing methods, which tend to be relatively more labour-intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors.
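Why two different single-base overhangs give directionality can be shown with a toy model. A ligation junction is assumed to anneal only when the two overhang bases are complementary, and an insert is accepted in an orientation only if both junctions anneal. Which end of the gene carries A and which vector end carries T is an assumption here, not taken from the abstract. The model also ignores strand chemistry beyond base pairing.

```python
# Complementary single-base overhang pairs (toy model of annealing at one junction).
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def compatible_orientations(insert_ends, vector_ends):
    """Return the insert orientations in which both single-base overhangs can anneal.

    insert_ends: (overhang at the insert's 5'-gene end, overhang at its 3'-gene end)
    vector_ends: (overhang at the vector's upstream site, overhang at its downstream site)
    """
    left_ins, right_ins = insert_ends
    left_vec, right_vec = vector_ends
    orientations = []
    if (left_ins, left_vec) in PAIRS and (right_ins, right_vec) in PAIRS:
        orientations.append("forward")
    # Flipping the insert swaps which of its ends meets which vector end.
    if (right_ins, left_vec) in PAIRS and (left_ins, right_vec) in PAIRS:
        orientations.append("reverse")
    return orientations


print(compatible_orientations(("G", "A"), ("C", "T")))  # distinct overhangs
print(compatible_orientations(("A", "A"), ("T", "T")))  # classic TA cloning
```

With A/G on the insert and T/C on the vector, only one orientation satisfies both junctions. Classic TA cloning, with A on both ends, accepts either orientation.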
TA-GC cloning: A new simple and versatile technique for the directional cloning of PCR products for recombinant protein expression

PubMed Central

Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Poulas, Konstantinos

2017-01-01

Over the last few decades, recombinant protein expression has found more and more applications. Cloning of protein-coding genes into expression vectors needs to be directional for proper expression, and versatile in order to facilitate gene insertion into multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method offers several advantages over existing methods, which tend to be relatively more labour-intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors. PMID:29091919
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

PubMed Central

Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

2011-01-01

Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite more than 140 million years of divergence. Our results lay the groundwork for further studies of interesting new findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion, with higher similarity among the five tandemly repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
PSP: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes.

PubMed

Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping

2013-12-27

With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to shift continuously in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes under significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open-access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at multiple levels through assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes that play key roles in human infection and antibiotic resistance show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to perform genome-scale analysis of evolutionary selection across multiple prokaryotic genomes rapidly and easily, and to identify the genes undergoing positive selection, which may play key roles in host-pathogen interactions and/or environmental adaptation.
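Selection-pressure analyses of orthologous coding genes usually rest on the ratio of nonsynonymous to synonymous substitution rates (dN/dS). A ratio above 1 suggests positive selection. The abstract does not say which estimator PSP runs. The sketch below uses the simple counting method of Nei and Gojobori (1986) for a single aligned, gap-free pair of coding sequences, with a Jukes-Cantor correction. The all-nonsynonymous fallback for codon pairs whose every pathway passes through a stop codon is a simplification of this sketch.

```python
import math
from itertools import permutations

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Synonymous sites in a codon (0-3): for each position, the fraction of
    the three possible point changes that keep the same amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two sense codons, averaged
    over every order in which the differing positions could mutate.
    Pathways through a stop codon are discarded."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    valid_paths = 0
    for order in permutations(positions):
        current, syn, non = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # pathway passes through a stop codon
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        else:  # runs only if the pathway never hit a stop codon
            syn_total += syn
            non_total += non
            valid_paths += 1
    if valid_paths == 0:
        return 0.0, float(len(positions))  # simplification: count all as nonsynonymous
    return syn_total / valid_paths, non_total / valid_paths


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; undefined once p reaches 0.75."""
    if p >= 0.75:
        raise ValueError("proportion too high for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Nei-Gojobori dN/dS for two aligned, gap-free coding sequences (A/C/G/T only)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned codon by codon")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if "*" in (CODE[c1], CODE[c2]):
            continue  # skip stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    ds, dn = jukes_cantor(Sd / S), jukes_cantor(Nd / N)
    if ds == 0:
        return float("nan") if dn == 0 else float("inf")
    return dn / ds
```

Likelihood-based codon models, which can test individual sites or lineages, are generally preferred for formal tests of positive selection. This counting method is mainly useful for seeing where the numbers come from.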
Genomewide analysis of Drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation

PubMed Central

Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.

2014-01-01

Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350
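Canonical miRNA seed matches, as counted in the circular RNAs above, are target sites complementary to miRNA nucleotides 2-8 (a "7mer-m8" site). An "8mer" additionally has an A opposite miRNA position 1, which sits immediately 3' of the site on the target. The sketch below scans one target sequence for these two site types. It does not model the conservation filtering the study applied, and the example target is made up around the let-7a sequence.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string written 5'->3' (ACGU only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, target):
    """Find canonical 7mer-m8 and 8mer seed matches of a miRNA in a target RNA.

    Both sequences are 5'->3' RNA strings (ACGU). A 7mer-m8 site pairs with miRNA
    nucleotides 2-8; an 8mer additionally has an A opposite miRNA position 1.
    Returns a list of (0-based start of the 7mer in target, site type).
    """
    site7 = reverse_complement(mirna[1:8])  # target sequence pairing with nts 2-8
    hits = []
    start = target.find(site7)
    while start != -1:
        end = start + 7
        kind = "8mer" if end < len(target) and target[end] == "A" else "7mer-m8"
        hits.append((start, kind))
        start = target.find(site7, start + 1)
    return hits


let7a = "UGAGGUAGUAGGUUGUAUAGUU"
print(seed_sites(let7a, "AACUACCUCAGG"))
```

DNA-derived targets would need T converted to U before scanning, since the complement table covers only ACGU.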
Detection of hyper-conserved regions in hepatitis B virus X gene potentially useful for gene therapy.

PubMed

González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco

2018-05-21

To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene (HBX) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1,333,069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). The latter region encodes a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. The two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
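Per-position information content in a nucleotide alignment is commonly computed as log2(4) minus the Shannon entropy of the base frequencies in that column. A fully conserved column scores 2 bits and a column with all four bases equally frequent scores 0. The sketch below assumes unweighted haplotypes and ignores gap characters. The threshold and minimum run length used to call a "hyper-conserved" region are hypothetical, since the abstract does not give the study's criteria.

```python
import math
from collections import Counter


def information_content(alignment):
    """Per-column information content (bits) of a DNA multiple alignment.

    IC = log2(4) - H, where H is the Shannon entropy of the A/C/G/T frequencies
    in that column; gap characters are ignored. 2.0 means fully conserved.
    """
    ic = []
    for column in zip(*alignment):
        counts = Counter(base for base in column if base in "ACGT")
        total = sum(counts.values())
        if total == 0:
            ic.append(0.0)  # all-gap column
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        ic.append(2.0 - entropy)
    return ic


def conserved_regions(ic, threshold=1.9, min_length=10):
    """Return 0-based inclusive (start, end) runs with IC >= threshold of at least min_length columns."""
    regions, start = [], None
    for i, value in enumerate(ic + [float("-inf")]):  # sentinel closes a trailing run
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    return regions


haplotypes = ["ACGTACGT", "ACGTACGA", "ACGTACGC", "ACGTACGG"]
ic = information_content(haplotypes)
print(ic, conserved_regions(ic, min_length=5))
```

Weighting each haplotype by its read frequency, rather than counting unique haplotypes once, would make the measure reflect the dominant viral population more closely.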
The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

PubMed Central

Cipriano, Andrea; Ballarino, Monica

2018-01-01

The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Alternative splicing for members of human mosaic domain superfamilies. I. The CH and LIM domains containing group of proteins.

PubMed

Friedberg, Felix

2009-05-01

In this paper we examine (restricted to Homo sapiens) the products resulting from gene duplication and subsequent alternative splicing for the members of a multidomain group of proteins that possess the evolutionarily conserved calponin homology (CH) domain, i.e. an "actin binding domain", as a singlet and that, in addition, contain the conserved cysteine-rich double zinc finger LIM domain, also as a singlet. Seven genes, resulting from gene duplications, were identified that code for seven group members whose pre-mRNAs appear to have undergone multiple alternative splicing: Mical 1, 2 and 3 are located on chromosomes 6q21, 11p15 and 22q11, respectively. The LMO7 gene is present on chromosome 13q22 and the LIMCH1 gene on chromosome 4p13. Micall1 is mapped to chromosome 22q13 and Micall2 to chromosome 7p22. Translated GenBank ESTs suggest the existence of multiple products alternatively spliced from the pre-mRNAs encoded by these genes. Characteristic indicators of such splicing among proteins derived from one gene must include extensive shared regions of 100% identity. In some instances only one exon might be partly or completely eliminated. Sometimes alternative splicing is also associated with an increased frequency of creation of an exon, or part of an exon, from an intron. Not only the coding regions for the body of the protein but also those for its N- or C-terminus could be affected by the splicing. If the created forms merely begin at different starting points but remain identical in sequence thereafter, their existence as products of alternative splicing must be questioned. In the splicing events described in this paper, multiple isoforms rather than a single isoform appear as products of gene expression.
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

PubMed Central

Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

2008-01-01

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088

Draft Genome Sequence of Paenibacillus sp. Strain DMB20, Isolated from Alang Ship-Breaking Yard, Which Harbors Genes for Xenobiotic Degradation

PubMed Central

Shah, Binal; Jain, Kunal; Patel, Namrata; Pandit, Ramesh; Patel, Anand; Joshi, Chaitanya G.

2015-01-01

Paenibacillus sp. strain DMB20, in cometabolism with other Proteobacteria and Firmicutes, exhibits azoreduction of textile dyes. Here, we report the draft genome sequence of this bacterium, consisting of 6,647,181 bp with 7,668 coding sequences (CDSs). The data presented highlight multiple sets of functional genes associated with xenobiotic compound degradation. PMID:26067950
Ontological function annotation of long non-coding RNAs through hierarchical multi-label classification.

PubMed

Zhang, Jingpu; Zhang, Zuping; Wang, Zixiang; Liu, Yuting; Deng, Lei

2018-05-15

Long non-coding RNAs (lncRNAs) are an enormous collection of functional non-coding RNAs. Over the past decades, a large number of novel lncRNA genes have been identified. However, most lncRNAs remain functionally uncharacterized at present. Computational approaches provide new insight into the potential functional implications of lncRNAs. Considering that each lncRNA may have multiple functions and a function may be further specialized into sub-functions, here we describe NeuraNetL2GO, a computational ontological function prediction approach for lncRNAs using a hierarchical multi-label classification strategy based on multiple neural networks. The neural networks are incrementally trained level by level, each performing the prediction of gene ontology (GO) terms belonging to a given level. In NeuraNetL2GO, we use topological features of the lncRNA similarity network as the input of the neural networks and employ the output results to annotate the lncRNAs. We show that NeuraNetL2GO achieves the best overall performance in maximum F-measure and coverage on the manually annotated lncRNA2GO-55 dataset compared with other state-of-the-art methods. The source code and data are available at http://denglab.org/NeuraNetL2GO/. Contact: leideng@csu.edu.cn. Supplementary data are available at Bioinformatics online.
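The maximum F-measure used to compare function predictors is usually the protein-centric Fmax popularised by the CAFA assessments. The sketch below assumes that definition, which the abstract does not confirm. At each score threshold, precision is averaged over genes with at least one prediction, recall is averaged over all evaluated genes, and the best harmonic mean across thresholds is reported. The gene and term identifiers in the example are placeholders.

```python
def f_max(predictions, truth, thresholds=None):
    """Protein-centric maximum F-measure (CAFA-style).

    predictions: {gene: {term: score in [0, 1]}}
    truth: {gene: set of true terms}; only genes in truth are evaluated.
    Precision is averaged over genes with >= 1 prediction at the threshold;
    recall is averaged over all evaluated genes.
    """
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recall_sum = [], 0.0
        for gene, true_terms in truth.items():
            predicted = {term for term, s in predictions.get(gene, {}).items() if s >= t}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms) if true_terms else 0.0
        if not precisions:
            continue  # no gene has a prediction at this threshold
        p = sum(precisions) / len(precisions)
        r = recall_sum / len(truth)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


predictions = {"lnc1": {"termA": 0.9, "termB": 0.2}, "lnc2": {"termC": 0.8}}
truth = {"lnc1": {"termA"}, "lnc2": {"termC"}}
print(f_max(predictions, truth))
```

For GO evaluation, both predicted and true term sets are normally propagated to all ancestor terms before scoring, which this sketch leaves to the caller.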
Post-transcriptional Regulation of Genes Related to Biological Behaviors of Gastric Cancer by Long Noncoding RNAs and MicroRNAs

PubMed Central

Liu, Wenjing; Ma, Rui; Yuan, Yuan

2017-01-01

Noncoding RNAs play critical roles in regulating protein-coding genes and comprise two major classes: long noncoding RNAs (lncRNAs) and microRNAs (miRNAs). LncRNAs regulate gene expression at the transcriptional, post-transcriptional, and epigenetic levels through multiple modes of action. LncRNAs can also act as competing endogenous RNAs for miRNAs and thereby regulate gene expression indirectly at the post-transcriptional level. miRNAs regulate gene expression post-transcriptionally by binding to the 3'-untranslated regions (3'-UTRs) of target genes. Here, we review the post-transcriptional regulation by lncRNAs and miRNAs of genes associated with the biological behaviors of gastric cancer. PMID:29187891
GENETICALLY MODIFIED FOODS: TECHNOLOGICAL BREAKTHROUGH OR ECOLOGICAL NIGHTMARE?

EPA Science Inventory

Fifty years ago, Watson and Crick described the structure of DNA, setting the stage for the past decade's biotechnology revolution. Scientists have now broken the code of the entire human genome and delineated the functions of multiple genes; similar strides are being taken with...
DNA Multiple Sequence Alignment Guided by Protein Domains: The MSA-PAD 2.0 Method.

PubMed

Balech, Bachir; Monaco, Alfonso; Perniola, Michele; Santamaria, Monica; Donvito, Giacinto; Vicario, Saverio; Maggi, Giorgio; Pesole, Graziano

2018-01-01

Multiple sequence alignment (MSA) is a fundamental component of many DNA sequence analyses, including metagenomic studies and phylogenetic inference. When guided by protein profiles, DNA multiple alignments achieve higher precision and robustness. Here we describe the use of the upgraded version of MSA-PAD (2.0), a DNA multiple sequence alignment framework that aligns DNA sequences coding for single or multiple protein domains, guided by PFAM or user-defined annotations. MSA-PAD has two alignment strategies, called "Gene" and "Genome," which account for coding-domain order and genomic rearrangements, respectively. New options in this version allow the MSA to be guided by protein profiles supplied by the user. This lets MSA-PAD 2.0 run faster and use custom protein profiles, chosen according to the user's interest, that may be absent from the PFAM database. MSA-PAD 2.0 is currently freely available as a Web application at https://recasgateway.cloud.ba.infn.it/ .
Overview of long non-coding RNA and mRNA expression in response to methamphetamine treatment in vitro.

PubMed

Xiong, Kun; Long, Lingling; Zhang, Xudong; Qu, Hongke; Deng, Haixiao; Ding, Yanjun; Cai, Jifeng; Wang, Shuchao; Wang, Mi; Liao, Lvshuang; Huang, Jufang; Yi, Chun-Xia; Yan, Jie

2017-10-01

Long non-coding RNAs (lncRNAs) have multiple functions, including regulation of neuronal injury. However, their role in methamphetamine (METH)-induced neurotoxicity has rarely been reported. Here, we used microarray analysis to investigate the expression profiles of lncRNAs and mRNAs in primary cultured prefrontal cortical neurons after METH treatment. We observed differences in lncRNA and mRNA expression between the experimental and sham control groups. Using bioinformatics, we identified the most highly enriched gene ontology (GO) terms for biological process, cellular component, and molecular function, and performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and pathway network analyses. Furthermore, an lncRNA-mRNA co-expression sub-network for aberrantly expressed terms revealed possible interactions of lncRNAs NR_110713 and NR_027943 with their related genes. Three lncRNAs (NR_110713, NR_027943, GAS5) and two mRNAs (Ddit3, Casp12) were then selected to validate the microarray data by qRT-PCR. This study presents an overview of lncRNA and mRNA expression profiles and indicates that lncRNAs might participate in METH-induced neuronal apoptosis by regulating protein-coding genes in neurons. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rcount: simple and flexible RNA-Seq read counting.

PubMed

Schmid, Marc W; Grossniklaus, Ueli

2015-02-01

Analysis of differential gene expression by RNA sequencing (RNA-Seq) is frequently based on feature counts, i.e. the number of reads mapping to a gene. However, commonly used counting algorithms (e.g. HTSeq) do not handle reads that align to multiple locations in the genome (multireads) or reads that align to positions where two or more genes overlap (ambiguous reads). Rcount specifically addresses these issues. Furthermore, Rcount allows the user to assign priorities to certain feature types (e.g. higher priority for protein-coding genes than for rRNA-coding genes) or to add flanking regions. Rcount provides a fast, easy-to-use graphical user interface that requires no command-line or programming skills. It is implemented in C++ using the SeqAn (www.seqan.de) and Qt (qt-project.org) libraries. Source code and 64-bit binaries for Linux (Ubuntu), Windows (7) and Mac OS X are released under the GPLv3 license and are freely available at github.com/MWSchmid/Rcount. Contact: marcschmid@gmx.ch. Test data, genome annotation files, useful Python and R scripts and a step-by-step user guide (including run-time and memory usage tests) are available at github.com/MWSchmid/Rcount. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
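The abstract does not spell out Rcount's allocation rules; those are in its user guide. The sketch below only illustrates the general idea of prioritized counting under simple assumptions. An ambiguous read goes to the overlapping feature(s) with the highest-priority feature type, and ties are split evenly. A multiread contributes 1/n of a count to each of its n alignments. The `PRIORITY` table and `count_reads` function are hypothetical and are not Rcount's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical priorities: a higher value wins when features overlap.
PRIORITY = {"protein_coding": 2, "rRNA": 1}

# One alignment = the (feature_id, feature_type) pairs it overlaps.
Alignment = List[Tuple[str, str]]


def count_reads(reads: Dict[str, List[Alignment]]) -> Dict[str, float]:
    """Fractional feature counts with priority-based resolution of ambiguous reads."""
    counts: Dict[str, float] = defaultdict(float)
    for alignments in reads.values():
        if not alignments:
            continue
        share = 1.0 / len(alignments)  # split a multiread evenly across its loci
        for features in alignments:
            if not features:
                continue  # this alignment falls outside every annotated feature
            top = max(PRIORITY.get(ftype, 0) for _, ftype in features)
            winners = [fid for fid, ftype in features if PRIORITY.get(ftype, 0) == top]
            for fid in winners:
                counts[fid] += share / len(winners)
    return dict(counts)
```

For example, a read overlapping both a protein-coding gene and an rRNA gene is counted only for the protein-coding gene. A read aligning equally well to two genes adds 0.5 to each.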
Selection and validation of suitable reference genes for miRNA expression normalization by quantitative RT-PCR in citrus somatic embryogenic and adult tissues.

PubMed

Kou, Shu-Jun; Wu, Xiao-Meng; Liu, Zheng; Liu, Yuan-Long; Xu, Qiang; Guo, Wen-Wu

2012-12-01

miRNAs have recently been reported to modulate somatic embryogenesis (SE), a key pathway of plant regeneration in vitro. qRT-PCR is one of the most effective and sensitive techniques for detecting miRNA expression levels and subsequently dissecting miRNA function in particular biological processes, and it requires suitable reference genes. In this study, three miRNAs and eight non-coding RNAs (ncRNAs) were selected as reference candidates, and their expression stability was examined in developing citrus SE tissues cultured at 20, 25, and 30 °C. The stability of the eight non-miRNA ncRNAs was further validated in five adult tissues without temperature treatment. The best single reference gene was snoR14 or snoRD25 for SE tissues and U4 for adult tissues. None of these was as stable as the optimal multiple-reference combinations, snoR14 + U6 for SE tissues and snoR14 + U5 for adult tissues. For expression normalization of less abundant miRNAs in SE tissues, miR3954 was assessed as a viable reference. The single reference gene snoR14 outperformed multiple references for the overall set of SE and adult tissues. As one of the first systematic studies of reference gene identification for plant miRNA normalization, this study benefits future exploration of miRNA function in citrus and provides valuable information for similar studies in other higher plants. Three miRNAs and eight non-coding RNAs were tested as reference candidates in developing citrus SE tissues. The best single references, snoR14 or snoRD25, and the optimal multiple references, snoR14 + U6 and snoR14 + U5, were identified.
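A combination such as snoR14 + U6 is commonly used by normalizing against the mean Ct of the reference genes. Under the assumption of perfect doubling per cycle, this equals taking the geometric mean of their relative quantities. The sketch below applies the 2^-ΔΔCt method under that ~100% amplification-efficiency assumption for every assay. The function name and Ct values are illustrative and are not data from the study.

```python
from statistics import mean
from typing import Sequence


def relative_expression(target_ct: float, ref_cts: Sequence[float],
                        calib_target_ct: float, calib_ref_cts: Sequence[float]) -> float:
    """2^-ddCt fold change of a target in a sample versus a calibrator sample.

    Averaging the reference Ct values equals taking the geometric mean of the
    reference quantities when every assay doubles per cycle (assumed here).
    """
    d_ct_sample = target_ct - mean(ref_cts)
    d_ct_calib = calib_target_ct - mean(calib_ref_cts)
    return 2.0 ** -(d_ct_sample - d_ct_calib)
```

For example, `relative_expression(22.0, [18.0, 20.0], 24.0, [18.0, 20.0])` gives a ΔΔCt of -2 and therefore a 4-fold higher expression in the sample.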
APPRIS 2017: principal isoforms for multiple gene sets

PubMed Central

Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

2018-01-01

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. Based on these annotations, APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present updates to the database and new developments. These include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods of the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and is updated with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Draft Genome Sequence of Paenibacillus sp. Strain DMB20, Isolated from Alang Ship-Breaking Yard, Which Harbors Genes for Xenobiotic Degradation.

PubMed

Shah, Binal; Jain, Kunal; Patel, Namrata; Pandit, Ramesh; Patel, Anand; Joshi, Chaitanya G; Madamwar, Datta

2015-06-11

Paenibacillus sp. strain DMB20, in cometabolism with other Proteobacteria and Firmicutes, exhibits azoreduction of textile dyes. Here, we report the draft genome sequence of this bacterium, consisting of 6,647,181 bp with 7,668 coding sequences (CDSs). The data presented highlight multiple sets of functional genes associated with xenobiotic compound degradation. Copyright © 2015 Shah et al.
Identification of Circular RNAs from the Parental Genes Involved in Multiple Aspects of Cellular Metabolism in Barley

PubMed Central

Darbani, Behrooz; Noeparvar, Shahin; Borg, Søren

2016-01-01

RNA circularization by head-to-tail back-splicing is involved in the regulation of gene expression from the transcriptional to the post-translational level. By exploiting RNA-Seq data and downstream analysis, we shed light on the importance of circular RNAs in plants. The results introduce circular RNAs as novel interactors in the regulation of gene expression in plants. Because circular RNAs were identified for a diverse set of genes, the results also imply that this regulatory pathway is comprehensive. These genes are involved in several aspects of cellular metabolism, such as hormonal signaling, intracellular protein sorting, carbohydrate metabolism and cell-wall biogenesis, respiration, amino acid biosynthesis, transcription and translation, and protein ubiquitination. Additionally, these parental loci of circular RNAs, from both nuclear and mitochondrial genomes, encode different transcript classes, including protein-coding transcripts, microRNAs, rRNAs, and long non-coding/microprotein-coding RNAs. The results shed light on mitochondrial exonic circular RNAs and imply that circular RNAs are important for the regulation of mitochondrial genes. Importantly, we introduce circular RNAs in barley and elucidate their cellular-level alterations across tissues and in response to the micronutrients iron and zinc. In further support of functional roles for circular RNAs in plants, we report several cases where fluctuations in circRNA levels do not correlate with the levels of the linear transcripts encoded by their parental loci. PMID:27375638
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.

PubMed

Eernisse, D J

1992-04-01

DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator can convert documented GenBank or EMBL sequences into linearized, rescalable gene maps. Gene sequences can be extracted from these maps by clicking the corresponding map button or by selecting from a scrolling list. The provided gene maps, complete with extractable sequences, comprise nine metazoan, one yeast, and one ciliate mitochondrial DNA and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction, with flexible support for alternate genetic codes and ambiguous nucleotide symbols. Aligned multiple-sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences; it toggles between displaying matched characters and displaying the normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency for approximately 70 selected organism-organelle combinations. Codon usage can be shown relative to all other codons or only to synonymous codons. Codon usage data are compatible with spreadsheet or UWGCG formats, so additional molecules of interest can be incorporated. The complete package is available via anonymous FTP and is free for non-commercial use.
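As an illustration of translation with alternate genetic codes and ambiguous symbols, here is a minimal sketch (not a reconstruction of the HyperCard stack). It encodes the NCBI standard code (table 1) and the vertebrate mitochondrial code (table 2) as 64-letter strings in NCBI's TCAG codon order. Mapping any codon that contains a non-ACGT symbol to `X` is a simplifying assumption. A fuller tool would resolve ambiguity codes that still specify a single amino acid.

```python
BASES = "TCAG"

# NCBI translation tables, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
CODES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mito
}


def translate(dna: str, table: int = 1) -> str:
    """Translate a DNA/RNA sequence in frame 0; '*' marks stop codons."""
    aa = CODES[table]
    seq = dna.upper().replace("U", "T")
    protein = []
    # Trailing bases that do not form a full codon are ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if all(b in BASES for b in codon):
            idx = 16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1]) + BASES.index(codon[2])
            protein.append(aa[idx])
        else:
            protein.append("X")  # ambiguous nucleotide symbol (simplified handling)
    return "".join(protein)
```

The same codons read differently under the two tables. `translate("ATGTGAATA")` gives `M*I` under table 1 but `MWM` under table 2, because TGA encodes Trp and ATA encodes Met in vertebrate mitochondria.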
Molecular dissection of nutrient exchange at the insect-microbial interface.

PubMed

Douglas, Angela E

2014-10-01

Genome research is transforming our understanding of nutrient exchange between insects and intracellular bacteria. A key characteristic of these bacteria is their small genome size and limited gene content. Their fastidious and inflexible nutritional requirements are met by multiple metabolites from the insect host cell. The bacteria have generally retained genes coding for the synthesis of nutrients required by the insect. However, some apparently critical genes have been lost, and their loss is compensated for by metabolic pathways shared with the insect host or by supplementary bacteria with complementary metabolic capabilities. Copyright © 2014 Elsevier Inc. All rights reserved.
A fully decompressed synthetic bacteriophage øX174 genome assembled and archived in yeast.

PubMed

Jaschke, Paul R; Lieberman, Erica K; Rodriguez, Jon; Sierra, Adrian; Endy, Drew

2012-12-20

The 5386 nucleotide bacteriophage øX174 genome has a complicated architecture that encodes 11 gene products via overlapping protein coding sequences spanning multiple reading frames. We designed a 6302 nucleotide synthetic surrogate, øX174.1, that fully separates all primary phage protein coding sequences along with cognate translation control elements. To specify øX174.1f, a decompressed genome the same length as wild type, we truncated the gene F coding sequence. We synthesized DNA encoding fragments of øX174.1f and used a combination of in vitro- and yeast-based assembly to produce yeast vectors encoding natural or designer bacteriophage genomes. We isolated clonal preparations of yeast plasmid DNA and transfected E. coli C strains. We recovered viable øX174 particles containing the øX174.1f genome from E. coli C strains that independently express full-length gene F. We expect that yeast can serve as a genomic 'drydock' within which to maintain and manipulate clonal lineages of other obligate lytic phage. Copyright © 2012 Elsevier Inc. All rights reserved.
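The overlapping architecture described here can be made concrete with a small scan for open reading frames (ORFs) in the three forward frames. The scan flags ORFs in different frames whose coordinates overlap. This is a minimal sketch, not the authors' design procedure. It assumes ORFs start at the first ATG and end at the next in-frame stop codon, it ignores the reverse strand, and the test sequence is a toy example.

```python
from itertools import combinations
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
Orf = Tuple[int, int, int]  # (frame, start, end); end is exclusive and includes the stop


def forward_orfs(seq: str, min_codons: int = 2) -> List[Orf]:
    """ATG-initiated ORFs in the three forward frames (length counts the stop codon)."""
    seq = seq.upper()
    orfs: List[Orf] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def overlapping_pairs(orfs: List[Orf]) -> List[Tuple[Orf, Orf]]:
    """Pairs of ORFs in different reading frames whose intervals overlap."""
    return [(a, b) for a, b in combinations(orfs, 2)
            if a[0] != b[0] and a[1] < b[2] and b[1] < a[2]]
```

In the toy sequence `ATGAATGCCTAAGTAG`, frame 0 holds an ORF spanning positions 0 to 12, and frame 1 holds one spanning positions 4 to 16. The two share bases 4 to 11, a miniature version of the overlaps that øX174.1 separates.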
Deciphering the associations between gene expression and copy number alteration using a sparse double Laplacian shrinkage approach

PubMed Central

Shi, Xingjie; Zhao, Qing; Huang, Jian; Xie, Yang; Ma, Shuangge

2015-01-01

Motivation: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relationship. Regulation analysis is challenging because the expression of one gene may be regulated by multiple CNAs, and one CNA may regulate the expression of multiple genes. Correlations among GEs and among CNAs make the analysis even more complicated. Existing methods have limitations and cannot comprehensively describe the regulation. Results: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify regulatory relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs, and two Laplacian shrinkage penalties are imposed to accommodate these network adjacency measures. Simulations show that the proposed method outperforms competing alternatives, with more accurate marker identification. Data from The Cancer Genome Atlas are analysed to further demonstrate the advantages of the proposed method. Availability and implementation: R code is available at http://works.bepress.com/shuangge/49/. Contact: shuangge.ma@yale.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26342102
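Laplacian shrinkage penalties rest on a standard identity. Take a symmetric, non-negative adjacency matrix A with degree matrix D and Laplacian L = D - A. Then beta^T L beta equals the sum over pairs j < k of a_jk (beta_j - beta_k)^2, so coefficients of strongly connected nodes are pulled toward each other. The sketch below checks this identity numerically in plain Python. It illustrates only the generic, unnormalized Laplacian penalty. The paper's exact penalties, which may use a normalized Laplacian or sign-adjusted adjacency, are not reproduced here.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(adj: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def quad_form(mat: Matrix, beta: List[float]) -> float:
    """beta^T M beta."""
    n = len(beta)
    return sum(beta[i] * mat[i][j] * beta[j] for i in range(n) for j in range(n))


def pairwise_penalty(adj: Matrix, beta: List[float]) -> float:
    """Sum over pairs j < k of a_jk * (beta_j - beta_k)^2."""
    n = len(beta)
    return sum(adj[j][k] * (beta[j] - beta[k]) ** 2
               for j in range(n) for k in range(j + 1, n))
```

For the three-node adjacency `[[0, 1, 0.5], [1, 0, 2], [0.5, 2, 0]]` and coefficients `[1, -2, 3]`, both forms evaluate to 61. The largest contribution comes from the heavily weighted edge between nodes 1 and 2, whose coefficients differ most.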
TargetCompare: A web interface to compare simultaneous miRNAs targets

PubMed Central

Moreira, Fabiano Cordeiro; Dustan, Bruno; Hamoy, Igor G; Ribeiro-dos-Santos, André M; dos Santos, Ândrea Ribeiro

2014-01-01

MicroRNAs (miRNAs) are small non-coding RNA sequences, 17 to 25 nucleotides in length, that function primarily in the regulation of gene expression. A single miRNA can have thousands of predicted targets within a complex regulatory cell-signaling network, so it is of interest to study multiple target genes simultaneously. We therefore describe a web tool, developed using the Java programming language and a MySQL database server, to analyse multiple targets of pre-selected miRNAs. We cross-validated the tool on the eight most highly expressed miRNAs in the antrum region of the stomach. This identified 43 potential genes that are targets of at least six of these miRNAs. The tool aims to reduce randomness and increase the chance of selecting strong candidate target genes and miRNAs that play important roles in the studied tissue. Availability: http://lghm.ufpa.br/targetcompare PMID:25352731
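The core query described here is finding genes targeted by at least k of a set of selected miRNAs, which reduces to counting set memberships. A minimal sketch follows, assuming target predictions are already available as a mapping from each miRNA to its predicted target genes. The miRNA and gene names in the test are placeholders, and this is not TargetCompare's Java/MySQL implementation.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def shared_targets(targets: Dict[str, Set[str]],
                   selected: Iterable[str],
                   min_mirnas: int) -> List[str]:
    """Genes predicted as targets of at least `min_mirnas` of the selected miRNAs."""
    counts: Counter = Counter()
    for mirna in set(selected):  # ignore duplicate miRNA names
        counts.update(targets.get(mirna, set()))  # each gene counted once per miRNA
    return sorted(gene for gene, n in counts.items() if n >= min_mirnas)
```

In the study's setting, `selected` would be the eight most highly expressed antrum miRNAs and `min_mirnas` would be 6.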
TargetCompare: A web interface to compare simultaneous miRNAs targets.

PubMed

Moreira, Fabiano Cordeiro; Dustan, Bruno; Hamoy, Igor G; Ribeiro-Dos-Santos, André M; Dos Santos, Andrea Ribeiro

2014-01-01

MicroRNAs (miRNAs) are small non-coding RNA sequences, 17 to 25 nucleotides in length, that function primarily in the regulation of gene expression. A single miRNA can have thousands of predicted targets within a complex regulatory cell-signaling network, so it is of interest to study multiple target genes simultaneously. We therefore describe a web tool, developed using the Java programming language and a MySQL database server, to analyse multiple targets of pre-selected miRNAs. We cross-validated the tool on the eight most highly expressed miRNAs in the antrum region of the stomach. This identified 43 potential genes that are targets of at least six of these miRNAs. The tool aims to reduce randomness and increase the chance of selecting strong candidate target genes and miRNAs that play important roles in the studied tissue. http://lghm.ufpa.br/targetcompare.
Xenopus microRNA genes are predominantly located within introns and are differentially expressed in adult frog tissues via post-transcriptional regulation

PubMed Central

Tang, Guo-Qing; Maxwell, E. Stuart

2008-01-01

The amphibian Xenopus provides a model organism for investigating microRNA expression during vertebrate embryogenesis and development. By searching available Xenopus genome databases with known human pre-miRNAs as query sequences, we identified more than 300 genes encoding 142 Xenopus tropicalis miRNAs. Analysis of Xenopus tropicalis miRNA genes revealed that they are predominantly positioned within introns of protein-coding and non-protein-coding RNA Pol II-transcribed genes. MiRNA genes were also located in pre-mRNA exons and positioned intergenically between known protein-coding genes. Many miRNA species were found in multiple locations and in more than one genomic context. MiRNA genes were also clustered throughout the genome, indicating the potential for cotranscription and coordinate expression of miRNAs located in a given cluster. Northern blot analysis confirmed the expression of many identified miRNAs in both X. tropicalis and X. laevis. Comparison of X. tropicalis and X. laevis blots revealed comparable expression profiles, although several miRNAs exhibited species-specific expression in different tissues. More detailed analysis revealed that for some miRNAs, the tissue-specific expression profile of the pri-miRNA precursor was distinctly different from that of the mature miRNA. Differential miRNA precursor processing in both the nucleus and cytoplasm was implicated in the observed tissue-specific differences. These observations indicate that post-transcriptional processing plays an important role in regulating miRNA expression in the amphibian Xenopus. PMID:18032731
Multiple copies of a bile acid-inducible gene in Eubacterium sp. strain VPI 12708.

PubMed Central

Gopal-Srivastava, R; Mallonee, D H; White, W B; Hylemon, P B

1990-01-01

Eubacterium sp. strain VPI 12708 is an anaerobic intestinal bacterium that possesses inducible bile acid 7-dehydroxylation activity. Several new polypeptides are produced in this strain following induction with cholic acid. Genes coding for two copies of a bile acid-inducible 27,000-dalton polypeptide (baiA1 and baiA2) have previously been cloned and sequenced. We now report a gene coding for a third copy of this 27,000-dalton polypeptide (baiA3). The baiA3 gene was cloned in lambda DASH on an 11.2-kilobase DNA fragment from a partial Sau3A digest of Eubacterium DNA. DNA sequence analysis of the baiA3 gene revealed 100% identity with the baiA1 gene within the coding region of the 27,000-dalton polypeptides. The baiA2 gene shares 81% sequence identity with the other two genes at the nucleotide level. The flanking nucleotide sequences of the baiA1 and baiA3 genes are identical for 930 bases in the 5' direction from the initiation codon and for at least 325 bases in the 3' direction from the stop codon, including the putative promoter regions of the genes. An additional open reading frame (spanning 621 to 648 bases, depending on which start codon is the correct one) was found in the identical 5' regions of the baiA1 and baiA3 clones. Upstream of these 930 bases, the 5' sequences of the baiA1 and baiA3 genes were totally divergent. The baiA2 gene, which is part of a large bile acid-inducible operon, showed no homology with the other two genes in either the 5' or 3' direction from the polypeptide coding region, except for a 15-base-pair presumed ribosome-binding site in the 5' region. These studies strongly suggest that a gene duplication (baiA1 and baiA3) has occurred and is stably maintained in this bacterium. PMID:2376563
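Figures such as the 81% nucleotide identity between baiA2 and the other two copies are computed over an alignment. A minimal sketch follows, assuming the two sequences are already aligned to equal length with `-` marking gaps. It also assumes identity is taken over columns where neither sequence has a gap. Other conventions divide by the full alignment length or by the shorter sequence and give different values. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions over gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT-A", "ACGTTC")` skips the gapped column and finds 4 matches in 5 columns, i.e. 80%.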
Convergent evolution of marine mammals is associated with distinct substitutions in common genes

PubMed Central

Zhou, Xuming; Seim, Inge; Gladyshev, Vadim N.

2015-01-01

Phenotypic convergence is thought to be driven by parallel substitutions coupled with natural selection at the sequence level. Multiple independent evolutionary transitions of mammals to an aquatic environment offer an opportunity to test this hypothesis. Here, whole-genome alignment of coding sequences identified widespread parallel amino acid substitutions in marine mammals; however, most of these changes were not unique to these animals. Conversely, we report that candidate aquatic adaptation genes have very few parallel substitutions and exhibit distinct sequence changes in each group. These genes were identified by signatures of likelihood convergence and/or an elevated ratio of nonsynonymous to synonymous substitution rates. Moreover, no significant positive correlation was found between likelihood convergence and positive selection in any of the three marine lineages. These results suggest that convergence in protein-coding genes associated with an aquatic lifestyle is mainly characterized by independent substitutions and relaxed negative selection. PMID:26549748

Pre-Bilaterian Origins of the Hox Cluster and the Hox Code: Evidence from the Sea Anemone, Nematostella vectensis

PubMed Central

Ryan, Joseph F.; Mazza, Maureen E.; Pang, Kevin; Matus, David Q.; Baxevanis, Andreas D.; Martindale, Mark Q.; Finnerty, John R.

2007-01-01

Background: Hox genes were critical to many morphological innovations of bilaterian animals. However, early Hox evolution remains obscure. Phylogenetic, developmental, and genomic analyses of the cnidarian sea anemone Nematostella vectensis challenge recent claims that the Hox code is a bilaterian invention and that no “true” Hox genes exist in the phylum Cnidaria. Methodology/Principal Findings: Phylogenetic analyses of 18 Hox-related genes from Nematostella identify putative Hox1, Hox2, and Hox9+ genes. Statistical comparisons among competing hypotheses bolster these findings; these comparisons include explicit consideration of the gene losses implied by alternate topologies. In situ hybridization studies of 20 Hox-related genes reveal that multiple Hox genes are expressed in distinct regions along the primary body axis, supporting the existence of a pre-bilaterian Hox code. Additionally, several Hox genes are expressed in nested domains along the secondary body axis, suggesting a role in “dorsoventral” patterning. Conclusions/Significance: A cluster of anterior and posterior Hox genes, as well as a ParaHox cluster of genes, evolved prior to the cnidarian-bilaterian split. There is evidence to suggest that these clusters were formed through a series of tandem gene duplication events. These clusters also appear to have played a role in patterning both the primary and secondary body axes in a bilaterally symmetrical common ancestor. Cnidarians and bilaterians shared a common ancestor some 570 to 700 million years ago and are therefore derived from a common body plan. Our work reveals several conserved genetic components that are found in both of these diverse lineages. This finding is consistent with the hypothesis that a set of developmental rules established in the common ancestor of cnidarians and bilaterians is still at work today. PMID:17252055
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds

PubMed Central

Dean, Rebecca; Harrison, Peter W.; Wright, Alison E.; Zimmer, Fabian; Mank, Judith E.

2015-01-01

The elevated rate of evolution of genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures. We therefore tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. As in studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression. However, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females. This suggests that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression but, crucially, not for male gene expression. This suggests that much of the expression variation within the gonad has sex-specific effects. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, with important implications for the role of sex chromosomes in speciation and sexual selection. PMID:26067773
The transfer and transformation of collective network information in gene-matched networks.

PubMed

Kitsukawa, Takashi; Yagi, Takeshi

2015-10-09

Networks such as human society networks, social and professional networks, and biological system networks contain vast amounts of information. Information signals in networks are distributed over nodes and transmitted through intricately wired links, making the transfer and transformation of such information difficult to follow. Here we introduce a novel method for describing network information and its transfer using a model network, the Gene-matched network (GMN), in which nodes (neurons) possess attributes (genes). In the GMN, nodes are connected according to their expression of common genes. Because neurons express multiple genes, the GMN is cluster-rich. We show that, in the GMN, information transfer and transformation are controlled systematically according to the activity level of the network. Furthermore, information transfer and transformation can be traced numerically with a vector, the active-gene array, built from the genes expressed in the activated neurons. The active-gene array was used to assess the relative activity among overlapping neuronal groups. Interestingly, this coding style closely resembles the cell-assembly theory of neural coding. The method introduced here could be applied to many real-world networks, since many systems, including human society and various biological systems, can be represented as networks of this type.
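A minimal sketch of the two constructions described above follows, assuming each neuron is given as a set of expressed genes. Neurons are linked when they share at least one gene. The active-gene array is taken here, as an assumption, to be the per-gene count of active neurons expressing that gene. The original GMN's exact edge rule and vector definition may differ, for example in edge weighting. The neuron and gene labels are placeholders.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def gene_matched_edges(neurons: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Link every pair of neurons that express at least one common gene."""
    return [(a, b) for a, b in combinations(sorted(neurons), 2)
            if neurons[a] & neurons[b]]


def active_gene_array(neurons: Dict[str, Set[str]],
                      active: Iterable[str],
                      genes: List[str]) -> List[int]:
    """For each gene (in the given order), count the active neurons expressing it."""
    counts = Counter(g for n in set(active) for g in neurons[n])
    return [counts[g] for g in genes]
```

With neurons n1 = {A, B}, n2 = {B, C} and n3 = {D}, only n1 and n2 are linked. Activating n1 and n2 gives the array [1, 2, 1, 0] over genes A to D, so the shared gene B stands out as the signature of the overlapping group.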
Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity

PubMed Central

Swarnkar, Mohit Kumar; Vyas, Pratibha; Rahi, Praveen; Thakur, Rishu; Thakur, Namika; Singh, Anil Kumar

2015-01-01

The complete 6.45-Mb genome sequence of Pseudomonas trivialis strain IHBB745 (MTCC 5336), an efficient, stress-tolerant, broad-spectrum plant growth-promoting rhizobacterium, is reported here. The predicted gene-coding clusters include genes for phosphate solubilization, siderophore production, 1-aminocyclopropane-1-carboxylate (ACC) deaminase activity, indole-3-acetic acid (IAA) production, and stress response. PMID:26337878
Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene

PubMed Central

Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K

2008-01-01

Background: The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. Results: We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail orientation, to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons. They also revealed little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues that are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach, where sense GHRL-derived RNAs are highly expressed. Conclusion: GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis. PMID:18954468
Complex organisation and structure of the ghrelin antisense strand gene GHRLOS, a candidate non-coding RNA gene.

PubMed

Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K

2008-10-28

The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion, to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach, where sense GHRL-derived RNAs are highly expressed. GHRLOS RNA transcripts display several distinctive features of non-coding RNA (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. 
The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis.
Capturing the Biofuel Wellhead and Powerhouse: The Chloroplast and Mitochondrial Genomes of the Leguminous Feedstock Tree Pongamia pinnata

PubMed Central

Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.

2012-01-01

Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Capturing the biofuel wellhead and powerhouse: the chloroplast and mitochondrial genomes of the leguminous feedstock tree Pongamia pinnata.

PubMed

Kazakoff, Stephen H; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T; Gresshoff, Peter M

2012-01-01

Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® 'Second Generation DNA Sequencing (2GS)' and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites.
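The abstract reports average synonymous and non-synonymous divergence but does not say which estimator was used. As one illustration only, the sketch below implements the classic Nei-Gojobori (1986) counting method for uncorrected p-distances (pS, pN); it is not necessarily the authors' method. It assumes two aligned, gap-free coding sequences in uppercase DNA, uses the standard genetic code, applies no Jukes-Cantor correction, and averages codons that differ at several positions over all substitution orders that avoid intermediate stop codons.

```python
from itertools import permutations

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Sum over positions of the fraction of single-base changes that keep the amino acid."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences averaged over substitution
    orders; orders passing through an intermediate stop codon are skipped."""
    positions = [p for p in range(3) if c1[p] != c2[p]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*" and nxt != c2:
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = nxt
        if valid:
            syn_total += syn
            non_total += non
            paths += 1
    if paths == 0:  # every order hits a stop: count all as non-synonymous
        return 0.0, float(len(positions))
    return syn_total / paths, non_total / paths


def ng86_p_distances(seq1, seq2):
    """Return (pS, pN) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        sd, nd = codon_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    p_s = s_diff / s_sites if s_sites else 0.0
    p_n = n_diff / n_sites if n_sites else 0.0
    return p_s, p_n
```

For example, `ng86_p_distances("GCTGCT", "GCCGCT")` gives pS of about 0.5 and pN of 0, because the single difference (GCT to GCC, both alanine) is synonymous. Percentages such as those quoted above would come from averaging over many gene pairs.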
The metazoan Mediator co-activator complex as an integrative hub for transcriptional regulation.

PubMed

Malik, Sohail; Roeder, Robert G

2010-11-01

The Mediator is an evolutionarily conserved, multiprotein complex that is a key regulator of protein-coding genes. In metazoan cells, multiple pathways that are responsible for homeostasis, cell growth and differentiation converge on the Mediator through transcriptional activators and repressors that target one or more of the almost 30 subunits of this complex. Besides interacting directly with RNA polymerase II, Mediator has multiple functions and can interact with and coordinate the action of numerous other co-activators and co-repressors, including those acting at the level of chromatin. These interactions ultimately allow the Mediator to deliver outputs that range from maximal activation of genes to modulation of basal transcription to long-term epigenetic silencing.
dbWFA: a web-based database for functional annotation of Triticum aestivum transcripts

PubMed Central

Vincent, Jonathan; Dai, Zhanwu; Ravel, Catherine; Choulet, Frédéric; Mouzeyar, Said; Bouzidi, M. Fouad; Agier, Marie; Martre, Pierre

2013-01-01

The functional annotation of genes based on sequence homology with genes from model species genomes is time-consuming because it is necessary to mine several unrelated databases. The aim of the present work was to develop a functional annotation database for common wheat Triticum aestivum (L.). The database, named dbWFA, is based on the reference NCBI UniGene set, an expressed gene catalogue built by expressed sequence tag clustering, and on full-length coding sequences retrieved from the TriFLDB database. Information from good-quality heterogeneous sources, including annotations for model plant species Arabidopsis thaliana (L.) Heynh. and Oryza sativa L., was gathered and linked to T. aestivum sequences through BLAST-based homology searches. Even though the complexity of the transcriptome cannot yet be fully appreciated, we developed a tool to easily and promptly obtain information from multiple functional annotation systems (Gene Ontology, MapMan bin codes, MIPS Functional Categories, PlantCyc pathway reactions and TAIR gene families). The use of dbWFA is illustrated here with several query examples. We were able to assign a putative function to 45% of the UniGenes and 81% of the full-length coding sequences from TriFLDB. Moreover, comparison of the annotation of the whole T. aestivum UniGene set along with curated annotations of the two model species assessed the accuracy of the annotation provided by dbWFA. To further illustrate the use of dbWFA, genes specifically expressed during the early cell division or late storage polymer accumulation phases of T. aestivum grain development were identified using a clustering analysis and then annotated using dbWFA. The annotation of these two sets of genes was consistent with previous analyses of T. aestivum grain transcriptomes and proteomes. Database URL: urgi.versailles.inra.fr/dbWFA/ PMID:23660284
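dbWFA links T. aestivum sequences to model-species annotations through BLAST-based homology searches, but the abstract does not give its hit-selection rules. The sketch below shows one common pattern, offered as an assumption rather than dbWFA's procedure: keep the best hit per query by bit score after an e-value cut-off, read from BLAST+ tabular output (`-outfmt 6` with its twelve default columns), then copy the subject's annotation terms to the query. The 1e-5 threshold, the sequence IDs and the annotation dictionary are hypothetical.

```python
import csv
import io

# The twelve default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv, max_evalue=1e-5):
    """Map each query to its highest-bit-score subject passing the e-value cut-off."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        bits = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (hit["sseqid"], bits)
    return {query: subject for query, (subject, _) in best.items()}


def transfer_annotations(blast_tsv, annotations, max_evalue=1e-5):
    """Copy annotation terms (e.g. GO IDs) from each best hit to its query."""
    return {query: annotations.get(subject, [])
            for query, subject in best_hits(blast_tsv, max_evalue).items()}
```

Queries with no hit below the threshold simply receive no annotation, which is one reason only part of a sequence set (45% of UniGenes in dbWFA) ends up with a putative function.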
Elevated Rate of Fixation of Endogenous Retroviral Elements in Haplorhini TRIM5 and TRIM22 Genomic Sequences: Impact on Transcriptional Regulation

PubMed Central

Diehl, William E.; Johnson, Welkin E.; Hunter, Eric

2013-01-01

All genes in the TRIM6/TRIM34/TRIM5/TRIM22 locus are type I interferon inducible, with TRIM5 and TRIM22 possessing antiviral properties. Evolutionary studies involving the TRIM6/34/5/22 locus have predominantly focused on the coding sequence of the genes, finding that TRIM5 and TRIM22 have undergone high rates of both non-synonymous nucleotide replacements and in-frame insertions and deletions. We sought to understand if divergent evolutionary pressures on TRIM6/34/5/22 coding regions have selected for modifications in the non-coding regions of these genes and explore whether such non-coding changes may influence the biological function of these genes. The transcribed genomic regions, including the introns, of TRIM6, TRIM34, TRIM5, and TRIM22 from ten Haplorhini primates and one prosimian species were analyzed for transposable element content. In Haplorhini species, TRIM5 displayed an exaggerated interspecies variability, predominantly resulting from changes in the composition of transposable elements in the large first and fourth introns. Multiple lineage-specific endogenous retroviral long terminal repeats (LTRs) were identified in the first intron of TRIM5 and TRIM22. In the prosimian genome, we identified a duplication of TRIM5 with a concomitant loss of TRIM22. The transposable element content of the prosimian TRIM5 genes appears to largely represent the shared Haplorhini/prosimian ancestral state for this gene. Furthermore, we demonstrated that one such differentially fixed LTR provides for species-specific transcriptional regulation of TRIM22 in response to p53 activation. Our results identify a previously unrecognized source of species-specific variation in the antiviral TRIM genes, which can lead to alterations in their transcriptional regulation. These observations suggest that there has existed long-term pressure for exaptation of retroviral LTRs in the non-coding regions of these genes. 
This likely resulted from serial viral challenges and provided a mechanism for rapid alteration of transcriptional regulation. To our knowledge, this represents the first report of persistent evolutionary pressure for the capture of retroviral LTR insertions. PMID:23516500
Foxo3 activity promoted by non-coding effects of circular RNA and Foxo3 pseudogene in the inhibition of tumor growth and angiogenesis.

PubMed

Yang, W; Du, W W; Li, X; Yee, A J; Yang, B B

2016-07-28

It has recently been shown that the upregulation of a pseudogene specific to a protein-coding gene could function as a sponge to bind multiple potential targeting microRNAs (miRNAs), resulting in increased gene expression. Similarly, it was recently demonstrated that circular RNAs can function as sponges for miRNAs, and could upregulate expression of mRNAs containing an identical sequence. Furthermore, some mRNAs are now known to not only translate protein, but also function to sponge miRNA binding, facilitating gene expression. Collectively, these appear to be effective mechanisms to ensure gene expression and protein activity. Here we show that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P), and Foxo3 circular RNA, both of which bind to eight miRNAs. We found that the ectopic expression of the Foxo3P, Foxo3 circular RNA and Foxo3 mRNA could all suppress tumor growth and cancer cell proliferation and survival. Our results showed that at least three mechanisms are used to ensure protein translation of Foxo3, which reflects an essential role of Foxo3 and its corresponding non-coding RNAs.
Computational Tools and Algorithms for Designing Customized Synthetic Genes

PubMed Central

Gould, Nathan; Hendy, Oliver; Papamichail, Dimitris

2014-01-01

Advances in DNA synthesis have enabled the construction of artificial genes, gene circuits, and genomes of bacterial scale. Freedom in de novo design of synthetic constructs provides significant power in studying the impact of mutations in sequence features, and verifying hypotheses on the functional information that is encoded in nucleic and amino acids. To aid this goal, a large number of software tools of variable sophistication have been implemented, enabling the design of synthetic genes for sequence optimization based on rationally defined properties. The first generation of tools dealt predominantly with singular objectives such as codon usage optimization and unique restriction site incorporation. Recent years have seen the emergence of sequence design tools that aim to evolve sequences toward combinations of objectives. The design of optimal protein-coding sequences adhering to multiple objectives is computationally hard, and most tools rely on heuristics to sample the vast sequence design space. In this review, we study some of the algorithmic issues behind gene optimization and the approaches that different tools have adopted to redesign genes and optimize desired coding features. We utilize test cases to demonstrate the efficiency of each approach, as well as identify their strengths and limitations. PMID:25340050
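To make the two "first generation" objectives concrete, here is a minimal sketch that combines them, assuming a toy setting: each amino acid is back-translated with its most preferred codon, and the search backtracks to the next-best codon whenever a forbidden site (here the EcoRI site GAATTC) would appear. The preference lists are hypothetical; a real tool would derive them from an organism's codon usage table and would usually score many objectives at once with heuristics rather than this exhaustive backtracking.

```python
# Hypothetical codon preference lists (most preferred first) for a few amino
# acids; real tools derive the order from organism-specific codon usage.
PREFERRED = {
    "M": ["ATG"],
    "E": ["GAA", "GAG"],
    "F": ["TTC", "TTT"],
    "K": ["AAA", "AAG"],
    "*": ["TAA", "TGA", "TAG"],
}


def design(protein, preferred, forbidden_sites):
    """Back-translate `protein` using preferred codons, backtracking whenever
    a forbidden site (e.g. a restriction site) would be created.
    Returns None if no codon combination avoids every site."""
    longest = max((len(site) for site in forbidden_sites), default=1)

    def extend(i, dna):
        if i == len(protein):
            return dna
        for codon in preferred[protein[i]]:
            candidate = dna + codon
            # A newly created site must overlap the codon just added,
            # so only the last (longest + 2) bases need checking.
            window = candidate[-(longest + 2):]
            if any(site in window for site in forbidden_sites):
                continue
            result = extend(i + 1, candidate)
            if result is not None:
                return result
        return None  # no codon works at this position: backtrack

    return extend(0, "")
```

With these toy preferences, `design("MEF*", PREFERRED, ["GAATTC"])` rejects the preferred Phe codon TTC (it would complete GAATTC after GAA) and returns `ATGGAATTTTAA`. Recursion depth equals the protein length, so this is only suitable for short illustrative sequences.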
Genome-wide identification of microRNAs in pomegranate (Punica granatum L.) by high-throughput sequencing

USDA-ARS's Scientific Manuscript database

Background: MicroRNAs (miRNAs), a class of small non-coding endogenous RNAs that regulate gene expression post-transcriptionally, play multiple key roles in plant growth and development and in biotic and abiotic stress response. Knowledge and roles of miRNAs in pomegranate fruit development have not...
Complete Genome Sequence of the Rhizobacterium Pseudomonas trivialis Strain IHBB745 with Multiple Plant Growth-Promoting Activities and Tolerance to Desiccation and Alkalinity.

PubMed

Gulati, Arvind; Swarnkar, Mohit Kumar; Vyas, Pratibha; Rahi, Praveen; Thakur, Rishu; Thakur, Namika; Singh, Anil Kumar

2015-09-03

The complete genome sequence of 6.45 Mb is reported here for Pseudomonas trivialis strain IHBB745 (MTCC 5336), which is an efficient, stress-tolerant, and broad-spectrum plant growth-promoting rhizobacterium. The gene-coding clusters predicted the genes for phosphate solubilization, siderophore production, 1-aminocyclopropane-1-carboxylate (ACC) deaminase activity, indole-3-acetic acid (IAA) production, and stress response. Copyright © 2015 Gulati et al.
Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

PubMed

Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

2018-06-03

Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.
A universal genomic coordinate translator for comparative genomics

PubMed Central

2014-01-01

Background: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N² with the number of available genomes, N. Results: Here, we introduce Kraken, software that avoids the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes (human, mouse, and rat) and showed that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. 
Beyond protein-coding genes, Kraken also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species. Conclusions: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken. PMID:24976580
A universal genomic coordinate translator for comparative genomics.

PubMed

Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

2014-06-30

Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationship between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N² with the number of available genomes, N. Here, we introduce Kraken, software that avoids the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes (human, mouse, and rat) and showed that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. 
Beyond protein-coding genes, Kraken also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species. Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken.
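The scaling argument is simple arithmetic: aligning every genome against every other needs N(N-1)/2 pairwise alignments (66 for the 12 Drosophila genomes), whereas any connected graph of pairwise maps needs as few as N-1 (11). The sketch below illustrates the general idea of translating a coordinate through intermediate genomes by walking such a graph breadth-first. It is not Kraken's implementation: the abstract does not describe Kraken's data structures or how it re-computes orthology, and the block format (ungapped, same-strand `(start_in_a, start_in_b, length)` triples) and genome names are hypothetical simplifications.

```python
from collections import deque


def add_map(graph, a, b, blocks):
    """Store a hypothetical pairwise map a->b and its inverse b->a."""
    graph.setdefault(a, {})[b] = list(blocks)
    graph.setdefault(b, {})[a] = [(sb, sa, n) for sa, sb, n in blocks]


def map_position(blocks, pos):
    """Project one coordinate through one pairwise map (None if unaligned)."""
    for start_a, start_b, length in blocks:
        if start_a <= pos < start_a + length:
            return start_b + (pos - start_a)
    return None


def translate(graph, source, target, pos):
    """Breadth-first search over pairwise maps, carrying the coordinate along."""
    queue = deque([(source, pos)])
    seen = {source}
    while queue:
        genome, p = queue.popleft()
        if genome == target:
            return p
        for neighbor, blocks in graph.get(genome, {}).items():
            if neighbor not in seen:
                q = map_position(blocks, p)
                if q is not None:
                    seen.add(neighbor)
                    queue.append((neighbor, q))
    return None


def all_to_all_pairs(n):
    """Number of alignments needed to align every genome against every other."""
    return n * (n - 1) // 2
```

For example, with maps human-chimp `[(0, 1000, 500)]` and chimp-gorilla `[(1000, 20, 300)]`, human position 150 maps to chimp 1150 and then to gorilla 170, even though no human-gorilla alignment was ever computed; a position outside the aligned blocks returns `None`.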
High-efficiency CRISPR/Cas9 multiplex gene editing using the glycine tRNA-processing system-based strategy in maize.

PubMed

Qi, Weiwei; Zhu, Tong; Tian, Zhongrui; Li, Chaobin; Zhang, Wei; Song, Rentao

2016-08-11

CRISPR/Cas9 genome editing strategy has been applied to a variety of species and the tRNA-processing system has been used to compact multiple gRNAs into one synthetic gene for manipulating multiple genes in rice. We optimized and introduced the multiplex gene editing strategy based on the tRNA-processing system into maize. Maize glycine-tRNA was selected to design multiple tRNA-gRNA units for the simultaneous production of numerous gRNAs under the control of one maize U6 promoter. We designed three gRNAs for simplex editing and three multiple tRNA-gRNA units for multiplex editing. The results indicate that this system not only increased the number of targeted sites but also enhanced mutagenesis efficiency in maize. Additionally, we propose an advanced sequence selection of gRNA spacers for relatively more efficient and accurate chromosomal fragment deletion, which is important for complete abolishment of gene function especially long non-coding RNAs (lncRNAs). Our results also indicated that up to four tRNA-gRNA units in one expression cassette design can still work in maize. The examples reported here demonstrate the utility of the tRNA-processing system-based strategy as an efficient multiplex genome editing tool to enhance maize genetic research and breeding.
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.

PubMed

Hunt, Karen A; Mistry, Vanisha; Bockett, Nicholas A; Ahmad, Tariq; Ban, Maria; Barker, Jonathan N; Barrett, Jeffrey C; Blackburn, Hannah; Brand, Oliver; Burren, Oliver; Capon, Francesca; Compston, Alastair; Gough, Stephen C L; Jostins, Luke; Kong, Yong; Lee, James C; Lek, Monkol; MacArthur, Daniel G; Mansfield, John C; Mathew, Christopher G; Mein, Charles A; Mirza, Muddassar; Nutland, Sarah; Onengut-Gumuscu, Suna; Papouli, Efterpi; Parkes, Miles; Rich, Stephen S; Sawcer, Steven; Satsangi, Jack; Simmonds, Matthew J; Trembath, Richard C; Walker, Neil M; Wozniak, Eva; Todd, John A; Simpson, Michael A; Plagnol, Vincent; van Heel, David A

2013-06-13

Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.

Genome sequence of an enhancin gene-rich nucleopolyhedrovirus (NPV) from Agrotis segetum: collinearity with Spodoptera exigua multiple NPV.

PubMed

Jakubowska, Agata K; Peters, Sander A; Ziemnicka, Jadwiga; Vlak, Just M; van Oers, Monique M

2006-03-01

The genome sequence of a Polish isolate of Agrotis segetum nucleopolyhedrovirus (AgseNPV-A) was determined and analysed. The circular genome is composed of 147,544 bp and has a G+C content of 45.7 mol%. It contains 153 putative, non-overlapping open reading frames (ORFs) encoding predicted proteins of more than 50 aa, together making up 89.8 % of the genome. The remaining 10.2 % of the DNA constitutes non-coding regions and homologous-repeat regions. One hundred and forty-three AgseNPV-A ORFs are homologues of previously reported baculovirus gene sequences. There are ten unique ORFs and they account for 3 % of the genome in total. All 62 lepidopteran baculovirus genes, including the 29 core baculovirus genes, were found in the AgseNPV-A genome. The gene content and gene order of AgseNPV-A are most similar to those of Spodoptera exigua (Se) multiple NPV and their shared homologous genes are 100 % collinear. Three putative enhancin genes were identified in the AgseNPV-A genome. In phylogenetic analysis, the AgseNPV-A enhancins form a cluster separated from enhancins of the Mamestra species NPVs.
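The ORF criterion used here (open reading frames encoding predicted proteins of more than 50 aa) can be illustrated with a minimal six-frame scan; the abstract does not name the ORF-calling tool, so this is a sketch under stated assumptions rather than the authors' pipeline. It assumes uppercase DNA, the standard stop codons TAA/TAG/TGA, and the first ATG in each frame as the start (so nested ORFs are not reported separately); it ignores wrap-around across the origin of the circular genome, and minus-strand coordinates refer to the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return (strand, start, end, aa_len) for ATG..stop ORFs longer than
    min_aa codons (stop excluded). Coordinates are 0-based, half-open, on
    the forward sequence ("+") or its reverse complement ("-")."""
    orfs = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    aa_len = (i - start) // 3
                    if aa_len > min_aa:
                        orfs.append((strand, start, i + 3, aa_len))
                    start = None
    return orfs
```

For a toy sequence `"ATG" + "GCT" * 60 + "TAA"`, the scan reports one forward-strand ORF of 61 codons spanning positions 0 to 186; a stricter `min_aa=61` reports none.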
BRD4 assists elongation of both coding and enhancer RNAs guided by histone acetylation

PubMed Central

Kanno, Tomohiko; Kanno, Yuka; LeRoy, Gary; Campos, Eric; Sun, Hong-Wei; Brooks, Stephen R; Vahedi, Golnaz; Heightman, Tom D; Garcia, Benjamin A; Reinberg, Danny; Siebenlist, Ulrich; O’Shea, John J; Ozato, Keiko

2016-01-01

Small-molecule BET inhibitors interfere with the epigenetic interactions between acetylated histones and the bromodomains of the BET family proteins, including BRD4, and they potently inhibit growth of malignant cells by targeting cancer-promoting genes. BRD4 interacts with the pause-release factor P-TEFb, and has been proposed to release Pol II from promoter-proximal pausing. We show that BRD4 occupied widespread genomic regions in mouse cells, and directly stimulated elongation of both protein-coding transcripts and non-coding enhancer RNAs (eRNAs), dependent on the function of bromodomains. BRD4 interacted physically with elongating Pol II complexes, and assisted Pol II progression through hyper-acetylated nucleosomes by interacting with acetylated histones via bromodomains. On active enhancers, the BET inhibitor JQ1 antagonized BRD4-associated eRNA synthesis. Thus, BRD4 is involved in multiple steps of the transcription hierarchy, primarily by assisting transcript elongation both at enhancers and on gene bodies. PMID:25383670
Identification of novel non-coding RNA-based negative feedback regulating the expression of the oncogenic transcription factor GLI1.

PubMed

Villegas, Victoria E; Rahman, Mohammed Ferdous-Ur; Fernandez-Barrena, Maite G; Diao, Yumei; Liapi, Eleni; Sonkoly, Enikö; Ståhle, Mona; Pivarcsi, Andor; Annaratone, Laura; Sapino, Anna; Ramírez Clavijo, Sandra; Bürglin, Thomas R; Shimokawa, Takashi; Ramachandran, Saraswathi; Kapranov, Philipp; Fernandez-Zapico, Martin E; Zaphiropoulos, Peter G

2014-07-01

Non-coding RNAs are a complex class of nucleic acids, with growing evidence supporting regulatory roles in gene expression. Here we identify a non-coding RNA located head-to-head with the gene encoding the Glioma-associated oncogene 1 (GLI1), a transcriptional effector of multiple cancer-associated signaling pathways. The expression of this three-exon GLI1 antisense (GLI1AS) RNA in cancer cells was concordant with GLI1 levels. Knockdown of GLI1AS with siRNAs up-regulated GLI1 and increased cellular proliferation and tumor growth in a xenograft model system. Conversely, GLI1AS overexpression decreased the levels of GLI1, its target genes PTCH1 and PTCH2, and cellular proliferation. Additionally, we demonstrate that GLI1 knockdown reduced GLI1AS, while GLI1 overexpression increased GLI1AS, supporting the role of GLI1AS as a target gene of the GLI1 transcription factor. Activation of TGFβ and Hedgehog signaling, two known regulators of GLI1 expression, conferred a concordant up-regulation of GLI1 and GLI1AS in cancer cells. Finally, analysis of the mechanism underlying the interplay between GLI1 and GLI1AS indicates that the non-coding RNA elicits a local alteration of chromatin structure by increasing the silencing mark H3K27me3 and decreasing the recruitment of RNA polymerase II to this locus. Taken together, the data demonstrate the existence of a novel non-coding RNA-based negative feedback loop controlling GLI1 levels, thus expanding the repertoire of mechanisms regulating the expression of this oncogenic transcription factor. Copyright © 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.
Informational structure of genetic sequences and nature of gene splicing

NASA Astrophysics Data System (ADS)

Trifonov, E. N.

1991-10-01

Only about 1/20 of the DNA of higher organisms codes for proteins by means of the classical triplet code. The rest of the DNA sequence is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various biomolecules while having, respectively, three different appearances: DNA, RNA and protein. The molecular structures of these, and hence their sequence-specific preferences, are substantially different. The original DNA sequence therefore has to carry three types of sequence patterns (codes, messages) simultaneously. It is thus a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate to various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. Sequences undergo various changes on their ``ontogenetic'' way from DNA to RNA to protein, such as RNA editing and splicing or protein post-translational modifications, and their major role appears to be resolving such conflicts. New data are presented strongly indicating that gene splicing is such a device, resolving the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Accumulation of multiple mutations in linezolid-resistant Staphylococcus epidermidis causing bloodstream infections; in silico analysis of L3 amino acid substitutions that might confer high-level linezolid resistance.

PubMed

Ikonomidis, Alexandros; Grapsa, Anastasia; Pavlioglou, Charikleia; Demiri, Antonia; Batarli, Alexandra; Panopoulou, Maria

2016-12-01

Fifty-six Staphylococcus epidermidis clinical isolates, showing high-level linezolid resistance and causing bacteremia in critically ill patients, were studied. All isolates belonged to the ST22 clone. They carried the T2504A and C2534T mutations in the gene coding for 23S rRNA. They also carried the C189A, G208A, C209T and G384C missense mutations in the gene coding for ribosomal protein L3, which resulted in the Asp159Tyr, Gly152Asp and Leu94Val substitutions. Additional silent mutations were detected in the genes coding for ribosomal proteins L3 and L22. In silico analysis of the missense mutations showed that although the L3 protein retained the sequence of its secondary motifs, its tertiary structure was affected. The observed alteration in L3 protein folding provides an indication of the putative role of L3-coding gene mutations in high-level linezolid resistance. Furthermore, linezolid pressure in health care settings where linezolid consumption is high might lead to the selection of resistant mutants possessing L3 mutations that might confer high-level linezolid resistance.
Shared regulatory sites are abundant in the human genome and shed light on genome evolution and disease pleiotropy.

PubMed

Tong, Pin; Monahan, Jack; Prendergast, James G D

2017-03-01

Large-scale gene expression datasets are providing an increasing understanding of the location of cis-eQTLs in the human genome and their role in disease. However, little is currently known about the extent of regulatory site-sharing between genes. This gap matters because site-sharing has potentially wide-ranging implications, from the way in which genetic variants may shape multiple phenotypes to the evolution of human gene order. By first identifying the location of non-redundant cis-eQTLs, we show that regulatory site-sharing is a relatively common phenomenon in the human genome, with over 10% of non-redundant regulatory variants linked to the expression of multiple nearby genes. We show that these shared, local regulatory sites are linked to high levels of chromatin looping between the regulatory sites and their associated genes. In addition, these co-regulated gene modules are strongly conserved across mammalian species, suggesting that shared regulatory sites have played an important role in shaping human gene order. Because these shared cis-eQTLs are associated with multiple genes, they also appear to be unusually important in understanding the genetics of human phenotypes and pleiotropy: shared regulatory sites are more often linked to multiple human phenotypes than other regulatory variants. This study shows that regulatory site-sharing is likely an underappreciated aspect of gene regulation. It has important implications for understanding various biological phenomena, including how the two- and three-dimensional structures of the genome have been shaped and the potential causes of disease pleiotropy outside coding regions.
Pyviko: an automated Python tool to design gene knockouts in complex viruses with overlapping genes.

PubMed

Taylor, Louis J; Strebel, Klaus

2017-01-07

Gene knockouts are a common tool used to study gene function in various organisms. However, designing gene knockouts is complicated in viruses, which frequently contain sequences that code for multiple overlapping genes. Designing mutants that can be traced by the creation of new or elimination of existing restriction sites further compounds the difficulty in experimental design of knockouts of overlapping genes. While software is available to rapidly identify restriction sites in a given nucleotide sequence, no existing software addresses experimental design of mutations involving multiple overlapping amino acid sequences in generating gene knockouts. Pyviko performed well on a test set of over 240,000 gene pairs collected from viral genomes deposited in the National Center for Biotechnology Information Nucleotide database, identifying a point mutation which added a premature stop codon within the first 20 codons of the target gene in 93.2% of all tested gene-overprinted gene pairs. This shows that Pyviko can be used successfully in a wide variety of contexts to facilitate the molecular cloning and study of viral overprinted genes. Pyviko is an extensible and intuitive Python tool for designing knockouts of overlapping genes. Freely available as both a Python package and a web-based interface ( http://louiejtaylor.github.io/pyViKO/ ), Pyviko simplifies the experimental design of gene knockouts in complex viruses with overlapping genes.
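The core design problem described above is to find a single point mutation that puts a premature stop codon early in the target gene while leaving the protein encoded by the overlapping frame unchanged. The sketch below is a minimal illustration of that search. It is not Pyviko's implementation or API.

Several parts are assumptions made only for this example:
- the function name `find_stop_mutations` and its parameters;
- treating the overlapping frame as running from `overlap_start` to the end of the sequence;
- using the standard genetic code;
- accepting only plain ACGT input.

The restriction-site tracking that Pyviko also performs is omitted.

```python
"""Hypothetical sketch of overlapping-gene knockout design (not Pyviko's code)."""

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_stop_mutations(seq, target_start, overlap_start, max_codons=20):
    """Return (position, ref, alt, codon_index) tuples for single-base
    substitutions that create a stop codon in one of the first `max_codons`
    codons of the target frame, while keeping the amino acid encoded by the
    overlapping frame unchanged (a synonymous change in that frame).

    Assumes ACGT-only input and an overlapping frame that starts at
    `overlap_start` and runs to the end of `seq`; positions upstream of
    `overlap_start` are treated as unconstrained.
    """
    seq = seq.upper()
    hits = []
    for codon_index in range(max_codons):
        start = target_start + 3 * codon_index
        if start + 3 > len(seq):
            break  # ran off the end of the sequence
        if CODON_TABLE[seq[start:start + 3]] == "*":
            continue  # already a stop codon; nothing to design here
        for pos in range(start, start + 3):
            for alt in BASES:
                if alt == seq[pos]:
                    continue
                mutant = seq[:pos] + alt + seq[pos + 1:]
                if CODON_TABLE[mutant[start:start + 3]] != "*":
                    continue  # target codon does not become a stop
                if pos >= overlap_start:
                    # Start of the overlapping-frame codon containing `pos`.
                    o = overlap_start + (pos - overlap_start) // 3 * 3
                    if o + 3 > len(seq):
                        continue  # incomplete overlapping codon: skip conservatively
                    if CODON_TABLE[seq[o:o + 3]] != CODON_TABLE[mutant[o:o + 3]]:
                        continue  # would alter the overlapping protein
                hits.append((pos, seq[pos], alt, codon_index))
    return hits
```

For example, `find_stop_mutations("AGGGAACCC", target_start=0, overlap_start=1)` returns `[(3, 'G', 'T', 1)]`. The G→T change turns the target codon GAA (Glu) into TAA, while the overlapping codon GGG becomes GGT and still encodes Gly. Only a change at the third position of an overlapping codon can usually be synonymous, which is why a frame offset of one nucleotide limits the stop codons reachable this way.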
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.

PubMed

Craig, Catherine L; Riekel, Christian

2002-12-01

The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks are made up of hierarchically organized, repetitive arrays of amino acids. These include Flag, the silk from the flagelliform gland in spiders, and Fhc, the heavy-chain fibroin silks produced by Lepidoptera larvae. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differ. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated into the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
Prdm9 controls activation of mammalian recombination hotspots.

PubMed

Parvanov, Emil D; Petkov, Petko M; Paigen, Kenneth

2010-02-12

Mammalian meiotic recombination, which preferentially occurs at specialized sites called hotspots, ensures the orderly segregation of meiotic chromosomes and creates genetic variation among offspring. A locus on mouse chromosome 17, which controls activation of recombination at multiple distant hotspots, has been mapped within a 181-kilobase interval, three of whose genes can be eliminated as candidates. The remaining gene, Prdm9, codes for a zinc finger containing histone H3K4 trimethylase that is expressed in early meiosis and whose deficiency results in sterility in both sexes. Mus musculus exhibits five alleles of Prdm9; human populations exhibit two predominant alleles and multiple minor alleles. The identification of Prdm9 as a protein regulating mammalian recombination hotspots initiates molecular studies of this important biological control system.
CHIR99021 promotes self-renewal of mouse embryonic stem cells by modulation of protein-encoding gene and long intergenic non-coding RNA expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Yongyan (Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi); Ai, Zhiying

2013-10-15

Embryonic stem cells (ESCs) can proliferate indefinitely in vitro and differentiate into cells of all three germ layers. These unique properties make them exceptionally valuable for drug discovery and regenerative medicine. However, the practical application of ESCs is limited because they are difficult to derive and culture. It has been demonstrated that CHIR99021 (CHIR) promotes self-renewal and enhances the derivation efficiency of mouse (m)ESCs. However, the downstream targets of CHIR are not fully understood. In this study, we identified CHIR-regulated genes in mESCs using microarray analysis. Our microarray data demonstrated that CHIR not only influenced the Wnt/β-catenin pathway by stabilizing β-catenin, but also modulated several other pluripotency-related signaling pathways such as the TGF-β, Notch and MAPK pathways. More detailed analysis demonstrated that CHIR inhibited Nodal signaling while activating bone morphogenetic protein signaling in mESCs. In addition, we found that pluripotency-maintaining transcription factors were up-regulated by CHIR, while several development-related genes were down-regulated. Furthermore, we found that CHIR altered the expression of epigenetic regulatory genes and long intergenic non-coding RNAs. Quantitative real-time PCR results were consistent with the microarray data. Together, these results suggest that CHIR alters the expression pattern of protein-encoding genes (especially transcription factors), epigenetic regulatory genes and non-coding RNAs to establish a relatively stable pluripotency-maintaining network.

Highlights:
• Combined use of CHIR with LIF promotes self-renewal of J1 mESCs.
• CHIR-regulated genes are involved in multiple pathways.
• CHIR inhibits Nodal signaling and promotes Bmp4 expression to activate BMP signaling.
• Expression of epigenetic regulatory genes and lincRNAs is altered by CHIR.
Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE PAGES

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

2015-05-12

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes.
However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.
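Fitness profiling with BarSeq works by counting how each mutant's barcode abundance changes during growth under a condition. The sketch below illustrates that idea with a simplified gene-fitness score. It is not the authors' published fitness calculation.

The following choices are all illustrative assumptions:
- the function name `gene_fitness`;
- pooling barcode counts across all mutants of a gene;
- using the log2 ratio of a gene's relative abundance after versus before selection;
- the pseudocount;
- centring the scores on the median gene.

```python
"""Simplified, hypothetical gene-fitness score from barcode counts (not the RB-TnSeq pipeline)."""
import math
from collections import Counter, defaultdict
from statistics import median


def gene_fitness(start_reads, end_reads, barcode_to_gene, pseudocount=1.0):
    """Return {gene: fitness} from barcode reads sampled before (`start_reads`)
    and after (`end_reads`) growth under some condition.

    Fitness here is log2(relative abundance after / relative abundance before),
    pooled over every barcoded mutant in the gene, then shifted so the median
    gene scores 0. Negative values mean the gene's mutants were depleted.
    """
    start, end = Counter(start_reads), Counter(end_reads)
    total_start, total_end = sum(start.values()), sum(end.values())

    # Pool barcode counts per gene (each barcode marks one transposon mutant).
    per_gene_start = defaultdict(float)
    per_gene_end = defaultdict(float)
    for barcode, gene in barcode_to_gene.items():
        per_gene_start[gene] += start[barcode]
        per_gene_end[gene] += end[barcode]

    raw = {}
    for gene in per_gene_start:
        frac_start = (per_gene_start[gene] + pseudocount) / total_start
        frac_end = (per_gene_end[gene] + pseudocount) / total_end
        raw[gene] = math.log2(frac_end / frac_start)

    centre = median(raw.values())  # assume the typical gene is neutral
    return {gene: value - centre for gene, value in raw.items()}
```

A usage example with hypothetical barcodes: mutants of `geneC` drop from 10 reads to 1, while the others stay at 10. In that case `geneA` and `geneB` score about 0 and `geneC` scores log2(2/11), roughly -2.46, which marks it as important under that condition. Centring on the median is one simple way to absorb differences in sequencing depth and overall growth between samples.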
PanCoreGen - Profiling, detecting, annotating protein-coding genes in microbial genomes.

PubMed

Paul, Sandip; Bhardwaj, Archana; Bag, Sumit K; Sokurenko, Evgeni V; Chattopadhyay, Sujay

2015-12-01

A large amount of genomic data, especially from multiple isolates of a single species, has opened new vistas for microbial genomics analysis. Analyzing the pan-genome (i.e. the sum of the genetic repertoire) of microbial species is crucial for understanding the dynamics of molecular evolution, where virulence evolution is of major interest. Here we present PanCoreGen, a standalone application for pan- and core-genomic profiling of microbial protein-coding genes. PanCoreGen overcomes key limitations of existing pan-genomic analysis tools and develops an integrated annotation structure for a species-specific pan-genomic profile. It provides important new features for annotating draft genomes/contigs and detecting unidentified genes in annotated genomes. It also generates user-defined group-specific datasets within the pan-genome. Interestingly, in an example set of Salmonella genomes, we detect potential footprints of adaptive convergence of horizontally transferred genes in two human-restricted pathogenic serovars, Typhi and Paratyphi A. Overall, PanCoreGen represents a state-of-the-art tool for microbial phylogenomics and pathogenomics studies. Copyright © 2015 Elsevier Inc. All rights reserved.
Roles of long non-coding RNAs in gastric cancer metastasis

PubMed Central

Yang, Zi-Guo; Gao, Ling; Guo, Xiao-Bo; Shi, Yu-Long

2015-01-01

Gastric cancer is the second leading cause of cancer-related deaths. Metastasis, an important element of gastric cancer, leads to a high mortality rate and a poor prognosis. Gastric cancer metastasis has a complex progression that involves multiple biological processes. The overall mechanisms of metastasis remain unclear, although traditional regulation is known to modulate the molecular functions associated with metastasis. Long non-coding RNAs (lncRNAs) have a role in different gene regulatory pathways through epigenetic modification and through transcriptional and post-transcriptional regulation. lncRNAs participate in various diseases, including Alzheimer’s disease, cardiovascular disease, and cancer. Altered expression of certain lncRNAs is linked to gastric cancer metastasis and invasion, as it is for tumor suppressor genes and oncogenes. Studies have partly elucidated the roles of lncRNAs as biomarkers and in therapies, as well as their gene regulatory mechanisms. However, comprehensive knowledge of the functional mechanisms of gene regulation in metastatic gastric cancer remains scarce. To provide a theoretical basis for therapeutic intervention in metastatic gastric cancer, we reviewed the functions of lncRNAs and their regulatory roles in gastric cancer metastasis. PMID:25954095
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.

PubMed

Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun

2017-03-27

A long non-coding RNA overlapping a protein-coding gene (an lncRNA-coding pair) is a special type of overlapping gene. Protein-coding overlapping genes have been well studied, and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in the human genome, we showed two things. First, lncRNA-coding pairs were more likely to be generated by overprinting. Second, genes in lncRNA-coding pairs were given higher priority for retention than non-overlapping genes. In addition, the preference for overlapping configurations preserved during evolution depended on the origin of the lncRNA-coding pairs. Further investigation showed that lncRNAs promoting the splicing of their embedded protein-coding partners was a unilateral interaction. By contrast, the improvement of gene expression by overlapping partners was bidirectional, and this effect decreased as the evolutionary age of the genes increased. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation. This correlation was associated with their overlapping configurations, local genomic environment and the evolutionary age of the genes. Comparing the expression correlation of lncRNA-coding pairs between normal and cancer samples suggested that lineage-specific pairs including old protein-coding genes may play an important role in tumorigenesis. This work presents a systematic and comprehensive understanding of the evolution and expression pattern of human lncRNA-coding pairs.
Proglucagons in vertebrates: Expression and processing of multiple genes in a bony fish.

PubMed

Busby, Ellen R; Mommsen, Thomas P

2016-09-01

In contrast to mammals, where a single proglucagon (PG) gene encodes three peptides (glucagon, glucagon-like peptide 1 and glucagon-like peptide 2; GLP-1 and GLP-2), many non-mammalian vertebrates carry multiple PG genes. Here, we investigate proglucagon mRNA sequences, their tissue expression and their processing in a diploid bony fish. Copper rockfish (Sebastes caurinus) express two independent genes coding for distinct proglucagon sequences (PG I, PG II), with PG II lacking the GLP-2 sequence. These genes are differentially transcribed in the endocrine pancreas, the brain, and the gastrointestinal tract. Alternative splicing identified in rockfish is only one part of this complex regulation of the PG transcripts: the system has the potential to produce two glucagons, four GLP-1s and a single GLP-2, or any combination of these peptides. Mass spectrometric analysis of partially purified PG-derived peptides in the endocrine pancreas confirms translation of both PG transcripts and differential processing of the resulting peptides. The complex differential regulation of the two PG genes and their continued presence in this extant teleost fish strongly suggest unique and, as yet, largely unidentified roles for the peptide products encoded in each gene. Copyright © 2016 Elsevier Inc. All rights reserved.
The landscape of cancer genes and mutational processes in breast cancer

PubMed Central

Stephens, Philip J.; Tarpey, Patrick S.; Davies, Helen; Loo, Peter Van; Greenman, Chris; Wedge, David C.; Nik-Zainal, Serena; Martin, Sancha; Varela, Ignacio; Bignell, Graham R.; Yates, Lucy R.; Papaemmanuil, Elli; Beare, David; Butler, Adam; Cheverton, Angela; Gamble, John; Hinton, Jonathan; Jia, Mingming; Jayakumar, Alagu; Jones, David; Latimer, Calli; Lau, King Wai; McLaren, Stuart; McBride, David J.; Menzies, Andrew; Mudie, Laura; Raine, Keiran; Rad, Roland; Chapman, Michael Spencer; Teague, Jon; Easton, Douglas; Langerød, Anita; OSBREAC; Lee, Ming Ta Michael; Shen, Chen-Yang; Tee, Benita Tan Kiat; Huimin, Bernice Wong; Broeks, Annegien; Vargas, Ana Cristina; Turashvili, Gulisa; Martens, John; Fatima, Aquila; Miron, Penelope; Chin, Suet-Feung; Thomas, Gilles; Boyault, Sandrine; Mariani, Odette; Lakhani, Sunil R.; van de Vijver, Marc; van ’t Veer, Laura; Foekens, John; Desmedt, Christine; Sotiriou, Christos; Tutt, Andrew; Caldas, Carlos; Reis-Filho, Jorge S.; Aparicio, Samuel A. J. R.; Salomon, Anne Vincent; Børresen-Dale, Anne-Lise; Richardson, Andrea L.; Campbell, Peter J.; Futreal, P. Andrew; Stratton, Michael R.

2012-01-01

All cancers carry somatic mutations in their genomes. A subset, known as driver mutations, confer clonal selective advantage on cancer cells and are causally implicated in oncogenesis, and the remainder are passenger mutations. The driver mutations and mutational processes operative in breast cancer have not yet been comprehensively explored. Here we examine the genomes of 100 tumours for somatic copy number changes and mutations in the coding exons of protein-coding genes. The number of somatic mutations varied markedly between individual tumours. We found strong correlations between mutation number, age at which cancer was diagnosed and cancer histological grade, and observed multiple mutational signatures, including one present in about ten per cent of tumours characterized by numerous mutations of cytosine at TpC dinucleotides. Driver mutations were identified in several new cancer genes including AKT2, ARID1B, CASP8, CDKN1B, MAP3K1, MAP3K13, NCOR1, SMARCD1 and TBX3. Among the 100 tumours, we found driver mutations in at least 40 cancer genes and 73 different combinations of mutated cancer genes. The results highlight the substantial genetic diversity underlying this common disease. PMID:22722201
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

PubMed

Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

2014-01-01

Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
Genome-wide Analysis of Drosophila Circular RNAs Reveals Their Structural and Sequence Properties and Age-Dependent Neural Accumulation

DOE PAGES

Westholm, Jakub O.; Miura, Pedro; Olson, Sara; ...

2014-11-26

Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues, and cultured cells, to rigorously annotate >2,500 fruit fly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and the circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1,000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase substantially relative to linear isoforms during CNS aging and constitute an aging biomarker.
Expression of short hairpin RNAs using the compact architecture of retroviral microRNA genes.

PubMed

Burke, James M; Kincaid, Rodney P; Aloisio, Francesca; Welch, Nicole; Sullivan, Christopher S

2017-09-29

Short hairpin RNAs (shRNAs) are effective in generating stable repression of gene expression. RNA polymerase III (RNAP III) type III promoters (U6 or H1) are typically used to drive shRNA expression. While useful for some knockdown applications, the robust expression of U6/H1-driven shRNAs can induce toxicity and generate heterogeneous small RNAs with undesirable off-target effects. Additionally, typical U6/H1 promoters take up most of the ∼270 base pairs (bp) of vector space required for shRNA expression. This can limit the efficacy and/or number of delivery vector options, particularly when delivery of multiple gene/shRNA combinations is required. Here, we develop a compact shRNA (cshRNA) expression system based on retroviral microRNA (miRNA) gene architecture that uses RNAP III type II promoters. We demonstrate that cshRNAs coded from as little as 100 bp of total coding space can precisely generate small interfering RNAs (siRNAs) that are active in the RNA-induced silencing complex (RISC). We provide an algorithm with a user-friendly interface to design cshRNAs for desired target genes. This cshRNA expression system reduces the coding space required for shRNA expression by >2-fold compared with the typical U6/H1 promoters. This may facilitate therapeutic RNAi applications where delivery vector space is limiting. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Diversity in copy number and structure of a silkworm morphogenetic gene as a result of domestication.

PubMed

Sakudoh, Takashi; Nakashima, Takeharu; Kuroki, Yoko; Fujiyama, Asao; Kohara, Yuji; Honda, Naoko; Fujimoto, Hirofumi; Shimada, Toru; Nakagaki, Masao; Banno, Yutaka; Tsuchida, Kozo

2011-03-01

The carotenoid-binding protein (CBP) of the domesticated silkworm, Bombyx mori, a major determinant of cocoon color, is likely to have been substantially influenced by domestication of this species. We analyzed the structure of the CBP gene in multiple strains of B. mori, in multiple individuals of the wild silkworm, B. mandarina (the putative wild ancestor of B. mori), and in a number of other lepidopterans. We found the CBP gene copy number in genomic DNA to vary widely among B. mori strains, ranging from 1 to 20. The copies of CBP are of several types, based on the presence of a retrotransposon or partial deletion of the coding sequence. In contrast to B. mori, B. mandarina was found to possess a single copy of CBP without the retrotransposon insertion, regardless of habitat. Several other lepidopterans were found to contain sequences homologous to CBP, revealing that this gene is evolutionarily conserved in the lepidopteran lineage. Thus, domestication can generate significant diversity of gene copy number and structure over a relatively short evolutionary time. © 2011 by the Genetics Society of America
Diversity in Copy Number and Structure of a Silkworm Morphogenetic Gene as a Result of Domestication

PubMed Central

Sakudoh, Takashi; Nakashima, Takeharu; Kuroki, Yoko; Fujiyama, Asao; Kohara, Yuji; Honda, Naoko; Fujimoto, Hirofumi; Shimada, Toru; Nakagaki, Masao; Banno, Yutaka; Tsuchida, Kozo

2011-01-01

The carotenoid-binding protein (CBP) of the domesticated silkworm, Bombyx mori, a major determinant of cocoon color, is likely to have been substantially influenced by domestication of this species. We analyzed the structure of the CBP gene in multiple strains of B. mori, in multiple individuals of the wild silkworm, B. mandarina (the putative wild ancestor of B. mori), and in a number of other lepidopterans. We found the CBP gene copy number in genomic DNA to vary widely among B. mori strains, ranging from 1 to 20. The copies of CBP are of several types, based on the presence of a retrotransposon or partial deletion of the coding sequence. In contrast to B. mori, B. mandarina was found to possess a single copy of CBP without the retrotransposon insertion, regardless of habitat. Several other lepidopterans were found to contain sequences homologous to CBP, revealing that this gene is evolutionarily conserved in the lepidopteran lineage. Thus, domestication can generate significant diversity of gene copy number and structure over a relatively short evolutionary time. PMID:21242537
Positive Selection Underlies Faster-Z Evolution of Gene Expression in Birds.

PubMed

Dean, Rebecca; Harrison, Peter W; Wright, Alison E; Zimmer, Fabian; Mank, Judith E

2015-10-01

The elevated rate of evolution for genes on sex chromosomes compared with autosomes (Fast-X or Fast-Z evolution) can result either from positive selection in the heterogametic sex or from nonadaptive consequences of reduced relative effective population size. Recent work in birds suggests that Fast-Z of coding sequence is primarily due to relaxed purifying selection resulting from reduced relative effective population size. However, gene sequence and gene expression are often subject to distinct evolutionary pressures; therefore, we tested for Fast-Z in gene expression using next-generation RNA-sequencing data from multiple avian species. Similar to studies of Fast-Z in coding sequence, we recover clear signatures of Fast-Z in gene expression; however, in contrast to coding sequence, our data indicate that Fast-Z in expression is due to positive selection acting primarily in females. In the soma, where gene expression is highly correlated between the sexes, we detected Fast-Z in both sexes, although at a higher rate in females, suggesting that many positively selected expression changes in females are also expressed in males. In the gonad, where intersexual correlations in expression are much lower, we detected Fast-Z for female gene expression, but crucially, not males. This suggests that a large amount of expression variation is sex-specific in its effects within the gonad. Taken together, our results indicate that Fast-Z evolution of gene expression is the product of positive selection acting on recessive beneficial alleles in the heterogametic sex. More broadly, our analysis suggests that the adaptive potential of Z chromosome gene expression may be much greater than that of gene sequence, results which have important implications for the role of sex chromosomes in speciation and sexual selection. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Poly(A) code analyses reveal key determinants for tissue-specific mRNA alternative polyadenylation

PubMed Central

Weng, Lingjie; Li, Yi; Xie, Xiaohui; Shi, Yongsheng

2016-01-01

mRNA alternative polyadenylation (APA) is a critical mechanism for post-transcriptional gene regulation and is often regulated in a tissue- and/or developmental stage-specific manner. An ultimate goal for the APA field has been to be able to computationally predict APA profiles under different physiological or pathological conditions. As a first step toward this goal, we have assembled a poly(A) code for predicting tissue-specific poly(A) sites (PASs). Based on a compendium of over 600 features that have known or potential roles in PAS selection, we have generated and refined a machine-learning algorithm using multiple high-throughput sequencing-based data sets of tissue-specific and constitutive PASs. This code can predict tissue-specific PASs with >85% accuracy. Importantly, by analyzing the prediction performance based on different RNA features, we found that PAS context, including the distance between alternative PASs and the relative position of a PAS within the gene, is a key feature for determining the susceptibility of a PAS to tissue-specific regulation. Our poly(A) code provides a useful tool for not only predicting tissue-specific APA regulation, but also for studying its underlying molecular mechanisms. PMID:27095026
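The abstract singles out PAS context, namely the distance between alternative PASs and the relative position of a PAS within its gene, as a key predictive feature. A minimal sketch of how those two features might be computed, assuming genomic coordinates for the gene and its PASs; the `Gene` type, field names, and function name are hypothetical and not taken from the poly(A) code itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    start: int   # leftmost genomic coordinate (inclusive)
    end: int     # rightmost genomic coordinate (inclusive)
    strand: str  # "+" or "-"


def pas_context_features(gene: Gene, pas_positions: List[int]) -> List[dict]:
    """Per-PAS relative position (0 = gene 5' end, 1 = 3' end) and distance to nearest other PAS."""
    span = gene.end - gene.start
    if span <= 0:
        raise ValueError("gene must span more than one position")
    if gene.strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    features = []
    for i, p in enumerate(pas_positions):
        if not gene.start <= p <= gene.end:
            raise ValueError(f"PAS {p} lies outside the gene")
        # Measure from the transcript's 5' end, which depends on strand.
        offset = p - gene.start if gene.strand == "+" else gene.end - p
        others = [abs(p - q) for j, q in enumerate(pas_positions) if j != i]
        nearest: Optional[int] = min(others) if others else None
        features.append({
            "position": p,
            "relative_position": offset / span,
            "nearest_pas_distance": nearest,
        })
    return features


for row in pas_context_features(Gene(1000, 2000, "+"), [1500, 1900, 2000]):
    print(row)
```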
Structured association analysis leads to insight into Saccharomyces cerevisiae gene regulation by finding multiple contributing eQTL hotspots associated with functional gene modules.

PubMed

Curtis, Ross E; Kim, Seyoung; Woolford, John L; Xu, Wenjie; Xing, Eric P

2013-03-21

Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant. While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots. 
Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, but, in the case of the Ribi (ribosome biogenesis) group, we also provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso. Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
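The abstract describes GFlasso only at a high level. As a hedged illustration, the sketch below scores a graph-guided fused-lasso objective of the general kind GFlasso minimizes: squared error, an L1 sparsity penalty on the SNP-by-trait coefficient matrix `B`, and a fusion penalty that pulls together coefficients of traits linked in the gene-interaction network, weighted by their correlation `r`. The |r| weighting, the variable names, and the plain-list matrices are assumptions; the sketch evaluates a given `B` and does not perform the optimization.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def gflasso_objective(X: Matrix, Y: Matrix, B: Matrix,
                      edges: List[Tuple[int, int, float]],
                      lam: float, gamma: float) -> float:
    """X: n x J genotypes, Y: n x K traits, B: J x K coefficients.

    edges holds (trait_m, trait_l, correlation r) pairs from the trait network.
    """
    n, J, K = len(X), len(B), len(B[0])
    # 0.5 * squared Frobenius norm of the residual Y - X B
    loss = 0.0
    for i in range(n):
        for k in range(K):
            pred = sum(X[i][j] * B[j][k] for j in range(J))
            loss += (Y[i][k] - pred) ** 2
    loss *= 0.5
    # L1 sparsity penalty on every SNP-trait coefficient
    l1 = lam * sum(abs(b) for row in B for b in row)
    # Fusion penalty: correlated traits are encouraged to share (or mirror) SNP effects
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l]) for j in range(J))
    return loss + l1 + gamma * fusion


X = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 1.0], [0.0, 0.0]]          # SNP 0 affects both traits equally
Y = [[1.0, 1.0], [0.0, 0.0]]
print(gflasso_objective(X, Y, B, [(0, 1, 0.5)], lam=0.1, gamma=1.0))
```

In the usage line, SNP 0 affects both linked traits identically, so the fusion term contributes nothing and only the L1 term remains; this is the mechanism by which a network-aware penalty can favor pleiotropic loci.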
Decoding the non-coding genome: elucidating genetic risk outside the coding genome.

PubMed

Barr, C L; Misener, V L

2016-01-01

Current evidence emerging from genome-wide association studies indicates that the genetic underpinnings of complex traits are likely attributable to genetic variation that changes gene expression, rather than (or in combination with) variation that changes protein-coding sequences. This is particularly compelling with respect to psychiatric disorders, as genetic changes in regulatory regions may result in differential transcriptional responses to developmental cues and environmental/psychosocial stressors. Until recently, however, the link between transcriptional regulation and psychiatric genetic risk has been understudied. Multiple obstacles have contributed to the paucity of research in this area, including challenges in identifying the positions of remote (distal from the promoter) regulatory elements (e.g. enhancers) and their target genes, and the underrepresentation of neural cell types and brain tissues in epigenome projects; the availability of high-quality brain tissues for epigenetic and transcriptome profiling, particularly for the adolescent and developing brain, has been limited. Further challenges have arisen in the prediction and testing of the functional impact of DNA variation with respect to multiple aspects of transcriptional control, including regulatory-element interaction (e.g. between enhancers and promoters), transcription factor binding and DNA methylation. Further, the brain has uncommon DNA-methylation marks with unique genomic distributions not found in other tissues; current evidence suggests the involvement of non-CG methylation and 5-hydroxymethylation in neurodevelopmental processes, but much remains unknown. We review here knowledge gaps as well as both technological and resource obstacles that will need to be overcome in order to elucidate the involvement of brain-relevant gene-regulatory variants in genetic risk for psychiatric disorders. © 2015 John Wiley & Sons Ltd and International Behavioural and Neural Genetics Society.
Expansion by whole genome duplication and evolution of the sox gene family in teleost fish

PubMed Central

Naville, Magali; Volff, Jean-Nicolas

2017-01-01

It is now recognized that several rounds of whole genome duplication (WGD) occurred during vertebrate evolution, but the link between WGDs and phenotypic diversification remains unresolved. In this study, we investigated the impact of the teleost-specific WGD on the evolution of the sox gene family in teleost fishes. The sox gene family, which encodes transcription factors, plays essential roles in the morphology, physiology and behavior of vertebrates, including teleosts, currently the largest group of vertebrates. We first reconstructed the evolution of all sox genes identified in eleven teleost genomes using a comparative genomic approach that included phylogenetic and synteny analyses. Compared with tetrapods, we observed a substantial expansion of the sox family: 58% (11/19) of sox genes are duplicated in teleost genomes. Furthermore, all duplicated sox genes except the sox17 paralogs derive from the teleost-specific WGD. Then, focusing on five sox genes and analyzing the evolution of their coding and non-coding sequences as well as their expression patterns in fish embryos and adult tissues, we demonstrated that these paralogs followed lineage-specific evolutionary trajectories in teleost genomes. This work, based on whole genome data from multiple teleost species, supports the contribution of WGDs to the expansion of gene families, as well as to the emergence of genomic differences between lineages that might promote genetic and phenotypic diversity in teleosts. PMID:28738066
Comparative genomic and plasmid analysis of beer-spoiling and non-beer-spoiling Lactobacillus brevis isolates.

PubMed

Bergsveinson, Jordyn; Ziola, Barry

2017-12-01

Beer-spoilage-related lactic acid bacteria (BSR LAB) belong to multiple genera and species; however, beer-spoilage capacity is isolate-specific and partially acquired via horizontal gene transfer within the brewing environment. Thus, the extent to which genus-, species-, or environment- (i.e., brewery-) level genetic variability influences beer-spoilage phenotype is unknown. Publicly available Lactobacillus brevis genomes were analyzed via BlAst Diagnostic Gene findEr (BADGE) for BSR genes and assessed for pangenomic relationships. Also analyzed were functional coding capacities of plasmids of LAB inhabiting extreme niche environments. Considerable genetic variation was observed in L. brevis isolated from clinical samples, whereas 16 candidate genes distinguish BSR and non-BSR L. brevis genomes. These genes are related to nutrient scavenging of gluconate or pentoses, mannose, and metabolism of pectin. BSR L. brevis isolates also have higher average nucleotide identity and stronger pangenome association with one another, though isolation source (i.e., specific brewery) also appears to influence the plasmid coding capacity of BSR LAB. Finally, it is shown that niche-specific adaptation and phenotype are plasmid-encoded for both BSR and non-BSR LAB. The ultimate combination of plasmid-encoded genes dictates the ability of L. brevis to survive in the most extreme beer environment, namely, gassed (i.e., pressurized) beer.
Long non-coding RNA discovery across the genus anopheles reveals conserved secondary structures within and beyond the Gambiae complex.

PubMed

Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T

2015-04-23

Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and between adult sexes. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
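The working definition above (transcripts longer than 200 nucleotides without significant protein-coding potential) can be turned into a crude first-pass filter. The sketch below is an assumption-laden simplification, not the study's pipeline: it treats "coding potential" as the longest ATG-to-stop open reading frame on the given strand, and the 100-codon cutoff and function names are hypothetical. Real lncRNA pipelines typically add homology searches and coding-potential classifiers.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest complete ORF
    in the three forward frames. ORFs lacking a stop codon are ignored."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq: str, min_length: int = 201, max_orf_codons: int = 100) -> bool:
    """Longer than 200 nt and no ORF above a (hypothetical) codon cutoff."""
    return len(seq) >= min_length and longest_orf_codons(seq) <= max_orf_codons


print(is_lncrna_candidate("AC" * 125))                    # 250 nt, no ORF
print(is_lncrna_candidate("ATG" + "GCT" * 150 + "TAA"))   # 151-codon ORF
```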
[Long non-coding RNAs in plants].

PubMed

Xiaoqing, Huang; Dandan, Li; Juan, Wu

2015-04-01

Long non-coding RNAs (lncRNAs), which are longer than 200 nucleotides, are widespread in organisms and function in a variety of biological processes. Most lncRNAs identified in plants to date are transcribed by RNA polymerase II and regulate gene expression through multiple mechanisms, such as target mimicry, transcriptional interference, histone methylation and DNA methylation; as regulators, they play important roles in flowering, male sterility, nutrient metabolism, responses to biotic and abiotic stress, and other biological processes in plants. In this review, we summarize the databases, prediction methods, and possible functions of plant lncRNAs discovered in recent years.
MGAS: a powerful tool for multivariate gene-based genome-wide association analysis.

PubMed

Van der Sluis, Sophie; Dolan, Conor V; Li, Jiang; Song, Youqiang; Sham, Pak; Posthuma, Danielle; Li, Miao-Xin

2015-04-01

Standard genome-wide association studies, testing the association between one phenotype and a large number of single nucleotide polymorphisms (SNPs), are limited in two ways: (i) traits are often multivariate, and analysis of composite scores entails loss in statistical power and (ii) gene-based analyses may be preferred, e.g. to decrease the multiple testing problem. Here we present a new method, multivariate gene-based association test by extended Simes procedure (MGAS), that allows gene-based testing of multivariate phenotypes in unrelated individuals. Through extensive simulation, we show that under most trait-generating genotype-phenotype models MGAS has superior statistical power to detect associated genes compared with gene-based analyses of univariate phenotypic composite scores (i.e. GATES, multiple regression), and multivariate analysis of variance (MANOVA). Re-analysis of metabolic data revealed 32 False Discovery Rate controlled genome-wide significant genes, and 12 regions harboring multiple genes; of these 44 regions, 30 were not reported in the original analysis. MGAS allows researchers to conduct their multivariate gene-based analyses efficiently, and without the loss of power that is often associated with incorrectly specified genotype-phenotype models. MGAS is freely available in KGG v3.0 (http://statgenpro.psychiatry.hku.hk/limx/kgg/download.php). Access to the metabolic dataset can be requested at dbGaP (https://dbgap.ncbi.nlm.nih.gov/). The R-simulation code is available from http://ctglab.nl/people/sophie_van_der_sluis. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
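MGAS builds on an extended Simes procedure. For reference, the sketch below implements only the classic Simes combination test, which turns n p-values into one: sort them and take the minimum over i of n·p(i)/i. The extended version used in gene-based methods additionally accounts for correlation among tests (for example, linkage disequilibrium between SNPs); that adjustment is not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence


def simes_pvalue(pvalues: Sequence[float]) -> float:
    """Classic Simes combined p-value: min_i n * p_(i) / i over sorted p-values."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    ps = sorted(pvalues)
    n = len(ps)
    combined = min(n * p / i for i, p in enumerate(ps, start=1))
    return min(1.0, combined)


# Three SNP-level p-values for one hypothetical gene
print(simes_pvalue([0.01, 0.04, 0.03]))
```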
iGC-an integrated analysis package of gene expression and copy number alteration.

PubMed

Lai, Yi-Pin; Wang, Liang-Bo; Wang, Wei-An; Lai, Liang-Chuan; Tsai, Mong-Hsun; Lu, Tzu-Pin; Chuang, Eric Y

2017-01-14

With the advancement in high-throughput technologies, researchers can simultaneously investigate gene expression and copy number alteration (CNA) data from individual patients at a lower cost. Traditional analysis methods analyze each type of data individually and integrate their results using Venn diagrams. Challenges arise, however, when the results are irreproducible and inconsistent across multiple platforms. To address these issues, one possible approach is to concurrently analyze both gene expression profiling and CNAs in the same individual. We have developed an open-source R/Bioconductor package (iGC). Multiple input formats are supported and users can define their own criteria for identifying differentially expressed genes driven by CNAs. The analysis of two real microarray datasets demonstrated that the CNA-driven genes identified by the iGC package showed significantly higher Pearson correlation coefficients with their gene expression levels and copy numbers than those genes located in a genomic region with CNA. Compared with the Venn diagram approach, the iGC package showed better performance. The iGC package is effective and useful for identifying CNA-driven genes. By simultaneously considering both comparative genomic and transcriptomic data, it can provide better understanding of biological and medical questions. The iGC package's source code and manual are freely available at https://www.bioconductor.org/packages/release/bioc/html/iGC.html .
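The abstract evaluates CNA-driven genes by the Pearson correlation between each gene's copy number and its expression across samples, with user-defined criteria. The sketch below shows that correlation step in Python under stated assumptions: inputs are per-gene lists of values over the same samples, and the threshold rule and function names are hypothetical. It is not the iGC package's R/Bioconductor interface.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN when either input has zero variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def cna_driven_candidates(copy_number: Dict[str, List[float]],
                          expression: Dict[str, List[float]],
                          threshold: float) -> List[str]:
    """Genes whose copy number and expression correlate at or above a user-chosen threshold."""
    hits = []
    for gene in sorted(copy_number):
        if gene in expression:
            r = pearson(copy_number[gene], expression[gene])
            if r >= threshold:  # NaN compares False, so flat profiles are skipped
                hits.append(gene)
    return hits


cn = {"G1": [1, 2, 3, 4], "G2": [1, 2, 3, 4]}
ex = {"G1": [2, 4, 6, 8], "G2": [8, 6, 4, 2]}
print(cna_driven_candidates(cn, ex, threshold=0.5))
```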
Coding and noncoding expression patterns associated with rare obesity-related disorders: Prader–Willi and Alström syndromes

PubMed Central

Butler, Merlin G; Wang, Kun; Marshall, Jan D; Naggert, Jürgen K; Rethmeyer, Jasmine A; Gunewardena, Sumedha S; Manzardo, Ann M

2015-01-01

Obesity is accompanied by hyperphagia in several rare classical genetic obesity-related syndromes, including Prader–Willi syndrome (PWS) and Alström syndrome (ALMS). We compared coding and noncoding gene expression in adult males with PWS, ALMS, and nonsyndromic obesity relative to nonobese males, using readily available lymphoblastoid cells, to identify disease-specific molecular patterns and disturbed mechanisms in obesity. Compared with nonobese males, 231 genes were upregulated and 124 genes were downregulated in ALMS, whereas no genes were upregulated in obese or PWS males. The metallothionein gene (MT1X) was significantly downregulated in ALMS, in common with obese males. In PWS, only the complex SNRPN locus was disturbed (downregulated), along with several downregulated small nucleolar RNAs (snoRNAs) in the 15q11-q13 region (SNORD116, SNORD109B, SNORD109A, SNORD107). Eleven upregulated and ten downregulated snoRNAs targeting multiple genes impacting rRNA processing, developmental pathways, and associated diseases were found in ALMS. Fifty-two microRNAs (miRNAs) associated with multiple, overlapping gene expression disturbances were upregulated in ALMS, and four were shared with obese males but not PWS males. For example, seven passenger-strand miRNAs (miR-93*, miR-373*, miR-29b-2*, miR-30c-1*, miR-27a*, miR-27b*, and miR-149*) were disturbed in association with six separate downregulated target genes (CD68, FAM102A, MXI1, MYO1D, TP53INP1, and ZRANB1). Cell cycle (e.g., PPP3CA), transcription (e.g., POLE2), and development may be impacted by upregulated genes in ALMS, while downregulated genes were found to be involved in metabolic processes (e.g., FABP3), immune responses (e.g., IL32), and cell signaling (e.g., IL1B). The high number of gene and noncoding RNA disturbances in ALMS contrasts with observations in PWS and males with nonsyndromic obesity and may reflect the progressive multiorgan pathology of the ALMS disease process. PMID:25705109
Customized oligonucleotide microchips that convert multiple genetic information to simple patterns, are portable and reusable

DOEpatents

Mirzabekov, Andrei; Guschin, Dmitry Y.; Chik, Valentine; Drobyshev, Aleksei; Fotin, Alexander; Yershov, Gennadiy; Lysov, Yuri

2002-01-01

This invention relates to using customized oligonucleotide microchips as biosensors for the detection and identification of nucleic acids specific for different genes, organisms and/or individuals in the environment, in food and in biological samples. The microchips are designed to convert multiple bits of genetic information into simpler patterns of signals that are interpreted as a unit. Because of an improved method of hybridizing oligonucleotides from samples to microchips, microchips are reusable and transportable. For field study, portable laser or bar code scanners are suitable.
Role of genomic architecture in the expression dynamics of long noncoding RNAs during differentiation of human neuroblastoma cells.

PubMed

Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V

2013-10-16

Mammalian genomes are extensively transcribed, producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs have different genomic architectures in relation to their associated protein-coding genes. Our study aimed to link lncRNA architecture with the dynamic patterns of lncRNA expression, using differentiating human neuroblastoma cells as a model. LncRNA expression was studied over a 120-hour time course of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), a compound used to treat neuroblastoma. A custom microarray chip was used to measure expression levels of 9,267 lncRNAs during differentiation. We categorized lncRNAs into 19 architecture classes according to their position relative to protein-coding genes. For each architecture class, the expression dynamics of lncRNAs were studied together with those of their protein-coding partners. This allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in introns and downstream of protein-coding genes showed negative correlation. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional-promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction followed by transcriptional repression.
Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamical changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
Computational Identification and Functional Predictions of Long Noncoding RNA in Zea mays

PubMed Central

Boerner, Susan; McGinnis, Karen M.

2012-01-01

Background: Computational analysis of cDNA sequences from multiple organisms suggests that a large portion of transcribed DNA does not code for a functional protein. In mammals, noncoding transcription is abundant, and often results in functional RNA molecules that do not appear to encode proteins. Many long noncoding RNAs (lncRNAs) appear to have epigenetic regulatory function in humans, including HOTAIR and XIST. While epigenetic gene regulation is clearly an essential mechanism in plants, relatively little is known about the presence or function of lncRNAs in plants.
Methodology/Principal Findings: To explore the connection between lncRNA and epigenetic regulation of gene expression in plants, a computational pipeline using the programming language Python has been developed and applied to maize full length cDNA sequences to identify, classify, and localize potential lncRNAs. The pipeline was used in parallel with an SVM tool for identifying ncRNAs to identify the maximal number of ncRNAs in the dataset. Although the available library of sequences was small and potentially biased toward protein coding transcripts, 15% of the sequences were predicted to be noncoding. Approximately 60% of these sequences appear to act as precursors for small RNA molecules and may function to regulate gene expression via a small RNA dependent mechanism. ncRNAs were predicted to originate from both genic and intergenic loci. Of the lncRNAs that originated from genic loci, ∼20% were antisense to the host gene loci.
Conclusions/Significance: Consistent with similar studies in other organisms, noncoding transcription appears to be widespread in the maize genome. Computational predictions indicate that maize lncRNAs may function to regulate expression of other genes through multiple RNA mediated mechanisms. PMID:22916204
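The localization step above distinguishes ncRNAs from genic and intergenic loci and, among genic ones, those antisense to the host gene. A minimal sketch of such a classification, assuming inclusive interval coordinates with strand labels on one chromosome; the tuple layout, function name, and the rule that any opposite-strand overlap counts as antisense are hypothetical choices, not the published pipeline's logic.

```python
from typing import List, Tuple

Interval = Tuple[int, int, str]  # (start, end, strand), inclusive coordinates


def classify_locus(nc: Interval, genes: List[Interval]) -> str:
    """Label an ncRNA locus as 'intergenic', 'sense', or 'antisense' to overlapping genes."""
    nc_start, nc_end, nc_strand = nc
    overlapping = [g for g in genes if g[0] <= nc_end and nc_start <= g[1]]
    if not overlapping:
        return "intergenic"
    # Assumption: any overlap with an opposite-strand gene is reported as antisense.
    if any(g[2] != nc_strand for g in overlapping):
        return "antisense"
    return "sense"


genes = [(100, 500, "+"), (800, 1200, "-")]
for locus in [(600, 700, "+"), (150, 300, "-"), (150, 300, "+")]:
    print(locus, classify_locus(locus, genes))
```

Counting the "antisense" labels among all genic loci would give a fraction comparable to the ∼20% reported above, given the same input annotations.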
Single nucleotide polymorphism-specific regulation of matrix metalloproteinase-9 by multiple miRNAs targeting the coding exon

PubMed Central

Duellman, Tyler; Warren, Christopher; Yang, Jay

2014-01-01

Microribonucleic acids (miRNAs) work with exquisite specificity and are able to distinguish a target from a non-target based on a single nucleotide mismatch in the core nucleotide domain. We asked whether miRNA regulation of gene expression could occur in a single nucleotide polymorphism (SNP)-specific manner, manifesting as post-transcriptional control of the expression of genetic polymorphisms. In our recent study of the functional consequences of matrix metalloproteinase (MMP)-9 SNPs, we discovered that expression of a coding exon SNP in the pro-domain of the protein resulted in a profound decrease in the secreted protein. This missense SNP results in the N38S amino acid change and a loss of an N-glycosylation site. A systematic study demonstrated that the loss of secreted protein was due not to the loss of an N-glycosylation site, but rather to SNP-specific targeting by miR-671-3p and miR-657. Bioinformatics analysis identified 41 SNP-specific miRNAs targeting MMP-9 SNPs, mostly in the coding exon, and an extension of the analysis to chromosome 20, where the MMP-9 gene is located, suggested that SNP-specific miRNAs targeting the coding exon are prevalent. This selective post-transcriptional regulation of a target messenger RNA harboring genetic polymorphisms by miRNAs offers an SNP-dependent post-transcriptional regulatory mechanism, allowing for polymorphism-specific differential gene regulation. PMID:24627221
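To make the single-nucleotide specificity concrete, the sketch below checks whether a target sequence contains a perfect site complementary to a miRNA "seed" (nucleotides 2-8), and shows that one substituted base in the target abolishes the match. The miRNA and target sequences are invented for illustration, and seed complementarity alone is a simplification: the abstract's miR-671-3p/miR-657 findings involved more than this rule.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def seed_site(mirna: str) -> str:
    """DNA-sense target site complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return reverse_complement(seed)


def has_seed_match(mirna: str, target: str) -> bool:
    return seed_site(mirna) in target.upper().replace("U", "T")


mirna = "UCAGGCAUGGACCAUUGAGCUA"   # hypothetical miRNA
ref_allele = "GGGATGCCTGAAA"        # hypothetical target, reference allele
alt_allele = "GGGATGCTTGAAA"        # same target with one SNP (C>T)
print(seed_site(mirna), has_seed_match(mirna, ref_allele), has_seed_match(mirna, alt_allele))
```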
Metformin-Induced Changes of the Coding Transcriptome and Non-Coding RNAs in the Livers of Non-Alcoholic Fatty Liver Disease Mice.

PubMed

Guo, Jun; Zhou, Yuan; Cheng, Yafen; Fang, Weiwei; Hu, Gang; Wei, Jie; Lin, Yajun; Man, Yong; Guo, Lixin; Sun, Mingxiao; Cui, Qinghua; Li, Jian

2018-01-01

Recent studies have suggested that changes in non-coding RNAs play a key role in the progression of non-alcoholic fatty liver disease (NAFLD). Metformin is now recommended and effective for the treatment of NAFLD. We hope that the current analyses of the coding and non-coding transcriptome will better reveal the potential roles of mRNAs and long non-coding RNAs (lncRNAs) that underlie NAFLD and metformin intervention. The present study mainly analysed changes in the coding transcriptome and non-coding RNAs after a five-week metformin intervention. Liver samples from three groups of mice were harvested for transcriptome profiling, which covered mRNA, lncRNA, microRNA (miRNA) and circular RNA (circRNA), using a microarray technique. A systematic alleviation of high-fat diet (HFD)-induced transcriptome alterations by metformin was observed. The metformin treatment largely reversed the correlations with diabetes-related pathways. Our analysis also suggested interaction networks between differentially expressed lncRNAs and known hepatic disease genes and interactions between circRNAs and their disease-related miRNA partners. Eight HFD-responsive lncRNAs and three metformin-responsive lncRNAs stood out because of their widespread associations with disease genes. Moreover, seven miRNAs that interacted with multiple differentially expressed circRNAs were highlighted because they were likely to be associated with metabolic or liver diseases. The present study identified novel changes in the coding transcriptome and non-coding RNAs in the livers of NAFLD mice after metformin treatment that might shed light on the underlying mechanism by which metformin impedes the progression of NAFLD. © 2018 The Author(s). Published by S. Karger AG, Basel.
Studying Functions of All Yeast Genes Simultaneously

NASA Technical Reports Server (NTRS)

Stolc, Viktor; Eason, Robert G.; Pourmand, Nader; Herman, Zelek S.; Davis, Ronald W.; Anthony, Kevin; Jejelowo, Olufisayo

2006-01-01

A method of studying the functions of all the genes of a given species of microorganism simultaneously has been developed in experiments on Saccharomyces cerevisiae (commonly known as baker's or brewer's yeast). It is already known that many yeast genes perform functions similar to those of corresponding human genes; therefore, by facilitating understanding of yeast genes, the method may ultimately also contribute to the knowledge needed to treat some diseases in humans. Because of the complexity of the method and the highly specialized nature of the underlying knowledge, it is possible to give only a brief and sketchy summary here. The method involves the use of unique synthetic deoxyribonucleic acid (DNA) sequences that are denoted as DNA bar codes because of their utility as molecular labels. The method also involves the disruption of gene functions through deletion of genes. Saccharomyces cerevisiae is a particularly powerful experimental system in that multiple deletion strains easily can be pooled for parallel growth assays. Individual deletion strains recently have been created for 5,918 open reading frames, representing nearly all of the estimated 6,000 genetic loci of Saccharomyces cerevisiae. Tagging of each deletion strain with one or two unique 20-nucleotide sequences enables identification of genes affected by specific growth conditions, without prior knowledge of gene functions. Hybridization of bar-code DNA to oligonucleotide arrays can be used to measure the growth rate of each strain over several cell-division generations. The growth rate thus measured serves as an index of the fitness of the strain.
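The last steps above (tag each deletion strain with a unique barcode, grow the pool, and read barcode abundance to estimate each strain's growth rate) can be sketched as a per-barcode log-ratio of relative abundance per generation. This is an illustrative assumption, not the NTRS method: it uses counts rather than array hybridization intensities, and the pseudocount, the log2-per-generation score, and all names are hypothetical.

```python
import math
from typing import Dict


def barcode_fitness(counts_t0: Dict[str, float],
                    counts_tg: Dict[str, float],
                    generations: float,
                    pseudocount: float = 0.5) -> Dict[str, float]:
    """Relative growth score per barcode: log2(freq after g generations / initial freq) / g.

    The pseudocount (an assumption) keeps barcodes that drop to zero from causing log(0);
    set it to 0 only when every barcode is present at both time points.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    barcodes = sorted(set(counts_t0) | set(counts_tg))
    n = len(barcodes)
    total0 = sum(counts_t0.values()) + pseudocount * n
    totalg = sum(counts_tg.values()) + pseudocount * n
    scores = {}
    for bc in barcodes:
        f0 = (counts_t0.get(bc, 0) + pseudocount) / total0
        fg = (counts_tg.get(bc, 0) + pseudocount) / totalg
        scores[bc] = math.log2(fg / f0) / generations
    return scores


# Hypothetical pool of two deletion strains grown for 2 generations
print(barcode_fitness({"A": 100, "B": 100}, {"A": 400, "B": 100}, generations=2, pseudocount=0))
```

Positive scores mark strains that outgrew the pool average and negative scores mark strains whose deletion impaired growth, which is how such a pool screen can point to gene function without prior knowledge of it.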

The Ftx Noncoding Locus Controls X Chromosome Inactivation Independently of Its RNA Products.

PubMed

Furlan, Giulia; Gutierrez Hernandez, Nancy; Huret, Christophe; Galupa, Rafael; van Bemmel, Joke Gerarda; Romito, Antonio; Heard, Edith; Morey, Céline; Rougeulle, Claire

2018-05-03

Accumulation of the Xist long noncoding RNA (lncRNA) on one X chromosome is the trigger for X chromosome inactivation (XCI) in female mammals. Xist expression, which needs to be tightly controlled, involves a cis-acting region, the X-inactivation center (Xic), containing many lncRNA genes that evolved concomitantly with Xist from protein-coding ancestors through pseudogenization and loss of coding potential. Here, we uncover an essential role for the Xic-linked noncoding gene Ftx in the regulation of Xist expression. We show that Ftx is required in cis to promote Xist transcriptional activation and establishment of XCI. Importantly, we demonstrate that this function depends on Ftx transcription and not on its RNA products. Our findings illustrate the multiplicity of layers operating in the establishment of XCI and highlight the diversity in the modus operandi of the noncoding players. Copyright © 2018 Elsevier Inc. All rights reserved.
Genomic and Epigenomic Alterations in Cancer.

PubMed

Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana

2016-07-01

Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing the ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
A Phylogenomic Assessment of Ancient Polyploidy and Genome Evolution across the Poales

PubMed Central

McKain, Michael R.; Tang, Haibao; McNeal, Joel R.; Ayyampalayam, Saravanaraj; Davis, Jerrold I.; dePamphilis, Claude W.; Givnish, Thomas J.; Pires, J. Chris; Stevenson, Dennis Wm.; Leebens-Mack, James H.

2016-01-01

Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to the divergence of Asparagales and the commelinids but after the monocot-eudicot divergence. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification. PMID:26988252
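The "GC profile of protein-coding sequences" mentioned above is commonly summarized as overall GC content plus GC content at third codon positions (GC3). A minimal sketch, assuming an in-frame coding sequence without gaps; the function name is hypothetical and this is not the authors' pipeline.

```python
from typing import Tuple


def gc_profile(cds: str) -> Tuple[float, float]:
    """Return (overall GC fraction, GC fraction at third codon positions)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    seq = seq[:usable]
    gc_all = sum(base in "GC" for base in seq) / usable
    third = seq[2::3]  # every third base, starting at the first codon's 3rd position
    gc3 = sum(base in "GC" for base in third) / len(third)
    return gc_all, gc3


print(gc_profile("ATGGCCAAATTT"))
```

GC3 is often reported separately because third positions are largely synonymous and so track compositional shifts more directly than the whole sequence does.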
A Case-by-Case Evolutionary Analysis of Four Imprinted Retrogenes

PubMed Central

McCole, Ruth B; Loughran, Noeleen B; Chahal, Mandeep; Fernandes, Luis P; Roberts, Roland G; Fraternali, Franca; O'Connell, Mary J; Oakey, Rebecca J

2011-01-01

Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event: mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follows very varied evolutionary paths. PMID:21166792
Whole genome sequence analysis of Geitlerinema sp. FC II unveils competitive edge of the strain in marine cultivation system for biofuel production.

PubMed

Batchu, Navish Kumar; Khater, Shradha; Patil, Sonal; Nagle, Vinod; Das, Gautam; Bhadra, Bhaskar; Sapre, Ajit; Dasgupta, Santanu

2018-03-05

A filamentous cyanobacterium, Geitlerinema sp. FC II, was isolated from a marine algae culture pond at Reliance Industries Limited (RIL), India. The 6.7 Mb draft genome of FC II encodes 6697 protein-coding genes. Analysis of the whole genome sequence revealed the presence of a nif gene cluster, supporting its capability to fix atmospheric nitrogen. The FC II genome contains two variants of sulfide:quinone oxidoreductase (SQR), an enzyme that supplies electrons from sulfide to cyanobacterial metabolic processes. FC II is characterized by the presence of multiple CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated proteins) clusters, multiple variants of genes encoding photosystem reaction centres, and biosynthetic gene clusters for alkanes, polyketides and non-ribosomal peptides. The presence of these pathways will help FC II gain an ecological advantage over other strains for biomass production in large-scale cultivation systems. Hence, FC II may be used for production of biofuel and other industrially important metabolites. Copyright © 2018 Elsevier Inc. All rights reserved.
The Maximal C³ Self-Complementary Trinucleotide Circular Code X in Genes of Bacteria, Archaea, Eukaryotes, Plasmids and Viruses.

PubMed

Michel, Christian J

2017-04-18

In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes that has, on average, the highest occurrence in the reading frame compared with its two shifted frames. Furthermore, X has an interesting mathematical property: it is a maximal C³ self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. That method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. Here we extend this definition to the gene level. This new statistical approach gives all genes, both long and short, the same weight when searching for the circular code X. As a consequence, the concept of circular code, in particular reading frame retrieval, is directly associated with each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes.
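The gene-level idea can be sketched by counting occurrences of X trinucleotides in each of the three frames of a single gene and checking whether frame 0 dominates. The 20-trinucleotide set below is X as it is usually listed in this literature; the function names are hypothetical, and comparing raw counts is a simplification of the statistical approach described above, not the author's method. The self-complementarity check verifies the property stated in the abstract.

```python
from typing import Set, Tuple

# The 20 trinucleotides of the circular code X, as commonly listed by Michel and co-authors.
X: Set[str] = {
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(trinucleotide: str) -> str:
    return trinucleotide.translate(COMPLEMENT)[::-1]


def is_self_complementary(code: Set[str]) -> bool:
    """A code is self-complementary if it is closed under reverse complement."""
    return {reverse_complement(t) for t in code} == code


def frame_counts(gene: str, code: Set[str]) -> Tuple[int, int, int]:
    """Occurrences of code trinucleotides in frame 0 and the shifted frames 1 and 2."""
    seq = gene.upper()
    counts = []
    for shift in range(3):
        counts.append(sum(seq[i:i + 3] in code
                          for i in range(shift, len(seq) - 2, 3)))
    return counts[0], counts[1], counts[2]


def reading_frame_preferred(gene: str, code: Set[str] = X) -> bool:
    """True if the code occurs more often in frame 0 than in either shifted frame."""
    f0, f1, f2 = frame_counts(gene, code)
    return f0 > f1 and f0 > f2


print(is_self_complementary(X), frame_counts("GAAGAAGAA", X))
```

Applying `reading_frame_preferred` gene by gene, rather than pooling counts over a genome, mirrors the shift from population-level to gene-level analysis: long genes no longer dominate the totals.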
Coding of Class I and II aminoacyl-tRNA synthetases

PubMed Central

Carter, Charles W.

2018-01-01

SUMMARY The aminoacyl-tRNA synthetases and their cognate transfer RNAs translate the universal genetic code. The twenty canonical amino acids are sufficiently diverse to create a selective advantage for dividing amino acid activation between two distinct, apparently unrelated superfamilies of synthetases, Class I amino acids being generally larger and less polar, Class II amino acids smaller and more polar. Biochemical, bioinformatic, and protein engineering experiments support the hypothesis that the two Classes descended from opposite strands of the same ancestral gene. Parallel experimental deconstructions of Class I and II synthetases reveal parallel losses in catalytic proficiency at two novel modular levels—protozymes and Urzymes—associated with the evolution of catalytic activity. Bi-directional coding supports an important unification of the proteome; affords a genetic relatedness metric—middle base-pairing frequencies in sense/antisense alignments—that probes more deeply into the evolutionary history of translation than do single multiple sequence alignments; and has facilitated the analysis of hitherto unknown coding relationships in tRNA sequences. Reconstruction of native synthetases by modular thermodynamic cycles facilitated by domain engineering emphasizes the subtlety associated with achieving high specificity, shedding new light on allosteric relationships in contemporary synthetases. Synthetase Urzyme structural biology suggests that they are catalytically active molten globules, broadening the potential manifold of polypeptide catalysts accessible to primitive genetic coding and motivating revisions of the origins of catalysis. Finally, bi-directional genetic coding of some of the oldest genes in the proteome places major limitations on the likelihood that any RNA World preceded the origins of coded proteins. PMID:28828732
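The middle base-pairing metric can be illustrated for a pair of codon alignments, assuming the antisense sequence has already been aligned codon by codon in antiparallel orientation. Reversing a codon leaves its middle base in place, so only the two middle bases need to be compared. The function name and the `---` gap convention are assumptions for this sketch, not the published procedure.

```python
from typing import List

# Watson-Crick pairs on DNA; wobble pairs are deliberately not counted in this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def middle_base_pairing(sense: List[str], antisense: List[str]) -> float:
    """Fraction of aligned codon pairs whose middle bases are complementary."""
    if len(sense) != len(antisense):
        raise ValueError("alignments must have equal length")
    compared = paired = 0
    for a, b in zip(sense, antisense):
        if "-" in a or "-" in b:
            continue  # skip gapped alignment columns
        compared += 1
        paired += (a[1].upper(), b[1].upper()) in PAIRS
    return paired / compared if compared else 0.0


print(middle_base_pairing(["ATG", "GCA", "---"], ["CAT", "TGC", "AAA"]))
```

Because the middle codon base largely determines amino acid polarity, a high middle-base pairing frequency between a Class I sense strand and a Class II antisense strand is the kind of signal the bidirectional-coding hypothesis predicts.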
Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

PubMed Central

Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis

2008-01-01

Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375
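The sensitivity/specificity trade-off mentioned above arises from thresholding a discriminative score. A minimal sketch, assuming each candidate region carries a single score and a known coding/non-coding label; the scores are hypothetical and the function is not one of the paper's metrics.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], is_coding: Sequence[bool],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity of calling 'coding' when score >= threshold."""
    tp = fn = fp = tn = 0
    for score, truth in zip(scores, is_coding):
        called = score >= threshold
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Sweeping the threshold traces out the trade-off between the two measures.
for t in (0.2, 0.5, 0.85):
    print(t, sens_spec([0.9, 0.8, 0.3, 0.1], [True, False, True, False], t))
```

Two metrics with different trade-off curves can be combined (for example by summing normalized scores) before thresholding, which is one simple way to exploit features that are largely independent.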
SMARCB1/INI1 germline mutations contribute to 10% of sporadic schwannomatosis.

PubMed

Rousseau, Guillaume; Noguchi, Tetsuro; Bourdon, Violaine; Sobol, Hagay; Olschwang, Sylviane

2011-01-24

Schwannomatosis is a disease characterized by multiple non-vestibular schwannomas. Although biallelic NF2 mutations are found in schwannomas, no germline event is detected in schwannomatosis patients. In contrast, germline mutations of the SMARCB1 (INI1) tumor suppressor gene were described in familial and sporadic schwannomatosis patients. To delineate the SMARCB1 gene contribution, the nine coding exons were sequenced in a series of 56 patients affected with a variable number of non-vestibular schwannomas. Nine variants scattered along the sequence of SMARCB1 were identified. Five of them were classified as deleterious. All five patients carrying a SMARCB1 mutation had more numerous schwannomas, corresponding to 10.2% of patients with schwannomatosis. They were also diagnosed before 35 years of age. These results suggest that patients with schwannomas have a significant probability of carrying a SMARCB1 mutation. Combined with data available from other studies, they confirm the clinical indications for genetic screening of the SMARCB1 gene.
SMARCB1/INI1 germline mutations contribute to 10% of sporadic schwannomatosis

PubMed Central

2011-01-01

Background Schwannomatosis is a disease characterized by multiple non-vestibular schwannomas. Although biallelic NF2 mutations are found in schwannomas, no germline event is detected in schwannomatosis patients. In contrast, germline mutations of the SMARCB1 (INI1) tumor suppressor gene were described in familial and sporadic schwannomatosis patients. Methods To delineate the SMARCB1 gene contribution, the nine coding exons were sequenced in a series of 56 patients affected with a variable number of non-vestibular schwannomas. Results Nine variants scattered along the sequence of SMARCB1 were identified. Five of them were classified as deleterious. All five patients carrying a SMARCB1 mutation had more numerous schwannomas, corresponding to 10.2% of patients with schwannomatosis. They were also diagnosed before 35 years of age. Conclusions These results suggest that patients with schwannomas have a significant probability of carrying a SMARCB1 mutation. Combined with data available from other studies, they confirm the clinical indications for genetic screening of the SMARCB1 gene. PMID:21255467
Multiple organ gigantism caused by mutation in VmPPD gene in blackgram (Vigna mungo).

PubMed

Naito, Ken; Takahashi, Yu; Chaitieng, Bubpa; Hirano, Kumi; Kaga, Akito; Takagi, Kyoko; Ogiso-Tanaka, Eri; Thavarasook, Charaspon; Ishimoto, Masao; Tomooka, Norihiko

2017-03-01

Seed size is one of the most important traits in leguminous crops. We obtained a recessive mutant of blackgram that had greatly enlarged leaves, stems and seeds. The mutant produced 100% bigger leaves, 50% more biomass and 70% larger seeds, though it produced 40% fewer seeds. We designated the mutant multiple-organ-gigantism (mog) and found that the mog phenotype was due to an increase in cell number rather than in cell size. We also found that the mog mutant showed a rippled leaf (rl) phenotype, which was probably caused by a pleiotropic effect of the mutation. Using map-based cloning, we identified an 8 bp deletion in the coding sequence of the VmPPD gene, an orthologue of Arabidopsis PEAPOD (PPD), which regulates arrest of cell division in meristematic cells. We found no other mutations in the neighboring genes between the mutant and the wild type. We also knocked down GmPPD genes and reproduced both the mog and rl phenotypes in soybean. Controlling PPD genes to produce the mog phenotype is highly valuable for breeding, since larger seed size could directly increase the commercial value of grain legumes.
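Because 8 is not a multiple of 3, an 8 bp deletion in a coding sequence shifts the reading frame of every codon downstream of it. The sketch below illustrates this on a hypothetical toy sequence (not the VmPPD sequence); the helper names are illustrative.

```python
from typing import List


def delete(cds: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


def codons(seq: str) -> List[str]:
    """Split a sequence into complete codons in frame 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def causes_frameshift(deletion_length: int) -> bool:
    """In-frame deletions remove whole codons; any other length shifts the frame."""
    return deletion_length % 3 != 0


toy = "ATGAAACCCGGGTTTTAA"  # hypothetical toy CDS, not VmPPD
print(codons(toy))
print(codons(delete(toy, 3, 8)), causes_frameshift(8))
```

In the toy example the original stop codon TAA is no longer read in frame after the deletion, which is the general reason frameshifts in a coding sequence often yield truncated or otherwise non-functional proteins.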
Haplotype Analysis in Multiple Crosses to Identify a QTL Gene

PubMed Central

Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

2004-01-01

Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P ≤ 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene. PMID:15310659
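The logic of the second step, asking whether QTL detection across crosses tracks a single candidate protein variant, can be sketched as a concordance check. The cross data, variant labels and function names below are hypothetical; the study itself also compared plasma HDL levels between strains carrying different variants.

```python
from typing import List, Tuple

# (protein variant of parent 1, protein variant of parent 2, was the QTL detected?)
Cross = Tuple[str, str, bool]


def predicts_qtl(parent1: str, parent2: str, allele: str) -> bool:
    """A candidate allele can only segregate in a cross if exactly one parent carries it."""
    return (parent1 == allele) != (parent2 == allele)


def concordance(crosses: List[Cross], allele: str) -> float:
    """Fraction of crosses where QTL detection matches the candidate-allele prediction."""
    matches = sum(predicts_qtl(p1, p2, allele) == detected
                  for p1, p2, detected in crosses)
    return matches / len(crosses)


# Hypothetical crosses, not the 21 crosses analysed in the study.
crosses = [("b", "a", True), ("a", "c", False), ("b", "b", False), ("c", "b", True)]
print(concordance(crosses, "b"))
```

A concordance of 1.0 means every cross that detected the QTL segregated the candidate variant and no cross without it did, which is the pattern reported for APOA2b.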
Haplotype analysis in multiple crosses to identify a QTL gene.

PubMed

Wang, Xiaosong; Korstanje, Ron; Higgins, David; Paigen, Beverly

2004-09-01

Identifying quantitative trait locus (QTL) genes is a challenging task. Herein, we report using a two-step process to identify Apoa2 as the gene underlying Hdlq5, a QTL for plasma high-density lipoprotein cholesterol (HDL) levels on mouse chromosome 1. First, we performed a sequence analysis of the Apoa2 coding region in 46 genetically diverse mouse strains and found five different APOA2 protein variants, which we named APOA2a to APOA2e. Second, we conducted a haplotype analysis of the strains in 21 crosses that have so far detected HDL QTLs; we found that Hdlq5 was detected only in the nine crosses where one parent had the APOA2b protein variant characterized by an Ala61-to-Val61 substitution. We then found that strains with the APOA2b variant had significantly higher (P ≤ 0.002) plasma HDL levels than those with either the APOA2a or the APOA2c variant. These findings support Apoa2 as the underlying Hdlq5 gene and suggest the Apoa2 polymorphisms responsible for the Hdlq5 phenotype. Therefore, haplotype analysis in multiple crosses can be used to support a candidate QTL gene.
Multiple organ gigantism caused by mutation in VmPPD gene in blackgram (Vigna mungo)

PubMed Central

Naito, Ken; Takahashi, Yu; Chaitieng, Bubpa; Hirano, Kumi; Kaga, Akito; Takagi, Kyoko; Ogiso-Tanaka, Eri; Thavarasook, Charaspon; Ishimoto, Masao; Tomooka, Norihiko

2017-01-01

Seed size is one of the most important traits in leguminous crops. We obtained a recessive mutant of blackgram that had greatly enlarged leaves, stems and seeds. The mutant produced 100% bigger leaves, 50% more biomass and 70% larger seeds, though it produced 40% fewer seeds. We designated the mutant multiple-organ-gigantism (mog) and found that the mog phenotype was due to an increase in cell number rather than in cell size. We also found that the mog mutant showed a rippled leaf (rl) phenotype, which was probably caused by a pleiotropic effect of the mutation. Using map-based cloning, we identified an 8 bp deletion in the coding sequence of the VmPPD gene, an orthologue of Arabidopsis PEAPOD (PPD), which regulates arrest of cell division in meristematic cells. We found no other mutations in the neighboring genes between the mutant and the wild type. We also knocked down GmPPD genes and reproduced both the mog and rl phenotypes in soybean. Controlling PPD genes to produce the mog phenotype is highly valuable for breeding, since larger seed size could directly increase the commercial value of grain legumes. PMID:28588392
Low-dose exposure to bisphenols A, F and S of human primary adipocyte impacts coding and non-coding RNA profiles

PubMed Central

Leloire, Audrey; Dhennin, Véronique; Coumoul, Xavier; Yengo, Loïc; Froguel, Philippe

2017-01-01

Bisphenol A (BPA) exposure has been suspected to be associated with deleterious health effects, including obesity and metabolically linked diseases. Although bisphenols F (BPF) and S (BPS) are BPA structural analogs commonly used in many marketed products as replacements for BPA, only sparse toxicological data are available for them. Our objective was to comprehensively characterize bisphenol gene targets in a human primary adipocyte model, in order to determine whether they may induce cellular dysfunction, using chronic exposure at two concentrations: a “low dose” similar to the dose usually encountered in human biological fluids and a higher dose. To this end, BPA, BPF and BPS were added at 10 nM or 10 μM during the differentiation of human primary adipocytes from the subcutaneous fat of three non-diabetic Caucasian female patients. Gene expression (mRNA/lncRNA) arrays and microRNA arrays were used to assess coding and non-coding RNA changes. We detected significantly deregulated mRNAs/lncRNAs and miRNAs at both low and high doses. Enrichment in “cancer” and “organismal injury and abnormalities” related pathways was found in response to all three products. Some long intergenic non-coding RNAs and small nucleolar RNAs were differentially expressed, suggesting that bisphenols may also activate multiple cellular processes and epigenetic modifications. The analysis of upstream regulators of deregulated genes highlighted hormones or hormone-like chemicals, suggesting that BPS and BPF, like BPA, may interfere with hormonal regulation and should be considered endocrine disruptors. All these results suggest that, like BPA, its substitutes BPS and BPF should be used with the same restrictions. PMID:28628672
Systematic reconstruction of autism biology from massive genetic mutation profiles

PubMed Central

Zhang, Chaolin; Jiang, Yong-hui

2018-01-01

Autism spectrum disorder (ASD) affects 1% of the world population and has become a pressing medical and social problem worldwide. As a paradigmatic complex genetic disease, ASD has been intensively studied and thousands of gene mutations have been reported. Because these mutations rarely recur, it is difficult to (i) distinguish the few disease-causing mutations from the majority of random events and (ii) replicate or verify independent studies. A coherent and systematic understanding of autism biology has not been achieved. We analyzed 3392 and 4792 autism-related mutations from two large-scale whole-exome studies across multiple resolution levels, that is, variants (single-nucleotide), genes (protein-coding unit), and pathways (molecular module). These mutations do not recur or replicate at the variant level, but they do so significantly and increasingly at the gene and pathway levels. Genetic association reveals a novel gene + pathway dual-hit model, in which the mutation burden becomes less relevant. In multiple independent analyses, hundreds of variants or genes repeatedly converge on several canonical pathways, either novel or literature-supported. These pathways define recurrent and systematic ASD biology, distinct from previously reported gene groups or networks. They also present a catalog of novel ASD risk factors including 118 variants and 72 genes. At a subpathway level, most variants disrupt the pathway-related gene functions, and in the same gene, they tend to hit residues extremely close to each other and in the same domain. Multiple interacting variants spotlight key modules, including the cAMP (adenosine 3′,5′-monophosphate) second-messenger system and mGluR (metabotropic glutamate receptor) signaling regulation by GRKs (G protein–coupled receptor kinases). At a superpathway level, distinct pathways further interconnect and converge on three biological themes: synaptic function, morphology, and plasticity. PMID:29651456
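The multi-resolution idea, in which mutations that never recur as variants can still recur once aggregated to genes or pathways, can be sketched with simple counting. The mutation records, gene-to-pathway map and function name are hypothetical, and mapping each gene to a single pathway is a simplification (real genes often belong to several).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each mutation is (variant identifier, gene symbol).
Mutation = Tuple[str, str]


def recurrent_units(mutations: List[Mutation],
                    gene_to_pathway: Dict[str, str]) -> Dict[str, int]:
    """Count units hit by two or more mutations at each resolution level."""
    levels = {
        "variant": Counter(variant for variant, _ in mutations),
        "gene": Counter(gene for _, gene in mutations),
        "pathway": Counter(gene_to_pathway[gene] for _, gene in mutations
                           if gene in gene_to_pathway),
    }
    return {level: sum(1 for n in counts.values() if n >= 2)
            for level, counts in levels.items()}


# Hypothetical data: no variant recurs, but one gene and one pathway do.
muts = [("v1", "G1"), ("v2", "G1"), ("v3", "G2"), ("v4", "G3")]
print(recurrent_units(muts, {"G1": "P1", "G2": "P1", "G3": "P2"}))
```

Recurrence alone is not significance; a real analysis would compare these counts against a null model that accounts for gene length and pathway size.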
Systematic reconstruction of autism biology from massive genetic mutation profiles.

PubMed

Luo, Weijun; Zhang, Chaolin; Jiang, Yong-Hui; Brouwer, Cory R

2018-04-01

Autism spectrum disorder (ASD) affects 1% of the world population and has become a pressing medical and social problem worldwide. As a paradigmatic complex genetic disease, ASD has been intensively studied and thousands of gene mutations have been reported. Because these mutations rarely recur, it is difficult to (i) distinguish the few disease-causing mutations from the majority of random events and (ii) replicate or verify independent studies. A coherent and systematic understanding of autism biology has not been achieved. We analyzed 3392 and 4792 autism-related mutations from two large-scale whole-exome studies across multiple resolution levels, that is, variants (single-nucleotide), genes (protein-coding unit), and pathways (molecular module). These mutations do not recur or replicate at the variant level, but they do so significantly and increasingly at the gene and pathway levels. Genetic association reveals a novel gene + pathway dual-hit model, in which the mutation burden becomes less relevant. In multiple independent analyses, hundreds of variants or genes repeatedly converge on several canonical pathways, either novel or literature-supported. These pathways define recurrent and systematic ASD biology, distinct from previously reported gene groups or networks. They also present a catalog of novel ASD risk factors including 118 variants and 72 genes. At a subpathway level, most variants disrupt the pathway-related gene functions, and in the same gene, they tend to hit residues extremely close to each other and in the same domain. Multiple interacting variants spotlight key modules, including the cAMP (adenosine 3',5'-monophosphate) second-messenger system and mGluR (metabotropic glutamate receptor) signaling regulation by GRKs (G protein-coupled receptor kinases). At a superpathway level, distinct pathways further interconnect and converge on three biological themes: synaptic function, morphology, and plasticity.
MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity

PubMed Central

Wang, Yupeng; Tang, Haibao; DeBarry, Jeremy D.; Tan, Xu; Li, Jingping; Wang, Xiyin; Lee, Tae-ho; Jin, Huizhe; Marler, Barry; Guo, Hui; Kissinger, Jessica C.; Paterson, Andrew H.

2012-01-01

MCScan is an algorithm that scans multiple genomes or subgenomes to identify putative homologous chromosomal regions and aligns these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/. PMID:22217600
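Anchor-based collinearity detection is usually framed as dynamic programming over homologous gene pairs: find the highest-scoring chain of anchors whose gene orders increase on both chromosomes, rewarding each anchor and penalizing intervening genes. The sketch below is a simplified illustration in that spirit, not MCScanX's implementation; the parameter values (match score 50, gap penalty -1 per intervening gene, at most 25 intervening genes) are illustrative assumptions.

```python
from typing import List, Tuple

# An anchor is a homologous gene pair: (gene order on chromosome A, gene order on chromosome B).
Anchor = Tuple[int, int]


def best_collinear_chain(anchors: List[Anchor], match_score: int = 50,
                         gap_penalty: int = -1,
                         max_gaps: int = 25) -> Tuple[int, List[Anchor]]:
    """Highest-scoring chain of anchors that increases on both chromosomes."""
    points = sorted(set(anchors))  # sorting by A order guarantees predecessors come first
    if not points:
        return 0, []
    score = [match_score] * len(points)
    previous = [-1] * len(points)
    for i, (xi, yi) in enumerate(points):
        for j in range(i):
            xj, yj = points[j]
            if xj < xi and yj < yi:
                gaps = max(xi - xj, yi - yj) - 1  # intervening genes on the longer side
                if gaps <= max_gaps:
                    candidate = score[j] + match_score + gap_penalty * gaps
                    if candidate > score[i]:
                        score[i], previous[i] = candidate, j
    best = max(range(len(points)), key=score.__getitem__)
    chain = []
    k = best
    while k != -1:  # walk predecessor links back to the start of the chain
        chain.append(points[k])
        k = previous[k]
    return score[best], chain[::-1]


print(best_collinear_chain([(1, 1), (2, 2), (3, 3), (10, 50)]))
```

This quadratic version only finds same-orientation blocks; inverted blocks would need a second pass with decreasing B order, and a full tool would also report multiple non-overlapping chains and filter them by size and significance.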
PyPanda: a Python package for gene regulatory network reconstruction

PubMed Central

van IJzendoorn, David G.P.; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L.

2016-01-01

Summary: PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of ‘omics data. PANDA was originally coded in C++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C++ version and includes additional features for network analysis. Availability and implementation: The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl PMID:27402905
PyPanda: a Python package for gene regulatory network reconstruction.

PubMed

van IJzendoorn, David G P; Glass, Kimberly; Quackenbush, John; Kuijjer, Marieke L

2016-11-01

PANDA (Passing Attributes between Networks for Data Assimilation) is a gene regulatory network inference method that uses message-passing to integrate multiple sources of 'omics data. PANDA was originally coded in C++. In this application note we describe PyPanda, the Python version of PANDA. PyPanda runs considerably faster than the C++ version and includes additional features for network analysis. The open source PyPanda Python package is freely available at http://github.com/davidvi/pypanda. Contact: mkuijjer@jimmy.harvard.edu or d.g.p.van_ijzendoorn@lumc.nl. © The Author 2016. Published by Oxford University Press.

Recent advances and versatility of MAGE towards industrial applications.

PubMed

Singh, Vijai; Braddick, Darren

2015-12-01

The genome engineering toolkit has expanded significantly in recent years, allowing us to study the functions of genes in cellular networks and to assist in the over-production of proteins, drugs, chemicals and biofuels. Multiplex automated genome engineering (MAGE) was developed recently and has attracted growing interest for strain engineering. MAGE is a simple, rapid and efficient tool for manipulating genes simultaneously at multiple loci, reassigning codons and incorporating non-natural amino acids. MAGE can be further extended towards the engineering of fast, robust and over-producing strains for chemicals, drugs and biofuels at industrial scale.
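Why MAGE relies on repeated cycles for multiplex editing can be seen from a back-of-the-envelope model: if each cycle independently converts a given locus with probability p, a locus is edited after n cycles with probability 1 - (1 - p)^n, and all k targeted loci are edited with that value raised to the k-th power. The independence assumption, the stable-edit assumption and the efficiency used below are hypothetical; real per-locus efficiencies vary between loci.

```python
def fraction_fully_edited(p: float, cycles: int, loci: int) -> float:
    """Expected fraction of cells carrying all `loci` edits after `cycles` rounds.

    Assumes each locus is converted independently with probability `p` per cycle
    and that edits, once made, are retained (a simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    per_locus = 1.0 - (1.0 - p) ** cycles
    return per_locus ** loci


# Hypothetical 20% per-cycle efficiency across 5 targeted loci.
for n in (1, 5, 10, 20):
    print(n, round(fraction_fully_edited(0.2, n, 5), 3))
```

The steep dependence on the number of loci is why multiplex editing campaigns typically combine many cycles with screening of the resulting population.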
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.

PubMed

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; 
Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; 
Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I

2017-12-19

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
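The abstract groups variants by minor allele frequency (MAF), calling 0.1-5% "low-frequency". The sketch below shows how MAF follows from diploid genotype counts at a biallelic site and how a variant could be binned. It is a minimal illustration only. The function names, the inclusive handling of both boundaries, and the "rare"/"common" labels for the other bins are assumptions, not the GoT2D/T2D-GENES pipeline.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic variant from diploid genotype counts (hypothetical helper)."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotyped individuals")
    alt_count = n_het + 2 * n_hom_alt  # each heterozygote carries one alt allele
    alt_freq = alt_count / n_alleles
    return min(alt_freq, 1.0 - alt_freq)  # the minor allele may be ref or alt


def frequency_class(maf: float) -> str:
    # Assumption: 0.1-5% treated as inclusive at both ends; the labels
    # "rare" and "common" for the other bins are illustrative.
    if maf < 0.001:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


maf = minor_allele_frequency(n_hom_ref=960, n_het=39, n_hom_alt=1)
print(round(maf, 4), frequency_class(maf))  # prints: 0.0205 low-frequency
```

Taking the minimum of the two allele frequencies keeps the classification independent of which allele the reference genome happens to carry.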
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls

PubMed Central

Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. T.; 
Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P.; Palmer, Colin N. A.; 
Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.

2017-01-01

To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133
MouSensor: A Versatile Genetic Platform to Create Super Sniffer Mice for Studying Human Odor Coding.

PubMed

D'Hulst, Charlotte; Mina, Raena B; Gershon, Zachary; Jamet, Sophie; Cerullo, Antonio; Tomoiaga, Delia; Bai, Li; Belluscio, Leonardo; Rogers, Matthew E; Sirotin, Yevgeniy; Feinstein, Paul

2016-07-26

Typically, ∼0.1% of the total number of olfactory sensory neurons (OSNs) in the main olfactory epithelium express the same odorant receptor (OR) in a singular fashion and their axons coalesce into homotypic glomeruli in the olfactory bulb. Here, we have dramatically increased the total number of OSNs expressing specific cloned OR coding sequences by multimerizing a 21-bp sequence encompassing the predicted homeodomain binding site sequence, TAATGA, known to be essential in OR gene choice. Singular gene choice is maintained in these "MouSensors." In vivo synaptopHluorin imaging of odor-induced responses by known M71 ligands shows functional glomerular activation in an M71 MouSensor. Moreover, a behavioral avoidance task demonstrates that specific odor detection thresholds are significantly decreased in multiple transgenic lines, expressing mouse or human ORs. We have developed a versatile platform to study gene choice and axon identity, to create biosensors with great translational potential, and to finally decode human olfaction. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages

PubMed Central

Yu, Ying; Fuscoe, James C.; Zhao, Chen; Guo, Chao; Jia, Meiwen; Qing, Tao; Bannon, Desmond I.; Lancashire, Lee; Bao, Wenjun; Du, Tingting; Luo, Heng; Su, Zhenqiang; Jones, Wendell D.; Moland, Carrie L.; Branham, William S.; Qian, Feng; Ning, Baitang; Li, Yan; Hong, Huixiao; Guo, Lei; Mei, Nan; Shi, Tieliu; Wang, Kevin Y.; Wolfinger, Russell D.; Nikolsky, Yuri; Walker, Stephen J.; Duerksen-Hughes, Penelope; Mason, Christopher E.; Tong, Weida; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Shi, Leming; Wang, Charles

2014-01-01

The rat has been used extensively as a model for evaluating chemical toxicities and for understanding drug mechanisms. However, its transcriptome across multiple organs, or developmental stages, has not yet been reported. Here we show, as part of the SEQC consortium efforts, a comprehensive rat transcriptomic BodyMap created by performing RNA-Seq on 320 samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats. We catalogue the expression profiles of 40,064 genes, 65,167 transcripts, 31,909 alternatively spliced transcript variants and 2,367 non-coding genes/non-coding RNAs (ncRNAs) annotated in AceView. We find that organ-enriched, differentially expressed genes reflect the known organ-specific biological activities. A large number of transcripts show organ-specific, age-dependent or sex-specific differential expression patterns. We create a web-based, open-access rat BodyMap database of expression profiles with crosslinks to other widely used databases, anticipating that it will serve as a primary resource for biomedical research using the rat model. PMID:24510058
RANGER-DTL 2.0: Rigorous Reconstruction of Gene-Family Evolution by Duplication, Transfer, and Loss.

PubMed

Bansal, Mukul S; Kellis, Manolis; Kordi, Misagh; Kundu, Soumya

2018-04-24

RANGER-DTL 2.0 is a software program for inferring gene family evolution using Duplication-Transfer-Loss reconciliation. This new software is highly scalable and easy to use, and offers many new features not currently available in any other reconciliation program. RANGER-DTL 2.0 has a particular focus on reconciliation accuracy and can account for many sources of reconciliation uncertainty including uncertain gene tree rooting, gene tree topological uncertainty, multiple optimal reconciliations, and alternative event cost assignments. RANGER-DTL 2.0 is open-source and written in C++ and Python. Pre-compiled executables, source code (open-source under GNU GPL), and a detailed manual are freely available from http://compbio.engr.uconn.edu/software/RANGER-DTL/. mukul.bansal@uconn.edu.
Theory of prokaryotic genome evolution.

PubMed

Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V

2016-10-11

Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.
Epigenomics of Hypertension

PubMed Central

Liang, Mingyu; Cowley, Allen W.; Mattson, David L.; Kotchen, Theodore A.; Liu, Yong

2013-01-01

Multiple genes and pathways are involved in the pathogenesis of hypertension. Epigenomic studies of hypertension are beginning to emerge and hold great promise of providing novel insights into the mechanisms underlying hypertension. Epigenetic marks or mediators including DNA methylation, histone modifications, and non-coding RNA can be studied at a genome or near-genome scale using epigenomic approaches. At the single gene level, several studies have identified changes in epigenetic modifications in genes expressed in the kidney that correlate with the development of hypertension. Systematic analysis and integration of epigenetic marks at the genome scale, demonstration of cellular and physiological roles of specific epigenetic modifications, and investigation of inheritance are among the major challenges and opportunities for future epigenomic and epigenetic studies of hypertension. Essential hypertension is a multifactorial disease involving multiple genetic and environmental factors and mediated by alterations in multiple biological pathways. Because the non-genetic mechanisms may involve epigenetic modifications, epigenomics is one of the latest concepts and approaches brought to bear on hypertension research. In this article, we summarize briefly the concepts and techniques for epigenomics, discuss the rationale for applying epigenomic approaches to study hypertension, and review the current state of this research area. PMID:24011581
Nitrification inhibition by hexavalent chromium Cr(VI)--Microbial ecology, gene expression and off-gas emissions.

PubMed

Kim, Young Mo; Park, Hongkeun; Chandran, Kartik

2016-04-01

The goal of this study was to investigate the responses in the physiology, microbial ecology and gene expression of nitrifying bacteria to imposition of and recovery from Cr(VI) loading in a lab-scale nitrification bioreactor. Exposure to Cr(VI) in the reactor strongly inhibited nitrification performance, resulting in a parallel decrease in nitrate production and ammonia consumption. Cr(VI) exposure also led to an overall decrease in total bacterial concentrations in the reactor. However, the fraction of ammonia oxidizing bacteria (AOB) decreased to a greater extent than the fraction of nitrite oxidizing bacteria (NOB). In terms of functional gene expression, a rapid decrease in the transcript concentrations of the amoA gene coding for ammonia oxidation in AOB was observed in response to the Cr(VI) shock. In contrast, transcript concentrations of the nxrA gene coding for nitrite oxidation in NOB were relatively unchanged compared to Cr(VI) pre-exposure levels. Therefore, Cr(VI) exposure selectively and directly inhibited activity of AOB, which indirectly resulted in substrate (nitrite) limitation to NOB. Significantly, trends in amoA expression preceded performance trends both during imposition of and recovery from inhibition. During recovery from the Cr(VI) shock, the high ammonia concentrations in the bioreactor resulted in an irreversible shift towards AOB populations, which are expected to be more competitive in high ammonia environments. An inadvertent impact during recovery was increased emission of nitrous oxide (N2O) and nitric oxide (NO), consistent with recent findings linking AOB activity and the production of these gases. Therefore, Cr(VI) exposure elicited multiple responses in the microbial ecology, gene expression, and both aqueous and gaseous nitrogen conversions of a nitrification process. A complementary interrogation of these multiple responses facilitated an understanding of both direct and indirect inhibitory impacts on nitrification. 
Copyright © 2016 Elsevier Ltd. All rights reserved.
An engineered promoter driving expression of a microbial avirulence gene confers recognition of TAL effectors and reduces growth of diverse Xanthomonas strains in citrus.

PubMed

Shantharaj, Deepak; Römer, Patrick; Figueiredo, Jose F L; Minsavage, Gerald V; Krönauer, Christina; Stall, Robert E; Moore, Gloria A; Fisher, Latanya C; Hu, Yang; Horvath, Diana M; Lahaye, Thomas; Jones, Jeffrey B

2017-09-01

Xanthomonas citri ssp. citri (X. citri), causal agent of citrus canker, uses transcription activator-like effectors (TALEs) as major pathogenicity factors. TALEs, which are delivered into plant cells through the type III secretion system (T3SS), interact with effector binding elements (EBEs) in host genomes to activate the expression of downstream susceptibility genes to promote disease. Predictably, TALEs bind EBEs in host promoters via known combinations of TALE amino acids to DNA bases, known as the TALE code. We introduced 14 EBEs, matching distinct X. citri TALEs, into the promoter of the pepper Bs3 gene (ProBs3 1EBE), and fused this engineered promoter with multiple EBEs (ProBs3 14EBE) to either the β-glucuronidase (GUS) reporter gene or the coding sequence (cds) of the pepper gene, Bs3. TALE-induced expression of the Bs3 cds in citrus leaves resulted in no visible hypersensitive response (HR). Therefore, we utilized a different approach in which ProBs3 1EBE and ProBs3 14EBE were fused to the Xanthomonas gene, avrGf1, which encodes a bacterial effector that elicits an HR in grapefruit and sweet orange. We demonstrated, in transient assays, that activation of ProBs3 14EBE by X. citri TALEs is T3SS dependent, and that the expression of AvrGf1 triggers HR and correlates with reduced bacterial growth. We further demonstrated that all tested virulent X. citri strains from diverse geographical locations activate ProBs3 14EBE. TALEs are essential for the virulence of X. citri strains and, because the engineered promoter traps are activated by multiple TALEs, this concept has the potential to confer broad-spectrum, durable resistance to citrus canker in stably transformed plants. © 2016 BSPP AND JOHN WILEY & SONS LTD.
The Maximal C3 Self-Complementary Trinucleotide Circular Code X in Genes of Bacteria, Archaea, Eukaryotes, Plasmids and Viruses

PubMed Central

Michel, Christian J.

2017-01-01

In 1996, a set X of 20 trinucleotides was identified in genes of both prokaryotes and eukaryotes; on average, these trinucleotides occur more often in the reading frame than in its two shifted frames. Furthermore, X has an interesting mathematical property: it is a maximal C3 self-complementary trinucleotide circular code. In 2015, by quantifying the inspection approach used in 1996, the circular code X was confirmed in the genes of bacteria and eukaryotes and was also identified in the genes of plasmids and viruses. That method was based on the preferential occurrence of trinucleotides among the three frames at the gene population level. Here we extend this definition to the gene level. This new statistical approach gives all genes, long and short, the same weight when searching for the circular code X. As a consequence, the concept of circular code, in particular reading frame retrieval, is directly associated with each gene. At the gene level, the circular code X is strengthened in the genes of bacteria, eukaryotes, plasmids, and viruses, and is now also identified in the genes of archaea. The genes of mitochondria and chloroplasts contain a subset of the circular code X. Finally, by studying viral genes, the circular code X was found in DNA genomes, RNA genomes, double-stranded genomes, and single-stranded genomes. PMID:28420220
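The gene-level idea above compares, within a single gene, how often each trinucleotide occurs in reading frame 0 versus the frames shifted by one and two nucleotides. The sketch below is a simplified illustration under that reading: it uses raw counts and a strict "more than both shifted frames" rule, and the helper names are hypothetical. It does not reproduce the paper's statistic, normalization, or thresholds.

```python
from collections import Counter
from itertools import product

# All 64 trinucleotides over the DNA alphabet.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def frame_counts(gene: str) -> list[Counter]:
    """Count trinucleotides in frame 0 and in the frames shifted by 1 and 2 nt."""
    gene = gene.upper()
    counts = []
    for shift in range(3):
        c = Counter()
        # Step by 3 from the shift; stop so every slice is a full trinucleotide.
        for i in range(shift, len(gene) - 2, 3):
            c[gene[i:i + 3]] += 1
        counts.append(c)
    return counts


def frame0_preferred(gene: str) -> set[str]:
    """Trinucleotides seen more often in frame 0 than in either shifted frame."""
    f0, f1, f2 = frame_counts(gene)
    return {t for t in TRINUCLEOTIDES if f0[t] > f1[t] and f0[t] > f2[t]}


print(sorted(frame0_preferred("GAACTGGTC")))  # prints: ['CTG', 'GAA', 'GTC']
```

Because each gene yields its own set of frame-0-preferred trinucleotides, per-gene results could then be pooled with equal weight regardless of gene length, in the spirit of the approach described; how the paper actually combines them is not shown here.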
[Exome sequencing revealed Allan-Herndon-Dudley syndrome underlying multiple disabilities].

PubMed

Arvio, Maria; Philips, Anju K; Ahvenainen, Minna; Somer, Mirja; Kalscheuer, Vera; Järvelä, Irma

2014-01-01

Normal function of the thyroid gland is the cornerstone of a child's mental development and physical growth. We describe a Finnish family in which the diagnosis of three brothers was established only after investigations lasting more than 30 years. Two of the brothers have already died. In the third, a 16-year-old boy, exome sequencing of the complete X chromosome revealed a mutation in the SLC16A2 gene (also known as MCT8), which codes for a thyroid hormone transport protein. Allan-Herndon-Dudley syndrome was thus shown to be the cause of the multiple disabilities.
Expression of Multiple Stress Response Genes by Escherichia Coli Under Modeled Reduced Gravity

NASA Astrophysics Data System (ADS)

Vukanti, Raja; Leff, Laura G.

2012-09-01

Bacteria quickly regulate gene expression in response to changes in their environment; hence, transcriptional profiling has been widely used to characterize bacterial responses to various environmental conditions. In this study, we used clinorotation to grow bacteria under low-sedimentation, low-shear, and low-turbulence conditions (referred to below as modeled reduced gravity, MRG), which profoundly affect bacteria, including by elevating their resistance to multiple environmental stresses. To explore potential mechanisms behind the multiple stress resistance response to MRG, we used reverse transcription followed by real-time PCR to assess expression levels of E. coli genes involved in specific and general stress responses under MRG and normal gravity (NG), in nutritionally rich and minimal media, and during exponential and stationary phases of growth. In addition, growth rates and the physico-chemical parameters of the culture media were examined. Over-expression of stress response genes (csiD, cstA, katE, otsA, treA) occurred under MRG compared to NG controls, but only during the later stages of growth in rich medium, demonstrating that the bacterial response to MRG varies with growth medium and phase. At stationary phase in rich medium, E. coli under MRG and NG had similar growth rates (based on rRNA-leader abundance) and yields (cell mass and numbers); this, coupled with the simultaneous induction of starvation response genes (csiD and cstA), suggests that the multiple stress resistance phenotype under MRG could be attributable to microzones of nutrient unavailability around cells. Overall, in rich medium, the response resembled the general stress response (GSR) that E. coli develops during stationary phase of growth. Along these same lines, induction of genes coding for the GSR was reversed by improving nutritional conditions under MRG. 
The reversal of GSR under MRG suggests that the multiple stress response exhibited is not specific to MRG but may result from nutrient limitation experienced by bacteria after incubation in nutrient-rich media under these conditions.
Analysis of resistance genes of clinical Pannonibacter phragmitetus strain 31801 by complete genome sequencing.

PubMed

Ming, De-Song; Chen, Qing-Qing; Chen, Xiao-Tin

2018-05-14

To clarify at the genomic level the resistance mechanisms of Pannonibacter phragmitetus 31801, isolated from the blood of a liver abscess patient, we performed whole-genome sequencing using a PacBio RS II single-molecule real-time long-read sequencer. Bioinformatic analysis of the resulting sequence was then carried out to identify any possible resistance genes. Analyses included Basic Local Alignment Search Tool searches against the Antibiotic Resistance Genes Database, ResFinder analysis of the genome sequence, and Resistance Gene Identifier analysis within the Comprehensive Antibiotic Resistance Database. Prophages, clustered regularly interspaced short palindromic repeats (CRISPR), and other putative virulence factors were also identified using PHAST, CRISPRfinder, and the Virulence Factors Database, respectively. The circular chromosome and single plasmid of P. phragmitetus 31801 contained multiple antibiotic resistance genes, including those coding for three different types of β-lactamase [NPS β-lactamase (EC 3.5.2.6), β-lactamase class C, and a metal-dependent hydrolase of β-lactamase superfamily I]. In addition, genes coding for subunits of several multidrug-resistance efflux pumps were identified, including those targeting macrolides (adeJ, cmeB), tetracycline (acrB, adeAB), fluoroquinolones (acrF, ceoB), and aminoglycosides (acrD, amrB, ceoB, mexY, smeB). However, apart from the tripartite macrolide efflux pump macAB-tolC, the genome did not appear to contain the complete complement of subunit genes required for production of most of the major multidrug-resistance efflux pumps.
A Catalogue of Putative cis-Regulatory Interactions Between Long Non-coding RNAs and Proximal Coding Genes Based on Correlative Analysis Across Diverse Human Tumors.

PubMed

Basu, Swaraj; Larsson, Erik

2018-05-31

Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis. For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis-regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis-regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis-regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis-regulatory pairs that may serve as a basis for further experimental validation and characterization. Copyright © 2018 Basu and Larsson.
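The correlative screen described above, which looks for negatively associated non-coding/coding pairs across tumors, can be illustrated with a plain Pearson correlation. This is a minimal sketch under stated assumptions: the choice of Pearson correlation, the cutoff of -0.5, and the identifiers `lncX`, `geneA` and `geneB` are all hypothetical. The study's actual statistic, thresholds and any confounder handling are not reproduced here.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return sxy / sqrt(sxx * syy)


def negative_cis_candidates(pairs, expression, threshold=-0.5):
    """Return (non-coding id, coding id, r) for pairs with r <= threshold.

    `pairs` lists proximal (non-coding, coding) gene pairs; `expression` maps
    each id to its expression values across the same tumors. The threshold
    is a hypothetical cutoff, not the study's.
    """
    hits = []
    for nc_id, coding_id in pairs:
        r = pearson(expression[nc_id], expression[coding_id])
        if r <= threshold:
            hits.append((nc_id, coding_id, r))
    return hits


expression = {  # toy values across four hypothetical tumors
    "lncX": [1.0, 2.0, 3.0, 4.0],
    "geneA": [8.0, 6.0, 4.0, 2.0],
    "geneB": [1.0, 2.0, 3.0, 5.0],
}
pairs = [("lncX", "geneA"), ("lncX", "geneB")]
print(negative_cis_candidates(pairs, expression))  # prints: [('lncX', 'geneA', -1.0)]
```

Screening only for negative correlations mirrors the abstract's reasoning that such associations are less likely than positive ones to arise from causes other than cis-regulation.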
The Reference Genome of the Halophytic Plant Eutrema salsugineum

PubMed Central

Yang, Ruolin; Jarvis, David E.; Chen, Hao; Beilstein, Mark A.; Grimwood, Jane; Jenkins, Jerry; Shu, ShengQiang; Prochnik, Simon; Xin, Mingming; Ma, Chuang; Schmutz, Jeremy; Wing, Rod A.; Mitchell-Olds, Thomas; Schumaker, Karen S.; Wang, Xiangfeng

2013-01-01

Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratory model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage, sequenced using the traditional Sanger sequencing-based approach, with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species. PMID:23518688
A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning ~300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares ~30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.
Characteristics and significance of intergenic polyadenylated RNA transcription in Arabidopsis.

PubMed

Moghe, Gaurav D; Lehti-Shiu, Melissa D; Seddon, Alex E; Yin, Shan; Chen, Yani; Juntawong, Piyada; Brandizzi, Federica; Bailey-Serres, Julia; Shiu, Shin-Han

2013-01-01

The Arabidopsis (Arabidopsis thaliana) genome is the most well-annotated plant genome. However, transcriptome sequencing in Arabidopsis continues to suggest the presence of polyadenylated (polyA) transcripts originating from presumed intergenic regions. It is not clear whether these transcripts represent novel noncoding or protein-coding genes. To understand the nature of intergenic polyA transcription, we first assessed its abundance using multiple messenger RNA sequencing data sets. We found 6,545 intergenic transcribed fragments (ITFs) occupying 3.6% of Arabidopsis intergenic space. In contrast to transcribed fragments that map to protein-coding and RNA genes, most ITFs are significantly shorter, are expressed at significantly lower levels, and tend to be more data set specific. A surprisingly large number of ITFs (32.1%) may be protein coding based on evidence of translation. However, our results indicate that these "translated" ITFs tend to be close to and are likely associated with known genes. To investigate if ITFs are under selection and are functional, we assessed ITF conservation through cross-species as well as within-species comparisons. Our analysis reveals that 237 ITFs, including 49 with translation evidence, are under strong selective constraint and relatively distant from annotated features. These ITFs are likely parts of novel genes. However, the selective pressure imposed on most ITFs is similar to that of randomly selected, untranscribed intergenic sequences. Our findings indicate that despite the prevalence of ITFs, apart from the possibility of genomic contamination, many may be background or noisy transcripts derived from "junk" DNA, whose production may be inherent to the process of transcription and which, on rare occasions, may act as catalysts for the creation of novel genes.
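A figure such as "6,545 ITFs occupying 3.6% of intergenic space" is a base-pair coverage fraction. The sketch below shows one way to compute such a fraction, assuming half-open [start, end) intervals on a single chromosome; the coordinates in the test are made up, and the study's own pipeline is not described in the abstract.

```python
def merge(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(fragments, intergenic):
    """Fraction of intergenic bases overlapped by at least one transcribed fragment.
    Both inputs are merged first so that no base is counted twice."""
    frags = merge(fragments)
    regions = merge(intergenic)
    total = sum(e - s for s, e in regions)
    covered = 0
    for rs, re_ in regions:
        for fs, fe in frags:
            covered += max(0, min(re_, fe) - max(rs, fs))
    return covered / total if total else 0.0
```

The nested loop is quadratic and only meant for illustration; genome-scale data would normally use a sorted sweep or an interval tree.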
A Cytogenetic Abnormality and Rare Coding Variants Identify ABCA13 as a Candidate Gene in Schizophrenia, Bipolar Disorder, and Depression

PubMed Central

Knight, Helen M.; Pickard, Benjamin S.; Maclean, Alan; Malloy, Mary P.; Soares, Dinesh C.; McRae, Allan F.; Condie, Alison; White, Angela; Hawkins, William; McGhee, Kevin; van Beck, Margaret; MacIntyre, Donald J.; Starr, John M.; Deary, Ian J.; Visscher, Peter M.; Porteous, David J.; Cannon, Ronald E.; St Clair, David; Muir, Walter J.; Blackwood, Douglas H.R.

2009-01-01

Schizophrenia and bipolar disorder are leading causes of morbidity across all populations, with heritability estimates of ∼80% indicating a substantial genetic component. Population genetics and genome-wide association studies suggest an overlap of genetic risk factors between these illnesses but it is unclear how this genetic component is divided between common gene polymorphisms, rare genomic copy number variants, and rare gene sequence mutations. We report evidence that the lipid transporter gene ABCA13 is a susceptibility factor for both schizophrenia and bipolar disorder. After the initial discovery of its disruption by a chromosome abnormality in a person with schizophrenia, we resequenced ABCA13 exons in 100 cases with schizophrenia and 100 controls. Multiple rare coding variants were identified including one nonsense and nine missense mutations and compound heterozygosity/homozygosity in six cases. Variants were genotyped in additional schizophrenia, bipolar, depression (n > 1600), and control (n > 950) cohorts and the frequency of all rare variants combined was greater than controls in schizophrenia (OR = 1.93, p = 0.0057) and bipolar disorder (OR = 2.71, p = 0.00007). The population attributable risk of these mutations was 2.2% for schizophrenia and 4.0% for bipolar disorder. In a study of 21 families of mutation carriers, we genotyped affected and unaffected relatives and found significant linkage (LOD = 4.3) of rare variants with a phenotype including schizophrenia, bipolar disorder, and major depression. These data identify a candidate gene, highlight the genetic overlap between schizophrenia, bipolar disorder, and depression, and suggest that rare coding variants may contribute significantly to risk of these disorders. PMID:19944402
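The abstract reports odds ratios (1.93 and 2.71) and population attributable risks (2.2% and 4.0%) but not the carrier counts or the PAR formula used. The sketch below shows the standard arithmetic behind such figures, assuming a 2×2 carrier table and Levin's PAR formula, with the odds ratio standing in for relative risk (a reasonable approximation only for uncommon disorders). The counts in the test are hypothetical.

```python
def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio from a 2x2 table of variant carriers vs non-carriers."""
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    return (a * d) / (b * c)


def population_attributable_risk(exposure_freq, relative_risk):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1)),
    where p is the carrier frequency in the population."""
    excess = exposure_freq * (relative_risk - 1)
    return excess / (1 + excess)
```

For example, 10 carriers among 110 cases and 5 among 105 controls give an odds ratio of 2.0.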
RARE VARIANTS IN THE NEUROTROPHIN SIGNALING PATHWAY IMPLICATED IN SCHIZOPHRENIA RISK

PubMed Central

Kranz, Thorsten M.; Goetz, Ray R.; Walsh-Messinger, Julie; Goetz, Deborah; Antonius, Daniel; Dolgalev, Igor; Heguy, Adriana; Seandel, Marco; Malaspina, Dolores; Chao, Moses V.

2015-01-01

Multiple lines of evidence corroborate impaired signaling pathways as relevant to the underpinnings of schizophrenia. There has been an interest in neurotrophins, since they are crucial mediators of neurodevelopment and in synaptic connectivity in the adult brain. Neurotrophins and their receptors demonstrate aberrant expression patterns in cortical areas for schizophrenia cases in comparison to control subjects. Little is known about the contribution of neurotrophin genes in psychiatric disorders. To begin to address this issue, we conducted high-coverage targeted exome capture in a subset of neurotrophin genes in 48 comprehensively characterized cases with schizophrenia-related psychosis. We herein report rare missense polymorphisms and novel missense mutations in neurotrophin receptor signaling pathway genes. Furthermore, we observed that several genes have a higher propensity to harbor missense coding variants than others. Based on this initial analysis we suggest that rare variants and missense mutations in neurotrophin genes might represent genetic contributions involved across psychiatric disorders. PMID:26215504

A Cluster of Cuticle Protein Genes of Drosophila Melanogaster at 65a: Sequence, Structure and Evolution

PubMed Central

Charles, J. P.; Chihara, C.; Nejad, S.; Riddiford, L. M.

1997-01-01

A 36-kb genomic DNA segment of the Drosophila melanogaster genome containing 12 clustered cuticle genes has been mapped and partially sequenced. The cluster maps at 65A 5-6 on the left arm of the third chromosome, in agreement with the previously determined location of a putative cluster encompassing the genes for the third instar larval cuticle proteins LCP5, LCP6 and LCP8. This cluster is the largest cuticle gene cluster discovered to date and shows a number of surprising features that explain in part the genetic complexity of the LCP5, LCP6 and LCP8 loci. The genes encoding LCP5 and LCP8 are multiple copy genes and the presence of extensive similarity in their coding regions gives the first evidence for gene conversion in cuticle genes. In addition, five genes in the cluster are intronless. Four of these five have arisen by retroposition. The other genes in the cluster have a single intron located at an unusual location for insect cuticle genes. PMID:9383064
Ancient genomic architecture for mammalian olfactory receptor clusters

PubMed Central

Aloni, Ronny; Olender, Tsviya; Lancet, Doron

2006-01-01

Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214
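The abstract says syntenic orthologs were "subjected to a graph theory algorithm" to find CLICs but does not describe it. As a deliberately simplified stand-in (not the authors' method), the sketch below treats each (species, gene) as a node, each syntenic-ortholog pair as an edge, and reports connected components that span at least two species. The gene identifiers in the test are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def conserved_clusters(ortholog_pairs, min_species=2):
    """Group (species, gene) nodes linked by syntenic-ortholog edges into
    connected components, keeping those that span at least min_species species."""
    uf = UnionFind()
    for a, b in ortholog_pairs:
        uf.union(a, b)
    groups = defaultdict(set)
    for node in list(uf.parent):
        groups[uf.find(node)].add(node)
    return [g for g in groups.values() if len({sp for sp, _ in g}) >= min_species]
```

Plain connectivity can chain unrelated clusters through a single spurious edge, which is one reason a published method would likely use a stricter graph criterion.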
The mitochondrial ND1 m.3337G>A mutation associated to multiple mitochondrial DNA deletions in a patient with Wolfram syndrome and cardiomyopathy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mezghani, Najla; Mnif, Mouna; Mkaouar-Rebai, Emna, E-mail: emna_mkaouar@mail2world.com

Highlights: • We reported a patient with Wolfram syndrome and dilated cardiomyopathy. • We detected the ND1 mitochondrial m.3337G>A mutation in 3 tested tissues (blood leukocytes, buccal mucosa and skeletal muscle). • Long-range PCR amplification revealed the presence of multiple mitochondrial deletions in the skeletal muscle. • The deletions remove several tRNA and protein-coding genes. Abstract: Wolfram syndrome (WFS) is a rare hereditary disorder also known as DIDMOAD (diabetes insipidus, diabetes mellitus, optic atrophy, and deafness). It is a heterogeneous disease and full characterization of all clinical and biological features of this disorder is difficult. The wide spectrum of clinical expression, affecting several organs and tissues, and the similarity in phenotype between patients with Wolfram syndrome and those with certain types of respiratory chain diseases suggests mitochondrial DNA (mtDNA) involvement in Wolfram syndrome patients. We report a Tunisian patient with clinical features of moderate Wolfram syndrome including diabetes, dilated cardiomyopathy and neurological complications. The results showed the presence of the mitochondrial ND1 m.3337G>A mutation in almost homoplasmic form in 3 tested tissues of the proband (blood leukocytes, buccal mucosa and skeletal muscle). In addition, the long-range PCR amplifications revealed the presence of multiple deletions of the mitochondrial DNA extracted from the patient's skeletal muscle removing several tRNA and protein-coding genes. Our study reported a Tunisian patient with clinical features of moderate Wolfram syndrome associated with cardiomyopathy, in whom we detected the ND1 m.3337G>A mutation with multiple mitochondrial DNA deletions.
The agents of natural genome editing.

PubMed

Witzany, Guenther

2011-06-01

DNA serves as a stable information storage medium, and every protein the cell needs is produced from this blueprint via an RNA intermediate code. More recently it was found that an abundance of various RNA elements cooperate in a variety of steps and substeps as regulatory and catalytic units with multiple competencies to act on RNA transcripts. On the one hand, natural genome editing is the competent agent-driven generation and integration of meaningful DNA nucleotide sequences into pre-existing genomic content arrangements, and the ability to (re-)combine and (re-)regulate them according to context-dependent (i.e. adaptational) purposes of the host organism. On the other hand, natural genome editing designates the integration of all RNA activities acting on RNA transcripts without altering DNA-encoded genes. If we take the genetic code seriously as a natural code, there must be agents that are competent to act on this code, because no natural code codes itself, just as no natural language speaks itself. As code editing agents, viral and subviral agents have been suggested because there are several indicators that demonstrate viruses competent in both RNA and DNA natural genome editing.
gsSKAT: Rapid gene set analysis and multiple testing correction for rare-variant association studies using weighted linear kernels.

PubMed

Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J

2017-05-01

Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
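gsSKAT derives its own estimators of the effective number of tests, which the abstract does not give. As a plainly swapped-in, well-known alternative, the sketch below uses Nyholt's (2004) spectral-decomposition estimate, M_eff = 1 + (M - 1)(1 - Var(λ)/M), computed from the eigenvalues λ of the correlation matrix among the M test statistics, and then a Bonferroni-style threshold alpha / M_eff. The correlation matrices in the test are toy examples, not gene-set data.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Nyholt-style M_eff from an M x M correlation matrix of test statistics.
    Eigenvalues of a correlation matrix average 1, so Var = sum((lambda - 1)^2) / (M - 1)."""
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    if m == 1:
        return 1.0
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
    var = np.sum((eigvals - 1.0) ** 2) / (m - 1)
    return 1.0 + (m - 1) * (1.0 - var / m)


def fwer_threshold(corr, alpha=0.05):
    """Per-test significance threshold that controls FWER at roughly alpha."""
    return alpha / effective_number_of_tests(corr)
```

Independent tests (identity matrix) give M_eff = M, and perfectly correlated tests give M_eff = 1. This is the behaviour that matters when overlapping gene-set definitions induce correlated statistics.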
An integrated PCR colony hybridization approach to screen cDNA libraries for full-length coding sequences.

PubMed

Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain

2011-01-01

cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
Decoding the disease-associated proteins encoded in the human chromosome 4.

PubMed

Chen, Lien-Chin; Liu, Mei-Ying; Hsiao, Yung-Chin; Choong, Wai-Kok; Wu, Hsin-Yi; Hsu, Wen-Lian; Liao, Pao-Chi; Sung, Ting-Yi; Tsai, Shih-Feng; Yu, Jau-Song; Chen, Yu-Ju

2013-01-04

Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found on this chromosome, including genes associated with genetic diseases (e.g., hepatocellular carcinoma) and genes related to biomedical research areas (cardiac system, aging, metabolic disorders, immune system, cancer and stem cells), such as oncogenes and growth factors. As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive biomarker discovery and validation studies in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/sheddable proteins. Additionally, we identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins and for which selected or multiple reaction monitoring (SRM/MRM) assays have been successfully developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins, while 27.9% of the 757 proteins have no experimental evidence at the protein level. In summary, the analysis revealed that chromosome 4 is a rich resource of cancer-associated proteins for biomarker verification projects and for drug target discovery projects.
APADB: a database for alternative polyadenylation and microRNA regulation events

PubMed Central

Müller, Sören; Rycak, Lukas; Afonso-Grunz, Fabian; Winter, Peter; Zawada, Adam M.; Damrath, Ewa; Scheider, Jessica; Schmäh, Juliane; Koch, Ina; Kahl, Günter; Rotter, Björn

2014-01-01

Alternative polyadenylation (APA) is a widespread mechanism that contributes to the sophisticated dynamics of gene regulation. Approximately 50% of all protein-coding human genes harbor multiple polyadenylation (PA) sites; their selective and combinatorial use gives rise to transcript variants with differing length of their 3′ untranslated region (3′UTR). Shortened variants escape UTR-mediated regulation by microRNAs (miRNAs), especially in cancer, where global 3′UTR shortening accelerates disease progression, dedifferentiation and proliferation. Here we present APADB, a database of vertebrate PA sites determined by 3′ end sequencing, using massive analysis of complementary DNA ends. APADB provides (A)PA sites for coding and non-coding transcripts of human, mouse and chicken genes. For human and mouse, several tissue types, including different cancer specimens, are available. APADB records the loss of predicted miRNA binding sites and visualizes next-generation sequencing reads that support each PA site in a genome browser. The database tables can either be browsed according to organism and tissue or alternatively searched for a gene of interest. APADB is the largest database of APA in human, chicken and mouse. The stored information provides experimental evidence for thousands of PA sites and APA events. APADB combines 3′ end sequencing data with prediction algorithms of miRNA binding sites, which may help to improve these prediction algorithms further. Current databases lack correct information about 3′UTR lengths, especially for chicken, and APADB provides necessary information to close this gap. Database URL: http://tools.genxpro.net/apadb/ PMID:25052703
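APADB "records the loss of predicted miRNA binding sites" when a shorter 3′UTR is used, but the abstract does not describe how. The sketch below illustrates the underlying idea under simple assumptions: all coordinates are transcript positions with 0 as the first 3′UTR base, each PA site marks a possible UTR end, and a site counts as lost if it lies fully inside the longest UTR but not inside the shortest. The miRNA names and positions in the test are hypothetical.

```python
def mirna_sites_lost(pa_positions, mirna_sites):
    """pa_positions: 3'UTR end coordinates of alternative PA sites.
    mirna_sites: (name, start, end) tuples, half-open, in the same coordinates.
    Returns names of sites kept with the most distal PA site but lost with the most proximal one."""
    proximal, distal = min(pa_positions), max(pa_positions)
    lost = []
    for name, start, end in mirna_sites:
        in_long_utr = end <= distal    # site fully contained in the long isoform
        in_short_utr = end <= proximal # site fully contained in the short isoform
        if in_long_utr and not in_short_utr:
            lost.append(name)
    return lost
```

With PA sites at 500 and 1500, a site at 800-807 would be dropped by the short isoform, whereas a site at 100-107 would be retained.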
Target genes discovery through copy number alteration analysis in human hepatocellular carcinoma.

PubMed

Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng

2013-12-21

High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analysis of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients.
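The "profiling of recurrent and overlapping amplicons" described above comes down to asking, gene by gene, in what fraction of tumours the gene overlaps a gained segment. The sketch below makes that idea concrete. The log2-ratio gain cut-off (0.3), the recurrence cut-off (20% of samples), the segment format and the gene names are all hypothetical choices and are not taken from the review.

```python
def recurrent_gains(genes, segments_by_sample, gain_cutoff=0.3, min_fraction=0.2):
    """genes: name -> (chrom, start, end).
    segments_by_sample: sample -> list of (chrom, start, end, log2_ratio) CNA segments.
    Returns {gene: fraction of samples in which the gene overlaps a gained segment}
    for genes gained in at least min_fraction of samples."""
    n = len(segments_by_sample)
    result = {}
    for gene, (gc, gs, ge) in genes.items():
        hits = 0
        for segs in segments_by_sample.values():
            # half-open overlap test plus a copy-number gain threshold
            if any(c == gc and s < ge and gs < e and lr >= gain_cutoff
                   for c, s, e, lr in segs):
                hits += 1
        frac = hits / n
        if frac >= min_fraction:
            result[gene] = frac
    return result
```

Homozygous deletions could be profiled in the same way by replacing the gain test with a strongly negative log2-ratio cut-off.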
Microbial genotype-phenotype mapping by class association rule mining.

PubMed

Tamura, Makio; D'haeseleer, Patrik

2008-07-01

Microbial phenotypes are typically due to the concerted action of multiple gene functions, yet the presence of each gene may have only a weak correlation with the observed phenotype. Hence, it may be more appropriate to examine co-occurrence between sets of genes and a phenotype (multiple-to-one) instead of pairwise relations between a single gene and the phenotype. Here, we propose an efficient class association rule mining algorithm, netCAR, in order to extract sets of COGs (clusters of orthologous groups of proteins) associated with a phenotype from COG phylogenetic profiles and a phenotype profile. netCAR takes into account the phylogenetic co-occurrence graph between COGs to restrict hypothesis space, and uses mutual information to evaluate the biconditional relation. We examined the mining capability of pairwise and multiple-to-one association by using netCAR to extract COGs relevant to six microbial phenotypes (aerobic, anaerobic, facultative, endospore, motility and Gram negative) from 11,969 unique COG profiles across 155 prokaryotic organisms. With the same level of false discovery rate, multiple-to-one association can extract about 10 times more relevant COGs than one-to-one association. We also reveal various topologies of association networks among COGs (modules) from extracted multiple-to-one correlation rules relevant to the six phenotypes, including a well-connected network for motility, a star-shaped network for aerobic and intermediate topologies for the other phenotypes. netCAR outperforms a standard CAR mining algorithm, CARapriori, while requiring several orders of magnitude less computational time for extracting 3-COG sets. Source code of the Java implementation is available as Supplementary Material at the Bioinformatics online website, or upon request to the author. Supplementary data are available at Bioinformatics online.
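To show why a multiple-to-one rule can outscore single genes, the sketch below computes the mutual information between a phenotype profile and the binary profile "organism carries every COG in the set". This covers only the scoring idea named in the abstract. netCAR's co-occurrence-graph pruning and rule search are not reproduced, and the COG identifiers and profiles in the test are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between two equal-length discrete profiles."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def cog_set_profile(profiles, cog_set):
    """1 for organisms that carry every COG in cog_set, otherwise 0."""
    n_org = len(next(iter(profiles.values())))
    return [int(all(profiles[c][i] for c in cog_set)) for i in range(n_org)]
```

In the test data, neither COG alone tracks the phenotype perfectly, but their conjunction does, giving the maximum 1 bit for a balanced binary phenotype.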
Multiple Neuropeptide-Coding Genes Involved in Planarian Pharynx Extension.

PubMed

Shimoyama, Seira; Inoue, Takeshi; Kashima, Makoto; Agata, Kiyokazu

2016-06-01

Planarian feeding behavior involves three steps: moving toward food, extending the pharynx from the ventral side after arriving at the food, and ingesting the food through the pharynx. Although pharynx extension is a remarkable behavior, it remains unknown what neuronal cell types are involved in its regulation. To identify neurons involved in regulating pharynx extension, we quantitatively analyzed pharynx extension and sought to identify these neurons by RNA interference (RNAi) and in situ hybridization. This assay, when performed using planarians with amputation of various body parts, clearly showed that the head portion is indispensable for inducing pharynx extension. We thus tested the effects of knockdown of brain neurons such as serotonergic, GABAergic, and dopaminergic neurons by RNAi, but did not observe any effects on pharynx extension behavior. However, animals with RNAi of the Prohormone Convertase 2 (PC2, a neuropeptide processing enzyme) gene did not perform the pharynx extension behavior, suggesting the possible involvement of neuropeptide(s) in the regulation of pharynx extension. We screened 24 neuropeptide-coding genes, analyzed their functions by RNAi using the pharynx extension assay system, and identified at least five neuropeptide genes involved in pharynx extension. These were expressed in different cells or neurons, and some of them were expressed in the brain, suggesting complex regulation of planarian feeding behavior by the nervous system.
[Overexpression of four fatty acid synthase genes elevated the efficiency of long-chain polyunsaturated fatty acids biosynthesis in mammalian cells].

PubMed

Zhu, Guiming; Saleh, Abdulmomen Ali Mohammed; Bahwal, Said Ahmed; Wang, Kunfu; Wang, Mingfu; Wang, Didi; Ge, Tangdong; Sun, Jie

2014-09-01

Three long-chain polyunsaturated fatty acids, docosahexaenoic acid (DHA, 22:6n-3), eicosapentaenoic acid (EPA, 20:5n-3) and arachidonic acid (ARA, 20:4n-6), are the most biologically active polyunsaturated fatty acids in the body. They are important in developing and maintaining brain function, and in preventing and treating many diseases such as cardiovascular disease, inflammation and cancer. Although mammals can biosynthesize these long-chain polyunsaturated fatty acids, the efficiency is very low and dietary intake is needed to meet the requirement. In this study, a multiple-gene expression vector carrying the coding genes for mammalian Δ6/Δ5 fatty acid desaturases and Δ6/Δ5 fatty acid elongases was used to transfect HEK293T cells, and overexpression of the target genes was then detected. GC-MS analysis shows that the biosynthesis efficiency and level of DHA, EPA and ARA were significantly increased in cells transfected with the multiple-gene expression vector. In particular, the DHA level in these cells was 2.5 times higher than in the control cells. This study indicates that mammals possess a mechanism that suppresses high-level biosynthesis of long-chain polyunsaturated fatty acids, and that overexpression of Δ6/Δ5 fatty acid desaturases and Δ6/Δ5 fatty acid elongases overcame this suppression so that the levels of DHA, EPA and ARA were significantly increased. This study also provides a basis for potential applications of this gene construct in transgenic animals to produce high levels of these long-chain polyunsaturated fatty acids.
A genome-wide association study for somatic cell score using the Illumina high-density bovine beadchip identifies several novel QTL potentially related to mastitis susceptibility

PubMed Central

Meredith, Brian K.; Berry, Donagh P.; Kearney, Francis; Finlay, Emma K.; Fahey, Alan G.; Bradley, Daniel G.; Lynn, David J.

2013-01-01

Mastitis is an inflammation-driven disease of the bovine mammary gland that occurs in response to physical damage or infection and is one of the most costly production-related diseases in the dairy industry worldwide. We performed a genome-wide association study (GWAS) to identify genetic loci associated with somatic cell score (SCS), an indicator trait of mammary gland inflammation. A total of 702 Holstein-Friesian bulls were genotyped for 777,962 single nucleotide polymorphisms (SNPs) and associated with SCS phenotypes. The SCS phenotypes were expressed as daughter yield deviations (DYD) based on a large number of progeny performance records. A total of 138 SNPs on 15 different chromosomes reached genome-wide significance (corrected p-value ≤ 0.05) for association with SCS (after correction for multiple testing). We defined 28 distinct QTL regions and a number of candidate genes located in these QTL regions were identified. The most significant association (p-value = 1.70 × 10−7) was observed on chromosome 6. This QTL had no known genes annotated within it, however, the Ensembl Genome Browser predicted the presence of a small non-coding RNA (a Y RNA gene) in this genomic region. This Y RNA gene was 99% identical to human RNY4. Y RNAs are a rare type of non-coding RNA that were originally discovered due to their association with the autoimmune disease, systemic lupus erythematosus. Examining small-RNA sequencing (RNAseq) data being generated by us in multiple different mastitis-pathogen challenged cell-types has revealed that this Y RNA is expressed (but not differentially expressed) in these cells. Other QTL regions identified in this study also encoded strong candidate genes for mastitis susceptibility. A QTL region on chromosome 13, for example, was found to contain a cluster of β-defensin genes, a gene family with known roles in innate immunity. Due to the increased SNP density, this study also refined the boundaries for several known QTL for SCS and mastitis. 
PMID:24223582
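The abstract reports SNPs reaching "corrected p-value ≤ 0.05" but does not name the multiple-testing procedure. One common choice in GWAS is the Benjamini-Hochberg false discovery rate adjustment, sketched below as an assumption rather than as the method the authors used. The p-values in the test are toy numbers.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Indices of tests whose adjusted p-value is at most alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]
```

For comparison, a plain Bonferroni threshold over 777,962 SNPs would be 0.05 / 777,962 ≈ 6.4 × 10⁻⁸.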
A Novel Method for Determining the Level of Viable Disseminated Prostate Cancer Cells

DTIC Science & Technology

2012-10-01

Metridia luciferase, for use in a real-time viability assay for mammalian cells. The coding region of the marine copepod gene has been codon optimized for...need for multiple replicates of plates in time course studies. Recently a naturally secreted luciferase was identified and cloned from the marine ...well solid white flat bottom polystyrene microplates (Corning, Cat#3917, Lowell, MA). After 24 hours, conditioned media was harvested and remaining
Distinctive mitochondrial genome of Calanoid copepod Calanus sinicus with multiple large non-coding regions and reshuffled gene order: Useful molecular markers for phylogenetic and population studies

PubMed Central

2011-01-01

Background Copepods are highly diverse and abundant, resulting in extensive ecological radiation in marine ecosystems. Calanus sinicus dominates continental shelf waters in the northwest Pacific Ocean and plays an important role in the local ecosystem by linking primary production to higher trophic levels. A lack of effective molecular markers has hindered phylogenetic and population genetic studies of copepods. Because they are informative at the genome level, mitochondrial DNA sequences can be used as markers for population genetic and phylogenetic studies. Results The mitochondrial genome of C. sinicus is distinct from those of other arthropods owing to the concurrence of multiple non-coding regions and a reshuffled gene arrangement. Further particularities of the C. sinicus mitogenome include low A + T content, symmetrical nucleotide composition between strands, abbreviated stop codons in several protein-coding genes (PCGs) and extended lengths of the genes atp6 and atp8 relative to other copepods. The monophyletic Copepoda should be placed within the Vericrustacea. The close affinity between Cyclopoida and Poecilostomatoida suggests reassigning the latter as subordinate to the former. Monophyly of Maxillopoda is rejected. Within the alignment of 11 C. sinicus mitogenomes, there are 397 variable sites harbouring three 'hotspot' variable sites and three microsatellite loci. Conclusion The appearance of a circular subgenomic fragment in laboratory assays suggests that special caution should be taken when sequencing mitogenomes using long PCR. Such a phenomenon may provide additional evidence of mitochondrial DNA recombination, which appears to have been a prerequisite for shaping the present mitochondrial profile of C. sinicus during its evolution. The lack of synapomorphic gene arrangements among copepods casts doubt on the usefulness of gene order as a molecular marker for deep phylogenetic analysis.
However, mitochondrial genomic sequences have been valuable markers for resolving phylogenetic issues concerning copepods. The variable site maps of C. sinicus mitogenomes provide a solid foundation for population genetic studies. PMID:21269523
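Counting the variable sites in a multiple alignment, such as the 11-mitogenome alignment described above, can be sketched as follows. This is a minimal illustration under our own assumptions: gaps ('-') and ambiguous characters ('N', '?') are ignored, and a column counts as variable when at least two distinct nucleotides remain. The abstract does not say how the authors treated gaps or ambiguity, and the function name is hypothetical.

```python
def variable_sites(alignment, ignore=frozenset("-N?")):
    """Return 0-based column indices at which the aligned sequences differ."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for col in range(lengths.pop()):
        # Distinct unambiguous bases observed in this column.
        bases = {seq[col].upper() for seq in alignment} - ignore
        if len(bases) > 1:
            sites.append(col)
    return sites


toy_alignment = ["ACGTA", "ACGTT", "AC-TA"]  # hypothetical sequences
print(variable_sites(toy_alignment))  # [4]
```

With this rule the gap in column 2 does not make that column variable; treating gaps as a fifth state would be an equally defensible choice that changes the count.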
Multiple transcription factor codes activate epidermal wound–response genes in Drosophila

PubMed Central

Pearson, Joseph C.; Juarez, Michelle T.; Kim, Myungjin; Drivenes, Øyvind; McGinnis, William

2009-01-01

Wounds in Drosophila and mouse embryos induce similar genetic pathways to repair epidermal barriers. However, the transcription factors that transduce wound signals to repair epidermal barriers are largely unknown. We characterize the transcriptional regulatory enhancers of four genes (Ddc, ple, msn, and kkv) that are rapidly activated in epidermal cells surrounding wounds in late Drosophila embryos and early larvae. These epidermal wound enhancers all contain evolutionarily conserved sequences matching binding sites for JUN/FOS and GRH transcription factors, but vary widely in trans- and cis-requirements for these inputs and their binding sites. We propose that the combination of GRH and FOS is part of an ancient wound-response pathway still used in vertebrates and invertebrates, but that other mechanisms have evolved that result in similar transcriptional output. A common but largely untested assumption of bioinformatic analyses of gene regulatory networks is that transcription units activated in the same spatial and temporal patterns require the same cis-regulatory codes. Our results indicate that this is an overly simplistic view. PMID:19168633
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data

PubMed Central

Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick

2011-01-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions have become a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
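One intuition behind using substitution patterns to detect coding regions is that, in a conserved protein-coding region, many nucleotide differences between aligned sequences leave the encoded amino acid unchanged. The sketch below only counts synonymous versus non-synonymous codon differences between two aligned, in-frame, gap-free sequences using the standard genetic code. It is not RNAcode's scoring model, which the abstract describes as an explicit statistical model combining substitution and gap patterns; the function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG
# order with the first base varying slowest, matching the amino-acid string.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def classify_codon_changes(seq_a, seq_b):
    """Count (synonymous, non-synonymous) codon differences between two
    aligned, in-frame, gap-free DNA sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Same amino acid (or both stop codons) means a synonymous change.
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


print(classify_codon_changes("ATGGCT", "ATGGCC"))  # (1, 0): GCT and GCC both encode Ala
print(classify_codon_changes("ATGGCT", "ATGGAT"))  # (0, 1): Ala -> Asp
```

An excess of synonymous changes across many aligned sequences hints at purifying selection on a protein, whereas a noncoding region shows no such codon-structured bias; a real detector also has to model alignment gaps, reading frame and background substitution rates.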
RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.

PubMed

Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick

2011-04-01

With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions have become a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

PubMed

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-10-03

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation of diseases and uses the vectorized data in a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The representation consists of two sub-vectors. The first has 45 elements, characterizing the information entropy of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes.
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes

PubMed Central

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-01-01

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector representation of diseases and uses the vectorized data in a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. The representation consists of two sub-vectors. The first has 45 elements, characterizing the information entropy of the distribution of disease genes over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while others (e.g., the chromosome 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first in differentiating between diseases. Our prediction method outperforms state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274

A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

PubMed

Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

2017-08-30

Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors, and those elements can influence the expression breadth and functionality of these lncRNAs.
Regularized rare variant enrichment analysis for case-control exome sequencing data.

PubMed

Larson, Nicholas B; Schaid, Daniel J

2014-02-01

Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single-marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single-marker approaches are poorly powered for typical sequencing study sample sizes. Advances in sequencing technologies have made next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular provides not only base-level resolution of coding regions but also a natural framework for aggregation by gene and exon. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We use extensive simulations to evaluate the sensitivity and specificity of these approaches, comparing the results with multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
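The sparse group LASSO mentioned above combines an element-wise L1 penalty with a group-wise L2 penalty, so it can zero out whole genes (groups) and also individual exons within a retained gene. A minimal sketch of its proximal operator, the shrinkage step used inside proximal-gradient fitting, is shown below in plain Python. The penalty form λ(α‖β‖₁ + (1 − α) Σ_g √p_g ‖β_g‖₂), the √p_g group weight and all names are our assumptions for illustration, not the authors' fitting procedure.

```python
import math


def soft_threshold(x, t):
    """Scalar soft-thresholding: the proximal operator of t * |x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def sparse_group_lasso_prox(beta, groups, lam, alpha):
    """Proximal step for lam * (alpha * ||beta||_1
    + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2).

    beta   : coefficients, e.g. one aggregated score per exon (hypothetical)
    groups : lists of indices into beta that partition it, e.g. exons by gene
    """
    out = [0.0] * len(beta)
    for idx in groups:
        # 1) Element-wise lasso shrinkage within the group.
        z = [soft_threshold(beta[i], lam * alpha) for i in idx]
        norm = math.sqrt(sum(v * v for v in z))
        # 2) Group shrinkage: the whole group is zeroed if its norm is small.
        group_t = lam * (1 - alpha) * math.sqrt(len(idx))
        scale = max(0.0, 1.0 - group_t / norm) if norm > 0 else 0.0
        for i, v in zip(idx, z):
            out[i] = scale * v
    return out


# Two hypothetical genes with two exons each.
print(sparse_group_lasso_prox([3.0, -0.5, 0.2, 0.1], [[0, 1], [2, 3]], lam=1.0, alpha=0.5))
```

With alpha = 1 the step reduces to the ordinary LASSO soft-thresholding; with alpha = 0 it becomes the group LASSO, which keeps or drops whole groups.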
Comparison of the protein-coding gene content of Chlamydia trachomatis and Protochlamydia amoebophila using a Raspberry Pi computer.

PubMed

Robson, James F; Barker, Daniel

2015-10-13

To demonstrate the bioinformatics capabilities of a low-cost computer, the Raspberry Pi, we present a comparison of the protein-coding gene content of two species in phylum Chlamydiae: Chlamydia trachomatis, a common sexually transmitted infection of humans, and Candidatus Protochlamydia amoebophila, a recently discovered amoebal endosymbiont. Identifying species-specific proteins and differences in protein families could provide insights into the unique phenotypes of the two species. Using a Raspberry Pi computer, sequence similarity-based protein families were predicted across the two species, C. trachomatis and P. amoebophila, and their members counted. Examples include nine multi-protein families unique to C. trachomatis, 132 multi-protein families unique to P. amoebophila and one family with multiple copies in both. Most families unique to C. trachomatis were polymorphic outer-membrane proteins. Additionally, multiple protein families lacking functional annotation were found. Predicted functional interactions suggest that one of these families is involved with the exodeoxyribonuclease V complex. The Raspberry Pi computer is adequate for a comparative genomics project of this scope. The protein families unique to P. amoebophila may provide a basis for investigating the host-endosymbiont interaction. However, additional species should be included, and further laboratory research is required to identify the functions of unknown or putative proteins. Multiple outer membrane proteins were found in C. trachomatis, suggesting their importance for host evasion. The tyrosine transport protein family is shared between both species, with four proteins in C. trachomatis and two in P. amoebophila. Shared protein families could provide a starting point for discovery of wide-spectrum drugs against Chlamydiae.
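Once protein families have been predicted, the counts quoted above (multi-protein families unique to each species, and families with multiple copies in both) amount to simple bookkeeping over family membership. The sketch below assumes families are already available as a mapping from a family ID to (species, protein ID) pairs; the family-prediction step itself is not shown, and the function name, species labels and toy data are hypothetical.

```python
from collections import Counter


def summarize_families(families, species_a, species_b):
    """Classify multi-protein families as unique to species_a, unique to
    species_b, or present in multiple copies in both.

    families maps a family ID to a list of (species, protein_id) members.
    """
    summary = {species_a: [], species_b: [], "shared_multicopy": []}
    for fam_id, members in sorted(families.items()):
        counts = Counter(species for species, _ in members)
        n_a, n_b = counts.get(species_a, 0), counts.get(species_b, 0)
        if n_a >= 2 and n_b == 0:
            summary[species_a].append(fam_id)
        elif n_b >= 2 and n_a == 0:
            summary[species_b].append(fam_id)
        elif n_a >= 2 and n_b >= 2:
            summary["shared_multicopy"].append(fam_id)
    return summary


toy = {
    "F1": [("Ct", "p1"), ("Ct", "p2")],
    "F2": [("Pa", "q1"), ("Pa", "q2"), ("Pa", "q3")],
    "F3": [("Ct", "p3"), ("Ct", "p4"), ("Pa", "q4"), ("Pa", "q5")],
    "F4": [("Ct", "p5"), ("Pa", "q6")],
}
print(summarize_families(toy, "Ct", "Pa"))
```

Families with a single member per species (such as F4 here) fall into none of the three categories, which mirrors the abstract's focus on multi-protein families.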
Detection of Multiple Budding Yeast Cells and a Partial Sequence of 43-kDa Glycoprotein Coding Gene of Paracoccidioides brasiliensis from a Case of Lacaziosis in a Female Pacific White-Sided Dolphin (Lagenorhynchus obliquidens).

PubMed

Minakawa, Tomoko; Ueda, Keiichi; Tanaka, Miyuu; Tanaka, Natsuki; Kuwamura, Mitsuru; Izawa, Takeshi; Konno, Toshihiro; Yamate, Jyoji; Itano, Eiko Nakagawa; Sano, Ayako; Wada, Shinpei

2016-08-01

Lacaziosis, formerly called lobomycosis, is a zoonotic mycosis caused by Lacazia loboi. It is found in humans and dolphins and is endemic in countries bordering the Atlantic and Indian Oceans and along the Pacific coast of Japan. Susceptible cetacean species include the bottlenose dolphin (Tursiops truncatus), the Indian Ocean bottlenose dolphin (T. aduncus), and the estuarine dolphin (Sotalia guianensis); however, no cases had been recorded in other cetacean species. We diagnosed a case of lacaziosis in a Pacific white-sided dolphin (Lagenorhynchus obliquidens) kept in an aquarium in Japan. The dolphin was a female estimated to be more than 14 years old at the end of June 2015; she had been captured off the coast of the Sea of Japan in 2001. Multiple lobose, solid granulomatous lesions, with or without ulcers, appeared on the skin of her jaw, back, flippers and fluke in July 2014. The granulomatous skin lesions in the present case were similar to those of our previous cases. Multiple budding cells and chains of round yeast cells were detected in the biopsy samples. A partial sequence of the 43-kDa glycoprotein coding gene, obtained by nested PCR and sequencing, revealed a genotype different from those of both Amazonian and Japanese lacaziosis in bottlenose dolphins and was 99% identical to sequences derived from Paracoccidioides brasiliensis, a sister fungal species of L. loboi. This is the first reported case of lacaziosis in a Pacific white-sided dolphin.
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.

PubMed

Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S

2016-01-01

Characterizing the genetic determinants of complex diseases can be further improved by incorporating knowledge of the underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify associations of protein-coding genes with 14 cardiometabolic (CMD) related traits across 6 publicly available genome-wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine whether the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings is validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package in R version 3.2.1. We demonstrate that class-level testing complements the common first-stage minP approach, which involves individual SNP-level testing followed by post-hoc assignment of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis that go beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis.
GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
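GenCAT is described as combining SNP-level test statistics within a class while accounting for the class size and its correlation structure. One standard way to do this, sketched below in plain Python under our own assumptions, is to decorrelate the SNP z-scores with the Cholesky factor of the SNP correlation (LD) matrix and sum the squared transformed scores. The result equals zᵀR⁻¹z and, under the null hypothesis, is approximately chi-square distributed with as many degrees of freedom as there are SNPs in the class. This illustrates the idea only; it is not code from the GenCAT R package, and the function names are hypothetical.

```python
import math


def cholesky(matrix):
    """Lower-triangular L with matrix = L @ L^T (matrix must be positive
    definite; otherwise math.sqrt raises ValueError)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def class_statistic(z, ld):
    """Return (statistic, df) for a class of SNPs with z-scores z and
    correlation matrix ld; the statistic equals z^T ld^{-1} z."""
    L = cholesky(ld)
    # Forward substitution solves L w = z, so w = L^{-1} z is decorrelated.
    w = []
    for i in range(len(z)):
        w.append((z[i] - sum(L[i][k] * w[k] for k in range(i))) / L[i][i])
    return sum(v * v for v in w), len(z)


stat, df = class_statistic([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(round(stat, 4), df)  # 1.3333 2
```

The p-value would then come from the upper tail of a chi-square distribution with `df` degrees of freedom (for example via a statistics library). Ignoring the correlation, by simply summing z², would overstate the evidence when SNPs in the class are in strong LD.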
Population-specific association of genes for telomere-associated proteins with longevity in an Italian population.

PubMed

Crocco, Paolina; Barale, Roberto; Rose, Giuseppina; Rizzato, Cosmeri; Santoro, Aurelia; De Rango, Francesco; Carrai, Maura; Fogar, Paola; Monti, Daniela; Biondi, Fiammetta; Bucci, Laura; Ostan, Rita; Tallaro, Federica; Montesanto, Alberto; Zambon, Carlo-Federico; Franceschi, Claudio; Canzian, Federico; Passarino, Giuseppe; Campa, Daniele

2015-06-01

Leukocyte telomere length (LTL) has been observed to be heritable and correlated with longevity. However, contrasting results have been reported in different populations on the value of LTL heritability and on how telomere biology influences longevity. We investigated whether the variability of genes related to telomere maintenance is associated with telomere length and affects longevity in a population from southern Italy (20-106 years). For this purpose we analyzed thirty-one polymorphisms in eight telomerase-associated genes: twelve in the genes coding for the core enzyme (TERT and TERC) and the remainder in genes coding for components of the telomerase complex (TERF1, TERF2, TERF2IP, TNKS, TNKS2 and TEP1). After correcting for multiple testing, we observed no statistically significant associations between SNPs and LTL, possibly suggesting a low genetic influence of the variability of these genes on LTL in the elderly. On the other hand, we found that the variability of the genes encoding TERF1 and TNKS2, which are not directly involved in LTL but are important for maintaining the integrity of telomere structure, shows a significant association with longevity. This suggests that maintaining these chromosomal structures may be critically important for preventing, or delaying, senescence and aging. Such a correlation was not observed in a population from northern Italy that we used as an independent replication set. This discrepancy is in line with previous reports on both the population specificity of results on telomere biology and the differences in aging between northern and southern Italy.
Multiple Trellis Coded Modulation (MTCM): An MSAT-X report

NASA Technical Reports Server (NTRS)

Divsalar, D.; Simon, M. K.

1986-01-01

Conventional trellis coding outputs one channel symbol per trellis branch. The notion of multiple trellis coding is introduced, wherein more than one channel symbol is transmitted per trellis branch. It is shown that combining multiple trellis coding with M-ary modulation yields a performance gain with a symmetric signal set comparable to that previously achieved only with signal constellation asymmetry. The advantage of multiple trellis coding over the conventional trellis-coded asymmetric modulation technique is that the potential for code catastrophe associated with the latter is eliminated at no additional cost in complexity (as measured by the number of states in the trellis diagram).
Comparative analysis of mitochondrial genomes between a wheat K-type cytoplasmic male sterility (CMS) line and its maintainer line.

PubMed

Liu, Huitao; Cui, Peng; Zhan, Kehui; Lin, Qiang; Zhuo, Guoyin; Guo, Xiaoli; Ding, Feng; Yang, Wenlong; Liu, Dongcheng; Hu, Songnian; Yu, Jun; Zhang, Aimin

2011-03-29

Plant mitochondria, semiautonomous organelles that function as manufacturers of cellular ATP, have their own genome, which has a slow rate of sequence evolution but undergoes rapid rearrangement. Cytoplasmic male sterility (CMS), a common phenotype in higher plants, is closely associated with rearrangements in mitochondrial DNA (mtDNA) and is widely used to produce F1 hybrid seeds in a variety of valuable crop species. Novel chimeric genes arising from mtDNA rearrangements and causing CMS have been identified in several plants, such as rice, sunflower, pepper, and rapeseed, but there are very few reports of mtDNA rearrangements in wheat. In the present work, we describe the mitochondrial genome of a wheat K-type CMS line and compare it with its maintainer line. The complete mtDNA sequence of a wheat K-type (with cytoplasm of Aegilops kotschyi) CMS line, Ks3, was assembled into a master circle (MC) molecule of 647,559 bp and found to harbor 34 known protein-coding genes, three rRNAs (18 S, 26 S, and 5 S rRNAs), and 16 different tRNAs. Compared with our previously published sequence of a K-type maintainer line, Km3, we detected Ks3-specific mtDNA (> 100 bp, 11.38%) and repeats (> 100 bp, 29 units) as well as genes that are unique to each line: rpl5 was missing in Ks3 and trnH was absent from Km3. We also defined 32 single nucleotide polymorphisms (SNPs) in 13 protein-coding, albeit functionally irrelevant, genes, and predicted 22 unique ORFs in Ks3, representing potential candidates for K-type CMS. All these sequence variations are candidates for involvement in CMS. A comparative analysis of the mtDNA of several angiosperms, including those from Ks3, Km3, rice, maize, Arabidopsis thaliana, and rapeseed, showed that the non-coding sequences of higher plants underwent multiple, mostly divergent reorganizations during mtDNA evolution.
The complete mitochondrial genome of the wheat K-type CMS line Ks3 is very different from that of its maintainer line Km3, especially in non-coding sequences. Sequence rearrangement has produced novel chimeric ORFs, which may be candidate genes for CMS. Comparative analysis of several angiosperm mtDNAs indicated that non-coding sequences are the most frequently reorganized during mtDNA evolution in higher plants.
De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences

PubMed Central

Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.

2013-01-01

How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 million years before the evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval, suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared with nearby genes, and the structural complexity of de novo genes steadily increases over evolutionary time. Although these genes are transcribed at a higher level in males than in females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein-coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
De Novo Origin of Human Protein-Coding Genes

PubMed Central

Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping

2011-01-01

The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA-seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare; the importance of de novo gene origination therefore deserves greater appreciation. PMID:22102831
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.

PubMed

Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A

2018-04-24

mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with lncRNAs previously assembled in cattle and humans. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. In addition to substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
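The abstract does not define "close to" or "strongly co-expressed". A minimal sketch of one way to flag neighbouring protein-coding genes that are co-expressed with an lncRNA is shown below: it uses the Pearson correlation of expression across the same set of tissues, a single coordinate per gene, a genomic window and a correlation threshold. The window size, the threshold, the data layout and all names are hypothetical choices for illustration, not the study's criteria.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpressed_neighbours(lnc, genes, expression, window=10_000, min_r=0.8):
    """lnc and each entry of genes are (gene_id, chromosome, position) tuples;
    expression maps gene_id -> values across the same ordered tissues.
    Returns (gene_id, r) for nearby genes with correlation >= min_r."""
    lnc_id, chrom, pos = lnc
    hits = []
    for gene_id, g_chrom, g_pos in genes:
        if g_chrom == chrom and abs(g_pos - pos) <= window:
            r = pearson(expression[lnc_id], expression[gene_id])
            if r >= min_r:
                hits.append((gene_id, round(r, 3)))
    return hits


expr = {"lnc1": [1, 2, 3, 4], "pc1": [2, 4, 6, 8], "pc2": [4, 3, 2, 1], "pc3": [1, 2, 3, 4]}
genes = [("pc1", "chr1", 105_000), ("pc2", "chr1", 101_000), ("pc3", "chr2", 100_000)]
print(coexpressed_neighbours(("lnc1", "chr1", 100_000), genes, expr))  # [('pc1', 1.0)]
```

Here pc2 is nearby but anti-correlated, and pc3 is perfectly correlated but on another chromosome, so only pc1 is reported; rank-based (Spearman) correlation would be a reasonable alternative for skewed expression values.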
Divergent evolution of multiple virus-resistance genes from a progenitor in Capsicum spp.

PubMed

Kim, Saet-Byul; Kang, Won-Hee; Huy, Hoang Ngoc; Yeom, Seon-In; An, Jeong-Tak; Kim, Seungill; Kang, Min-Young; Kim, Hyun Jung; Jo, Yeong Deuk; Ha, Yeaseong; Choi, Doil; Kang, Byoung-Cheorl

2017-01-01

Plants have evolved hundreds of nucleotide-binding and leucine-rich domain proteins (NLRs) as potential intracellular immune receptors, but the evolutionary mechanism leading to the ability to recognize specific pathogen effectors remains elusive. Here, we cloned Pvr4 (a Potyvirus resistance gene in Capsicum annuum) and Tsw (a Tomato spotted wilt virus resistance gene in Capsicum chinense) via a genome-based approach using independent segregating populations. Both genes encode typical NLRs and are located at the same locus on pepper chromosome 10. Although these two genes recognize completely different viral effectors, their genomic structures and coding sequences are strikingly similar. Phylogenetic studies revealed that these two immune receptors diverged from a progenitor gene of a common ancestor. Our results suggest that sequence variations caused by gene duplication and neofunctionalization may underlie the evolution of the ability to specifically recognize different effectors. These findings thereby provide insight into the divergent evolution of plant immune receptors. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.

PubMed

Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen

2016-02-02

The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Mediator phosphorylation prevents stress response transcription during non-stress conditions.

PubMed

Miller, Christian; Matic, Ivan; Maier, Kerstin C; Schwalb, Björn; Roether, Susanne; Strässer, Katja; Tresch, Achim; Mann, Matthias; Cramer, Patrick

2012-12-28

The multiprotein complex Mediator is a coactivator of RNA polymerase (Pol) II transcription that is required for the regulated expression of protein-coding genes. Mediator serves as an end point of signaling pathways and regulates Pol II transcription, but the mechanisms it uses are not well understood. Here, we used mass spectrometry and dynamic transcriptome analysis to investigate a functional role of Mediator phosphorylation in gene expression. Affinity purification and mass spectrometry revealed that Mediator from the yeast Saccharomyces cerevisiae is phosphorylated at multiple sites on 17 of its 25 subunits. Mediator phosphorylation levels change in response to an external stimulus, the exposure of cells to high salt concentrations. Phosphorylated sites in the Mediator tail subunit Med15 are required to suppress stress-induced changes in gene expression under non-stress conditions. Thus, dynamic and differential Mediator phosphorylation contributes to gene regulation in eukaryotic cells.
Medical Sequencing at the extremes of Human Body Mass

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ahituv, Nadav; Kavaslar, Nihan; Schackwitz, Wendy

2006-09-01

Body weight is a quantitative trait with significant heritability in humans. To identify potential genetic contributors to this phenotype, we resequenced the coding exons and splice junctions of 58 genes in 379 obese and 378 lean individuals. Our 96 Mb survey included 21 genes associated with monogenic forms of obesity in humans or mice, as well as 37 genes that function in body weight-related pathways. We found that the monogenic obesity-associated gene group was enriched for rare nonsynonymous variants unique to the obese (n=46) versus lean (n=26) populations. Computational analysis further predicted a significantly greater fraction of deleterious variants within the obese cohort. Consistent with the complex inheritance of body weight, we did not observe obvious familial segregation in the majority of the 28 available kindreds. Taken together, these data suggest that multiple rare alleles with variable penetrance contribute to obesity in the population and provide a deep medical sequencing-based approach to detect them.
Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

PubMed Central

Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen

2016-01-01

The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814
Disentangling the many layers of eukaryotic transcriptional regulation.

PubMed

Lelli, Katherine M; Slattery, Matthew; Mann, Richard S

2012-01-01

Regulation of gene expression in eukaryotes is an extremely complex process. In this review, we break down several critical steps, emphasizing new data and techniques that have expanded current gene regulatory models. We begin at the level of DNA sequence where cis-regulatory modules (CRMs) provide important regulatory information in the form of transcription factor (TF) binding sites. In this respect, CRMs function as instructional platforms for the assembly of gene regulatory complexes. We discuss multiple mechanisms controlling complex assembly, including cooperative DNA binding, combinatorial codes, and CRM architecture. The second section of this review places CRM assembly in the context of nucleosomes and condensed chromatin. We discuss how DNA accessibility and histone modifications contribute to TF function. Lastly, new advances in chromosomal mapping techniques have provided increased understanding of intra- and interchromosomal interactions. We discuss how these topological maps influence gene regulatory models.
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores.

PubMed

Chikkagoudar, Satish; Wang, Kai; Li, Mingyao

2011-05-26

Gene-gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, and Graphics Processing Units (GPUs), which have hundreds of cores, have recently been used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits. Here we present a novel software package, GENIE, which utilizes multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a 27-fold speedup over its single-core CPU mode run. GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/.
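The fragment scheme described above can be illustrated with a minimal Python sketch. It is not GENIE's code: SNPs are assumed to be plain identifiers, the function names are hypothetical, and the workers only enumerate SNP pairs instead of running an interaction test. It shows why "within each fragment" plus "between the current fragment and every later fragment" covers each SNP pair exactly once, so the work units can run independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def partition(snps, fragment_size):
    """Split the SNP list into consecutive, non-overlapping fragments."""
    return [snps[i:i + fragment_size] for i in range(0, len(snps), fragment_size)]


def fragment_tasks(n_fragments):
    """Yield (i, j) work units: (i, i) for SNP pairs within fragment i and
    (i, j) with i < j for pairs between fragments i and j, so that every
    SNP pair is covered exactly once."""
    for i in range(n_fragments):
        for j in range(i, n_fragments):
            yield i, j


def pairs_for_task(fragments, i, j):
    """List the SNP pairs handled by one work unit."""
    if i == j:
        return list(combinations(fragments[i], 2))
    return [(a, b) for a in fragments[i] for b in fragments[j]]


def all_interaction_pairs(snps, fragment_size, workers=4):
    """Dispatch the work units to a pool and collect every SNP pair.

    A real analysis would run an interaction test on each pair inside the
    worker; here the workers only enumerate pairs to show the partitioning.
    """
    fragments = partition(snps, fragment_size)
    tasks = list(fragment_tasks(len(fragments)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(lambda task: pairs_for_task(fragments, *task), tasks)
        return [pair for chunk in chunks for pair in chunk]
```

For 10 SNPs split into fragments of 3, the sketch yields all 45 (10·9/2) pairs once each. Threads stand in for GPU/CPU cores here; because CPython threads do not execute pure-Python code in parallel, a CPU-bound statistic would need a process pool or a GPU kernel instead.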
Natural Hot Spots for Gain of Multiple Resistances: Arsenic and Antibiotic Resistances in Heterotrophic, Aerobic Bacteria from Marine Hydrothermal Vent Fields

PubMed Central

Farias, Pedro; Espírito Santo, Christophe; Branco, Rita; Francisco, Romeu; Santos, Susana; Hansen, Lars; Sorensen, Soren

2015-01-01

Microorganisms are responsible for multiple antibiotic resistances that have been associated with resistance/tolerance to heavy metals, with consequences for public health. Many genes conferring these resistances are located on mobile genetic elements, easily exchanged among phylogenetically distant bacteria. The objective of the present work was to isolate arsenic-, antimonite-, and antibiotic-resistant strains and to determine the existence of plasmids harboring antibiotic/arsenic/antimonite resistance traits in phenotypically resistant strains, in a nonanthropogenically impacted environment. The hydrothermal Lucky Strike field in the Azores archipelago (North Atlantic, between 11°N and 38°N), at the Mid-Atlantic Ridge, protected under the OSPAR Convention, was sampled as a metal-rich pristine environment. A total of 35 strains from 8 different species were isolated in the presence of arsenate, arsenite, and antimonite. ACR3 and arsB genes were amplified from the sediment's total DNA, and 4 isolates also carried ACR3 genes. Phenotypic multiple resistances were found in all strains, and 7 strains had recoverable plasmids. Purified plasmids were sequenced by Illumina and assembled by EDENA V3, and contig annotation was performed using the “Rapid Annotation using the Subsystems Technology” server. Determinants of resistance to copper, zinc, cadmium, cobalt, and chromium as well as to the antibiotics β-lactams and fluoroquinolones were found in the 3 sequenced plasmids. Genes coding for heavy metal resistance and antibiotic resistance in the same mobile element were found, suggesting the possibility of horizontal gene transfer and distribution of these resistances in the bacterial population. PMID:25636836
GENIE: a software package for gene-gene interaction analysis in genetic association studies using multiple GPU or CPU cores

PubMed Central

2011-01-01

Background Gene-gene interaction analysis in genetic association studies is computationally intensive when a large number of SNPs are involved. Most of the latest Central Processing Units (CPUs) have multiple cores, and Graphics Processing Units (GPUs), which have hundreds of cores, have recently been used to implement faster scientific software. However, currently there are no genetic analysis software packages that allow users to fully utilize the computing power of these multi-core devices for genetic interaction analysis of binary traits. Findings Here we present a novel software package, GENIE, which utilizes multiple GPU or CPU processor cores to parallelize the interaction analysis. GENIE reads an entire genetic association study dataset into memory and partitions the dataset into fragments with non-overlapping sets of SNPs. For each fragment, GENIE analyzes: 1) the interaction of SNPs within it in parallel, and 2) the interaction between the SNPs of the current fragment and other fragments in parallel. We tested GENIE on a large-scale candidate gene study on high-density lipoprotein cholesterol. Using an NVIDIA Tesla C1060 graphics card, the GPU mode of GENIE achieves a 27-fold speedup over its single-core CPU mode run. Conclusions GENIE is open-source, economical, user-friendly, and scalable. Since the computing power and memory capacity of graphics cards are increasing rapidly while their cost is going down, we anticipate that GENIE will achieve greater speedups with faster GPU cards. Documentation, source code, and precompiled binaries can be downloaded from http://www.cceb.upenn.edu/~mli/software/GENIE/. PMID:21615923

Network perturbation by recurrent regulatory variants in cancer

PubMed Central

Cho, Ara; Lee, Insuk; Choi, Jung Kyoon

2017-01-01

Cancer driver genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
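The two connectivity measures named above, outdegree and number of downstream genes, differ in that the first counts direct targets while the second counts everything reachable through regulatory edges. A minimal Python sketch on a hypothetical toy network (gene names and edges are invented for illustration, not taken from the study) computes both with a breadth-first search.

```python
from collections import deque


def outdegree(edges, gene):
    """Number of direct targets of `gene` in a directed (regulator, target) edge list."""
    return sum(1 for src, _ in edges if src == gene)


def downstream(edges, gene):
    """All genes reachable from `gene` by following regulatory edges (BFS)."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque([gene])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            # Skip the starting gene itself so feedback loops do not count it.
            if nxt != gene and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical toy network: TF_A regulates G1 and G2, G2 regulates G3, G3 regulates G4.
EDGES = [("TF_A", "G1"), ("TF_A", "G2"), ("G2", "G3"), ("G3", "G4"), ("TF_B", "G1")]
```

Here TF_A has an outdegree of 2 but 4 downstream genes (G1 to G4), whereas TF_B reaches only G1, which is the kind of contrast a "many downstream genes" criterion is meant to capture.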
Associations and interactions between SNPs in the alcohol metabolizing genes and alcoholism phenotypes in European Americans.

PubMed

Sherva, Richard; Rice, John P; Neuman, Rosalind J; Rochberg, Nanette; Saccone, Nancy L; Bierut, Laura J

2009-05-01

Alcohol dependence is a major cause of morbidity and mortality worldwide and has a strong familial component. Several linkage and association studies have identified chromosomal regions and/or genes that affect alcohol consumption, notably in genes involved in the 2-stage pathway of alcohol metabolism. Here, we use multiple regression models to test for associations and interactions between 2 alcohol-related phenotypes and SNPs in 17 genes involved in alcohol metabolism in a sample of 1,588 European American subjects. The strongest evidence for association after correcting for multiple testing was between rs1229984, a nonsynonymous coding SNP in ADH1B, and DSM-IV symptom count (p = 0.0003). This SNP was also associated with maximum number of drinks in 24 hours (p = 0.0004). Each minor allele at this SNP predicts 45% fewer DSM-IV symptoms and 18% fewer max drinks. Another SNP in a splice site in ALDH1A1 (rs8187974) showed evidence for association with both phenotypes as well (p = 0.02 and 0.004, respectively), but neither association was significant after accounting for multiple testing. Minor alleles at this SNP predict greater alcohol consumption. In addition, pairwise interactions were observed between SNPs in several genes (p = 0.00002). We replicated the large effect of rs1229984 on alcohol behavior, and although not common (MAF = 4%), this polymorphism may be highly relevant from a public health perspective in European Americans. Another SNP, rs8187974, may also affect alcohol behavior but requires replication. Also, interactions between polymorphisms in genes involved in alcohol metabolism are likely determinants of the parameters that ultimately affect alcohol consumption.
Identification and Functional Characterization of G6PC2 Coding Variants Influencing Glycemic Traits Define an Effector Transcript at the G6PC2-ABCB11 Locus

PubMed Central

Mahajan, Anubha; Sim, Xueling; Ng, Hui Jin; Manning, Alisa; Rivas, Manuel A.; Highland, Heather M.; Locke, Adam E.; Grarup, Niels; Im, Hae Kyung; Cingolani, Pablo; Flannick, Jason; Fontanillas, Pierre; Fuchsberger, Christian; Gaulton, Kyle J.; Teslovich, Tanya M.; Rayner, N. William; Robertson, Neil R.; Beer, Nicola L.; Rundle, Jana K.; Bork-Jensen, Jette; Ladenvall, Claes; Blancher, Christine; Buck, David; Buck, Gemma; Burtt, Noël P.; Gabriel, Stacey; Gjesing, Anette P.; Groves, Christopher J.; Hollensted, Mette; Huyghe, Jeroen R.; Jackson, Anne U.; Jun, Goo; Justesen, Johanne Marie; Mangino, Massimo; Murphy, Jacquelyn; Neville, Matt; Onofrio, Robert; Small, Kerrin S.; Stringham, Heather M.; Syvänen, Ann-Christine; Trakalo, Joseph; Abecasis, Goncalo; Bell, Graeme I.; Blangero, John; Cox, Nancy J.; Duggirala, Ravindranath; Hanis, Craig L.; Seielstad, Mark; Wilson, James G.; Christensen, Cramer; Brandslund, Ivan; Rauramaa, Rainer; Surdulescu, Gabriela L.; Doney, Alex S. F.; Lannfelt, Lars; Linneberg, Allan; Isomaa, Bo; Tuomi, Tiinamaija; Jørgensen, Marit E.; Jørgensen, Torben; Kuusisto, Johanna; Uusitupa, Matti; Salomaa, Veikko; Spector, Timothy D.; Morris, Andrew D.; Palmer, Colin N. A.; Collins, Francis S.; Mohlke, Karen L.; Bergman, Richard N.; Ingelsson, Erik; Lind, Lars; Tuomilehto, Jaakko; Hansen, Torben; Watanabe, Richard M.; Prokopenko, Inga; Dupuis, Josee; Karpe, Fredrik; Groop, Leif; Laakso, Markku; Pedersen, Oluf; Florez, Jose C.; Morris, Andrew P.; Altshuler, David; Meigs, James B.; Boehnke, Michael; McCarthy, Mark I.; Lindgren, Cecilia M.; Gloyn, Anna L.

2015-01-01

Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5×10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF) = 1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights. PMID:25625282
Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2-ABCB11 locus.

PubMed

Mahajan, Anubha; Sim, Xueling; Ng, Hui Jin; Manning, Alisa; Rivas, Manuel A; Highland, Heather M; Locke, Adam E; Grarup, Niels; Im, Hae Kyung; Cingolani, Pablo; Flannick, Jason; Fontanillas, Pierre; Fuchsberger, Christian; Gaulton, Kyle J; Teslovich, Tanya M; Rayner, N William; Robertson, Neil R; Beer, Nicola L; Rundle, Jana K; Bork-Jensen, Jette; Ladenvall, Claes; Blancher, Christine; Buck, David; Buck, Gemma; Burtt, Noël P; Gabriel, Stacey; Gjesing, Anette P; Groves, Christopher J; Hollensted, Mette; Huyghe, Jeroen R; Jackson, Anne U; Jun, Goo; Justesen, Johanne Marie; Mangino, Massimo; Murphy, Jacquelyn; Neville, Matt; Onofrio, Robert; Small, Kerrin S; Stringham, Heather M; Syvänen, Ann-Christine; Trakalo, Joseph; Abecasis, Goncalo; Bell, Graeme I; Blangero, John; Cox, Nancy J; Duggirala, Ravindranath; Hanis, Craig L; Seielstad, Mark; Wilson, James G; Christensen, Cramer; Brandslund, Ivan; Rauramaa, Rainer; Surdulescu, Gabriela L; Doney, Alex S F; Lannfelt, Lars; Linneberg, Allan; Isomaa, Bo; Tuomi, Tiinamaija; Jørgensen, Marit E; Jørgensen, Torben; Kuusisto, Johanna; Uusitupa, Matti; Salomaa, Veikko; Spector, Timothy D; Morris, Andrew D; Palmer, Colin N A; Collins, Francis S; Mohlke, Karen L; Bergman, Richard N; Ingelsson, Erik; Lind, Lars; Tuomilehto, Jaakko; Hansen, Torben; Watanabe, Richard M; Prokopenko, Inga; Dupuis, Josee; Karpe, Fredrik; Groop, Leif; Laakso, Markku; Pedersen, Oluf; Florez, Jose C; Morris, Andrew P; Altshuler, David; Meigs, James B; Boehnke, Michael; McCarthy, Mark I; Lindgren, Cecilia M; Gloyn, Anna L

2015-01-01

Genome-wide association studies (GWAS) for fasting glucose (FG) and insulin (FI) have identified common variant signals which explain 4.8% and 1.2% of trait variance, respectively. It is hypothesized that low-frequency and rare variants could contribute substantially to unexplained genetic variance. To test this, we analyzed exome-array data from up to 33,231 non-diabetic individuals of European ancestry. We found exome-wide significant (P < 5×10⁻⁷) evidence for two loci not previously highlighted by common variant GWAS: GLP1R (p.Ala316Thr, minor allele frequency (MAF) = 1.5%) influencing FG levels, and URB2 (p.Glu594Val, MAF = 0.1%) influencing FI levels. Coding variant associations can highlight potential effector genes at (non-coding) GWAS signals. At the G6PC2/ABCB11 locus, we identified multiple coding variants in G6PC2 (p.Val219Leu, p.His177Tyr, and p.Tyr207Ser) influencing FG levels, conditionally independent of each other and the non-coding GWAS signal. In vitro assays demonstrate that these associated coding alleles result in reduced protein abundance via proteasomal degradation, establishing G6PC2 as an effector gene at this locus. Reconciliation of single-variant associations and functional effects was only possible when haplotype phase was considered. In contrast to earlier reports suggesting that, paradoxically, glucose-raising alleles at this locus are protective against type 2 diabetes (T2D), the p.Val219Leu G6PC2 variant displayed a modest but directionally consistent association with T2D risk. Coding variant associations for glycemic traits in GWAS signals highlight PCSK1, RREB1, and ZHX3 as likely effector transcripts. These coding variant association signals do not have a major impact on the trait variance explained, but they do provide valuable biological insights.
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements

PubMed Central

Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.

2008-01-01

X-chromosome inactivation, which occurs in female eutherian mammals, is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons, including those with simple tandem repeats detectable in their structure, have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally, we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Multifaceted Targeting of the Chromatin Mediates Gonadotropin-Releasing Hormone Effects on Gene Expression in the Gonadotrope.

PubMed

Melamed, Philippa; Haj, Majd; Yosefzon, Yahav; Rudnizky, Sergei; Wijeweera, Andrea; Pnueli, Lilach; Kaplan, Ariel

2018-01-01

Gonadotropin-releasing hormone (GnRH) stimulates the expression of multiple genes in the pituitary gonadotropes, most notably to induce synthesis of the gonadotropins, luteinizing hormone (LH) and follicle-stimulating hormone (FSH), but also to ensure the appropriate functioning of these cells at the center of the mammalian reproductive endocrine axis. Aside from activating gene-specific transcription factors, GnRH, acting through its membrane-bound receptor, stimulates alterations in the chromatin that facilitate transcription of its target genes. These include changes in histone and DNA modifications, nucleosome positioning, and chromatin packaging at the regulatory regions of each gene. The requirements for each of these events vary according to the DNA sequence, which determines the basal chromatin packaging at the regulatory regions. Despite considerable progress in this field in recent years, we are only beginning to understand some of the complexities involved in the role and regulation of this chromatin structure, including new modifications, extensive cross talk, histone variants, and the actions of distal enhancers and non-coding RNAs. This short review aims to integrate the latest findings on GnRH-induced alterations in the chromatin of its target genes, which indicate multiple and diverse actions. Understanding these processes is illuminating not only in the context of the activation of these hormones during the reproductive life span but may also reveal how aberrant epigenetic regulation of these genes leads to sub-fertility.
Multifaceted Targeting of the Chromatin Mediates Gonadotropin-Releasing Hormone Effects on Gene Expression in the Gonadotrope

PubMed Central

Melamed, Philippa; Haj, Majd; Yosefzon, Yahav; Rudnizky, Sergei; Wijeweera, Andrea; Pnueli, Lilach; Kaplan, Ariel

2018-01-01

Gonadotropin-releasing hormone (GnRH) stimulates the expression of multiple genes in the pituitary gonadotropes, most notably to induce synthesis of the gonadotropins, luteinizing hormone (LH) and follicle-stimulating hormone (FSH), but also to ensure the appropriate functioning of these cells at the center of the mammalian reproductive endocrine axis. Aside from activating gene-specific transcription factors, GnRH, acting through its membrane-bound receptor, stimulates alterations in the chromatin that facilitate transcription of its target genes. These include changes in histone and DNA modifications, nucleosome positioning, and chromatin packaging at the regulatory regions of each gene. The requirements for each of these events vary according to the DNA sequence, which determines the basal chromatin packaging at the regulatory regions. Despite considerable progress in this field in recent years, we are only beginning to understand some of the complexities involved in the role and regulation of this chromatin structure, including new modifications, extensive cross talk, histone variants, and the actions of distal enhancers and non-coding RNAs. This short review aims to integrate the latest findings on GnRH-induced alterations in the chromatin of its target genes, which indicate multiple and diverse actions. Understanding these processes is illuminating not only in the context of the activation of these hormones during the reproductive life span but may also reveal how aberrant epigenetic regulation of these genes leads to sub-fertility. PMID:29535683
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE PAGES

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...

2014-10-02

Transcriptional regulation of protein-coding genes is increasingly well understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
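The abstract does not state which statistic was used to call a TF binding site "enriched" at lncRNA promoters. As a generic illustration only, a one-sided Fisher's exact test on a 2×2 table (promoter class versus motif present/absent) is a common way to ask that question; the sketch below computes it from the hypergeometric distribution using only the standard library. The function name and any counts passed to it are hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                        motif present   motif absent
    lncRNA promoters          a              b
    coding promoters          c              d

    Returns P(X >= a), where X is the number of motif-bearing lncRNA
    promoters under the hypergeometric null (fixed row and column totals).
    """
    row1 = a + b          # number of lncRNA promoters
    col1 = a + c          # number of promoters carrying the motif
    n = a + b + c + d     # all promoters
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)
```

For the toy table [[3, 0], [0, 3]] the function returns 1/20 = 0.05. In a genome-wide scan over many motifs, the resulting p-values would still need a multiple-testing correction.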
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui

Transcriptional regulation of protein-coding genes is increasingly well understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
GWIPS‐viz as a tool for exploring ribosome profiling evidence supporting the synthesis of alternative proteoforms

PubMed Central

Michel, Audrey M.; Ahern, Anna M.; Donohue, Claire A.

2015-01-01

The boundaries of protein-coding sequences are more difficult to define at the 5′ end than at the 3′ end because of the potential for multiple translation initiation sites (TISs). Even when phylogenetic data are available, sequence information alone may not be sufficient to identify TISs accurately. Traditional proteomics approaches may also fail because the N‐termini of newly synthesized proteins are often processed. Ribosome profiling (ribo‐seq), which produces a snapshot of the ribosome distribution across the entire transcriptome, is therefore an attractive experimental technique for locating TISs. The GWIPS‐viz (Genome Wide Information on Protein Synthesis visualized) browser (http://gwips.ucc.ie) provides free access to genomic alignments of ribo‐seq data and corresponding mRNA‐seq data, along with relevant annotation tracks. In this brief, we illustrate how GWIPS‐viz can be used to explore ribosome occupancy at the 5′ ends of protein-coding genes to assess the activity of AUG and non‐AUG TISs responsible for the synthesis of proteoforms with alternative or heterogeneous N‐termini. The presence of ribo‐seq tracks for various organisms allows cross‐species comparison of orthologous genes, and the availability of datasets from multiple laboratories permits assessment of the technical reproducibility of ribosome densities. PMID:25736862
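Exploring AUG and non-AUG TISs amounts to asking which start-like codons near the 5′ end carry ribosome footprints. The sketch below scans a transcript for AUG and near-cognate codons (codons differing from AUG at exactly one position) and ranks them by ribo-seq signal. It assumes, hypothetically, that per-nucleotide P-site counts aligned to the transcript are already available as a list; it does not use GWIPS-viz, which is a genome browser rather than a programming interface, and it ignores reading frame and P-site offset calibration.

```python
def is_near_cognate(codon: str, ref: str = "AUG") -> bool:
    """True if codon differs from ref at exactly one of its three positions."""
    return sum(a != b for a, b in zip(codon, ref)) == 1


def candidate_tis(seq: str, psite_counts: list[int], window: int) -> list[tuple[int, str, int]]:
    """Return (position, codon, P-site count) for AUG/near-cognate codons in the first `window` nt.

    seq:          transcript sequence (DNA or RNA letters)
    psite_counts: hypothetical per-nucleotide ribo-seq P-site counts, same length as seq
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for i in range(min(window, len(rna) - 2)):
        codon = rna[i:i + 3]
        if codon == "AUG" or is_near_cognate(codon):
            hits.append((i, codon, psite_counts[i]))
    # Rank candidate TISs by ribosome density at the codon's first nucleotide, highest first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


counts = [0, 0, 0, 50, 2, 1, 0, 80, 3, 0, 0, 0]  # hypothetical P-site counts
print(candidate_tis("GCCAUGGCUGAA", counts, window=12))  # [(7, 'CUG', 80), (3, 'AUG', 50)]
```

In this toy input the upstream-shifted CUG carries more footprints than the AUG, which is the kind of pattern that would prompt a closer look at a non-AUG TIS.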
Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, an economically relevant and potent drug in the treatment of type 2 diabetes mellitus. In this study, we present the detection of transcription start sites in this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With the help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader regions of protein-coding genes, with the surprising result that roughly 20% of these genes belong to the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. For protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes were identified. In addition, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. An analysis of antisense transcription start sites located within coding genes or their leader sequences showed that 96 of these ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside known protein-coding genes. Four selected examples of prominent ncRNA genes, namely the transfer-messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing enriched 5'-ends of primary transcripts and identifying transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also of other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
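The distinction between leadered and leaderless transcripts reduces to the distance between a mapped transcription start site and the annotated start codon. The sketch below classifies TSSs by that distance; the tolerance for calling a transcript leaderless and the maximum leader length are hypothetical parameters, not the thresholds used in the study.

```python
def classify_tss(tss: int, start_codon: int, strand: str,
                 leaderless_tol: int = 0, max_leader: int = 500) -> str:
    """Classify a TSS relative to the first nucleotide of a gene's annotated start codon.

    Coordinates are genomic positions; the leader (5'-UTR) length is measured in the
    direction of transcription. Both thresholds are illustrative assumptions.
    """
    leader = start_codon - tss if strand == "+" else tss - start_codon
    if 0 <= leader <= leaderless_tol:
        return "leaderless"  # transcription starts at (or within tol of) the start codon
    if leaderless_tol < leader <= max_leader:
        return "leadered"
    return "unassigned"  # e.g. internal or antisense TSS; needs separate handling


def leaderless_fraction(labels: list[str]) -> float:
    """Share of leaderless transcripts among TSSs assigned to a gene's leader region."""
    assigned = [lab for lab in labels if lab in ("leaderless", "leadered")]
    return assigned.count("leaderless") / len(assigned) if assigned else 0.0


print(classify_tss(100, 100, "+"), classify_tss(70, 100, "+"), classify_tss(130, 100, "-"))
```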
MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

PubMed

Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

2002-01-01

Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of the nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed, and the data have been collected in several specialised databases. To collect information on nuclear-encoded mitochondrial proteins, we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome, where other mitochondrial databases developed by our group, the complete list of sequenced mitochondrial genomes, links to other mitochondrial sites and related information are also available. The MitoAln database, which was related to MitoNuc in the previous release and reported multiple alignments of the relevant homologous protein-coding regions, is no longer supported in the present release. To keep the links among MitoNuc entries for homologous proteins, a new field has been defined in the database: the cluster identifier, an alphanumeric code identifying each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; it reports clinical data related to dysfunction of the protein. The logical schema of the MitoNuc database has been implemented in the ORACLE DBMS, which will allow end users to retrieve data through a user-friendly interface that will soon be implemented.
Gene-Auto: Automatic Software Code Generation for Real-Time Embedded Systems

NASA Astrophysics Data System (ADS)

Rugina, A.-E.; Thomas, D.; Olive, X.; Veran, G.

2008-08-01

This paper gives an overview of the Gene-Auto ITEA European project, which aims to build a qualified C code generator from mathematical models in Matlab-Simulink and Scilab-Scicos. The project is driven by major European industrial partners active in the real-time embedded systems domain. The Gene-Auto code generator will significantly improve current development processes in such domains by shortening time to market and by guaranteeing the quality of the generated code through the use of formal methods. The first version of the Gene-Auto code generator has already been released and has gone through a validation phase on real-life case studies defined by each project partner. The validation results are being taken into account in the implementation of the second version of the code generator. The partners aim to introduce the Gene-Auto results into industrial development by 2010.
Optimal bit allocation for hybrid scalable/multiple-description video transmission over wireless channels

NASA Astrophysics Data System (ADS)

Jubran, Mohammad K.; Bansal, Manu; Kondi, Lisimachos P.

2006-01-01

In this paper, we consider the problem of optimal bit allocation for video transmission over fading wireless channels. We use a newly developed hybrid scalable/multiple-description codec that combines the functionality of scalable and multiple-description codecs. It produces a base layer and multiple-description enhancement layers. Any of the enhancement layers can be decoded (in a non-hierarchical manner) together with the base layer to improve the reconstructed video quality. Two channel coding schemes are used for unequal error protection of the layered bitstream: Rate-Compatible Punctured Convolutional (RCPC)/Cyclic Redundancy Check (CRC) coding, and product-code Reed-Solomon (RS)+RCPC/CRC coding. The bitrate is optimally allocated between source and channel coding over discrete sets of source coding rates and channel coding rates. Experimental results are presented for a wide range of channel conditions. Comparisons with classical scalable coding also show the effectiveness of hybrid scalable/multiple-description coding for wireless transmission.
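Optimal allocation over discrete rate sets can be pictured as an exhaustive search over every (source rate, channel code rate) pair that fits the channel budget. The sketch below uses a deliberately simplified single-stream model: the distortion-versus-rate table, the loss probability of each channel code rate and the loss distortion are hypothetical inputs, and the base/enhancement layer structure and the specific RCPC/CRC and RS+RCPC/CRC codes of the paper are not modeled.

```python
from itertools import product


def allocate_bits(budget, source_rates, channel_rates, distortion, p_loss, loss_distortion):
    """Exhaustively pick the (source rate, channel code rate) pair with least expected distortion.

    budget:          total channel bitrate available (same unit as source_rates)
    distortion:      dict source_rate -> distortion if the stream arrives intact
    p_loss:          dict channel_rate -> probability the protected stream is lost
    loss_distortion: distortion when the stream is lost (e.g. concealment only)
    All three models are hypothetical inputs, not measured curves.
    """
    best = None
    for rs, rc in product(source_rates, channel_rates):
        transmitted = rs / rc  # a code of rate rc < 1 adds redundancy: rs/rc bits on the channel
        if transmitted > budget:
            continue
        expected = (1 - p_loss[rc]) * distortion[rs] + p_loss[rc] * loss_distortion
        if best is None or expected < best[0]:
            best = (expected, rs, rc)
    return best  # (expected distortion, source rate, channel code rate) or None


best = allocate_bits(
    budget=200,
    source_rates=[64, 128],
    channel_rates=[0.5, 0.75],
    distortion={64: 40.0, 128: 25.0},
    p_loss={0.5: 0.01, 0.75: 0.1},
    loss_distortion=100.0,
)
print(best)
```

Here the stronger rate-1/2 code cannot carry the 128-unit source within the budget, so the search trades some protection for a higher source rate.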
Evidence for Transcript Networks Composed of Chimeric RNAs in Human Cells

PubMed Central

Borel, Christelle; Mudge, Jonathan M.; Howald, Cédric; Foissac, Sylvain; Ucla, Catherine; Chrast, Jacqueline; Ribeca, Paolo; Martin, David; Murray, Ryan R.; Yang, Xinping; Ghamsari, Lila; Lin, Chenwei; Bell, Ian; Dumais, Erica; Drenkow, Jorg; Tress, Michael L.; Gelpí, Josep Lluís; Orozco, Modesto; Valencia, Alfonso; van Berkum, Nynke L.; Lajoie, Bryan R.; Vidal, Marc; Stamatoyannopoulos, John; Batut, Philippe; Dobin, Alex; Harrow, Jennifer; Hubbard, Tim; Dekker, Job; Frankish, Adam; Salehi-Ashtiani, Kourosh; Reymond, Alexandre; Antonarakis, Stylianos E.; Guigó, Roderic; Gingeras, Thomas R.

2012-01-01

The classic organization of a gene structure has followed the Jacob and Monod bacterial gene model proposed more than 50 years ago. Since then, empirical determinations of transcriptome complexity, from yeast to human, have blurred the definition and physical boundaries of genes. Using multiple analysis approaches, we have characterized individual gene boundaries on human chromosomes 21 and 22. Analyses of the locations of the 5′ and 3′ transcriptional termini of 492 protein-coding genes revealed that for 85% of these genes the boundaries extend beyond the currently annotated termini, most often connecting with exons of transcripts from other well-annotated genes. The biological and evolutionary importance of these chimeric transcripts is underscored by (1) the non-random interconnections of the genes involved, (2) the greater phylogenetic depth of the genes involved in many chimeric interactions, (3) the coordinated expression of connected genes and (4) the close in vivo, three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs. The non-random nature of the connections between the genes involved suggests that chimeric transcripts should be studied not in isolation but together, as an RNA network. PMID:22238572
Co-fuse: a new class discovery analysis tool to identify and prioritize recurrent fusion genes from RNA-sequencing data.

PubMed

Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow

2018-06-07

Recurrent oncogenic fusion genes play a critical role in the development of various cancers and other diseases and, in some cases, provide excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this gap, we developed Co-occurrence Fusion (Co-fuse), a new, easy-to-use software tool that enables biologists to merge RNA-seq information and identify recurrent fusion genes without exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis, which enable the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a cluster enriched in multiple myeloma (MM) samples and a cluster enriched in acute myeloid leukemia (AML) samples. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM when compared with AML samples, indicating its potential to aid the discovery of as-yet unknown driver fusion genes through cohort comparisons. Additionally, using an RNA-seq dataset of 272 primary glioma samples, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets and may lead to the discovery of new disease subgroups and potentially new driver genes for which targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .
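Identifying recurrent fusions across samples starts from a per-sample set of detected fusions. The sketch below counts how many samples carry each fusion and computes pairwise Jaccard similarity between samples, a quantity that could feed a clustering step. It is a simplified illustration under these assumptions, not Co-fuse's pattern-mining or statistical method, and the sample and gene names are placeholders.

```python
from collections import Counter
from itertools import combinations


def recurrent_fusions(samples: dict[str, set[str]], min_samples: int = 2) -> dict[str, int]:
    """Fusions (as '5PGENE-3PGENE' strings, orientation kept) seen in >= min_samples samples."""
    counts = Counter(f for fusions in samples.values() for f in fusions)
    return {f: n for f, n in counts.items() if n >= min_samples}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two samples' fusion sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(samples: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every sample pair, usable as input to hierarchical clustering."""
    names = sorted(samples)
    return {(x, y): jaccard(samples[x], samples[y]) for x, y in combinations(names, 2)}


toy = {  # hypothetical samples; fusion names are placeholders
    "S1": {"GENEA-GENEB", "GENEC-GENED"},
    "S2": {"GENEA-GENEB"},
    "S3": {"GENEE-GENEF"},
}
print(recurrent_fusions(toy))                  # {'GENEA-GENEB': 2}
print(pairwise_similarity(toy)[("S1", "S2")])  # 0.5
```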
The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes.

PubMed

Smith, Adam Alexander Thil; Belda, Eugeni; Viari, Alain; Medigue, Claudine; Vallenet, David

2012-05-01

Of all biochemically characterized metabolic reactions formalized by the IUBMB, more than one in four have yet to be associated with a nucleic acid or protein sequence, i.e. they are sequence-orphan enzymatic activities. Few bioinformatics annotation tools can propose candidate genes for such activities by exploiting context-dependent rather than sequence-dependent data, and none are readily accessible or integrate results across multiple genomes. Here, we present CanOE (Candidate genes for Orphan Enzymes), a four-step bioinformatics strategy that proposes ranked candidate genes for sequence-orphan enzymatic activities (or orphan enzymes for short). The first step locates "genomic metabolons", i.e. groups of co-localized genes coding for proteins that catalyze reactions linked by shared metabolites, in one genome at a time. These metabolons can be particularly helpful to bioanalysts for visualizing relevant metabolic data. In the second step, they are used to generate candidate associations between unannotated genes and gene-less reactions. The third step integrates these gene-reaction associations over several genomes using gene families, and summarizes the strength of family-reaction associations with several scores. In the final step, these scores are used to rank members of gene families that are proposed for metabolic reactions. These associations are of particular interest when the metabolic reaction is a sequence-orphan enzymatic activity. Our strategy found more than 60,000 genomic metabolons in more than 1,000 prokaryotic organisms from the MicroScope platform, generating candidate genes for many metabolic reactions, including more than 70 distinct orphan reactions. A computational validation of the approach is discussed. Finally, we present a case study on the anaerobic allantoin degradation pathway in Escherichia coli K-12.
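The first CanOE step, locating genomic metabolons, can be sketched as a graph problem: link neighboring genes whose reactions share a metabolite, then take the connected groups. The sketch below assumes each gene is given with its position in gene order and the metabolites of its reactions, and it assumes a small hypothetical list of currency metabolites (such as ATP and water) that should not create links; CanOE's actual definitions and parameters may differ.

```python
CURRENCY = frozenset({"H2O", "ATP", "ADP", "NAD+", "NADH"})  # assumed metabolites to ignore


def genomic_metabolons(genes: list[tuple[str, int, set[str]]], max_gap: int = 1,
                       ignore: frozenset[str] = CURRENCY) -> list[list[str]]:
    """Group co-localized genes whose reactions share a (non-currency) metabolite.

    genes:   (gene_id, gene order index on the replicon, metabolites of its reactions)
    max_gap: largest allowed difference in gene order index between linked genes
    """
    genes = sorted(genes, key=lambda g: g[1])
    parent = list(range(len(genes)))  # union-find forest over gene indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if genes[j][1] - genes[i][1] > max_gap:
                break  # genes are sorted, so later ones are even farther away
            if (genes[i][2] & genes[j][2]) - ignore:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, (gene_id, _, _) in enumerate(genes):
        groups.setdefault(find(i), []).append(gene_id)
    return [members for members in groups.values() if len(members) >= 2]


example = [  # hypothetical genes; metabolite names are placeholders
    ("g1", 1, {"M1", "M2"}),
    ("g2", 2, {"M2", "M3"}),
    ("g3", 3, {"M3", "M4"}),
    ("g4", 4, {"M9", "ATP"}),
    ("g5", 5, {"M8", "ATP"}),
]
print(genomic_metabolons(example))  # [['g1', 'g2', 'g3']]
```

Genes g4 and g5 share only ATP, so the currency filter keeps them from forming a spurious metabolon.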
Biallelic insertion of a transcriptional terminator via the CRISPR/Cas9 system efficiently silences expression of protein-coding and non-coding RNA genes.

PubMed

Liu, Yangyang; Han, Xiao; Yuan, Junting; Geng, Tuoyu; Chen, Shihao; Hu, Xuming; Cui, Isabelle H; Cui, Hengmi

2017-04-07

The type II bacterial CRISPR/Cas9 system is a simple, convenient, and powerful tool for targeted gene editing. Here, we describe a CRISPR/Cas9-based approach for inserting a poly(A) transcriptional terminator into both alleles of a targeted gene to silence protein-coding and non-protein-coding genes, which often play key roles in gene regulation but are difficult to silence via insertion or deletion of short DNA fragments. Integrating 225 bp of bovine growth hormone poly(A) signals into the first intron, into the first exon, or downstream of the promoter of target genes efficiently terminated expression of PPP1R12C, NSUN2 (protein-coding genes), and MALAT1 (a non-protein-coding gene). Both NeoR and PuroR were used as markers to select clonal cell lines with biallelic integration of a poly(A) signal. Genotyping analysis indicated that the cell lines displayed the desired biallelic silencing after a brief selection period. These combined results indicate that this CRISPR/Cas9-based approach offers an easy, convenient, and efficient new technique for gene silencing in cell lines, especially those in which gene integration is difficult because of a low efficiency of homology-directed repair. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

PubMed Central

Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L

2006-01-01

Background: The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa causes economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variation among strains isolated from different hosts and its influence on the population genetics of this pathogen. With genome sequences now available for four strains, it is possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results: There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous and 39.67% were non-synonymous. Mutations, primarily in the form of external INDELs, were the main type of sequence variation. The relative similarity between the strains is discussed in terms of INDEL and SNP differences. The number of genes unique to each strain was 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes differed significantly from the native genes in codon usage and GC composition, suggesting a xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions.
Conclusion: INDELs and strain-specific genes were identified as the main sources of variation among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, the genome of the Pierce's disease strain Temecula-1 appears to represent the ancestral genome of X. fastidiosa. Results of this analysis are publicly available as a web database. PMID:16948851
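The synonymous/non-synonymous split above depends on whether a substitution changes the encoded amino acid. The sketch below classifies a single-nucleotide substitution in an in-frame coding sequence with the standard genetic code and computes a per-base-pair rate; the function names are hypothetical and this is not the authors' code. Bacteria use NCBI translation table 11, which has the same codon-to-amino-acid assignments as table 1 and differs only in alternative start codons, so table 1 suffices here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snp(cds: str, pos: int, alt: str) -> str:
    """Classify a substitution at 0-based position `pos` of an in-frame coding sequence."""
    cds = cds.upper()
    frame_pos = pos % 3
    codon_start = pos - frame_pos
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame_pos] + alt.upper() + ref_codon[frame_pos + 1:]
    # Changes to or from a stop codon ('*') count as non-synonymous here.
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


def per_bp_rate(n_variants: int, aligned_bp: int) -> float:
    """Variants per aligned base pair, the unit of the SNP and INDEL frequencies above."""
    return n_variants / aligned_bp


print(classify_snp("ATGCTG", 5, "A"))  # CTG -> CTA, both Leu: synonymous
print(classify_snp("ATGCTG", 3, "A"))  # CTG -> ATG, Leu -> Met: non-synonymous
```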
Global characterization of copy number variants in epilepsy patients from whole genome sequencing

PubMed Central

Meloche, Caroline; Andrade, Danielle M.; Lafreniere, Ron G.; Gravel, Micheline; Spiegelman, Dan; Dionne-Laporte, Alexandre; Boelman, Cyrus; Hamdan, Fadi F.; Michaud, Jacques L.; Rouleau, Guy; Minassian, Berge A.; Bourque, Guillaume; Cossette, Patrick

2018-01-01

Epilepsy will affect nearly 3% of people at some point during their lifetime. Previous copy number variant (CNV) studies of epilepsy have used array-based technology and were restricted to the detection of large or exonic events. In contrast, whole-genome sequencing (WGS) has the potential to profile CNVs more comprehensively, but existing analytic methods suffer from limited accuracy. We show that this is partly due to the non-uniformity of read coverage, even after intra-sample normalization. To address this, we developed PopSV, an algorithm that uses multiple samples to control for technical variation and enables robust detection of CNVs. Using WGS and PopSV, we performed a comprehensive characterization of CNVs in 198 individuals affected with epilepsy and 301 controls. For both large and small variants, we found an enrichment of rare exonic events in epilepsy patients, especially in genes with predicted loss-of-function intolerance. Notably, this genome-wide survey also revealed an enrichment of rare non-coding CNVs near previously known epilepsy genes. This enrichment was strongest for non-coding CNVs located within 100 kbp of an epilepsy gene and in regions associated with changes in gene expression, such as expression QTLs or DNase I hypersensitive sites. Finally, we report 21 potentially damaging events that could be associated with known or new candidate epilepsy genes. Our results suggest that comprehensive sequence-based profiling of CNVs could help explain a larger fraction of epilepsy cases. PMID:29649218
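The idea of using multiple samples to control for technical variation can be illustrated with per-bin Z-scores: a bin's coverage in a test sample is compared with the same bin across reference samples, so biases shared by all samples in that bin cancel out. The sketch below uses median normalization and a plain Z-score; it is a simplified illustration of population-based normalization, not PopSV's actual algorithm, and the bin counts are hypothetical.

```python
import statistics


def median_normalize(counts: list[float]) -> list[float]:
    """Scale a sample's per-bin read counts by its median bin count."""
    med = statistics.median(counts)
    return [c / med for c in counts]


def population_z_scores(test: list[float], references: list[list[float]]) -> list[float]:
    """Per-bin Z-score of a test sample against the same bin in the reference samples."""
    test_n = median_normalize(test)
    refs_n = [median_normalize(r) for r in references]
    z_scores = []
    for b, value in enumerate(test_n):
        ref_values = [r[b] for r in refs_n]
        mean = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values)  # needs at least two reference samples
        z_scores.append((value - mean) / sd if sd > 0 else 0.0)
    return z_scores


references = [[101, 99, 100, 100, 100], [99, 100, 101, 99, 101], [100, 101, 99, 101, 99]]
z = population_z_scores([100, 100, 50, 100, 100], references)  # bin 2 mimics a deletion
print([round(v, 1) for v in z])
```

The halved coverage in bin 2 stands out strongly because the reference samples agree closely in that bin; a bin that is noisy across the population would yield a weaker signal for the same coverage drop.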

A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

PubMed Central

Csuros, Miklos; Rogozin, Igor B.; Koonin, Eugene V.

2011-01-01

Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model with Markov chain Monte Carlo (MCMC) inference on 245 orthologous genes from 99 genomes representing three of the five eukaryotic supergroups, those for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. The MCMC and ML inferences agree closely, whereas Dollo parsimony introduces a noticeable bias in the estimates, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches, including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing. PMID:21935348
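Intron density, as used above, is the number of introns per kilobase of coding sequence. The sketch below computes it from CDS segment coordinates, assuming (as a simplification) that each junction between consecutive CDS segments marks one intron; the coordinates are hypothetical.

```python
def intron_density(genes: list[list[tuple[int, int]]]) -> float:
    """Introns per kilobase of coding sequence.

    genes: for each gene, its CDS segments as 1-based inclusive (start, end) intervals.
    An intron is counted between each pair of consecutive CDS segments, so introns
    in untranslated regions are ignored.
    """
    n_introns = sum(len(segments) - 1 for segments in genes)
    cds_bp = sum(end - start + 1 for segments in genes for start, end in segments)
    return n_introns / (cds_bp / 1000)


gene = [(1, 200), (301, 600), (701, 1200)]  # hypothetical: 3 CDS segments, 1,000 coding bp
print(intron_density([gene]))  # 2.0
```

A density expressed relative to a reference genome, as in "53 to 74% of the human intron density", is then just the ratio of two such values.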
Gene finding in metatranscriptomic sequences.

PubMed

Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu

2014-01-01

Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing information complementary to metagenomic sequencing of the community. Metatranscriptomic sequences will enable us to refine the annotations of metagenomes and to study gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand specificity, short intergenic regions, and putative antisense transcripts, into a Hidden Markov Model, TransGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TransGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) at predicting protein-coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply that, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture of the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
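The core of an HMM gene finder is Viterbi decoding of the most probable state path. The sketch below decodes a toy two-state (non-coding/coding) model in log space; the states, the GC-biased emissions and the transition probabilities are made up for illustration and are far simpler than TransGeneScan's model, which incorporates strand specificity, short intergenic regions and antisense transcripts.

```python
import math

# Toy two-state model: N = non-coding, C = coding. All probabilities are made up.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}


def viterbi(seq: str) -> str:
    """Most probable state path for a non-empty uppercase DNA sequence, in log space."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("AAAAAA" + "GCGCGCGC" + "AAAAAA"))  # NNNNNNCCCCCCCCNNNNNN
```

Working in log space avoids numerical underflow on long transcripts, where products of many small probabilities would otherwise round to zero.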
LORD: a phenotype-genotype semantically integrated biomedical data tool to support rare disease diagnosis coding in health information systems

PubMed Central

Choquet, Remy; Maaroufi, Meriem; Fonjallaz, Yannick; de Carrara, Albane; Vandenbussche, Pierre-Yves; Dhombres, Ferdinand; Landais, Paul

2015-01-01

Characterizing a rare disease diagnosis for a given patient is often done through expert networks. It is a complex task that can evolve over time depending on the natural history of the disease and the evolution of scientific knowledge. Most rare diseases have genetic causes, and recent improvements in sequencing techniques contribute to the discovery of many new diseases every year. Diagnosis coding in the rare disease field requires aggregating data from multiple knowledge bases to offer the clinician a global information space spanning possible diagnoses, clinical signs (phenotypes) and known genetic mutations (genotypes). Currently, the major barrier to coding is the lack of consolidation of this information, which is scattered across different thesauri such as Orphanet, OMIM and HPO. The Linking Open data for Rare Diseases (LORD) web portal we developed is the first attempt to fill this gap, offering an integrated view of 8,400 rare diseases linked to more than 14,500 signs and 3,270 genes. The application provides a browsing feature to navigate the relationships between diseases, signs and genes, and application programming interfaces (APIs) to support its routine integration into health information systems. PMID:26958175
Tenebrio molitor antifreeze protein gene identification and regulation.

PubMed

Qin, Wensheng; Walker, Virginia K

2006-02-15

The yellow mealworm, Tenebrio molitor, is a freeze-susceptible stored-product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were characterized further. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and transfected into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Inserting an exotic 245 bp intron into the construct increased its promoter activity 4.4-fold, similar to the enhancement of transgene expression seen in several other systems. Adding a duplicate of the first 120 bp of the 143 bp core promoter halved promoter activity. Although putative hormone response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies of the mechanisms of AFP transcriptional control are important when considering any transfer of freeze-resistance phenotypes to beneficial hosts.
Differentiation and classification of phytoplasmas in the pigeon pea witches'-broom group (16SrIX): an update based on multiple gene sequence analysis.

PubMed

Lee, I-M; Bottner-Parker, K D; Zhao, Y; Bertaccini, A; Davis, R E

2012-09-01

The pigeon pea witches'-broom phytoplasma group (16SrIX) comprises diverse strains that cause numerous diseases in leguminous trees and herbaceous crops, vegetables, a fruit, a nut tree and a forest tree. At least 14 strains have been reported worldwide. Comparative phylogenetic analyses of the highly conserved 16S rRNA gene and the moderately conserved rplV (rpl22)-rpsC (rps3) and secY genes indicated that the 16SrIX group consists of at least six distinct genetic lineages. Some of these lineages cannot be readily differentiated based on analysis of 16S rRNA gene sequences alone. The relative genetic distances among these closely related lineages were better assessed by including more variable genes [e.g. ribosomal protein (rp) and secY genes]. The present study demonstrated that virtual RFLP analyses using rp and secY gene sequences allowed unambiguous identification of such lineages. A coding system is proposed to designate each distinct rp and secY subgroup in the 16SrIX group.
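Virtual RFLP analysis replaces a gel digest with an in silico one: find each enzyme's recognition site in the gene sequence and compute the resulting fragment sizes. The sketch below handles exact, palindromic recognition sites on a linear sequence; the four enzymes and their cut positions are standard, but the choice of enzymes is only an example and not necessarily the panel used in the study.

```python
# Exact, palindromic recognition sites with the top-strand cut offset (e.g. AG^CT -> 2).
ENZYMES = {
    "AluI": ("AGCT", 2),
    "MseI": ("TTAA", 1),
    "RsaI": ("GTAC", 2),
    "HhaI": ("GCGC", 3),
}


def virtual_digest(seq: str, site: str, cut: int) -> list[int]:
    """Fragment lengths after cutting a linear sequence at every occurrence of `site`."""
    seq, site = seq.upper(), site.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut)
        i = seq.find(site, i + 1)  # i + 1 so overlapping sites are still found
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def rflp_profile(seq: str) -> dict[str, list[int]]:
    """Fragment sizes per enzyme, largest first, i.e. a virtual gel lane for each digest."""
    return {name: sorted(virtual_digest(seq, *spec), reverse=True) for name, spec in ENZYMES.items()}


print(virtual_digest("AAAGCTTTAGCTAA", *ENZYMES["AluI"]))  # [4, 6, 4]
```

Comparing such profiles between rp or secY sequences of different strains is what lets closely related lineages be told apart when their 16S rRNA profiles coincide.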
Polymorphisms in miRNA genes and their involvement in autoimmune diseases susceptibility.

PubMed

Latini, Andrea; Ciccacci, Cinzia; Novelli, Giuseppe; Borgiani, Paola

2017-08-01

MicroRNAs (miRNAs) are small non-coding RNA molecules that negatively regulate the expression of multiple protein-encoding genes at the post-transcriptional level. MicroRNAs are involved in different pathways, such as cellular proliferation and differentiation, signal transduction and inflammation, and play crucial roles in the development of several diseases, such as cancer, diabetes, and cardiovascular diseases. They have recently been recognized to also play a role in the pathogenesis of autoimmune diseases. Although most studies have focused on investigating miRNA expression profiles, a growing number have investigated the role of polymorphisms in miRNA genes in the development of autoimmune diseases. Indeed, polymorphisms affecting miRNA genes can modify the set of targets they regulate or their maturation efficiency. This review aims to give an overview of the available studies that have investigated the association of miRNA gene polymorphisms with susceptibility to various autoimmune diseases and with their clinical phenotypes.
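One way a polymorphism in a miRNA gene can change its target set is by altering the seed region. The sketch below derives the 7mer-m8 site (the reverse complement of miRNA nucleotides 2-8) and counts matches in a 3′ UTR for a reference and a variant miRNA. let-7a serves only as a familiar sequence; the SNP and the UTR fragment are hypothetical, and real target prediction also weighs site context and conservation.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def seed_site(mirna: str) -> str:
    """7mer-m8 site: reverse complement of miRNA nucleotides 2-8 (the seed plus position 8)."""
    seed = to_rna(mirna)[1:8]
    return "".join(COMPLEMENT[b] for b in reversed(seed))


def count_sites(utr: str, mirna: str) -> int:
    """Number of (possibly overlapping) 7mer-m8 matches in a 3' UTR."""
    site, utr = seed_site(mirna), to_rna(utr)
    return sum(utr[i:i + len(site)] == site for i in range(len(utr) - len(site) + 1))


ref = "UGAGGUAGUAGGUUGUAUAGUU"  # let-7a, used only as a familiar example
alt = ref[:3] + "A" + ref[4:]   # hypothetical SNP at seed position 4 (G>A)
utr = "AAACUACCUCAAA"           # hypothetical UTR fragment
print(seed_site(ref), count_sites(utr, ref), count_sites(utr, alt))  # CUACCUC 1 0
```

A single seed substitution is enough to abolish the match in this toy UTR, which is why variants in the seed are expected to have the largest effect on the target repertoire.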
Rare variants in the neurotrophin signaling pathway implicated in schizophrenia risk.

PubMed

Kranz, Thorsten M; Goetz, Ray R; Walsh-Messinger, Julie; Goetz, Deborah; Antonius, Daniel; Dolgalev, Igor; Heguy, Adriana; Seandel, Marco; Malaspina, Dolores; Chao, Moses V

2015-10-01

Multiple lines of evidence support the relevance of impaired signaling pathways to the underpinnings of schizophrenia. There has been interest in neurotrophins, since they are crucial mediators of neurodevelopment and of synaptic connectivity in the adult brain. Neurotrophins and their receptors show aberrant expression patterns in cortical areas in schizophrenia cases compared with control subjects. Little is known about the contribution of neurotrophin genes to psychiatric disorders. To begin to address this issue, we conducted high-coverage targeted exome capture of a subset of neurotrophin genes in 48 comprehensively characterized cases with schizophrenia-related psychosis. We report rare missense polymorphisms and novel missense mutations in neurotrophin receptor signaling pathway genes. Furthermore, we observed that some genes have a higher propensity than others to harbor missense coding variants. Based on this initial analysis, we suggest that rare variants and missense mutations in neurotrophin genes might represent genetic contributions shared across psychiatric disorders. Copyright © 2015 Elsevier B.V. All rights reserved.
COGNATE: comparative gene annotation characterizer.

PubMed

Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver

2017-07-17

The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequence is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires applying standardized definitions of coding sequences and genes in order to create data sets and corresponding summary statistics that are both comparable and comprehensive. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level that uses a minimum of parameters and applies standardized terms consistently, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus benefit greatly from an easy-to-use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes to describe the respective gene and genome structure parameters in detail. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow downstream analyses and to provide an overview of the characteristics of the genome and its gene repertoire.
COGNATE is written in Perl and is freely available at the ZFMK homepage (https://www.zfmk.de/en/COGNATE) and on GitHub (https://github.com/ZFMK/COGNATE). COGNATE allows genome assemblies and structural elements to be compared on multiple levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide much-needed standardization of both the disclosure of genome and gene structure parameters and data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.
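The point that a statistic such as "total amount of coding sequence" depends on its definition can be made concrete. The following Python sketch is purely illustrative and is not COGNATE's implementation or its published definitions. It assumes CDS features are supplied as hypothetical (seqid, start, end) tuples in GFF3-style 1-based inclusive coordinates and ignores strand; it shows how counting overlapping CDS segments from alternative transcripts once, versus once per feature, changes the total.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def total_coding_length(cds: Iterable[Tuple[str, int, int]]) -> int:
    """Count each coding base once across all CDS features.

    `cds` holds (seqid, start, end) tuples in 1-based, inclusive
    coordinates (the GFF3 convention). Strand is ignored. Overlapping
    segments, e.g. from alternative transcripts, are merged first.
    """
    by_seq = defaultdict(list)
    for seqid, start, end in cds:
        by_seq[seqid].append((start, end))

    total = 0
    for intervals in by_seq.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlap: extend the current block
                cur_end = max(cur_end, end)
            else:  # gap: close the current block and open a new one
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1  # close the last block
    return total


def naive_coding_length(cds: Iterable[Tuple[str, int, int]]) -> int:
    """Sum of feature lengths; shared coding bases are counted repeatedly."""
    return sum(end - start + 1 for _, start, end in cds)


# Two overlapping CDS segments on one scaffold, one on another (toy data).
features = [("scaf1", 1, 100), ("scaf1", 51, 150), ("scaf2", 1, 10)]
print(naive_coding_length(features))  # 210
print(total_coding_length(features))  # 160
```

Both numbers are defensible answers to "how much coding sequence is there?", which is exactly why an explicit, shared definition matters before such totals are compared across genomes.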
Ensembl comparative genomics resources.

PubMed

Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

2016-01-01

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, producing such data is complex and computationally demanding; as a result, most comparative analyses rely on a small number of reference resources. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused by other projects, including Ensembl Genomes and Gramene, for the analysis of non-vertebrate species, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.
Molecular characterization and expression analysis of AMPK α subunit isoform genes from Scophthalmus maximus responding to salinity stress.

PubMed

Zeng, Lin; Liu, Bin; Wu, Chang-Wen; Lei, Ji-Lin; Xu, Mei-Ying; Zhu, Ai-Yi; Zhang, Jian-She; Hong, Wan-Shu

2016-12-01

AMP-activated protein kinase (AMPK) is a highly conserved, multifunctional protein kinase that plays important roles in both intracellular energy balance and the cellular stress response. In the present study, the molecular characterization, tissue distribution and expression levels of the AMPK α1 and α2 genes from turbot (Scophthalmus maximus) under salinity stress are described. The complete coding regions of the AMPK α1 and α2 genes were isolated from turbot muscle cDNA using degenerate primers in combination with RACE. The complete coding regions of AMPK α1 (1722 bp) and α2 (1674 bp) encode peptides of 573 and 557 amino acids, respectively. Multiple alignments, structural analysis and phylogenetic tree construction indicated that S. maximus AMPK α1 and α2 share high amino acid identity with their counterparts in other species, especially fish. AMPK α1 and α2 transcripts were detected in all tested tissues, indicating that the genes are constitutively expressed. Salinity challenges significantly altered AMPK α1 and α2 mRNA levels in S. maximus gill tissues in a salinity- and time-dependent manner, suggesting that AMPK α1 and α2 play important roles in mediating the response to salinity stress in S. maximus. The expression levels of AMPK α1 and α2 mRNA were positively correlated with gill Na+,K+-ATPase activity. These findings will aid our understanding of the molecular mechanisms by which juvenile turbot respond to environmental salinity changes.
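The reported lengths are internally consistent if each complete coding region includes its stop codon: 1722 bp is 574 codons, of which 573 are translated, and 1674 bp is 558 codons, giving 557 amino acids. A minimal Python check, assuming (the abstract does not state it explicitly) that the stop codon is counted in the coding-region length; `protein_length` is a hypothetical helper written for this illustration.

```python
def protein_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a complete coding region of cds_bp bases."""
    if cds_bp % 3 != 0:
        raise ValueError(f"{cds_bp} bp is not a whole number of codons")
    codons = cds_bp // 3
    # A stop codon terminates translation and contributes no amino acid.
    return codons - 1 if includes_stop else codons


for name, bp in [("AMPK alpha1", 1722), ("AMPK alpha2", 1674)]:
    print(f"{name}: {bp} bp -> {protein_length(bp)} aa")
# AMPK alpha1: 1722 bp -> 573 aa
# AMPK alpha2: 1674 bp -> 557 aa
```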
Ensembl comparative genomics resources

PubMed Central

Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

2016-01-01

Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolution-related questions. However, producing such data is complex and computationally demanding; as a result, most comparative analyses rely on a small number of reference resources. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available, including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused by other projects, including Ensembl Genomes and Gramene, for the analysis of non-vertebrate species, and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates make Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847
Validating the performance of correlated fission multiplicity implementation in radiation transport codes with subcritical neutron multiplication benchmark experiments

DOE PAGES

Arthur, Jennifer; Bahran, Rian; Hutchinson, Jesson; ...

2018-06-14

Historically, radiation transport codes have treated fission emissions as uncorrelated. In reality, the particles emitted by both spontaneous and induced fissions are correlated in time, energy, angle, and multiplicity. This work validates the performance of several current Monte Carlo codes that account for the underlying correlated physics of fission neutrons, specifically neutron multiplicity distributions. The performance of four Monte Carlo codes (MCNP®6.2, MCNP®6.2/FREYA, MCNP®6.2/CGMF, and PoliMi) was assessed using neutron multiplicity benchmark experiments. In addition, MCNP®6.2 simulations were run using JEFF-3.2 and JENDL-4.0, rather than ENDF/B-VII.1, data for 239Pu and 240Pu. The sensitive benchmark parameters used in this work to represent the performance of each correlated fission multiplicity Monte Carlo code are the singles rate, the doubles rate, leakage multiplication, and Feynman histograms. Although it is difficult to determine which radiation transport code performs best overall in simulating subcritical neutron multiplication inference benchmark measurements, it is clear that correlations exist between the underlying nuclear data used (or generated) by the various codes and the correlated neutron observables of interest. This could prove useful in nuclear data validation and evaluation applications in which one particular moment of the neutron multiplicity distribution is of more interest than the others. It is also clear that, because transport is handled by MCNP®6.2 in three of the four codes and the fourth (PoliMi) is based on an older version of MCNP®, the differences in the correlated neutron observables of interest are most likely due to each code's treatment of fission event generation rather than to radiation transport.
Validating the performance of correlated fission multiplicity implementation in radiation transport codes with subcritical neutron multiplication benchmark experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arthur, Jennifer; Bahran, Rian; Hutchinson, Jesson

Historically, radiation transport codes have treated fission emissions as uncorrelated. In reality, the particles emitted by both spontaneous and induced fissions are correlated in time, energy, angle, and multiplicity. This work validates the performance of several current Monte Carlo codes that account for the underlying correlated physics of fission neutrons, specifically neutron multiplicity distributions. The performance of four Monte Carlo codes (MCNP®6.2, MCNP®6.2/FREYA, MCNP®6.2/CGMF, and PoliMi) was assessed using neutron multiplicity benchmark experiments. In addition, MCNP®6.2 simulations were run using JEFF-3.2 and JENDL-4.0, rather than ENDF/B-VII.1, data for 239Pu and 240Pu. The sensitive benchmark parameters used in this work to represent the performance of each correlated fission multiplicity Monte Carlo code are the singles rate, the doubles rate, leakage multiplication, and Feynman histograms. Although it is difficult to determine which radiation transport code performs best overall in simulating subcritical neutron multiplication inference benchmark measurements, it is clear that correlations exist between the underlying nuclear data used (or generated) by the various codes and the correlated neutron observables of interest. This could prove useful in nuclear data validation and evaluation applications in which one particular moment of the neutron multiplicity distribution is of more interest than the others. It is also clear that, because transport is handled by MCNP®6.2 in three of the four codes and the fourth (PoliMi) is based on an older version of MCNP®, the differences in the correlated neutron observables of interest are most likely due to each code's treatment of fission event generation rather than to radiation transport.
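A Feynman histogram records how many time gates contained 0, 1, 2, ... detected neutrons, and the Feynman variance-to-mean statistic is built from the first two moments of that distribution. The Python sketch below computes the standard Feynman-Y, Y = variance/mean - 1, for a single gate width. The function name and input layout are assumptions made for illustration; they are not taken from MCNP, FREYA, CGMF, PoliMi, or the benchmark analysis itself.

```python
from typing import Sequence


def feynman_y(histogram: Sequence[int]) -> float:
    """Feynman-Y (excess variance-to-mean ratio) of a gated count histogram.

    histogram[n] is the number of time gates in which exactly n
    neutrons were detected (hypothetical input layout).
    """
    gates = sum(histogram)
    if gates == 0:
        raise ValueError("histogram contains no gates")
    # First and second raw moments of the counts-per-gate distribution.
    mean = sum(n * c for n, c in enumerate(histogram)) / gates
    second = sum(n * n * c for n, c in enumerate(histogram)) / gates
    if mean == 0:
        raise ValueError("no counts recorded; Y is undefined")
    variance = second - mean ** 2
    # Uncorrelated (Poisson) counting gives Y near 0; correlated
    # fission-chain neutrons push the variance above the mean, so Y > 0.
    return variance / mean - 1.0


print(feynman_y([1, 0, 1]))     # 0.0 (variance equals mean)
print(feynman_y([2, 0, 0, 1]))  # 1.0 (variance exceeds mean)
```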
Alternate approaches to repress endogenous microRNA activity in Arabidopsis thaliana

PubMed Central

Wang, Ming-Bo

2011-01-01

MicroRNAs (miRNAs) are an endogenous class of regulatory small RNA (sRNA). In plants, miRNAs are processed from short non-protein-coding messenger RNAs (mRNAs) transcribed from small miRNA genes (MIR genes). Traditionally in the model plant Arabidopsis thaliana (Arabidopsis), functional analysis of a gene product has relied on identifying a corresponding T-DNA insertion knockout mutant in a large, randomly mutagenized population. However, because MIR genes are small and most plant miRNA families contain multiple, highly conserved members, obtaining a corresponding single or multiple null mutant plant line has been extremely laborious and time-consuming. Our recent study published in Molecular Plant (1) outlines an alternative method for the functional characterization of miRNA action in Arabidopsis, termed anti-miRNA technology. Using this approach, we demonstrated that the expression of individual miRNAs or entire miRNA families can be readily and efficiently knocked down. Our approach complements two previously reported methodologies that also allow targeted suppression of either individual miRNAs or all members of a MIR gene family: miRNA target mimicry (2,3) and transcriptional gene silencing (TGS) of MIR gene promoters (4). All three methodologies rely on endogenous gene regulatory machinery, and in this article we provide an overview of these technologies and discuss their strengths and weaknesses in inhibiting the activity of their targeted miRNA(s). PMID:21358288
Prediction and analysis of three gene families related to leaf rust (Puccinia triticina) resistance in wheat (Triticum aestivum L.).

PubMed

Peng, Fred Y; Yang, Rong-Cai

2017-06-20

Resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades, with over 70 Lr genes mapped on different chromosomes and numerous QTLs (quantitative trait loci) detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. Race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may span the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study was to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding site and leucine-rich repeat, or NLR), START (Steroidogenic Acute Regulatory protein [StAR]-related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The analysis focuses on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of the wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or in their neighborhood (5 kb upstream of the transcription start site [TSS] or downstream of the transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes, whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR genes.
Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for adult resistance and six SNPs for seedling resistance in the NLR genes. Most of these coding SNPs were predicted to alter the encoded amino acids, and this information may serve as a starting point for more thorough molecular and functional characterization of the designated Lr genes. Using the primer sequences of 99 known non-SNP markers from leaf rust resistance QTLs, we found candidate genes closely linked to these markers, including Lr34, which lies 1212 bases from one of its two gene-specific markers (cssfr1) and 2189 bases from the other (cssfr2). This study represents a comprehensive analysis of ABC, NLR and START genes in the hexaploid wheat genome and their physical relationships with QTLs for leaf rust resistance at seedling and adult stages. Our analysis suggests that ABC (and START) genes are more likely to be co-located with QTLs for race-nonspecific, adult resistance, whereas NLR genes are more likely to be co-located with QTLs for race-specific resistance, which is often expressed at the seedling stage. Although our analysis was hampered by inaccurate or unknown physical positions of numerous QTLs, owing to the incomplete assembly of the complex hexaploid wheat genome currently available, the observed associations between (i) QTLs for race-specific resistance and NLR genes and (ii) QTLs for nonspecific resistance and ABC genes will help discover SNP variants for leaf rust resistance at seedling and adult stages. The genes containing nonsynonymous SNPs are promising candidates for future studies as potential new sources of leaf rust resistance in wheat breeding.
Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

PubMed Central

Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.

2013-01-01

The cell cycle is a complex and tightly regulated process that must proceed with regulatory precision to achieve successful cellular division. Despite their wide application, microarray time-course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We use ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle genes from non-cell cycle genes. Our results show that both the trans-acting TF features and the cis-regulatory motif features are predictive of cell cycle genes, and that combining the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs as of protein-coding genes are cell cycle regulated, suggesting the importance of non-coding RNAs in the cell cycle. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights into cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175
Comparative Analysis of Evolutionary Mechanisms of the Hemagglutinin and Three Internal Protein Genes of Influenza B Virus: Multiple Cocirculating Lineages and Frequent Reassortment of the NP, M, and NS Genes

PubMed Central

Lindstrom, Stephen E.; Hiromoto, Yasuaki; Nishimura, Hidekazu; Saito, Takehiko; Nerome, Reiko; Nerome, Kuniaki

1999-01-01

Phylogenetic profiles of the genes coding for the hemagglutinin (HA) protein, nucleoprotein (NP), matrix (M) protein, and nonstructural (NS) proteins of influenza B viruses isolated from 1940 to 1998 were analyzed in parallel in order to understand the evolutionary mechanisms of these viruses. Unlike human influenza A (H3N2) viruses, the evolutionary pathways of all four genes of recent influenza B viruses revealed similar patterns of genetic divergence into two major lineages. Although the evolutionary rates of the HA, NP, M, and NS genes of influenza B viruses were estimated to be generally lower than those of human influenza A viruses, the genes of influenza B viruses showed complex phylogenetic patterns, indicating alternative mechanisms for generating virus variability. The topologies of the evolutionary trees of each gene were quite distinct from one another, showing that these genes were evolving independently. Furthermore, the variable topologies were apparently the result of frequent genetic exchange among cocirculating epidemic viruses. The evolutionary analysis in the present study provided further evidence for cocirculation of multiple lineages as well as sequestering and reemergence of phylogenetic lineages of the internal genes. In addition, comparison of deduced amino acid sequences revealed a novel amino acid deletion in the HA1 domain of the HA protein of recent isolates from 1998 belonging to the B/Yamagata/16/88-like lineage. It thus became apparent that, despite lower evolutionary rates, influenza B viruses were able to generate genetic diversity among circulating viruses through a combination of evolutionary mechanisms involving cocirculating lineages and genetic reassortment, by which new variants with distinct gene constellations emerged. PMID:10196339
Nothing in Evolution Makes Sense Except in the Light of Genomics: Read-Write Genome Evolution as an Active Biological Process.

PubMed

Shapiro, James A

2016-06-08

The 21st-century genomics-based analysis of evolutionary variation reveals a number of novel features that were impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than being single-gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, together with the biochemical abilities cells possess to rearrange DNA molecules, constitutes a powerful toolbox for adaptive genome rewriting. That is, cells possess "Read-Write Genomes" that they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification.
Nothing in Evolution Makes Sense Except in the Light of Genomics: Read–Write Genome Evolution as an Active Biological Process

PubMed Central

Shapiro, James A.

2016-01-01

The 21st-century genomics-based analysis of evolutionary variation reveals a number of novel features that were impossible to predict when Dobzhansky and other evolutionary biologists formulated the neo-Darwinian Modern Synthesis in the middle of the last century. These include three distinct realms of cell evolution; symbiogenetic fusions forming eukaryotic cells with multiple genome compartments; horizontal organelle, virus and DNA transfers; functional organization of proteins as systems of interacting domains subject to rapid evolution by exon shuffling and exonization; distributed genome networks integrated by mobile repetitive regulatory signals; and regulation of multicellular development by non-coding lncRNAs containing repetitive sequence components. Rather than being single-gene traits, all phenotypes involve coordinated activity by multiple interacting cell molecules. Genomes contain abundant and functional repetitive components in addition to the unique coding sequences envisaged in the early days of molecular biology. Combinatorial coding, together with the biochemical abilities cells possess to rearrange DNA molecules, constitutes a powerful toolbox for adaptive genome rewriting. That is, cells possess “Read–Write Genomes” that they alter by numerous biochemical processes capable of rapidly restructuring cellular DNA molecules. Rather than viewing genome evolution as a series of accidental modifications, we can now study it as a complex biological process of active self-modification. PMID:27338490

Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term.

PubMed

Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-09-01

This study aimed to identify differentially expressed long non-coding RNA (lncRNA) genes in the myometrium of women in spontaneous labor at term. Myometrium was obtained from women undergoing cesarean delivery who were not in labor (n = 19) and from women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We used computational approaches to bound the extent of lncRNA representation on this platform and to identify co-differentially expressed and correlated pairs of lncRNA genes and protein-coding genes sharing the same genomic loci. In women in spontaneous labor at term, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065. This co-differential expression and correlation were validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
Three-layered polyplex as a microRNA targeted delivery system for breast cancer gene therapy

NASA Astrophysics Data System (ADS)

Li, Yan; Dai, Yu; Zhang, Xiaojin; Chen, Jihua

2017-07-01

MicroRNAs (miRNAs), small non-coding RNAs, play an important role in modulating cell proliferation, migration, and differentiation. Because miRNAs can regulate multiple cancer-related genes simultaneously, modulating miRNAs could target a set of related oncogenic genes or pathways. Owing to their reduced immune response and low toxicity, miRNAs, with their small size and low molecular weight, have become increasingly promising therapeutic drugs in cancer therapy. However, one of the major challenges of miRNA-based cancer therapy is achieving specific, effective, and safe delivery of therapeutic miRNAs into cancer cells. Here we provide a strategy that uses a three-layered polyplex with folic acid as a targeting group to systemically deliver miR-210 into breast cancer cells, resulting in inhibition of breast cancer growth.
Three-layered polyplex as a microRNA targeted delivery system for breast cancer gene therapy.

PubMed

Li, Yan; Dai, Yu; Zhang, Xiaojin; Chen, Jihua

2017-07-14

MicroRNAs (miRNAs), small non-coding RNAs, play an important role in modulating cell proliferation, migration, and differentiation. Because miRNAs can regulate multiple cancer-related genes simultaneously, modulating miRNAs could target a set of related oncogenic genes or pathways. Owing to their reduced immune response and low toxicity, miRNAs, with their small size and low molecular weight, have become increasingly promising therapeutic drugs in cancer therapy. However, one of the major challenges of miRNA-based cancer therapy is achieving specific, effective, and safe delivery of therapeutic miRNAs into cancer cells. Here we provide a strategy that uses a three-layered polyplex with folic acid as a targeting group to systemically deliver miR-210 into breast cancer cells, resulting in inhibition of breast cancer growth.
Long Non-Coding RNAs Differentially Expressed between Normal versus Primary Breast Tumor Tissues Disclose Converse Changes to Breast Cancer-Related Protein-Coding Genes

PubMed Central

Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.

2014-01-01

Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses have predominantly assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence indicates that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples using a custom expression microarray containing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal and breast tumor tissue; half of these regions were non-coding, without any evidence of functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently oriented in the antisense direction of protein-coding genes (14%), and commonly distributed at promoter, transcription factor binding, or enhancer sites. Comparing the most divergent mRNA-based breast cancer subtypes, Basal-like versus Luminal A and B, yielded 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared with their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated with the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail.
Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Long non-coding RNAs differentially expressed between normal versus primary breast tumor tissues disclose converse changes to breast cancer-related protein-coding genes.

PubMed

Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O

2014-01-01

Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses have predominantly assessed protein-coding genes; however, the majority of the mammalian genome is transcribed into numerous non-coding transcripts. Emerging evidence suggests that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples using a custom expression microarray containing probes for mRNAs as well as for novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal and breast tumor tissue; half of these regions were non-coding, without any evidence of functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were located primarily in introns (53%) or in intergenic space (33%), were frequently oriented antisense to protein-coding genes (14%), and were commonly distributed at promoter, transcription factor binding, or enhancer sites. Comparing the most dissimilar mRNA-based breast cancer subtypes, Basal-like versus Luminal A and B, yielded 3,025 significantly differentially expressed unique loci, including 682 (23%) non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous (converse) expression changes relative to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated with the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail.
Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, we identified a number of differentially expressed lncRNAs associated with cancer-related protein-coding genes.
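The "strongly anticorrelated" antisense lncRNA/HDAC3 pair refers to a negative correlation between expression profiles across samples. A minimal sketch of the Pearson coefficient, assuming log-scale expression values per sample; the numbers below are invented for illustration and are not data from the study:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 expression of an mRNA and its antisense lncRNA in five samples.
mrna = [8.0, 7.5, 9.1, 6.8, 8.4]
antisense = [3.1, 3.6, 2.2, 4.0, 2.7]
r = pearson(mrna, antisense)
print(f"r = {r:.3f}")
```

A value near -1 indicates that the lncRNA tends to be high where the mRNA is low; in a real analysis a significance test and multiple-testing correction would be needed before calling a pair anticorrelated.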
Characterization of mitochondrial genome of sea cucumber Stichopus horrens: a novel gene arrangement in Holothuroidea.

PubMed

Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing

2011-05-01

The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoans. In this study, we present the complete mitochondrial DNA sequence of the sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus). The sequence was determined using standard and long PCR. The mitochondrial genome of Stichopus horrens is a circular molecule 16,257 bp long, composed of 13 protein-coding genes, two ribosomal RNA genes, and 22 transfer RNA genes. Most of these genes are encoded on the heavy strand, except for one protein-coding gene (nad6) and five tRNA genes (tRNA-Ser(UCN), tRNA-Gln, tRNA-Ala, tRNA-Val, and tRNA-Asp), which are encoded on the light strand. The base composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T (AT skew = 0.025; GC skew = -0.188). A non-coding region of 675 bp was identified as the putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size and total 227 bp. A total of 25 overlapping nucleotides, in overlaps ranging from 1 to 10 bp, occur among 11 genes. All 13 protein-coding genes are initiated with ATG. TAA is used as the stop codon in all protein-coding genes except nad3 and nad4, which use TAG. The most frequently used amino acids are Leu (16.29%), Ser (10.34%), and Phe (8.37%). All tRNA genes can potentially fold into typical cloverleaf secondary structures. We also compared the gene order in the mitochondrial DNA of the five holothurians now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
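The AT and GC skew values quoted above follow from the heavy-strand composition using the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). A short check, taking the percentages from the abstract as input; the helper name is illustrative:

```python
def skew(x, y):
    """Strand skew (x - y) / (x + y)."""
    return (x - y) / (x + y)


# Heavy-strand base composition (%) reported for S. horrens.
comp = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}
at = skew(comp["A"], comp["T"])  # AT skew = (A - T) / (A + T)
gc = skew(comp["G"], comp["C"])  # GC skew = (G - C) / (G + C)
print(f"AT skew = {at:.3f}, GC skew = {gc:.3f}")  # AT skew = 0.025, GC skew = -0.188
```

The negative GC skew simply reflects that C outnumbers G on the heavy strand.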
JCoDA: a tool for detecting evolutionary selection.

PubMed

Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir

2010-05-27

The incorporation of annotated sequence information from multiple related species into commonly used databases (Ensembl, FlyBase, Saccharomyces Genome Database, WormBase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluating evolutionary relationships. To aid in this process, we have developed JCoDA (Java Codon Delimited Alignment), a simple-to-use visualization tool for detecting site-specific and regional positive/negative evolutionary selection among homologous coding sequences. JCoDA accepts user-supplied unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and computes dN/dS ratios using PAML (Phylogenetic Analysis by Maximum Likelihood; yn00 and codeml) to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for PHYLIP (Phylogeny Inference Package) to generate phylogenetic trees, manages the formatting of all required file types, and streamlines the passage of information between the underlying programs. The raw data are output to user-configurable graphs with sliding-window options for straightforward visualization of pairwise or gene-family comparisons. Additionally, codon-delimited alignments can be output in a variety of common formats, and all dN/dS calculations can be exported in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that JCoDA facilitates, we took advantage of the well-studied sex determination pathway in nematodes, together with the extensive sequence information available, to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway.
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
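A codon-delimited alignment keeps every gap in multiples of three so that codons stay in frame for dN/dS estimation. One common way to obtain it, used by back-translation tools such as PAL2NAL, is to align the translated proteins and then thread the original codons back onto the protein alignment. The sketch below illustrates that idea only; it is not JCoDA's code, the function name is hypothetical, and it assumes each CDS lacks its terminal stop codon and translates exactly to the ungapped protein:

```python
def thread_codons(aligned_protein, cds):
    """Turn one row of a protein alignment into a codon-delimited nucleotide row.

    Each residue is replaced by its codon and each gap '-' by '---', so gaps
    always fall in multiples of three.
    """
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = [aa for aa in aligned_protein if aa != "-"]
    if len(residues) != len(codons):
        raise ValueError("protein and CDS lengths disagree")
    codon_iter = iter(codons)
    out = ["---" if aa == "-" else next(codon_iter) for aa in aligned_protein]
    return "".join(out)


# Hypothetical pair: the protein alignment (M-KF / MQKF) was computed first.
print(thread_codons("M-KF", "ATGAAATTT"))     # ATG---AAATTT
print(thread_codons("MQKF", "ATGCAGAAATTT"))  # ATGCAGAAATTT
```

Aligning at the protein level first avoids frame-shifting gaps that a direct nucleotide alignment can introduce.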
JCoDA: a tool for detecting evolutionary selection

PubMed Central

2010-01-01

Background: The incorporation of annotated sequence information from multiple related species into commonly used databases (Ensembl, FlyBase, Saccharomyces Genome Database, WormBase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluating evolutionary relationships. To aid in this process, we have developed JCoDA (Java Codon Delimited Alignment), a simple-to-use visualization tool for detecting site-specific and regional positive/negative evolutionary selection among homologous coding sequences. Results: JCoDA accepts user-supplied unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and computes dN/dS ratios using PAML (Phylogenetic Analysis by Maximum Likelihood; yn00 and codeml) to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for PHYLIP (Phylogeny Inference Package) to generate phylogenetic trees, manages the formatting of all required file types, and streamlines the passage of information between the underlying programs. The raw data are output to user-configurable graphs with sliding-window options for straightforward visualization of pairwise or gene-family comparisons. Additionally, codon-delimited alignments can be output in a variety of common formats, and all dN/dS calculations can be exported in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that JCoDA facilitates, we took advantage of the well-studied sex determination pathway in nematodes, together with the extensive sequence information available, to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway.
Conclusions: JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
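The dN/dS ratio compares non-synonymous changes per non-synonymous site with synonymous changes per synonymous site. The sketch below shows the counting step of a simplified Nei-Gojobori-style calculation under the standard genetic code. It is an illustrative assumption, not what PAML computes (yn00 uses the Yang-Nielsen counting method and codeml is likelihood-based); it omits the Jukes-Cantor correction, assumes no stop codons, and rejects codons that differ at more than one position. Function names are hypothetical:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous single-base changes / 3, summed over the three codon positions."""
    aa = CODE[codon]
    syn = 0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == aa:
                syn += 1
    return syn / 3


def p_distances(seq1, seq2):
    """Uncorrected (pN, pS) for two aligned, gap-free coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s          # synonymous sites, averaged over the two codons
        N += 3 - s      # remaining sites are non-synonymous
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
        elif diffs > 1:
            raise ValueError("multi-hit codons need pathway averaging (not implemented)")
    return Nd / N, Sd / S


pn, ps = p_distances("TTTCTGAAA", "TTCCTGAGA")
print(f"pN = {pn:.3f}, pS = {ps:.3f}, pN/pS = {pn / ps:.3f}")
```

In the example, TTT to TTC is synonymous (Phe) and AAA to AGA is non-synonymous (Lys to Arg); a ratio below 1 would conventionally be read as purifying selection, above 1 as positive selection.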
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term

PubMed Central

Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-01-01

Objective: The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature and not the result of cephalopelvic disproportion. The methodology and datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of this study was to identify differentially expressed long non-coding RNA genes in the myometrium of women in spontaneous labor at term. Materials and Methods: Myometrium was obtained from women undergoing cesarean delivery who were not in labor (n=19) and from women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein-coding genes from this study has been reported previously. Here, we used computational approaches to bound the extent of long non-coding RNA representation on this platform and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results: Considering more than 18,498 distinct lncRNA genes compiled non-redundantly from public experimental data sources, and interrogating the 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci containing coding-lncRNA gene pairs, SOCS2-AK054607 and LMCD1-NR_024065, in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method.
Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. Conclusions: We provide, for the first time, evidence of coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
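Identifying coding-lncRNA gene pairs that share a locus, such as cis-encoded antisense pairs, amounts to finding lncRNAs that overlap a protein-coding gene on the opposite strand. A brute-force sketch, assuming half-open 0-based coordinates; the class, function, and feature names and the coordinates are hypothetical, and the study's actual pairing criteria (for example, any distance cut-off for nearby but non-overlapping genes) are not described here:

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def antisense_pairs(coding: List[Feature], lncrnas: List[Feature]) -> List[Tuple[str, str]]:
    """Return (coding gene, lncRNA) pairs that overlap on opposite strands."""
    pairs = []
    for g in coding:
        for t in lncrnas:
            overlaps = g.start < t.end and t.start < g.end
            if g.chrom == t.chrom and g.strand != t.strand and overlaps:
                pairs.append((g.name, t.name))
    return pairs


# Hypothetical coordinates, for illustration only.
genes = [Feature("geneA", "chr1", 1_000, 5_000, "+")]
lncs = [
    Feature("lncX", "chr1", 4_000, 6_000, "-"),  # overlaps geneA, opposite strand
    Feature("lncY", "chr1", 4_500, 7_000, "+"),  # same strand: not antisense
    Feature("lncZ", "chr2", 1_000, 2_000, "-"),  # different chromosome
]
print(antisense_pairs(genes, lncs))  # [('geneA', 'lncX')]
```

The nested loop is quadratic; at genome scale a sorted sweep or an interval tree per chromosome would be the usual choice.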
Mrp--a new auxiliary gene essential for optimal expression of methicillin resistance in Staphylococcus aureus.

PubMed

Wu, S W; De Lencastre, H

1999-01-01

Screening of a library of Tn551 insertional mutants selected for reduced methicillin resistance in the parental Staphylococcus aureus strain COL resulted in the isolation of mutant RUSA266, in which the minimal inhibitory concentration (MIC) was reduced from the parental 1,600 to 1.5 µg/mL. Cloning and sequencing of the region around insertion site Ω726 identified an open reading frame (orf1365) encoding a very large polypeptide of more than 1,365 amino acids. A unique feature of the deduced amino acid sequence was the presence of multiple tandem repeats of 75 amino acids, reminiscent of the structure of the high-molecular-weight cell-surface proteins EF* and Emb identified in some streptococcal strains. Mutant RUSA266, carrying the inactivated gene, which we provisionally refer to as mrp (for multiple repeat polypeptide), produced a peptidoglycan with altered muropeptide composition, and both the reduced antibiotic resistance and the altered cell wall composition were co-transduced in back-crosses into the parental strain COL. Additional sequencing upstream of mrp revealed that this gene is part of a five-gene cluster occupying a 9.2-kb region of the staphylococcal chromosome; the cluster is composed of glmM (directly upstream of mrp), mrp, two open reading frames (orf310 and orf269) coding for hypothetical proteins, and the gene encoding staphylococcal arginase (arg). Transcriptional analysis demonstrated that the five genes in the cluster are co-transcribed.
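The identification of orf1365 rests on finding an open reading frame: a stretch from a start codon to the next in-frame stop codon. A minimal forward-strand sketch, assuming ATG starts and the standard stop codons TAA, TAG and TGA; bacterial genes can also start with GTG or TTG, and a real scan would include the reverse complement, so this is illustrative only and the function name is hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG...stop ORF on the forward strand, as (start, end) with `end`
    just past the stop codon; returns None if no complete ORF exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


start, end = longest_orf("CCATGAAATTTTAGCC")
print(start, end, (end - start) // 3 - 1)  # 2 14 3
```

The last printed value is the encoded protein length in amino acids (codons minus the stop codon).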
Single-Nucleosome Mapping of Histone Modifications in S. cerevisiae

PubMed Central

Kim, Minkyu; Buratowski, Stephen; Schreiber, Stuart L; Friedman, Nir

2005-01-01

Covalent modification of histone proteins plays a role in virtually every process on eukaryotic DNA, from transcription to DNA repair. Many different residues can be covalently modified, and it has been suggested that these modifications occur in a great number of independent, meaningful combinations. Published low-resolution microarray studies on the combinatorial complexity of histone modification patterns suffer from confounding effects caused by the averaging of modification levels over multiple nucleosomes. To overcome this problem, we used a high-resolution tiled microarray with single-nucleosome resolution to investigate the occurrence of combinations of 12 histone modifications on thousands of nucleosomes in actively growing S. cerevisiae. We found that histone modifications do not occur independently; there are roughly two groups of co-occurring modifications. One group of lysine acetylations shows a sharply defined domain of two hypo-acetylated nucleosomes, adjacent to the transcriptional start site, whose occurrence does not correlate with transcription levels. The other group consists of modifications occurring in gradients through the coding regions of genes in a pattern associated with transcription. We found no evidence for a deterministic code of many discrete states, but instead we saw blended, continuous patterns that distinguish nucleosomes at one location (e.g., promoter nucleosomes) from those at another location (e.g., over the 3′ ends of coding regions). These results are consistent with the idea of a simple, redundant histone code, in which multiple modifications share the same role. PMID:16122352
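The finding that modifications fall into roughly two co-occurring groups can be illustrated by correlating per-nucleosome modification levels and linking strongly correlated pairs. A sketch using single-linkage grouping with an arbitrary correlation threshold; this is an assumed, simplified stand-in for the statistical analysis in the paper, and the modification names and levels are invented:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cooccurrence_groups(levels, threshold=0.7):
    """Single-linkage groups of modifications whose profiles correlate above `threshold`."""
    names = list(levels)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(levels[a], levels[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Invented levels for four hypothetical marks over five nucleosomes.
levels = {
    "ac_1": [1.0, 1.0, 5.0, 5.0, 1.0],
    "ac_2": [1.2, 0.9, 4.8, 5.1, 1.0],
    "me_1": [5.0, 4.0, 1.0, 1.0, 5.0],
    "me_2": [4.8, 4.2, 1.1, 0.9, 5.2],
}
print(cooccurrence_groups(levels))  # [['ac_1', 'ac_2'], ['me_1', 'me_2']]
```

Single linkage is sensitive to the threshold; hierarchical clustering or mutual-information measures are common alternatives.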
RNAseq analysis of fast skeletal muscle in restriction-fed transgenic coho salmon (Oncorhynchus kisutch): an experimental model uncoupling the growth hormone and nutritional signals regulating growth.

PubMed

Garcia de la Serrana, Daniel; Devlin, Robert H; Johnston, Ian A

2015-07-31

Coho salmon (Oncorhynchus kisutch) transgenic for growth hormone (Gh) express Gh in multiple tissues, resulting in increased appetite and continuous high growth under satiation feeding. Restricting Gh-transgenic fish to the same lower ration (TR) as wild-type fish (WT) results in similar growth, but with the recruitment of fewer, larger-diameter skeletal muscle fibres to reach a given body size. To better understand the genetic mechanisms behind these different patterns of muscle growth, and to investigate how decoupling Gh and nutritional signals affects gene regulation, we used RNA-seq to compare the fast skeletal muscle transcriptome of TR and WT coho salmon. Illumina sequencing of individually barcoded libraries from 6 WT and 6 TR coho salmon yielded 704,550,985 paired-end reads, which were used to construct 323,115 contigs containing 19,093 unique genes, of which >10,000 contained >90% of the coding sequence. Transcripts coding for 31 genes required for myoblast fusion were identified, 22 of which were significantly downregulated in TR relative to WT fish, including 10 (vaspa, cdh15, graf1, crk, crkl, dock1, trio, plekho1a, cdc42a and dock5) associated with signaling through the cell-surface protein cadherin. Nineteen of 44 (43%) translation initiation factors and 14 of 47 (30%) protein chaperones were upregulated in TR relative to WT fish. TR coho salmon showed higher growth hormone transcript levels and higher expression of genes associated with protein synthesis and folding than WT fish, even though net rates of protein accretion were similar. The uncoupling of Gh and amino acid signals likely results in additional transcription costs associated with protein turnover in TR fish. The increased fibre size in TR fish, which is predicted to reduce the ionic costs of homeostasis, was shown to involve multiple pathways regulating myotube fusion, particularly cadherin signaling.
A deep transcriptomic resource for the copepod crustacean Labidocera madurae: A potential indicator species for assessing near shore ecosystem health

PubMed Central

Christie, Andrew E.; Sommer, Stephanie A.; Cieslak, Matthew C.; Hartline, Daniel K.; Lenz, Petra H.

2017-01-01

Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively studied coral reef ecosystem, Kāne‘ohe Bay, Oahu, Hawai‘i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions, and was assessed for quality and completeness using multiple workflows. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding for full-length “giant” proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness.
This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species. PMID:29065152
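Calling a gene "consistently" differentially expressed regardless of reference is a set intersection over the per-reference results. A minimal sketch, assuming the four analyses already share gene identifiers (in practice this requires mapping annotations between references); the reference keys and gene IDs are hypothetical:

```python
def consistent_de(calls_by_reference):
    """Genes called differentially expressed under every reference transcriptome."""
    sets = [set(genes) for genes in calls_by_reference.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical DE calls (copepodites vs adult females) from four references.
calls = {
    "ref1": {"g1", "g2", "g3", "g5"},
    "ref2": {"g1", "g2", "g3"},
    "ref3": {"g1", "g2", "g3", "g4"},
    "ref4": {"g1", "g3", "g6", "g2"},
}
print(sorted(consistent_de(calls)))  # ['g1', 'g2', 'g3']
```

Genes such as g4 or g5, called under only one reference, are the reference-dependent calls that the intersection filters out.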
A deep transcriptomic resource for the copepod crustacean Labidocera madurae: A potential indicator species for assessing near shore ecosystem health.

PubMed

Roncalli, Vittoria; Christie, Andrew E; Sommer, Stephanie A; Cieslak, Matthew C; Hartline, Daniel K; Lenz, Petra H

2017-01-01

Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively studied coral reef ecosystem, Kāne'ohe Bay, Oahu, Hawai'i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions, and was assessed for quality and completeness using multiple workflows. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding for full-length "giant" proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness.
This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species.
Complete mitochondrial genome of Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae).

PubMed

Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C

2015-04-01

The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp long, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, and 1 control region. All 13 protein-coding genes have ATG as the start codon. Eight of the 13 protein-coding genes have TAA as the stop codon.
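Start and stop codon summaries like these can be tabulated directly from the coding sequences. A sketch, assuming each CDS is given as a string from its first codon to its last base; vertebrate mitochondrial genes sometimes end in an incomplete stop codon (T or TA) that is completed by polyadenylation, so a trailing partial codon is reported padded with dashes. The function names and sequences are hypothetical:

```python
from collections import Counter


def terminal_codon(cds):
    """Last codon of a CDS; a trailing partial codon is padded with '-' (e.g. 'T--')."""
    cds = cds.upper()
    rem = len(cds) % 3
    return cds[-rem:] + "-" * (3 - rem) if rem else cds[-3:]


def codon_tally(cds_list):
    """Count start codons and (possibly incomplete) stop codons across genes."""
    starts = Counter(cds[:3].upper() for cds in cds_list)
    stops = Counter(terminal_codon(cds) for cds in cds_list)
    return starts, stops


starts, stops = codon_tally(["ATGAAATAA", "ATGCCCT", "ATGGGGTAG"])
print(dict(starts))  # {'ATG': 3}
print(dict(stops))   # {'TAA': 1, 'T--': 1, 'TAG': 1}
```

Note that the vertebrate mitochondrial code also uses AGA and AGG as stop codons, so a tally like this should not assume the standard stop set.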
Hearing the voices of service user researchers in collaborative qualitative data analysis: the case for multiple coding.

PubMed

Sweeney, Angela; Greenwood, Kathryn E; Williams, Sally; Wykes, Til; Rose, Diana S

2013-12-01

Health research is frequently conducted in multidisciplinary teams, and these teams increasingly include service user researchers. Whilst it is common for service user researchers to be involved in data collection (most typically interviewing other service users), it is less common for them to be involved in data analysis and interpretation. This means that a unique and significant perspective on the data is absent. This study uses an empirical report of a study on Cognitive Behavioural Therapy for psychosis (CBTp) to demonstrate the value of multiple coding in enabling service users' voices to be heard in team-based qualitative data analysis. The CBTp study employed multiple coding to analyse service users' discussions of CBTp from the perspectives of a service user researcher, a clinical researcher, and a psychology assistant. Multiple coding was selected to enable multiple perspectives to analyse and interpret the data, to understand and explore differences, and to build multidisciplinary consensus. Multiple coding enabled the team to understand where our views were commensurate and incommensurate and to discuss and debate differences. Through the process of multiple coding, we were able to build strong consensus about the data from multiple perspectives, including that of the service user researcher. Multiple coding is an important method for understanding and exploring multiple perspectives on data and for building team consensus. This can be contrasted with inter-rater reliability, which is only appropriate in limited circumstances. We conclude that multiple coding is an appropriate and important means of hearing service users' voices in qualitative data analysis. © 2012 John Wiley & Sons Ltd.
Non-coding RNAs: new biomarkers and therapeutic targets for esophageal cancer

PubMed Central

Ren, Zhipeng; Zhang, Guoliang

2017-01-01

Esophageal cancer is one of the most common malignant diseases of the gastrointestinal tract, and there is still no effective treatment. Worldwide, the incidence of esophageal cancer is relatively high and increasing year by year. Elucidating the mechanisms of esophageal carcinogenesis and identifying new biomarkers and therapeutic targets would therefore help optimize current therapeutic regimens for this deadly disease. Growing evidence shows that non-coding RNAs play an important role in the development and progression of multiple human cancers, including esophageal cancer. MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are two well-investigated classes of functional non-coding RNAs. They exert tumor-suppressive or tumor-promoting effects by regulating the expression of specific downstream target genes in a tumor-specific manner. Levels of miRNAs and lncRNAs in tissue and plasma from esophageal cancer patients have also been shown to correlate closely with survival and disease progression, so they could serve as prognostic factors and therapeutic targets for esophageal cancer. PMID:28388588
Non-coding RNAs: new biomarkers and therapeutic targets for esophageal cancer.

PubMed

Hou, Xiaobin; Wen, Jiaxin; Ren, Zhipeng; Zhang, Guoliang

2017-06-27

Esophageal cancer is one of the most common malignant diseases of the gastrointestinal tract, and there is still no effective treatment. Worldwide, the incidence of esophageal cancer is relatively high and increasing year by year. Elucidating the mechanisms of esophageal carcinogenesis and identifying new biomarkers and therapeutic targets would therefore help optimize current therapeutic regimens for this deadly disease. Growing evidence shows that non-coding RNAs play an important role in the development and progression of multiple human cancers, including esophageal cancer. MicroRNAs (miRNAs) and long non-coding RNAs (lncRNAs) are two well-investigated classes of functional non-coding RNAs. They exert tumor-suppressive or tumor-promoting effects by regulating the expression of specific downstream target genes in a tumor-specific manner. Levels of miRNAs and lncRNAs in tissue and plasma from esophageal cancer patients have also been shown to correlate closely with survival and disease progression, so they could serve as prognostic factors and therapeutic targets for esophageal cancer.
Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

PubMed Central

Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv

2010-01-01

RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
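A basic step in reconstructing transcripts from aligned reads is collapsing overlapping alignments into contiguous covered segments. The sketch below merges half-open alignment intervals on one chromosome and strand; it is an assumed simplification for illustration, not Scripture's algorithm, which additionally relies on statistical segmentation of coverage and on spliced reads to connect segments into transcript graphs. The function name and coordinates are hypothetical:

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the current segment if this read overlaps or touches it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]


# Hypothetical read alignments (start, end) on one chromosome and strand.
reads = [(100, 150), (140, 200), (300, 350), (200, 220)]
print(merge_intervals(reads))  # [(100, 220), (300, 350)]
```

Sorting first makes the merge a single linear pass after an O(n log n) sort.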
Codon usage and expression level of human mitochondrial 13 protein coding genes across six continents.

PubMed

Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi

2017-12-02

The study of codon usage coupled with phylogenetic analysis is an important tool for understanding the genetic and evolutionary relationships of a gene. The 13 protein-coding genes of human mitochondria are involved in the electron transport chain that generates the cellular energy currency, ATP. However, no work has yet been reported on the codon usage of mitochondrial protein-coding genes across six continents. To understand codon usage patterns in mitochondrial genes across six continents, we used bioinformatic analyses of the protein-coding genes. Codon usage bias was low, as revealed by high ENC (effective number of codons) values. Correlation analysis between codon usage and GC3 showed that all G/C-ending codons were positively correlated with GC3, whereas A/T-ending codons showed the opposite pattern, with the exception of the ND4L and ND5 genes. Neutrality plots revealed that natural selection might have played a major role in the codon usage bias of ATP6, COI, COIII, CYB, ND4 and ND4L, while mutation pressure might have played a dominant role in that of ATP8, COII, ND1, ND2, ND3, ND5 and ND6. Phylogenetic analysis indicated that the evolutionary relationships of each of the 13 protein-coding genes of human mitochondria differed across the six continents, and further suggested that geographical distance was an important factor in the origin and evolution of these genes. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
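GC3 is the GC content at third codon positions, and a neutrality plot regresses GC12 (the mean GC content at first and second positions) on GC3 across sequences; a slope near 1 is usually read as mutation pressure dominating and a slope near 0 as selection dominating. A simplified sketch, assuming every codon is counted (published analyses often exclude Met, Trp and stop codons); the function names are hypothetical:

```python
def gc12_gc3(cds):
    """Return (GC12, GC3) for a coding sequence read in frame from its first base."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    if not codons:
        raise ValueError("sequence shorter than one codon")

    def gc_at(pos):
        # Fraction of codons with G or C at the given position (0, 1 or 2).
        return sum(c[pos] in "GC" for c in codons) / len(codons)

    return (gc_at(0) + gc_at(1)) / 2, gc_at(2)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (the neutrality-plot regression coefficient)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


print(gc12_gc3("GCTGCC"))  # (1.0, 0.5)
```

Applied to each sequence, the (GC3, GC12) pairs form the plot's points, and `ols_slope(gc3_values, gc12_values)` gives its regression coefficient.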

Interactions of Cigarette Smoking with NAT2 Polymorphisms Impact Rheumatoid Arthritis Risk in African Americans

PubMed Central

Mikuls, Ted R.; LeVan, Tricia; Gould, Karen A.; Yu, Fang; Thiele, Geoffrey M.; Bynote, Kimberly K.; Conn, Doyt; Jonas, Beth L.; Callahan, Leigh F.; Smith, Edwin; Brasington, Richard; Moreland, Larry W.; Reynolds, Richard; Gaffo, Angelo; Bridges, S. Louis

2011-01-01

Objective: To examine whether polymorphisms in genes coding for drug-metabolizing enzymes (DMEs) modify the rheumatoid arthritis (RA) risk due to cigarette smoking in African Americans. Methods: Smoking status was evaluated in African American RA cases and non-RA controls and categorized as heavy (≥10 pack-years) vs. other. Individuals were genotyped for a homozygous deletion polymorphism in glutathione S-transferase Mu-1 (GSTM1-null) and for tagging single nucleotide polymorphisms (SNPs) in N-acetyltransferase 1 (NAT1), NAT2, and epoxide hydrolase (EPHX1). Associations of genotypes with RA were examined using logistic regression, and gene-smoking interactions were assessed. Results: There were no significant associations of any DME genotype with RA. After adjustment for multiple comparisons, there were significant additive interactions between heavy smoking and the NAT2 SNPs rs9987109 (P_add = 0.000003) and rs1208 (P_add = 0.00001); attributable proportions (APs) due to interaction ranged from 0.61 to 0.67. None of the multiplicative gene-smoking interactions examined remained significant after adjustment for multiple testing in overall disease risk. There was no evidence of significant gene-smoking interactions in analyses of GSTM1-null, NAT1, or EPHX1. DME gene-smoking interactions were similar when cases were limited to anti-citrullinated protein antibody (ACPA)-positive individuals. Conclusion: Among African Americans, the RA risk imposed by heavy smoking appears to be mediated in part by genetic variation in NAT2. While further studies are needed to elucidate the mechanisms underpinning these interactions, these SNPs appear to identify African American smokers at much higher risk of RA, with relative risks at least two-fold higher than those of non-smokers lacking these risk alleles. PMID:21989592
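The attributable proportion (AP) due to interaction is a standard measure of additive interaction. With RR11 the relative risk for exposure to both factors (here heavy smoking plus the risk genotype), RR10 and RR01 the relative risks for each factor alone, and the doubly unexposed group as reference, RERI = RR11 - RR10 - RR01 + 1 and AP = RERI / RR11. A sketch with invented relative risks (odds ratios from logistic regression are often used as approximations); these are not the study's estimates, and the function names are illustrative:

```python
def reri(rr11, rr10, rr01):
    """Relative excess risk due to interaction on the additive scale."""
    return rr11 - rr10 - rr01 + 1


def attributable_proportion(rr11, rr10, rr01):
    """Share of the risk in the doubly exposed group attributable to interaction."""
    return reri(rr11, rr10, rr01) / rr11


# Hypothetical relative risks, for illustration only.
print(reri(6.0, 1.5, 1.0))                     # 4.5
print(attributable_proportion(6.0, 1.5, 1.0))  # 0.75
```

An AP of 0 means the joint effect is exactly additive; confidence intervals for RERI and AP are typically obtained with the delta method or bootstrapping.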
Multiple PAR and E4BP4 bZIP transcription factors in zebrafish: diverse spatial and temporal expression patterns.

PubMed

Ben-Moshe, Zohar; Vatine, Gad; Alon, Shahar; Tovin, Adi; Mracek, Philipp; Foulkes, Nicholas S; Gothilf, Yoav

2010-09-01

Circadian rhythms of physiology and behavior are generated by an autonomous circadian oscillator that is synchronized daily with the environment, mainly by light input. The PAR subfamily of transcriptional activators and the related E4BP4 repressor belonging to the basic leucine zipper (bZIP) family are clock-controlled genes that are suggested to mediate downstream circadian clock processes and to feedback onto the core oscillator. Here, the authors report the characterization of these genes in the zebrafish, an increasingly important model in the field of chronobiology. Five novel PAR and six novel e4bp4 zebrafish homolog genes were identified using bioinformatic tools and their coding sequences were cloned. Based on their evolutionary relationships, these genes were annotated as ztef2, zhlf1 and zhlf2, zdbp1 and zdbp2, and ze4bp4-1 to -6. The spatial and temporal mRNA expression pattern of each of these factors was characterized in zebrafish embryos in the context of a functional circadian clock and regulation by light. Nine of the factors exhibited augmented and rhythmic expression in the pineal gland, a central clock organ in zebrafish. Moreover, these genes were found to be regulated, to variable extents, by the circadian clock and/or by light. Differential expression patterns of multiple paralogs in zebrafish suggest multiple roles for these factors within the vertebrate circadian clock. This study, in the genetically accessible zebrafish model, lays the foundation for further research regarding the involvement and specific roles of PAR and E4BP4 transcription factors in the vertebrate circadian clock mechanism.
Significance of duon mutations in cancer genomes

NASA Astrophysics Data System (ADS)

Yadav, Vinod Kumar; Smith, Kyle S.; Flinders, Colin; Mumenthaler, Shannon M.; De, Subhajyoti

2016-06-01

Functional mutations in coding regions not only affect the structure and function of the protein products but may also modulate their expression in some cases. This class of mutations, recently dubbed “duon mutations” because of their dual roles, can potentially have major impacts on downstream pathways. However, their significance in diseases such as cancer remains unclear. In a survey covering 4606 samples from 19 cancer types, we identified potential duon mutations by integrating allelic expression, overall mRNA expression, regulatory motif perturbation, and chromatin signatures into one composite index called the REDACT score. Several such mutations were detected in known cancer genes in multiple cancer types. For instance, a potential duon mutation in TP53 is associated with increased expression of the mutant allele, possibly amplifying its functional effects on downstream pathways. Another potential duon mutation in SF3B1 is associated with abnormal splicing and with changes in angiogenesis- and matrix-degradation-related pathways. Our findings emphasize the need to interrogate mutations in coding regions beyond their obvious effects on protein structure.
Survival in extreme environment by "preserve-expand-specialize" strategy: lessons from comparative genomics of an anhydrobiotic midge.

NASA Astrophysics Data System (ADS)

Gusev, Oleg; Sugimoto, Manabu; Novikova, Nataliya; Sychev, Vladimir; Okuda, Takashi; Kikawada, Takahiro

2012-07-01

Anhydrobiotic chironomid larvae of Polypedilum vanderplanki (Diptera) can withstand prolonged complete desiccation as well as other external stresses, including ionizing radiation. Recent experiments showed that this insect is able to survive long-term exposure to real outer space. At the same time, we found that dehydration causes alterations in chromatin structure and severe fragmentation of nuclear DNA in the cells of the larvae despite successful anhydrobiosis. Analysis of several remote populations of the chironomid in Africa suggests that desiccation-related DNA damage might be a driving genetic force for rapid radiation within the species. First results of the ongoing genome project suggest that the origin and evolution of anhydrobiosis in this single insect species are related to rapid duplication of genes coding for late embryogenesis abundant (LEA) proteins and other molecular agents directly involved in desiccation resistance in the cells. Analysis of genome-wide mRNA expression profiles in larvae subjected to desiccation shows the joint activity of large multi-gene coding regions in the genome that are involved in controlling anhydrobiosis-related molecular adaptations in the chironomid.
Type IV pili of Acidithiobacillus ferrooxidans can transfer electrons from extracellular electron donors.

PubMed

Li, Yongquan; Li, Hongyu

2014-03-01

Studies on Acidithiobacillus ferrooxidans accepting electrons from Fe(II) have previously focused on cytochrome c. However, we have discovered that, besides cytochrome c, type IV pili (Tfp) can transfer electrons. Here, we report conduction by Tfp of A. ferrooxidans analyzed with a conducting-probe atomic force microscope (AFM). The results indicate that the Tfp of A. ferrooxidans are highly conductive. The genome sequence of A. ferrooxidans ATCC 23270 contains two genes, pilV and pilW, which code for pilin domain proteins with the conserved amino acids characteristic of Tfp. Multiple alignment analysis of the PilV and PilW (pilin) proteins indicated that pilV is the adhesin gene while pilW codes for the major protein element of Tfp. The likely function of Tfp is to complete the circuit between the cell surface and Fe(II) oxides. These results indicate that Tfp of A. ferrooxidans might serve as biological nanowires transferring electrons from the surface of Fe(II) oxides to the cell surface. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale

PubMed Central

Michel, Audrey M; Baranov, Pavel V

2013-01-01

Ribosome profiling, or ribo-seq, is a new technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on deep sequencing of ribosome-protected mRNA fragments, allowing measurement of ribosome density along all RNA molecules present in the cell. At the same time, the high resolution of this technique allows detailed analysis of ribosome density on individual RNAs. Since its invention, ribosome profiling has been used in a range of studies in both prokaryotic and eukaryotic organisms. Several studies have adapted and refined the original protocol to study specific aspects of translation. Ribosome profiling of initiating ribosomes has been used to map sites of translation initiation. These studies revealed a surprisingly complex organization of translation initiation sites in eukaryotes. Multiple initiation sites are responsible for the generation of N-terminally extended and truncated isoforms of known proteins, as well as for the translation of numerous open reading frames (ORFs) upstream of protein-coding ORFs. Ribosome profiling of elongating ribosomes has been used to measure differential gene expression at the level of translation, to identify novel protein-coding genes, and to study ribosome pausing. It has also provided data for developing quantitative models of translation. Although only a dozen or so ribosome profiling datasets have been published so far, they have already dramatically changed our understanding of translational control and have led to new hypotheses regarding the origin of protein-coding genes. © 2013 John Wiley & Sons, Ltd. PMID:23696005
Multiple component codes based generalized LDPC codes for high-speed optical transport.

PubMed

Djordjevic, Ivan B; Wang, Ting

2014-07-14

A class of generalized low-density parity-check (GLDPC) codes suitable for optical communications, consisting of multiple local codes, is proposed. It is shown that Hamming, BCH, and Reed-Muller codes can be used as local codes, and that maximum a posteriori probability (MAP) decoding of these local codes by the Ashikhmin-Litsyn algorithm is feasible in terms of complexity and performance. We demonstrate that record coding gains can be obtained from properly designed GLDPC codes derived from multiple component codes. We then show that several recently proposed classes of LDPC codes, such as convolutional and spatially coupled codes, can be described using the concept of GLDPC coding, which indicates that GLDPC coding can serve as a unified platform for advanced FEC enabling ultra-high-speed optical transport. The proposed class of GLDPC codes is also suitable for code-rate adaptation, adjusting the error-correction strength to the optical channel conditions.
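The GLDPC idea summarized in this abstract replaces each simple parity check of an LDPC graph with a local component code. It can be illustrated with Hamming (7,4) local codes. The following is a minimal sketch, assuming hard-decision bits and a hand-chosen, hypothetical constraint layout. It only checks codeword membership and performs simple syndrome-lookup correction; it is not the MAP decoding used in the paper. In a real GLDPC code, each bit participates in several local constraints through a sparse graph. The toy layout uses disjoint constraints for brevity.

```python
# Parity-check matrix of the (7,4) Hamming code: column j is the binary
# representation of j + 1, so a single error at position j gives syndrome j + 1.
H_HAMMING_7_4 = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]


def local_syndrome(bits):
    """Syndrome of a 7-bit word under the Hamming (7,4) local code (0 = valid)."""
    syndrome = 0
    for b, row in enumerate(H_HAMMING_7_4):
        parity = sum(h & x for h, x in zip(row, bits)) % 2
        syndrome |= parity << b
    return syndrome


def correct_local(bits):
    """Hard-decision single-error correction by syndrome lookup (not MAP)."""
    s = local_syndrome(bits)
    fixed = list(bits)
    if s:
        fixed[s - 1] ^= 1
    return fixed


def is_gldpc_codeword(word, constraints):
    """True iff every local constraint (7 positions of `word`) is a Hamming codeword."""
    return all(local_syndrome([word[i] for i in c]) == 0 for c in constraints)


# Hypothetical toy layout: two disjoint local codes over a 14-bit word.
constraints = [tuple(range(0, 7)), tuple(range(7, 14))]
word = [1, 1, 1, 0, 0, 0, 0] * 2
print(is_gldpc_codeword(word, constraints))  # True
```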
Identification of the Operon for the Sorbitol (Glucitol) Phosphoenolpyruvate:Sugar Phosphotransferase System in Streptococcus mutans

PubMed Central

Boyd, David A.; Thevenot, Tracy; Gumbmann, Markus; Honeyman, Allen L.; Hamilton, Ian R.

2000-01-01

Transposon mutagenesis and marker rescue were used to isolate and identify an 8.5-kb contiguous region containing six open reading frames constituting the operon for the sorbitol P-enolpyruvate phosphotransferase transport system (PTS) of Streptococcus mutans LT11. The first gene, srlD, codes for sorbitol-6-phosphate dehydrogenase. It is followed downstream by srlR, coding for a transcriptional regulator; srlM, coding for a putative activator; and the srlA, srlE, and srlB genes, coding for the EIIC, EIIBC, and EIIA components of the sorbitol PTS, respectively. In all other sorbitol PTS operons characterized to date, the srlD gene is found after the genes coding for the EII components; thus, the location of the gene in S. mutans is unique. The SrlR protein is similar to several transcriptional regulators found in Bacillus spp. that contain PTS regulator domains (J. Stülke, M. Arnaud, G. Rapoport, and I. Martin-Verstraete, Mol. Microbiol. 28:865–874, 1998), and its gene overlaps the srlM gene by 1 bp. The arrangement of these two regulatory genes is unique and has not been reported for other bacteria. PMID:10639465
Long non-coding RNAs and mRNAs profiling during spleen development in pig.

PubMed

Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou

2018-01-01

Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of the protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, and 180 days and 2 years after birth), and samples from a 2-year-old wild boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than that of protein-coding genes. Time-series analysis revealed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of lncRNAs and mRNAs in the developing pig spleen and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
Gene regulatory network inference using fused LASSO on multiple data sets

PubMed Central

Omranian, Nooshin; Eloundou-Mbebi, Jeanne M. O.; Mueller-Roeber, Bernd; Nikoloski, Zoran

2016-01-01

Devising computational methods to accurately reconstruct gene regulatory networks from gene expression data is key to systems biology applications. Here we propose a method for reconstructing gene regulatory networks by simultaneously considering data sets from different perturbation experiments and corresponding controls. The method imposes three biologically meaningful constraints: (1) the expression level of each gene should be explained by the expression levels of a small number of transcription factor coding genes, (2) networks inferred from different data sets should be similar with respect to the type and number of regulatory interactions, and (3) relationships between genes that exhibit similar differential behavior over the considered perturbations should be favored. We demonstrate that these constraints can be transformed into a fused LASSO formulation for the proposed method. Comparative analysis on transcriptomics time-series data from two prokaryotic species, Escherichia coli and Mycobacterium tuberculosis, and one eukaryotic species, mouse, demonstrated that the proposed method retains the advantages of the most recent approaches for regulatory network inference while achieving better performance and assigning higher scores to the true regulatory links. The study indicates that combining sparse regression techniques with other biologically meaningful constraints is a promising framework for gene regulatory network reconstruction. PMID:26864687
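The fused LASSO formulation mentioned in this abstract combines three terms. The first is a per-data-set regression loss. The second is an L1 sparsity penalty, which favors few regulators per target gene. The third is an L1 fusion penalty, which pulls the coefficient vectors estimated from different data sets toward each other. The following is a minimal sketch that only evaluates such an objective for one target gene, assuming a generic textbook form. It does not encode the paper's third constraint on similar differential behavior, and the function name and penalty weights are hypothetical. Actually minimizing the objective would need a dedicated solver, such as a proximal-gradient or coordinate-descent method, which is not shown.

```python
def fused_lasso_objective(ys, Xs, betas, lam_sparse, lam_fuse):
    """Generic fused LASSO objective for one target gene across K data sets.

    ys[k]    -- expression values of the target gene in data set k
    Xs[k]    -- rows of regulator (TF-coding gene) expression, one row per sample
    betas[k] -- regression coefficients of the regulators in data set k
    """
    # Squared-error loss summed over all data sets.
    loss = 0.0
    for y, X, beta in zip(ys, Xs, betas):
        for y_i, row in zip(y, X):
            prediction = sum(b * x for b, x in zip(beta, row))
            loss += (y_i - prediction) ** 2
    # L1 sparsity: few regulators per target gene.
    sparsity = sum(abs(b) for beta in betas for b in beta)
    # L1 fusion: networks from different data sets should be similar.
    fusion = 0.0
    for k in range(len(betas)):
        for l in range(k + 1, len(betas)):
            fusion += sum(abs(a - b) for a, b in zip(betas[k], betas[l]))
    return loss + lam_sparse * sparsity + lam_fuse * fusion


# Hypothetical toy data: two data sets, two candidate regulators.
X = [[1.0, 0.0], [2.0, 0.0]]
value = fused_lasso_objective(
    ys=[[1.0, 2.0], [1.0, 2.0]],
    Xs=[X, X],
    betas=[[1.0, 0.0], [1.0, 0.5]],
    lam_sparse=0.1,
    lam_fuse=1.0,
)
print(value)
```

In this toy case both coefficient vectors fit the data exactly. The objective therefore reduces to the penalties alone. A larger `lam_fuse` would push the solver toward identical networks across the two data sets.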
Divergent transcription is associated with promoters of transcriptional regulators

PubMed Central

2013-01-01

Background: Divergent transcription is a widespread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding-gene promoters. However, the general significance of divergent lncRNA/mRNA gene-pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding-gene promoters in primary mouse tissues. Results: We found that a substantial fraction of coding-gene promoters sustain divergent transcription of lncRNA/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, a high degree of conservation, and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC-skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions: We conclude that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism that is directly coupled to relaxed bidirectional transcription. PMID:24365181
The Quest for Targets Executing MYC-Dependent Cell Transformation.

PubMed

Hartl, Markus

2016-01-01

MYC is a transcription factor with oncogenic potential that converts multiple cellular signals into a broad transcriptional response, thereby controlling the expression of numerous protein-coding and non-coding RNAs important for cell proliferation, metabolism, differentiation, and apoptosis. Constitutive activation of MYC leads to neoplastic cell transformation, and deregulated MYC alleles are frequently observed in many human cancer cell types. Multiple approaches have been used to isolate genes differentially expressed in cells containing aberrantly activated MYC proteins, leading to the identification of thousands of putative targets. Functional analyses of genes differentially expressed in MYC-transformed cells have revealed that, so far, more than 40 upregulated or downregulated MYC targets are actively involved in cell transformation or tumorigenesis. However, further systematic and selective approaches are required to determine which known or as-yet-unidentified targets are responsible for executing the oncogenic MYC program. The search for critical targets in MYC-dependent tumor cells is complicated by the fact that, during tumor development, cancer cells progressively evolve in a multistep process, acquiring their characteristic features in an additive manner. Functional expression cloning, combinatorial gene expression, and appropriate in vivo tests could be adequate tools for dissecting the complex scenario of MYC-specified cell transformation. In this context, the central goal is to identify a minimal set of targets sufficient to phenocopy oncogenic MYC. Recently developed genome-editing tools could be employed to confirm the requirement of crucial transformation-associated targets. Knowledge about essential MYC-regulated genes should help expedite the development of specific inhibitors that interfere with the growth and viability of human tumor cells in which MYC is aberrantly activated.
Approaches based on the principle of synthetic lethality, using MYC-overexpressing cancer cells and chemical or RNAi libraries, have been employed to search for novel anticancer drugs and have led to the identification of several druggable targets. Targeting oncogenic MYC effector genes instead of MYC itself may yield compounds with higher specificity and fewer side effects. This class of drugs could also display a wider therapeutic window, because the physiological functions of MYC that are important for normal cell growth, proliferation, and differentiation would be less impaired.
The Quest for Targets Executing MYC-Dependent Cell Transformation

PubMed Central

Hartl, Markus

2016-01-01

MYC is a transcription factor with oncogenic potential that converts multiple cellular signals into a broad transcriptional response, thereby controlling the expression of numerous protein-coding and non-coding RNAs important for cell proliferation, metabolism, differentiation, and apoptosis. Constitutive activation of MYC leads to neoplastic cell transformation, and deregulated MYC alleles are frequently observed in many human cancer cell types. Multiple approaches have been used to isolate genes differentially expressed in cells containing aberrantly activated MYC proteins, leading to the identification of thousands of putative targets. Functional analyses of genes differentially expressed in MYC-transformed cells have revealed that, so far, more than 40 upregulated or downregulated MYC targets are actively involved in cell transformation or tumorigenesis. However, further systematic and selective approaches are required to determine which known or as-yet-unidentified targets are responsible for executing the oncogenic MYC program. The search for critical targets in MYC-dependent tumor cells is complicated by the fact that, during tumor development, cancer cells progressively evolve in a multistep process, acquiring their characteristic features in an additive manner. Functional expression cloning, combinatorial gene expression, and appropriate in vivo tests could be adequate tools for dissecting the complex scenario of MYC-specified cell transformation. In this context, the central goal is to identify a minimal set of targets sufficient to phenocopy oncogenic MYC. Recently developed genome-editing tools could be employed to confirm the requirement of crucial transformation-associated targets. Knowledge about essential MYC-regulated genes should help expedite the development of specific inhibitors that interfere with the growth and viability of human tumor cells in which MYC is aberrantly activated.
Approaches based on the principle of synthetic lethality, using MYC-overexpressing cancer cells and chemical or RNAi libraries, have been employed to search for novel anticancer drugs and have led to the identification of several druggable targets. Targeting oncogenic MYC effector genes instead of MYC itself may yield compounds with higher specificity and fewer side effects. This class of drugs could also display a wider therapeutic window, because the physiological functions of MYC that are important for normal cell growth, proliferation, and differentiation would be less impaired. PMID:27313991
Origin and evolution of the long non-coding genes in the X-inactivation center.

PubMed

Romito, Antonio; Rougeulle, Claire

2011-11-01

Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features of the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial lineage, whereas in the eutherian lineage it provided the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of the protein-coding function of pre-existing genes and integration of different classes of mobile elements, some of which shaped the structure and sequence of the non-coding genes in a species-specific manner. The emerging genes began to produce transcripts that acquired a function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, and Jpx, and as recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
Complete genome of Nitrosospira briensis C-128, an ammonia-oxidizing bacterium from agricultural soil

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rice, Marlen C.; Norton, Jeanette M.; Valois, Frederica

Nitrosospira briensis C-128 is an ammonia-oxidizing bacterium isolated from an acid agricultural soil. N. briensis C-128 was sequenced with PacBio RS technologies at the DOE-Joint Genome Institute through their Community Science Program (2010). The high-quality finished genome contains one chromosome of 3.21 Mb and no plasmids. We identified 3073 gene models, 3018 of which are protein coding. The two-way average nucleotide identity between the chromosomes of Nitrosospira multiformis ATCC 25196 and Nitrosospira briensis C-128 was found to be 77.2%. Multiple copies of modules encoding chemolithotrophic metabolism were identified in their genomic context. The gene inventory supports chemolithotrophic metabolism with implications for function in soil environments.
Complete genome of Nitrosospira briensis C-128, an ammonia-oxidizing bacterium from agricultural soil

DOE PAGES

Rice, Marlen C.; Norton, Jeanette M.; Valois, Frederica; ...

2016-07-28

Nitrosospira briensis C-128 is an ammonia-oxidizing bacterium isolated from an acid agricultural soil. N. briensis C-128 was sequenced with PacBio RS technologies at the DOE-Joint Genome Institute through their Community Science Program (2010). The high-quality finished genome contains one chromosome of 3.21 Mb and no plasmids. We identified 3073 gene models, 3018 of which are protein coding. The two-way average nucleotide identity between the chromosomes of Nitrosospira multiformis ATCC 25196 and Nitrosospira briensis C-128 was found to be 77.2%. Multiple copies of modules encoding chemolithotrophic metabolism were identified in their genomic context. The gene inventory supports chemolithotrophic metabolism with implications for function in soil environments.
Genetic variation and gene expression across multiple tissues and developmental stages in a non-human primate

PubMed Central

Jasinska, Anna J.; Zelaya, Ivette; Service, Susan K.; Peterson, Christine B.; Cantor, Rita M.; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A.; Fears, Scott; Furterer, Allison E.; Huang, Yu S.; Ramensky, Vasily; Schmitt, Christopher A.; Svardal, Hannes; Jorgensen, Matthew J.; Kaplan, Jay R.; Villar, Diego; Aken, Bronwen L.; Flicek, Paul; Nag, Rishi; Wong, Emily S.; Blangero, John; Dyer, Thomas D.; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M.; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K.; Jentsch, J. David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P.; Freimer, Nelson B.

2017-01-01

By analyzing multi-tissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalogue of expression quantitative trait loci (eQTLs) in a non-human primate model. This catalogue contains more genome-wide significant eQTLs, per sample, than comparable human resources, and reveals sex and age-related expression patterns. Findings include a master regulatory locus that likely plays a role in immune function, and a locus regulating hippocampal long non-coding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders. PMID:29083405
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

DOE PAGES

Wang, Bei; Ethier, Stephane; Tang, William; ...

2017-06-29

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Bei; Ethier, Stephane; Tang, William

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5D Vlasov-Poisson equation featuring efficient utilization of modern parallel computer architectures at the petascale and beyond. Motivated by the goal of developing a modern code capable of dealing with the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced as well as a key additional domain decomposition. GTC-P's multiple levels of parallelism, including inter-node 2D domain decomposition and particle decomposition, as well as intra-node shared memory partition and vectorization, have enabled pushing the scalability of the PIC method to extreme computational scales. In this paper, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. This particularly includes implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) co-processors and performance comparisons with state-of-the-art homogeneous HPC systems such as Blue Gene/Q. New discovery science capabilities in the magnetic fusion energy application domain are enabled, including investigations of Ion-Temperature-Gradient (ITG) driven turbulence simulations with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidth, interconnects and network topologies. These performance comparisons using a realistic discovery-science-capable domain application code provide valuable insights on optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

PubMed

Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

2017-12-03

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property: X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorial and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level) and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than that of R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes (genes with at least one X motif) to non-X genes in the set of verified genes is significantly different from that observed in the set of putative or dubious genes with no experimental evidence.
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
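The frame-level property described above (trinucleotides of X occurring more often in the reading frame than in the two shifted frames) can be illustrated with a minimal Python sketch. The function name `frame_occurrences` is hypothetical, the trinucleotide set is passed in as a parameter rather than hard-coding the 20 trinucleotides of X, and the sketch counts single trinucleotides only; it does not reproduce the authors' motif definition or their comparison against random codes R.

```python
def frame_occurrences(seq, trinucleotides):
    """Count trinucleotides from a given set in each frame of a coding sequence.

    Frame 0 is the reading frame (codons starting at position 0); frames 1
    and 2 are the frames shifted by one and two nucleotides.
    """
    seq = seq.upper()
    counts = [0, 0, 0]
    for frame in range(3):
        # Step through the non-overlapping trinucleotides of this frame.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in trinucleotides:
                counts[frame] += 1
    return counts


# Hypothetical toy set and sequence, for illustration only.
toy_set = {"GAA", "GAC", "GTC"}
print(frame_occurrences("GAAGACGTC", toy_set))  # [3, 0, 0]
```

Applied to real genes, a reading-frame count that exceeds both shifted-frame counts is the kind of frame bias the abstract reports for X.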

Genome Sequence of Microbulbifer mangrovi DD-13T Reveals Its Versatility to Degrade Multiple Polysaccharides.

PubMed

Imran, Md; Pant, Poonam; Shanbhag, Yogini P; Sawant, Samir V; Ghadi, Sanjeev C

2017-02-01

Microbulbifer mangrovi strain DD-13T is the type strain of a novel species isolated from the mangroves of Goa, India. The draft genome sequence of strain DD-13 comprised 4,528,106 bp with a G+C content of 57.15%. Out of 3479 open reading frames, functions for 3488 protein coding sequences were predicted on the basis of similarity to the Clusters of Orthologous Groups (COG). In addition to protein coding sequences, 34 tRNA genes and 3 rRNA genes were detected. Analysis of the predicted genes using the Carbohydrate-Active Enzymes (CAZymes) Analysis Toolkit indicates that strain DD-13 encodes a large set of CAZymes, including 255 glycoside hydrolases, 76 carbohydrate esterases, 17 polysaccharide lyases, and 113 carbohydrate-binding modules (CBMs). Many genes from strain DD-13 were annotated as carbohydrases specific for the degradation of agar, alginate, carrageenan, chitin, xylan, pullulan, cellulose, starch, β-glucan, pectin, etc. Some of the polysaccharide-degrading genes were highly modular and were appended with at least one CBM, indicating the versatility of strain DD-13 in degrading complex polysaccharides. Growth of strain DD-13 was validated using pure polysaccharides such as agarose or alginate as the carbon source, as well as red and brown seaweed powder as substrate. The homologous carbohydrase produced by strain DD-13 during growth degraded the polysaccharide, ensuring the production of metabolizable reducing sugars. Additionally, several other polysaccharides such as carrageenan, xylan, pullulan, pectin, starch, and carboxymethyl cellulose were also corroborated as growth substrates for strain DD-13 and were associated with concomitant production of homologous carbohydrase.
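The G+C content reported above is the percentage of G and C bases in the assembled sequence. A minimal Python sketch follows; the function name is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, since tools differ in how they treat them.

```python
def gc_content(seq):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; other symbols such as
    N are ignored (an assumed convention).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


print(gc_content("ATGCNN"))  # 50.0
```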
Sudden infant death syndrome (SIDS) and polymorphisms in Monoamine oxidase A gene (MAOA): a revisit.

PubMed

Groß, Maximilian; Bajanowski, Thomas; Vennemann, Mechtild; Poetsch, Micaela

2014-01-01

The literature describes multiple possible links between genetic variations in the neuroadrenergic system and the occurrence of sudden infant death syndrome (SIDS). The X-chromosomal Monoamine oxidase A (MAOA) gene is one of the genes with regulatory activity in the noradrenergic and serotonergic neuronal systems, and a polymorphism of its promoter that affects the activity of this gene has been reported to contribute significantly to the prevalence of SIDS in three studies from 2009, 2012 and 2013. However, these studies described different significant correlations regarding the gender or age of the children. Because several studies suggesting associations between genetic variations and SIDS were disproved by follow-up analyses, this study was conducted to take a closer look at the MAOA gene and its polymorphisms. The functional MAOA promoter length polymorphism was investigated in 261 SIDS cases and 93 control subjects. Moreover, the allele distribution of 12 coding and non-coding single nucleotide polymorphisms (SNPs) of the MAOA gene was examined in 285 SIDS cases and 93 controls by a minisequencing technique. In contrast to prior studies with fewer individuals, no significant correlation between the occurrence of SIDS and the frequency of allele variants of the promoter polymorphism could be demonstrated, even when the results of the previous studies mentioned above were included. Regarding the SNPs, three statistically significant associations were observed that had not been described before. This study clearly disproves interactions between MAOA promoter polymorphisms and SIDS, although variations in single nucleotide polymorphisms of MAOA should be subjected to further analysis to clarify their impact on SIDS.
The human serotonin 5-HT2C receptor: Complete cDNA, genomic structure, and alternatively spliced variant

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xie, Enzhong; Zhu, Lingyu; Zhao, Lingyun

1996-08-01

The complete 4775-nt cDNA encoding the human serotonin 5-HT2C receptor (5-HT2CR), a G-protein-coupled receptor, has been isolated. It contains a 1377-nt coding region flanked by a 728-nt 5′-untranslated region and a 2670-nt 3′-untranslated region. Using the cloned 5-HT2CR cDNA probe, the complete human gene for this receptor has been isolated and shown to contain six exons and five introns spanning at least 230 kb of DNA. The coding region of the human 5-HT2CR gene is interrupted by three introns, and the positions of the intron/exon junctions are conserved between the human and the rodent genes. In addition, an alternatively spliced 5-HT2CR RNA that contains a 95-nt deletion in the region coding for the second intracellular loop and the fourth transmembrane domain of the receptor has been identified. This deletion leads to a frameshift and premature termination, so that the short isoform RNA encodes a putative protein of 248 amino acids. The ratio of the short isoform to the 5-HT2CR RNA was found to be higher in choroid plexus tumor than in normal brain tissue, suggesting the possibility of differential regulation of the 5-HT2CR gene in different neural tissues or during tumorigenesis. Transcription of the human 5-HT2CR gene was found to be initiated at multiple sites. No classical TATA-box sequence was found at the appropriate location, and the 5′-flanking sequence contains many potential transcription factor-binding sites. A 7.3-kb 5′-flanking 5-HT2CR DNA directed the efficient expression of a luciferase reporter gene in SK-N-SH and IMR32 neuroblastoma cells, indicating that it contains a functional promoter. 69 refs., 8 figs., 1 tab.
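The lengths given above can be checked with simple arithmetic: the two untranslated regions and the coding region add up to the full cDNA length, the coding region is a whole number of codons, and a deletion preserves the reading frame only when its length is a multiple of three, which 95 nt is not.

```latex
\begin{aligned}
728 + 1377 + 2670 &= 4775 \ \text{nt} \\
1377 &= 3 \times 459 \quad (\text{a whole number of codons}) \\
95 &= 3 \times 31 + 2 \;\Rightarrow\; 95 \bmod 3 = 2 \neq 0 \quad (\text{frameshift})
\end{aligned}
```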
Multiple description distributed image coding with side information for mobile wireless transmission

NASA Astrophysics Data System (ADS)

Wu, Min; Song, Daewon; Chen, Chang Wen

2005-03-01

Multiple description coding (MDC) is a source coding technique that codes the source information into multiple descriptions and then transmits them over different channels in a packet network or an error-prone wireless environment, so as to achieve graceful degradation if some of the descriptions are lost at the receiver. In this paper, we propose a multiple description distributed wavelet zerotree image coding system for mobile wireless transmission. We provide two innovations to achieve excellent error resilience. First, when MDC is applied to wavelet subband-based image coding, it is possible to introduce correlation between the descriptions in each subband. We use this correlation, together with the potentially error-corrupted description, as side information in the decoding, formulating MDC decoding as a Wyner-Ziv decoding problem. If only part of the descriptions is lost, their correlation information is still available, so the proposed Wyner-Ziv decoder can recover the description by using the correlation information and the error-corrupted description as side information. Second, within each description, single-bitstream wavelet zerotree coding is very vulnerable to channel errors: the first bit error may cause the decoder to discard all subsequent bits, whether or not they are correctly received. We therefore integrate multiple description scalar quantization (MDSQ) with a multiple wavelet tree image coding method to reduce error propagation. We first group wavelet coefficients into multiple trees according to the parent-child relationship and then code them separately with the SPIHT algorithm to form multiple bitstreams. This decomposition reduces error propagation and therefore improves the error-correcting capability of the Wyner-Ziv decoder.
Experimental results show that the proposed scheme not only exhibits excellent error resilience but also degrades gracefully as the packet loss rate increases.
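The core idea of multiple description scalar quantization can be illustrated with two staggered uniform quantizers, a simpler construction than general MDSQ designs with index assignment; the abstract does not say which MDSQ design the authors used. In this minimal Python sketch (function names and step size are hypothetical), each description alone locates a coefficient within a cell of width `step`, while receiving both narrows it to a cell of width `step / 2`, which is the graceful degradation MDC aims for.

```python
import math


def mdsq_encode(x, step):
    """Return the two description indices for a coefficient x."""
    # Description 1: cells [k*step, (k+1)*step).
    i1 = math.floor(x / step)
    # Description 2: same cell width, shifted by half a step.
    i2 = math.floor(x / step - 0.5)
    return i1, i2


def mdsq_decode(step, i1=None, i2=None):
    """Reconstruct x from whichever descriptions arrived."""
    if i1 is not None and i2 is not None:
        # Both received: midpoint of the intersection of the two cells.
        lo = max(i1 * step, (i2 + 0.5) * step)
        hi = min((i1 + 1) * step, (i2 + 1.5) * step)
        return (lo + hi) / 2
    if i1 is not None:
        return (i1 + 0.5) * step  # centre of the description-1 cell
    if i2 is not None:
        return (i2 + 1.0) * step  # centre of the description-2 cell
    raise ValueError("no description received")


i1, i2 = mdsq_encode(2.3, step=1.0)
print(mdsq_decode(1.0, i1, i2))  # 2.25 (both descriptions)
print(mdsq_decode(1.0, i1=i1))   # 2.5  (description 2 lost)
print(mdsq_decode(1.0, i2=i2))   # 2.0  (description 1 lost)
```

The maximum reconstruction error is `step / 4` with both descriptions and `step / 2` with either one, so losing a description degrades quality rather than destroying the coefficient.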
Missing genes, multiple ORFs, and C-to-U type RNA editing in Acrasis kona (Heterolobosea, Excavata) mitochondrial DNA.

PubMed

Fu, Cheng-Jie; Sheikh, Sanea; Miao, Wei; Andersson, Siv G E; Baldauf, Sandra L

2014-08-21

Discoba (Excavata) is an ancient group of eukaryotes with great morphological and ecological diversity. Unlike the other major divisions of Discoba (Jakobida and Euglenozoa), little is known about the mitochondrial DNAs (mtDNAs) of Heterolobosea. We have assembled a complete mtDNA genome from the aggregating heterolobosean amoeba, Acrasis kona, which consists of a single circular highly AT-rich (83.3%) molecule of 51.5 kb. Unexpectedly, A. kona mtDNA is missing roughly 40% of the protein-coding genes and nearly half of the transfer RNAs found in the only other sequenced heterolobosean mtDNAs, those of Naegleria spp. Instead, over a quarter of A. kona mtDNA consists of novel open reading frames. Eleven of the 16 protein-coding genes missing from A. kona mtDNA were identified in its nuclear DNA and polyA RNA, and phylogenetic analyses indicate that at least 10 of these 11 putative nuclear-encoded mitochondrial (NcMt) proteins arose by direct transfer from the mitochondrion. Acrasis kona mtDNA also employs C-to-U type RNA editing, and 12 homologs of DYW-type pentatricopeptide repeat (PPR) proteins implicated in plant organellar RNA editing are found in A. kona nuclear DNA. A mapping of mitochondrial gene content onto a consensus phylogeny reveals a sporadic pattern of relative stasis and rampant gene loss in Discoba. Rampant loss occurred independently in the unique common lineage leading to Heterolobosea + Tsukubamonadida and later in the unique lineage leading to Acrasis. Meanwhile, mtDNA gene content appears to be remarkably stable in the Acrasis sister lineage leading to Naegleria and in their distant relatives Jakobida. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A common histone modification code on C4 genes in maize and its conservation in Sorghum and Setaria italica.

PubMed

Heimann, Louisa; Horst, Ina; Perduns, Renke; Dreesen, Björn; Offermann, Sascha; Peterhansel, Christoph

2013-05-01

C4 photosynthesis evolved more than 60 times independently in different plant lineages. Each time, multiple genes were recruited into C4 metabolism. The corresponding promoters acquired new regulatory features such as high expression, light induction, or cell type-specific expression in mesophyll or bundle sheath cells. We have previously shown that histone modifications contribute to the regulation of the model C4 phosphoenolpyruvate carboxylase (C4-Pepc) promoter in maize (Zea mays). Here, we tested the light- and cell type-specific responses of three selected histone acetylations and two histone methylations on five additional C4 genes (C4-Ca, C4-Ppdk, C4-Me, C4-Pepck, and C4-RbcS2) in maize. Histone acetylation and nucleosome occupancy assays indicated extended promoter regions, with regulatory upstream regions more than 1,000 bp from the transcription initiation site, for most of these genes. Despite the absence of any detectable homology between the promoters at the primary sequence level, histone modification patterns were highly coregulated. Specifically, H3K9ac was regulated by illumination, whereas H3K4me3 was regulated in a cell type-specific manner. We further compared histone modifications on the C4-Pepc and C4-Me genes from maize and the homologous genes from sorghum (Sorghum bicolor) and Setaria italica. Whereas sorghum and maize share a common C4 origin, C4 metabolism evolved independently in S. italica. The distribution of histone modifications over the promoters differed between the species, but differential regulation of light-induced histone acetylation and cell type-specific histone methylation was evident in all three species. We propose that a preexisting histone code was recruited into C4 promoter control during the evolution of C4 metabolism.
Synergistic interactions of biotic and abiotic environmental stressors on gene expression.

PubMed

Altshuler, Ianina; McLeod, Anne M; Colbourne, John K; Yan, Norman D; Cristescu, Melania E

2015-03-01

Understanding the response of organisms to multiple stressors is critical for predicting whether populations can adapt to rapid environmental change. Natural and anthropogenic stressors often interact, complicating general predictions. In this study, we examined the interactive and cumulative effects of two common environmental stressors, lowered calcium concentration (an anthropogenic stressor) and predator presence (a natural stressor), on the water flea Daphnia pulex. We analyzed expression changes of five genes involved in calcium homeostasis, namely the cuticle proteins (Cutie, Icp2), calbindin (Calb), and a calcium pump and channel (Serca and Ip3R), using real-time quantitative PCR (RT-qPCR) in a full factorial experiment. We observed strong synergistic interactions between low calcium concentration and predator presence. While the Ip3R gene was not affected by the stressors, the transcript levels of the other four genes were affected by the combination of the stressors. Transcriptional patterns of genes that code for cuticle proteins (Cutie and Icp2) and a sarcoplasmic calcium pump (Serca) responded only to the combination of stressors, changing their relative expression levels in a synergistic response, while a calcium-binding protein (Calb) responded both to low calcium stress and to the combination of both stressors. The expression patterns of these genes (Cutie, Icp2, and Serca) were nonlinear, yet dose dependent across the calcium gradient. Multiple stressors can have complex, often unexpected effects on ecosystems. This study demonstrates that the dominant interaction for the set of tested genes appears to be synergism. We argue that gene expression patterns can be used to understand and predict the type of interaction expected when organisms are exposed simultaneously to natural and anthropogenic stressors.
Identification of a reference gene for the quantification of mRNA and miRNA expression during skin wound healing.

PubMed

Etich, Julia; Bergmeier, Vera; Pitzler, Lena; Brachvogel, Bent

2017-03-01

Wound healing is a coordinated process to restore tissue homeostasis and reestablish the protective barrier of the skin. miRNAs may modulate the expression of target genes to contribute to repair processes, but due to the complexity of the tissue, it is challenging to quantify gene expression during the distinct phases of wound repair. Here, we aimed to identify a common reference gene to quantify changes in miRNA and mRNA expression during skin wound healing. Quantitative real-time PCR and bioinformatic analysis tools were used to identify suitable reference genes during skin repair, and their reliability was tested by studying the expression of mRNAs and miRNAs. Morphological assessment of wounds showed that the injury model recapitulates the distinct phases of skin repair. Non-degraded RNA could be isolated from skin and wounds and used to study the expression of non-coding small nuclear RNAs during wound healing. Among those, RNU6B was the most stably expressed during skin repair. Using this reference gene, we could confirm the transient upregulation of IL-1β and PTPRC/CD45 during the early phase as well as the increased expression of collagen type I at later stages of repair, and validate the differential expression of miR-204, miR-205, and miR-31 in skin wounds. In contrast to normalization to Gapdh, normalization to multiple reference genes gave a similar outcome. RNU6B is an accurate alternative normalizer to quantify mRNA and miRNA expression during the distinct phases of skin wound healing when analysis of multiple reference genes is not feasible.
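The abstract does not state which quantification model was used, but a common way to normalize RT-qPCR data to a reference gene such as RNU6B is the comparative 2^-ΔΔCt method, which assumes close to 100% amplification efficiency for both target and reference. A minimal Python sketch follows, with a hypothetical function name and made-up Ct values. Passing several reference Ct values averages them, which is equivalent to taking the geometric mean of their relative quantities, a usual way of combining multiple reference genes.

```python
from statistics import mean


def relative_expression(target_ct, ref_cts, calib_target_ct, calib_ref_cts):
    """Fold change of a target transcript versus a calibrator (2^-ddCt method).

    ref_cts and calib_ref_cts are lists of Ct values for one or more reference
    genes. Averaging Ct values equals taking the geometric mean of the
    reference quantities, assuming ~100% efficiency.
    """
    d_ct_sample = target_ct - mean(ref_cts)
    d_ct_calibrator = calib_target_ct - mean(calib_ref_cts)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: wound sample vs. unwounded skin (calibrator).
print(relative_expression(24.0, [20.0], 26.0, [20.0]))              # 4.0
print(relative_expression(24.0, [19.0, 21.0], 25.0, [20.0, 20.0]))  # 2.0
```

A reference gene is only useful in this calculation if its Ct stays stable across the conditions being compared, which is why the choice of RNU6B matters.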
Defective minor spliceosome mRNA processing results in isolated familial growth hormone deficiency

PubMed Central

Argente, Jesús; Flores, Raquel; Gutiérrez-Arumí, Armand; Verma, Bhupendra; Martos-Moreno, Gabriel Á; Cuscó, Ivon; Oghabian, Ali; Chowen, Julie A; Frilander, Mikko J; Pérez-Jurado, Luis A

2014-01-01

The molecular basis of a significant number of cases of isolated growth hormone deficiency remains unknown. We describe three sisters affected with severe isolated growth hormone deficiency and pituitary hypoplasia caused by biallelic mutations in the RNPC3 gene, which codes for a minor spliceosome protein required for U11/U12 small nuclear ribonucleoprotein (snRNP) formation and splicing of U12-type introns. We found anomalies in U11/U12 di-snRNP formation and in splicing of multiple U12-type introns in patient cells. Defective transcripts include preprohormone convertases SPCS2 and SPCS3 and actin-related ARPC5L genes, which are candidates for the somatotroph-restricted dysfunction. The reported novel mechanism for familial growth hormone deficiency demonstrates that general mRNA processing defects of the minor spliceosome can lead to very narrow tissue-specific consequences. Subject Categories: Genetics, Gene Therapy & Genetic Disease; Metabolism PMID:24480542
Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression.

PubMed

Fairfax, Benjamin P; Humburg, Peter; Makino, Seiko; Naranbhai, Vivek; Wong, Daniel; Lau, Evelyn; Jostins, Luke; Plant, Katharine; Andrews, Robert; McGee, Chris; Knight, Julian C

2014-03-07

To systematically investigate the impact of immune stimulation upon regulatory variant activity, we exposed primary monocytes from 432 healthy Europeans to interferon-γ (IFN-γ) or differing durations of lipopolysaccharide and mapped expression quantitative trait loci (eQTLs). More than half of cis-eQTLs identified, involving hundreds of genes and associated pathways, are detected specifically in stimulated monocytes. Induced innate immune activity reveals multiple master regulatory trans-eQTLs including the major histocompatibility complex (MHC), coding variants altering enzyme and receptor function, an IFN-β cytokine network showing temporal specificity, and an interferon regulatory factor 2 (IRF2) transcription factor-modulated network. Induced eQTL are significantly enriched for genome-wide association study loci, identifying context-specific associations to putative causal genes including CARD9, ATM, and IRF8. Thus, applying pathophysiologically relevant immune stimuli assists resolution of functional genetic variants.
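The abstract does not give the association model, but a cis-eQTL is commonly tested by regressing a gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele). The minimal Python sketch below (hypothetical function name and toy data) estimates the least-squares slope separately in unstimulated and stimulated samples; a slope that appears only after stimulation is the kind of context-specific effect described above. A real analysis would also need significance testing and multiple-testing correction.

```python
from statistics import mean


def eqtl_slope(dosages, expression):
    """Least-squares slope of expression on genotype dosage (0, 1, 2)."""
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("genotype dosage does not vary")
    return sxy / sxx


# Hypothetical toy data for one SNP-gene pair in the same six individuals.
dosage = [0, 0, 1, 1, 2, 2]
naive = [2.0, 2.5, 2.5, 2.0, 2.0, 2.5]
stimulated = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
print(eqtl_slope(dosage, naive))       # 0.0
print(eqtl_slope(dosage, stimulated))  # 1.0
```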
Overexpression of pyruvate decarboxylase in the yeast Hansenula polymorpha results in increased ethanol yield in high-temperature fermentation of xylose.

PubMed

Ishchuk, Olena P; Voronovsky, Andriy Y; Stasyk, Oleh V; Gayda, Galina Z; Gonchar, Mykhailo V; Abbas, Charles A; Sibirny, Andriy A

2008-11-01

Improvement of xylose fermentation is of great importance to the fuel ethanol industry. The nonconventional thermotolerant yeast Hansenula polymorpha naturally ferments xylose to ethanol at high temperatures (48-50 °C). Introduction of a mutation that impairs ethanol reutilization in H. polymorpha led to an increase in ethanol yield from xylose. The native and heterologous (Kluyveromyces lactis) PDC1 genes coding for pyruvate decarboxylase were expressed at high levels in H. polymorpha under the control of the strong constitutive promoter of the glyceraldehyde-3-phosphate dehydrogenase gene (GAPDH). This resulted in increased pyruvate decarboxylase activity and improved ethanol production from xylose. The introduction of multiple copies of the H. polymorpha PDC1 gene driven by the strong constitutive promoter led to a 20-fold increase in pyruvate decarboxylase activity and up to a threefold elevation of ethanol production.
De Novo Coding Variants Are Strongly Associated with Tourette Disorder

PubMed Central

Willsey, A. Jeremy; Fernandez, Thomas V.; Yu, Dongmei; King, Robert A.; Dietrich, Andrea; Xing, Jinchuan; Sanders, Stephan J.; Mandell, Jeffrey D.; Huang, Alden Y.; Richer, Petra; Smith, Louw; Dong, Shan; Samocha, Kaitlin E.; Neale, Benjamin M.; Coppola, Giovanni; Mathews, Carol A.; Tischfield, Jay A.; Scharf, Jeremiah M.; State, Matthew W.; Heiman, Gary A.

2017-01-01

SUMMARY Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. PMID:28472652
The Natural History of Class I Primate Alcohol Dehydrogenases Includes Gene Duplication, Gene Loss, and Gene Conversion

PubMed Central

Carrigan, Matthew A.; Uryasev, Oleg; Davis, Ross P.; Zhai, LanMin; Hurley, Thomas D.; Benner, Steven A.

2012-01-01

Background: Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in the most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and lifestyles. This may be so for Class I alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids. Methodology/Principal Findings: To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences. Conclusions/Significance: We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to represent the natural history of ADH1s more correctly, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that the paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion.
These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage. PMID:22859968
Decoding the genome beyond sequencing: the new phase of genomic research.

PubMed

Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

2011-10-01

While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems: genes code parts such as proteins and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.

PubMed

Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo

2016-08-30

Influenza is an acute respiratory illness and has become a serious public health problem worldwide. Studying the HA and NA genes of influenza A virus is essential because these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain the complete coding sequences of the Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequences from around the world in the Global Initiative on Sharing All Influenza Data (GISAID) and were further tested using Indonesian influenza A/H3N2 archived samples from influenza-like illness (ILI) surveillance from 2008 to 2009. Optimum RT-PCR conditions were established for all HA and NA fragments designed to cover the complete coding sequences of the HA and NA genes. Of the 145 influenza A/H3N2 samples tested, 71 were successfully sequenced for the complete coding sequences of both the HA and NA genes. The developed primer sets were suitable for obtaining complete coding sequences of the HA and NA genes of Indonesian samples from 2008 to 2009.
Prevalence of virulence genes in Escherichia coli strains isolated from Romanian adult urinary tract infection cases.

PubMed

Usein, C R; Damian, M; Tatu-Chitoiu, D; Capusa, C; Fagaras, R; Tudorache, D; Nica, M; Le Bouguénec, C

2001-01-01

A total of 78 E. coli strains isolated from adults with different types of urinary tract infections were screened by polymerase chain reaction for the prevalence of genetic regions coding for virulence factors. The targeted genetic determinants were those coding for type 1 fimbriae (fimH), pili associated with pyelonephritis (pap), S and F1C fimbriae (sfa and foc), afimbrial adhesins (afa), hemolysin (hly), cytotoxic necrotizing factor (cnf), and aerobactin (aer). Among the studied strains, the prevalence of genes coding for fimbrial adhesive systems was 86%, 36%, and 23% for fimH, pap, and sfa/foc, respectively. The operons coding for Afa afimbrial adhesins were identified in 14% of strains. The hly and cnf genes coding for toxins were amplified in 23% and 13% of strains, respectively. A prevalence of 54% was found for the aer gene. The various combinations of detected genes were designated as virulence patterns. The strains isolated from hospitalized patients displayed a greater number of virulence genes and a greater diversity of gene associations than the strains isolated from ambulatory subjects. A rapid assessment of bacterial pathogenicity characteristics may contribute to a better medical approach to patients with urinary tract infections.
Cooperative MIMO communication at wireless sensor network: an error correcting code approach.

PubMed

Islam, Mohammad Rakibul; Han, Young Shin

2011-01-01

Cooperative communication in wireless sensor networks (WSNs) explores energy-efficient wireless communication schemes between multiple sensors and a data gathering node (DGN) by exploiting multiple input multiple output (MIMO) and multiple input single output (MISO) configurations. In this paper, an energy-efficient cooperative MIMO (C-MIMO) technique is proposed in which a low density parity check (LDPC) code is used as the error correcting code. The rate of the LDPC code is varied by varying the lengths of the message and parity bits. Simulation results show that the cooperative communication scheme outperforms the SISO scheme in the presence of LDPC code. LDPC codes with different code rates are compared using bit error rate (BER) analysis. BER is also analyzed under different Nakagami fading scenarios. Energy efficiencies are compared for different targeted probabilities of bit error p_b. It is observed that C-MIMO performs more efficiently when the targeted p_b is smaller. Also, a lower encoding rate for the LDPC code offers better error characteristics.
Cooperative MIMO Communication at Wireless Sensor Network: An Error Correcting Code Approach

PubMed Central

Islam, Mohammad Rakibul; Han, Young Shin

2011-01-01

Cooperative communication in wireless sensor networks (WSNs) explores energy-efficient wireless communication schemes between multiple sensors and a data gathering node (DGN) by exploiting multiple input multiple output (MIMO) and multiple input single output (MISO) configurations. In this paper, an energy-efficient cooperative MIMO (C-MIMO) technique is proposed in which a low density parity check (LDPC) code is used as the error correcting code. The rate of the LDPC code is varied by varying the lengths of the message and parity bits. Simulation results show that the cooperative communication scheme outperforms the SISO scheme in the presence of LDPC code. LDPC codes with different code rates are compared using bit error rate (BER) analysis. BER is also analyzed under different Nakagami fading scenarios. Energy efficiencies are compared for different targeted probabilities of bit error p_b. It is observed that C-MIMO performs more efficiently when the targeted p_b is smaller. Also, a lower encoding rate for the LDPC code offers better error characteristics. PMID:22163732
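Two quantities in this abstract can be made concrete. The first is the code rate: for a code with k message bits and m independent parity bits it is k/(k + m), so adding parity bits lowers the rate. The second is the parity-check condition H·c = 0 (mod 2) that every valid codeword satisfies and that LDPC decoders work to restore. The minimal Python sketch below uses a hypothetical 3 × 6 systematic parity-check matrix; it is far too small and dense to be a real LDPC code and only illustrates encoding and the syndrome check, not iterative (belief-propagation) decoding.

```python
from fractions import Fraction

# Hypothetical systematic parity-check matrix H = [P | I] (3 parity checks,
# 6 codeword bits). Real LDPC matrices are much larger and sparse.
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
H = [p_row + [1 if j == i else 0 for j in range(3)] for i, p_row in enumerate(P)]


def code_rate(message_bits, parity_bits):
    """Rate k / n with n = k + m (assumes independent parity checks)."""
    return Fraction(message_bits, message_bits + parity_bits)


def encode(message):
    """Append parity bits so that every row of H sums to 0 mod 2."""
    parity = [sum(p * m for p, m in zip(row, message)) % 2 for row in P]
    return message + parity


def syndrome(codeword):
    """An all-zero syndrome means every parity check is satisfied."""
    return [sum(h * c for h, c in zip(row, codeword)) % 2 for row in H]


word = encode([1, 0, 1])
print(word, syndrome(word))              # [1, 0, 1, 1, 1, 0] [0, 0, 0]
word[0] ^= 1                             # simulate a single bit error
print(syndrome(word))                    # [1, 0, 1]
print(code_rate(3, 3), code_rate(3, 6))  # 1/2 1/3
```

A nonzero syndrome flags the error; more parity bits (a lower rate) give the decoder more checks to work with, in line with the abstract's observation that lower encoding rates offer better error characteristics.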
The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Cellular miR-2909 RNomics governs the genes that ensure immune checkpoint regulation.

PubMed

Kaul, Deepak; Malik, Deepti; Wani, Sameena

2018-06-20

Cross-talk between coding RNAs and regulatory non-coding microRNAs within the human genome has provided compelling evidence for the existence of flexible checkpoint control of T-cell activation. The present study attempts to demonstrate that the interplay between miR-2909 and its effector KLF4 gene has the inherent capacity to regulate genes coding for CTLA4, CD28, CD40, CD134, PDL1, CD80, CD86, IL-6 and IL-10 within normal human peripheral blood mononuclear cells (PBMCs). Based on these findings, we propose a pathway that links miR-2909 RNomics with the genes coding for immune checkpoint regulators required for the maintenance of immune homeostasis.

Draft genome sequence of carbapenem-resistant Shewanella algae strain AC isolated from small abalone (Haliotis diversicolor).

PubMed

Huang, Yao-Ting; Cheng, Jan-Fang; Chen, Shi-Yu; Hong, Yu-Kai; Wu, Zong-Yen; Liu, Po-Yu

2018-06-19

Shewanella algae is an environmental marine bacterium and an emerging opportunistic human pathogen. Moreover, there are increasing reports of strains showing multidrug resistance, particularly carbapenem-resistant isolates. Although S. algae has been found in bivalve shellfish aquaculture, genome-wide data on resistance determinants in S. algae from shellfish are scarce. In this study, we aimed to determine the whole-genome sequence of carbapenem-resistant S. algae strain AC isolated from small abalone in Taiwan. Genomic DNA was sequenced on an Illumina MiSeq platform using 250-bp paired-end reads. De novo genome assembly was performed using Velvet v1.2.07. The whole genome was annotated, and several candidate genes for antimicrobial resistance were identified. The genome size was 4,751,156 bp, with a mean G+C content of 53.09%. A total of 4,164 protein-coding sequences, 7 rRNAs, 85 tRNAs, and 5 non-coding RNAs were identified. The genome contains genes associated with resistance to β-lactams, trimethoprim, tetracycline, colistin, and quinolones. Multiple efflux pump genes were also detected. Small abalone is a potential source of foodborne drug-resistant S. algae. The genome sequence of carbapenem-resistant S. algae strain AC isolated from small abalone will provide valuable information for further study of the dissemination of resistance genes at the human-animal interface. Copyright © 2018. Published by Elsevier Ltd.
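The reported mean G+C content is the percentage of G and C bases in the assembled genome. Below is a minimal sketch of that calculation. It assumes the assembly is available as a plain sequence string. It also excludes ambiguous bases such as N from the denominator, which is a convention that varies between tools. The function name `gc_content` is hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence, ignoring non-ACGT symbols such as N."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


if __name__ == "__main__":
    # Toy sequence with two unknown bases, which are excluded from the count.
    print(f"{gc_content('ATGCGCNNTA'):.2f}%")
```

For a multi-contig assembly, the counts would be summed over all contigs before dividing. Averaging per-contig percentages would weight short contigs too heavily.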
Association of Genetic Variants of Small Non-Coding RNAs with Survival in Colorectal Cancer

PubMed Central

Pao, Jiunn-Bey; Lu, Te-Ling; Ting, Wen-Chien; Chen, Lu-Min; Bao, Bo-Ying

2018-01-01

Background: Single nucleotide polymorphisms (SNPs) of small non-coding RNAs (sncRNAs) can influence sncRNA function and target gene expression to mediate the risk of certain diseases. The aim of the present study was to evaluate the prognostic relevance of sncRNA SNPs for colorectal cancer, which has not been well characterized to date. Methods: We comprehensively examined 31 common SNPs of sncRNAs and assessed the impact of these variants on survival in a cohort of 188 patients with colorectal cancer. Results: Three SNPs were significantly associated with survival of patients with colorectal cancer after correction for multiple testing, and two of these SNPs (hsa-mir-196a-2 rs11614913 and U85 rs714775) remained significant in multivariate analyses. Additional in silico analysis provided further evidence of this association, since the expression levels of the target genes of hsa-miR-196a (HOXA7, HOXB8, and AKT1) were significantly correlated with colorectal cancer progression. Furthermore, Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses indicated that hsa-miR-196a is associated with well-known oncogenic pathways, including the cellular protein modification process, mitotic cell cycle, adherens junction, and extracellular matrix receptor interaction pathways. Conclusion: Our results suggest that SNPs of sncRNAs could play a critical role in cancer progression, and that hsa-miR-196a might be a valuable biomarker or therapeutic target for colorectal cancer patients. PMID:29483812
Studying the genetic basis of speciation in high gene flow marine invertebrates

PubMed Central

2016-01-01

A growing number of genes responsible for reproductive incompatibilities between species (barrier loci) exhibit signals of positive selection. However, the possibility that genes experiencing positive selection diverge early in speciation and commonly cause reproductive incompatibilities has not been systematically investigated on a genome-wide scale. Here, I outline a research program for studying the genetic basis of speciation in broadcast spawning marine invertebrates that uses a priori genome-wide information on a large, unbiased sample of genes tested for positive selection. A targeted sequence capture approach is proposed that scores single-nucleotide polymorphisms (SNPs) in widely separated populations of species at an early stage of allopatric divergence. The targeted capture of both coding and non-coding sequences enables SNPs to be characterized at known locations across the genome and at genes with known selective or neutral histories. The neutral coding and non-coding SNPs provide robust background distributions for identifying FST outliers within genes that can, in principle, identify specific mutations experiencing diversifying selection. If natural hybridization occurs between species, the neutral coding and non-coding SNPs can provide a neutral admixture model for genomic cline analyses aimed at finding genes that exhibit strong blocks to introgression. Strongylocentrotid sea urchins are used as a model system to outline the approach, but it can be applied to any group for which a complete reference genome is available. PMID:29491951
Determination of performance characteristics of scientific applications on IBM Blue Gene/Q

DOE Office of Scientific and Technical Information (OSTI.GOV)

Evangelinos, C.; Walkup, R. E.; Sachdeva, V.

The IBM Blue Gene®/Q platform presents scientists and engineers with a rich set of hardware features such as 16 cores per chip sharing a Level 2 cache, a wide SIMD (single-instruction, multiple-data) unit, a five-dimensional torus network, and hardware support for collective operations. Especially important is the feature related to cores that have four “hardware threads,” which makes it possible to hide latencies and obtain a high fraction of the peak issue rate from each core. All of these hardware resources present unique performance-tuning opportunities on Blue Gene/Q. We provide an overview of several important applications and solvers and study them on Blue Gene/Q using performance counters and Message Passing Interface profiles. We also discuss how Blue Gene/Q tools help us understand the interaction of the application with the hardware and software layers and provide guidance for optimization. Furthermore, on the basis of our analysis, we discuss code improvement strategies targeting Blue Gene/Q. Information about how these algorithms map to the Blue Gene® architecture is expected to have an impact on future system design as we move to the exascale era.
The shaping and functional consequences of the dosage effect landscape in multiple myeloma.

PubMed

Samur, Mehmet K; Shah, Parantu K; Wang, Xujun; Minvielle, Stéphane; Magrangeas, Florence; Avet-Loiseau, Hervé; Munshi, Nikhil C; Li, Cheng

2013-10-02

Multiple myeloma (MM) is a malignant proliferation of plasma B cells. Based on recurrent aneuploidy such as copy number alterations (CNAs), myeloma is divided into two subtypes with different CNA patterns and patient survival outcomes. How aneuploidy events arise, and whether they contribute to cancer cell evolution, are actively studied. The large number of transcriptomic changes resulting from CNAs (the dosage effect) poses major challenges for identifying the functional consequences of CNAs in myeloma in terms of specific driver genes and pathways. In this study, we hypothesize that the gene-wise dosage effect varies as a result of complex regulatory networks that translate the impact of CNAs into gene expression, and that studying this variation can provide insights into the functional effects of CNAs. We propose a gene-wise dosage effect score and a genome-wide karyotype plot as tools to measure and visualize concordant copy number and expression changes across cancer samples. We find that the dosage effect in myeloma is widespread yet variable, and that it is correlated with gene expression level and with CNA frequencies in different chromosomes. Our analysis suggests that, despite the enrichment of differentially expressed genes between hyperdiploid MM and non-hyperdiploid MM in the trisomic chromosomes, the chromosomal proportion of dosage-sensitive genes is higher in the non-trisomic chromosomes. Dosage-sensitive genes are enriched for genes with protein translation and localization functions, and dosage-resistant genes are enriched for apoptosis genes. These results point to future studies on the differential dosage sensitivity and resistance of pro- and anti-proliferation pathways, and their variation across patients, as therapeutic targets and prognostic markers. Our findings support the hypothesis that recurrent CNAs in myeloma are selected by their functional consequences.
The novel dosage effect score defined in this work will facilitate integration of copy number and expression data for identifying driver genes in cancer genomics studies. The accompanying R code is available at http://www.canevolve.org/dosageEffect/.
Transcriptome-wide effects of inverted SINEs on gene expression and their impact on RNA polymerase II activity.

PubMed

Tajaddod, Mansoureh; Tanzer, Andrea; Licht, Konstantin; Wolfinger, Michael T; Badelt, Stefan; Huber, Florian; Pusch, Oliver; Schopoff, Sandy; Janisiw, Michael; Hofacker, Ivo; Jantsch, Michael F

2016-10-25

Short interspersed elements (SINEs) represent the most abundant group of non-long-terminal repeat transposable elements in mammalian genomes. In primates, Alu elements are the most prominent and homogeneous representatives of SINEs. Due to their frequent insertion within or close to coding regions, SINEs have been suggested to play a crucial role during genome evolution. Moreover, Alu elements within mRNAs have also been reported to control gene expression at different levels. Here, we undertake a genome-wide analysis of insertion patterns of human Alus within transcribed portions of the genome. Multiple nearby insertions of SINEs within one transcript are more abundant in tandem orientation than in inverted orientation. Indeed, analysis of transcriptome-wide expression levels in 15 ENCODE cell lines suggests a cis-repressive effect of inverted Alu elements on gene expression. Using reporter assays, we show that the negative effect of inverted SINEs on gene expression is independent of known sensors of double-stranded RNAs. Instead, transcriptional elongation seems to be impaired, leading to reduced mRNA levels. Our study suggests that there is a bias against multiple SINE insertions that can promote intramolecular base pairing within a transcript. Moreover, at a genome-wide level, mRNAs harboring inverted SINEs are less expressed than mRNAs harboring single or tandemly arranged SINEs. Finally, we demonstrate a novel mechanism by which inverted SINEs can affect gene expression by interfering with RNA polymerase II.
Structure of genes for dermaseptins B, antimicrobial peptides from frog skin. Exon 1-encoded prepropeptide is conserved in genes for peptides of highly different structures and activities.

PubMed

Vouille, V; Amiche, M; Nicolas, P

1997-09-01

We cloned the genes of two members of the dermaseptin family, broad-spectrum antimicrobial peptides isolated from the skin of the arboreal frog Phyllomedusa bicolor. The dermaseptin gene Drg2 has a 2-exon coding structure interrupted by a small 137-bp intron. Exon 1 encodes a 22-residue hydrophobic signal peptide and the first three amino acids of the acidic propiece. Exon 2 contains the 18 additional acidic residues of the propiece, a typical prohormone processing signal (Lys-Arg) and a 32-residue dermaseptin progenitor sequence. The dermaseptin genes Drg2 and Drg1g2 have conserved sequences at both untranslated ends and in the first and second coding exons. In contrast, Drg1g2 comprises a third coding exon for a short version of the acidic propiece and a second dermaseptin progenitor sequence. Structural conservation between the two genes suggests that Drg1g2 arose recently from an ancestral Drg2-like gene through amplification of part of the second coding exon and the 3'-untranslated region. Analysis of the cDNAs encoding precursors of several frog skin peptides with highly different structures and activities demonstrates that the signal peptides and part of the acidic propieces are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The organization of the genes in this family, with the signal peptide and the progenitor sequence on separate exons, permits strikingly different peptides to be directed into the secretory pathway. The recruitment of such a homologous 'secretory' exon by otherwise non-homologous genes may have been an early event in the evolution of amphibians.
MiR-191 Regulates Primary Human Fibroblast Proliferation and Directly Targets Multiple Oncogenes

PubMed Central

Polioudakis, Damon; Abell, Nathan S.; Iyer, Vishwanath R.

2015-01-01

miRNAs play a central role in numerous pathologies, including multiple cancer types. miR-191 has predominantly been studied as an oncogene, but its role in the proliferation of primary cells is not well characterized, and the miR-191 targetome has not been experimentally profiled. Here we used RNA-induced silencing complex (RISC) immunoprecipitations together with gene expression profiling to construct a genome-wide miR-191 target profile. We show that miR-191 represses proliferation in primary human fibroblasts, identify multiple proto-oncogenes as novel miR-191 targets, including CDK9, NOTCH2, and RPS6KA3, and present evidence that miR-191 extensively mediates target expression through coding sequence (CDS) pairing. Our results provide a comprehensive genome-wide miR-191 target profile and demonstrate that miR-191 regulates primary human fibroblast proliferation. PMID:25992613
Hamming and Accumulator Codes Concatenated with MPSK or QAM

NASA Technical Reports Server (NTRS)

Divsalar, Dariush; Dolinar, Samuel

2009-01-01

In a proposed coding-and-modulation scheme, a high-rate binary data stream would be processed as follows:
1. The input bit stream would be demultiplexed into multiple bit streams.
2. The multiple bit streams would be encoded simultaneously by a high-rate outer Hamming code comprising multiple short constituent Hamming codes, a distinct constituent Hamming code for each stream.
3. The streams would be interleaved. The interleaver would have a block structure that would facilitate parallelization for high-speed decoding.
4. The interleaved streams would be further encoded simultaneously by an inner two-state, rate-1 accumulator code comprising multiple constituent accumulator codes, a distinct accumulator code for each stream.
5. The resulting bit streams would be mapped into symbols to be transmitted using a higher-order modulation, for example M-ary phase-shift keying (MPSK) or quadrature amplitude modulation (QAM).
The novelty of the scheme lies in the concatenation of the multiple-constituent Hamming and accumulator codes and the corresponding parallel architectures of the encoder and decoder circuitry (see figure) needed to process the multiple bit streams simultaneously. As with other parallel-processing schemes, one advantage is that the overall data rate could be much greater than the data rate of each encoder and decoder stream. Hence, the encoder and decoder could handle data at an overall rate beyond the capability of the individual encoder and decoder circuits.
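The first four steps of the proposed encoder can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NASA design. A systematic Hamming(7,4) code stands in for the high-rate outer Hamming code, chosen only for brevity. The cross-stream "diagonal" interleaver is a hypothetical choice, since the abstract says only that the interleaver has a block structure. The MPSK/QAM mapping of step 5 is omitted. The accumulator follows the standard two-state, rate-1 recursion y_i = x_i XOR y_(i-1).

```python
from typing import List

Bits = List[int]


def demultiplex(bits: Bits, num_streams: int) -> List[Bits]:
    """Step 1: split one bit stream round-robin into num_streams parallel streams."""
    return [bits[s::num_streams] for s in range(num_streams)]


def hamming74_encode(block: Bits) -> Bits:
    """Systematic Hamming(7,4): 4 data bits followed by 3 parity bits."""
    d1, d2, d3, d4 = block
    return [d1, d2, d3, d4, d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4]


def outer_encode(stream: Bits) -> Bits:
    """Step 2: a separate constituent Hamming encoder for each stream."""
    out: Bits = []
    for i in range(0, len(stream), 4):
        out.extend(hamming74_encode(stream[i:i + 4]))
    return out


def interleave(streams: List[Bits]) -> List[Bits]:
    """Step 3 (hypothetical permutation): bit j of output stream t comes from input stream (t + j) mod S."""
    num, length = len(streams), len(streams[0])
    return [[streams[(t + j) % num][j] for j in range(length)] for t in range(num)]


def accumulate(stream: Bits) -> Bits:
    """Step 4: two-state, rate-1 accumulator y_i = x_i XOR y_(i-1), starting in state 0."""
    out: Bits = []
    state = 0
    for x in stream:
        state ^= x
        out.append(state)
    return out


def encode_stream(bits: Bits, num_streams: int) -> List[Bits]:
    """Demultiplex, outer-encode, interleave, then accumulate each stream in parallel."""
    if len(bits) % (4 * num_streams):
        raise ValueError("input length must be a multiple of 4 * num_streams")
    coded = [outer_encode(s) for s in demultiplex(bits, num_streams)]
    return [accumulate(s) for s in interleave(coded)]


if __name__ == "__main__":
    for stream in encode_stream([1, 0, 1, 1, 0, 0, 1, 0], num_streams=2):
        print(stream)
```

Each stream passes through its own small encoder, and the per-stream loops never depend on one another. This independence is the property the parallel hardware architecture exploits.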
Selection of sporophytic and gametophytic self-incompatibility in the absence of a superlocus.

PubMed

Schoen, Daniel J; Roda, Megan J

2016-06-01

Self-incompatibility (SI) is a complex trait that enforces outcrossing in plant populations. SI generally involves tight linkage of the genes coding for the proteins that underlie self-pollen detection and pollen identity specification. Here, we develop two-locus genetic models to address whether sporophytic SI (SSI) and gametophytic SI (GSI) can invade populations of self-compatible plants when there is no linkage or only weak linkage of the underlying pollen detection and identity genes (i.e., no S-locus supergene). The models assume that SI evolves as a result of exaptation of genes formerly involved in functions other than SI. Model analysis reveals that SSI and GSI can invade populations even when the underlying genes are loosely linked, provided that inbreeding depression and the selfing rate are sufficiently high. Reducing recombination between these genes makes the conditions for invasion more lenient. These results can help account for the multiple, independent origins of SI systems that appear to have occurred in the angiosperms. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
In vivo delivery of miRNAs for cancer therapy: Challenges and strategies

PubMed Central

Chen, Yunching; Gao, Dong-Yu; Huang, Leaf

2016-01-01

MicroRNAs (miRNAs), small non-coding RNAs, can regulate gene expression post-transcriptionally and silence a broad set of target genes. miRNAs that are aberrantly expressed in cancer cells play an important role in modulating gene expression, thereby regulating downstream signaling pathways and affecting cancer formation and progression. Oncogenes or tumor suppressor genes regulated by miRNAs mediate cell cycle progression, metabolism, cell death, angiogenesis, metastasis and immunosuppression in cancer. Recently, miRNAs have emerged as therapeutic targets or tools and as biomarkers for diagnosis and therapy monitoring in cancer. Because miRNAs can regulate multiple cancer-related genes simultaneously, miRNA-based therapeutics are an important approach in cancer therapy. However, one of the major challenges of miRNA-based cancer therapy is achieving specific, efficient and safe systemic delivery of therapeutic miRNAs in vivo. This review discusses the key challenges in developing carriers for miRNA-based therapy and explores current strategies for systemically delivering miRNAs to cancer without inducing toxicity. PMID:24859533
Exon Shuffling and Origin of Scorpion Venom Biodiversity

PubMed Central

Wang, Xueli; Gao, Bin; Zhu, Shunyi

2016-01-01

Scorpion venom is a complex combinatorial library of peptides and proteins with multiple biological functions. A combination of transcriptomic and proteomic techniques has revealed its enormous molecular diversity, as identified by the presence of a large number of ion channel-targeted neurotoxins with different folds, membrane-active antimicrobial peptides, proteases, and protease inhibitors. Although the biodiversity of scorpion venom has long been known, how it arises remains unsolved. In this work, we analyzed the exon-intron structures of an array of scorpion venom protein-encoding genes. We unexpectedly found that nearly all of these genes possess a phase-1 intron (an intron located between the first and second nucleotides of a codon) near the cleavage site of the signal sequence, even though their mature peptides differ remarkably. This observation matches a theory of exon shuffling in the origin of new genes. It suggests that the recruitment of different folds into scorpion venom might have been achieved via shuffling between body protein-coding genes and ancestral venom gland-specific genes that presumably contributed tissue-specific regulatory elements and secretory signal sequences. PMID:28035955
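The abstract defines a phase-1 intron as one located between the first and second nucleotides of a codon. Under the usual convention, an intron's phase is the number of coding nucleotides upstream of it, counted from the first base of the start codon, modulo 3. Below is a minimal sketch. The function names and the exon lengths are hypothetical.

```python
def intron_phase(coding_nt_before_intron: int) -> int:
    """Intron phase: 0 = between codons, 1 = after a codon's 1st base, 2 = after its 2nd base."""
    if coding_nt_before_intron < 0:
        raise ValueError("nucleotide count must be non-negative")
    return coding_nt_before_intron % 3


def phases_from_exon_lengths(coding_exon_lengths):
    """Phases of the introns separating consecutive coding exons (lengths in nucleotides)."""
    phases, total = [], 0
    for length in coding_exon_lengths[:-1]:
        total += length
        phases.append(intron_phase(total))
    return phases


if __name__ == "__main__":
    # Hypothetical gene: 67 = 22 * 3 + 1 coding nucleotides precede the intron,
    # so it falls after the first base of the 23rd codon (phase 1).
    print(phases_from_exon_lengths([67, 120]))
```

Only coding nucleotides are counted. The 5' untranslated part of the first exon must therefore be excluded before applying the modulo.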
Exon Shuffling and Origin of Scorpion Venom Biodiversity.

PubMed

Wang, Xueli; Gao, Bin; Zhu, Shunyi

2016-12-26

Scorpion venom is a complex combinatorial library of peptides and proteins with multiple biological functions. A combination of transcriptomic and proteomic techniques has revealed its enormous molecular diversity, as identified by the presence of a large number of ion channel-targeted neurotoxins with different folds, membrane-active antimicrobial peptides, proteases, and protease inhibitors. Although the biodiversity of scorpion venom has long been known, how it arises remains unsolved. In this work, we analyzed the exon-intron structures of an array of scorpion venom protein-encoding genes. We unexpectedly found that nearly all of these genes possess a phase-1 intron (an intron located between the first and second nucleotides of a codon) near the cleavage site of the signal sequence, even though their mature peptides differ remarkably. This observation matches a theory of exon shuffling in the origin of new genes. It suggests that the recruitment of different folds into scorpion venom might have been achieved via shuffling between body protein-coding genes and ancestral venom gland-specific genes that presumably contributed tissue-specific regulatory elements and secretory signal sequences.
XGC developments for a more efficient XGC-GENE code coupling

NASA Astrophysics Data System (ADS)

Dominski, Julien; Hager, Robert; Ku, Seung-Hoe; Chang, Cs

2017-10-01

In the Exascale Computing Program, the High-Fidelity Whole Device Modeling project initially aims to deliver a tightly coupled simulation of plasma neoclassical and turbulence dynamics from the core to the edge of the tokamak. To permit such simulations, the gyrokinetic codes GENE and XGC will be coupled together. Numerical efforts are being made to improve agreement between the numerical schemes in the coupling region. One of the difficulties of coupling these codes is the incompatibility of their grids: GENE is a continuum grid-based code, whereas XGC is a particle-in-cell code using an unstructured triangular mesh. A field-aligned filter has therefore been implemented in XGC. Although XGC originally had an approximately field-following mesh, this filter yields a perturbation discretization closer to the one solved in the field-aligned code GENE. Additionally, new XGC gyro-averaging matrices are implemented on a velocity grid adapted to the plasma properties, ensuring the same accuracy from the core to the edge regions.
Multiple microRNAs regulate human FOXP2 gene expression by targeting sequences in its 3' untranslated region.

PubMed

Fu, Lijuan; Shi, Zhimin; Luo, Guanzheng; Tu, Weihong; Wang, XiuJie; Fang, Zhide; Li, XiaoChing

2014-10-01

Mutations in the human FOXP2 gene cause speech and language impairments. The FOXP2 protein is a transcription factor that regulates the expression of many downstream genes, which may have important roles in nervous system development and function. An adequate amount of functional FOXP2 protein is thought to be critical for the proper development of the neural circuitry underlying speech and language. However, how FOXP2 gene expression is regulated is not clearly understood. The FOXP2 mRNA has an approximately 4-kb-long 3' untranslated region (3' UTR), twice as long as its protein-coding region, suggesting that FOXP2 may be regulated by microRNAs (miRNAs). We identified multiple miRNAs that regulate the expression of the human FOXP2 gene using sequence analysis and in vitro cell systems. Focusing on let-7a, miR-9, and miR-129-5p, three brain-enriched miRNAs, we show that these miRNAs regulate human FOXP2 expression in a dosage-dependent manner and target specific sequences in the FOXP2 3' UTR. We further show that these three miRNAs are expressed in the cerebellum of the human fetal brain, where FOXP2 is known to be expressed. Our results reveal novel regulatory functions of the human FOXP2 3' UTR sequence and regulatory interactions between multiple miRNAs and the human FOXP2 gene. The expression of let-7a, miR-9, and miR-129-5p in the human fetal cerebellum is consistent with their roles in regulating FOXP2 expression during early cerebellum development. These results suggest that various genetic and environmental factors may contribute to speech and language development and related neural developmental disorders via the miRNA-FOXP2 regulatory network.
Optimization of algorithm of coding of genetic information of Chlamydia

NASA Astrophysics Data System (ADS)

Feodorova, Valentina A.; Ulyanov, Sergey S.; Zaytsev, Sergey S.; Saltykov, Yury V.; Ulianova, Onega V.

2018-04-01

A new method of encoding genetic information using coherent optical fields is developed. A universal technique for transforming the nucleotide sequences of bacterial genes into laser speckle patterns is suggested. Reference speckle patterns are generated for the nucleotide sequences of the omp1 gene of typical wild strains of Chlamydia trachomatis genovars D, E, F, G, J and K, as well as Chlamydia psittaci serovar I. The algorithm for coding gene information into speckle patterns is optimized. The optimization criterion is that the gene-based speckles be fully developed speckles with Gaussian statistics.
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).

PubMed

Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai

2014-12-01

The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. Mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains:
- six protein-coding genes
- the large-subunit rRNA
- methionine and tryptophan tRNAs
- two pseudogenes, each consisting of a partial copy of COI
- terminal sequences at both ends of the linear mtDNA

mtDNA molecule 2 is 7576 bp long and contains:
- seven protein-coding genes
- the small-subunit rRNA
- a methionine tRNA
- a pseudogene consisting of a partial copy of COI
- terminal sequences at both ends of the linear mtDNA

The COI gene begins with GTG as its start codon, whereas the other 12 protein-coding genes start with the typical ATG initiation codon. In addition, all protein-coding genes are terminated with TAA as the stop codon.
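Start and stop codon usage like that reported above (GTG vs. ATG starts, TAA stops) can be tallied mechanically once the coding sequences are extracted. Below is a minimal sketch. The gene sequences are hypothetical toy strings. The stop-codon set assumes NCBI translation table 4 (the coelenterate mitochondrial code), in which TGA encodes tryptophan rather than stop.

```python
from collections import Counter

# Assumption: stop codons of NCBI translation table 4, where TGA codes for Trp.
STOP_CODONS = {"TAA", "TAG"}


def tally_terminal_codons(cds_by_gene):
    """Count start codons (first triplet) and stop codons (last triplet) across coding sequences."""
    starts, stops = Counter(), Counter()
    for gene, cds in cds_by_gene.items():
        cds = cds.upper()
        if len(cds) < 6 or len(cds) % 3:
            raise ValueError(f"{gene}: length must be a multiple of 3 and at least 6")
        if cds[-3:] not in STOP_CODONS:
            raise ValueError(f"{gene}: does not end in a recognized stop codon")
        starts[cds[:3]] += 1
        stops[cds[-3:]] += 1
    return starts, stops


if __name__ == "__main__":
    genes = {"COI": "GTGGCATAA", "COII": "ATGTTATAA", "ND1": "ATGAAATAA"}  # hypothetical toy CDSs
    starts, stops = tally_terminal_codons(genes)
    print(dict(starts), dict(stops))
```

Swapping in a different genetic code only requires changing `STOP_CODONS`. The tally itself does not depend on the code.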
Multiple components in restriction enzyme digests of mammalian (insectivore), avian and reptilian genomic DNA hybridize with murine immunoglobulin VH probes.

PubMed

Litman, G W; Berger, L; Jahn, C L

1982-06-11

High molecular weight genomic DNAs isolated from an insectivore (Tupaia), a representative reptile (Caiman) and a representative bird (Gallus) were digested with restriction endonucleases, transferred to nitrocellulose, and hybridized with nick-translated probes of murine VH genes. The derivations of the probes, designated S107V (1) and mu 107V (2,3), have been described previously. Under conditions of reduced stringency, multiple hybridizing components were observed with Tupaia and Caiman; only mu 107V exhibited significant hybridization with the separated fragments of Gallus DNA. The nick-translated S107V probe was digested with Fnu4HI, and subinserts corresponding to the 5' and 3' regions both detected multiple hybridizing components in Tupaia and Caiman DNA. A 5' probe lacking the leader sequence identified the same components as the intact 5' probe. This suggests that species as distant as the reptiles may possess multiple genetic components with significant homology to murine immunoglobulin VH coding regions.
Multiple components in restriction enzyme digests of mammalian (insectivore), avian and reptilian genomic DNA hybridize with murine immunoglobulin VH probes.

PubMed Central

Litman, G W; Berger, L; Jahn, C L

1982-01-01

High molecular weight genomic DNAs isolated from an insectivore (Tupaia), a representative reptile (Caiman) and a representative bird (Gallus) were digested with restriction endonucleases, transferred to nitrocellulose, and hybridized with nick-translated probes of murine VH genes. The derivations of the probes, designated S107V (1) and mu 107V (2,3), have been described previously. Under conditions of reduced stringency, multiple hybridizing components were observed with Tupaia and Caiman; only mu 107V exhibited significant hybridization with the separated fragments of Gallus DNA. The nick-translated S107V probe was digested with Fnu4HI, and subinserts corresponding to the 5' and 3' regions both detected multiple hybridizing components in Tupaia and Caiman DNA. A 5' probe lacking the leader sequence identified the same components as the intact 5' probe. This suggests that species as distant as the reptiles may possess multiple genetic components with significant homology to murine immunoglobulin VH coding regions. PMID:6285298
The effect of multiple primary rules on cancer incidence rates and trends

PubMed Central

Weir, Hannah K.; Johnson, Christopher J.; Ward, Kevin C.; Coleman, Michel P.

2018-01-01

Purpose: An examination of multiple primary cancers can provide insight into the etiologic role of genes, the environment, and prior cancer treatment in a cancer patient’s risk of developing a subsequent cancer. Cancer registries throughout the world use different rules for registering multiple primary cancers (MP), making data comparisons difficult. Methods: We evaluated the effect of SEER and IARC/IACR rules on cancer incidence rates and trends using data from the SEER Program. We estimated age-standardized incidence rates (ASIR) and trends (1975–2011) for the top 26 cancer categories using joinpoint regression analysis. Results: ASIRs were higher using SEER rules than using IARC/IACR rules for all cancers combined (3%) and, in rank order, for melanoma (9%), female breast (7%), urinary bladder (6%), colon (4%), kidney and renal pelvis (4%), oral cavity and pharynx (3%), lung and bronchus (2%), and non-Hodgkin lymphoma (2%). ASIR differences were largest for patients aged 65+ years. Trends were similar under both MP rules, except for cancers of the urinary bladder and of the kidney and renal pelvis. Conclusions: The choice of multiple primary coding rules affects incidence rates and trends. Compared with SEER MP coding rules, IARC/IACR rules are less complex, have not changed over time, and report fewer multiple primary cancers. The difference is greatest for cancers that occur in paired organs or at the same anatomic site with the same or related histologic type. Cancer registries collecting incidence data using SEER rules may want to consider also reporting incidence rates and trends using IARC/IACR rules to facilitate international data comparisons. PMID:26809509
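The comparison above rests on age-standardized incidence rates (ASIRs). Below is a minimal sketch of direct age standardization, which computes a weighted sum of age-specific rates using standard-population proportions. The three age groups, weights, person-years and case counts are invented for illustration; they are not SEER data and not the study's standard population. The helper names are hypothetical.

```python
def asir(cases, person_years, std_weights, per=100_000):
    """Directly age-standardized rate: sum over age groups of weight * (cases / person-years), scaled per `per`."""
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("inputs must have one entry per age group")
    if abs(sum(std_weights) - 1.0) > 1e-9:
        raise ValueError("standard-population weights must sum to 1")
    return per * sum(w * c / py for c, py, w in zip(cases, person_years, std_weights))


def percent_difference(rate, reference):
    """Relative difference of `rate` versus `reference`, in percent."""
    return 100.0 * (rate - reference) / reference


# Hypothetical three-age-group example (invented numbers, not SEER data).
weights = [0.5, 0.3, 0.2]                 # standard-population proportions
person_years = [100_000, 50_000, 20_000]
cases_rule_a = [10, 40, 100]              # counts under a more inclusive MP rule
cases_rule_b = [10, 38, 92]               # counts under a stricter MP rule

rate_a = asir(cases_rule_a, person_years, weights)
rate_b = asir(cases_rule_b, person_years, weights)
print(f"{rate_a:.1f} vs {rate_b:.1f} per 100,000 "
      f"({percent_difference(rate_a, rate_b):.1f}% higher)")
```

Both rule sets share the same weights and person-years, so any difference in the ASIRs comes entirely from the case counts each rule registers.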

Loss of stomach, loss of appetite? Sequencing of the ballan wrasse (Labrus bergylta) genome and intestinal transcriptomic profiling illuminate the evolution of loss of stomach function in fish.

PubMed

Lie, Kai K; Tørresen, Ole K; Solbakken, Monica Hongrø; Rønnestad, Ivar; Tooming-Klunderud, Ave; Nederbragt, Alexander J; Jentoft, Sissel; Sæle, Øystein

2018-03-06

The ballan wrasse (Labrus bergylta) belongs to a large teleost family of more than 600 species showing several unique evolutionary traits, such as lack of a stomach and hermaphroditism. Agastric fish are found throughout the teleost phylogeny, in quite diverse and unrelated lineages, indicating that stomach loss has occurred independently multiple times in the course of evolution. By assembling the ballan wrasse genome and transcriptome, we aimed to determine the genetic basis for its digestive system function and appetite regulation. Among other things, this knowledge will aid the formulation of aquaculture diets that meet the nutritional needs of agastric species. Long and short read sequencing technologies were combined to generate a ballan wrasse genome of 805 Mbp. Analysis of the genome and transcriptome assemblies confirmed the absence of genes that code for proteins involved in gastric function. The gene coding for the appetite-stimulating protein ghrelin was also absent in wrasse. Gene synteny mapping identified several appetite-controlling genes and their paralogs previously undescribed in fish. Transcriptome profiling along the length of the intestine found an expression gradient declining from the anterior to the posterior, and a distinct expression profile in the hind gut. We showed that all known genes related to stomach function have been lost in the ballan wrasse, while the remaining functions of the digestive tract appear intact. The results also show that appetite control in the ballan wrasse has undergone substantial changes. The loss of ghrelin suggests that other genes, such as motilin, may play a ghrelin-like role. The wrasse genome offers novel insight into the evolutionary traits of this large family. As the stomach plays a major role in protein digestion, the lack of genes related to stomach digestion in wrasse suggests it requires formulated diets with higher levels of readily digestible protein than those for gastric species.
The two single nucleotide polymorphisms in the H37/RBM5 tumour suppressor gene at 3p21.3 correlated with different subtypes of non-small cell lung cancers

PubMed Central

Oh, Juliana J.; Koegel, Ashley; Phan, Diana T.; Razfar, Ali; Slamon, Dennis J.

2007-01-01

Allele loss and genetic alteration in chromosome 3p, particularly in the 3p21.3 region, are the most frequent and earliest genomic abnormalities found in lung cancer. Multiple 3p21.3 genes exhibit various degrees of tumour suppression activity, suggesting that 3p21.3 genes may function as an integrated tumour suppressor region through their diverse biological activities. We have previously demonstrated the growth inhibitory effects and tumour suppression mechanism of the H37/RBM5 gene, which is one of the 19 genes residing in the 370 kb minimal overlap region at 3p21.3. In the current study, to find any mutations in the H37 coding region in lung cancer cells, we compared the nucleotide sequences of the entire H37 gene in tumour vs. adjacent normal tissues from 17 non-small cell lung cancer (NSCLC) patients. No mutations were detected; instead, we found two silent single nucleotide polymorphisms (SNPs), C1138T and C2185T, within the coding region of the H37 gene. In addition, we found that specific allele types at these SNP positions are correlated with different histological subtypes of NSCLC: tumours containing heterozygous alleles (C+T) at these SNP positions are more likely to be associated with adenocarcinoma (AC), whereas homozygous alleles (either C or T) are associated with squamous cell carcinoma (SCC) (p=0.0098). We postulate that these two silent polymorphisms may be in linkage disequilibrium (LD) with a disease-causative allele in the 3p21.3 tumour suppressor region, which is packed with a large number of important genes affecting lung cancer development. In addition, because loss of heterozygosity (LOH) at 3p21.3 is prevalent and precedes lung cancer initiation, these SNPs may be developed into markers for screening high-risk individuals. PMID:17606309
Identification, Classification and Differential Expression of Oleosin Genes in Tung Tree (Vernicia fordii)

PubMed Central

Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M.

2014-01-01

Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the “proline knot” motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1–3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants. PMID:24516650
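The "proline knot" motif above is written in shorthand as PX5SPX3P, where X stands for any amino acid. As a minimal sketch, assuming plain one-letter protein sequences, the motif can be located with a regular expression; the function name `find_proline_knots` and the toy sequence are hypothetical, and the abstract does not say how the authors scanned their alignments.

```python
import re

# Proline knot motif PX5SPX3P: P, any 5 residues, S, P, any 3 residues, P.
# A lookahead is used so that overlapping occurrences are all reported.
PROLINE_KNOT = re.compile(r"(?=(P.{5}SP.{3}P))")


def find_proline_knots(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each PX5SPX3P match in a one-letter protein sequence."""
    return [(m.start(), m.group(1)) for m in PROLINE_KNOT.finditer(protein.upper())]


# Hypothetical toy sequence, not a real oleosin.
print(find_proline_knots("MAPAAAAASPGGGPLL"))
```

The zero-width lookahead keeps the scan from skipping a knot that starts inside a previous match, which a plain `P.{5}SP.{3}P` pattern with `finditer` would miss.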
Identification, classification and differential expression of oleosin genes in tung tree (Vernicia fordii).

PubMed

Cao, Heping; Zhang, Lin; Tan, Xiaofeng; Long, Hongxu; Shockey, Jay M

2014-01-01

Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii), whose seeds are rich in novel TAG with a wide range of industrial applications. The objectives of this study were to identify OLE genes, classify OLE proteins and analyze OLE gene expression in tung trees. We identified five tung tree OLE genes coding for small hydrophobic proteins. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that the five tung OLE genes represented the five OLE subfamilies and all contained the "proline knot" motif (PX5SPX3P) shared among 65 OLE from 19 tree species, including the sequenced genomes of Prunus persica (peach), Populus trichocarpa (poplar), Ricinus communis (castor bean), Theobroma cacao (cacao) and Vitis vinifera (grapevine). Tung OLE1, OLE2 and OLE3 belong to the S type and OLE4 and OLE5 belong to the SM type of Arabidopsis OLE. TaqMan and SYBR Green qPCR methods were used to study the differential expression of OLE genes in tung tree tissues. Expression results demonstrated that 1) All five OLE genes were expressed in developing tung seeds, leaves and flowers; 2) OLE mRNA levels were much higher in seeds than leaves or flowers; 3) OLE1, OLE2 and OLE3 genes were expressed in tung seeds at much higher levels than OLE4 and OLE5 genes; 4) OLE mRNA levels rapidly increased during seed development; and 5) OLE gene expression was well-coordinated with tung oil accumulation in the seeds. These results suggest that tung OLE genes 1-3 probably play major roles in tung oil accumulation and/or oil body development. Therefore, they might be preferred targets for tung oil engineering in transgenic plants.
Methylation of miRNA genes and oncogenesis.

PubMed

Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A

2015-02-01

Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes and, conversely, a single structural gene is usually targeted by several different miRNAs. Epigenetic gene inactivation associated with methylation of promoter CpG islands is common to both protein-coding genes and miRNA genes. Here, data on the functions of miRNAs in the development of the tumor-cell phenotype are reviewed. The genomic organization of promoter CpG islands of miRNA genes located in intergenic and intragenic regions is discussed. The literature and our own results on the frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data linking this modification to altered activity of miRNA genes and, consequently, of their protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenic processes and the affected signaling pathways is discussed.
Combined actions of multiple hairpin loop structures and sites of rate-limiting endonucleolytic cleavage determine differential degradation rates of individual segments within polycistronic puf operon mRNA.

PubMed Central

Klug, G; Cohen, S N

1990-01-01

Differential expression of the genes within the puf operon of Rhodobacter capsulatus is accomplished in part by differences in the rate of degradation of different segments of the puf transcript. We report here that decay of puf mRNA sequences specifying the light-harvesting I (LHI) and reaction center (RC) photosynthetic membrane peptides is initiated endoribonucleolytically within a discrete 1.4-kilobase segment of the RC-coding region. Deletion of this segment increased the half-life of the RC-coding region from 8 to 20 min while not affecting decay of LHI-coding sequences upstream from an intercistronic hairpin loop structure shown previously to impede 3'-to-5' degradation. Prolongation of RC segment half-life was dependent on the presence of other hairpin structures 3' to the RC region. Inserting the endonuclease-sensitive sites into the LHI-coding segment markedly accelerated its degradation. Our results suggest that differential degradation of the RC- and LHI-coding segments of puf mRNA is accomplished at least in part by the combined actions of RC region-specific endonuclease(s), one or more exonucleases, and several strategically located exonuclease-impeding hairpins. PMID:2394682
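The abstract reports half-lives rather than decay rates. Assuming simple first-order (exponential) decay, an assumption the abstract does not state, the change in RC-coding region half-life from 8 to 20 min can be translated into rate constants and remaining fractions. The sketch below is illustrative only, and the function names are hypothetical.

```python
import math


def decay_constant(half_life_min: float) -> float:
    """First-order rate constant k = ln(2) / t_half, in 1/min."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of transcript left after t_min under first-order decay: 2^(-t / t_half)."""
    return 0.5 ** (t_min / half_life_min)


# Half-lives of the RC-coding region reported in the abstract.
for label, t_half in [("intact RC region", 8.0), ("1.4-kb segment deleted", 20.0)]:
    print(f"{label}: k = {decay_constant(t_half):.3f} per min, "
          f"fraction left after 20 min = {fraction_remaining(20.0, t_half):.2f}")
```

Under this assumption, roughly 18% of the intact RC segment would remain after 20 min, versus 50% once the endonuclease-sensitive segment is deleted.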
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation

PubMed Central

Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D

2018-01-01

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

PubMed Central

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-01-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which, by eliminating intron sequences, identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed the full biological repertoire of genes and the transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
An unusual internal ribosomal entry site of inverted symmetry directs expression of a potato leafroll polerovirus replication-associated protein

PubMed Central

Jaag, Hannah Miriam; Kawchuk, Lawrence; Rohde, Wolfgang; Fischer, Rainer; Emans, Neil; Prüfer, Dirk

2003-01-01

Potato leafroll polerovirus (PLRV) genomic RNA acts as a polycistronic mRNA for the production of proteins P0, P1, and P2 translated from the 5′-proximal half of the genome. Within the P1 coding region we identified a 5-kDa replication-associated protein 1 (Rap1) essential for viral multiplication. An internal ribosome entry site (IRES) with unusual structure and location was identified that regulates Rap1 translation. Core structural elements for internal ribosome entry include a conserved AUG codon and a downstream GGAGAGAGAGG motif with inverted symmetry. Reporter gene expression in potato protoplasts confirmed the internal ribosome entry function. Unlike known IRES motifs, the PLRV IRES is located completely within the coding region of Rap1 at the center of the PLRV genome. PMID:12835413
Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae).

PubMed

Dubey, Bhawna; Meganathan, P R; Haque, Ikramul

2012-07-01

This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17,258 bp, comprising 37 genes (13 protein-coding genes, 22 tRNA genes and 2 ribosomal RNA genes) along with duplicate control regions, is described herein. The P. molurus molurus mt genome is relatively similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and AT/GC base skews. The nucleotide composition of the genome shows that the positive strand contains more A and C than T and G, as revealed by positive AT and CG skews. Comparison of individual protein-coding genes with those of other snake genomes suggests that the ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference for NNC codons over NNG codons in the mt genome of P. molurus. Also, the ratio of non-synonymous to synonymous substitution rates (Ka/Ks) suggests that most of the protein-coding genes are under purifying selection. Phylogenetic analyses of the concatenated 13 protein-coding genes of P. molurus molurus were consistent with the previously established snake phylogeny.
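Strand skews are conventionally computed per strand as AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The abstract's "CG skew" is assumed here to be the mirror form (C - G)/(C + G), which is positive when C outnumbers G and so matches the statement that the positive strand is richer in A and C. A minimal sketch under that assumption, not the authors' code; the function names and toy strand are hypothetical.

```python
from collections import Counter


def _skew(x: int, y: int) -> float:
    """(x - y) / (x + y), or 0.0 when neither base is present."""
    return (x - y) / (x + y) if x + y else 0.0


def strand_skews(seq: str) -> dict[str, float]:
    """Base-composition skews of one strand of a DNA sequence."""
    counts = Counter(seq.upper())
    a, t, c, g = (counts[base] for base in "ATCG")
    return {
        "AT_skew": _skew(a, t),  # > 0 when A outnumbers T
        "CG_skew": _skew(c, g),  # > 0 when C outnumbers G (assumed definition)
        "GC_skew": _skew(g, c),  # conventional GC skew, the negative of CG_skew
    }


# Hypothetical toy strand: A=3, T=1, C=2, G=1.
print(strand_skews("AAACCGT"))
```

Reporting both CG_skew and GC_skew makes the sign convention explicit, since published mitogenome papers use either form.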
Analysis of human ES cell differentiation establishes that the dominant isoforms of the lncRNAs RMST and FIRRE are circular.

PubMed

Izuogu, Osagie G; Alhasan, Abd A; Mellough, Carla; Collin, Joseph; Gallon, Richard; Hyslop, Jonathon; Mastrorosa, Francesco K; Ehrmann, Ingrid; Lako, Majlinda; Elliott, David J; Santibanez-Koref, Mauro; Jackson, Michael S

2018-04-20

Circular RNAs (circRNAs) are predominantly derived from protein-coding genes, and some can act as microRNA sponges or transcriptional regulators. Changes in circRNA levels have been identified during human development which may be functionally important, but lineage-specific analyses are currently lacking. To address this, we performed RNAseq analysis of human embryonic stem (ES) cells differentiated for 90 days towards 3D laminated retina. A transcriptome-wide increase in circRNA expression, size and exon count was observed, with circRNA levels reaching a plateau by day 45. Parallel statistical analyses, controlling for sample- and locus-specific effects, identified 239 circRNAs with expression changes distinct from the transcriptome-wide pattern, but all of these also increased in abundance over time. Surprisingly, circRNAs derived from long non-coding RNAs (lncRNAs) were found to account for a significantly larger proportion of transcripts from their loci of origin than circRNAs from coding genes. The most abundant, circRMST:E12-E6, showed a >100-fold increase during differentiation accompanied by an isoform switch, and accounts for >99% of RMST transcripts in many adult tissues. The second most abundant, circFIRRE:E10-E5, accounts for >98% of FIRRE transcripts in differentiating human ES cells, and is one of 39 FIRRE circRNAs, many of which include multiple unannotated exons. Our results suggest that during human ES cell differentiation, changes in circRNA levels are primarily globally controlled. They also suggest that RMST and FIRRE, genes with established roles in neurogenesis and topological organisation of chromosomal domains respectively, are processed as circular lncRNAs with only minor linear species.
The analysis of APOL1 genetic variation and haplotype diversity provided by 1000 Genomes project.

PubMed

Peng, Ting; Wang, Li; Li, Guisen

2017-08-11

APOL1 gene variants have been shown to be associated with an increased risk of several diseases, particularly in African Americans but not in Caucasians or Asians. In this study, we explored the single nucleotide polymorphism (SNP) and haplotype diversity of the APOL1 gene across populations using data from the 1000 Genomes Project. APOL1 variants were obtained from the 1000 Genomes Project, and SNPs located in the regulatory or coding regions were selected for genetic variation analysis. A total of 2,504 individuals from 26 populations were classified into four groups: African, European, Asian and Admixed populations. Tag SNPs were selected to evaluate haplotype diversity in the four populations with the HaploStats software. The APOL1 gene is surrounded by some of the most polymorphic genes in the human genome, and variation within APOL1 was common: the 1000 Genomes Project reports up to 613 SNPs, 99 of which (16.2%) have a minor allele frequency (MAF) ≥ 1%. There were 79 SNPs in the URR and 92 SNPs in the 3'UTR; in total, 12 SNPs in the URR and 24 SNPs in the 3'UTR were considered common variants (MAF ≥ 1%). Notably, haplotype URR-1 showed lower frequencies in European populations, whereas the other three haplotypes showed the opposite pattern. The 3'UTR contains several high-frequency variant sites within a short segment, and its haplotype frequencies differed significantly among populations (P < 0.01): UTR-1 and UTR-5 were much more frequent in the African population, whereas UTR-2, UTR-3 and UTR-4 were much less frequent. In the coding region, the two relatively high-frequency G1 SNPs lowered the frequency of haplotype H-1 when all populations were pooled, and these two G1 variants widened the differences among the four populations (P1 = 3.33E-4 vs. P2 = 3.61E-30). The distributions of APOL1 variants and haplotypes differed significantly among populations in both the regulatory and coding regions. These findings could provide clues for future genetic studies of APOL1-related diseases.
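The abstract calls a SNP common when its minor allele frequency (MAF) is at least 1%. A minimal sketch of that filter follows, assuming biallelic SNPs already summarized as reference/alternate allele counts; the abstract does not describe how the 1000 Genomes data were processed, and the SNP IDs, counts and function names here are hypothetical. For an autosomal locus such as APOL1, 2,504 diploid individuals contribute 5,008 alleles.

```python
def minor_allele_frequency(ref_count: int, alt_count: int) -> float:
    """MAF of a biallelic SNP: the rarer allele's count divided by all called alleles."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no called alleles")
    return min(ref_count, alt_count) / total


def common_snps(allele_counts: dict[str, tuple[int, int]], threshold: float = 0.01) -> list[str]:
    """IDs of SNPs whose MAF meets the 'common variant' cutoff (MAF >= 1% by default)."""
    return [snp for snp, (ref, alt) in allele_counts.items()
            if minor_allele_frequency(ref, alt) >= threshold]


# Hypothetical counts out of 5,008 alleles (2,504 diploid individuals).
counts = {"snp_a": (4900, 108), "snp_b": (5000, 8)}
print(common_snps(counts))
```

Taking the smaller of the two counts makes the MAF independent of which allele happens to be labeled as the reference.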
Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

PubMed Central

Kikhno, Irina

2014-01-01

Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153
A Review on Spectral Amplitude Coding Optical Code Division Multiple Access

NASA Astrophysics Data System (ADS)

Kaur, Navpreet; Goyal, Rakesh; Rani, Monika

2017-06-01

This manuscript analyzes the Spectral Amplitude Coding Optical Code Division Multiple Access (SACOCDMA) system. The major noise source in optical CDMA is co-channel interference from other users, known as multiple access interference (MAI). System performance in terms of bit error rate (BER) degrades as MAI increases. The number of users and the type of codes used in the optical system directly determine system performance. MAI can be restricted by designing efficient optical codes and implementing them with a unique architecture that accommodates more users. Hence, a technique such as spectral direct detection (SDD) with a modified double weight code is needed, as it can provide better cardinality and good correlation properties.
Comparison of PSF maxima and minima of multiple annuli coded aperture (MACA) and complementary multiple annuli coded aperture (CMACA) systems

NASA Astrophysics Data System (ADS)

Ratnam, Challa; Lakshmana Rao, Vadlamudi; Lachaa Goud, Sivagouni

2006-10-01

In the present paper, and a series of papers to follow, the Fourier analytical properties of multiple annuli coded aperture (MACA) and complementary multiple annuli coded aperture (CMACA) systems are investigated. First, the transmission function for MACA and CMACA is derived using Fourier methods and, based on Fresnel-Kirchhoff diffraction theory, formulae for the point spread function (PSF) are derived. The PSF maxima and minima are calculated for both the MACA and CMACA systems. The dependence of these properties on the number of zones is studied and reported in this paper.
Carney Complex: an update

PubMed Central

Correa, Ricardo; Salpea, Paraskevi; Stratakis, Constantine

2015-01-01

Carney Complex (CNC) is a rare autosomal dominant syndrome characterized by pigmented lesions of the skin and mucosa, cardiac, cutaneous and other myxomas, and multiple endocrine tumors. The disease is caused by inactivating mutations or large deletions of the PRKAR1A gene, located at 17q22–24, which codes for the type I alpha regulatory subunit of protein kinase A (PKA). Most recently, components of the complex have been associated with defects of other PKA subunits, such as the catalytic subunits PRKACA (adrenal hyperplasia) and PRKACB (pigmented spots, myxomas, pituitary adenomas). In this report, we review CNC, its clinical features, diagnosis, treatment and molecular etiology, including PRKAR1A mutations and the latest findings on PRKACA and PRKACB defects, especially as they pertain to adrenal tumors and Cushing’s syndrome. PMID:26130139
Crisp proteins and sperm chemotaxis: discovery in amphibians and explorations in mammals.

PubMed

Burnett, Lindsey A; Xiang, Xueyu; Bieber, Allan L; Chandler, Douglas E

2008-01-01

Crisp proteins appear to play multiple roles in the life history of sperm. One of these roles is to act as a sperm chemoattractant. Allurin, a 21 kDa Crisp protein rapidly released from the egg jelly of at least two frogs, X. laevis and X. tropicalis, elicits directed motility in both homospecific and heterospecific sperm. In X. tropicalis, allurin is coded for by the newly documented Crisp A gene. Recently, the observation that allurin can also elicit chemotaxis in mouse sperm raises the question of whether allurin-like proteins might act as sperm chemoattractants in mammals. Although an allurin gene has yet to be documented in mammals, Crisp proteins truncated post-translationally appear to exist in both the male and female reproductive tract of mammals.
In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome

PubMed Central

Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T.; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

2003-01-01

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/). PMID:12634390
Epigenome Aberrations: Emerging Driving Factors of the Clear Cell Renal Cell Carcinoma

PubMed Central

Mehdi, Ali; Riazalhosseini, Yasser

2017-01-01

Clear cell renal cell carcinoma (ccRCC), the most common form of kidney cancer, is characterized by frequent mutations of the von Hippel-Lindau (VHL) tumor suppressor gene in ~85% of sporadic cases. Loss of pVHL function affects multiple cellular processes, among which activation of the hypoxia inducible factor (HIF) pathway is the best known. Constitutive activation of HIF signaling in turn activates hundreds of genes involved in numerous oncogenic pathways, which contribute to the development or progression of ccRCC. Although VHL mutations are considered drivers of ccRCC, they are not sufficient to cause the disease. Recent genome-wide sequencing studies of ccRCC have revealed that mutations of genes coding for epigenome modifiers and chromatin remodelers, including PBRM1, SETD2 and BAP1, are the most common somatic genetic abnormalities after VHL mutations in these tumors. Moreover, recent research has shed light on the extent of abnormal epigenome alterations in ccRCC tumors, including aberrant DNA methylation patterns, abnormal histone modifications and deregulated expression of non-coding RNAs. In this review, we discuss the epigenetic modifiers that are commonly mutated in ccRCC and our growing knowledge of the cellular processes they affect. Furthermore, we explore new avenues for developing therapeutic approaches based on our knowledge of epigenome aberrations in ccRCC. PMID:28812986
Epigenome Aberrations: Emerging Driving Factors of the Clear Cell Renal Cell Carcinoma.

PubMed

Mehdi, Ali; Riazalhosseini, Yasser

2017-08-16

Clear cell renal cell carcinoma (ccRCC), the most common form of kidney cancer, is characterized by frequent mutations of the von Hippel-Lindau (VHL) tumor suppressor gene in ~85% of sporadic cases. Loss of pVHL function affects multiple cellular processes, among which activation of the hypoxia inducible factor (HIF) pathway is the best known. Constitutive activation of HIF signaling in turn activates hundreds of genes involved in numerous oncogenic pathways, which contribute to the development or progression of ccRCC. Although VHL mutations are considered drivers of ccRCC, they are not sufficient to cause the disease. Recent genome-wide sequencing studies of ccRCC have revealed that mutations of genes coding for epigenome modifiers and chromatin remodelers, including PBRM1, SETD2 and BAP1, are the most common somatic genetic abnormalities after VHL mutations in these tumors. Moreover, recent research has shed light on the extent of abnormal epigenome alterations in ccRCC tumors, including aberrant DNA methylation patterns, abnormal histone modifications and deregulated expression of non-coding RNAs. In this review, we discuss the epigenetic modifiers that are commonly mutated in ccRCC and our growing knowledge of the cellular processes they affect. Furthermore, we explore new avenues for developing therapeutic approaches based on our knowledge of epigenome aberrations in ccRCC.

[Frequency of intron 1 inversion of factor VIII gene in Chinese hemophilia A patients with case report of a female patient with heterozygous intron 1 inversion].

PubMed

Yan, Zhen-yu; Liang, Yan; Yan, Mei; Fan, Lian-kai; Xiao, Bai; Hua, Bao-lai; Liu, Jing-zhong; Zhao, Yong-qiang

2008-10-21

This study aimed to investigate the frequency of intron 1 inversion (inv1) of the FVIII gene in Chinese hemophilia A (HA) patients and its mechanism of pathogenesis. Peripheral blood samples were collected from 158 unrelated HA patients aged 20 (range 1-73) years, including one 5-year-old female HA patient, and from several family members of an inv1-positive patient. A one-stage method was used to assay FVIII activity (FVIII:C). Long-distance PCR and multiplex PCR in duplex reactions were used to screen for the intron 22 inversion (inv22) and inv1 of the FVIII coding gene (F8). The F8 coding sequence was amplified by PCR and sequenced with an automatic sequencer. Two unrelated patients (pedigrees) were inv1-positive, a positive rate of 1.26%. A rare female HA patient with inv1 was also discovered in a positive family (3 HA cases were found in this family and counted as one case in calculating the total detection rate). The full length of FVIII was sequenced, and no other mutation was detected. The frequency of FVIII inv1 is low in Chinese HA patients compared with other populations. The female HA patient is heterozygous for FVIII inv1, and her disease may result from nonrandom inactivation of the X chromosome.
In silico pattern-based analysis of the human cytomegalovirus genome.

PubMed

Rigoutsos, Isidore; Novotny, Jiri; Huynh, Tien; Chin-Bow, Stephen T; Parida, Laxmi; Platt, Daniel; Coleman, David; Shenk, Thomas

2003-04-01

More than 200 open reading frames (ORFs) from the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. With the help of an objective annotation method that is based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins, we have reannotated all of the previously reported putative genes of the human cytomegalovirus. Also, with the help of MUSCA, a pattern-based multiple sequence alignment algorithm, we have reexamined the original human cytomegalovirus gene family definitions. Our analysis of the genome shows that many of the coded proteins comprise amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the human cytomegalovirus genome. All of the results of the present study can be found and interactively explored online (http://cbcsrv.watson.ibm.com/virus/).
Characterization of Dihydrofolate Reductase Genes from Trimethoprim-Susceptible and Trimethoprim-Resistant Strains of Enterococcus faecalis

PubMed Central

Coque, Teresa M.; Singh, Kavindra V.; Weinstock, George M.; Murray, Barbara E.

1999-01-01

Enterococci are usually susceptible in vitro to trimethoprim; however, high-level resistance (HLR) (MICs, >1,024 μg/ml) has been reported. We studied Enterococcus faecalis DEL, for which the trimethoprim MIC was >1,024 μg/ml. No transfer of resistance was achieved by broth or filter matings. Two different genes that conferred trimethoprim resistance when they were cloned in Escherichia coli (MICs, 128 and >1,024 μg/ml) were studied. One gene that coded for a polypeptide of 165 amino acids (MIC, 128 μg/ml for E. coli) was identical to dfr homologs that we cloned from a trimethoprim-susceptible E. faecalis strain, and it is presumed to be the intrinsic E. faecalis dfr gene (which causes resistance in E. coli when cloned in multiple copies); this gene was designated dfrE. The nucleotide sequence 5′ to this dfr gene showed similarity to thymidylate synthetase genes, suggesting that the dfr and thy genes from E. faecalis are located in tandem. The E. faecalis gene that conferred HLR to trimethoprim in E. coli, designated dfrF, codes for a predicted polypeptide of 165 amino acids with 38 to 64% similarity with other dihydrofolate reductases from gram-positive and gram-negative organisms. The nucleotide sequence 5′ to dfrF did not show similarity to the thy sequences. A DNA probe for dfrF hybridized under high-stringency conditions only to colony lysates of enterococci for which the trimethoprim MIC was >1,024 μg/ml; there was no hybridization to plasmid DNA from the strain of origin. To confirm that this gene causes trimethoprim resistance in enterococci, we cloned it into the integrative vector pAT113 and electroporated it into RH110 (E. faecalis OG1RF::Tn916ΔEm) (trimethoprim MIC, 0.5 μg/ml), which resulted in RH110 derivatives for which the trimethoprim MIC was >1,024 μg/ml. These results indicate that dfrF is an acquired but probably chromosomally located gene which is responsible for in vitro HLR to trimethoprim in E. faecalis. PMID:9869579
Complete genome sequence of lymphocystis disease virus isolated from China.

PubMed

Zhang, Qi-Ya; Xiao, Feng; Xie, Jian; Li, Zheng-Qiu; Gui, Jian-Fang

2004-07-01

Lymphocystis diseases in fish throughout the world have been extensively described. Here we report the complete genome sequence of lymphocystis disease virus isolated in China (LCDV-C), an LCDV isolated from cultured flounder (Paralichthys olivaceus) with lymphocystis disease in China. The LCDV-C genome is 186,250 bp, with a base composition of 27.25% G+C. Computer-assisted analysis revealed 240 potential open reading frames (ORFs) and 176 nonoverlapping putative viral genes, which encode polypeptides ranging from 40 to 1,193 amino acids. The coding density is 67%, and the average ORF length is 702 bp. A search of the GenBank database using the 176 individual putative genes revealed 103 homologues of the corresponding ORFs of LCDV-1 and 73 potential genes that were not found in LCDV-1 or other iridoviruses. Among the 73 genes, 8 contain conserved domains of cellular genes and 65 are novel genes that show no significant homology with sequences in public databases. Although some similarity between putative gene products of LCDV-C and the corresponding proteins of LCDV-1 was revealed, no colinearity was detected when their ORF arrangements and coding strategies were compared, suggesting that extensive genetic rearrangement has occurred between them. In addition, a large number of tandem and overlapping repeated sequences were observed in the LCDV-C genome. Among the LCDV-C gene products, the deduced amino acid sequence of the major capsid protein (MCP) shows the highest identity to those of LCDV-1 and other iridoviruses. Furthermore, a phylogenetic tree was constructed based on the multiple alignment of nine MCP amino acid sequences. Interestingly, LCDV-C and LCDV-1 clustered together, but their amino acid identity is much lower than that within other clusters.
The unexpected levels of divergence between their genomes in size, gene organization, and gene product identity suggest that LCDV-C and LCDV-1 should not belong to the same species and that LCDV-C should be considered a species distinct from LCDV-1.
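The abstract does not say which program produced the 240 potential ORFs. As a rough illustration of how such candidates are enumerated, here is a minimal six-frame scan. It assumes ATG starts, the standard stop codons TAA/TAG/TGA, and a 40-codon minimum (mirroring the shortest reported polypeptide). It is not the authors' pipeline, and `find_orfs` is a hypothetical helper.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=40):
    """Return (strand, frame, start, end) for ATG-to-stop ORFs of at least min_aa codons.

    Coordinates are half-open and refer to the strand-specific sequence
    (the reverse complement for '-'). Within a frame, the first ATG after a
    stop opens the ORF, so nested internal ATGs are not reported separately.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_aa:  # length in amino acids, stop excluded
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

Real gene finders also weigh coding statistics, overlaps, and alternative start codons. A length cutoff alone over-predicts, which helps explain why 240 ORFs were reduced to 176 nonoverlapping putative genes.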
Complete Genome Sequence of Lymphocystis Disease Virus Isolated from China

PubMed Central

Zhang, Qi-Ya; Xiao, Feng; Xie, Jian; Li, Zheng-Qiu; Gui, Jian-Fang

2004-01-01

Lymphocystis diseases in fish throughout the world have been extensively described. Here we report the complete genome sequence of lymphocystis disease virus isolated in China (LCDV-C), an LCDV isolated from cultured flounder (Paralichthys olivaceus) with lymphocystis disease in China. The LCDV-C genome is 186,250 bp, with a base composition of 27.25% G+C. Computer-assisted analysis revealed 240 potential open reading frames (ORFs) and 176 nonoverlapping putative viral genes, which encode polypeptides ranging from 40 to 1,193 amino acids. The coding density is 67%, and the average ORF length is 702 bp. A search of the GenBank database using the 176 individual putative genes revealed 103 homologues of the corresponding ORFs of LCDV-1 and 73 potential genes that were not found in LCDV-1 or other iridoviruses. Among the 73 genes, 8 contain conserved domains of cellular genes and 65 are novel genes that show no significant homology with sequences in public databases. Although some similarity between putative gene products of LCDV-C and the corresponding proteins of LCDV-1 was revealed, no colinearity was detected when their ORF arrangements and coding strategies were compared, suggesting that extensive genetic rearrangement has occurred between them. In addition, a large number of tandem and overlapping repeated sequences were observed in the LCDV-C genome. Among the LCDV-C gene products, the deduced amino acid sequence of the major capsid protein (MCP) shows the highest identity to those of LCDV-1 and other iridoviruses. Furthermore, a phylogenetic tree was constructed based on the multiple alignment of nine MCP amino acid sequences. Interestingly, LCDV-C and LCDV-1 clustered together, but their amino acid identity is much lower than that within other clusters.
The unexpected levels of divergence between their genomes in size, gene organization, and gene product identity suggest that LCDV-C and LCDV-1 should not belong to the same species and that LCDV-C should be considered a species distinct from LCDV-1. PMID:15194775
Polymerization of non-complementary RNA: systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes.

PubMed

Seligmann, Hervé

2013-03-01

Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs exactly match human mitochondrial sequences (exchange rules listed in order of decreasing transcript frequency): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of the rates of rare replicational enzymatic DNA nucleotide misinsertions predict the frequencies of RNA transcripts that systematically exchange the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, and self-hybridization increases with length, suggesting endoribonuclease-limited elongation. BLAST detects stop-codon-depleted putative protein-coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly of metazoan origin; prokaryotic and viral origins are underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of the expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression requires the combined activities of suppressor tRNAs matching stops and of nucleotide exchange transcription. Two independent properties confirm the predicted exchanged overlap coding genes: the discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases the coding densities of protein-coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
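To make the "systematic symmetric nucleotide exchange" rules concrete, the following minimal sketch applies one or two simultaneous swaps (for example A↔U+C↔G) to a transcribed sequence. It then lists which of the rules named in the abstract would turn a DNA sequence into an observed RNA. Usual transcription is modeled only as T→U. The helper names (`transcribe`, `exchange`, `matching_rules`) are hypothetical, and the sketch checks exact whole-sequence matches only, not the alignment searches used in the study.

```python
# Exchange rules named in the abstract; "+" joins two simultaneous swaps.
RULES = {
    "C<->U": ("CU",), "A<->U": ("AU",), "A<->U+C<->G": ("AU", "CG"),
    "G<->U": ("GU",), "A<->G": ("AG",), "C<->G": ("CG",), "A<->C": ("AC",),
    "A<->G+C<->U": ("AG", "CU"), "A<->C+G<->U": ("AC", "GU"),
}


def transcribe(dna):
    """Usual DNA -> RNA transcription, modeled here as T -> U."""
    return dna.upper().replace("T", "U")


def exchange(rna, pairs):
    """Apply symmetric swaps simultaneously, e.g. ("AU", "CG") swaps A<->U and C<->G."""
    src = "".join(pairs)
    dst = "".join(p[::-1] for p in pairs)
    if len(set(src)) != len(src):
        raise ValueError("a nucleotide may appear in only one exchanged pair")
    return rna.translate(str.maketrans(src, dst))


def matching_rules(dna, rna):
    """Names of exchange rules under which the transcribed DNA equals the observed RNA."""
    base = transcribe(dna)
    return [name for name, pairs in RULES.items() if exchange(base, pairs) == rna.upper()]
```

For example, `matching_rules("ATGC", "UACG")` returns `["A<->U+C<->G"]`, because the double exchange maps AUGC to UACG.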
Comparative analysis of human protein-coding and noncoding RNAs between brain and 10 mixed cell lines by RNA-Seq.

PubMed

Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu

2011-01-01

During expression, different genes can generate diverse functional products, including various protein-coding and noncoding RNAs. Using two transcriptome sequencing datasets from human brain tissue and 10 mixed cell lines, we investigated the protein-coding capacity of known human genes and the expression levels of their isoforms, as well as the conservation and disease association of long noncoding RNAs (ncRNAs). Comparative analysis revealed that about two-thirds of the expressed genes are shared between brain and cell lines, but fewer than one-third of their isoforms are identical. Apart from the genes expressed specifically in brain or in cell lines, about 66% of the genes expressed in both encoded different isoforms. Moreover, most genes predominantly expressed a single isoform, and some genes generated only protein-coding (or only noncoding) RNAs in one sample but not in the other. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals, or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of these were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were significantly correlated with certain breast cancer or lung cancer related genes, indicating an important biological link between long ncRNAs and human cancers. Our findings reveal that differences in gene expression profiles between samples result mainly from differences in the expressed isoforms, and they highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.
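The central contrast here is gene-level versus isoform-level agreement between two samples, plus genes whose isoforms include both coding and noncoding RNAs. The sketch below shows one way to compute both. The abstract does not define how "the same" was measured, so Jaccard overlap is an assumption, and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Shared fraction of the union of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_samples(iso_a, iso_b):
    """iso_a, iso_b: sets of (gene_id, isoform_id) pairs expressed in each sample.

    Returns (gene-level overlap, isoform-level overlap).
    """
    genes_a = {g for g, _ in iso_a}
    genes_b = {g for g, _ in iso_b}
    return jaccard(genes_a, genes_b), jaccard(iso_a, iso_b)


def dual_output_genes(isoform_types):
    """isoform_types: iterable of (gene_id, isoform_id, is_coding) triples.

    Returns genes that produce both protein-coding and noncoding isoforms.
    """
    kinds = defaultdict(set)
    for gene, _, coding in isoform_types:
        kinds[gene].add(coding)
    return sorted(g for g, k in kinds.items() if k == {True, False})
```

With toy data in which two of four genes but only one of five isoforms are shared, `compare_samples` returns `(0.5, 0.2)`. Gene-level agreement can therefore be much higher than isoform-level agreement, which is the pattern the study reports.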
Effects of a petunia scaffold/matrix attachment region on copy number dependency and stability of transgene expression in Nicotiana tabacum.

PubMed

Dietz-Pfeilstetter, Antje; Arndt, Nicola; Manske, Ulrike

2016-04-01

Transgenes in genetically modified plants are often not reliably expressed during development or in subsequent generations. Transcriptional gene silencing (TGS) and post-transcriptional gene silencing (PTGS) have been shown to occur in transgenic plants depending on integration pattern, copy number and integration site. To reduce position effects, prevent read-through transcription and provide a more accessible chromatin structure, a P35S-β-glucuronidase (P35S-gus) transgene flanked by a scaffold/matrix attachment region from petunia (Petun-SAR) was introduced into Nicotiana tabacum plants by Agrobacterium tumefaciens-mediated transformation. Petun-SAR enhanced expression and conferred copy number dependency up to 2 gene copies, but did not prevent gene silencing in transformants with multiple and rearranged gene copies. However, in contrast to the non-SAR transformants, in which silencing was irreversible and continued during long-term vegetative propagation and in progeny plants, gus expression in Petun-SAR plants was re-established in the course of development. Gene silencing was not necessarily accompanied by DNA methylation, and the gus transgene could still be expressed despite considerable CG methylation within the coding region.
Spectrum of mutations in leiomyosarcomas identified by clinical targeted next-generation sequencing.

PubMed

Lee, Paul J; Yoo, Naomi S; Hagemann, Ian S; Pfeifer, John D; Cottrell, Catherine E; Abel, Haley J; Duncavage, Eric J

2017-02-01

Recurrent genomic mutations in uterine and non-uterine leiomyosarcomas have not been well established. Using a next generation sequencing (NGS) panel of common cancer-associated genes, 25 leiomyosarcomas arising from multiple sites were examined to explore genetic alterations, including single nucleotide variants (SNV), small insertions/deletions (indels), and copy number alterations (CNA). Sequencing showed 86 non-synonymous, coding region somatic variants within 151 gene targets in 21 cases, with a mean of 4.1 variants per case; 4 cases had no putative mutations in the panel of genes assayed. The most frequently altered genes were TP53 (36%), ATM and ATRX (16%), and EGFR and RB1 (12%). CNA were identified in 85% of cases, with the most frequent copy number losses observed in chromosomes 10 and 13 including PTEN and RB1; the most frequent gains were seen in chromosomes 7 and 17. Our data show that deletions in canonical cancer-related genes are common in leiomyosarcomas. Further, the spectrum of gene mutations observed shows that defects in DNA repair and chromosomal maintenance are central to the biology of leiomyosarcomas, and that activating mutations observed in other common cancer types are rare in leiomyosarcomas. Copyright © 2017 Elsevier Inc. All rights reserved.
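The reported summary statistics can be checked with a little arithmetic. The mean of 4.1 variants matches 86 variants divided over the 21 variant-positive cases, not over all 25. Each gene percentage corresponds to a whole number of the 25 cases. The per-gene case counts below are back-calculated from the percentages and are not stated in the abstract.

```python
COHORT = 25               # leiomyosarcomas sequenced
NO_VARIANT_CASES = 4      # cases with no putative mutation in the panel
CASES_WITH_VARIANTS = 21
VARIANTS = 86             # non-synonymous coding somatic variants

# Mean over variant-positive cases: 86 / 21 = 4.095..., reported as 4.1.
mean_variants = VARIANTS / CASES_WITH_VARIANTS

# Case counts back-calculated from the reported percentages (assumption).
altered_cases = {"TP53": 9, "ATM": 4, "ATRX": 4, "EGFR": 3, "RB1": 3}
frequencies = {gene: 100 * n / COHORT for gene, n in altered_cases.items()}
```

Dividing by the full cohort instead (86 / 25) would give about 3.4 variants per case. The reported 4.1 therefore refers to the 21 cases that carried variants.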
Amplification of the groESL operon in Pseudomonas putida increases siderophore gene promoter activity.

PubMed

Venturi, V; Wolfs, K; Leong, J; Weisbeek, P J

1994-10-17

Pseudobactin 358 is the yellow-green fluorescent siderophore [microbial iron(III) transport agent] produced by Pseudomonas putida WCS358 under iron-limiting conditions. The genes for pseudobactin 358 biosynthesis are iron-regulated at the level of transcription. This study reports the molecular characterization of a cosmid clone of WCS358 DNA that, in an iron-dependent manner, stimulates the activity of a WCS358 siderophore gene promoter in the heterologous Pseudomonas strain A225. Subcloning, transposon mutagenesis and DNA sequencing identified the functional region in the clone as the groESL operon of strain WCS358. This increase in promoter activity was not observed when the groESL genes of strain WCS358 were integrated via a transposon vector into the genome of Pseudomonas A225, indicating that multiple copies of the operon are necessary for the increase in siderophore gene promoter activity. Amplification of the Escherichia coli and WCS358 groESL genes also increased iron-regulated promoter activity in the parent strain WCS358. The groESL operon codes for the chaperone proteins GroES and GroEL, which mediate the folding and assembly of many proteins.
Morphometric Analysis of Recognized Genes for Autism Spectrum Disorders and Obesity in Relationship to the Distribution of Protein-Coding Genes on Human Chromosomes.

PubMed

McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G

2016-05-05

Mammalian chromosomes are composed of complex chromatin architecture. The specific assembly and configuration of each chromosome influence gene expression and function in as yet undefined ways through varying degrees of heterochromatinization, which produce Giemsa (G)-negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out, for the first time, morphometric measurements of high-resolution chromosome ideograms to characterize total euchromatic and heterochromatic chromosome band length and the distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatic and G-positive heterochromatic chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes, while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, whereas the distribution of obesity genes did not differ significantly from that of protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions in biological processes, including neurodevelopment and function, specifically genes associated with ASD.
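The abstract does not name the statistical test behind "disproportionately overrepresented". One simple way to frame such a claim is an exact one-sided binomial test. It asks whether the number of ASD genes in dark bands exceeds what the 40% background share of protein-coding genes would predict. The sketch below pools all four dark sub-types, which is a simplification. The observed count is left as a parameter because the abstract does not report it, and `binom_upper_tail` is a hypothetical helper, not the authors' method.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """Exact one-sided P(X >= k) for X ~ Binomial(n, p).

    Direct summation; adequate for n in the hundreds (terms stay within float range).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_ASD = 790        # recognized ASD genes
P_DARK = 0.40      # background share of protein-coding genes in dark bands

# Expected ASD genes in dark bands if they followed protein-coding genes.
expected_dark = N_ASD * 40 / 100   # 316.0
```

Given an observed dark-band count `k_obs`, `binom_upper_tail(k_obs, N_ASD, P_DARK)` yields the one-sided p-value. A finer analysis would test each dark sub-type separately.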
Gene and genon concept: coding versus regulation

PubMed Central

2007-01-01

We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
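The backward and forward analyses can be pictured with a toy data structure in which each mature mRNA is recorded as the ordered DNA segments spliced into it. Backward analysis asks which segments a given mRNA came from. Forward analysis asks in how many distinct products a given segment appears. The locus, the segment names, and the assumption that each distinct mRNA encodes a distinct polypeptide are all hypothetical.

```python
# Hypothetical toy locus: each mature mRNA is the ordered list of DNA segments spliced into it.
TRANSCRIPTS = {
    "mRNA-1": ("e1", "e2", "e4"),
    "mRNA-2": ("e1", "e3", "e4"),
    "mRNA-3": ("e1", "e4"),
}


def backward(transcripts, mrna_id):
    """Backward analysis: from which DNA segments does this mRNA (and its polypeptide) originate?"""
    return set(transcripts[mrna_id])


def forward(transcripts, segment):
    """Forward analysis: in how many distinct mRNAs is this DNA segment expressed?

    Assumes (for illustration) that each distinct mRNA encodes a distinct polypeptide.
    """
    return len({exons for exons in transcripts.values() if segment in exons})
```

Here segment `e1` is expressed in three products and `e2` in only one. This asymmetry is what the forward analysis is meant to capture, and it is governed by the processing information the authors call the genon.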
A 590 kb deletion caused by non-allelic homologous recombination between two LINE-1 elements in a patient with mesomelia-synostosis syndrome.

PubMed

Kohmoto, Tomohiro; Naruto, Takuya; Watanabe, Miki; Fujita, Yuji; Ujiro, Sae; Okamoto, Nana; Horikawa, Hideaki; Masuda, Kiyoshi; Imoto, Issei

2017-04-01

Mesomelia-synostoses syndrome (MSS) is a rare, autosomal-dominant, syndromal osteochondrodysplasia characterized by mesomelic limb shortening, acral synostoses, and multiple congenital malformations. It is caused by a non-recurrent deletion at 8q13 that always encompasses two protein-coding genes, SULF1 and SLCO5A1. To date, five unrelated patients have been reported worldwide, and, on the basis of at least two analyzed cases, MSS was previously proposed not to be a genomic disorder associated with recurrent deletions arising from non-allelic homologous recombination (NAHR). We conducted targeted gene panel sequencing and subsequent array-based copy number analysis in an 11-year-old undiagnosed Japanese female patient with multiple congenital anomalies, including mesomelic limb shortening. We detected a novel 590 kb deletion at 8q13 encompassing the same gene set as reported previously, resulting in the diagnosis of MSS. Breakpoint sequences of the deleted region in our case revealed the first LINE-1 (L1)-mediated unequal NAHR event in this disease, using two distant L1 elements as homology substrates. This event may represent a novel causative mechanism of the 8q13 deletion, expanding the range of mechanisms involved in the chromosomal rearrangements responsible for MSS. © 2017 Wiley Periodicals, Inc.
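The deletion outcome of unequal NAHR between two directly oriented homologous repeats can be modeled with strings. The segment between the two copies is lost together with one repeat copy. The toy model below assumes perfectly identical repeats and ignores the hybrid junction that real, divergent L1 copies produce. The sequences and the `nahr_deletion` helper are hypothetical.

```python
def nahr_deletion(seq, repeat):
    """Toy model of unequal NAHR between two direct copies of `repeat`.

    The product keeps one repeat copy and loses everything between the copies.
    """
    first = seq.find(repeat)
    second = seq.find(repeat, first + len(repeat)) if first != -1 else -1
    if second == -1:
        raise ValueError("need two non-overlapping direct copies of the repeat")
    return seq[:first] + seq[second:]


repeat = "TTAGGC"                                   # stands in for an L1 element
genome = "AAAA" + repeat + "ACACAC" + repeat + "CCCC"
deleted = nahr_deletion(genome, repeat)             # "AAAATTAGGCCCCC"
```

The deleted length equals the distance between the two repeat starts. This is why breakpoints falling inside two distant L1 elements can remove a large interval, such as the 590 kb reported here.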
Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

PubMed Central

Takeda, Jun-ichi; Suzuki, Yutaka; Nakao, Mitsuteru; Barrero, Roberto A.; Koyanagi, Kanako O.; Jin, Lihua; Motono, Chie; Hata, Hiroko; Isogai, Takao; Nagai, Keiichi; Otsuki, Tetsuji; Kuryshev, Vladimir; Shionyu, Masafumi; Yura, Kei; Go, Mitiko; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Wiemann, Stefan; Nomura, Nobuo; Sugano, Sumio; Gojobori, Takashi; Imanishi, Tadashi

2006-01-01

We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of full-length cDNAs. Applying both manual and computational analyses to 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing in which two distinct genes seemed to be bridged or nested, or to have overlapping protein coding sequences (CDSs) in different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing based on full-length cDNAs should lay firm groundwork for exploring in detail the diversification of protein function mediated by the fast-expanding universe of alternative splicing variants. PMID:16914452
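Whether two overlapping CDSs on the same transcript share a reading frame can be determined from their start positions alone. If the starts differ by a multiple of 3, the codons coincide. Otherwise the overlap is read in a shifted frame and can encode an unrelated protein, as in the "multiple CDS" cases above. The sketch assumes both CDSs are given as half-open coordinates on the same strand of one transcript. `cds_overlap_frame` is a hypothetical helper.

```python
def cds_overlap_frame(cds_a, cds_b):
    """Frame relationship of two CDSs given as (start, end) half-open transcript coordinates.

    Returns None if they do not overlap; otherwise the frame offset 0, 1 or 2
    (0 = same reading frame, 1 or 2 = overlapping genes in different frames).
    """
    if max(cds_a[0], cds_b[0]) >= min(cds_a[1], cds_b[1]):
        return None
    return (cds_b[0] - cds_a[0]) % 3  # Python's % is non-negative for a positive modulus
```

For example, CDSs spanning (0, 300) and (100, 400) overlap with a frame offset of 1, so the shared 200 nucleotides are translated in two different frames.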
Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

PubMed

Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

2018-03-01

Functional disruptions of susceptibility genes by large germline genomic structural variant (SV) deletions are known to be associated with cancer risk. However, few studies have systematically searched for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated from blood samples of 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early-onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools, including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions in five genes were detected by at least three of these bioinformatics tools. Specifically, we identified heterozygous deletions covering part of the coding regions of BRCA1 (approximately 80 kb, in two patients) and TP53 (∼1.6 kb, in two patients), and intronic regions (∼1 kb) of PALB2 (one patient), PTEN (three patients) and RAD51C (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes, and identifying such deletions may improve clinical testing.
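The "detected by at least three tools" rule is a consensus filter over independent SV callers. The sketch below shows one common way to implement it. A call counts as support for another if the two deletions reach at least 50% reciprocal overlap. That threshold is an assumption, since the abstract does not state how calls from different tools were matched. The coordinates in the usage example are toy values and do not come from any real caller output.

```python
def reciprocal_overlap(a, b):
    """a, b: (chrom, start, end) deletions. Overlap as a fraction of the longer interval."""
    chrom_a, s_a, e_a = a
    chrom_b, s_b, e_b = b
    if chrom_a != chrom_b:
        return 0.0
    ov = min(e_a, e_b) - max(s_a, s_b)
    if ov <= 0:
        return 0.0
    return min(ov / (e_a - s_a), ov / (e_b - s_b))


def consensus_deletions(calls_by_tool, min_tools=3, min_ro=0.5):
    """calls_by_tool: {tool_name: [(chrom, start, end), ...]}.

    Returns [(representative_call, sorted_supporting_tools)] for events
    supported by at least `min_tools` tools.
    """
    reported = []
    for calls in calls_by_tool.values():
        for call in calls:
            if any(reciprocal_overlap(call, r) >= min_ro for r, _ in reported):
                continue  # same event already reported via another tool's call
            support = sorted(
                tool for tool, others in calls_by_tool.items()
                if any(reciprocal_overlap(call, o) >= min_ro for o in others)
            )
            if len(support) >= min_tools:
                reported.append((call, support))
    return reported
```

Requiring agreement across tools that use different signals (read depth, split reads, discordant pairs) trades some sensitivity for fewer false positives. This is also why orthogonal confirmation such as qPCR remains useful.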
Dinucleotide controlled null models for comparative RNA gene prediction.

PubMed

Gesell, Tanja; Washietl, Stefan

2008-05-27

Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. Babak et al. [BMC Bioinformatics. 8:33] recently showed that RNA gene prediction programs can be biased by genomic dinucleotide content, in particular programs that use a thermodynamic folding model including stacking energies. Consequently, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. Although randomization algorithms for single sequences have existed for many years, the problem has remained challenging for multiple alignments, and no algorithm is currently available. We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. To meet additional requirements of an accurate null model, the randomized alignments have on average the same sequence diversity and preserve local conservation and gap patterns. We use a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model and used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments, and its effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by dinucleotide content. SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to obtain more accurate estimates of the false positive rates of existing programs, to produce negative controls for training machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. 
SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
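For comparison with the alignment case that SISSIz addresses, here is a sketch of a classic single-sequence method: the Altschul-Erickson dinucleotide shuffle. This is not the SISSIz algorithm, and it does not handle alignments, gaps, or phylogeny. The sequence is treated as an Eulerian path through a graph whose edges are its dinucleotides. For each letter, a random "last exit" edge is chosen so that these edges lead to the final letter, the remaining edges are shuffled, and the path is walked again. This preserves every dinucleotide count exactly, as well as the first and last letters.

```python
import random
from collections import Counter


def _reaches(v, target, edges, last_edge):
    """True if following last-exit edges from v reaches `target` without a cycle."""
    seen = set()
    while v != target:
        if v in seen:
            return False
        seen.add(v)
        v = edges[v][last_edge[v]]
    return True


def dinucleotide_shuffle(seq, rng=None):
    """Random sequence with exactly the same dinucleotide counts as `seq`."""
    rng = rng or random.Random()
    if len(seq) <= 2:
        return seq
    edges = {}                                # letter -> successors, one per dinucleotide
    for a, b in zip(seq, seq[1:]):
        edges.setdefault(a, []).append(b)
    first, last = seq[0], seq[-1]
    while True:                               # rejection-sample a valid set of last exits
        last_edge = {v: rng.randrange(len(s)) for v, s in edges.items() if v != last}
        if all(_reaches(v, last, edges, last_edge) for v in last_edge):
            break
    order = {}
    for v, succ in edges.items():
        succ = list(succ)
        tail = succ.pop(last_edge[v]) if v in last_edge else None
        rng.shuffle(succ)                     # free edges in random order
        if tail is not None:
            succ.append(tail)                 # the last exit edge is used last
        order[v] = succ
    pos = Counter()
    out, v = [first], first
    for _ in range(len(seq) - 1):             # walk the Eulerian path
        v_next = order[v][pos[v]]
        pos[v] += 1
        out.append(v_next)
        v = v_next
    return "".join(out)
```

Choosing the last exits as a tree that points toward the final letter guarantees that the walk never gets stuck before every dinucleotide has been used once.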
A multiple antibiotic and serum resistant oligotrophic strain, Klebsiella pneumoniae MB45 having novel dfrA30, is sensitive to ZnO QDs

PubMed Central

2011-01-01

Background The aim of this study was to describe a novel trimethoprim resistance gene cassette, designated dfrA30, within a class 1 integron in a facultatively oligotrophic, multiple antibiotic and human serum resistant test strain, MB45, from a population of oligotrophic bacteria isolated from the river Mahananda, and to test the efficiency of surface-bound acetate on zinc oxide quantum dots (ZnO QDs) as a bactericidal agent against MB45. Methods Luria broth/agar diluted 10⁻³ was used to cultivate oligotrophic bacteria from the water sample. Multiple antibiotic resistant bacteria were selected using the replica plate method. A rapid assay was performed to determine the sensitivity or resistance of the test strain to human serum. The variable region of the class 1 integron was cloned and sequenced, and the antibiotic resistance gene was expressed in Escherichia coli JM109. The identity of the culture was determined by biochemical phenotyping and 16S rRNA gene sequence analyses. A phylogenetic tree was constructed based on representative trimethoprim resistance-mediating DfrA proteins retrieved from GenBank. Growth kinetic studies of strain MB45 were performed in the presence of varied concentrations of ZnO QDs. Results and conclusions The facultatively oligotrophic strain MB45, resistant to human serum and to ten antibiotics (trimethoprim, cotrimoxazole, ampicillin, gentamycin, netilmicin, tobramycin, chloramphenicol, cefotaxime, kanamycin and streptomycin), has been identified as a new strain of Klebsiella pneumoniae. A novel dfr gene, designated dfrA30 and integrated in a class 1 integron, was responsible for trimethoprim resistance in K. pneumoniae strain MB45. Growth of the wild strain MB45 was completely arrested at a ZnO QD concentration of 500 mg/L. To our knowledge, this is the first report on the application of ZnO quantum dots to kill a multiple antibiotic and serum resistant K. pneumoniae strain. PMID:21595893
Genetic variation influences glutamate concentrations in brains of patients with multiple sclerosis.

PubMed

Baranzini, Sergio E; Srinivasan, Radhika; Khankhanian, Pouya; Okuda, Darin T; Nelson, Sarah J; Matthews, Paul M; Hauser, Stephen L; Oksenberg, Jorge R; Pelletier, Daniel

2010-09-01

Glutamate is the main excitatory neurotransmitter in the mammalian brain. Appropriate transmission of nerve impulses through glutamatergic synapses is required throughout the brain and forms the basis of many processes, including learning and memory. However, abnormally high levels of extracellular brain glutamate can lead to neuroaxonal cell death. We have previously reported elevated glutamate levels in the brains of patients with multiple sclerosis. Here, we used two complementary analyses to assess the extent of genomic control over glutamate levels. First, we conducted a genome-wide association analysis in 382 patients with multiple sclerosis, using brain glutamate concentration as a quantitative trait. Second, we used a protein interaction network to find associated genes within the same pathway. The top associated marker was rs794185 (P < 6.44 × 10⁻⁷), a non-coding single nucleotide polymorphism within the sulphatase modifying factor 1 gene. Our pathway approach identified a module of 70 genes with high relevance to glutamate biology. Individuals carrying a higher number of associated alleles from genes in this module showed the highest levels of glutamate. These individuals also showed greater decreases in N-acetylaspartate and in brain volume over 1 year of follow-up. Patients were then stratified by the amount of annual brain volume loss, and the same approach was applied to the 'high' (n = 250) and 'low' (n = 132) neurodegeneration groups. The association with rs794185 was highly significant in the group with high neurodegeneration. Further, results from the network-based pathway analysis remained largely unchanged after stratification. These analyses indicated that variance in the activity of neurochemical pathways implicated in neurodegeneration is explained, at least in part, by the inheritance of common genetic polymorphisms. 
Spectroscopy-based imaging provides a novel quantitative endophenotype for genetic association studies directed towards identifying new factors that contribute to the heterogeneity of clinical expression of multiple sclerosis.
Dose-Sensitivity, Conserved Non-Coding Sequences, and Duplicate Gene Retention Through Multiple Tetraploidies in the Grasses

PubMed Central

Schnable, James C.; Pedersen, Brent S.; Subramaniam, Sabarinath; Freeling, Michael

2011-01-01

Whole genome duplications, or tetraploidies, are an important source of increased gene content. Following whole genome duplication, duplicate copies of many genes are lost from the genome. This loss is biased both in the classes of genes deleted and in the subgenome from which they are lost. Many or all classes of genes preferentially retained as duplicate copies are engaged in dose-sensitive protein–protein interactions, such that deletion of either duplicate upsets the status quo of subunit concentrations and presumably lowers fitness as a result. Transcription factors are also preferentially retained following every whole genome duplication studied. This has been explained as a consequence of protein–protein interactions, just as for other highly retained classes of genes. We show that the quantity of conserved noncoding sequences (CNSs) associated with genes predicts the likelihood of their retention as duplicate pairs following whole genome duplication. As many CNSs likely represent binding sites for transcriptional regulators, we propose that the likelihood of gene retention following tetraploidy may also be influenced by dose-sensitive protein–DNA interactions between the regulatory regions of CNS-rich genes (nicknamed bigfoot genes) and the proteins that bind to them. Using grass genomes, we show that differential loss of CNSs from one member of a pair following the pre-grass tetraploidy reduces its chance of retention in the subsequent maize lineage tetraploidy. PMID:22645525

Non-coding variants contribute to the clinical heterogeneity of TTR amyloidosis.

PubMed

Iorio, Andrea; De Lillo, Antonella; De Angelis, Flavio; Di Girolamo, Marco; Luigetti, Marco; Sabatelli, Mario; Pradotto, Luca; Mauro, Alessandro; Mazzeo, Anna; Stancanelli, Claudia; Perfetto, Federico; Frusconi, Sabrina; My, Filomena; Manfellotto, Dario; Fuciarelli, Maria; Polimanti, Renato

2017-09-01

Coding mutations in the TTR gene cause a rare hereditary form of systemic amyloidosis with a complex genotype-phenotype correlation. We investigated the role of non-coding variants in regulating TTR gene expression and, consequently, amyloidosis symptoms. We evaluated the genotype-phenotype correlation using clinical information from 129 Italian patients with TTR amyloidosis. We then re-sequenced the TTR gene to investigate how non-coding variants affect TTR expression and, consequently, phenotypic presentation in carriers of amyloidogenic mutations. Polygenic scores for genetically determined TTR expression were constructed using data from our re-sequencing analysis and the GTEx (Genotype-Tissue Expression) project. We confirmed strong phenotypic heterogeneity across coding mutations causing TTR amyloidosis. Considering the effects of non-coding variants on TTR expression, we identified three patient clusters with specific expression patterns associated with certain phenotypic presentations, including late onset, autonomic neurological involvement, and gastrointestinal symptoms. This study provides novel data on the role of non-coding variation and gene expression profiles in patients affected by TTR amyloidosis, and puts forth an approach that could be used to investigate the mechanisms underlying the genotype-phenotype correlation of the disease.
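A polygenic score for genetically determined expression is typically a weighted sum of allele dosages. The sketch below illustrates that general idea only; it is not the authors' procedure. The variant IDs and weights are hypothetical placeholders, not effect sizes from the re-sequencing data or from GTEx, and the study's actual variant selection and weighting may differ.

```python
# Minimal sketch of a weighted polygenic expression score.
# ASSUMPTION: variant IDs and weights are hypothetical placeholders,
# not eQTL effect sizes from the study or from GTEx.
HYPOTHETICAL_WEIGHTS = {
    "var_1": 0.20,   # expression change per effect-allele copy (arbitrary units)
    "var_2": -0.15,
    "var_3": 0.05,
}


def expression_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of effect-allele dosage (0, 1 or 2) times the per-variant weight.

    Variants absent from an individual's genotype data are skipped.
    """
    score = 0.0
    for variant, weight in weights.items():
        dosage = dosages.get(variant)
        if dosage is None:
            continue
        if dosage not in (0, 1, 2):
            raise ValueError(f"invalid dosage {dosage} for {variant}")
        score += dosage * weight
    return score


patient = {"var_1": 2, "var_2": 1, "var_3": 0}
print(round(expression_score(patient, HYPOTHETICAL_WEIGHTS), 2))  # 0.25
```

Scores computed this way could then be compared or clustered across patients; the clustering approach used in the study is not specified in the abstract.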
Maize GO annotation—methods, evaluation, and review (maize-GAMER)

USDA-ARS's Scientific Manuscript database

We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...
The complete mitochondrial genome of Pauropus longiramus (Myriapoda: Pauropoda): implications on early diversification of the myriapods revealed from comparative analysis.

PubMed

Dong, Yan; Sun, Hongying; Guo, Hua; Pan, Da; Qian, Changyuan; Hao, Sijing; Zhou, Kaiya

2012-08-15

Myriapods are among the earliest arthropods and may have become part of the terrestrial biota more than 400 million years ago. A notable lack of mitochondrial genome data from Pauropoda hampers phylogenetic and evolutionary studies within the subphylum Myriapoda. We sequenced the first complete mitochondrial genome of a microscopic pauropod, Pauropus longiramus (Arthropoda: Myriapoda), and conducted comprehensive mitogenomic analyses across the Myriapoda. The pauropod mitochondrial genome is a circular molecule 14,487 bp long and contains the entire set of thirty-seven genes. Frequent intergenic overlaps occurred between adjacent tRNAs, and between tRNA and protein-coding genes. This is the first example of a mitochondrial genome with multiple intergenic overlaps, and it reveals a strategy by which arthropods can compact the mitochondrial genome by overlapping and truncating tRNA genes with neighboring genes, rather than only truncating tRNAs. Phylogenetic analyses based on protein-coding genes provide strong evidence that the sister group of Pauropoda is Symphyla. Additionally, approximately unbiased (AU) tests strongly support the Progoneata and confirm the basal position of Chilopoda in Myriapoda. This study estimates the origin of myriapods at around 555 Ma (95% CI: 444-704 Ma), a date comparable with that of the Cambrian explosion and of candidate myriapod-like fossils. A new time-scale suggests that deep radiations during early myriapod diversification occurred at least three times, not once as previously proposed. A Carboniferous origin of pauropods is congruent with the idea that these taxa are derived, rather than basal, progoneatans. Copyright © 2012 Elsevier B.V. All rights reserved.
Clonal Architecture of Secondary Acute Myeloid Leukemia

PubMed Central

Walter, Matthew J.; Shen, Dong; Ding, Li; Shao, Jin; Koboldt, Daniel C.; Chen, Ken; Larson, David E.; McLellan, Michael D.; Dooling, David; Abbott, Rachel; Fulton, Robert; Magrini, Vincent; Schmidt, Heather; Kalicki-Veizer, Joelle; O’Laughlin, Michelle; Fan, Xian; Grillot, Marcus; Witowski, Sarah; Heath, Sharon; Frater, John L.; Eades, William; Tomasson, Michael; Westervelt, Peter; DiPersio, John F.; Link, Daniel C.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.; Graubert, Timothy A.

2012-01-01

BACKGROUND The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood. METHODS We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations. RESULTS Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene. CONCLUSIONS Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.) PMID:22417201
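The study used the allele burden of hundreds of mutations to define clonal architecture. A common simplification, assumed here rather than taken from the paper, is that a heterozygous somatic mutation at a copy-neutral diploid site carried by a fraction f of cells has a variant allele fraction (VAF) of about f/2. The sketch below applies that conversion and summarizes a clone by the median over its mutations; the VAF values are hypothetical, and the paper's actual clustering of mutations into clones is more involved.

```python
from statistics import median


def cell_fraction(vaf: float) -> float:
    """Fraction of cells carrying a mutation, estimated from its VAF.

    ASSUMPTION: heterozygous mutation at a copy-neutral, diploid site, so
    VAF = cell_fraction / 2. Values are capped at 1.0 to absorb sampling noise.
    """
    if vaf < 0:
        raise ValueError("VAF cannot be negative")
    return min(1.0, 2.0 * vaf)


def clone_size(vafs: list[float]) -> float:
    """Estimate a clone's size as the median cell fraction of its mutations."""
    return median(cell_fraction(v) for v in vafs)


# Hypothetical VAFs for mutations assigned to a single clone.
example_vafs = [0.30, 0.32, 0.31]
print(round(clone_size(example_vafs), 3))  # 0.62
```

Using the median rather than the mean makes the estimate less sensitive to a few mutations whose VAFs are distorted by copy-number changes or sequencing noise.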
A long non-coding RNA expression profile can predict early recurrence in hepatocellular carcinoma after curative resection.

PubMed

Lv, Yufeng; Wei, Wenhao; Huang, Zhong; Chen, Zhichao; Fang, Yuan; Pan, Lili; Han, Xueqiong; Xu, Zihai

2018-06-20

The aim of this study was to develop a novel long non-coding RNA (lncRNA) expression signature to accurately predict early recurrence in patients with hepatocellular carcinoma (HCC) after curative resection. Using expression profiles downloaded from The Cancer Genome Atlas database, we identified multiple lncRNAs with differential expression between the early recurrence (ER) and non-early recurrence (non-ER) groups of HCC. Least absolute shrinkage and selection operator (LASSO) logistic regression was used to develop a lncRNA-based classifier for predicting ER in the training set. An independent test set was used to validate the predictive value of this classifier. Furthermore, a co-expression network based on these lncRNAs and their highly related genes was constructed, and Gene Ontology and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses were performed on the genes in the network. We identified 10 differentially expressed lncRNAs, 3 upregulated and 7 downregulated in the ER group. The lncRNA-based classifier was constructed from 7 lncRNAs (AL035661.1, PART1, AC011632.1, AC109588.1, AL365361.1, LINC00861 and LINC02084); its accuracy was 0.83 in the training set, 0.87 in the test set and 0.84 in the total set. ROC curve analysis gave an AUROC of 0.741 in the training set, 0.824 in the test set and 0.765 in the total set. A functional enrichment analysis suggested that the genes highly related to 4 of the lncRNAs were involved in the immune system. This 7-lncRNA expression profile can effectively predict early recurrence after surgical resection for HCC. This article is protected by copyright. All rights reserved.
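The abstract reports both accuracy and AUROC for the classifier, and the two can diverge on the same predictions. As a hedged illustration (the scores and labels are made up, and the 0.5 threshold is an assumption), AUROC can be computed as the probability that a randomly chosen ER case scores higher than a randomly chosen non-ER case, with ties counted as half, whereas accuracy depends on a fixed decision threshold.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as the probability a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: list[float], labels: list[int], threshold: float = 0.5) -> float:
    """Fraction of correct calls when scores >= threshold are called positive."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical classifier outputs: 1 = early recurrence, 0 = no early recurrence.
scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(round(auroc(scores, labels), 3), round(accuracy(scores, labels), 3))  # 0.889 0.667
```

Here the ranking is nearly perfect (high AUROC), yet one positive and one negative fall on the wrong side of the threshold, so accuracy is lower.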
Common Variants in Cardiac Ion Channel Genes are Associated with Sudden Cardiac Death

PubMed Central

Albert, Christine M.; MacRae, Calum A.; Chasman, Daniel I.; VanDenburgh, Martin; Buring, Julie E; Manson, JoAnn E; Cook, Nancy R; Newton-Cheh, Christopher

2010-01-01

Background Rare variants in cardiac ion channel genes are associated with sudden cardiac death (SCD) in rare primary arrhythmic syndromes; however, it is unknown whether common variation in these same genes may contribute to SCD risk at the population level. Methods and Results We examined the association between 147 single nucleotide polymorphisms (SNPs) (137 tag SNPs, 5 non-coding SNPs associated with QT interval duration and 5 nonsynonymous SNPs) in 5 cardiac ion channel genes (KCNQ1, KCNH2, SCN5A, KCNE1 and KCNE2) and sudden and/or arrhythmic death in a combined nested case-control analysis of 516 cases and 1522 matched controls of European ancestry enrolled in six prospective cohort studies. After accounting for multiple testing, two SNPs (rs2283222, located in intron 11 of KCNQ1, and rs11720524, located in intron 1 of SCN5A) remained significantly associated with sudden/arrhythmic death (FDR = 0.01 and 0.03, respectively). Each additional copy of the major T allele of rs2283222 or the major C allele of rs11720524 was associated with an OR of 1.36 (95% CI 1.16-1.60, P = 0.0002) and 1.30 (95% CI 1.12-1.51, P = 0.0005), respectively. Control for cardiovascular risk factors and/or limiting the analysis to definite SCDs did not significantly alter these relationships. Conclusion In this combined analysis of 6 prospective cohort studies, two common intronic variants in KCNQ1 and SCN5A were associated with SCD in individuals of European ancestry. Further study in other populations and investigation into the functional abnormalities associated with non-coding variation in these genes may lead to important insights into predisposition to lethal arrhythmias. PMID:20400777
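The per-copy odds ratios above can be read on a log-additive scale. Assuming that model (the usual interpretation of a per-allele OR, though the abstract does not report genotype-specific estimates), the implied odds ratio for carrying two copies of the risk allele versus none is the square of the per-allele value:

```latex
\mathrm{OR}_{2\,\mathrm{vs}\,0} = \left(\mathrm{OR}_{\text{per allele}}\right)^{2},
\qquad 1.36^{2} \approx 1.85 \ (\text{rs2283222}),
\qquad 1.30^{2} = 1.69 \ (\text{rs11720524}).
```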
webMGR: an online tool for the multiple genome rearrangement problem.

PubMed

Lin, Chi Ho; Zhao, Hao; Lowcay, Sean Harry; Shahab, Atif; Bourque, Guillaume

2010-02-01

The algorithm MGR enables the reconstruction of rearrangement phylogenies based on gene or synteny block order in multiple genomes. Although MGR has been successfully applied to study the evolution of different sets of species, its utilization has been hampered by the prohibitive running time for some applications. In the current work, we have designed new heuristics that significantly speed up the tool without compromising its accuracy. Moreover, we have developed a web server (webMGR) that includes elaborate web output to facilitate navigation through the results. webMGR can be accessed via http://www.gis.a-star.edu.sg/~bourque. The source code of the improved standalone version of MGR is also freely available from the web site. Supplementary data are available at Bioinformatics online.
Performance enhancement of optical code-division multiple-access systems using transposed modified Walsh code

NASA Astrophysics Data System (ADS)

Sikder, Somali; Ghosh, Shila

2018-02-01

This paper presents the construction of unipolar transposed modified Walsh code (TMWC) and analysis of its performance in optical code-division multiple-access (OCDMA) systems. Specifically, the signal-to-noise ratio, bit error rate (BER), cardinality, and spectral efficiency were investigated. The theoretical analysis demonstrated that the wavelength-hopping time-spreading system using TMWC was robust against multiple-access interference and more spectrally efficient than systems using other existing OCDMA codes. In particular, the spectral efficiency was calculated to be 1.0370 when TMWC of weight 3 was employed. The BER and eye pattern for the designed TMWC were also successfully obtained using OptiSystem simulation software. The results indicate that the proposed code design is promising for enhancing network capacity.
Essential RNA-Based Technologies and Their Applications in Plant Functional Genomics.

PubMed

Teotia, Sachin; Singh, Deepali; Tang, Xiaoqing; Tang, Guiliang

2016-02-01

Genome sequencing has not only extended our understanding of the blueprints of many plant species but has also revealed the secrets of coding and non-coding genes. We present here a brief introduction to and personal account of key RNA-based technologies, as well as their development and applications for functional genomics of plant coding and non-coding genes, with a focus on short tandem target mimics (STTMs), artificial microRNAs (amiRNAs), and CRISPR/Cas9. In addition, their use in multiplex technologies for the functional dissection of gene networks is discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Initial description of primate-specific cystine-knot Prometheus genes and differential gene expansions of D-dopachrome tautomerase genes

PubMed Central

Premzl, Marko

2015-01-01

Using a eutherian comparative genomic analysis protocol and public genomic sequence data sets, the present work attempted to update and revise two gene data sets. The most comprehensive third-party annotation gene data sets of eutherian adenohypophysis cystine-knot genes (128 complete coding sequences) and of D-dopachrome tautomerase and macrophage migration inhibitory factor genes (30 complete coding sequences) were annotated. In particular, the present study is the first to describe primate-specific cystine-knot Prometheus genes, as well as differential gene expansions of D-dopachrome tautomerase genes. Furthermore, new frameworks for future experiments on the two eutherian gene data sets were proposed. PMID:25941635
Characterization of Transcriptional Complexity during Adipose Tissue Development in Bovines of Different Ages and Sexes

PubMed Central

Zhou, Yang; Sun, Jiajie; Li, Congjun; Wang, Yanhong; Li, Lan; Cai, Hanfang; Lan, Xianyong; Lei, Chuzhao; Zhao, Xin; Chen, Hong

2014-01-01

Background Adipose tissue has long been recognized to play an extremely important role in development. In bovines, it not only serves a fundamental function but also plays a key role in beef quality and, consequently, has drawn much public attention. Age and sex are two key factors that affect the development of adipose tissue, yet there has been no global study detailing the effects of these two factors on expression differences in adipose tissue. Results In this study, total RNA from the back fat of fetal bovines, adult bulls, adult heifers and adult steers was used to construct libraries for Illumina next-generation sequencing. We detected the expression of 12,233 genes, with over 3,000 differentially expressed genes when comparing fetal and adult patterns and an average of 1,000 differentially expressed genes when comparing adult patterns. Multiple Gene Ontology terms and pathways were significantly enriched among these differentially expressed genes. Of the 12,233 detected genes, 4,753 (38.85%) underwent alternative splicing events, and over 50% were specifically expressed in each library. Over 4,000 novel transcript units were discovered for one library, of which only approximately 30% were considered to have coding ability; these provide a large amount of information for lncRNA studies. Additionally, we detected 56,564 (fetal bovine), 65,154 (adult bull), 78,061 (adult heifer) and 86,965 (adult steer) putative single nucleotide polymorphisms located in coding regions of the four pooled libraries. Conclusion Here, we present, for the first time, a complete dataset of the spatial and temporal transcriptome of bovine adipose tissue using RNA-seq. These data will facilitate understanding of the effects of age and sex on the development of adipose tissue and provide essential information for further studies on the genomes of beef cattle and other related mammals. PMID:24983926
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

PubMed

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-02-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which, by eliminating intron sequences, identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed the whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for constructing reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
The Complete Mitochondrial DNA Sequence of Scenedesmus obliquus Reflects an Intermediate Stage in the Evolution of the Green Algal Mitochondrial Genome

PubMed Central

Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud

2000-01-01

Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
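To make the reported codon reassignments concrete, the sketch below builds the standard genetic code and overrides the two deviations given in the abstract (UAG read as leucine, UCA read as a stop). It assumes every other codon keeps its standard meaning, which corresponds to what NCBI lists as translation table 22, though that correspondence should be checked before relying on it. The test ORF is made up.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG
# order, third position varying fastest.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Deviations reported for Scenedesmus obliquus mtDNA: UAG -> Leu, UCA -> stop.
# ASSUMPTION: every other codon keeps its standard meaning.
SCENEDESMUS_MT_CODE = dict(STANDARD_CODE, TAG="L", TCA="*")


def translate(orf: str, code: dict[str, str]) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon ('*').

    Accepts RNA (U) or DNA (T) letters; a trailing partial codon is ignored.
    Raises KeyError for ambiguous bases such as N.
    """
    seq = orf.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = code[seq[i : i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUAGUCA", STANDARD_CODE))        # M   (UAG is a stop)
print(translate("AUGUAGUCA", SCENEDESMUS_MT_CODE))  # ML  (UAG = Leu, UCA = stop)
```

The same ORF yields different products under the two codes, which is why annotating these genes requires the deviant code rather than the standard one.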
Using a Euclid distance discriminant method to find protein coding genes in the yeast genome.

PubMed

Zhang, Chun-Ting; Wang, Ju; Zhang, Ren

2002-02-01

The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at the three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or in eukaryotic genomes with fewer introns. Six-fold cross-validation tests demonstrated that the accuracy of the algorithm is better than 93%. On this basis, the total number of protein coding genes in the yeast genome is estimated to be at most 5579, about 3.8-7.0% fewer than the currently widely accepted figure of 5800-6000. The base compositions at the three codon positions are analyzed in detail using a graphic method. The results show that the preferred codons of yeast genes are of the RḠW type, where R denotes a purine, Ḡ denotes any base other than G, and W denotes A or T, whereas the 'codons' in intergenic sequences are of the form NNN, where N denotes any base. This difference forms the basis of the algorithm for distinguishing coding from non-coding ORFs in the yeast genome. The names of the putative non-coding ORFs are listed in detail.
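The classification idea can be sketched with a plain nearest-centroid rule: represent each ORF as a 12-dimensional vector of A, C, G and T frequencies at codon positions 1, 2 and 3, then assign it to whichever class centroid (coding or non-coding) is closer in Euclidean distance. This is a minimal sketch under that assumption; the paper's exact discriminant function (for example any weighting) is not given here, and the training sequences are toy examples.

```python
import math

BASES = "ACGT"


def codon_position_frequencies(orf: str) -> list[float]:
    """12-dim vector: frequencies of A, C, G, T at codon positions 1, 2 and 3."""
    seq = orf.upper()
    n_codons = len(seq) // 3
    if n_codons == 0:
        raise ValueError("ORF must contain at least one complete codon")
    features = []
    for pos in range(3):
        column = seq[pos : n_codons * 3 : 3]  # every base at this codon position
        features.extend(column.count(b) / n_codons for b in BASES)
    return features


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of equal-length feature vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def classify(orf: str, coding_c: list[float], noncoding_c: list[float]) -> str:
    """Assign the ORF to the class whose centroid is nearer (Euclidean distance)."""
    x = codon_position_frequencies(orf)
    if math.dist(x, coding_c) < math.dist(x, noncoding_c):
        return "coding"
    return "non-coding"


# Toy training sets (hypothetical, far too small for real use).
coding_c = centroid([codon_position_frequencies(s) for s in ["GAAGAT", "GCAGAA"]])
noncoding_c = centroid([codon_position_frequencies(s) for s in ["TTTTTT", "TTCTAT"]])
print(classify("GACGAA", coding_c, noncoding_c))  # coding
```

In practice the centroids would be estimated from large sets of known genes and intergenic ORFs, and accuracy checked by cross-validation as the abstract describes.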
Identification and analysis of unitary loss of long-established protein-coding genes in Poaceae shows evidences for biased gene loss and putatively functional transcription of relics.

PubMed

Zhao, Yi; Tang, Liang; Li, Zhe; Jin, Jinpu; Luo, Jingchu; Gao, Ge

2015-04-18

Long-established protein-coding genes may lose their coding potential during evolution ("unitary gene loss"). Members of the Poaceae family are a major food source and represent an ideal model clade for plant evolution research. However, the global pattern of unitary gene loss in Poaceae genomes, as well as the evolutionary fate of lost genes, remains poorly investigated and largely elusive. Using a locally developed pipeline, we identified 129 unitary gene loss events for long-established protein-coding genes in four representative species of Poaceae, i.e. brachypodium, rice, sorghum and maize. Functional annotation suggested that the genes lost in all or most Poaceae species are enriched for genes involved in development and response to endogenous stimulus. We also found that 44 mutated genomic loci of lost genes, which we refer to as relics, were still actively transcribed, and 84% of these (37 of 44) showed significantly differential expression across tissues. More interestingly, we found that a total of five expressed relics may function as competitive endogenous RNAs in the brachypodium, rice and sorghum genomes. Based on comparative genomics and transcriptome data, we compiled the first comprehensive catalogue of unitary gene loss events in Poaceae species, characterized a statistically significant functional preference among these lost genes, and showed the potential of relics to function as competitive endogenous RNAs in Poaceae genomes.
Rapid evolution and copy number variation of primate RHOXF2, an X-linked homeobox gene involved in male reproduction and possibly brain function.

PubMed

Niu, Ao-lei; Wang, Yin-qiu; Zhang, Hui; Liao, Cheng-hong; Wang, Jin-kai; Zhang, Rui; Che, Jun; Su, Bing

2011-10-12

Homeobox genes are key regulators of development and are in general highly conserved, with only a few reported cases of rapid evolution. RHOXF2 is an X-linked homeobox gene in primates. It is highly expressed in the testis and may play an important role in spermatogenesis. As the male reproductive system is often the target of natural and/or sexual selection during evolution, in this study we aimed to dissect the pattern of molecular evolution of RHOXF2 in primates and its potential functional consequences. We studied the sequences and copy number variation of RHOXF2 in humans and 16 nonhuman primate species, as well as expression patterns in human, chimpanzee, white-browed gibbon and rhesus macaque. Copy number analysis showed parallel gene duplications/losses in multiple primate lineages. Our evidence suggests that 11 nonhuman primate species have one RHOXF2 copy, that two copies are present in humans and four Old World monkey species, and that chimpanzees have at least 6 copies. Further analysis indicated that the gene duplications in primates were likely mediated by endogenous retrovirus (ERV) sequences flanking the gene regions. In striking contrast to non-human primates, humans appear to have homogenized their two RHOXF2 copies through ERV-mediated non-allelic recombination. Coding sequence and phylogenetic analyses suggested multi-lineage strong positive selection on RHOXF2 during primate evolution, especially during the origins of humans and chimpanzees. All 8 coding-region polymorphic sites in human populations are non-synonymous, implying ongoing selection. Gene expression analysis demonstrated that, in addition to its preferential expression in the reproductive system, RHOXF2 is also expressed in the brain. The quantitative data suggest divergence in expression patterns among primate species. RHOXF2 is a fast-evolving homeobox gene in primates.
The rapid evolution and copy number changes of RHOXF2 have been driven by Darwinian positive selection acting on the male reproductive system and possibly also on the central nervous system, which sheds light on the role of homeobox genes in adaptive evolution.
Functional similarity and molecular divergence of a novel reproductive transcriptome in two male-pregnant Syngnathus pipefish species

PubMed Central

Small, Clayton M; Harlin-Cognato, April D; Jones, Adam G

2013-01-01

Evolutionary studies have revealed that reproductive proteins in animals and plants often evolve more rapidly than the genome-wide average. The causes of this pattern, which may include relaxed purifying selection, sexual selection, sexual conflict, pathogen resistance, reinforcement, or gene duplication, remain elusive. Investigative expansions to additional taxa and reproductive tissues have the potential to shed new light on this unresolved problem. Here, we embark on such an expansion in a comparison of the brood-pouch transcriptome between two male-pregnant species of the pipefish genus Syngnathus. Male brooding tissues in syngnathid fishes represent a novel, nonurogenital reproductive trait, heretofore mostly uncharacterized from a molecular perspective. We leveraged next-generation sequencing (Roche 454 pyrosequencing) to compare transcript abundance in male brooding tissues between pregnant and nonpregnant samples from Gulf (S. scovelli) and dusky (S. floridae) pipefish. A core set of protein-coding genes, including multiple members of the astacin metalloprotease and c-type lectin gene families, is consistent between species in both the direction and magnitude of expression bias. As predicted, coding DNA sequence analysis of these putative “male pregnancy proteins” suggests rapid evolution relative to nondifferentially expressed genes and reflects signatures of adaptation similar in magnitude to those reported for Drosophila male accessory gland proteins. Although the precise drivers of male pregnancy protein divergence remain unknown, we argue that the male pregnancy transcriptome in syngnathid fishes, a clade diverse with respect to brooding morphology and mating system, represents a unique and promising object of study for understanding the perplexing evolutionary nature of reproductive molecules. PMID:24324861
Leber Hereditary Optic Neuropathy: Exemplar of an mtDNA Disease.

PubMed

Wallace, Douglas C; Lott, Marie T

2017-01-01

The report in 1988 that Leber Hereditary Optic Neuropathy (LHON) was the product of mitochondrial DNA (mtDNA) mutations provided the first demonstration of the clinical relevance of inherited mtDNA variation. LHON studies demonstrated the medical importance of the mtDNA: its coding of the most important energy genes, its maternal inheritance, its high mutation rate, its presence in hundreds to thousands of copies per cell, its quantitative segregation of biallelic genotypes during both mitosis and meiosis, its preferential effect on the most energetic tissues, including the eye and brain, its wide range of functional polymorphisms that predispose to common diseases, and its accumulation of mutations within somatic tissues, providing the aging clock. These features of mtDNA genetics, in combination with the genetics of the 1-2000 nuclear DNA (nDNA)-coded mitochondrial genes, are not only explaining the genetics of LHON but also providing a model for understanding the complexity of many common diseases. With the maturation of LHON biology and genetics, novel animal models for complex disease have been developed, and new therapeutic targets and strategies, both pharmacological and genetic, have been envisioned. Multiple somatic gene therapy approaches are being developed for LHON that are applicable to other mtDNA diseases. Moreover, the unique cytoplasmic genetics of the mtDNA has permitted the first successful human germline gene therapy via spindle nDNA transfer from mtDNA mutant oocytes to enucleated normal mtDNA oocytes. Such LHON lessons are actively being applied to common ophthalmological diseases like glaucoma and neurological diseases like Parkinsonism.
Dynamic and Widespread lncRNA Expression in a Sponge and the Origin of Animal Complexity

PubMed Central

Gaiti, Federico; Fernandez-Valverde, Selene L.; Nakanishi, Nagayasu; Calcino, Andrew D.; Yanai, Itai; Tanurdzic, Milos; Degnan, Bernard M.

2015-01-01

Long noncoding RNAs (lncRNAs) are important developmental regulators in bilaterian animals. A correlation has been claimed between the lncRNA repertoire expansion and morphological complexity in vertebrate evolution. However, this claim has not been tested by examining morphologically simple animals. Here, we undertake a systematic investigation of lncRNAs in the demosponge Amphimedon queenslandica, a morphologically simple, early-branching metazoan. We combine RNA-Seq data across multiple developmental stages of Amphimedon with a filtering pipeline to conservatively predict 2,935 lncRNAs. These include intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, long intergenic nonprotein coding RNAs, and precursors for small RNAs. Sponge lncRNAs are remarkably similar to their bilaterian counterparts in being relatively short with few exons and having low primary sequence conservation relative to protein-coding genes. As in bilaterians, a majority of sponge lncRNAs exhibit typical hallmarks of regulatory molecules, including high temporal specificity and dynamic developmental expression. Specific lncRNA expression profiles correlate tightly with conserved protein-coding genes likely involved in a range of developmental and physiological processes, such as the Wnt signaling pathway. Although the majority of Amphimedon lncRNAs appears to be taxonomically restricted with no identifiable orthologs, we find a few cases of conservation between demosponges in lncRNAs that are antisense to coding sequences. Based on the high similarity in the structure, organization, and dynamic expression of sponge lncRNAs to their bilaterian counterparts, we propose that these noncoding RNAs are an ancient feature of the metazoan genome. These results are consistent with lncRNAs regulating the development of animals, regardless of their level of morphological complexity. PMID:25976353
Positive Selection and Multiple Losses of the LINE-1-Derived L1TD1 Gene in Mammals Suggest a Dual Role in Genome Defense and Pluripotency

PubMed Central

Yang, Lei; Neme, Rafik; Wichman, Holly A.; Malik, Harmit S.

2014-01-01

Mammalian genomes comprise many active and fossilized retroelements. The obligate requirement for retroelement integration affords host genomes an opportunity to ‘domesticate’ retroelement genes for their own purpose, leading to important innovations in genome defense and placentation. While many such exaptations involve retroviruses, the L1TD1 gene is the only known domesticated gene whose protein-coding sequence is almost entirely derived from a LINE-1 (L1) retroelement. Human L1TD1 has been shown to play an important role in pluripotency maintenance. To investigate how this role was acquired, we traced the origin and evolution of L1TD1. We find that L1TD1 originated in the common ancestor of eutherian mammals, but was lost or pseudogenized multiple times during mammalian evolution. We also find that L1TD1 has evolved under positive selection during primate and mouse evolution, and that one prosimian L1TD1 has ‘replenished’ itself with a more recent L1 ORF1 from the prosimian genome. These data suggest that L1TD1 has been recurrently selected for functional novelty, perhaps for a role in genome defense. L1TD1 loss is associated with L1 extinction in several megabat lineages, but not in sigmodontine rodents. We hypothesize that L1TD1 could have originally evolved for genome defense against L1 elements. Later, L1TD1 may have become incorporated into pluripotency maintenance in some lineages. Our study highlights the role of retroelement gene domestication in fundamental aspects of mammalian biology, and that such domesticated genes can adopt different functions in different lineages. PMID:25211013

Meta-analytic framework for liquid association.

PubMed

Wang, Lin; Liu, Silvia; Ding, Ying; Yuan, Shin-Sheng; Ho, Yen-Yi; Tseng, George C

2017-07-15

Although coexpression analysis via pairwise expression correlation is widely used to elucidate gene-gene interactions at the whole-genome scale, many complicated multi-gene regulations require more advanced detection methods. Liquid association (LA) is a powerful tool to detect the dynamic correlation of two gene variables depending on the expression level of a third variable (the LA scouting gene). LA detection from a single transcriptomic study, however, is often unstable and not generalizable due to cohort bias, biological variation and limited sample size. With the rapid development of microarray and NGS technology, LA analysis combining multiple gene expression studies can provide more accurate and stable results. In this article, we proposed two meta-analytic approaches for LA analysis (MetaLA and MetaMLA) to combine multiple transcriptomic studies. To offset the heavy computational demand, we also proposed a two-step fast screening algorithm for more efficient genome-wide screening: bootstrap filtering and sign filtering. We applied the methods to five Saccharomyces cerevisiae datasets related to environmental changes. The fast screening algorithm reduced running time by 98%. When compared with single-study analysis, MetaLA and MetaMLA provided a stronger detection signal and more consistent and stable results. The top triplets are highly enriched in fundamental biological processes related to environmental changes. Our method can help biologists understand underlying regulatory mechanisms under different environmental exposures or disease states. A MetaLA R package, data and code for this article are available at http://tsenglab.biostat.pitt.edu/software.htm. Contact: ctseng@pitt.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
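The single-study LA statistic that the meta-analytic methods build on can be sketched in a few lines. This assumes the original LA formulation, in which each gene profile is rank-transformed to normal scores and LA(X, Y | Z) is estimated as the sample mean of x·y·z; the unit-variance rescaling, the function names and the toy data are assumptions, and the sketch does not implement MetaLA, MetaMLA, or the bootstrap and sign filtering.

```python
from statistics import NormalDist, fmean, pstdev


def normal_scores(values):
    """Rank-based normal-score transform, rescaled to mean 0 and unit variance.

    Assumes there are no tied values (ties would need averaged ranks).
    """
    n = len(values)
    inv_cdf = NormalDist().inv_cdf
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = inv_cdf(rank / (n + 1))
    mu, sd = fmean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]


def liquid_association(x, y, z):
    """Estimate LA(X, Y | Z) as the mean of x*y*z over normal-score-transformed profiles."""
    if not (len(x) == len(y) == len(z)):
        raise ValueError("expression profiles must have equal length")
    nx, ny, nz = normal_scores(x), normal_scores(y), normal_scores(z)
    return fmean(a * b * c for a, b, c in zip(nx, ny, nz))


# Toy profiles: X and Y move together when Z is high and oppositely when Z is low.
z = list(range(20))
x = [(-1) ** i * (i + 1) for i in range(20)]
y = [xi if i >= 10 else -xi for i, xi in enumerate(x)]
print(liquid_association(x, y, z) > 0)  # True
```

A positive LA here means the X-Y correlation strengthens as the scouting variable Z increases; reversing Z flips the sign of the estimate.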
The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca).

PubMed

Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong

2007-08-01

The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers and by primer-walking sequencing. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The combined length of the 13 protein-coding genes exceeds that of the American black bear, brown bear and polar bear by 3 amino acids, located at the end of the ND5 gene. Codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. Molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand-encoded protein-coding genes and suggested that the giant panda is most closely related to bears.
MicroRNA signature of the human developing pancreas.

PubMed

Rosero, Samuel; Bravo-Egana, Valia; Jiang, Zhijie; Khuri, Sawsan; Tsinoremas, Nicholas; Klein, Dagmar; Sabates, Eduardo; Correa-Medina, Mayrin; Ricordi, Camillo; Domínguez-Bendala, Juan; Diez, Juan; Pastori, Ricardo L

2010-09-22

MicroRNAs are non-coding RNAs that regulate gene expression, including differentiation and development, by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets by calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in a genome-wide RNA microarray study. The microRNA signature of human fetal pancreatic samples at 10-22 weeks of gestational age (wga) was obtained by PCR-based high-throughput screening with TaqMan Low Density Arrays. This method led to the identification of 212 microRNAs. The microRNAs were classified into 3 groups: Group I contains 4 microRNAs with an increasing profile; Group II, 35 microRNAs with a decreasing profile; and Group III, 173 microRNAs whose expression remains unchanged. We calculated Pearson correlations between the expression profiles of microRNAs and target mRNAs, predicted by the TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are genes and transcription factors known to play an essential role in pancreatic development. We have determined specific groups of microRNAs in human fetal pancreas whose expression changes over the course of development. A negative correlation analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other. This study points to a potential two-way level of combinatorial control of gene expression: microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs.
This study may further the understanding of gene expression regulation in the human developing pancreas.
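The correlation step can be illustrated with a minimal sketch: compute Pearson's r between one microRNA profile and each predicted target's mRNA profile across the same samples, and keep targets at or below a negative cut-off. The -0.7 threshold, the function names and the toy profiles are hypothetical; the study's own thresholds and its target-prediction inputs (TargetScan 5.1, miRBase) are not reproduced here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def anticorrelated_targets(mirna_profile, target_profiles, threshold=-0.7):
    """Return predicted targets whose mRNA profile correlates negatively with the microRNA.

    threshold is a hypothetical cut-off, not a value from the study.
    """
    return sorted(
        name
        for name, profile in target_profiles.items()
        if pearson(mirna_profile, profile) <= threshold
    )


# Toy data: one microRNA rising over five time points and three candidate targets.
mir = [1.0, 2.0, 3.0, 4.0, 5.0]
targets = {
    "geneA": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the microRNA rises
    "geneB": [1.0, 2.0, 3.0, 4.0, 5.0],  # rises with it
    "geneC": [2.0, 1.0, 2.0, 1.0, 2.0],  # no linear trend
}
print(anticorrelated_targets(mir, targets))  # ['geneA']
```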
MicroRNA signature of the human developing pancreas

PubMed Central

2010-01-01

Background MicroRNAs are non-coding RNAs that regulate gene expression, including differentiation and development, by either inhibiting translation or inducing target degradation. The aim of this study is to determine the microRNA expression signature during human pancreatic development and to identify potential microRNA gene targets by calculating correlations between the signature microRNAs and their corresponding mRNA targets, predicted by bioinformatics, in a genome-wide RNA microarray study. Results The microRNA signature of human fetal pancreatic samples at 10-22 weeks of gestational age (wga) was obtained by PCR-based high-throughput screening with TaqMan Low Density Arrays. This method led to the identification of 212 microRNAs. The microRNAs were classified into 3 groups: Group I contains 4 microRNAs with an increasing profile; Group II, 35 microRNAs with a decreasing profile; and Group III, 173 microRNAs whose expression remains unchanged. We calculated Pearson correlations between the expression profiles of microRNAs and target mRNAs, predicted by the TargetScan 5.1 and miRBase algorithms, using genome-wide mRNA expression data. Group I correlated with the decreasing expression of 142 target mRNAs and Group II with the increasing expression of 876 target mRNAs. Most microRNAs correlate with multiple targets, just as mRNAs are targeted by multiple microRNAs. Among the identified targets are genes and transcription factors known to play an essential role in pancreatic development. Conclusions We have determined specific groups of microRNAs in human fetal pancreas whose expression changes over the course of development. A negative correlation analysis suggests an intertwined network of microRNAs and mRNAs collaborating with each other.
This study points to a potential two-way level of combinatorial control of gene expression: microRNAs targeting multiple mRNAs and, conversely, target mRNAs regulated in parallel by other microRNAs. This study may further the understanding of gene expression regulation in the human developing pancreas. PMID:20860821
Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Altemus, M.; Murphy, D.L.; Greenberg, B.

1996-07-26

Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a role for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.
[Convergent origin of repeats in genes coding for globular proteins. An analysis of the factors determining the presence of inverted and symmetrical repeats].

PubMed

Solov'ev, V V; Kel', A E; Kolchanov, N A

1989-01-01

The factors determining the presence of inverted and symmetrical repeats in genes coding for globular proteins have been analysed. The analysis of symmetrical repeats revealed an interesting property of the genetic code: pairs of symmetrical codons correspond to pairs of amino acids with mostly similar physicochemical parameters. This property may explain why symmetrical repeats and palindromes are present only in genes coding for beta-structural polypeptides, where amino acids with similar physicochemical properties occupy symmetrical positions. A stochastic model of the evolution of polynucleotide sequences was used to analyse inverted repeats. The modelling demonstrated that a restriction on sequence composition alone (uneven frequencies of codon usage) is sufficient to give rise to nonrandom inverted repeats in genes.
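As an illustration of what an inverted repeat is operationally, a minimal sketch that lists pairs of non-overlapping k-mers in which the second is the reverse complement of the first. The function names and the fixed k are assumptions for illustration; this is not the method or the stochastic model used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq, k):
    """Return (i, j) pairs where seq[j:j+k] is the reverse complement of seq[i:i+k].

    Only non-overlapping pairs with j >= i + k are reported.
    """
    seq = seq.upper()
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)
    hits = []
    for i in range(len(seq) - k + 1):
        for j in positions.get(reverse_complement(seq[i:i + k]), []):
            if j >= i + k:
                hits.append((i, j))
    return hits


print(inverted_repeats("AAGCTTTTGCTT", 4))  # [(0, 8)]
```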
Coding and decoding for code division multiple user communication systems

NASA Technical Reports Server (NTRS)

Healy, T. J.

1985-01-01

A new algorithm is introduced which decodes code division multiple user communication signals. The algorithm makes use of the distinctive form or pattern of each signal to separate it from the composite signal created by the multiple users. Although the algorithm is presented in terms of frequency-hopped signals, the actual transmitter modulator can use any of the existing digital modulation techniques. The algorithm is applicable to error-free codes or to codes where controlled interference is permitted. It can be used when block synchronization is assumed, and in some cases when it is not. The paper also discusses briefly some of the codes which can be used in connection with the algorithm, and relates the algorithm to past studies which use other approaches to the same problem.
Detection of Pathways Affected by Positive Selection in Primate Lineages Ancestral to Humans

PubMed Central

Moretti, S.; Davydov, I.I.; Excoffier, L.

2017-01-01

Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim to detect biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution. PMID:28333345
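The idea of scoring a whole pathway, rather than only outlier genes, can be sketched with a simple sum statistic and a permutation test against random gene sets of the same size. This is a generic stand-in, not the enrichment test used in the study; the per-gene scores (for example, statistics derived from branch-site tests), the function name and the permutation count are assumptions.

```python
import random


def gene_set_pvalue(scores, gene_set, n_perm=1000, seed=0):
    """Empirical p-value for the summed per-gene score of a gene set.

    scores: dict mapping gene name to a per-gene selection score (higher = stronger).
    The observed sum is compared with sums of random gene sets of the same size.
    """
    members = [g for g in gene_set if g in scores]
    observed = sum(scores[g] for g in members)
    genes = list(scores)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        if sum(scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


scores = {f"g{i}": float(i) for i in range(100)}
top_set = {"g95", "g96", "g97", "g98", "g99"}
print(gene_set_pvalue(scores, top_set))
```

Because every gene contributes to the sum, a pathway can reach significance through many modest scores even when none of its genes is an outlier on its own.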
Decoding the ubiquitous role of microRNAs in neurogenesis.

PubMed

Nampoothiri, Sreekala S; Rajanikant, G K

2017-04-01

Neurogenesis generates fledgling neurons that mature to form an intricate neuronal circuitry. The controversy over adult neurogenesis was largely resolved in the past decade, and the field has become one of the most extensively explored domains for identifying multifaceted mechanisms bridging neurodevelopment and neuropathology. Neurogenesis encompasses multiple processes, including neural stem cell proliferation, neuronal differentiation, and cell fate determination. Each neurogenic process is specifically governed by manifold signaling pathways, several growth factors, and coding and non-coding RNAs. A class of small non-coding RNAs, microRNAs (miRNAs), is ubiquitously expressed in the brain and has emerged as a potent regulator of neurogenesis. miRNAs function by fine-tuning the expression of specific neurogenic gene targets at the post-transcriptional level and modulate the development of mature neurons from neural progenitor cells. Besides the commonly discussed intrinsic factors, neuronal morphogenesis is also under the control of several extrinsic temporal cues, which in turn are regulated by miRNAs. This review sheds light on the Dicer-controlled switch from neurogenesis to gliogenesis, miRNA regulation of neuronal maturation, and the differential expression of miRNAs in response to various extrinsic cues affecting neurogenesis.
Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae.

PubMed

Stotz, Henrik U; Harvey, Pascoe J; Haddadi, Parham; Mashanova, Alla; Kukol, Andreas; Larkan, Nicholas J; Borhan, M Hossein; Fitt, Bruce D L

2018-01-01

Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for the presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens. These findings demonstrate the involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and (ii) a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans.
A genome-wide survey of maternal and embryonic transcripts during Xenopus tropicalis development.

PubMed

Paranjpe, Sarita S; Jacobi, Ulrike G; van Heeringen, Simon J; Veenstra, Gert Jan C

2013-11-06

Dynamics of polyadenylation vs. deadenylation determine the fate of several developmentally regulated genes. Decay of a subset of maternal mRNAs and new transcription define the maternal-to-zygotic transition, but the full complement of polyadenylated and deadenylated coding and non-coding transcripts has not yet been assessed in Xenopus embryos. To analyze the dynamics and diversity of coding and non-coding transcripts during development, both polyadenylated mRNA and ribosomal RNA-depleted total RNA were harvested across six developmental stages and subjected to high-throughput sequencing. The maternally loaded transcriptome is highly diverse and consists of both polyadenylated and deadenylated transcripts. Many maternal genes show peak expression in the oocyte and include genes which are known to be key regulators of events like oocyte maturation and fertilization. Of all the transcripts that increase in abundance between early blastula and larval stages, about 30% of the embryonic genes are induced by fourfold or more by the late blastula stage and another 35% by late gastrulation. Using a gene model validation and discovery pipeline, we identified novel transcripts and putative long non-coding RNAs (lncRNAs). These lncRNA transcripts were stringently selected as spliced transcripts generated from independent promoters, with limited coding potential and a codon bias characteristic of noncoding sequences. Many lncRNAs are conserved and expressed in a developmental stage-specific fashion. These data reveal the dynamics of transcriptome polyadenylation and abundance and provide a high-confidence catalogue of novel transcripts and long non-coding RNAs.
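One common proxy for limited coding potential is the length of the longest open reading frame. A minimal sketch that scans the three forward frames for ATG-to-stop ORFs and applies a length cut-off follows; the 200-nt minimum transcript length and the 100-codon ORF ceiling are assumed, illustrative thresholds, only the sense strand is scanned, and the study's actual criteria (including its codon-bias test) are not reproduced.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Length in codons of the longest ATG-initiated ORF in the three forward frames.

    The stop codon is not counted; ORFs that run off the end without a stop are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def looks_noncoding(seq, min_length=200, max_orf_codons=100):
    """Hypothetical filter: long enough to count as 'long', but without a long ORF."""
    return len(seq) >= min_length and longest_orf_codons(seq) <= max_orf_codons


print(longest_orf_codons("CCATGAAATAGCC"))  # 2
```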
Clinical potential of oligonucleotide-based therapeutics in the respiratory system.

PubMed

Moschos, Sterghios A; Usher, Louise; Lindsay, Mark A

2017-01-01

The discovery of an ever-expanding plethora of coding and non-coding RNAs with nodal and causal roles in the regulation of lung physiology and disease is reinvigorating interest in the clinical utility of the oligonucleotide therapeutic class. This is strongly supported through recent advances in nucleic acids chemistry, synthetic oligonucleotide delivery and viral gene therapy that have succeeded in bringing to market at least three nucleic acid-based drugs. As a consequence, multiple new candidates such as RNA interference modulators, antisense, and splice switching compounds are now progressing through clinical evaluation. Here, manipulation of RNA for the treatment of lung disease is explored, with emphasis on robust pharmacological evidence aligned to the five pillars of drug development: exposure to the appropriate tissue, binding to the desired molecular target, evidence of the expected mode of action, activity in the relevant patient population and commercially viable value proposition. Copyright © 2016 Elsevier Inc. All rights reserved.
Mapping to Irregular Torus Topologies and Other Techniques for Petascale Biomolecular Simulation

PubMed Central

Phillips, James C.; Sun, Yanhua; Jain, Nikhil; Bohm, Eric J.; Kalé, Laxmikant V.

2014-01-01

Currently deployed petascale supercomputers typically use toroidal network topologies in three or more dimensions. While these networks perform well for topology-agnostic codes on a few thousand nodes, leadership machines with 20,000 nodes require topology awareness to avoid network contention for communication-intensive codes. Topology adaptation is complicated by irregular node allocation shapes and holes due to dedicated input/output nodes or hardware failure. In the context of the popular molecular dynamics program NAMD, we present methods for mapping a periodic 3-D grid of fixed-size spatial decomposition domains to 3-D Cray Gemini and 5-D IBM Blue Gene/Q toroidal networks to enable hundred-million atom full machine simulations, and to similarly partition node allocations into compact domains for smaller simulations using multiple-copy algorithms. Additional enabling techniques are discussed and performance is reported for NCSA Blue Waters, ORNL Titan, ANL Mira, TACC Stampede, and NERSC Edison. PMID:25594075
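The communication cost that topology-aware mapping tries to reduce can be made concrete with a hop-count sketch: on a torus, the distance along each dimension is the shorter way around the ring, and a mapping of a periodic 3-D grid of domains onto nodes can be scored by the mean hop count between nodes hosting neighbouring domains. This is an illustrative metric under simplifying assumptions (one domain per node, face neighbours only, uniform link cost); it is not NAMD's mapping algorithm, and all names are hypothetical.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between node coordinates a and b on a torus of size dims."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))


def mean_neighbor_hops(mapping, grid, dims):
    """Mean torus hops between nodes hosting face-adjacent cells of a periodic grid.

    mapping: dict from grid cell (tuple) to torus node coordinate (tuple).
    """
    total = pairs = 0
    for cell in product(*(range(g) for g in grid)):
        for axis, size in enumerate(grid):
            neighbor = list(cell)
            neighbor[axis] = (neighbor[axis] + 1) % size  # periodic boundary
            total += torus_hops(mapping[cell], mapping[tuple(neighbor)], dims)
            pairs += 1
    return total / pairs


grid = dims = (8, 8, 8)
identity = {c: c for c in product(range(8), repeat=3)}
strided = {c: tuple((3 * v) % 8 for v in c) for c in identity}
print(mean_neighbor_hops(identity, grid, dims))  # 1.0
print(mean_neighbor_hops(strided, grid, dims))   # 3.0
```

The matched-shape identity mapping keeps every neighbour one hop away, while the strided mapping triples the average; irregular allocations with holes make such a matched shape unavailable, which is what motivates the techniques described above.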
De Novo Coding Variants Are Strongly Associated with Tourette Disorder.

PubMed

Willsey, A Jeremy; Fernandez, Thomas V; Yu, Dongmei; King, Robert A; Dietrich, Andrea; Xing, Jinchuan; Sanders, Stephan J; Mandell, Jeffrey D; Huang, Alden Y; Richer, Petra; Smith, Louw; Dong, Shan; Samocha, Kaitlin E; Neale, Benjamin M; Coppola, Giovanni; Mathews, Carol A; Tischfield, Jay A; Scharf, Jeremiah M; State, Matthew W; Heiman, Gary A

2017-05-03

Whole-exome sequencing (WES) and de novo variant detection have proven a powerful approach to gene discovery in complex neurodevelopmental disorders. We have completed WES of 325 Tourette disorder trios from the Tourette International Collaborative Genetics cohort and a replication sample of 186 trios from the Tourette Syndrome Association International Consortium on Genetics (511 total). We observe strong and consistent evidence for the contribution of de novo likely gene-disrupting (LGD) variants (rate ratio [RR] 2.32, p = 0.002). Additionally, de novo damaging variants (LGD and probably damaging missense) are overrepresented in probands (RR 1.37, p = 0.003). We identify four likely risk genes with multiple de novo damaging variants in unrelated probands: WWC1 (WW and C2 domain containing 1), CELSR3 (Cadherin EGF LAG seven-pass G-type receptor 3), NIPBL (Nipped-B-like), and FN1 (fibronectin 1). Overall, we estimate that de novo damaging variants in approximately 400 genes contribute risk in 12% of clinical cases. Copyright © 2017 Elsevier Inc. All rights reserved.
Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

PubMed Central

Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

2015-01-01

The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genomes, with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) offer three new kinds of representation and a more intuitive menu. These new developments have been implemented for the Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326
Regulation of Global Transcription in Escherichia coli by Rsd and 6S RNA

PubMed Central

Lal, Avantika; Krishna, Sandeep; Seshasayee, Aswin Sai Narain

2018-01-01

In Escherichia coli, the sigma factor σ70 directs RNA polymerase to transcribe growth-related genes, while σ38 directs transcription of stress response genes during stationary phase. Two molecules hypothesized to regulate RNA polymerase are the protein Rsd, which binds to σ70, and the non-coding 6S RNA which binds to the RNA polymerase-σ70 holoenzyme. Despite multiple studies, the functions of Rsd and 6S RNA remain controversial. Here we use RNA-Seq in five phases of growth to elucidate their function on a genome-wide scale. We show that Rsd and 6S RNA facilitate σ38 activity throughout bacterial growth, while 6S RNA also regulates widely different genes depending upon growth phase. We discover novel interactions between 6S RNA and Rsd and show widespread expression changes in a strain lacking both regulators. Finally, we present a mathematical model of transcription which highlights the crosstalk between Rsd and 6S RNA as a crucial factor in controlling sigma factor competition and global gene expression. PMID:29686109
Regulation of Global Transcription in Escherichia coli by Rsd and 6S RNA.

PubMed

Lal, Avantika; Krishna, Sandeep; Seshasayee, Aswin Sai Narain

2018-05-31

In Escherichia coli, the sigma factor σ70 directs RNA polymerase to transcribe growth-related genes, while σ38 directs transcription of stress response genes during stationary phase. Two molecules hypothesized to regulate RNA polymerase are the protein Rsd, which binds to σ70, and the non-coding 6S RNA which binds to the RNA polymerase-σ70 holoenzyme. Despite multiple studies, the functions of Rsd and 6S RNA remain controversial. Here we use RNA-Seq in five phases of growth to elucidate their function on a genome-wide scale. We show that Rsd and 6S RNA facilitate σ38 activity throughout bacterial growth, while 6S RNA also regulates widely different genes depending upon growth phase. We discover novel interactions between 6S RNA and Rsd and show widespread expression changes in a strain lacking both regulators. Finally, we present a mathematical model of transcription which highlights the crosstalk between Rsd and 6S RNA as a crucial factor in controlling sigma factor competition and global gene expression. Copyright © 2018 Lal et al.
Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources.

PubMed

Liu, Zhongyang; Guo, Feifei; Gu, Jiangyong; Wang, Yong; Li, Yang; Wang, Dan; Lu, Liang; Li, Dong; He, Fuchu

2015-06-01

The Anatomical Therapeutic Chemical (ATC) classification system, widely applied in almost all drug utilization studies, is currently the most widely recognized classification system for drugs. Currently, new drug entries are added to the system only on users' requests, which leaves the system's drug coverage seriously incomplete; bioinformatics prediction can help in this process. Here we propose a novel prediction model of drug-ATC code associations, using logistic regression to integrate multiple heterogeneous data sources including chemical structures, target proteins, gene expression, side-effects and chemical-chemical associations. The model performs well in predicting not only ATC codes of unclassified drugs but also new ATC codes of classified drugs, as assessed by cross-validation and independent test sets, and its efficacy exceeds that of previous methods. To facilitate its use, the model has been developed into a user-friendly web service, SPACE (Similarity-based Predictor of ATC CodE), which, for each submitted compound, gives candidate ATC codes (ranked by decreasing probability_score predicted by the model) together with corresponding supporting evidence. This work not only contributes to knowledge of drugs' therapeutic, pharmacological and chemical properties, but also provides clues for drug repositioning and side-effect discovery. In addition, the construction of the prediction model provides a general framework for similarity-based data integration that is suitable for other drug-related studies, such as target and side-effect prediction. The web service SPACE is available at http://www.bprc.ac.cn/space. © The Author 2015. Published by Oxford University Press. All rights reserved.
Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

PubMed

Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

2004-07-14

With ever-increasing numbers of closely related virus genomes being sequenced, it has become desirable to compare two genomes at a level more detailed than gene content, because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes. It can list each predicted gene with nucleotide differences, show whether variations occur within promoter regions or coding regions, and show whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes, or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
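The sketch below reproduces, for a single gene, the kind of per-gene report described above: which nucleotide differs, and whether the change alters the encoded amino acid. It is a simplified Python illustration, not Base-By-Base's Java code. It assumes two gap-free, equal-length coding sequences that are already aligned and in frame, and it uses the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def compare_cds(ref, alt):
    """List nucleotide differences between two aligned, in-frame coding sequences."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("expected equal-length sequences whose length is a multiple of 3")
    differences = []
    for pos, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        start = pos - pos % 3  # first base of the codon containing this position
        ref_aa = CODON_TABLE[ref[start:start + 3]]
        alt_aa = CODON_TABLE[alt[start:start + 3]]
        effect = "synonymous" if ref_aa == alt_aa else "amino acid substitution"
        # (1-based position, ref base, alt base, ref amino acid, alt amino acid, effect)
        differences.append((pos + 1, r, a, ref_aa, alt_aa, effect))
    return differences

for row in compare_cds("ATGGCTAAA", "ATGGCCGAA"):
    print(row)
```

In this toy pair, GCT to GCC still encodes alanine, while AAA to GAA changes lysine to glutamate. A real tool must also handle gaps, multiple genomes and promoter annotation.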
Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes

PubMed Central

2009-01-01

Background One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, PCR assay design focused on individual assay parameters rather than on matching conditions across a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required multiple thermal cyclers or multiple passes in a single thermal cycler, making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence of genes known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer-aided design tools. If true, this would allow automation and optimization of the mutation detection process, resulting in reduced cost and increased throughput. Results An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes. It has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients).
Methods and assay sequences are reported in this paper. Conclusion This automated process allows laboratories to discover DNA variations in a short time and at low cost. PMID:19835634

Automated DNA mutation detection using universal conditions direct sequencing: application to ten muscular dystrophy genes.

PubMed

Bennett, Richard R; Schneider, Hal E; Estrella, Elicia; Burgess, Stephanie; Cheng, Andrew S; Barrett, Caitlin; Lip, Va; Lai, Poh San; Shen, Yiping; Wu, Bai-Lin; Darras, Basil T; Beggs, Alan H; Kunkel, Louis M

2009-10-18

One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, PCR assay design focused on individual assay parameters rather than on matching conditions across a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required multiple thermal cyclers or multiple passes in a single thermal cycler, making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence of genes known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer-aided design tools. If true, this would allow automation and optimization of the mutation detection process, resulting in reduced cost and increased throughput. An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes. It has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other patients).
Methods and assay sequences are reported in this paper. This automated process allows laboratories to discover DNA variations in a short time and at low cost.
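One simple way to see why per-assay primer design leads to mismatched cycling conditions is to estimate each primer's melting temperature and check whether a set of primers fits a single window. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough approximation for short oligonucleotides. The paper's own design tools and criteria are not described here, so the rule, the hypothetical primers and the window width are illustrative assumptions.

```python
def wallace_tm(primer):
    """Approximate melting temperature (degrees C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def fits_common_condition(primers, window=5):
    """True if all primer Tm estimates fall within `window` degrees of each other."""
    tms = [wallace_tm(p) for p in primers]
    return max(tms) - min(tms) <= window

# Hypothetical primers
p1 = "AGCTAGCTAGCTAGCTAGCT"  # 10 A/T, 10 G/C
p2 = "AAAATTTTGGGGCCCCAATT"  # 12 A/T, 8 G/C
p3 = "GGGGCCCCGGGGCCCCAGCT"  # 2 A/T, 18 G/C
print(fits_common_condition([p1, p2]), fits_common_condition([p1, p2, p3]))
```

A universal-conditions design would constrain primer selection up front so that every assay in the panel passes a check like this, instead of adjusting each assay separately.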
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

PubMed

Shen, Jinhui; Cong, Qian; Grishin, Nick V

2015-09-01

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol for obtaining mitogenomes from Next Generation Sequencing reads generated for whole-genome sequencing, and we report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that, among species with available mitogenomes, Papilio glaucus is closest to Papilio (Agehana) maraho from Asia.
Categorical Variables in Multiple Regression: Some Cautions.

ERIC Educational Resources Information Center

O'Grady, Kevin E.; Medoff, Deborah R.

1988-01-01

Limitations of dummy coding and nonsense coding as methods of coding categorical variables for use as predictors in multiple regression analysis are discussed. The combination of these approaches often yields estimates and tests of significance that are not intended by researchers for inclusion in their models.
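The sketch below makes dummy coding concrete. It codes a three-level categorical predictor into k - 1 = 2 indicator columns against a reference level and fits ordinary least squares with NumPy. With this coding, the intercept estimates the reference-group mean and each slope estimates that group's mean difference from the reference. The data are invented, and the sketch does not reproduce the article's specific cautions about combining coding schemes.

```python
import numpy as np

def dummy_code(labels, reference):
    """Build k - 1 indicator (0/1) columns, omitting the reference level."""
    levels = sorted(set(labels))
    others = [lv for lv in levels if lv != reference]
    columns = np.array([[1.0 if lab == lv else 0.0 for lv in others] for lab in labels])
    return columns, others

# Invented data: three groups with means 2, 5 and 11
groups = ["a", "a", "b", "b", "c", "c"]
y = np.array([1.0, 3.0, 4.0, 6.0, 10.0, 12.0])

D, names = dummy_code(groups, reference="a")
X = np.column_stack([np.ones(len(y)), D])     # intercept + indicator columns
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
print(dict(zip(["intercept"] + names, coef.round(6))))
```

Replacing the indicators with arbitrary numeric codes (for example 1, 2, 3) would instead force a single slope across an ordering the categories do not have. That is one reason the choice of coding changes what the estimated coefficients mean.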
Lassa-Vesicular Stomatitis Chimeric Virus Safely Destroys Brain Tumors

PubMed Central

Wollmann, Guido; Drokhlyansky, Eugene; Davis, John N.; Cepko, Connie

2015-01-01

ABSTRACT High-grade tumors in the brain are among the deadliest of cancers. Here, we took a promising oncolytic virus, vesicular stomatitis virus (VSV), and tested the hypothesis that the neurotoxicity associated with the virus could be eliminated without blocking its oncolytic potential in the brain by replacing the neurotropic VSV glycoprotein with the glycoprotein from one of five different viruses, including Ebola virus, Marburg virus, lymphocytic choriomeningitis virus (LCMV), rabies virus, and Lassa virus. Based on in vitro infections of normal and tumor cells, we selected two viruses to test in vivo. Wild-type VSV was lethal when injected directly into the brain. In contrast, a novel chimeric virus (VSV-LASV-GPC) containing genes from both the Lassa virus glycoprotein precursor (GPC) and VSV showed no adverse actions within or outside the brain and targeted and completely destroyed brain cancer, including high-grade glioblastoma and melanoma, even in metastatic cancer models. When mice had two brain tumors, intratumoral VSV-LASV-GPC injection in one tumor (glioma or melanoma) led to complete tumor destruction; importantly, the virus moved contralaterally within the brain to selectively infect the second noninjected tumor. A chimeric virus combining VSV genes with the gene coding for the Ebola virus glycoprotein was safe in the brain and also selectively targeted brain tumors but was substantially less effective in destroying brain tumors and prolonging survival of tumor-bearing mice. A tropism for multiple cancer types combined with an exquisite tumor specificity opens a new door to widespread application of VSV-LASV-GPC as a safe and efficacious oncolytic chimeric virus within the brain. IMPORTANCE Many viruses have been tested for their ability to target and kill cancer cells. Vesicular stomatitis virus (VSV) has shown substantial promise, but a key problem is that if it enters the brain, it can generate adverse neurologic consequences, including death. 
We tested a series of chimeric viruses containing VSV genes together with a gene coding for the glycoprotein of another virus (Ebola virus, Lassa virus, LCMV, rabies virus, or Marburg virus), which was substituted for the VSV glycoprotein gene. Ebola and Lassa chimeric viruses were safe in the brain and targeted brain tumors. Lassa-VSV was particularly effective, showed no adverse side effects even when injected directly into the brain, and targeted and destroyed two different types of deadly brain cancer, glioblastoma and melanoma. PMID:25878115
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella. Its adequacy in representing the mitochondrial genome as a whole remains unclear, however, because the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis and Trichinella murrelli. T. spiralis poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs. T. murrelli is the most prevalent species in North American wildlife hosts but poses relatively little risk to the safety of pork. Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region (13,917 bp) of the mitochondrial genome of T. murrelli was sequenced, along with a portion (1524 bp) of the highly repetitive non-coding region, with a combined average read depth of 250 reads. The accuracy of base calling, estimated from the coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene was compared with the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means of obtaining previously unknown mitochondrial genome sequence.
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
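The abstract does not state which divergence measure was used. As an illustration only, the sketch below computes the uncorrected p-distance, the proportion of differing sites among aligned, ungapped positions. It applies the measure both per gene and across the concatenated coding region. The sequences and gene names are toy placeholders, not T. spiralis or T. murrelli data.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared

# Toy aligned gene sequences for two species (hypothetical names and data)
species_1 = {"gene_x": "ACGTACGTAC", "gene_y": "AC-TGGCA"}
species_2 = {"gene_x": "ACGTTCGTAA", "gene_y": "ACGAGGCA"}

per_gene = {g: p_distance(species_1[g], species_2[g]) for g in species_1}
overall = p_distance("".join(species_1.values()), "".join(species_2.values()))
print(per_gene, overall)
```

Comparing each per-gene value with the overall value mirrors the abstract's comparison of gene-level variation against the coding-region average. Published analyses often apply a substitution-model correction (for example Jukes-Cantor) rather than the raw proportion.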
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks

PubMed Central

Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.

2012-01-01

While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription used a custom lncRNA microarray. It identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNase P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
Novel insights into the response of Atlantic salmon (Salmo salar) to Piscirickettsia salmonis: Interplay of coding genes and lncRNAs during bacterial infection.

PubMed

Valenzuela-Miranda, Diego; Gallardo-Escárate, Cristian

2016-12-01

Despite the high prevalence of the intracellular bacterium Piscirickettsia salmonis and its impact on Chilean salmon aquaculture, the molecular underpinnings of host-pathogen interactions remain unclear. Herein, the interplay of coding and non-coding transcripts has been proposed as a key mechanism involved in the immune response. Therefore, the aim of this study was to determine how coding and non-coding transcripts are modulated during the infection of Atlantic salmon with P. salmonis. For this, RNA-seq was conducted on brain, spleen, and head kidney samples, revealing different transcriptional profiles according to bacterial load. Additionally, while most of the regulated genes were annotated to diverse biological processes during infection, a common response associated with clathrin-mediated endocytosis and iron homeostasis was present in all tissues. Interestingly, while endocytosis-promoting factors and clathrin inductions were upregulated, endocytic receptors were mainly downregulated. Furthermore, the regulation of genes related to iron homeostasis suggested an intracellular accumulation of iron, a process in which heme biosynthesis/degradation pathways might play an important role. Regarding the non-coding response, 918 putative long non-coding RNAs were identified, of which 425 were newly characterized for S. salar. Finally, co-localization and co-expression analyses revealed a strong correlation between the modulation of long non-coding RNAs and that of genes associated with endocytosis and iron homeostasis. These results represent the first comprehensive study of putative interplaying mechanisms of coding and non-coding RNAs during bacterial infection in salmonids. Copyright © 2016 Elsevier Ltd. All rights reserved.
Caste- and development-associated gene expression in a lower termite

PubMed Central

Scharf, Michael E; Wu-Scharf, Dancia; Pittendrigh, Barry R; Bennett, Gary W

2003-01-01

Background Social insects such as termites express dramatic polyphenism (the occurrence of multiple forms in a species on the basis of differential gene expression) both in association with caste differentiation and between castes after differentiation. We have used cDNA macroarrays to compare gene expression between polyphenic castes and intermediary developmental stages of the termite Reticulitermes flavipes. Results We identified differentially expressed genes from nine ontogenic categories. Quantitative PCR was used to quantify precise differences in gene expression between castes and between intermediary developmental stages. We found worker and nymph-biased expression of transcripts encoding termite and endosymbiont cellulases; presoldier-biased expression of transcripts encoding the storage/hormone-binding protein vitellogenin; and soldier-biased expression of gene transcripts encoding two transcription/translation factors, two signal transduction factors and four cytoskeletal/muscle proteins. The two transcription/translation factors showed significant homology to the bicaudal and bric-a-brac developmental genes of Drosophila. Conclusions Our results show differential expression of regulatory, structural and enzyme-coding genes in association with termite castes and their developmental precursor stages. They also provide the first glimpse into how insect endosymbiont cellulase gene expression can vary in association with the caste of a host. These findings shed light on molecular processes associated with termite biology, polyphenism, caste differentiation and development and highlight potentially interesting variations in developmental themes between termites, other insects, and higher animals. PMID:14519197
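The abstract does not say how expression differences were calculated from the quantitative PCR data. A common approach, shown here only as an assumed example, is the comparative Ct (2^-ΔΔCt) method. It normalizes a target transcript to a reference gene and then to a calibrator sample (for instance, one caste relative to another). It also assumes close to 100% amplification efficiency. The Ct values are invented.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the comparative Ct (2^-ddCt) method, assuming ~100% PCR efficiency."""
    delta_sample = ct_target_sample - ct_ref_sample              # normalize to reference gene
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator  # same for calibrator sample
    ddct = delta_sample - delta_calibrator                       # compare sample with calibrator
    return 2 ** (-ddct)

# Hypothetical Ct values: a transcript in soldiers (sample) vs. workers (calibrator)
print(fold_change_ddct(22.0, 18.0, 25.0, 18.0))
```

Here the target crosses threshold three cycles earlier in the sample after normalization (ΔΔCt = -3), which corresponds to 2^3 = 8-fold higher relative expression under the efficiency assumption.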
A novel all-optical label processing for OPS networks based on multiple OOC sequences from multiple-groups OOC

NASA Astrophysics Data System (ADS)

Qiu, Kun; Zhang, Chongfu; Ling, Yun; Wang, Yibo

2007-11-01

This paper proposes an all-optical label processing scheme using multiple optical orthogonal code sequences (MOOCS) for optical packet switching (OPS) networks (MOOCS-OPS), for the first time to the best of our knowledge. In this scheme, multiple optical orthogonal codes (MOOC) from multiple-group optical orthogonal codes (MGOOC) are permuted and combined to obtain the MOOCS for the optical labels. This effectively enlarges the number of optical codes available for optical labels. Optical label processing (OLP) schemes are reviewed and analyzed, and the principles of MOOCS-based optical labels for OPS networks are presented and analyzed. The MOOCS-OPS topology and the key units for realizing MOOCS-based optical label packets are then studied in detail. The performance of this all-optical label processing technology is analyzed, and corresponding simulations are performed. The analysis and results show that the proposed scheme can overcome the shortage of optical orthogonal code (OOC)-based optical labels. This shortage is caused by the limited number of single OOCs available at short code lengths. The results also indicate that the MOOCS-OPS scheme is feasible.
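An (n, w, 1) optical orthogonal code is a set of length-n binary codewords of weight w. Its periodic autocorrelation sidelobes and pairwise cross-correlations never exceed 1. The sketch below checks those conditions for codewords given as pulse positions and evaluates the Johnson bound, floor((n - 1) / (w(w - 1))), on how many such codewords can exist. It illustrates why short codes supply few labels, which motivates combining multiple OOCs into sequences. The example codewords are small constructions chosen for illustration, not the codes used in the paper.

```python
def cyclic_overlap(a, b, n, shift):
    """Number of pulse positions shared by codeword a and codeword b cyclically shifted by `shift`."""
    shifted = {(p + shift) % n for p in b}
    return len(set(a) & shifted)

def is_ooc(codewords, n, lam=1):
    """Check the (n, w, lam) OOC auto- and cross-correlation constraints."""
    for i, a in enumerate(codewords):
        # Autocorrelation sidelobes: every nonzero shift of a codeword against itself.
        if any(cyclic_overlap(a, a, n, s) > lam for s in range(1, n)):
            return False
        # Cross-correlation: every shift of every later codeword against this one.
        for b in codewords[i + 1:]:
            if any(cyclic_overlap(a, b, n, s) > lam for s in range(n)):
                return False
    return True

def johnson_bound(n, w):
    """Upper bound on the number of codewords in an (n, w, 1) OOC."""
    return (n - 1) // (w * (w - 1))

code_13 = [{0, 1, 4}, {0, 2, 7}]  # two weight-3 codewords of length 13
print(is_ooc(code_13, 13), johnson_bound(13, 3), johnson_bound(7, 3))
```

At length 7 and weight 3 the bound allows only one codeword, and at length 13 only two. A label space built from single short OOCs is therefore very small, which is the limitation the MOOCS approach targets.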
Comparisons between Arabidopsis thaliana and Drosophila melanogaster in relation to Coding and Noncoding Sequence Length and Gene Expression

PubMed Central

Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren

2015-01-01

There is continuing interest in analyzing gene architecture and gene expression to determine what relationship may exist between them. Advances in high-quality sequencing technologies and large-scale resource datasets have improved understanding of these relationships and made it possible to cross-reference expression data with large genome datasets. A negative correlation between expression level and gene (especially transcript) length is generally accepted. However, the literature contains some conflicting results concerning the effects of different gene regions, and the underlying reason is not well understood. This research applies quantile regression techniques to coding and noncoding sequence length and gene expression data in the plant Arabidopsis thaliana and the fruit fly Drosophila melanogaster. The aim is to determine whether a relationship exists and whether there are any differences or similarities between these species. The quantile regression analysis found that the correlations between coding sequence length and gene expression varied. For noncoding sequence length (5′ and 3′ UTRs), however, similarities emerged between the animal and plant species. In conclusion, the information described in this study provides the basis for further exploration of gene regulation with regard to coding and noncoding sequence length. PMID:26114098
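Quantile regression fits a conditional quantile of the response, for example the median (tau = 0.5), by minimizing the asymmetric "pinball" loss rather than squared error. For a single predictor with at least two distinct x values, some optimal line passes through two of the observations. A tiny data set can therefore be fitted exactly by checking every pair of points, as in the sketch below. This brute-force approach is only for illustration (practical tools solve a linear program), and the data are toy values, not the Arabidopsis or Drosophila measurements.

```python
from itertools import combinations

def pinball_loss(residual, tau):
    """Check (pinball) loss used in quantile regression."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def quantile_line(x, y, tau):
    """Fit y = a + b*x for quantile tau by checking lines through every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as y = a + b*x
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(pinball_loss(yk - (intercept + slope * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]

x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 100]  # the last point is an outlier
print(quantile_line(x, y, 0.5))
```

The median line follows the four well-behaved points, whereas an ordinary least-squares fit to the same data would be pulled strongly toward the outlier. Fitting several values of tau shows how a relationship differs across the distribution rather than only at its mean.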
Large-Signal Code TESLA: Current Status and Recent Development

DTIC Science & Technology

2008-04-01

K. Eppley, J. J. Petillo, "High-power four cavity S-band multiple-beam klystron design," IEEE Trans. Plasma Sci., vol. 32, pp. 1119-1135, June 2004. 4 ... advances in the development of the large-signal code TESLA, mainly used for the modeling of high-power single-beam and multiple-beam klystron ... amplifiers. Keywords: large-signal code; multiple-beam klystrons; serial and parallel versions. Introduction The optimization and design of new high power
Development of the Average Likelihood Function for Code Division Multiple Access (CDMA) Using BPSK and QPSK Symbols

DTIC Science & Technology

2015-01-01

The purpose of this research is to establish a foundation for new classification and estimation of CDMA signals. Keywords: DS/CDMA signals, BPSK, QPSK ... DEVELOPMENT OF THE AVERAGE LIKELIHOOD FUNCTION FOR CODE DIVISION MULTIPLE ACCESS (CDMA) USING BPSK AND QPSK SYMBOLS, JANUARY 2015 ... Dates covered: OCT 2013 to OCT 2014.
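The snippet gives only the report's title and keywords. As a hedged illustration of what an average likelihood function is, the sketch below treats the simplest case: real-valued samples of a single BPSK signal of amplitude A in additive white Gaussian noise. The likelihood of each sample is averaged over the two equiprobable symbols, +A and -A. The multi-user DS/CDMA and QPSK cases developed in the report are not reproduced, and the amplitude, noise level and samples are assumed values.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Normal probability density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bpsk_average_log_likelihood(samples, amplitude, sigma):
    """Sum over samples of log[0.5*p(r | +A) + 0.5*p(r | -A)] for BPSK in AWGN."""
    total = 0.0
    for r in samples:
        averaged = 0.5 * gaussian_pdf(r, amplitude, sigma) + 0.5 * gaussian_pdf(r, -amplitude, sigma)
        total += math.log(averaged)
    return total

samples = [0.9, -1.1, 1.0, -0.8]  # assumed received samples
print(bpsk_average_log_likelihood(samples, 1.0, 0.3))  # hypothesis: BPSK with A = 1
print(bpsk_average_log_likelihood(samples, 0.0, 0.3))  # hypothesis: noise only
```

Averaging over the unknown symbols removes the need to know the transmitted data. Per sample, the average simplifies to N(r; 0, σ²)·exp(-A²/2σ²)·cosh(A·r/σ²). Comparing such averaged likelihoods across competing hypotheses is the basic idea behind likelihood-based signal classification.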
Novel mutations in the STK11 gene in Thai patients with Peutz-Jeghers syndrome

PubMed Central

Ausavarat, Surasawadee; Leoyklang, Petcharat; Vejchapipat, Paisarn; Chongsrisawat, Voranush; Suphapeetiporn, Kanya; Shotelersuk, Vorasuk

2009-01-01

Peutz-Jeghers syndrome (PJS), a rare autosomal dominant inherited disorder, is characterized by hamartomatous gastrointestinal polyps and mucocutaneous pigmentation. Patients with this syndrome have a predisposition to a variety of cancers in multiple organs. Mutations in the serine/threonine kinase 11 (STK11) gene have been identified as a major cause of PJS. Here we present the clinical and molecular findings of two unrelated Thai individuals with PJS. Mutation analysis by polymerase chain reaction (PCR) sequencing of the entire coding region of STK11 revealed two potentially pathogenic mutations. One patient harbored a single nucleotide deletion (c.182delG) in exon 1, resulting in a frameshift leading to premature termination at codon 63 (p.Gly61AlafsX63). The other carried an in-frame 9-base-pair (bp) deletion in exon 7, c.907_915del9 (p.Ile303_Gln305del). Both deletions were de novo and had not been previously described. This study has expanded the genotypic spectrum of the STK11 gene. PMID:19908348
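The variant descriptions above follow coding-DNA numbering, in which c.1 is the A of the ATG start codon. Nucleotide c.n therefore lies in codon (n + 2) // 3 (integer division). A deletion also shifts the reading frame unless its length is a multiple of three. The small helper below applies those two rules and reproduces the codon numbers reported for both deletions. It is a sketch, not a full HGVS parser.

```python
def codon_number(c_pos):
    """Codon containing coding-DNA position c_pos (1-based; c.1 is the A of ATG)."""
    if c_pos < 1:
        raise ValueError("only positions within the coding sequence (c.1 onward) are handled")
    return (c_pos + 2) // 3

def deletion_effect(start, end):
    """Classify a deletion c.start_end as in-frame or frameshift and report affected codons."""
    length = end - start + 1
    kind = "in-frame" if length % 3 == 0 else "frameshift"
    return kind, codon_number(start), codon_number(end)

print(deletion_effect(182, 182))  # c.182delG
print(deletion_effect(907, 915))  # c.907_915del9
```

Because c.907 is the first base of codon 303 and c.915 the last base of codon 305, the 9-bp deletion removes exactly three whole codons (Ile303 to Gln305) without disturbing the downstream frame.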
Dcode.org anthology of comparative genomic tools.

PubMed

Loots, Gabriela G; Ovcharenko, Ivan

2005-07-01

Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include:
- two alignment tools, zPicture and Mulan;
- a phylogenetic shadowing tool, eShadow, for identifying lineage- and species-specific functional elements;
- two tools for evolutionarily conserved transcription factor analysis, rVista and multiTF;
- a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0;
- and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser.

Here, we briefly describe each of these tools and provide specific examples of their practical applications. All the tools are publicly available at http://www.dcode.org/.
CORUM: the comprehensive resource of mammalian protein complexes

PubMed Central

Ruepp, Andreas; Brauner, Barbara; Dunger-Kaltenbach, Irmtraud; Frishman, Goar; Montrone, Corinna; Stransky, Michael; Waegele, Brigitte; Schmidt, Thorsten; Doudieu, Octave Noubibou; Stümpflen, Volker; Mewes, H. Werner

2008-01-01

Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The CORUM (http://mips.gsf.de/genre/proj/corum/index.html) database is a collection of experimentally verified mammalian protein complexes. Information is manually derived by expert annotators through critical reading of the scientific literature. Information about protein complexes includes protein complex names, subunits, literature references and the function of the complexes. For functional annotation, we use the FunCat catalogue, which enables us to organize the protein complex space into biologically meaningful subsets. The database contains more than 1750 protein complexes built from 2400 different genes, thus representing 12% of the protein-coding genes in humans. A web-based system is available to query, view and download the data. CORUM provides a comprehensive dataset of protein complexes for discoveries in systems biology, analyses of protein networks and protein complex-associated diseases. Comparable to the MIPS reference dataset of protein complexes from yeast, CORUM intends to serve as a reference for mammalian protein complexes. PMID:17965090
In silico screening of the chicken genome for overlaps between genomic regions: microRNA genes, coding and non-coding transcriptional units, QTL, and genetic variations.

PubMed

Zorc, Minja; Kunej, Tanja

2016-05-01

MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between the target mRNA and the mature miRNA seed region, which is responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes. Genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI BLAST, and QTLdb. Of 740 known pre-miRNA genes, 263 (35.5%) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662 and gga-miR-6688. gga-miR-6662 contains 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs. gga-miR-6688 contains ten polymorphisms, including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome. Polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and within quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in the chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB). It also includes the first report of a missense polymorphism located within a mature miRNA seed region.
The miRNA-related genomic hotspots identified in chicken can serve researchers as a starting point for further functional studies and for association studies with poultry production and health traits. They can also serve as the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
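Seed complementarity can be illustrated with a short sketch. Take nucleotides 2-8 of the mature miRNA (a common definition of the seed), reverse-complement them, and scan a target sequence for exact matches. A single substitution in the seed, like the seed polymorphisms discussed above, changes the site being searched for. The miRNA, target and variant sequences are toy examples. Real target prediction uses more criteria than an exact 7-nt match.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed(mirna):
    """Nucleotides 2-8 of a mature miRNA (1-based), a common seed definition."""
    return mirna[1:8]

def reverse_complement(rna):
    """Reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def seed_match_sites(mirna, target):
    """0-based start positions of exact 7-nt seed-complementary sites in the target RNA."""
    site = reverse_complement(seed(mirna))
    return [i for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]

mirna = "UGAGGUAGUAGGUUGUAUAGUU"          # toy mature miRNA
utr = "AAACUACCUCAAAGCUACCUCA"            # toy target 3' UTR fragment
variant = mirna[:3] + "A" + mirna[4:]     # hypothetical G->A change at seed position 4
print(seed_match_sites(mirna, utr), seed_match_sites(variant, utr))
```

In this toy case the reference seed finds two sites, while the single-base variant finds none. This shows how a seed polymorphism can redirect which transcripts a miRNA is predicted to regulate.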
Systematic analysis of human kinase genes: a large number of genes and alternative splicing events result in functional and structural diversity

PubMed Central

Milanesi, Luciano; Petrillo, Mauro; Sepe, Leandra; Boccia, Angelo; D'Agostino, Nunzio; Passamano, Myriam; Di Nardo, Salvatore; Tasco, Gianluca; Casadio, Rita; Paolella, Giovanni

2005-01-01

Background Protein kinases are a well-defined family of proteins, characterized by the presence of a common kinase catalytic domain. They play a significant role in many important cellular processes, such as proliferation, maintenance of cell shape and apoptosis. In many members of the family, additional non-kinase domains contribute further specialization, resulting in subcellular localization, protein binding and regulation of activity, among others. About 500 genes encode members of the kinase family in the human genome. Although many of them are well-known genes, a larger number code for more recently identified proteins, or for unknown proteins identified as kinases only after computational studies. Results A systematic in silico study of the human genome led to the identification of 5 genes, on chromosomes 1, 11, 13, 15 and 16 respectively, and 1 pseudogene on chromosome X. Some of these genes are reported as kinases by NCBI but are absent from other databases, such as KinBase. Comparative analysis of 483 gene regions and subsequent computational analysis aimed at identifying unannotated exons were carried out. They indicate that a large number of kinases may code for alternatively spliced forms or be incorrectly annotated. An automated InterProScan analysis was performed to study domain distribution and combination in the various families. At the same time, other structural features were also added to the annotation process, including the putative presence of transmembrane alpha helices and the propensity of cysteines to participate in disulfide bridges. Conclusion The predicted human kinome was extended by identifying both additional genes and potential splice variants, resulting in a varied panorama where functionality may be searched at the gene and protein level.
Structural analysis of kinase protein domains as defined in multiple sources, together with transmembrane alpha helix and signal peptide prediction, provides hints for function assignment. The results of the human kinome analysis are collected in the KinWeb database, available for browsing and searching over the internet. There, all results from the comparative analysis and the gene structure annotation are made available alongside the domain information. Kinases may be searched by domain combinations, and the corresponding genes may be viewed in a graphic browser at various levels of magnification, up to gene organization on the full chromosome set. PMID:16351747
Signaling coupled epigenomic regulation of gene expression.

PubMed

Kumar, R; Deivendran, S; Santhoshkumar, T R; Pillai, M R

2017-10-26

Epigenetics (the inheritance of genomic information independent of the DNA sequence) and gene transcription are profoundly shaped by serine/threonine and tyrosine signaling kinases and by components of chromatin remodeling complexes. To respond precisely to a changing external milieu, human cells efficiently translate upstream signals into post-translational modifications (PTMs) on histones and coregulators such as corepressors, coactivators, DNA-binding factors and PTM-modifying enzymes. Because a protein with multiple residues for putative PTMs is expected to undergo more than one PTM in cells stimulated with growth factors, the outcome of combinatorial PTM codes on histones and coregulators is profoundly shaped by regulatory interplay between PTMs. The genomic functions of signaling kinases in cancer cells are manifested by the downstream effectors of cytoplasmic signaling cascades as well as by translocation of the cytoplasmic signaling kinases to the nucleus. Signaling-mediated phosphorylation of histones serves as a regulatory switch for other PTMs and connects chromatin remodeling complexes to gene transcription and gene activity. Here, we discuss recent advances in signaling-dependent epigenomic regulation of gene transcription using a few representative cancer-relevant serine/threonine and tyrosine kinases and their interplay with chromatin remodeling factors in cancer cells.
The genomic structure of the human Charcot-Leyden crystal protein gene is analogous to those of the galectin genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dyer, K.D.; Handen, J.S.; Rosenberg, H.F.

The Charcot-Leyden crystal (CLC) protein, or eosinophil lysophospholipase, is a characteristic protein of human eosinophils and basophils; recent work has demonstrated that the CLC protein is both structurally and functionally related to the galectin family of β-galactoside binding proteins. The galectins as a group share a number of features in common, including a linear ligand binding site encoded on a single exon. In this work, we demonstrate that the intron-exon structure of the gene encoding CLC is analogous to those encoding the galectins. The coding sequence of the CLC gene is divided into four exons, with the entire β-galactoside binding site encoded by exon III. We have isolated CLC β-galactoside binding sites from both orangutan (Pongo pygmaeus) and murine (Mus musculus) genomic DNAs, both encoded on single exons, and noted conservation of the amino acids shown to interact directly with the β-galactoside ligand. The most likely interpretation of these results suggests the occurrence of one or more exon duplication and insertion events, resulting in the distribution of this lectin domain to CLC as well as to the multiple galectin genes. 35 refs., 3 figs.
Incorporating interaction networks into the determination of functionally related hit genes in genomic experiments with Markov random fields

PubMed Central

Robinson, Sean; Nevalainen, Jaakko; Pinna, Guillaume; Campalans, Anna; Radicella, J. Pablo; Guyon, Laurent

2017-01-01

Motivation: Incorporating gene interaction data into the identification of ‘hit’ genes in genomic experiments is a well-established approach leveraging the ‘guilt by association’ assumption to obtain a network-based hit list of functionally related genes. We aim to develop a method that allows for multivariate gene scores and multiple hit labels in order to extend the analysis of genomic screening data within such an approach. Results: We propose a Markov random field-based method to achieve our aim and show that its particular advantages compared with the methods currently used lead to new insights in previously analysed data as well as in our own motivating data. Our method additionally achieves the best performance in an independent simulation experiment. The real data applications we consider comprise a survival analysis and differential expression experiment and a cell-based RNA interference functional screen. Availability and implementation: We provide all of the data and code related to the results in the paper. Contact: sean.j.robinson@utu.fi or laurent.guyon@cea.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28881978
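The ‘guilt by association’ idea behind network-based hit calling can be illustrated with a far simpler model than the one proposed above. The following is a minimal sketch and not the authors' method: it assumes a binary hit/non-hit labelling, a single score per gene, an Ising-style pairwise term rewarding neighbours that share a label, and iterated conditional modes (ICM) for optimization. The function name, threshold, beta and toy network are hypothetical.

```python
def icm_hit_labels(scores: dict[str, float], edges: list[tuple[str, str]],
                   threshold: float = 0.0, beta: float = 1.0,
                   max_iter: int = 50) -> dict[str, int]:
    """Label genes as hits (1) or non-hits (0) on an interaction network.

    Local energy of label `lab` for gene g (lower is better):
      unary:    -(score - threshold) if lab == 1, +(score - threshold) if lab == 0
      pairwise: -beta for every neighbour currently carrying the same label
    Every gene named in `edges` must also appear in `scores`.
    """
    neighbours: dict[str, list[str]] = {g: [] for g in scores}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Initialise from the scores alone (no network information).
    labels = {g: int(s > threshold) for g, s in scores.items()}

    def local_energy(g: str, lab: int) -> float:
        margin = scores[g] - threshold
        unary = -margin if lab == 1 else margin
        agree = sum(1 for n in neighbours[g] if labels[n] == lab)
        return unary - beta * agree

    # ICM: move each gene to its lowest-energy label until nothing changes.
    for _ in range(max_iter):
        changed = False
        for g in scores:
            best = min((0, 1), key=lambda lab: local_energy(g, lab))
            if local_energy(g, best) < local_energy(g, labels[g]):
                labels[g] = best
                changed = True
        if not changed:
            break
    return labels


toy_scores = {"a": 2.0, "b": 2.0, "c": -0.2, "d": -3.0}
toy_edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(icm_hit_labels(toy_scores, toy_edges))
```

With beta = 1.0, gene c, whose own score is slightly below the threshold, is labelled a hit because two of its three neighbours are strong hits, while d, with a strongly negative score, stays a non-hit; with beta = 0 the labels reduce to simple thresholding.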

Cinnamon extract regulates glucose transporter and insulin-signaling gene expression in mouse adipocytes.

PubMed

Cao, Heping; Graves, Donald J; Anderson, Richard A

2010-11-01

Cinnamon extracts (CE) are reported to have beneficial effects on people with normal and impaired glucose tolerance, the metabolic syndrome, type 2 diabetes, and insulin resistance. However, clinical results are controversial. Molecular characterization of CE effects is limited. This study investigated the effects of CE on gene expression in cultured mouse adipocytes. Water-soluble CE was prepared from ground cinnamon (Cinnamomum burmannii). Quantitative real-time PCR was used to investigate CE effects on the expression of genes coding for adipokines, glucose transporter (GLUT) family, and insulin-signaling components in mouse 3T3-L1 adipocytes. CE (100 μg/ml) increased GLUT1 mRNA levels 1.91±0.15, 4.39±0.78, and 6.98±2.18-fold of the control after 2-, 4-, and 16-h treatments, respectively. CE decreased the expression of further genes encoding insulin-signaling pathway proteins including GSK3B, IGF1R, IGF2R, and PIK3R1. This study indicates that CE regulates the expression of multiple genes in adipocytes and this regulation could contribute to the potential health benefits of CE. Published by Elsevier GmbH.
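Fold changes such as those reported above are often derived from real-time PCR cycle thresholds (Ct) with the comparative 2^-ΔΔCt method. The abstract does not say how the fold changes were computed, so this is a minimal sketch assuming that method, a single reference gene and roughly 100% amplification efficiency; the function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression (treated vs. control) by the comparative 2^-ΔΔCt method.

    Assumes near-100% amplification efficiency for target and reference genes.
    """
    # ΔCt: normalise the target gene to the reference gene within each condition.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # ΔΔCt: change of the normalised value relative to the control.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    # Each cycle earlier corresponds to a doubling of starting template.
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier after
# treatment while the reference gene is unchanged, i.e. a 4-fold increase.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```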
Analysis of the cytochrome c oxidase subunit II (COX2) gene in giant panda, Ailuropoda melanoleuca.

PubMed

Ling, S S; Zhu, Y; Lan, D; Li, D S; Pang, H Z; Wang, Y; Li, D Y; Wei, R P; Zhang, H M; Wang, C D; Hu, Y D

2017-01-23

The giant panda, Ailuropoda melanoleuca (Ursidae), has a unique bamboo-based diet; however, this low-energy intake has been sufficient to maintain the metabolic processes of this species since the fourth ice age. As mitochondria are the main sites for energy metabolism in animals, the protein-coding genes involved in mitochondrial respiratory chains, particularly cytochrome c oxidase subunit II (COX2), which is the rate-limiting enzyme in electron transfer, could play an important role in giant panda metabolism. Therefore, the present study aimed to isolate, sequence, and analyze the COX2 DNA from individuals kept at the Giant Panda Protection and Research Center, China, and compare these sequences with those of the other Ursidae family members. Multiple sequence alignment showed that the COX2 gene had three point mutations that defined three haplotypes, with 60% of the sequences corresponding to haplotype I. The neutrality tests revealed that the COX2 gene was conserved throughout evolution, and the maximum likelihood phylogenetic analysis, using homologous sequences from other Ursidae species, showed clustering of the COX2 sequences of giant pandas, suggesting that this gene evolved differently in them.
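Haplotype frequencies such as the 60% reported for haplotype I come from grouping identical aligned sequences, and the point mutations that define the haplotypes are the alignment's segregating sites. A minimal sketch, assuming gap-free sequences already aligned to equal length; the function names and toy sequences are hypothetical and are not the study's data.

```python
from collections import Counter


def haplotype_frequencies(aligned: list[str]) -> dict[str, float]:
    """Group identical aligned sequences into haplotypes; return relative frequencies."""
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to equal length")
    counts = Counter(aligned)
    total = len(aligned)
    # most_common() lists the most frequent haplotype first.
    return {hap: n / total for hap, n in counts.most_common()}


def segregating_sites(aligned: list[str]) -> list[int]:
    """0-based alignment columns where not all sequences carry the same base."""
    if not aligned:
        return []
    return [i for i in range(len(aligned[0]))
            if len({s[i] for s in aligned}) > 1]


toy = ["ACGTA", "ACGTA", "ACGTA", "ATGTA", "ACGTG"]
print(haplotype_frequencies(toy))
print(segregating_sites(toy))
```

In the toy alignment, two segregating sites define three haplotypes, the commonest of which is carried by three of five sequences.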
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

ERIC Educational Resources Information Center

Offner, Susan

2010-01-01

The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
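The comparison in this tutorial can be made quantitative by computing differences per site separately for exons and introns, since raw counts are confounded by region length. A minimal sketch, assuming two sequences already aligned to equal length and exon boundaries given as 0-based half-open intervals on the alignment. Every column outside the listed exons is treated as intron, so the alignment should span only the gene from the first exon to the last, and gap characters count as differences. The function name and toy sequences are hypothetical.

```python
def divergence_by_region(seq_a: str, seq_b: str,
                         exons: list[tuple[int, int]]) -> dict[str, float]:
    """Proportion of differing sites in exonic vs. intronic alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    exonic: set[int] = set()
    for start, end in exons:
        exonic.update(range(start, end))
    mismatches = {"exon": 0, "intron": 0}
    sites = {"exon": 0, "intron": 0}
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        region = "exon" if i in exonic else "intron"
        sites[region] += 1
        if a != b:
            mismatches[region] += 1
    # Per-site proportion; 0.0 when a region has no columns.
    return {r: mismatches[r] / sites[r] if sites[r] else 0.0 for r in sites}


human = "AAAACCCCGGGG"
chimp = "AAAACTCTGGGG"
print(divergence_by_region(human, chimp, [(0, 4), (8, 12)]))
```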
RRE: a tool for the extraction of non-coding regions surrounding annotated genes from genomic datasets.

PubMed

Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A

2004-11-01

RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
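The regions RRE extracts can be defined from exon and CDS coordinates alone. The following minimal sketch is not RRE's parser or interface: it assumes a plus-strand gene, sorted non-overlapping exons given as 0-based half-open intervals, and a fixed flank length for the upstream and downstream regions; the function name, parameters and toy sequence are hypothetical. Minus-strand genes would additionally need the regions swapped and reverse-complemented.

```python
def noncoding_regions(genome: str, exons: list[tuple[int, int]],
                      cds_start: int, cds_end: int, flank: int = 1000) -> dict:
    """Split a plus-strand gene into upstream, 5'-UTR, introns, 3'-UTR, downstream.

    exons: sorted, non-overlapping 0-based half-open [start, end) intervals.
    cds_start, cds_end: half-open coding interval in the same coordinates.
    """
    tx_start, tx_end = exons[0][0], exons[-1][1]

    def exonic_between(lo: int, hi: int) -> str:
        # Concatenate the exonic bases inside [lo, hi), skipping introns.
        parts = []
        for start, end in exons:
            s, e = max(start, lo), min(end, hi)
            if s < e:
                parts.append(genome[s:e])
        return "".join(parts)

    return {
        "upstream": genome[max(0, tx_start - flank):tx_start],
        "utr5": exonic_between(tx_start, cds_start),
        "introns": [genome[prev_end:next_start]
                    for (_, prev_end), (next_start, _) in zip(exons, exons[1:])],
        "utr3": exonic_between(cds_end, tx_end),
        "downstream": genome[tx_end:tx_end + flank],
    }


# Toy layout: u = upstream, 5 = 5'-UTR, C = coding, i = intron, 3 = 3'-UTR, d = downstream.
toy_genome = "uuuu" + "55CC" + "iii" + "CC33" + "dddd"
print(noncoding_regions(toy_genome, [(4, 8), (11, 15)], cds_start=6, cds_end=13, flank=4))
```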
Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

PubMed

Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D

2018-01-04

The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
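The core CCDS criterion, coding regions annotated identically by two independent pipelines, amounts to matching CDS structures exactly. A minimal sketch under that assumption only: it keys each model on chromosome, strand and CDS exon coordinates, and it omits the quality-assurance checks and manual curation described above. The class, function and transcript identifiers are hypothetical.

```python
from typing import NamedTuple


class CdsModel(NamedTuple):
    transcript_id: str
    chrom: str
    strand: str
    cds_exons: tuple[tuple[int, int], ...]  # coding-exon intervals, in order


def identical_cds(annot_a: list[CdsModel], annot_b: list[CdsModel]) -> list[tuple[str, str]]:
    """Pair transcripts whose coding structure is identical in both annotation sets."""
    def key(m: CdsModel) -> tuple:
        return (m.chrom, m.strand, m.cds_exons)

    # Index the second annotation by CDS structure for constant-time lookups.
    index_b: dict[tuple, list[str]] = {}
    for m in annot_b:
        index_b.setdefault(key(m), []).append(m.transcript_id)

    return [(m.transcript_id, other)
            for m in annot_a
            for other in index_b.get(key(m), [])]


annot_1 = [CdsModel("A1", "chr1", "+", ((100, 200), (300, 400))),
           CdsModel("A2", "chr1", "+", ((100, 200),))]
annot_2 = [CdsModel("B1", "chr1", "+", ((100, 200), (300, 400))),
           CdsModel("B2", "chr1", "-", ((100, 200),))]
print(identical_cds(annot_1, annot_2))
```

A2 and B2 share coordinates but lie on opposite strands, so only A1 and B1 are paired.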
Evolution of coding and non-coding genes in HOX clusters of a marsupial.

PubMed

Yu, Hongshi; Lindsay, James; Feng, Zhi-Ping; Frankenberg, Stephen; Hu, Yanqiu; Carone, Dawn; Shaw, Geoff; Pask, Andrew J; O'Neill, Rachel; Papenfuss, Anthony T; Renfree, Marilyn B

2012-06-18

The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs within them have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and of non-protein-coding genes, including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS, which play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified, and one new candidate microRNA with a typical hairpin precursor structure, expressed in both fibroblasts and testes, was found. MicroRNA target prediction showed that target sites for several known microRNAs, such as miR-10, miR-414 and miR-464, were present in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 million years ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial.
Evolution of coding and non-coding genes in HOX clusters of a marsupial

PubMed Central

2012-01-01

Background: The HOX gene clusters are thought to be highly conserved amongst mammals and other vertebrates, but the long non-coding RNAs within them have only been studied in detail in human and mouse. The sequencing of the kangaroo genome provides an opportunity to use comparative analyses to compare the HOX clusters of a mammal with a distinct body plan to those of other mammals. Results: Here we report a comparative analysis of HOX gene clusters between an Australian marsupial of the kangaroo family and the eutherians. There was a strikingly high level of conservation of HOX gene sequence and structure and of non-protein-coding genes, including the microRNAs miR-196a, miR-196b, miR-10a and miR-10b and the long non-coding RNAs HOTAIR, HOTAIRM1 and HOXA11AS, which play critical roles in regulating gene expression and controlling development. By microRNA deep sequencing and comparative genomic analyses, two conserved microRNAs (miR-10a and miR-10b) were identified, and one new candidate microRNA with a typical hairpin precursor structure, expressed in both fibroblasts and testes, was found. MicroRNA target prediction showed that target sites for several known microRNAs, such as miR-10, miR-414 and miR-464, were present in the tammar HOX clusters. In addition, several novel and putative miRNAs were identified that originated from elsewhere in the tammar genome and that target the tammar HOXB and HOXD clusters. Conclusions: This study confirms that the emergence of known long non-coding RNAs in the HOX clusters clearly predates the marsupial-eutherian divergence 160 million years ago. It also identified a new potentially functional microRNA as well as conserved miRNAs. These non-coding RNAs may participate in the regulation of HOX genes to influence the body plan of this marsupial. PMID:22708672
Two Perspectives on the Origin of the Standard Genetic Code

NASA Astrophysics Data System (ADS)

Sengupta, Supratim; Aggarwal, Neha; Bandhu, Ashutosh Vishwa

2014-12-01

The origin of a genetic code made it possible to create ordered sequences of amino acids. In this article we provide two perspectives on code origin by carrying out simulations of code-sequence coevolution in finite populations with the aim of examining how the standard genetic code may have evolved from more primitive code(s) encoding a small number of amino acids. We determine the efficacy of the physico-chemical hypothesis of code origin in the absence and presence of horizontal gene transfer (HGT) by allowing a diverse collection of code-sequence sets to compete with each other. We find that in the absence of horizontal gene transfer, natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. However, for certain probabilities of the horizontal transfer events, a universal code emerges having a structure that is consistent with the standard genetic code.
Recurrent Coding Sequence Variation Explains Only A Small Fraction of the Genetic Architecture of Colorectal Cancer

PubMed Central

Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.

2015-01-01

Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10⁻⁷), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts: rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10⁻⁷); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10⁻⁷ and OR = 1.09, P = 7.4 × 10⁻⁸); and rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10⁻⁹), all reaching exome-wide significance levels. Gene-based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10⁻⁶). We found an excess of rare, damaging variants in base-excision repair (P = 2.4 × 10⁻⁴) and DNA mismatch repair genes (P = 6.1 × 10⁻⁴), consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster, as well as several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

PubMed

Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

2010-12-15

Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history, we sequenced a transcriptome obtained from a kiwi embryo using next-generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein-coding genes, we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes, we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae, to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein-coding regions in birds. Furthermore, this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Gene Expression and Polymorphism of Myostatin Gene and its Association with Growth Traits in Chicken.

PubMed

Dushyanth, K; Bhattacharya, T K; Shukla, R; Chatterjee, R N; Sitaramamma, T; Paswan, C; Guru Vishnu, P

2016-10-01

Myostatin is a member of the TGF-β superfamily and is directly involved in the regulation of body growth by limiting muscle growth. A study was carried out in three chicken lines to identify polymorphisms in the coding region of the myostatin gene through SSCP and DNA sequencing. A total of 12 haplotypes were observed in the myostatin coding region of chicken. Significant associations between haplogroups and body weight at days 1, 14, 28 and 42, and carcass traits at 42 days, were observed across the lines. It is concluded that the coding region of the myostatin gene was polymorphic, with varied levels of expression among lines, and had significant effects on growth traits. The expression of the MSTN gene varied during embryonic and post-hatch developmental stages.
Identification of limit cycles in multi-nonlinearity, multiple path systems

NASA Technical Reports Server (NTRS)

Mitchell, J. R.; Barron, O. L.

1979-01-01

A method of analysis which identifies limit cycles in autonomous systems with multiple nonlinearities and multiple forward paths is presented. The FORTRAN code for implementing the Harmonic Balance Algorithm is reported. The FORTRAN code is used to identify limit cycles in multiple path and nonlinearity systems while retaining the effects of several harmonic components.
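The harmonic balance idea can be shown in its simplest special case: one nonlinearity, one forward path and only the fundamental harmonic, whereas the reported FORTRAN code handles multiple nonlinearities and paths and retains several harmonics. This minimal sketch is not that code. It assumes an ideal relay with output ±M, whose describing function is N(A) = 4M/(πA), in a loop with a hypothetical linear part G(s) = K/(s(s+1)(s+2)); a limit cycle is predicted where G(jω)N(A) = -1. The function names and parameter values are hypothetical.

```python
import math
from typing import Callable


def plant(omega: float, k: float = 6.0) -> complex:
    """Hypothetical linear part G(jω) = K / (jω (jω + 1)(jω + 2))."""
    s = 1j * omega
    return k / (s * (s + 1) * (s + 2))


def phase_crossover(g: Callable[[float], complex], lo: float, hi: float,
                    tol: float = 1e-12) -> float:
    """Bisection for the frequency in [lo, hi] where Im G(jω) changes sign."""
    lo_positive = g(lo).imag > 0
    if (g(hi).imag > 0) == lo_positive:
        raise ValueError("Im G(jω) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (g(mid).imag > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def relay_limit_cycle(g: Callable[[float], complex], m: float,
                      lo: float, hi: float) -> tuple[float, float]:
    """Return (ω, A) satisfying G(jω) N(A) = -1 with N(A) = 4m / (πA)."""
    omega = phase_crossover(g, lo, hi)
    g_omega = g(omega)
    if g_omega.real >= 0:
        raise ValueError("G(jω) is not on the negative real axis; no relay limit cycle")
    # N(A) |G(jω)| = 1  =>  A = 4 m |G(jω)| / π
    return omega, 4.0 * m * abs(g_omega) / math.pi


omega, amplitude = relay_limit_cycle(plant, m=1.0, lo=1.0, hi=2.0)
print(f"omega = {omega:.4f} rad/s, A = {amplitude:.4f}")
```

For K = 6 and M = 1 the phase crossover is at ω = √2, where G(jω) = -1, so the predicted oscillation amplitude is A = 4/π ≈ 1.27.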
Global transcriptome analysis reveals extensive gene remodeling, alternative splicing and differential transcription profiles in non-seed vascular plant Selaginella moellendorffii.

PubMed

Zhu, Yan; Chen, Longxian; Zhang, Chengjun; Hao, Pei; Jing, Xinyun; Li, Xuan

2017-01-25

Selaginella moellendorffii, a lycophyte, is a model plant for studying the early evolution and development of vascular plants. As the first and only sequenced lycophyte to date, the genome of S. moellendorffii revealed many conserved genes and pathways, as well as specialized genes different from those of flowering plants. Despite the progress made, little is known about long noncoding RNAs (lncRNAs) and the alternative splicing (AS) of coding genes in S. moellendorffii. Its coding gene models have not been fully validated with transcriptome data. Furthermore, it remains important to understand whether regulatory mechanisms similar to those of flowering plants are used, and how they operate in a non-seed primitive vascular plant. RNA sequencing (RNA-seq) was performed for three S. moellendorffii tissues (root, stem and leaf) by constructing strand-specific RNA-seq libraries from RNA purified using the RiboMinus isolation protocol. A total of 176 million reads (44 Gbp) were obtained from the three tissue types and mapped to the S. moellendorffii genome. By comparison with the 22,285 existing gene models of S. moellendorffii, we identified 7930 high-confidence novel coding genes (a 35.6% increase) and, for the first time, reported 4422 lncRNAs in a lycophyte. Further, we refined 2461 (11.0%) of the existing gene models and identified 11,030 AS events (for 5957 coding genes), revealed for the first time for lycophytes. Tissue-specific gene expression with functional implications was analyzed: 1031, 554 and 269 coding genes, and 174, 39 and 17 lncRNAs, were identified in root, stem and leaf tissues, respectively. The expression of critical genes for vascular development stages, i.e. formation of provascular cells, xylem specification and differentiation, and phloem specification and differentiation, was compared in S. moellendorffii tissues, indicating a less complex regulatory mechanism in lycophytes than in flowering plants. The results were further strengthened by the evolutionary trend of seven transcription factor families related to vascular development, which was observed among four representative species of seed and non-seed vascular plants, and nonvascular land and aquatic plants. The deep RNA-seq study of S. moellendorffii discovered extensive new gene content, including novel coding genes, lncRNAs, AS events and refined gene models. Compared to flowering vascular plants, S. moellendorffii displayed lower complexity in gene structure, alternative splicing and the regulatory elements of vascular development. The study offered important insight into the evolution of vascular plants and the regulatory mechanism of vascular development in a non-seed plant.
Users manual for program NYQUIST: Liquid rocket nyquist plots developed for use on a PC computer

NASA Astrophysics Data System (ADS)

Armstrong, Wilbur C.

1992-06-01

The piping in a liquid rocket can assume complex configurations due to multiple tanks, multiple engines, and structures that must be piped around. The capability to handle some of these complex configurations has been incorporated into the NYQUIST code, as has the capability to modify the input online. The configurations allowed include multiple tanks, multiple engines, and the splitting of a pipe into unequal segments going to different (or the same) engines. This program handles the following element types: straight pipes, bends, inline accumulators, tuned stub accumulators, Helmholtz resonators, parallel resonators, pumps, split pipes, multiple tanks, and multiple engines. The code is too large to compile as one program using Microsoft FORTRAN 5; therefore, it was broken into two segments, NYQUIST1.FOR and NYQUIST2.FOR, which are compiled separately and then linked together. The final executable is not too large (approximately 344,000 bytes).
Users manual for program NYQUIST: Liquid rocket nyquist plots developed for use on a PC computer

NASA Technical Reports Server (NTRS)

Armstrong, Wilbur C.

1992-01-01

The piping in a liquid rocket can assume complex configurations due to multiple tanks, multiple engines, and structures that must be piped around. The capability to handle some of these complex configurations has been incorporated into the NYQUIST code, as has the capability to modify the input online. The configurations allowed include multiple tanks, multiple engines, and the splitting of a pipe into unequal segments going to different (or the same) engines. This program handles the following element types: straight pipes, bends, inline accumulators, tuned stub accumulators, Helmholtz resonators, parallel resonators, pumps, split pipes, multiple tanks, and multiple engines. The code is too large to compile as one program using Microsoft FORTRAN 5; therefore, it was broken into two segments, NYQUIST1.FOR and NYQUIST2.FOR, which are compiled separately and then linked together. The final executable is not too large (approximately 344,000 bytes).
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

PubMed

Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

2014-12-01

Proteogenomics involves the use of mass spectrometry (MS) to refine the annotation of protein-coding genes and to discover genes in a genome. We carried out a comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) using publicly available proteomics data, with the aim of improving annotation for methylotrophs: organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, we discovered 29 new genes and revised 66 annotated gene models in the ME-AM1 genome. One such novel gene, identified with 75 peptides, lacks homologs in other methylobacteria but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating a potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved by the discovery of a short gene in a methylotrophy gene island and by redefining a gene important for pyrroloquinoline quinone synthesis, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the genome of the model methylotroph ME-AM1. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Transcriptome Analysis of an Insecticide Resistant Housefly Strain: Insights about SNPs and Regulatory Elements in Cytochrome P450 Genes.

PubMed

Mahmood, Khalid; Højland, Dorte H; Asp, Torben; Kristensen, Michael

2016-01-01

Insecticide resistance in the housefly, Musca domestica, has been investigated for more than 60 years. It will enter a new era after the recent publication of the housefly genome and the development of multiple next-generation sequencing technologies. The genetic background of the xenobiotic response can now be investigated in greater detail. Here, we investigate the 454-pyrosequencing transcriptome of the spinosad-resistant 791spin strain in relation to the housefly genome, with a focus on P450 genes. De novo assembly of clean reads gave 35,834 contigs consisting of 21,780 sequences of the spinosad-resistant strain. A total of 3,648 sequences were annotated with an Enzyme Commission (EC) number and mapped to 124 KEGG pathways, with metabolic processes as the most highly represented pathway. One hundred and twenty contigs were annotated as P450s, covering 44 different housefly P450 genes. Eight differentially expressed P450 genes were identified and investigated for SNPs, CpG islands and common regulatory motifs in promoter and coding regions. Functional annotation clustering of metabolism-related genes and motif analysis of P450s revealed their association with epigenetic, transcription and gene expression related functions. The sequence variation analysis identified 12 SNPs, eight of which were found in cyp6d1. The location, size and frequency of CpG islands varied, and specific motifs were also identified in these P450s. Moreover, the identified motifs were associated with GO terms and transcription factors using bioinformatic tools. Together with genome data, transcriptome data from a spinosad-resistant strain provide fundamental support for future research to understand the evolution of resistance in houseflies. Here, we report for the first time the SNPs, CpG islands and common regulatory motifs in differentially expressed P450s. Taken together, our findings will serve as a stepping stone toward understanding the mechanism and role of P450s in xenobiotic detoxification.
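The abstract does not state which CpG island criteria were used. A widely used definition (Gardiner-Garden and Frommer) requires a stretch of at least 200 bp with a G + C content of at least 50% and an observed/expected CpG ratio of at least 0.6. The sketch below assumes those thresholds, scans fixed-length windows and merges overlapping windows that pass; the function names and toy sequence are hypothetical.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives every CpG dinucleotide.
    return seq.count("CG") * len(seq) / (c * g)


def find_cpg_islands(seq: str, window: int = 200, step: int = 1,
                     min_gc: float = 0.5, min_oe: float = 0.6) -> list[tuple[int, int]]:
    """Merged 0-based half-open intervals of windows meeting both thresholds."""
    seq = seq.upper()
    islands: list[tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        gc = (w.count("G") + w.count("C")) / window
        if gc >= min_gc and cpg_obs_exp(w) >= min_oe:
            end = start + window
            if islands and start <= islands[-1][1]:
                # Overlaps or touches the previous island: extend it.
                islands[-1] = (islands[-1][0], max(islands[-1][1], end))
            else:
                islands.append((start, end))
    return islands


toy_seq = "AT" * 150 + "CG" * 150 + "AT" * 150
print(find_cpg_islands(toy_seq))
```

On this toy sequence (300 bp of CG repeats flanked by AT repeats) a single island is reported, starting 100 bp before the CG block and ending 100 bp after it: window-based scanning can extend an island into flanking sequence by up to half a window.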
A Stable Thoracic Hox Code and Epimorphosis Characterize Posterior Regeneration in Capitella teleta

PubMed Central

de Jong, Danielle M.; Seaver, Elaine C.

2016-01-01

Regeneration, the ability to replace lost tissues and body parts following traumatic injury, occurs widely throughout the animal tree of life. Regeneration occurs either by remodeling of pre-existing tissues, through addition of new cells by cell division, or a combination of both. We describe a staging system for posterior regeneration in the annelid Capitella teleta and use the C. teleta Hox gene code as markers of regional identity for regenerating tissue along the anterior-posterior axis. Following amputation of different posterior regions of the animal, a blastema forms, and by two days proliferating cells are detected by EdU incorporation, demonstrating that epimorphosis occurs during posterior regeneration of C. teleta. Neurites rapidly extend into the blastema and gradually become organized into discrete nerves before new ganglia appear approximately seven days after amputation. In situ hybridization shows that seven of the ten Hox genes examined are expressed in the blastema, suggesting roles in patterning the newly forming tissue, although neither spatial nor temporal colinearity was detected. We hypothesized that following amputation, Hox gene expression in pre-existing segments would be reorganized to scale, and the remaining fragment would express the complete suite of Hox genes. Surprisingly, most Hox genes display stable expression patterns in the ganglia of pre-existing tissue following amputation at multiple axial positions, indicating general stability of segmental identity. However, three Hox genes, CapI-lox4, CapI-lox2 and CapI-Post2, each shift their anterior expression boundary by one segment, and each shift includes a subset of cells in the ganglia. This expression shift depends upon the axial position of the amputation. In C. teleta, thoracic segments exhibit stable positional identity with limited morphallaxis, in contrast with the extensive body remodeling that occurs during regeneration of some other annelids, planarians and acoel flatworms. PMID:26894631
Chromosomal features of Escherichia coli serotype O2:K2, an avian pathogenic E. coli.

PubMed

Jørgensen, Steffen L; Kudirkiene, Egle; Li, Lili; Christensen, Jens P; Olsen, John E; Nolan, Lisa; Olsen, Rikke H

2017-01-01

Escherichia coli causing infection outside the gastrointestinal system are referred to as extra-intestinal pathogenic E. coli. Avian pathogenic E. coli is a subgroup of extra-intestinal pathogenic E. coli, and infections due to avian pathogenic E. coli have a major impact on poultry production economy and welfare worldwide. An almost defining characteristic of avian pathogenic E. coli is the carriage of plasmids, which may encode virulence factors and antibiotic resistance determinants. For this reason, plasmids of avian pathogenic E. coli have been intensively studied. However, genes encoded by the chromosome may also be important for disease manifestation and antimicrobial resistance. For the E. coli strain APEC_O2, the plasmids have been sequenced and analyzed in several studies, and E. coli APEC_O2 may therefore serve as a reference strain in future studies. Here we describe the chromosomal features of E. coli APEC_O2. E. coli APEC_O2 belongs to sequence type ST135 and has a chromosome of 4,908,820 bp (plasmid removed), comprising 4672 protein-coding genes, 110 RNA genes and 156 pseudogenes, with an average G + C content of 50.69%. We identified 82 insertion sequences as well as 4672 protein-coding sequences, 12 predicted genomic islands, three prophage-related sequences and two clustered regularly interspaced short palindromic repeats regions on the chromosome, suggesting the possible occurrence of horizontal gene transfer in this strain. The wildtype strain of E. coli APEC_O2 is resistant to multiple antimicrobials. However, no (complete) antibiotic resistance genes were present on the chromosome, but a number of genes associated with extra-intestinal disease were identified. Together, the information provided here on E. coli APEC_O2 will assist future studies of avian pathogenic E. coli strains, in particular those involving E. coli APEC_O2, and aid the general understanding of the pathogenesis of avian pathogenic E. coli.
A Common Histone Modification Code on C4 Genes in Maize and Its Conservation in Sorghum and Setaria italica

PubMed Central

Heimann, Louisa; Horst, Ina; Perduns, Renke; Dreesen, Björn; Offermann, Sascha; Peterhansel, Christoph

2013-01-01

C4 photosynthesis evolved more than 60 times independently in different plant lineages. Each time, multiple genes were recruited into C4 metabolism. The corresponding promoters acquired new regulatory features such as high expression, light induction, or cell type-specific expression in mesophyll or bundle sheath cells. We have previously shown that histone modifications contribute to the regulation of the model C4 phosphoenolpyruvate carboxylase (C4-Pepc) promoter in maize (Zea mays). Here, we tested the light- and cell type-specific responses of three selected histone acetylations and two histone methylations on five additional C4 genes (C4-Ca, C4-Ppdk, C4-Me, C4-Pepck, and C4-RbcS2) in maize. Histone acetylation and nucleosome occupancy assays indicated extended promoter regions, with regulatory upstream regions more than 1,000 bp from the transcription initiation site for most of these genes. Despite the absence of any detectable homology among the promoters at the primary sequence level, histone modification patterns were highly coregulated. Specifically, H3K9ac was regulated by illumination, whereas H3K4me3 was regulated in a cell type-specific manner. We further compared histone modifications on the C4-Pepc and C4-Me genes from maize and the homologous genes from sorghum (Sorghum bicolor) and Setaria italica. Whereas sorghum and maize share a common C4 origin, C4 metabolism evolved independently in S. italica. The distribution of histone modifications over the promoters differed between the species, but differential regulation of light-induced histone acetylation and cell type-specific histone methylation was evident in all three species. We propose that a preexisting histone code was recruited into C4 promoter control during the evolution of C4 metabolism. PMID:23564230

Combining multiple decisions: applications to bioinformatics

NASA Astrophysics Data System (ADS)

Yukinawa, N.; Takenouchi, T.; Oba, S.; Ishii, S.

2008-01-01

Multi-class classification is one of the fundamental tasks in bioinformatics and typically arises in cancer diagnosis studies by gene expression profiling. This article reviews two recent approaches to multi-class classification by combining multiple binary classifiers, which are formulated based on a unified framework of error-correcting output coding (ECOC). The first approach is to construct a multi-class classifier in which each binary classifier to be aggregated has a weight value to be optimally tuned based on the observed data. In the second approach, misclassification of each binary classifier is formulated as a bit inversion error with a probabilistic model by making an analogy to the context of information transmission theory. Experimental studies using various real-world datasets including cancer classification problems reveal that both of the new methods are superior or comparable to other multi-class classification methods.
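In error-correcting output coding, each class is assigned a codeword of +1/-1 bits, and each column of the code matrix defines one binary classifier. A new sample is assigned to the class whose codeword is nearest to the vector of binary outputs. The sketch below shows this decoding step with an optional per-classifier weight. Its weights are supplied by hand, whereas the first reviewed approach tunes them from data. It uses plain weighted Hamming distance rather than the probabilistic bit-inversion model of the second approach. The 4-class exhaustive code matrix is a standard textbook example and was not taken from the article.

```python
def ecoc_decode(code_matrix, predictions, weights=None):
    """Return the index of the class whose codeword best matches the binary outputs.

    code_matrix: K rows (classes) x L columns (binary classifiers), entries +1/-1.
    predictions: L outputs of the binary classifiers, each +1 or -1.
    weights: optional non-negative weight per binary classifier (default 1.0 each).
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    best_class, best_dist = -1, float("inf")
    for k, codeword in enumerate(code_matrix):
        # Weighted Hamming distance: sum of weights of the bits that disagree
        dist = sum(w for c, p, w in zip(codeword, predictions, weights) if c != p)
        if dist < best_dist:
            best_class, best_dist = k, dist
    return best_class


# Exhaustive code for 4 classes (7 binary problems); minimum row distance is 4,
# so any single wrong binary output can still be corrected.
CODE = [
    [+1, +1, +1, +1, +1, +1, +1],
    [-1, -1, -1, -1, +1, +1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, +1, -1, +1, -1, +1, -1],
]

noisy = [+1, -1, +1, +1, -1, -1, +1]  # class 2's codeword with the first bit flipped
print(ecoc_decode(CODE, noisy))
```

The error-correcting property comes from the spacing between codewords. Because every pair of rows here differs in at least four bits, one misclassifying binary learner cannot move the output closer to a wrong class.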
MicroRNA-133 mediates cardiac diseases: Mechanisms and clinical implications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Yi; Liang, Yan; Zhang, Jin-fang

MicroRNAs (miRNAs) belong to the family of small non-coding RNAs that mediate gene expression by post-transcriptional regulation. Increasing evidence has demonstrated that miR-133 is enriched in muscle tissues and myogenic cells, and that its aberrant expression may induce the occurrence and development of cardiac disorders such as cardiac hypertrophy and heart failure. In this review, we summarize the regulatory roles of miR-133 in cardiac disorders and the underlying mechanisms, which suggest that miR-133 may be a potential diagnostic and therapeutic tool for cardiac disorders. - Highlights: • miR-218 is frequently downregulated in multiple cancers. • miR-218 plays pivotal roles in carcinogenesis. • miR-218 mediates proliferation, apoptosis, metastasis, invasion, etc. • miR-218 mediates tumorigenesis and metastasis via multiple pathways.
Epigenomics and the concept of degeneracy in biological systems

PubMed Central

Mason, Paul H.; Barron, Andrew B.

2014-01-01

Researchers in the field of epigenomics are developing more nuanced understandings of biological complexity, and exploring the multiple pathways that lead to phenotypic expression. The concept of degeneracy, referring to the multiple pathways that a system recruits to achieve functional plasticity, is an important conceptual accompaniment to the growing body of knowledge in epigenomics. Distinct from degradation, redundancy and dilapidation, degeneracy refers to the plasticity of traits whose function overlaps in some environments but diverges in others. While a redundant system is composed of repeated identical elements performing the same function, a degenerate system is composed of different elements performing similar or overlapping functions. Here, we describe the degenerate structure of gene regulatory systems from the basic genetic code to flexible epigenomic modifications, and discuss how these structural features have contributed to organism complexity, robustness, plasticity and evolvability. PMID:24335757
Complete mitochondrial genome of the agarophyte red alga Gelidium vagum (Gelidiales).

PubMed

Yang, Eun Chan; Kim, Kyeong Mi; Boo, Ga Hun; Lee, Jung-Hyun; Boo, Sung Min; Yoon, Hwan Su

2014-08-01

We describe the first complete mitochondrial genome of Gelidium vagum (Gelidiales) (24,901 bp, 30.4% GC content), an agar-producing red alga. The circular mitochondrial genome contains 43 genes, including 23 protein-coding, 18 tRNA and 2 rRNA genes. All the protein-coding genes have a typical ATG start codon. No introns were found. Two genes, secY and rps12, overlap by 41 bp.
Estimating replicate time shifts using Gaussian process regression

PubMed Central

Liu, Qiang; Andersen, Bogi; Smyth, Padhraic; Ihler, Alexander

2010-01-01

Motivation: Time-course gene expression datasets provide important insights into dynamic aspects of biological processes, such as circadian rhythms, cell cycle and organ development. In a typical microarray time-course experiment, measurements are obtained at each time point from multiple replicate samples. Accurately recovering the gene expression patterns from experimental observations is made challenging by both measurement noise and variation among replicates' rates of development. Prior work on this topic has focused on inference of expression patterns assuming that the replicate times are synchronized. We develop a statistical approach that simultaneously infers both (i) the underlying (hidden) expression profile for each gene, as well as (ii) the biological time for each individual replicate. Our approach is based on Gaussian process regression (GPR) combined with a probabilistic model that accounts for uncertainty about the biological development time of each replicate. Results: We apply GPR with uncertain measurement times to a microarray dataset of mRNA expression for the hair-growth cycle in mouse back skin, predicting both profile shapes and biological times for each replicate. The predicted time shifts show high consistency with independently obtained morphological estimates of relative development. We also show that the method systematically reduces prediction error on out-of-sample data, significantly reducing the mean squared error in a cross-validation study. Availability: Matlab code for GPR with uncertain time shifts is available at http://sli.ics.uci.edu/Code/GPRTimeshift/ Contact: ihler@ics.uci.edu PMID:20147305
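The authors' Matlab code treats each replicate's biological time as uncertain within a probabilistic model. The sketch below is a much cruder stand-in and does not reproduce their method. It pools a reference replicate with a second replicate and grid-searches the second replicate's time shift. The chosen shift is the one that maximizes the Gaussian process log marginal likelihood under an RBF kernel. The kernel, its hyperparameters (`length_scale`, `variance`, `noise`), the candidate grid and the sine-shaped toy profile are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of time points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_log_marginal_likelihood(t, y, noise=0.1, **kern):
    """log p(y | t) for a zero-mean GP with RBF kernel plus Gaussian noise."""
    K = rbf_kernel(t, t, **kern) + noise ** 2 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(t) * np.log(2 * np.pi))


def estimate_shift(t_ref, y_ref, t_rep, y_rep, candidate_shifts, **gp):
    """Return the shift of the replicate's times that best fits one pooled GP."""
    scores = [
        gp_log_marginal_likelihood(np.concatenate([t_ref, t_rep + s]),
                                   np.concatenate([y_ref, y_rep]), **gp)
        for s in candidate_shifts
    ]
    return candidate_shifts[int(np.argmax(scores))]


# Toy data: the replicate is recorded at the same nominal times but is
# biologically 1.0 time unit ahead of the reference.
t = np.linspace(0.0, 10.0, 40)
y_ref = np.sin(t)
y_rep = np.sin(t + 1.0)
shifts = np.arange(-2.0, 2.01, 0.25)
print(estimate_shift(t, y_ref, t, y_rep, shifts))
```

A point estimate from a grid discards the uncertainty about development time that the paper explicitly models. Treat it only as an intuition for why a shared latent profile lets replicate timing be recovered.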
Isolated familial somatotropinomas: clinical features and analysis of the MEN1 gene.

PubMed

De Menis, Ernesto; Prezant, Toni R

2002-01-01

Isolated familial somatotropinomas (IFS), familial GH-secreting adenomas occurring in the absence of multiple endocrine neoplasia type I (MEN1) or the Carney complex, are rare. In the present study we report two Italian siblings affected by GH-secreting adenomas. There was no history of parental consanguinity. The sister presented at 18 years of age with secondary amenorrhea and acromegalic features, and one of her two brothers presented with gigantism at the same age. Endocrinological investigations confirmed GH hypersecretion in both cases. Although a pituitary microadenoma was detected in both patients, transsphenoidal surgery was not successful. The sister received conventional radiotherapy and her acromegaly is now considered controlled; the brother is being treated with octreotide LAR 30 mg monthly and his disease is considered clinically active. The patients, their parents and the unaffected brother underwent extensive evaluation, and no features of MEN1 or Carney complex were found. Analysis of polymorphic microsatellite markers from chromosome 11q13 (D11S599, D11S4945, D11S4939, D11S4938 and D11S987) showed that the acromegalic siblings had inherited different maternal chromosomes and shared the paternal chromosome. No pathogenic MEN1 sequence changes were detected by sequencing or dideoxy fingerprinting of the coding sequence (exons 2-10) and exon/intron junctions. Although mutations in the promoter, introns or untranslated regions of the MEN1 gene cannot be excluded, germline mutations within the coding region of this gene do not appear responsible for IFS in this family.
Rare Noncoding Mutations Extend the Mutational Spectrum in the PGAP3 Subtype of Hyperphosphatasia with Mental Retardation Syndrome

PubMed Central

Knaus, Alexej; Awaya, Tomonari; Helbig, Ingo; Afawi, Zaid; Pendziwiat, Manuela; Abu‐Rachma, Jubran; Thompson, Miles D.; Cole, David E.; Skinner, Steve; Annese, Fran; Canham, Natalie; Schweiger, Michal R.; Robinson, Peter N.; Mundlos, Stefan; Kinoshita, Taroh; Munnich, Arnold

2016-01-01

HPMRS, or Mabry syndrome, is a heterogeneous glycosylphosphatidylinositol (GPI) anchor deficiency caused by an impairment of synthesis or maturation of the GPI anchor. The expressivity of the clinical features in HPMRS varies from severe syndromic forms with multiple organ malformations to mild nonsyndromic intellectual disability. In about half of the patients with a clinical diagnosis of HPMRS, pathogenic mutations can be identified in the coding region of one of six genes, one of which is PGAP3. In this work, we describe a screening approach with sequence-specific baits for transcripts of genes of the GPI pathway that allows the detection of functionally relevant mutations, including those in introns and the 5′ and 3′ UTR. By this means, we also identified pathogenic noncoding mutations, which increases the diagnostic yield for HPMRS on the basis of intellectual disability and elevated serum alkaline phosphatase. In eight affected individuals from different ethnicities, we found seven novel pathogenic mutations in PGAP3. Besides five missense mutations, we identified an intronic mutation, c.558‐10G>A, that causes an aberrant splice product, and a mutation in the 3′UTR, c.*559C>T, that is associated with substantially lower mRNA levels. We show that our novel screening approach is a useful rapid detection tool for alterations in genes coding for key components of the GPI pathway. PMID:27120253
Vector Adaptive/Predictive Encoding Of Speech

NASA Technical Reports Server (NTRS)

Chen, Juin-Hwey; Gersho, Allen

1989-01-01

Vector adaptive/predictive technique for digital encoding of speech signals yields decoded speech of very good quality after transmission at a coding rate of 9.6 kb/s and of reasonably good quality at 4.8 kb/s. Requires 3 to 4 million multiplications and additions per second. Combines advantages of adaptive/predictive coding and code-excited linear prediction; the latter yields speech of high quality but requires 600 million multiplications and additions per second at an encoding rate of 4.8 kb/s. Vector adaptive/predictive coding technique bridges gaps in performance and complexity between adaptive/predictive coding and code-excited linear prediction.
Distinct regulation of alternative polyadenylation and gene expression by nuclear poly(A) polymerases

PubMed Central

Li, Wencheng; Laishram, Rakesh S.; Hoque, Mainul; Ji, Zhe

2017-01-01

Polyadenylation of nascent RNA by poly(A) polymerase (PAP) is important for 3′ end maturation of almost all eukaryotic mRNAs. Most mammalian genes harbor multiple polyadenylation sites (PASs), leading to expression of alternative polyadenylation (APA) isoforms with distinct functions. How poly(A) polymerases may regulate PAS usage and hence gene expression is poorly understood. Here, we show that the nuclear canonical (PAPα and PAPγ) and non-canonical (Star-PAP) PAPs play diverse roles in PAS selection and gene expression. Deficiencies in the PAPs resulted in perturbations of gene expression, with Star-PAP impacting lowly expressed mRNAs and long non-coding RNAs to the greatest extent. Importantly, different PASs of a gene are distinctly regulated by different PAPs, leading to widespread relative expression changes of APA isoforms. The location and surrounding sequence motifs of a PAS appear to differentiate its regulation by the PAPs. We show that Star-PAP-specific PAS usage regulates the expression of the eukaryotic translation initiation factor EIF4A1, the tumor suppressor gene PTEN and the long non-coding RNA NEAT1. The Star-PAP-mediated APA of PTEN is essential for the DNA damage-induced increase of PTEN protein levels. Together, our results reveal a PAS-guided and PAP-mediated paradigm for gene expression in response to cellular signaling cues. PMID:28911096
Evolution and expression analysis of the grape (Vitis vinifera L.) WRKY gene family.

PubMed

Guo, Chunlei; Guo, Rongrong; Xu, Xiaozhao; Gao, Min; Li, Xiaoqin; Song, Junyang; Zheng, Yi; Wang, Xiping

2014-04-01

WRKY proteins comprise a large family of transcription factors that play important roles in plant defence regulatory networks, including responses to various biotic and abiotic stresses. To date, no large-scale study of WRKY genes has been undertaken in grape (Vitis vinifera L.). In this study, a total of 59 putative grape WRKY genes (VvWRKY) were identified and renamed on the basis of their respective chromosome distribution. A multiple sequence alignment analysis using the coding sequences of all predicted grape WRKY genes, together with those from Arabidopsis thaliana and tomato (Solanum lycopersicum), indicated that the 59 VvWRKY genes can be classified into three main groups (I-III). An evaluation of the duplication events suggested that several WRKY genes arose before the divergence of the grape and Arabidopsis lineages. Moreover, expression profiles derived from semiquantitative PCR and real-time quantitative PCR analyses showed distinct expression patterns in various tissues and in response to different treatments. Four VvWRKY genes showed significantly higher expression in roots or leaves, 55 responded to varying degrees to at least one abiotic stress treatment, and the expression of 38 was altered following powdery mildew (Erysiphe necator) infection. Most VvWRKY genes were downregulated in response to abscisic acid or salicylic acid treatments, while the expression of a subset was upregulated by methyl jasmonate or ethylene treatments.
Evolution and expression analysis of the grape (Vitis vinifera L.) WRKY gene family

PubMed Central

Guo, Chunlei; Guo, Rongrong; Wang, Xiping

2014-01-01

WRKY proteins comprise a large family of transcription factors that play important roles in plant defence regulatory networks, including responses to various biotic and abiotic stresses. To date, no large-scale study of WRKY genes has been undertaken in grape (Vitis vinifera L.). In this study, a total of 59 putative grape WRKY genes (VvWRKY) were identified and renamed on the basis of their respective chromosome distribution. A multiple sequence alignment analysis using the coding sequences of all predicted grape WRKY genes, together with those from Arabidopsis thaliana and tomato (Solanum lycopersicum), indicated that the 59 VvWRKY genes can be classified into three main groups (I–III). An evaluation of the duplication events suggested that several WRKY genes arose before the divergence of the grape and Arabidopsis lineages. Moreover, expression profiles derived from semiquantitative PCR and real-time quantitative PCR analyses showed distinct expression patterns in various tissues and in response to different treatments. Four VvWRKY genes showed significantly higher expression in roots or leaves, 55 responded to varying degrees to at least one abiotic stress treatment, and the expression of 38 was altered following powdery mildew (Erysiphe necator) infection. Most VvWRKY genes were downregulated in response to abscisic acid or salicylic acid treatments, while the expression of a subset was upregulated by methyl jasmonate or ethylene treatments. PMID:24510937
HLA-E regulatory and coding region variability and haplotypes in a Brazilian population sample.

PubMed

Ramalho, Jaqueline; Veiga-Castelli, Luciana C; Donadi, Eduardo A; Mendes-Junior, Celso T; Castelli, Erick C

2017-11-01

The HLA-E gene is characterized by low but wide expression in different tissues. HLA-E is considered a conserved gene, being one of the least polymorphic class I HLA genes. The HLA-E molecule interacts with Natural Killer cell receptors and T lymphocyte receptors, and might activate or inhibit immune responses depending on the peptide associated with HLA-E and on the receptors with which HLA-E interacts. Variable sites within the HLA-E regulatory and coding segments may influence gene function by modifying its expression pattern or the encoded molecule, thus influencing its interaction with receptors and the peptide. Here we propose an approach to evaluate the gene structure, haplotype pattern and the complete HLA-E variability, including regulatory (promoter and 3'UTR) and coding segments (with introns), by using massively parallel sequencing. Using this approach, we investigated the variability of 420 samples from a highly admixed population, Brazilians. Considering a segment of about 7 kb, 63 variable sites were detected, arranged into 75 extended haplotypes. We detected 37 different promoter sequences (but few frequent ones), 27 different coding sequences (15 representing new HLA-E alleles) and 12 haplotypes at the 3'UTR segment, two of them presenting a summed frequency of 90%. Despite the number of coding alleles, they mainly encode two different full-length molecules, known as E*01:01 and E*01:03, which together correspond to about 90% of all. In addition, unlike what has been previously observed for other non-classical HLA genes, the relationship among the HLA-E promoter, coding and 3'UTR haplotypes is not straightforward, because the same promoter and 3'UTR haplotypes were often associated with different HLA-E coding haplotypes. These data reinforce the presence of only two main full-length HLA-E molecules encoded by the many HLA-E alleles detected in our population sample. In addition, these data indicate that the distal HLA-E promoter is by far the most variable segment. Further analyses involving the binding of transcription factors and non-coding RNAs, as well as HLA-E expression in different tissues, are necessary to evaluate whether these variable sites at regulatory segments (or even at the coding sequence) may influence the gene expression profile. Copyright © 2017 Elsevier Ltd. All rights reserved.
Hierarchical parallelisation of functional renormalisation group calculations - hp-fRG

NASA Astrophysics Data System (ADS)

Rohe, Daniel

2016-10-01

The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question of to what extent High Performance Computing (HPC) can leverage the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of Message Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD (single-instruction-multiple-data) units. Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can be accomplished by introducing only moderate changes to the code, such that this strategy may serve as a guideline for other researchers to likewise improve the efficiency of their codes.
Non-coding cancer driver candidates identified with a sample- and position-specific model of the somatic mutation rate

PubMed Central

Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou

2017-01-01

Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Improvement and Optimization of Two Engineered Phage Resistance Mechanisms in Lactococcus lactis

PubMed Central

McGrath, Stephen; Fitzgerald, Gerald F.; van Sinderen, Douwe

2001-01-01

Homologous replication module genes were identified for four P335 type phages. DNA sequence analysis revealed that all four phages exhibited more than 90% DNA homology for at least two genes, designated rep2009 and orf17. One of these genes, rep2009, codes for a putative replisome organizer protein and contains an assumed origin of phage DNA replication (ori2009), which was identical for all four phages. DNA fragments representing the ori2009 sequence confer a phage-encoded resistance (Per) phenotype on lactococcal hosts when they are supplied on a high-copy-number vector. Furthermore, cloning multiple copies of the ori2009 sequence was found to increase the effectiveness of the Per phenotype conferred. A number of antisense plasmids targeting specific genes of the replication module were constructed. Two separate plasmids targeting rep2009 and orf17 were found to efficiently inhibit proliferation of all four phages by interfering with intracellular phage DNA replication. These results represent two highly effective strategies for inhibiting bacteriophage proliferation, and they also identify a novel gene, orf17, which appears to be important for phage DNA replication. Furthermore, these results indicate that although the actual mechanisms of DNA replication are very similar, if not identical, for all four phages, expression of the replication genes is significantly different in each case. PMID:11157223
Abrogation of Microsatellite-instable Tumors Using a Highly Selective Suicide Gene/Prodrug Combination

PubMed Central

Ferrás, Cristina; Oude Vrielink, Joachim AF; Verspuy, Johan WA; te Riele, Hein; Tsaalbi-Shtylik, Anastasia; de Wind, Niels

2009-01-01

A substantial fraction of sporadic and inherited colorectal and endometrial cancers in humans is deficient in DNA mismatch repair (MMR). These cancers are characterized by length alterations in ubiquitous simple sequence repeats, a phenotype called microsatellite instability. Here we have exploited this phenotype by developing a novel approach for the highly selective gene therapy of MMR-deficient tumors. To achieve this selectivity, we mutated the VP22FCU1 suicide gene by inserting an out-of-frame microsatellite within its coding region. We show that in a significant fraction of microsatellite-instable (MSI) cells carrying the mutated suicide gene, full-length protein becomes expressed within a few cell doublings, presumably resulting from a reverting frameshift within the inserted microsatellite. Treatment of these cells with the innocuous prodrug 5-fluorocytosine (5-FC) induces strong cytotoxicity, and we demonstrate that this is due to multiple bystander effects conferred by the suicide gene/prodrug combination. In a mouse model, MMR-deficient tumors that contained the out-of-frame VP22FCU1 gene displayed strong remission after treatment with 5-FC, without any obvious adverse systemic effects on the mouse. By virtue of its high selectivity and potency, this conditional enzyme/prodrug combination may hold promise for the treatment or prevention of MMR-deficient cancer in humans. PMID:19471249
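The selectivity rests on simple codon arithmetic. An insert whose length is not a multiple of 3 shifts the downstream reading frame. Only repeat-length changes that bring the total back to a multiple of 3 restore full-length protein, and such slippage is what MMR-deficient cells accumulate. The abstract does not give the repeat sequence or its length, so the helper names and the example repeats below are hypothetical.

```python
def frame_offset(insert_length: int) -> int:
    """Reading-frame shift (0, 1 or 2 nt) caused by an insertion of this length."""
    return insert_length % 3


def reverting_changes(repeat_unit: str, n_units: int, max_change: int = 3) -> list:
    """Unit-count changes (negative = contraction) that put the insert back in frame.

    repeat_unit and n_units describe a hypothetical microsatellite insert;
    only changes of at most max_change units in either direction are listed.
    """
    unit_len = len(repeat_unit)
    changes = []
    for delta in range(-max_change, max_change + 1):
        if delta == 0 or n_units + delta < 0:
            continue
        if ((n_units + delta) * unit_len) % 3 == 0:
            changes.append(delta)
    return changes


# A hypothetical (G)10 mononucleotide insert is 10 nt long, i.e. +1 out of frame.
print(frame_offset(10))               # 1
print(reverting_changes("G", 10))     # [-1, 2]: lose one G, or gain two
print(reverting_changes("CA", 10))    # [-1, 2] for a (CA)10 dinucleotide insert
```

Mononucleotide and dinucleotide runs are the repeats most prone to slippage in MMR-deficient cells. A single-unit contraction is therefore a plausible route to the "reverting frameshift" described above. That is an inference from the arithmetic, not a result reported in the abstract.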
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus.

PubMed

Pritchard, Victoria L; Viitaniemi, Heidi M; McCairns, R J Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R; Leder, Erica H

2017-01-05

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. Copyright © 2017 Pritchard et al.
Genome-Wide Variation Patterns Uncover the Origin and Selection in Cultivated Ginseng (Panax ginseng Meyer)

PubMed Central

Li, Ming-Rui; Shi, Feng-Xue; Li, Ya-Ling; Jiang, Peng; Jiao, Lili

2017-01-01

Chinese ginseng (Panax ginseng Meyer) is a medicinally important herb and plays crucial roles in traditional Chinese medicine. Pharmacological analyses have identified diverse bioactive components from Chinese ginseng. However, basic biological attributes, including domestication and selection of the ginseng plant, remain under-investigated. Here, we present a genome-wide view of the domestication and selection of cultivated ginseng based on whole-genome data. A total of 8,660 protein-coding genes were selected for genome-wide scanning of 30 wild and cultivated ginseng accessions. In addition, the 45S rDNA, chloroplast and mitochondrial genomes were included to perform phylogenetic and population genetic analyses. The observed spatial genetic structure between northern cultivated ginseng (NCG) and southern cultivated ginseng (SCG) accessions suggested multiple independent origins of cultivated ginseng. Genome-wide scanning further demonstrated that NCG and SCG have undergone distinct selection pressures during the domestication process, with more genes identified in the NCG group (97 genes) than in the SCG group (5 genes). Functional analyses revealed that these genes are involved in diverse pathways, including DNA methylation, lignin biosynthesis, and cell differentiation. These findings suggest that the SCG and NCG groups have distinct demographic histories. The candidate genes identified are useful for future molecular breeding of cultivated ginseng. PMID:28922794
Regulatory Architecture of Gene Expression Variation in the Threespine Stickleback Gasterosteus aculeatus

PubMed Central

Pritchard, Victoria L.; Viitaniemi, Heidi M.; McCairns, R. J. Scott; Merilä, Juha; Nikinmaa, Mikko; Primmer, Craig R.; Leder, Erica H.

2016-01-01

Much adaptive evolutionary change is underlain by mutational variation in regions of the genome that regulate gene expression rather than in the coding regions of the genes themselves. An understanding of the role of gene expression variation in facilitating local adaptation will be aided by an understanding of underlying regulatory networks. Here, we characterize the genetic architecture of gene expression variation in the threespine stickleback (Gasterosteus aculeatus), an important model in the study of adaptive evolution. We collected transcriptomic and genomic data from 60 half-sib families using an expression microarray and genotyping-by-sequencing, and located expression quantitative trait loci (eQTL) underlying the variation in gene expression in liver tissue using an interval mapping approach. We identified eQTL for several thousand expression traits. Expression was influenced by polymorphism in both cis- and trans-regulatory regions. Trans-eQTL clustered into hotspots. We did not identify master transcriptional regulators in hotspot locations: rather, the presence of hotspots may be driven by complex interactions between multiple transcription factors. One observed hotspot colocated with a QTL recently found to underlie salinity tolerance in the threespine stickleback. However, most other observed hotspots did not colocate with regions of the genome known to be involved in adaptive divergence between marine and freshwater habitats. PMID:27836907
Genome-wide identification and characterization of the SBP-box gene family in Petunia.

PubMed

Zhou, Qin; Zhang, Sisi; Chen, Feng; Liu, Baojun; Wu, Lan; Li, Fei; Zhang, Jiaqi; Bao, Manzhu; Liu, Guofeng

2018-03-12

SQUAMOSA PROMOTER BINDING PROTEIN (SBP)-box genes encode a family of plant-specific transcription factors (TFs) that play important roles in many growth and developmental processes, including phase transition, leaf initiation, shoot and inflorescence branching, and fruit development and ripening. The SBP-box gene family has been identified and characterized in many species, but has not been well studied in Petunia, an important ornamental genus. We identified 21 putative SPL genes of Petunia axillaris and P. inflata from the reference genomes of P. axillaris N and P. inflata S6, respectively, which were supported by transcriptome data. For further confirmation, all 21 genes were also cloned from P. hybrida line W115 (Mitchel diploid). Phylogenetic analysis based on the highly conserved SBP domains arranged PhSPLs into eight groups, analogous to those from Arabidopsis and tomato. Furthermore, Petunia SPL genes within the same subgroup had similar exon-intron structures, and their deduced proteins contained very similar conserved motifs. Of the 21 PhSPL genes, fourteen were predicted to be potential targets of PhmiR156/157; the putative miR156/157 response elements (MREs) were located in the coding region of group IV, V, VII and VIII genes, but in the 3'-UTR of group VI genes. SPL genes were also identified in two other wild Petunia species, P. integrifolia and P. exserta, from their transcriptome databases, to investigate the origin of PhSPLs. Phylogenetic analysis and multiple alignments of the coding sequences of PhSPLs and their orthologs from wild species indicated that PhSPLs originated mainly from P. axillaris. qRT-PCR analysis demonstrated differential spatiotemporal expression patterns of PhSPL genes in petunia, with many expressed predominantly in the axillary buds and/or inflorescences.
In addition, overexpression of PhSPL9a and PhSPL9b in Arabidopsis suggested that these genes play a conserved role in promoting the vegetative-to-reproductive phase transition. The Petunia genome contains at least 21 SPL genes, most of which are expressed in different tissues. The PhSPL genes may play both conserved and diverse roles in plant growth and development, including flowering regulation, leaf initiation, and axillary bud and inflorescence development. This work provides a comprehensive understanding of the SBP-box gene family in Petunia and lays a foundation for future studies on the function and evolution of SPL genes in petunia.

Testing the burden of rare variation in arrhythmia-susceptibility genes provides new insights into molecular diagnosis for Brugada syndrome.

PubMed

Le Scouarnec, Solena; Karakachoff, Matilde; Gourraud, Jean-Baptiste; Lindenbaum, Pierre; Bonnaud, Stéphanie; Portero, Vincent; Duboscq-Bidot, Laëtitia; Daumy, Xavier; Simonet, Floriane; Teusan, Raluca; Baron, Estelle; Violleau, Jade; Persyn, Elodie; Bellanger, Lise; Barc, Julien; Chatel, Stéphanie; Martins, Raphaël; Mabo, Philippe; Sacher, Frédéric; Haïssaguerre, Michel; Kyndt, Florence; Schmitt, Sébastien; Bézieau, Stéphane; Le Marec, Hervé; Dina, Christian; Schott, Jean-Jacques; Probst, Vincent; Redon, Richard

2015-05-15

The Brugada syndrome (BrS) is a rare heritable cardiac arrhythmia disorder associated with ventricular fibrillation and sudden cardiac death. Mutations in the SCN5A gene have been causally related to BrS in 20-30% of cases. Twenty other genes have been described as involved in BrS, but their overall contribution to disease prevalence is still unclear. This study aims to estimate the burden of rare coding variation in arrhythmia-susceptibility genes among a large group of patients with BrS. We developed a custom kit to capture and sequence the coding regions of 45 previously reported arrhythmia-susceptibility genes and applied it to 167 index cases presenting with a Brugada pattern on the electrocardiogram, as well as to 167 individuals aged over 65 years with no history of cardiac arrhythmia. Burden tests showed a significant enrichment in rare coding variation (minor allele frequency below 0.1%) only for SCN5A, with rare coding variants carried by 20.4% of cases with BrS versus 2.4% of control individuals (P = 1.4 × 10⁻⁷). No significant enrichment was observed for any other arrhythmia-susceptibility gene, including SCN10A and CACNA1C. These results indicate that, except for SCN5A, rare coding variation in previously reported arrhythmia-susceptibility genes does not contribute significantly to the occurrence of BrS in a population of European ancestry. Extreme caution should therefore be taken when interpreting genetic variation in a molecular diagnostic setting since, for most previously reported BrS-susceptibility genes, rare coding variants were observed to a similar extent in cases and controls. © The Author 2015. Published by Oxford University Press. All rights reserved.
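The abstract reports the SCN5A comparison only as carrier percentages. As a minimal sketch of a gene-level burden comparison, the carrier counts can be placed in a 2×2 table (cases/controls by carriers/non-carriers) and tested with a two-sided Fisher's exact test. The counts 34/167 and 4/167 are an assumption, back-calculated from the reported 20.4% and 2.4%; the abstract does not name the burden test used, so Fisher's test here is a stand-in rather than the authors' method. In practice `scipy.stats.fisher_exact` performs the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: carriers, non-carriers of rare variants.
    Sums the hypergeometric probabilities of every table with the same margins
    that is no more likely than the observed table.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def pmf(k):
        # Probability that the top-left cell equals k, given fixed margins
        return comb(row1, k) * comb(total - row1, col1 - k) / comb(total, col1)

    p_obs = pmf(a)
    lo, hi = max(0, col1 - (total - row1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    return sum(p for p in map(pmf, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Assumed carrier counts, back-calculated from 20.4% and 2.4% of 167 each
case_carriers, n_cases = 34, 167
control_carriers, n_controls = 4, 167
p = fisher_exact_two_sided(case_carriers, n_cases - case_carriers,
                           control_carriers, n_controls - control_carriers)
print(f"SCN5A two-sided Fisher p = {p:.2e}")
```

Exact integer binomials keep the computation accurate for tables of this size without a statistics library.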
Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

PubMed

Yu, Hong; Kong, Lingfeng; Li, Qi

2016-01-01

In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance-based, monophyly-based and character-based) were used for species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly- and character-based methods provided higher resolution than the distance-based method in species delimitation, and the character-based method showed some advantages especially in closely related taxa. The results suggest that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for discriminating molluscan species. Our results also showed that combining mitochondrial genes did not improve the efficacy of species identification, and that a single mitochondrial gene would be sufficient.
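The distance-based method assigns a query to the species whose reference sequences are genetically closest. The abstract does not specify the distance model or the assignment rule, so the sketch below assumes the Kimura two-parameter (K2P) distance, which is widely used in barcoding studies, and a simple nearest-neighbour rule over pre-aligned sequences. The species names and sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter (K2P) distance between two aligned sequences.

    Positions where either sequence has a gap or ambiguous base are skipped.
    Raises ValueError (math domain error) if the sequences are too divergent
    for the K2P correction.
    """
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def nearest_species(query, references):
    """Assign `query` to the species of the closest reference (smallest K2P).

    `references` is a list of (species_name, aligned_sequence) pairs.
    """
    species, _ = min(references, key=lambda ref: k2p_distance(query, ref[1]))
    return species


# Hypothetical aligned barcode fragments
refs = [
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species B", "ACGTTCGAACGTTCGAACGA"),
]
print(nearest_species("ACGTACGTACGTACGTACGA", refs))
```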
Complete mitochondrial genome sequence of the heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus).

PubMed

Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai

2016-05-01

In this study, we sequenced for the first time the complete mitochondrial genome of a heart failure model, the cardiomyopathic Syrian hamster (Mesocricetus auratus). The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Complete mitochondrial genome of Bactrocera arecae (Insecta: Tephritidae) by next-generation sequencing and molecular phylogeny of Dacini tribe

PubMed Central

Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip

2015-01-01

The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained by next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by the rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure of trnS1 lacked the D-loop, and those of trnN and trnF lacked the TΨC-loop. The molecular phylogeny based on the 13 protein-coding genes was concordant with that based on all 37 mitochondrial genes, with B. arecae showing the closest genetic affinity to B. tryoni. The subgenus Bactrocera of the Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of the genus Bactrocera in particular, and of tephritid fruit flies in general. PMID:26472633
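The reported start- and stop-codon counts can be cross-checked against the 13 protein-coding genes, and the same tally can be produced from annotated coding sequences. This is a minimal sketch assuming each CDS string runs from the start codon to the (possibly truncated) stop codon; it treats any 1-2 trailing nucleotides as an incomplete stop, whereas real annotation also weighs overlap with the downstream tRNA gene. The toy sequences are hypothetical, not B. arecae data.

```python
from collections import Counter

# Totals reported in the abstract for the 13 protein-coding genes
REPORTED_STARTS = {"ATG": 6, "ATT": 3, "ATA": 1, "ATC": 1, "GTG": 1, "TCG": 1}
REPORTED_STOPS = {"TAA": 8, "TAG": 2, "TA": 1, "T": 2}
assert sum(REPORTED_STARTS.values()) == sum(REPORTED_STOPS.values()) == 13


def start_and_stop(cds):
    """Return (start codon, stop codon) for an annotated protein-coding sequence.

    If the length is not a multiple of three, the trailing 1-2 nucleotides are
    returned as an incomplete stop codon ('T' or 'TA'), assumed to be completed
    to TAA by polyadenylation of the transcript.
    """
    cds = cds.upper()
    remainder = len(cds) % 3
    stop = cds[-remainder:] if remainder else cds[-3:]
    return cds[:3], stop


def tally_codons(cds_list):
    """Count start and stop codons across a list of coding sequences."""
    starts, stops = Counter(), Counter()
    for cds in cds_list:
        start, stop = start_and_stop(cds)
        starts[start] += 1
        stops[stop] += 1
    return starts, stops


# Hypothetical toy CDSs
starts, stops = tally_codons(["ATGAAATAA", "ATTGCCTAG", "ATGCCCT", "TCGAAAGGCTA"])
print(starts, stops)
```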
MicroRNAs in Control of Stem Cells in Normal and Malignant Hematopoiesis

PubMed Central

Roden, Christine; Lu, Jun

2016-01-01

Studies on hematopoietic stem cells (HSCs) and leukemia stem cells (LSCs) have helped to establish the paradigms of normal and cancer stem cells. For both HSCs and LSCs, specific gene expression programs conferred by their epigenome functionally distinguish them from their differentiated progeny. MicroRNAs (miRNAs), a class of small non-coding RNAs, control gene expression post-transcriptionally. Research in the past decade has yielded exciting findings elucidating the roles of miRNAs in controlling multiple facets of HSC and LSC biology. Here we review recent progress on the functions of miRNAs in HSC emergence during development, the HSC switch from a fetal/neonatal program to an adult program, HSC self-renewal and quiescence, HSC aging, the HSC niche, and malignant stem cells. While many different miRNAs regulate a diverse array of targets, two common themes emerge in HSC and LSC biology: miRNA-mediated regulation of the epigenetic machinery and of cell signaling pathways. In addition, we propose that miRNAs themselves behave like epigenetic regulators, as they possess key biochemical and biological properties that can provide both stability and alterability to the epigenetic program. Overall, studies of miRNAs in hematologic stem cells not only provide key insights into post-transcriptional gene regulation in HSCs and LSCs, but will also inform other stem cell fields. PMID:27547713
geneCBR: a translational tool for multiple-microarray analysis and integrative information retrieval for aiding diagnosis in cancer research.

PubMed

Glez-Peña, Daniel; Díaz, Fernando; Hernández, Jesús M; Corchado, Juan M; Fdez-Riverola, Florentino

2009-06-18

Bioinformatics and medical informatics are two research fields that serve the needs of different but related communities. Both domains share the common goal of providing new algorithms, methods and technological solutions to biomedical research, and of contributing to the treatment and cure of diseases. Although different microarray techniques have been successfully used to extract useful information for cancer diagnosis at the gene expression level, the true integration of existing methods into day-to-day clinical practice is still a long way off. Within this context, case-based reasoning emerges as a paradigm particularly well suited to the development of biomedical informatics applications and decision support systems, given the support and collaboration involved in such translational development. With the goals of removing barriers to multidisciplinary collaboration and facilitating the dissemination and transfer of knowledge into real practice, case-based reasoning systems have the potential to be applied to translational research, mainly because their computational reasoning paradigm is similar to the way clinicians gather, analyze and process information in their own clinical practice. To bridge the existing gap between biomedical researchers and clinicians who work in cancer diagnosis, prognosis and treatment, we have developed and made accessible a common interactive framework. Our geneCBR system is a freely available software tool that combines techniques for gene selection, clustering, knowledge extraction and prediction to aid diagnosis in cancer research. For biomedical researchers, geneCBR expert mode offers a core workbench for designing and testing new techniques and experiments.
For pathologists or oncologists, geneCBR diagnostic mode implements an effective and reliable system that can diagnose cancer subtypes from microarray data using a CBR architecture. For programmers, geneCBR programming mode includes an advanced editing module for run-time modification of previously coded techniques. geneCBR is a new translational tool that can effectively support the integrative work of programmers, biomedical researchers and clinicians working together in a common framework. The code is freely available under the GPL license and can be obtained at http://www.genecbr.org.
Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

PubMed

Pastor, D; Amaya, W; García-Olcina, R; Sales, S

2007-07-01

We present a simple theoretical model, together with its experimental verification, of the vanishing of the autocorrelation peak caused by wavelength detuning in the encoding-decoding process of coherent direct-sequence optical code multiple access systems based on superstructured fiber Bragg gratings. Moreover, this detuning-induced vanishing effect has been exploited to provide an additional degree of multiplexing and/or optical code tuning.
The role of alternative Polyadenylation in regulation of rhythmic gene expression.

PubMed

Ptitsyna, Natalia; Boughorbel, Sabri; El Anbari, Mohammed; Ptitsyn, Andrey

2017-08-04

Alternative transcription is common in eukaryotic cells and plays an important role in the regulation of cellular processes. Alternative polyadenylation results from ambiguous polyA signals in the 3' untranslated region (UTR) of a gene. The resulting alternative transcripts share the same coding part but differ by a stretch of UTR that may contain important functional sites. The methodology of this study is based on mathematical modeling, analytical solution, and subsequent validation by data mining of multiple independent experimental datasets from previously published studies. We propose a mathematical model that describes the population dynamics of alternatively polyadenylated transcripts in conjunction with rhythmic expression, such as transcription oscillation driven by circadian or metabolic oscillators. Analysis of the model shows that alternative transcripts acquire a phase shift relative to each other if their decay rates differ. A difference in decay rate is one of the consequences of alternative polyadenylation. The phase shift can reach half the period of oscillation, which makes alternative transcripts oscillate in abundance in counter-phase to each other. Since counter-phased transcripts share the coding part, the rate of translation becomes constant. We have analyzed a few data sets collected along a circadian timeline for transcript behavior that fits the mathematical model. Alternative transcripts with different turnover rates create a rectifier effect: this "molecular diode" moderates or completely eliminates the oscillation of individual transcripts and stabilizes the overall protein production rate. In our observations this phenomenon is very common in different tissues in plants, mice, and humans. The occurrence of counter-phased alternative transcripts is also tissue-specific and affects the functions of multiple biological pathways.
Accounting for this mechanism is important both for understanding natural cellular circuits and for engineering synthetic ones.
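The phase-shift argument can be illustrated with a deliberately simplified model, which is an assumption and not the authors' model: each isoform is transcribed at a sinusoidal rate and decays at its own first-order rate k. In this linear model abundance lags transcription by atan(ω/k)/ω, so a slowly decaying isoform peaks later than a rapidly decaying one. The lag never exceeds a quarter period, so the sketch shows the direction of the effect but not the half-period shift reported above. The second function shows why exact counter-phase keeps the total coding mRNA, and hence translation, constant. All rates and amplitudes are hypothetical.

```python
import math


def steady_state_lag_hours(decay_rate, period_h=24.0):
    """Lag (hours) between peak transcription and peak transcript abundance.

    Simplified linear model: dX/dt = a + b*sin(w*t) - k*X, with w = 2*pi/period.
    At steady state X(t) = a/k + b/sqrt(k**2 + w**2) * sin(w*t - phi),
    where phi = atan(w/k); the lag in hours is phi / w.
    """
    w = 2 * math.pi / period_h
    return math.atan(w / decay_rate) / w


def shared_cds_total(t, mean1, mean2, amplitude, period_h=24.0):
    """Summed abundance of two isoforms that share a coding sequence and
    oscillate with equal amplitude in exact counter-phase."""
    w = 2 * math.pi / period_h
    isoform1 = mean1 + amplitude * math.sin(w * t)
    isoform2 = mean2 + amplitude * math.sin(w * t + math.pi)
    return isoform1 + isoform2


# Hypothetical decay rates (per hour) for short- and long-lived 3' UTR isoforms
for k in (1.0, 0.05):
    print(f"k = {k}/h -> lag {steady_state_lag_hours(k):.2f} h")
```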
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.

PubMed

Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni

2015-01-01

The complete mitochondrial genome of the giant manta ray (Manta birostris) is 18,075 bp long, with a high A + T and low G content. Its gene organization and gene lengths are similar to those of other ray species. It comprises 13 protein-coding genes, 2 rRNA genes, 23 tRNA genes, 1 non-coding sequence and the control region. We identified an AT tandem repeat region similar to that reported in Mobula japanica.
Origins of Genes: "Big Bang" or Continuous Creation?

NASA Astrophysics Data System (ADS)

Keese, Paul K.; Gibbs, Adrian

1992-10-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
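Overprinting means reading an existing nucleotide sequence in a different frame. A minimal sketch that translates all three forward frames of one stretch with the standard genetic code; the toy sequence is hypothetical, mitochondria and some other lineages use different codes, and a real analysis of overlapping genes would also locate ORF boundaries and compare phylogenies.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate `seq` starting at offset `frame` (0, 1 or 2).

    '*' marks stop codons and 'X' marks codons containing non-ACGT characters;
    a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


toy = "ATGGCCTAA"  # hypothetical stretch read in three frames
for frame in range(3):
    print(frame, translate(toy, frame))
```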
Duplication and expression of CYC2-like genes in the origin and maintenance of corolla zygomorphy in Lamiales.

PubMed

Zhong, Jinshun; Kellogg, Elizabeth A

2015-01-01

Duplication, retention, and expression of CYCLOIDEA2 (CYC2)-like genes are thought to affect the evolution of corolla symmetry. However, exactly which changes in CYC2-like genes correlate with the origin of corolla zygomorphy, and how, is poorly understood. We inferred and calibrated a densely sampled phylogeny of CYC2-like genes across the Lamiales and examined their expression in early diverging (EDL) and higher core clades (HCL). CYC2-like genes duplicated extensively in Lamiales: at least six times in core Lamiales (CL) around the Cretaceous-Paleogene (K-Pg) boundary, and seven more times, relatively more recently, in EDL. Nested duplications and losses of CYC2-like paralogs are pervasive but may not correlate with transitions in corolla symmetry. We found evidence for dN/dS (ω) variation following gene duplications. CYC2-like paralogs in HCL show differential expression, with higher expression in adaxial petals. Asymmetric expression, but not recurrent duplication, of CYC2-like genes correlates with the origin of corolla zygomorphy. Changes in both the cis-regulatory and coding domains of CYC2-like genes are probably crucial for the evolution of corolla zygomorphy. Multiple selection regimes appear likely to play important roles in gene retention. The parallel duplications of CYC2-like genes postdate the initial diversification of bumble bees and Euglossine bees. © 2014 The Authors. New Phytologist © 2014 New Phytologist Trust.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes of medical importance in humans. However, the mitochondrial (mt) genome of this fluke had not yet been characterized. The present study determined the complete mt genome sequence of E. hortense and assessed its phylogenetic relationships with other digenean species whose complete mt genome sequences are available in GenBank, using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The mt genome of E. hortense was 14,994 bp long, somewhat smaller than those of other trematode species. Phylogenetic analyses of concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together and were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequence of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)

PubMed Central

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-01-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes of medical importance in humans. However, the mitochondrial (mt) genome of this fluke had not yet been characterized. The present study determined the complete mt genome sequence of E. hortense and assessed its phylogenetic relationships with other digenean species whose complete mt genome sequences are available in GenBank, using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The mt genome of E. hortense was 14,994 bp long, somewhat smaller than those of other trematode species. Phylogenetic analyses of concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together and were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequence of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
A novel SMAD3 mutation caused multiple aneurysms in a patient without osteoarthritis symptoms.

PubMed

Courtois, Audrey; Coppieters, Wouter; Bours, Vincent; Defraigne, Jean-Olivier; Colige, Alain; Sakalihasan, Natzi

2017-04-01

Heterozygous mutations in the SMAD3 gene were recently described as the cause of a form of non-syndromic familial thoracic aortic aneurysm and dissection (FTAAD) that is transmitted as an autosomal dominant disorder and often associated with early-onset osteoarthritis. This new clinical entity, called aneurysms-osteoarthritis syndrome (AOS) or Loeys-Dietz syndrome 3 (LDS3), is characterized by aggressive arterial damage such as aneurysms, dissections and tortuosity throughout the arterial tree. We report here the case of a 45-year-old man presenting with multiple aneurysms of the visceral arteries and the abdominal aorta, but without dissection of the thoracic aorta and without any sign of osteoarthritis. Exome sequencing revealed a new heterozygous frameshift mutation, c.455delC (p.Pro152Hisfs*34), in the SMAD3 gene. This deletion is located in exon 3, which codes for the linker region of the protein, and creates a premature stop codon at positions 556-558 in exon 4. The same mutation was found in the proband's mother and sister, who had undergone open surgery for abdominal aortic aneurysm, and in one of his children, a 5-year-old who had not yet developed an aneurysm. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
GCView: the genomic context viewer for protein homology searches

PubMed Central

Grin, Iwan; Linke, Dirk

2011-01-01

Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
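The core step described here, collecting the protein-coding genes that flank each homology hit, can be sketched as follows. This is a hypothetical illustration, not GCView's implementation: the `Gene` record, the `flank` parameter and the coordinates are assumptions, and the sketch ignores strand, circular replicons and neighbourhoods that run off a contig end.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    start: int   # 1-based start coordinate on the contig
    end: int
    strand: str  # "+" or "-" (kept for display; not used below)


def genomic_neighbourhood(genes, hit_id, flank=2):
    """Return the hit gene plus up to `flank` protein-coding genes on each side,
    in genomic order. All `genes` must lie on the same linear contig.
    Raises StopIteration if `hit_id` is not present."""
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.gene_id == hit_id)
    return ordered[max(0, idx - flank): idx + flank + 1]


# Hypothetical contig with six genes g1..g6
contig = [Gene(f"g{i}", start=1000 * i, end=1000 * i + 800, strand="+") for i in range(1, 7)]
print([g.gene_id for g in genomic_neighbourhood(contig, "g4", flank=1)])
```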
EBIC: an evolutionary-based parallel biclustering algorithm for pattern discovery.

PubMed

Orzechowski, Patryk; Sipper, Moshe; Huang, Xiuzhen; Moore, Jason H

2018-05-22

Biclustering algorithms are commonly used for gene expression data analysis. However, accurately identifying meaningful structures is very challenging, and state-of-the-art methods cannot discover, with high accuracy, the different patterns of high biological relevance. In this paper we introduce a novel biclustering algorithm based on evolutionary computation, a subfield of artificial intelligence (AI). The method, called EBIC, aims to detect order-preserving patterns in complex data. EBIC is capable of discovering multiple complex patterns with unprecedented accuracy in real gene expression datasets. It is also one of the very few biclustering methods designed for parallel environments with multiple graphics processing units (GPUs). We demonstrate that EBIC greatly outperforms state-of-the-art biclustering methods, in terms of recovery and relevance, on both synthetic and genetic datasets. EBIC also yields results over 12 times faster than the most accurate reference algorithms. EBIC source code is available on GitHub at https://github.com/EpistasisLab/ebic. Supplementary Data with results of analyses and additional information on the method are available at Bioinformatics online.
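An order-preserving pattern is a set of rows whose values rise in the same order across a set of columns. The sketch below only checks that definition for one candidate bicluster; it does not reproduce EBIC's evolutionary search or its GPU-based scoring. It assumes strict increases, so rows with tied values are rejected, and the matrix is hypothetical.

```python
def is_order_preserving(matrix, rows, cols):
    """Return True if every selected row increases strictly along one common
    ordering of the selected columns (an order-preserving submatrix)."""
    # If such an ordering exists, it must be the one that sorts the first row.
    first = matrix[rows[0]]
    order = sorted(cols, key=lambda c: first[c])
    for r in rows:
        values = [matrix[r][c] for c in order]
        if any(a >= b for a, b in zip(values, values[1:])):
            return False
    return True


# Hypothetical expression matrix: rows are genes, columns are conditions
data = [
    [1.0, 5.0, 3.0, 9.0],
    [0.2, 0.9, 0.4, 0.1],
    [10.0, 40.0, 20.0, 5.0],
    [3.0, 1.0, 2.0, 8.0],
]
print(is_order_preserving(data, rows=[0, 1, 2], cols=[0, 1, 2]))
```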
A novel all-optical label processing based on multiple optical orthogonal codes sequences for optical packet switching networks

NASA Astrophysics Data System (ADS)

Zhang, Chongfu; Qiu, Kun; Xu, Bo; Ling, Yun

2008-05-01

This paper proposes an all-optical label processing scheme that uses multiple optical orthogonal code sequences (MOOCS) as optical labels for optical packet switching (OPS) networks (MOOCS-OPS). In this scheme, each MOOCS is a permutation or combination of multiple optical orthogonal codes (MOOC) selected from multiple-group optical orthogonal codes (MGOOC). Following a comparison of different optical label processing (OLP) schemes, the principles of the MOOCS-OPS network are presented and analyzed. First, theoretical analyses are used to prove that MOOCS greatly enlarges the number of available optical labels compared with the previous single optical orthogonal code (SOOC) for OPS (SOOC-OPS) network. Then, the key units for MOOCS-based optical label packets, including optical packet generation, optical label erasing, optical label extraction and optical label rewriting, are presented and studied. These results verify that the proposed MOOCS-OPS scheme is feasible.
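Why code sequences enlarge the label space can be shown by counting. Assuming each label is an ordered (permutation) or unordered (combination) selection of distinct codes from a single pool, a pool of M codes yields M labels when one code is used per label, but P(M, k) or C(M, k) labels for sequences of length k. The grouping of codes into MGOOC and any code-weight or correlation constraints from the paper are not modelled, and the values 8 and 3 are hypothetical.

```python
from math import comb, perm


def label_space(num_codes, seq_len, ordered=True):
    """Number of distinct optical labels built from `seq_len` distinct codes
    drawn from a pool of `num_codes` codes.

    ordered=True counts permutations (code order matters);
    ordered=False counts combinations (only the set of codes matters).
    """
    return perm(num_codes, seq_len) if ordered else comb(num_codes, seq_len)


# Hypothetical pool of 8 codes: single-code labels vs. 3-code sequences
print(label_space(8, 1), label_space(8, 3), label_space(8, 3, ordered=False))
```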
Iterative channel decoding of FEC-based multiple-description codes.

PubMed

Chang, Seok-Ho; Cosman, Pamela C; Milstein, Laurence B

2012-03-01

Multiple description coding has been receiving attention as a robust transmission framework for multimedia services. This paper studies the iterative decoding of FEC-based multiple description codes. The proposed decoding algorithms take advantage of the error detection capability of Reed-Solomon (RS) erasure codes. The information of correctly decoded RS codewords is exploited to enhance the error correction capability of the Viterbi algorithm at the next iteration of decoding. In the proposed algorithm, an intradescription interleaver is synergistically combined with the iterative decoder. The interleaver does not affect the performance of noniterative decoding but greatly enhances the performance when the system is iteratively decoded. We also address the optimal allocation of RS parity symbols for unequal error protection. For the optimal allocation in iterative decoding, we derive mathematical equations from which the probability distributions of description erasures can be generated in a simple way. The performance of the algorithm is evaluated over an orthogonal frequency-division multiplexing system. The results show that the performance of the multiple description codes is significantly enhanced.
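The Reed-Solomon property exploited here is that an (n, k) RS erasure code recovers a block whenever at most n - k of its n symbols are erased. A minimal sketch of the resulting block-failure probability, assuming independent symbol erasures with a fixed probability; this is not the paper's iterative Viterbi/RS decoder or its OFDM channel, but it shows the trade-off that parity allocation for unequal error protection has to balance. All parameters are hypothetical.

```python
from math import comb


def rs_block_failure_prob(n, k, p_erasure):
    """Probability that an (n, k) Reed-Solomon erasure code cannot recover a block.

    An RS code is maximum distance separable, so any k of the n symbols suffice:
    decoding fails only when more than n - k symbols are erased. Erasures are
    assumed independent, each with probability p_erasure.
    """
    return sum(
        comb(n, e) * p_erasure**e * (1 - p_erasure) ** (n - e)
        for e in range(n - k + 1, n + 1)
    )


# Hypothetical example: 4 data symbols protected by 0, 2 or 4 parity symbols
for parity in (0, 2, 4):
    print(parity, rs_block_failure_prob(4 + parity, 4, 0.1))
```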
EGASP: the human ENCODE Genome Annotation Assessment Project

PubMed Central

Guigó, Roderic; Flicek, Paul; Abril, Josep F; Reymond, Alexandre; Lagarde, Julien; Denoeud, France; Antonarakis, Stylianos; Ashburner, Michael; Bajic, Vladimir B; Birney, Ewan; Castelo, Robert; Eyras, Eduardo; Ucla, Catherine; Gingeras, Thomas R; Harrow, Jennifer; Hubbard, Tim; Lewis, Suzanna E; Reese, Martin G

2006-01-01

Background: We present the results of EGASP, a community experiment to assess the state of the art in genome annotation within the ENCODE regions, which span 1% of the human genome sequence. The experiment had two major goals: to assess the accuracy of computational methods for predicting protein-coding genes, and to assess the overall completeness of current human genome annotations as represented in the ENCODE regions. For the computational prediction assessment, eighteen groups contributed gene predictions. We evaluated these submissions against each other based on a 'reference set' of annotations generated as part of the GENCODE project. These annotations were not available to the prediction groups before the submission deadline, so their predictions were blind and an external advisory committee could perform a fair assessment. Results: The best methods correctly predicted at least one gene transcript for close to 70% of the annotated genes. Nevertheless, multiple-transcript accuracy, which takes alternative splicing into account, reached only approximately 40% to 50%. At the coding nucleotide level, the best programs reached an accuracy of 90% in both sensitivity and specificity. Programs relying on mRNA and protein sequences were the most accurate in reproducing the manually curated annotations. Experimental validation showed that only a very small percentage (3.2%) of the 221 selected computationally predicted exons outside the existing annotation could be verified. Conclusion: This is the first such experiment in human DNA, and we have followed the standards established in a similar experiment, GASP1, in Drosophila melanogaster. We believe the results presented here contribute to the value of ongoing large-scale annotation projects and should guide further experimental methods when being scaled up to the entire human genome sequence. PMID:16925836
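At the coding-nucleotide level, accuracy compares the set of nucleotides predicted as coding with the annotated set. In gene-prediction assessments, sensitivity is conventionally TP / (TP + FN) and specificity TP / (TP + FP), the quantity other fields call precision. A minimal sketch assuming single-sequence, 1-based inclusive intervals (hypothetical coordinates) expanded into position sets, which is fine for ENCODE-sized regions but wasteful for whole genomes.

```python
def coding_positions(intervals):
    """Expand 1-based, inclusive (start, end) coding intervals into a set of positions."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def nucleotide_accuracy(predicted, annotated):
    """Coding-nucleotide sensitivity and specificity of a prediction.

    Sn = TP / (TP + FN): fraction of annotated coding nucleotides that were predicted.
    Sp = TP / (TP + FP): fraction of predicted coding nucleotides that are annotated.
    """
    pred = coding_positions(predicted)
    ref = coding_positions(annotated)
    tp = len(pred & ref)
    return tp / len(ref), tp / len(pred)


# Hypothetical exon coordinates on one sequence
sn, sp = nucleotide_accuracy(predicted=[(1, 100)], annotated=[(1, 100), (201, 300)])
print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")
```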
ExDom: an integrated database for comparative analysis of the exon–intron structures of protein domains in eukaryotes

PubMed Central

Bhasi, Ashwini; Philip, Philge; Manikandan, Vinu; Senapathy, Periannan

2009-01-01

We have developed ExDom, a unique database for the comparative analysis of the exon–intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon–intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon–intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon–intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/. PMID:18984624
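Correlating domain architecture with exon-intron structure amounts to projecting a domain's amino-acid coordinates onto the coding segment of each exon. A minimal sketch, not ExDom's implementation, assuming that exon lengths count coding nucleotides only and that codon 1 starts at the first coding nucleotide of the first coding exon; the exon lengths and domain coordinates are hypothetical.

```python
from itertools import accumulate


def exons_spanned(exon_cds_lengths, domain_start_aa, domain_end_aa):
    """Return 1-based indices of the coding exons that a protein domain overlaps.

    exon_cds_lengths: coding nucleotides contributed by each exon, in order.
    domain_start_aa, domain_end_aa: 1-based, inclusive amino-acid positions.
    """
    nt_start = (domain_start_aa - 1) * 3 + 1  # first nucleotide of the first codon
    nt_end = domain_end_aa * 3                # last nucleotide of the last codon
    ends = list(accumulate(exon_cds_lengths))
    starts = [end - length + 1 for end, length in zip(ends, exon_cds_lengths)]
    return [i + 1 for i, (s, e) in enumerate(zip(starts, ends))
            if s <= nt_end and e >= nt_start]


# Hypothetical gene with three coding exons (90, 120 and 150 coding nt)
print(exons_spanned([90, 120, 150], 25, 40))
```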
