protein-coding genes start: Topics by Science.gov

Sample records for protein-coding genes start

Improving the genome annotation of the acarbose producer Actinoplanes sp. SE50/110 by sequencing enriched 5'-ends of primary transcripts.

PubMed

Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred

2014-11-20

Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
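As a rough illustration of the leadered versus leaderless distinction drawn in the abstract above, the following Python sketch sorts a mapped transcription start site (TSS) by its distance to the annotated start codon. The function names, the 1-based coordinate convention and the `max_leaderless` threshold are assumptions made for this example; the abstract does not state which cutoff the authors applied.

```python
def leader_length(tss: int, cds_start: int, strand: str) -> int:
    """Distance in nt from the TSS to the first base of the start codon.

    cds_start is the coordinate of the first base of the start codon,
    i.e. the highest CDS coordinate for a gene on the minus strand.
    A negative value means the TSS lies inside the coding region.
    """
    if strand == "+":
        return cds_start - tss
    if strand == "-":
        return tss - cds_start
    raise ValueError("strand must be '+' or '-'")


def classify_tss(tss: int, cds_start: int, strand: str,
                 max_leaderless: int = 0) -> str:
    """Label a TSS as 'leaderless', 'leadered' or 'internal'.

    max_leaderless is a hypothetical tolerance: leaders of at most this
    many nucleotides are treated as leaderless.
    """
    n = leader_length(tss, cds_start, strand)
    if n < 0:
        return "internal"
    return "leaderless" if n <= max_leaderless else "leadered"
```

With the default threshold, `classify_tss(100, 100, "+")` is labelled leaderless because transcription and translation start at the same base, whereas `classify_tss(60, 100, "+")` is labelled leadered.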
The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae).

PubMed

Pan, Hong-Chun; Fang, Hong-Yan; Li, Shi-Wei; Liu, Jun-Hong; Wang, Ying; Wang, An-Tai

2014-12-01

The complete mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae) is composed of two linear DNA molecules. Mitochondrial DNA (mtDNA) molecule 1 is 8010 bp long and contains six protein-coding genes, large subunit rRNA, methionine and tryptophan tRNAs, two pseudogenes consisting respectively of a partial copy of COI, and terminal sequences at the two ends of the linear mtDNA. mtDNA molecule 2 is 7576 bp long and contains seven protein-coding genes, small subunit rRNA, methionine tRNA, a pseudogene consisting of a partial copy of COI, and terminal sequences at the two ends of the linear mtDNA. The COI gene begins with GTG as its start codon, whereas the other 12 protein-coding genes start with a typical ATG initiation codon. In addition, all protein-coding genes terminate with TAA as the stop codon.
Complete mitochondrial genome of Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae).

PubMed

Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C

2015-04-01

The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
Complete mitochondrial genome of Palawan peacock-pheasant Polyplectron napoleonis (Galliformes, Phasianidae).

PubMed

Quach, Tommy; Brooks, Daniel M; Miranda, Hector C

2016-01-01

The complete mitochondrial genome of the Palawan peacock-pheasant Polyplectron napoleonis is 16,710 bp and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a control region. All protein-coding genes use the standard ATG start codon, except for cox1, which has a GTG start codon. Seven of the 13 PCGs have TAA stop codons, two have AGG (cox1 and nd6), and three (nd2, cox2 and nd4) have an incomplete stop codon consisting of a single T nucleotide (T--).
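Start and stop codon summaries like the ones in the mitogenome abstracts above can be derived directly from the coding sequences. The Python sketch below is one possible way to do it, assuming each CDS is supplied 5' to 3' on its coding strand and that the vertebrate mitochondrial genetic code (NCBI translation table 2) applies. An incomplete stop codon is written with dashes for the missing bases, for example `T--` for a lone T; the function names are illustrative.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def start_codon(cds: str) -> str:
    """Return the first three bases of a coding sequence, upper-cased."""
    return cds[:3].upper()


def stop_signal(cds: str) -> str:
    """Describe how a coding sequence ends.

    If the length is not a multiple of three, the trailing one or two
    bases are reported as an incomplete stop (e.g. 'T--' or 'TA-').
    Otherwise the last codon is returned if it is a recognised stop,
    and 'none' if it is not.
    """
    cds = cds.upper()
    leftover = len(cds) % 3
    if leftover:
        return cds[-leftover:] + "-" * (3 - leftover)
    last = cds[-3:]
    return last if last in COMPLETE_STOPS else "none"
```

The sketch does not check that the leftover bases are actually T or TA, so unexpected endings should be inspected by hand.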
Complete mitochondrial genome of the agarophyte red alga Gelidium vagum (Gelidiales).

PubMed

Yang, Eun Chan; Kim, Kyeong Mi; Boo, Ga Hun; Lee, Jung-Hyun; Boo, Sung Min; Yoon, Hwan Su

2014-08-01

We describe the first complete mitochondrial genome of Gelidium vagum (Gelidiales) (24,901 bp, 30.4% GC content), an agar-producing red alga. The circular mitochondrial genome contains 43 genes, including 23 protein-coding, 18 tRNA and 2 rRNA genes. All the protein-coding genes have a typical ATG start codon. No introns were found. Two genes, secY and rps12, were overlapped by 41 bp.
Origin and evolution of the long non-coding genes in the X-inactivation center.

PubMed

Romito, Antonio; Rougeulle, Claire

2011-11-01

Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
Tetrahymena thermophila acidic ribosomal protein L37 contains an archaebacterial type of C-terminus.

PubMed

Hansen, T S; Andreasen, P H; Dreisig, H; Højrup, P; Nielsen, H; Engberg, J; Kristiansen, K

1991-09-15

We have cloned and characterized a Tetrahymena thermophila macronuclear gene (L37) encoding the acidic ribosomal protein (A-protein) L37. The gene contains a single intron located in the 3'-part of the coding region. Two major and three minor transcription start points (tsp) were mapped 39 to 63 nucleotides upstream from the translational start codon. The uppermost tsp mapped to the first T in a putative T. thermophila RNA polymerase II initiator element, TATAA. The coding region of L37 predicts a protein of 109 amino acid (aa) residues. A substantial part of the deduced aa sequence was verified by protein sequencing. The T. thermophila L37 clearly belongs to the P1-type family of eukaryotic A-proteins, but the C-terminal region has the hallmarks of archaebacterial A-proteins.
The complete mitochondrial genome and phylogenetic analysis of the giant panda (Ailuropoda melanoleuca).

PubMed

Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong

2007-08-01

The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer-walking sequencing. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The total length of the 13 protein-coding genes is longer than that of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
The complete mitochondrial genome of Chinese green hydra, Hydra sinensis (Hydroida: Hydridae).

PubMed

Pan, Hong-Chun; Qian, Xiao-Cheng; Li, Ping; Li, Xiao-Fei; Wang, An-Tai

2014-02-01

The complete mitochondrial genome of the Chinese green hydra, Hydra sinensis (Hydroida: Hydridae), is a linear molecule of 16,189 bp in length, containing 13 protein-coding genes, small and large subunit ribosomal RNAs, methionine and tryptophan transfer RNAs, a pseudogene consisting of a partial copy of COI, and terminal sequences at the two ends of the linear mitochondrial DNA. The A + T content of the overall base composition of the H-strand is 77.2% (T: 41.7%; C: 10.9%; A: 35.5%; and G: 11.9%). The COI and ND1 genes begin with GTG as the start codon, while the other 11 protein-coding genes start with a typical ATG initiation codon. The COII, ATP8, ATP6, COIII, ND5, ND6, ND3, ND1, ND4 and COI genes are terminated with TAA as the stop codon, ND4L ends with TAG, ND2 ends with TA and Cyt b ends with T.
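The A + T figure quoted above is simply the sum of the A and T percentages (35.5% + 41.7% = 77.2%). A minimal Python sketch of how such a base composition might be computed from a strand sequence follows; the function names are illustrative, and ambiguous symbols such as N are assumed to be excluded from the total.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(seq: str) -> float:
    """A + T percentage of seq."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]
```

For the toy sequence `"AATTGC"`, four of the six bases are A or T, so `at_content` returns roughly 66.7.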
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.

PubMed Central

Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R

1982-01-01

The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. PMID:6296791
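The lengths quoted in this abstract can be checked with a few lines of arithmetic. The Python sketch below reproduces the 1373 to 1380 nucleotide estimate under one assumption that the abstract does not state explicitly: that the 1251 bp reading frame includes the 3 bp stop codon (416 codons x 3 = 1248 bp, plus 3 bp). The variable names are illustrative.

```python
# Figures quoted in the abstract.
ORF_BP = 1251                 # reading frame length in bp
PROTEIN_AA = 416              # encoded amino acids
FIVE_PRIME_LEADER = 36        # nt from transcription start to translation start
THREE_PRIME_TRAILER = (86, 93)  # nt from translation stop to transcript end
OBSERVED_POLYA_MRNA = 1500    # approximate observed polyadenylated mRNA length

# Assumption: the reading frame includes the stop codon.
frame_from_protein = PROTEIN_AA * 3 + 3


def transcript_length(trailer_nt: int) -> int:
    """Non-polyadenylated mRNA length for a given 3' trailer length."""
    return FIVE_PRIME_LEADER + ORF_BP + trailer_nt


# Shortest and longest predicted transcripts.
mrna_range = tuple(transcript_length(t) for t in THREE_PRIME_TRAILER)

# Difference to the observed polyadenylated length (shortest, longest).
implied_tail = tuple(OBSERVED_POLYA_MRNA - n for n in reversed(mrna_range))
```

Here `frame_from_protein` equals 1251, `mrna_range` is `(1373, 1380)`, and the difference to the observed 1500 nucleotides is about 120 to 127 nucleotides. Since the observed length is itself approximate, that difference is only a rough indication of poly(A) tail length.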
Recognition of Protein-coding Genes Based on Z-curve Algorithms

PubMed Central

Guo, Feng-Biao; Lin, Yan; Chen, Ling-Ling

2014-01-01

Recognition of protein-coding genes, a classical bioinformatics issue, is an absolutely needed step for annotating newly sequenced genomes. The Z-curve algorithm, as one of the most effective methods on this issue, has been successfully applied in annotating or re-annotating many genomes, including those of bacteria, archaea and viruses. Two Z-curve based ab initio gene-finding programs have been developed: ZCURVE (for bacteria and archaea) and ZCURVE_V (for viruses and phages). ZCURVE_C (for 57 bacteria) and Zfisher (for any bacterium) are web servers for re-annotation of bacterial and archaeal genomes. The above four tools can be used for genome annotation or re-annotation, either independently or combined with the other gene-finding programs. In addition to recognizing protein-coding genes and exons, Z-curve algorithms are also effective in recognizing promoters and translation start sites. Here, we summarize the applications of Z-curve algorithms in gene finding and genome annotation. PMID:24822027
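For readers unfamiliar with the Z-curve, the basic transform maps a DNA sequence to a three-dimensional curve whose coordinates after n bases are x = (A + G) - (C + T), y = (A + C) - (G + T) and z = (A + T) - (C + G), where each letter stands for the cumulative count of that base. The Python sketch below computes only this cumulative transform; it is not the ZCURVE gene finder, which builds further features and a classifier on top of Z-curve parameters. The function name is illustrative, and ambiguous bases are assumed to be skipped.

```python
from typing import List, Tuple


def z_curve(seq: str) -> List[Tuple[int, int, int]]:
    """Cumulative Z-curve coordinates, one (x, y, z) point per A/C/G/T base.

    x separates purines (A, G) from pyrimidines (C, T),
    y separates amino bases (A, C) from keto bases (G, T),
    z separates weak (A, T) from strong (C, G) hydrogen-bonding bases.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    points = []
    for base in seq.upper():
        if base not in counts:
            continue  # skip ambiguous symbols such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        points.append((a + g - c - t, a + c - g - t, a + t - c - g))
    return points
```

For the sequence `"ACGT"` the curve visits (1, 1, 1), (0, 2, 0), (1, 1, -1) and returns to the origin, because the four bases are then equally represented.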
GENCODE: the reference human genome annotation for The ENCODE Project.

PubMed

Harrow, Jennifer; Frankish, Adam; Gonzalez, Jose M; Tapanari, Electra; Diekhans, Mark; Kokocinski, Felix; Aken, Bronwen L; Barrell, Daniel; Zadissa, Amonida; Searle, Stephen; Barnes, If; Bignell, Alexandra; Boychenko, Veronika; Hunt, Toby; Kay, Mike; Mukherjee, Gaurab; Rajan, Jeena; Despacio-Reyes, Gloria; Saunders, Gary; Steward, Charles; Harte, Rachel; Lin, Michael; Howald, Cédric; Tanzer, Andrea; Derrien, Thomas; Chrast, Jacqueline; Walters, Nathalie; Balasubramanian, Suganthi; Pei, Baikang; Tress, Michael; Rodriguez, Jose Manuel; Ezkurdia, Iakes; van Baren, Jeltje; Brent, Michael; Haussler, David; Kellis, Manolis; Valencia, Alfonso; Reymond, Alexandre; Gerstein, Mark; Guigó, Roderic; Hubbard, Tim J

2012-09-01

The GENCODE Consortium aims to identify all gene features in the human genome using a combination of computational analysis, manual annotation, and experimental validation. Since the first public release of this annotation data set, few new protein-coding loci have been added, yet the number of alternative splicing transcripts annotated has steadily increased. The GENCODE 7 release contains 20,687 protein-coding and 9640 long noncoding RNA loci and has 33,977 coding transcripts not represented in UCSC genes and RefSeq. It also has the most comprehensive annotation of long noncoding RNA (lncRNA) loci publicly available with the predominant transcript form consisting of two exons. We have examined the completeness of the transcript annotation and found that 35% of transcriptional start sites are supported by CAGE clusters and 62% of protein-coding genes have annotated polyA sites. Over one-third of GENCODE protein-coding genes are supported by peptide hits derived from mass spectrometry spectra submitted to Peptide Atlas. New models derived from the Illumina Body Map 2.0 RNA-seq data identify 3689 new loci not currently in GENCODE, of which 3127 consist of two exon models indicating that they are possibly unannotated long noncoding loci. GENCODE 7 is publicly available from gencodegenes.org and via the Ensembl and UCSC Genome Browsers.
Experimental annotation of post-translational features and translated coding regions in the pathogen Salmonella Typhimurium

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.

Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems biology-oriented genome-scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop properly functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and, importantly, detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provide protein-level experimental confirmation for 44% of predicted protein-coding genes, suggest revisions to 48 genes assigned incorrect translational start sites, and uncover 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regard to the annotation of mature protein products.
The Mitochondrial Cytochrome Oxidase Subunit I Gene Occurs on a Minichromosome with Extensive Heteroplasmy in Two Species of Chewing Lice, Geomydoecus aurei and Thomomydoecus minor

PubMed Central

Pietan, Lucas L.; Spradling, Theresa A.

2016-01-01

In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression

PubMed Central

Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang

2007-01-01

Background: Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results: We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion: In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
The complete mitochondrial genome of the Korean skate: Hongeo koreana (Rajiformes, Rajidae).

PubMed

Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Lee, Youn-Ho

2014-12-01

The complete mitochondrial genome of the Korean skate, Hongeo koreana, the sole member of its genus, is investigated for the first time. The genome is 16,906 bp in length and includes 2 rRNA, 22 tRNA and 13 protein-coding genes, with the same gene order and structure as those of other Rajidae species. The overall nucleotide composition of the L-strand is A = 29.8%, C = 27.9%, T = 27.9% and G = 14.3%, showing a high A + T bias. The anti-G bias (6.0%) is more significant in the third codon position. Twelve of the 13 protein-coding genes use ATG as their start codon while the COX1 gene starts with GTG. As for stop codons, the ND3 and ND4 genes show an incomplete stop codon T. The mitogenome sequence of H. koreana will provide important information on the evolution and the phylogenetic relation of the genus Hongeo in relation to the other genera of the family Rajidae.
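The bias against G at the third codon position reported in the skate mitogenome abstracts is the share of codons whose last base is G. A minimal Python sketch of that calculation is given below, assuming an in-frame coding sequence whose trailing bases of an incomplete stop codon should be ignored. The function name is illustrative, and the abstracts do not say whether stop codons were included in the authors' own counts.

```python
def third_position_share(cds: str, base: str = "G") -> float:
    """Percentage of complete codons in cds whose third base equals base."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # drop a trailing partial codon
    thirds = cds[2:usable:3]           # every third base, starting at index 2
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return 100.0 * thirds.count(base.upper()) / len(thirds)
```

For the toy sequence `"ATGAAGTTT"` the third positions are G, G and T, so the function returns about 66.7 for G.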
The complete mitochondrial genome of the invasive Africanized Honey Bee, Apis mellifera scutellata (Insecta: Hymenoptera: Apidae).

PubMed

Gibson, Joshua D; Hunt, Greg J

2016-01-01

The complete mitochondrial genome from an Africanized honey bee population (AHB, derived from Apis mellifera scutellata) was assembled and analyzed. The mitogenome is 16,411 bp long and contains the same gene repertoire and gene order as the European honey bee (13 protein coding genes, 22 tRNA genes and 2 rRNA genes). ND4 appears to use an alternate start codon and the long rRNA gene is 48 bp shorter in AHB due to a deletion in a terminal AT dinucleotide repeat. The dihydrouracil arm is missing from tRNA-Ser (AGN) and tRNA-Glu is missing the TV loop. The A + T content is comparable to the European honey bee (84.7%), which increases to 95% for the 3rd position in the protein coding genes.
Complete mitochondrial genome of Bactrocera arecae (Insecta: Tephritidae) by next-generation sequencing and molecular phylogeny of Dacini tribe

PubMed Central

Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip

2015-01-01

The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Deep Sequencing Reveals Uncharted Isoform Heterogeneity of the Protein-Coding Transcriptome in Cerebral Ischemia.

PubMed

Bhattarai, Sunil; Aly, Ahmed; Garcia, Kristy; Ruiz, Diandra; Pontarelli, Fabrizio; Dharap, Ashutosh

2018-06-03

Gene expression in cerebral ischemia has been a subject of intense investigations for several years. Studies utilizing probe-based high-throughput methodologies such as microarrays have contributed significantly to our existing knowledge but lacked the capacity to dissect the transcriptome in detail. Genome-wide RNA-sequencing (RNA-seq) enables comprehensive examinations of transcriptomes for attributes such as strandedness, alternative splicing, alternative transcription start/stop sites, and sequence composition, thus providing a very detailed account of gene expression. Leveraging this capability, we conducted an in-depth, genome-wide evaluation of the protein-coding transcriptome of the adult mouse cortex after transient focal ischemia at 6, 12, or 24 h of reperfusion using RNA-seq. We identified a total of 1007 transcripts at 6 h, 1878 transcripts at 12 h, and 1618 transcripts at 24 h of reperfusion that were significantly altered as compared to sham controls. With isoform-level resolution, we identified 23 splice variants arising from 23 genes that were novel mRNA isoforms. For a subset of genes, we detected reperfusion time-point-dependent splice isoform switching, indicating an expression and/or functional switch for these genes. Finally, for 286 genes across all three reperfusion time-points, we discovered multiple, distinct, simultaneously expressed and differentially altered isoforms per gene that were generated via alternative transcription start/stop sites. Of these, 165 isoforms derived from 109 genes were novel mRNAs. Together, our data unravel the protein-coding transcriptome of the cerebral cortex at an unprecedented depth to provide several new insights into the flexibility and complexity of stroke-related gene transcription and transcript organization.
Mitochondrial genome of Pteronotus personatus (Chiroptera: Mormoopidae): comparison with selected bats and phylogenetic considerations.

PubMed

López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel

2017-02-01

We described the complete mitochondrial genome (mitogenome) of Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compared it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA and the ND6 genes. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropodidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.

The complete mitochondrial genome of the Longnose skate: Raja rhina (Rajiformes, Rajidae).

PubMed

Jeong, Dageum; Lee, Youn-Ho

2015-02-01

The complete mitochondrial DNA sequence of the longnose skate, Raja rhina, was determined for the first time. It is 16,910 bp in length and contains 2 rRNA, 22 tRNA and 13 protein-coding genes with the same gene order and structure as those of other Rajidae species. The L-strand is composed of 30.1% A, 27.2% C, 28.5% T and 14.2% G, showing a slight A + T bias. G is the least used base and is markedly lower at the third codon position (5.4%). Twelve of the 13 protein-coding genes use ATG as their start codon while COX1 starts with GTG. As for stop codons, only ND4 shows an incomplete stop codon TA. This mitogenome is the first reported for a species of the genus Raja and provides a valuable resource of genetic information for understanding the phylogenetic relationships and evolution of the genus Raja as well as the family Rajidae.
Complete mitochondrial genome of the Yellownose skate: Zearaja chilensis (Rajiformes, Rajidae).

PubMed

Jeong, Dageum; Lee, Youn-Ho

2016-01-01

The complete mitochondrial DNA sequence of the Yellownose skate, Zearaja chilensis, was determined for the first time. It is 16,909 bp in length, covering 2 rRNA, 22 tRNA and 13 protein-coding genes with the same gene order and structure as those of other Rajidae species. The L-strand has a low G content (14.3%) and a slightly high A + T content (58.9%). A strong codon usage bias against G (6.0%) is found at the third codon position. Twelve of the 13 protein-coding genes use ATG as the start codon while COX1 starts with GTG. As for the stop codon, only ND4 shows an incomplete stop codon TA. This is the first report of the mitogenome for a species in the genus Zearaja, providing a valuable source of genetic information on the evolution of the family Rajidae and the genus Zearaja as well as for establishment of a sustainable fishery management plan for the species.
Complete mitochondrial genome of yellow meal worm (Tenebrio molitor)

PubMed Central

Liu, Li-Na; Wang, Cheng-Ye

2014-01-01

The yellow meal worm (Tenebrio molitor L.) is an important resource insect typically used as an animal feed additive. It is also widely used for biological research. The complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15,785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons (CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNA-Ser (AGN), which lacked a dihydrouridine (DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm. PMID:25465087
Complete mitochondrial genome of yellow meal worm (Tenebrio molitor).

PubMed

Liu, Li-Na; Wang, Cheng-Ye

2014-11-18

The yellow meal worm (Tenebrio molitor L.) is an important resource insect typically used as an animal feed additive. It is also widely used for biological research. The complete mitochondrial genome of T. molitor was determined for the first time by long PCR and conserved primer walking approaches. The results showed that the entire mitogenome of T. molitor was 15,785 bp long, with 72.35% A+T content [deposited in GenBank with accession number KF418153]. The gene order and orientation were the same as the most common type suggested as ancestral for insects. Two protein-coding genes used atypical start codons (CTA in ND2 and AAT in COX1), and the remaining 11 protein-coding genes started with a typical insect initiation codon ATN. All tRNAs showed standard clover-leaf structure, except for tRNA(Ser) (AGN), which lacked a dihydrouridine (DHU) arm. The newly added T. molitor mitogenome could provide information for future studies on yellow meal worm.
Emerging Putative Associations between Non-Coding RNAs and Protein-Coding Genes in Neuropathic Pain: Added Value from Reusing Microarray Data.

PubMed

Raju, Hemalatha B; Tsinoremas, Nicholas F; Capobianco, Enrico

2016-01-01

Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. 
While modules of the olfactory receptors were clearly identified in protein-protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches.
Emerging Putative Associations between Non-Coding RNAs and Protein-Coding Genes in Neuropathic Pain: Added Value from Reusing Microarray Data

PubMed Central

Raju, Hemalatha B.; Tsinoremas, Nicholas F.; Capobianco, Enrico

2016-01-01

Regeneration of injured nerves is likely occurring in the peripheral nervous system, but not in the central nervous system. Although protein-coding gene expression has been assessed during nerve regeneration, little is currently known about the role of non-coding RNAs (ncRNAs). This leaves open questions about the potential effects of ncRNAs at transcriptome level. Due to the limited availability of human neuropathic pain (NP) data, we have identified the most comprehensive time-course gene expression profile referred to sciatic nerve (SN) injury and studied in a rat model using two neuronal tissues, namely dorsal root ganglion (DRG) and SN. We have developed a methodology to identify differentially expressed bioentities starting from microarray probes and repurposing them to annotate ncRNAs, while analyzing the expression profiles of protein-coding genes. The approach is designed to reuse microarray data and perform first profiling and then meta-analysis through three main steps. First, we used contextual analysis to identify what we considered putative or potential protein-coding targets for selected ncRNAs. Relevance was therefore assigned to differential expression of neighbor protein-coding genes, with neighborhood defined by a fixed genomic distance from long or antisense ncRNA loci, and of parental genes associated with pseudogenes. Second, connectivity among putative targets was used to build networks, in turn useful to conduct inference at interactomic scale. Last, network paths were annotated to assess relevance to NP. We found significant differential expression in long-intergenic ncRNAs (32 lincRNAs in SN and 8 in DRG), antisense RNA (31 asRNA in SN and 12 in DRG), and pseudogenes (456 in SN and 56 in DRG). In particular, contextual analysis centered on pseudogenes revealed some targets with known association to neurodegeneration and/or neurogenesis processes. 
While modules of the olfactory receptors were clearly identified in protein–protein interaction networks, other connectivity paths were identified between proteins already investigated in studies on disorders, such as Parkinson, Down syndrome, Huntington disease, and Alzheimer. Our findings suggest the importance of reusing gene expression data by meta-analysis approaches. PMID:27803687
The mitochondrial genome of the multicolored Asian lady beetle Harmonia axyridis (Pallas) and a phylogenetic analysis of the Polyphaga (Insecta: Coleoptera).

PubMed

Niu, Fang-Fang; Zhu, Liang; Wang, Su; Wei, Shu-Jun

2016-07-01

Here, we report the mitochondrial genome sequence of the multicolored Asian lady beetle Harmonia axyridis (Pallas, 1773) (Coleoptera: Coccinellidae) (GenBank accession No. KR108208). This is the first species with a sequenced mitochondrial genome from the genus Harmonia. The current length of this mitochondrial genome, with a partial A + T-rich region, is 16,387 bp. All the typical genes were sequenced except trnI and trnQ. As in most other sequenced mitochondrial genomes of Coleoptera, there is no rearrangement in the sequenced region compared with the putative ancestral arrangement of insects. All protein-coding genes start with ATN codons. Five, five and three protein-coding genes stop with the termination codons TAA, TA and T, respectively. Phylogenetic analysis using the Bayesian method based on the first and second codon positions of the protein-coding genes supported the Scirtidae as a basal lineage of Polyphaga. Harmonia and Coccinella form a sister lineage. The monophyly of Staphyliniformia, Scarabaeiformia and Cucujiformia was supported. The Buprestidae was found to be a sister group to the Bostrichiformia.
Prediction and analysis of three gene families related to leaf rust (Puccinia triticina) resistance in wheat (Triticum aestivum L.).

PubMed

Peng, Fred Y; Yang, Rong-Cai

2017-06-20

The resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades with over 70 Lr genes being mapped on different chromosomes and numerous QTLs (quantitative trait loci) being detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. The race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may span the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study is to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding sites and leucine-rich repeats or NLR), START (Steroidogenic Acute Regulatory protein [STaR] related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The focus of the analysis is on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or their neighborhood (5 kb upstream from transcription start site [TSS] or downstream from transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR genes.
Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for adult resistance and six SNPs for seedling resistance in the NLR genes. Most of these coding SNPs were predicted to alter encoded amino acids and such information may serve as a starting point towards more thorough molecular and functional characterization of the designated Lr genes. Using the primer sequences of 99 known non-SNP markers from leaf rust resistance QTLs, we found candidate genes closely linked to these markers, including Lr34 with distances to its two gene-specific markers being 1212 bases (to cssfr1) and 2189 bases (to cssfr2). This study represents a comprehensive analysis of ABC, NLR and START genes in the hexaploid wheat genome and their physical relationships with QTLs for leaf rust resistance at seedling and adult stages. Our analysis suggests that the ABC (and START) genes are more likely to be co-located with QTLs for race-nonspecific, adult resistance whereas the NLR genes are more likely to be co-located with QTLs for race-specific resistance that would be often expressed at the seedling stage. Though our analysis was hampered by inaccurate or unknown physical positions of numerous QTLs due to the incomplete assembly of the complex hexaploid wheat genome that is currently available, the observed associations between (i) QTLs for race-specific resistance and NLR genes and (ii) QTLs for nonspecific resistance and ABC genes will help discover SNP variants for leaf rust resistance at seedling and adult stages. The genes containing nonsynonymous SNPs are promising candidates that can be investigated in future studies as potential new sources of leaf rust resistance in wheat breeding.
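The neighborhood criterion described in the abstract above (inside a gene, or within 5 kb upstream of the TSS or downstream of the TTS) can be sketched as a simple interval test. This is an illustrative sketch, not the authors' pipeline: the function name is hypothetical, positions are assumed to be on the same chromosome, and the whole TSS-to-TTS span is treated as the gene body.

```python
def in_gene_neighborhood(snp_pos, tss, tts, flank=5000):
    """True if snp_pos lies between TSS and TTS or within `flank` bp of either end.

    Taking min/max of the two ends makes the test work for genes on either
    strand, where the TSS coordinate may be larger than the TTS coordinate.
    """
    start, end = min(tss, tts), max(tss, tts)
    return start - flank <= snp_pos <= end + flank


# Hypothetical gene spanning positions 10,000 to 12,000 on the plus strand.
inside = in_gene_neighborhood(11000, tss=10000, tts=12000)
upstream_edge = in_gene_neighborhood(5000, tss=10000, tts=12000)
too_far = in_gene_neighborhood(4999, tss=10000, tts=12000)
```

A genome-wide analysis would typically use an interval index instead of testing every SNP against every gene, but the membership rule stays the same.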
Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs

PubMed Central

Guttman, Mitchell; Garber, Manuel; Levin, Joshua Z.; Donaghey, Julie; Robinson, James; Adiconis, Xian; Fan, Lin; Koziol, Magdalena J.; Gnirke, Andreas; Nusbaum, Chad; Rinn, John L.; Lander, Eric S.; Regev, Aviv

2010-01-01

RNA-Seq provides an unbiased way to study a transcriptome, including both coding and non-coding genes. To date, most RNA-Seq studies have critically depended on existing annotations, and thus focused on expression levels and variation in known transcripts. Here, we present Scripture, a method to reconstruct the transcriptome of a mammalian cell using only RNA-Seq reads and the genome sequence. We apply it to mouse embryonic stem cells, neuronal precursor cells, and lung fibroblasts to accurately reconstruct the full-length gene structures for the vast majority of known expressed genes. We identify substantial variation in protein-coding genes, including thousands of novel 5′-start sites, 3′-ends, and internal coding exons. We then determine the gene structures of over a thousand lincRNA and antisense loci. Our results open the way to direct experimental manipulation of thousands of non-coding RNAs, and demonstrate the power of ab initio reconstruction to render a comprehensive picture of mammalian transcriptomes. PMID:20436462
The complete mitochondrial genome of Gryllotalpa unispina Saussure, 1874 (Orthoptera: Gryllotalpoidea: Gryllotalpidae).

PubMed

Zhang, Yulong; Shao, Dandan; Cai, Miao; Yin, Hong; Zhang, Daochuan

2016-01-01

The complete mitochondrial genome of Gryllotalpa unispina was 15,513 bp in length and contained 70.9% AT. All G. unispina protein-coding sequences except nad2 started with a typical ATN codon. The usual termination codons (TAA) and incomplete stop codons (T) were found in the 13 protein-coding genes. All tRNA genes were folded into the typical cloverleaf secondary structure, except trnS(AGN), which lacked the dihydrouridine arm. The sizes of the large and small ribosomal RNA genes were 1245 and 725 bp, respectively. The A + T-rich region was 917 bp in length with 76.8% A + T content. The orientation and gene order of the G. unispina mitogenome were identical to those of G. orientalis and G. pluvialis; the "DK rearrangement" that has been widely reported in Caelifera was not observed.
The Complete Mitogenome of the Wood-Feeding Cockroach Cryptocercus meridianus (Blattodea: Cryptocercidae) and Its Phylogenetic Relationship among Cockroach Families.

PubMed

Li, Weijun; Wang, Zongqing; Che, Yanli

2017-11-12

In this study, the complete mitochondrial genome of Cryptocercus meridianus was sequenced. The circular mitochondrial genome is 15,322 bp in size and contains 13 protein-coding genes, two ribosomal RNA genes (12S rRNA and 16S rRNA), 22 transfer RNA genes, and one D-loop region. We compare the mitogenome of C. meridianus with that of C. relictus and C. kyebangensis. The base composition of the whole genome was 45.20%, 9.74%, 16.06%, and 29.00% for A, G, C, and T, respectively; it shows a high AT content (74.2%), similar to the mitogenomes of C. relictus and C. kyebangensis. The protein-coding genes are initiated with typical mitochondrial start codons except for cox1 with TTG. The gene order of the C. meridianus mitogenome differs from the typical insect pattern for the translocation of tRNA-Ser(AGN), while the mitogenomes of the other two Cryptocercus species, C. relictus and C. kyebangensis, are consistent with the typical insect pattern. There are two very long non-coding intergenic regions lying on both sides of the rearranged gene tRNA-Ser(AGN). The phylogenetic relationships were constructed based on the nucleotide sequence of 13 protein-coding genes and two ribosomal RNA genes. The mitogenome of C. meridianus is the first representative of the order Blattodea that demonstrates rearrangement, and it will contribute to the further study of the phylogeny and evolution of the genus Cryptocercus and related taxa.
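The AT content quoted in the abstract above follows directly from the reported base composition (45.20% A plus 29.00% T). A minimal sketch of that arithmetic, and of how the composition could be computed from a sequence string, is given below; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(composition):
    """AT content is the sum of the A and T percentages."""
    return composition["A"] + composition["T"]


# Percentages as reported in the abstract.
reported = {"A": 45.20, "G": 9.74, "C": 16.06, "T": 29.00}
reported_at = round(at_content(reported), 2)  # 74.2

# Toy sequence (hypothetical) to show the computation from raw bases.
toy_at = at_content(base_composition("AATTAG"))
```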
Complete mitochondrial genome of the Yellow-spotted skate Okamejei hollandi (Rajiformes: Rajidae).

PubMed

Li, Weidong; Chen, Xiao; Liu, Wenai; Sun, Renjie; Zhou, Haolang

2016-07-01

The complete mitochondrial genome of the Yellow-spotted skate Okamejei hollandi was determined in this study. It is 16,974 bp in length and contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes, and one putative control region. The overall base composition is 30.5% A, 27.8% C, 14.0% G, and 27.8% T. There are 28 bp short intergenic spaces located in 12 gene junctions and 31 bp overlaps located in nine gene junctions in the whole mitogenome. Two start codons (ATG and GTG) and two stop codons (TAG and TAA/T) were used in the protein-coding genes. The lengths of 22 tRNA genes range from 68 (tRNA-Ser2) to 75 (tRNA-Leu1) bp. The origin of L-strand replication (OL) sequence (37 bp) was identified between the tRNA-Asn and tRNA-Cys genes. The control region is 1311 bp in length with high A + T and poor G content.
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

PubMed Central

Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest

2006-01-01

Background: Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results: We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion: We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
The complete mitochondrial genome of Setaria digitata (Nematoda: Filarioidea): Mitochondrial gene content, arrangement and composition compared with other nematodes.

PubMed

Yatawara, Lalani; Wickramasinghe, Susiji; Rajapakse, R P V J; Agatsuma, Takeshi

2010-09-01

In the present study, we determined the complete mitochondrial (mt) genome sequence (13,839 bp) of the parasitic nematode Setaria digitata and compared its structure and organization with those of Onchocerca volvulus, Dirofilaria immitis and Brugia malayi. The mt genome of S. digitata is slightly larger than the mt genomes of other filarial nematodes. The S. digitata mt genome contains 36 genes (12 protein-coding genes, 22 transfer RNAs and 2 ribosomal RNAs) that are typically found in metazoans. This genome contains a high A+T content (75.1%) and a low G+C content (24.9%). The mt gene order for S. digitata is the same as those for O. volvulus, D. immitis and B. malayi but it is distinctly different from other nematodes compared. The start codons inferred in the mt genome of S. digitata are TTT, ATT, TTG, ATG, GTT and ATA. Interestingly, the initiation codon TTT is unique to the S. digitata mt genome and four protein-coding genes use this codon as a translation initiation codon. Five protein-coding genes use TAG as a stop codon whereas three genes use TAA and four genes use T as a termination codon. Out of 64 possible codons, only 57 are used for mitochondrial protein-coding genes of S. digitata. T-rich codons such as TTT (18.9%), GTT (7.9%), TTG (7.8%), TAT (7%), ATT (5.7%), TCT (4.8%) and TTA (4.1%) are used more frequently. This pattern of codon usage reflects the strong bias for T in the mt genome of S. digitata. In conclusion, the present investigation provides new molecular data for future studies of the comparative mitochondrial genomics and systematics of parasitic nematodes of socio-economic importance. 2010 Elsevier B.V. All rights reserved.
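Codon-usage figures such as those in the abstract above (percentage per codon, and the number of distinct codons used out of 64) can be obtained by tallying in-frame triplets. The sketch below is illustrative only: the function names and the toy sequence are hypothetical, and a trailing partial codon (for example an incomplete T stop) is simply ignored.

```python
from collections import Counter


def codon_usage(cds):
    """Count in-frame codons of a coding sequence, dropping any trailing partial codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def codon_percentages(counts):
    """Convert codon counts to percentages of all counted codons."""
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Toy sequence (hypothetical): TTT TTT GTT TAA
counts = codon_usage("TTTTTTGTTTAA")
percentages = codon_percentages(counts)
distinct_codons_used = len(counts)
```

For a whole mitogenome, the counts from all protein-coding genes would be summed before computing percentages.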
Evidence for the recent origin of a bacterial protein-coding, overlapping orphan gene by evolutionary overprinting.

PubMed

Fellner, Lea; Simon, Svenja; Scherling, Christian; Witting, Michael; Schober, Steffen; Polte, Christine; Schmitt-Kopplin, Philippe; Keim, Daniel A; Scherer, Siegfried; Neuhaus, Klaus

2015-12-18

Gene duplication is believed to be the classical way to form novel genes, but overprinting may be an important alternative. Overprinting allows entirely novel proteins to evolve de novo, i.e., formerly non-coding open reading frames within functional genes become expressed. Only three cases have been described for Escherichia coli. Here, a fourth example is presented. RNA sequencing revealed an open reading frame weakly transcribed in cow dung, coding for 101 residues and embedded completely in the -2 reading frame of citC in enterohemorrhagic E. coli. This gene is designated novel overlapping gene, nog1. The promoter region fused to gfp exhibits specific activities and 5' rapid amplification of cDNA ends indicated the transcriptional start 40-bp upstream of the start codon. nog1 was strand-specifically arrested in translation by a nonsense mutation silent in citC. This Nog1-mutant showed a phenotype in competitive growth against wild type in the presence of MgCl2. Small differences in metabolite concentrations were also found. Bioinformatic analyses propose Nog1 to be inner membrane-bound and to possess at least one membrane-spanning domain. A phylogenetic analysis suggests that the orphan gene nog1 arose by overprinting after Escherichia/Shigella separated from the other γ-proteobacteria. Since nog1 is of recent origin, non-essential, short, weakly expressed and only marginally involved in E. coli's central metabolism, we propose that this gene is in an initial stage of evolution. While we present specific experimental evidence for the existence of a fourth overlapping gene in enterohemorrhagic E. coli, we believe that this may be an initial finding only and overlapping genes in bacteria may be more common than is currently assumed by microbiologists.
Isolation and sequencing of the gene encoding Sp23, a structural protein of spermatophore of the mealworm beetle, Tenebrio molitor.

PubMed

Feng, X; Happ, G M

1996-11-14

The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
GeneBuilder: interactive in silico prediction of gene structure.

PubMed

Milanesi, L; D'Angelo, D; Rogozin, I B

1999-01-01

Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in proteins and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of WebGene at the URL: http://www.itba.mi.cnr.it/webgene and the TRADAT (TRAnscription Database and Analysis Tools) launcher URL: http://www.itba.mi.cnr.it/tradat.
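The nucleotide-level accuracy figures quoted in the abstract above can be illustrated with a short sketch. It assumes the conventions commonly used in gene-prediction benchmarking, where sensitivity is TP/(TP+FN), "specificity" is TP/(TP+FP), and the correlation coefficient combines all four counts; the abstract itself does not spell out these formulas, and the function name and toy labels are hypothetical.

```python
import math


def nucleotide_accuracy(actual, predicted):
    """Return (sensitivity, specificity, correlation) from per-nucleotide coding labels.

    `actual` and `predicted` are equal-length sequences of truthy/falsy values,
    one per nucleotide, marking coding positions. Assumes both coding and
    non-coding positions occur in each, otherwise a division by zero results.
    """
    if len(actual) != len(predicted):
        raise ValueError("label sequences must have the same length")
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    sensitivity = tp / (tp + fn)
    specificity = tp / (tp + fp)  # benchmarking convention, assumed here
    correlation = (tp * tn - fn * fp) / math.sqrt(
        (tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return sensitivity, specificity, correlation


# Toy labels (hypothetical): 1 = coding nucleotide, 0 = non-coding.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
sn, sp, cc = nucleotide_accuracy(actual, predicted)
```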
The mitochondrial genome of Polistes jokahamae and a phylogenetic analysis of the Vespoidea (Insecta: Hymenoptera).

PubMed

Song, Sheng-Nan; Chen, Peng-Yan; Wei, Shu-Jun; Chen, Xue-Xin

2016-07-01

The mitochondrial genome of Polistes jokahamae (Radoszkowski, 1887) (Hymenoptera: Vespidae) (GenBank accession no. KR052468) was sequenced. The current length of this mitochondrial genome, with a partial A + T-rich region, is 16,616 bp. All the typical mitochondrial genes were sequenced except for three tRNAs (trnI, trnQ, and trnY) located between the A + T-rich region and nad2. At least three rearrangement events occurred in the sequenced region compared with the putative ancestral arrangement of insects, corresponding to the shuffling of trnK and trnD, translocation or remote inversion of trnY and translocation of trnL1. All protein-coding genes start with ATN codons. Eleven, one, and one protein-coding genes stop with the termination codons TAA, TA, and T, respectively. Phylogenetic analysis using the Bayesian method based on all codon positions of the 13 protein-coding genes supports the monophyly of Vespidae and Formicidae. Within the Formicidae, the Myrmicinae and Formicinae form a sister lineage and are then sister to the Dolichoderinae, while within the Vespidae, the Eumeninae is sister to the lineage of Vespinae + Polistinae.
Problem-Solving Test: The Effect of Synonymous Codons on Gene Expression

ERIC Educational Resources Information Center

Szeberenyi, Jozsef

2009-01-01

Terms to be familiar with before you start to solve the test: the genetic code, codon, degenerate codons, protein synthesis, aminoacyl-tRNA, anticodon, antiparallel orientation, wobble, unambiguous codons, ribosomes, initiation, elongation and termination of translation, peptidyl transferase, translocation, degenerate oligonucleotides, green…
Genome-wide transcription start site profiling in biofilm-grown Burkholderia cenocepacia J2315.

PubMed

Sass, Andrea M; Van Acker, Heleen; Förstner, Konrad U; Van Nieuwerburgh, Filip; Deforce, Dieter; Vogel, Jörg; Coenye, Tom

2015-10-13

Burkholderia cenocepacia is a soil-dwelling Gram-negative Betaproteobacterium with an important role as opportunistic pathogen in humans. Infections with B. cenocepacia are very difficult to treat due to their high intrinsic resistance to most antibiotics. Biofilm formation further adds to their antibiotic resistance. B. cenocepacia harbours a large, multi-replicon genome with a high GC-content; the reference genome of strain J2315 includes 7374 annotated genes. This study aims to annotate transcription start sites and identify novel transcripts on a whole genome scale. RNA extracted from B. cenocepacia J2315 biofilms was analysed by differential RNA-sequencing and the resulting dataset compared to data derived from conventional, global RNA-sequencing. Transcription start sites were annotated and further analysed according to their position relative to annotated genes. Four thousand ten transcription start sites were mapped over the whole B. cenocepacia genome and the primary transcription start sites of 2089 genes expressed in B. cenocepacia biofilms were defined. For 64 genes a start codon alternative to the annotated one was proposed. Substantial antisense transcription for 105 genes and two novel protein coding sequences were identified. The distribution of internal transcription start sites can be used to identify genomic islands in B. cenocepacia. A potassium pump strongly induced only under biofilm conditions was found and 15 non-coding small RNAs highly expressed in biofilms were discovered. Mapping transcription start sites across the B. cenocepacia genome added relevant information to the J2315 annotation. Genes and novel regulatory RNAs putatively involved in B. cenocepacia biofilm formation were identified. These findings will help in understanding regulation of B. cenocepacia biofilm formation.

Co-expression of the Thermotoga neapolitana aglB gene with an upstream 3'-coding fragment of the malG gene improves enzymatic characteristics of recombinant AglB cyclomaltodextrinase.

PubMed

Lunina, Natalia A; Agafonova, Elena V; Chekanovskaya, Lyudmila A; Dvortsov, Igor A; Berezina, Oksana V; Shedova, Ekaterina N; Kostrov, Sergey V; Velikodvorskaya, Galina A

2007-07-01

A cluster of Thermotoga neapolitana genes participating in starch degradation includes the malG gene of sugar transport protein and the aglB gene of cyclomaltodextrinase. The start and stop codons of these genes share a common overlapping sequence, aTGAtg. Here, we compared properties of expression products of three different constructs with aglB from T. neapolitana. The first expression vector contained the aglB gene linked to an upstream 90-bp 3'-terminal region of the malG gene with the stop codon overlapping with the start codon of aglB. The second construct included the isolated coding sequence of aglB with two tandem potential start codons. The expression product of this construct in Escherichia coli had two tandem Met residues at its N terminus and was characterized by low thermostability and high tendency to aggregate. In contrast, co-expression of aglB and the 3'-terminal region of malG (the first construct) resulted in AglB with only one N-terminal Met residue and a much higher specific activity of cyclomaltodextrinase. Moreover, the enzyme expressed by such a construct was more thermostable and less prone to aggregation. The third construct was the same as the second one except that it contained only one ATG start codon. The product of its expression had kinetic and other properties similar to those of the enzyme with only one N-terminal Met residue.
Genetic Variation Linked to Lung Cancer Survival in White Smokers | Center for Cancer Research

Cancer.gov

CCR investigators have discovered evidence that links lung cancer survival with genetic variations (called single nucleotide polymorphisms) in the MBL2 gene, a key player in innate immunity. The variations in the gene, which codes for a protein called the mannose-binding lectin, occur in its promoter region, where the RNA polymerase molecule binds to start transcription, and
Brain cDNA clone for human cholinesterase

DOE Office of Scientific and Technical Information (OSTI.GOV)

McTiernan, C.; Adkins, S.; Chatonnet, A.

1987-10-01

A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum. The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.
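The sequence lengths in the abstract above are mutually consistent: 1722 base pairs of coding sequence correspond to 574 codons, matching the 574-residue protein, and the 207 base pairs upstream correspond to 69 in-frame codons. A minimal sketch of that check follows; the function name is hypothetical.

```python
def residues_encoded(coding_bp):
    """Number of codons (and hence residues) in an in-frame coding stretch."""
    codons, remainder = divmod(coding_bp, 3)
    if remainder:
        raise ValueError("length is not a multiple of three")
    return codons


mature_residues = residues_encoded(1722)  # 574
upstream_codons = residues_encoded(207)   # 69
```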
Complete mitochondrial genome of Taharana fasciana (Insecta, Hemiptera: Cicadellidae) and comparison with other Cicadellidae insects.

PubMed

Wang, Jiajia; Li, Hu; Dai, Renhuai

2017-12-01

Here, we describe the first complete mitochondrial genome (mitogenome) sequence of the leafhopper Taharana fasciana (Coelidiinae). The mitogenome sequence contains 15,161 bp with an A + T content of 77.9%. It includes 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and one non-coding (A + T-rich) region; in addition, a repeat region is also present (GenBank accession no. KY886913). These genes/regions are in the same order as in the inferred insect ancestral mitogenome. All protein-coding genes have ATN as the start codon, and TAA or single T as the stop codons, except the gene ND3, which ends with TAG. Furthermore, we predicted the secondary structures of the rRNAs in T. fasciana. Six domains (domain III is absent in arthropods) and 41 helices were predicted for 16S rRNA, and 12S rRNA comprised three structural domains and 24 helices. Phylogenetic tree analysis confirmed that T. fasciana and other members of the Cicadellidae are clustered into a clade, and it identified the relationships among the subfamilies Deltocephalinae, Coelidiinae, Idiocerinae, Cicadellinae, and Typhlocybinae.
Expression of the Caulobacter heat shock gene dnaK is developmentally controlled during growth at normal temperatures.

PubMed Central

Gomes, S L; Gober, J W; Shapiro, L

1990-01-01

Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. PMID:2345134
Intron-exon organization of the active human protein S gene PSα and its pseudogene PSβ: Duplication and silencing during primate evolution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.

The human protein S locus on chromosome 3 consists of two protein S genes, PSα and PSβ. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PSα gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5'-noncoding region. Both transcripts contain a 5'-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing of an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PSα gene fully supports the hypothesis that the protein S gene is the product of an evolutionary assembly process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PSβ gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall, the two genes PSα and PSβ show 96.5% homology between their exonic sequences. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implies that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.
The complete mitochondrial genome of the mudsnail Cipangopaludina cathayensis (Gastropoda: Viviparidae).

PubMed

Yang, Huirong; Zhang, Jia-En; Luo, Hao; Luo, Mingzhu; Guo, Jing; Deng, Zhixin; Zhao, Benliang

2016-05-01

We present the complete mitochondrial genome of Cipangopaludina cathayensis in this study. The mitochondrial genome is 17,157 bp in length and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes. All of them are encoded on the heavy strand except 7 tRNA genes on the light strand. The overall nucleotide composition of the light strand is 44.51% A, 26.74% T, 20.48% C and 8.28% G. All the protein-coding genes start with an ATG initiation codon except ATP6 (ATA) and ND4 (TTG), and the 2 types of termination codon are TAA (ATP6, ND2, COX1, COX2, ATP8, ND1, ND6, Cytb, COX3, ND4) and TAG (ND4L, ND5, ND3). There are 29 intergenic spacers and 5 gene overlaps. Tandem repeat sequences are observed in the COX2, tRNA(Asp), ATP6, tRNA(Cys), S-rRNA, ND1, Cytb, ND4 and COX3 genes. Gene arrangement and distribution differ from those of typical vertebrates. The absence of a D-loop is consistent with the Gastropoda, but at least one lengthy non-coding region is an essential regulatory element for the initiation of transcription and replication.
The high-level expression of human tissue plasminogen activator in the milk of transgenic mice with hybrid gene locus strategy.

PubMed

Zhou, Yanrong; Lin, Yanli; Wu, Xiaojie; Xiong, Fuyin; Lv, Yuemeng; Zheng, Tao; Huang, Peitang; Chen, Hongxing

2012-02-01

Transgene expression for the mammary gland bioreactor aimed at producing recombinant proteins requires optimized expression vector construction. Previously we presented a hybrid gene locus strategy, which was originally tested with human lactoferrin (hLF) as the target transgene and achieved an extremely high level of rhLF expression, up to 29.8 g/l in mouse milk. Here, to demonstrate the broad application of this strategy, another 38.4-kb mWAP-htPA hybrid gene locus was constructed, in which the 3-kb genomic coding sequence in the 24-kb mouse whey acidic protein (mWAP) gene locus was substituted by the 17.4-kb genomic coding sequence of human tissue plasminogen activator (htPA), exactly from the start codon to the stop codon. Five corresponding transgenic mouse lines were generated, and the highest expression level of rhtPA in milk reached 3.3 g/l. Our strategy will provide a universal way for the large-scale production of pharmaceutical proteins in the mammary gland of transgenic animals.
Mechanisms and consequences of alternative polyadenylation

PubMed Central

Di Giammartino, Dafne Campigli; Nishida, Kensei; Manley, James L.

2011-01-01

Summary Alternative polyadenylation (APA) is emerging as a widespread mechanism used to control gene expression. Like alternative splicing, usage of alternative poly(A) sites allows a single gene to encode multiple mRNA transcripts. In some cases, this changes the mRNA coding potential; in other cases, the code remains unchanged but the 3' UTR length is altered, influencing the fate of mRNAs in several ways, for example, by altering the availability of RNA binding protein sites and microRNA binding sites. The mechanisms governing both global and gene-specific APA are only starting to be deciphered. Here we review what is known about these mechanisms and the functional consequences of alternative polyadenylation. PMID:21925375
The complete mitochondrial genome of the American black flour beetle Tribolium audax (Coleoptera: Tenebrionidae).

PubMed

Ou, Jing; Liu, Jin-Bo; Yao, Fu-Jiao; Wang, Xin-Guo; Wei, Zhao-Ming

2016-01-01

Flour beetles of the genus Tribolium are all pests of stored products and cause severe economic losses every year. The American black flour beetle Tribolium audax is one of the important pest species of flour beetle, and it is also an important quarantine insect. Here we sequenced and characterized the complete mitochondrial genome of T. audax, which was intercepted by Huangpu Customs in maize from America. The complete circular mitochondrial genome (mitogenome) of T. audax was 15,924 bp in length, containing 37 typical coding genes and one non-coding AT-rich region. The mitogenome of T. audax exhibits a gene arrangement and content identical to the most common type in insects. All protein-coding genes (PCGs) start with a typical ATN initiation codon, except for cox1, which uses AAC as its start codon. Eleven genes use a standard complete termination codon (nine TAA, two TAG), whereas the nad4 and nad5 genes end with a single T. Except for trnS1 (AGN), all tRNA genes display typical cloverleaf secondary structures like those of other insects. The sizes of the large and small ribosomal RNA genes are 1288 and 780 bp, respectively. The AT content of the AT-rich region is 81.36%. The 5 bp conserved motif TACTA was found in the intergenic region between trnS2 (UCN) and nad1.
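Several mitogenome records in this list describe start codons as "ATN", meaning A, T, then any nucleotide. As a minimal sketch of how such a check could be scripted, the Python function below classifies the first codon of a coding sequence. The function name and the toy sequences are hypothetical; only the ATN rule and the AAC start reported for cox1 come from the abstract above.

```python
def classify_start_codon(cds: str) -> str:
    """Return 'ATN' if the first codon is ATA, ATC, ATG or ATT, else 'other'."""
    codon = cds[:3].upper()
    if len(codon) < 3:
        raise ValueError("sequence is shorter than one codon")
    # ATN = 'AT' followed by any unambiguous nucleotide.
    if codon.startswith("AT") and codon[2] in "ACGT":
        return "ATN"
    return "other"


# Hypothetical toy sequences; only the first three bases matter here.
examples = {"nad2": "ATTAGCCTA", "cox1": "AACGGAACT"}
for gene, seq in examples.items():
    print(gene, seq[:3], classify_start_codon(seq))
```

In practice the coding sequences would be extracted from an annotated record, taking strand into account, before being passed to a check like this.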
Xenopus laevis ribosomal protein genes: isolation of recombinant cDNA clones and study of the genomic organization.

PubMed Central

Bozzoni, I; Beccari, E; Luo, Z X; Amaldi, F

1981-01-01

Poly-A+ mRNA from Xenopus laevis oocytes, partially enriched for r-protein coding capacity, has been used as starting material for preparing a cDNA bank in plasmid pBR322. The clones containing sequences specific for r-proteins have been selected by translation of the complementary mRNAs. Clones for six different r-proteins have been identified and utilized as probes for studying their genomic organization. Two gene copies per haploid genome were found for r-proteins L1, L14 and S19, and four to five for r-proteins S1, S8 and L32. Moreover, a population polymorphism has been observed for the genomic regions containing sequences for r-proteins S1, S8 and L14. PMID:6112733
The complete mitochondrial genome of the Aluterus monoceros.

PubMed

Li, Wenshen; Zhang, Guoqing; Wen, Xin; Wang, Qian; Chen, Guohua

2016-07-01

The complete mitochondrial genome of Aluterus monoceros (A. monoceros) has been sequenced. The mitochondrial genome of A. monoceros is 16,429 bp in length, consisting of 22 tRNA genes, 2 rRNA genes, 13 protein-coding genes and a D-loop region (GenBank accession number KP637022). The A + T content of the mitochondrial genome is 63.25% (33.16% A and 30.09% T), and the C content is 20.74%. Twelve protein-coding genes start with a standard ATG initiation codon; the exception is COXI, which begins with GTG. Some of the termination codons are incomplete (T or TA), whereas ND1, COXI, ATP8, ND4L1, ND5 and ND6 stop with TAA. Phylogenetic trees constructed from the entire mitochondrial genome sequences of 14 Tetraodontiformes species suggest that A. monoceros is more closely related to Acreichthys tomentosus and Monacanthus chinensis, and that they constitute a sister group.
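The A + T figure reported in this record is the sum of the A and T percentages (33.16% + 30.09% = 63.25%). A minimal Python sketch of how base composition and A + T content can be computed from a sequence follows. The function names and the toy sequence are hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch rather than something stated in the abstract.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of `seq`."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases in sequence")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(seq: str) -> float:
    """A + T content as a percentage of unambiguous bases."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Hypothetical toy sequence, not taken from the mitogenome.
toy = "AATTGCAT"
print(base_composition(toy))
print(round(at_content(toy), 2))
```

Composition values depend on which strand is counted, so a reported figure should always state the strand, as some of the other records in this list do.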
Characterization of the complete mitochondrial genome sequence of wild yak (Bos mutus).

PubMed

Chunnian, Liang; Wu, Xiaoyun; Ding, Xuezhi; Wang, Hongbo; Guo, Xian; Chu, Min; Bao, Pengjia; Yan, Ping

2016-11-01

Wild yak is a special breed in China and is regarded as an important genetic resource for the sustainable development of animal husbandry in the Tibetan area and for enriching the region's biodiversity. The complete mitochondrial genome of wild yak (16,322 bp in length) displayed 37 typical animal mitochondrial genes and was A + T-rich (61.01%), with an overall G + C content of only 38.99%. It contained a non-coding control region (D-loop), 13 protein-coding genes, two rRNA genes, and 22 tRNA genes. Most of the genes have ATG initiation codons, whereas the ND2, ND3, and ND5 genes start with ATA and are encoded on the H-strand. The gene order of the wild yak mitogenome is identical to that observed in most other vertebrates. The complete mitochondrial genome sequence of wild yak reported here could provide valuable information for developing genetic markers and for phylogenetic analysis in yak.
The complete mitochondrial genome of the longhorn beetle Xylotrechus grayii (Coleoptera: Cerambycidae).

PubMed

Guo, Kun; Chen, Jun; Xu, Chang-Qing; Qiao, Hai-Li; Xu, Rong; Zhao, Xiang-Jian

2016-05-01

We sequenced the complete mitochondrial genome of the longhorn beetle, Xylotrechus grayii. The total length of the X. grayii mitogenome was 15,540 bp with an A + T content of 75.29%, consisting of 13 protein-coding genes (PCGs), 22 tRNA genes, 2 rRNA genes and an A + T-rich region. All the genes were arranged in the same order as that of the ancestral insect. All PCGs started with a typical ATN codon except for cox1 and nad1, which used TTG as the start codon. Ten out of 13 PCGs terminated with incomplete codons (TA or T). The A + T-rich region was 893 bp in length with an A + T content of 85.89%.
Approaches to Fungal Genome Annotation

PubMed Central

Haas, Brian J.; Zeng, Qiandong; Pearson, Matthew D.; Cuomo, Christina A.; Wortman, Jennifer R.

2011-01-01

Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center’s production genome annotation environment. PMID:22059117
N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana

PubMed Central

Ndah, Elvis; Jonckheere, Veronique

2017-01-01

Proteogenomics is an emerging research field yet lacking a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined, suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes. PMID:28432195
N-terminal Proteomics Assisted Profiling of the Unexplored Translation Initiation Landscape in Arabidopsis thaliana.

PubMed

Willems, Patrick; Ndah, Elvis; Jonckheere, Veronique; Stael, Simon; Sticker, Adriaan; Martens, Lennart; Van Breusegem, Frank; Gevaert, Kris; Van Damme, Petra

2017-06-01

Proteogenomics is an emerging research field yet lacking a uniform method of analysis. Proteogenomic studies in which N-terminal proteomics and ribosome profiling are combined, suggest that a high number of protein start sites are currently missing in genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After a stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and computational translations of sequences stored in public repositories. Most interestingly, complementary evidence by ribosome profiling was found for 23 protein N termini. Finally, by analyzing protein N-terminal peptides, an in silico analysis demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly-annotated genomes. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Measles virus minigenomes encoding two autofluorescent proteins reveal cell-to-cell variation in reporter expression dependent on viral sequences between the transcription units.

PubMed

Rennick, Linda J; Duprex, W Paul; Rima, Bert K

2007-10-01

Transcription from morbillivirus genomes commences at a single promoter in the 3' non-coding terminus, with the six genes being transcribed sequentially. The 3' and 5' untranslated regions (UTRs) of the genes (mRNA sense), together with the intergenic trinucleotide spacer, comprise the non-coding sequences (NCS) of the virus and contain the conserved gene end and gene start signals, respectively. Bicistronic minigenomes containing transcription units (TUs) encoding autofluorescent reporter proteins separated by measles virus (MV) NCS were used to give a direct estimation of gene expression in single, living cells by assessing the relative amounts of each fluorescent protein in each cell. Initially, five minigenomes containing each of the MV NCS were generated. Assays were developed to determine the amount of each fluorescent protein in cells at both cell population and single-cell levels. This revealed significant variations in gene expression between cells expressing the same NCS-containing minigenome. The minigenome containing the M/F NCS produced significantly lower amounts of fluorescent protein from the second TU (TU2), compared with the other minigenomes. A minigenome with a truncated F 5' UTR had increased expression from TU2. This UTR is 524 nt longer than the other MV 5' UTRs. Insertions into the 5' UTR of the enhanced green fluorescent protein gene in the minigenome containing the N/P NCS showed that specific sequences, rather than just the additional length of F 5' UTR, govern this decreased expression from TU2.
The Evolution and Expression Pattern of Human Overlapping lncRNA and Protein-coding Gene Pairs.

PubMed

Ning, Qianqian; Li, Yixue; Wang, Zhen; Zhou, Songwen; Sun, Hong; Yu, Guangjun

2017-03-27

A long non-coding RNA overlapping with a protein-coding gene (lncRNA-coding pair) is a special type of overlapping gene. Protein-coding overlapping genes have been well studied, and increasing attention has been paid to lncRNAs. By studying lncRNA-coding pairs in the human genome, we showed that lncRNA-coding pairs were more likely to be generated by overprinting and that retention of genes in lncRNA-coding pairs was given higher priority than retention of non-overlapping genes. In addition, the preference for overlapping configurations preserved during evolution was based on the origin of the lncRNA-coding pairs. Further investigations showed that the promotion of splicing of embedded protein-coding partners by lncRNAs was a unilateral interaction, whereas the improvement of gene expression by the presence of an overlapping partner was bidirectional, and this effect decreased with increasing evolutionary age of the genes. Additionally, the expression of lncRNA-coding pairs showed an overall positive correlation, and the expression correlation was associated with their overlapping configurations, local genomic environment and the evolutionary age of the genes. Comparison of the expression correlation of lncRNA-coding pairs between normal and cancer samples found that lineage-specific pairs that include old protein-coding genes may play an important role in tumorigenesis. This work presents a systematic and comprehensive picture of the evolution and expression pattern of human lncRNA-coding pairs.
A linear mitochondrial genome of Cyclospora cayetanensis (Eimeriidae, Eucoccidiorida, Coccidiasina, Apicomplexa) suggests the ancestral start position within mitochondrial genomes of eimeriid coccidia.

PubMed

Ogedengbe, Mosun E; Qvarnstrom, Yvonne; da Silva, Alexandre J; Arrowood, Michael J; Barta, John R

2015-05-01

The near complete mitochondrial genome for Cyclospora cayetanensis is 6184 bp in length with three protein-coding genes (Cox1, Cox3, CytB) and numerous lsrDNA and ssrDNA fragments. Gene arrangements were conserved with other coccidia in the Eimeriidae, but the C. cayetanensis mitochondrial genome is not circular-mapping. Terminal transferase tailing and nested PCR completed the 5'-terminus of the genome starting with a 21 bp A/T-only region that forms a potential stem-loop. Regions homologous to the C. cayetanensis mitochondrial genome 5'-terminus are found in all eimeriid mitochondrial genomes available and suggest this may be the ancestral start of eimeriid mitochondrial genomes. Copyright © 2015 Australian Society for Parasitology Inc. All rights reserved.

The Blueprint of a Minimal Cell: MiniBacillus

PubMed Central

Reuß, Daniel R.; Commichau, Fabian M.; Gundlach, Jan; Zhu, Bingyao

2016-01-01

SUMMARY Bacillus subtilis is one of the best-studied organisms. Due to the broad knowledge and annotation and the well-developed genetic system, this bacterium is an excellent starting point for genome minimization with the aim of constructing a minimal cell. We have analyzed the genome of B. subtilis and selected all genes that are required to allow life in complex medium at 37°C. This selection is based on the known information on essential genes and functions as well as on gene and protein expression data and gene conservation. The list presented here includes 523 and 119 genes coding for proteins and RNAs, respectively. These proteins and RNAs are required for the basic functions of life in information processing (replication and chromosome maintenance, transcription, translation, protein folding, and secretion), metabolism, cell division, and the integrity of the minimal cell. The completeness of the selected metabolic pathways, reactions, and enzymes was verified by the development of a model of metabolism of the minimal cell. A comparison of the MiniBacillus genome to the recently reported designed minimal genome of Mycoplasma mycoides JCVI-syn3.0 indicates excellent agreement in the information-processing pathways, whereas each species has a metabolism that reflects specific evolution and adaptation. The blueprint of MiniBacillus presented here serves as the starting point for a successive reduction of the B. subtilis genome. PMID:27681641
Comparative transcriptomics of two environmentally relevant cyanobacteria reveals unexpected transcriptome diversity

PubMed Central

Voigt, Karsten; Sharma, Cynthia M; Mitschke, Jan; Joke Lambrecht, S; Voß, Björn; Hess, Wolfgang R; Steglich, Claudia

2014-01-01

Prochlorococcus is a genus of abundant and ecologically important marine cyanobacteria. Here, we present a comprehensive comparison of the structure and composition of the transcriptomes of two Prochlorococcus strains, which, despite their similarities, have adapted their gene pool to specific environmental constraints. We present genome-wide maps of transcriptional start sites (TSS) for both organisms, which are representatives of the two most diverse clades within the two major ecotypes adapted to high- and low-light conditions, respectively. Our data suggest antisense transcription for three-quarters of all genes, which is substantially more than that observed in other bacteria. We discovered hundreds of TSS within genes, most notably within 16 of the 29 prochlorosin genes, in strain MIT9313. A direct comparison revealed very little conservation in the location of TSS and the nature of non-coding transcripts between both strains. We detected extremely short 5′ untranslated regions with a median length of only 27 and 29 nt for MED4 and MIT9313, respectively, and for 8% of all protein-coding genes the median distance to the start codon is only 10 nt or even shorter. These findings and the absence of an obvious Shine–Dalgarno motif suggest that leaderless translation and ribosomal protein S1-dependent translation constitute alternative mechanisms for translation initiation in Prochlorococcus. We conclude that genome-wide antisense transcription is a major component of the transcriptional output from these relatively small genomes and that a hitherto unrecognized high degree of complexity and variability of gene expression exists in their transcriptional architecture. PMID:24739626
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.

PubMed

Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor

2017-08-30

Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors, and those elements can influence the expression breadth and functionality of these lncRNAs.
Exceptionally long 5' UTR short tandem repeats specifically linked to primates.

PubMed

Namdar-Aligoodarzi, P; Mohammadparast, S; Zaker-Kandjani, B; Talebi Kakroodi, S; Jafari Vesiehsari, M; Ohadi, M

2015-09-10

We have previously reported genome-scale short tandem repeats (STRs) in the core promoter interval (i.e. -120 to +1 to the transcription start site) of protein-coding genes that have evolved identically in primates vs. non-primates. Those STRs may function as evolutionary switch codes for primate speciation. In the current study, we used the Ensembl database to analyze the 5' untranslated region (5' UTR) between +1 and +60 of the transcription start site of all human protein-coding genes annotated in the GeneCards database, in order to identify "exceptionally long" STRs (≥5-repeats), which may be of selective/adaptive advantage. The importance of this critical interval is its function as core promoter, and its effect on transcription and translation. In order to minimize ascertainment bias, we analyzed the evolutionary status of the human 5' UTR STRs of ≥5-repeats in several species encompassing six major orders and superorders across mammals, including primates, rodents, Scandentia, Laurasiatheria, Afrotheria, and Xenarthra. We introduce primate-specific STRs, and STRs which have expanded from mouse to primates. Identical co-occurrence of the identified STRs of rare average frequency between 0.006 and 0.0001 in primates supports a role for those motifs in processes that diverged primates from other mammals, such as neuronal differentiation (e.g. APOD and FGF4), and craniofacial development (e.g. FILIP1L). A number of the identified STRs of ≥5-repeats may be human-specific (e.g. ZMYM3 and DAZAP1). Future work is warranted to examine the importance of the listed genes in primate/human evolution, development, and disease. Copyright © 2015 Elsevier B.V. All rights reserved.
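The idea of screening the +1 to +60 interval for STRs of at least five repeats can be sketched with a regular expression. This is an illustration only and not the authors' pipeline: the repeat-unit length range (1 to 6 nt), the function name, and the toy sequence are all assumptions introduced here.

```python
import re

# Assumption: a repeat unit of 1-6 nt followed by at least 4 further copies,
# i.e. 5 or more tandem copies in total. The lazy quantifier makes the
# shortest possible unit win (e.g. "CA" rather than "CACA").
STR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def find_long_strs(utr: str, window: int = 60) -> list:
    """Return (1-based start, repeat unit, copy number) for each STR of
    >= 5 copies within the first `window` nt of a 5' UTR sequence."""
    region = utr[:window].upper()
    hits = []
    for match in STR_PATTERN.finditer(region):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start() + 1, unit, copies))
    return hits


# Hypothetical toy 5' UTR: two leading bases, then six CA repeats.
print(find_long_strs("TT" + "CA" * 6 + "GGT"))
```

Because the sequence is truncated to the window before matching, a repeat that extends beyond +60 is reported only for the copies that fall inside the interval.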
Transcriptional landscapes of Axolotl (Ambystoma mexicanum).

PubMed

Caballero-Pérez, Juan; Espinal-Centeno, Annie; Falcon, Francisco; García-Ortega, Luis F; Curiel-Quesada, Everardo; Cruz-Hernández, Andrés; Bako, Laszlo; Chen, Xuemei; Martínez, Octavio; Alberto Arteaga-Vázquez, Mario; Herrera-Estrella, Luis; Cruz-Ramírez, Alfredo

2018-01-15

The axolotl (Ambystoma mexicanum) is the vertebrate model system with the highest regeneration capacity. Experimental tools established over the past 100 years have been fundamental in starting to unravel the cellular and molecular basis of tissue and limb regeneration. In the absence of a reference genome for the axolotl, transcriptomic analysis becomes fundamental to understanding the genetic basis of regeneration. Here we present one of the most diverse transcriptomic data sets for the axolotl by profiling coding and non-coding RNAs from diverse tissues. We reconstructed a population of 115,906 putative protein-coding mRNAs as full ORFs (including isoforms). We also identified 352 conserved miRNAs and 297 novel putative mature miRNAs. Systematic enrichment analysis of gene expression allowed us to identify tissue-specific protein-coding transcripts. We also found putative novel and conserved microRNAs that potentially target mRNAs reported as important disease candidates in heart and liver. Copyright © 2017 Elsevier Inc. All rights reserved.
DNA sequence requirements for the accurate transcription of a protein-coding plastid gene in a plastid in vitro system from mustard (Sinapis alba L.)

PubMed Central

Link, Gerhard

1984-01-01

A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ~28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic `−35' promoter elements. In the absence of this region, a region ~13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the `TATA' box of genes transcribed by RNA polymerase II (or B). The region between the `TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic `−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a `promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. PMID:16453540
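Locating promoter elements such as TTGACA and TATACT relative to a transcription start site amounts to a simple coordinate conversion. The Python sketch below is a hedged illustration: the function name is hypothetical, the upstream sequence is an invented toy rather than the mustard plastid sequence, and it assumes the input ends at position -1, immediately before the start site.

```python
def motif_positions(upstream: str, motif: str) -> list:
    """Return the position of the first base of each exact motif match,
    relative to the transcription start site.

    Assumption: `upstream` ends at position -1, so a match starting at
    index i in a sequence of length n sits at position i - n.
    """
    upstream = upstream.upper()
    motif = motif.upper()
    n = len(upstream)
    hits = []
    i = upstream.find(motif)
    while i != -1:
        hits.append(i - n)
        i = upstream.find(motif, i + 1)  # step by 1 to allow overlaps
    return hits


# Invented 40-nt upstream region with TTGACA placed at -35 and TATACT at -12.
toy_upstream = "CCCCC" + "TTGACA" + "G" * 17 + "TATACT" + "GGGGGG"
print(motif_positions(toy_upstream, "TTGACA"))
print(motif_positions(toy_upstream, "TATACT"))
```

Exact string matching is the simplest choice here. Real promoter elements usually tolerate mismatches, so a position weight matrix or a mismatch-tolerant search would be a more realistic option.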
Rare and Coding Region Genetic Variants Associated With Risk of Ischemic Stroke: The NHLBI Exome Sequence Project.

PubMed

Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S

2015-07-01

Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. The objective was to investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. The main outcomes were discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis).
We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8) with an intracellular signal transduction mechanism and one in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7) with a fatty acid metabolism mechanism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
Complete Mitochondrial Genome of the Red Fox (Vulpes vulpes) and Phylogenetic Analysis with Other Canid Species.

PubMed

Zhong, Hua-Ming; Zhang, Hong-Hai; Sha, Wei-Lai; Zhang, Cheng-De; Chen, Yu-Cai

2010-04-01

The whole mitochondrial genome sequence of the red fox (Vulpes vulpes) was determined. It had a total length of 16 723 bp. As in most mammalian mitochondrial genomes, it contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and one control region. The base composition was 31.3% A, 26.1% C, 14.8% G and 27.8% T. The codon usage of red fox, arctic fox, gray wolf, domestic dog and coyote followed the same pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 3 gene in the red fox. A long tandem repeat rich in AC was found between conserved sequence blocks 1 and 2 in the control region. To confirm the phylogenetic relationships of the red fox to other canids, phylogenetic trees were reconstructed by neighbor-joining and maximum parsimony methods using 12 concatenated heavy-strand protein-coding genes. The results indicated that the arctic fox was the sister group of the red fox and that both belong to the red fox-like clade in the family Canidae, while gray wolf, domestic dog and coyote belong to the wolf-like clade. The results were in accordance with existing phylogenetic results.
The mitochondrial genome of Cethosia biblis (Drury) (Lepidoptera: Nymphalidae).

PubMed

Xin, Tianrong; Li, Lei; Yao, Chengyi; Wang, Yayu; Zou, Zhiwen; Wang, Jing; Xia, Bin

2016-07-01

We present the complete mitogenome of Cethosia biblis (Drury) (Lepidoptera: Nymphalidae) in this article. The mitogenome was a circular molecule consisting of 15,286 nucleotides, 37 genes, and an A + T-rich region. The order of the 37 genes was typical of insect mitochondrial DNA sequences described to date. The overall base composition of the genome is A (37.41%), T (42.80%), C (11.87%), and G (7.91%), with the A + T-rich hallmark of other invertebrate mitochondrial genomes. The start codon was ATA in most of the mitochondrial protein-coding genes (ND2, COI, ATP8, ND3, ND5, ND4, ND6, and ND1), whereas the COII, ATP6, COIII, ND4L, and Cob genes employ ATG. The stop codon was TAA in all the protein-coding genes. The A + T region is located between 12S rRNA and tRNA(Met). The phylogenetic relationships of Lepidoptera species were reconstructed based on the nucleotide sequences of 13 PCGs of mitogenomes using the neighbor-joining method. The molecular-based phylogeny supported the traditional morphological classification of relationships within Lepidoptera species.
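The A + T bias reported for Cethosia biblis can be checked with simple arithmetic on the four percentages quoted in the abstract. The following is a minimal Python sketch; the function name `at_content` and the dictionary layout are illustrative assumptions, not part of the cited study.

```python
def at_content(composition):
    """Return the combined A + T percentage from a base-composition mapping."""
    return composition["A"] + composition["T"]


# Percentages exactly as quoted in the abstract above.
cethosia_biblis = {"A": 37.41, "T": 42.80, "C": 11.87, "G": 7.91}

# Combined A + T percentage.
print(round(at_content(cethosia_biblis), 2))  # 80.21

# Sanity check: the four values sum to 99.99, not 100, because each
# percentage was rounded to two decimals in the source.
print(round(sum(cethosia_biblis.values()), 2))  # 99.99
```

The same helper applies to any of the base compositions quoted in this listing, provided the values are entered as percentages.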
Localization of TFIIB binding regions using serial analysis of chromatin occupancy

PubMed Central

Yochum, Gregory S; Rajaraman, Veena; Cleland, Ryan; McWeeney, Shannon

2007-01-01

Background: RNA Polymerase II (RNAP II) is recruited to core promoters by the pre-initiation complex (PIC) of general transcription factors. Within the PIC, transcription factor for RNA polymerase IIB (TFIIB) determines the start site of transcription. TFIIB binding has not been localized, genome-wide, in metazoans. Serial analysis of chromatin occupancy (SACO) is an unbiased methodology used to empirically identify transcription factor binding regions. In this report, we use TFIIB and SACO to localize TFIIB binding regions across the rat genome. Results: A sample of the TFIIB SACO library was sequenced and 12,968 TFIIB genomic signature tags (GSTs) were assigned to the rat genome. GSTs are 20–22 base pair fragments that are derived from TFIIB-bound chromatin. TFIIB localized to both non-protein-coding and protein-coding loci. For 21% of the 1783 protein-coding genes in this sample of the SACO library, TFIIB binding mapped near the characterized 5' promoter that is upstream of the transcription start site (TSS). However, internal TFIIB binding positions were identified in 57% of the 1783 protein-coding genes. Internal positions are defined as those within an inclusive region greater than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop. We demonstrate that both TFIIB and TFIID (an additional component of PICs) bound to internal regions using chromatin immunoprecipitation (ChIP). The 5' caps of transcripts associated with internal TFIIB binding positions were identified using a cap-trapping assay. The 5' TSSs for internal transcripts were confirmed by primer extension. Additionally, an analysis of the functional annotation of mouse 3 (FANTOM3) databases indicates that internally initiated transcripts identified by TFIIB SACO in rat are conserved in mouse.
Conclusion: Our finding that TFIIB binding is not restricted to the 5' upstream region indicates that the propensity for PIC to contribute to transcript diversity is far greater than previously appreciated. PMID:17997859
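The definition of an "internal" binding position given in the abstract (more than 2.5 kb downstream of the 5' TSS and 2.5 kb upstream of the transcription stop) can be sketched as a coordinate test. This is a minimal Python illustration under stated assumptions: the function name, the argument names, and the choice to treat both boundaries as inclusive are hypothetical, since the abstract does not spell out the exact boundary handling.

```python
MARGIN_BP = 2500  # 2.5 kb, the distance stated in the abstract


def is_internal(position, tss, stop, margin=MARGIN_BP):
    """Return True if `position` lies in the gene body, at least `margin` bp
    away from both the transcription start site and the transcription stop.

    Coordinates are genomic. The gene may be on either strand, so `tss` may
    be larger than `stop`; sorting the two ends handles both cases.
    Boundaries are treated as inclusive (an assumption).
    """
    low, high = sorted((tss, stop))
    return low + margin <= position <= high - margin


# Hypothetical coordinates for illustration only.
print(is_internal(5000, tss=1000, stop=20000))   # True: well inside the gene body
print(is_internal(2000, tss=1000, stop=20000))   # False: within 2.5 kb of the TSS
print(is_internal(5000, tss=20000, stop=1000))   # True: same gene on the minus strand
```

Note that a gene shorter than 5 kb has no internal region under this definition, so the function returns False for every position in it.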
In silico screening of the chicken genome for overlaps between genomic regions: microRNA genes, coding and non-coding transcriptional units, QTL, and genetic variations.

PubMed

Zorc, Minja; Kunej, Tanja

2016-05-01

MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5%) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region.
Identified miRNA-related genomic hotspots in chicken can serve researchers as a starting point for further functional studies and association studies with poultry production and health traits and the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
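The overlap screening described above comes down to testing whether two genomic intervals on the same chromosome intersect. The following Python sketch shows that test under stated assumptions: the `Region` type, the function names, and all coordinates are hypothetical and are not taken from the study or from the tools it lists (miRNA SNiPer, Ensembl, miRBase, QTLdb).

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive (an assumed convention)
    end: int    # 1-based, inclusive


def overlaps(a, b):
    """Two closed intervals overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_overlaps(queries, targets):
    """Return every (query, target) pair that overlaps; simple O(n*m) scan."""
    return [(q, t) for q in queries for t in targets if overlaps(q, t)]


# Hypothetical coordinates for illustration only.
mirna_genes = [Region("mir_a", "chr3", 100, 180), Region("mir_b", "chr5", 500, 560)]
exons = [Region("exon_1", "chr3", 150, 400), Region("exon_2", "chr5", 600, 700)]

for mirna, exon in find_overlaps(mirna_genes, exons):
    print(mirna.name, "overlaps", exon.name)  # mir_a overlaps exon_1
```

The pairwise scan is adequate for a few hundred miRNA genes; for genome-scale target sets a sorted sweep or an interval tree would be the usual choice.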
Gene end-like sequences within the 3' non-coding region of the Nipah virus genome attenuate viral gene transcription.

PubMed

Sugai, Akihiro; Sato, Hiroki; Yoneda, Misako; Kai, Chieko

2017-08-01

The regulation of transcription during Nipah virus (NiV) replication is poorly understood. Using a bicistronic minigenome system, we investigated the involvement of non-coding regions (NCRs) in the transcriptional re-initiation efficiency of NiV RNA polymerase. Reporter assays revealed that attenuation of NiV gene expression was not constant at each gene junction, and that the attenuating property was controlled by the 3' NCR. However, this regulation was independent of the gene-end, gene-start and intergenic regions. Northern blot analysis indicated that regulation of viral gene expression by the phosphoprotein (P) and large protein (L) 3' NCRs occurred at the transcription level. We identified uridine-rich tracts within the L 3' NCR that are similar to gene-end signals. These gene-end-like sequences were recognized as weak transcription termination signals by the viral RNA polymerase, thereby reducing downstream gene transcription. Thus, we suggest that NiV has a unique mechanism of transcriptional regulation. Copyright © 2017 Elsevier Inc. All rights reserved.
The Complete Mitochondrial Genome of the Rice Moth, Corcyra cephalonica

PubMed Central

Wu, Yu-Peng; Li, Jie; Zhao, Jin-Liang; Su, Tian-Juan; Luo, A-Rong; Fan, Ren-Jun; Chen, Ming-Chang; Wu, Chun-Sheng; Zhu, Chao-Dong

2012-01-01

The complete mitochondrial genome (mitogenome) of the rice moth, Corcyra cephalonica Stainton (Lepidoptera: Pyralidae), was determined to be a circular molecule of 15,273 bp. The mitogenome composition (37 genes) and gene order are the same as in other lepidopterans. The nucleotide composition of the C. cephalonica mitogenome is highly A+T biased (80.43%), as in other insects. Twelve protein-coding genes start with a typical ATN codon; the exception is the cox1 gene, which uses CGA as the initiation codon. Nine protein-coding genes have the common stop codon TAA, whereas nad2, cox1, cox2, and nad4 have a single T as an incomplete stop codon. Twenty-two tRNA genes showed a cloverleaf secondary structure. The mitogenome has several large intergenic spacer regions, including spacer 1 between the trnQ and nad2 genes, which is common in Lepidoptera. Spacer 3, between trnE and trnF, includes the microsatellite-like repeat regions (AT)18 and (TTAT)3. Spacer 4 (16 bp), between the trnS2 and nad1 genes, has the motif ATACTAT; another species, Sesamia inferens, encodes ATCATAT at the same position, while other lepidopteran insects encode a similar ATACTAA motif. Spacer 6 is the A+T-rich region and includes the motif ATAGA, a 20-bp poly(T) stretch, and two microsatellite elements, (AT)9 and (AT)8. PMID:23413968
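Several mitogenome records in this listing describe start codons as "ATN" and stop codons as either complete or incomplete. The following Python sketch shows one way such a classification could be coded. The function names and category labels are illustrative assumptions, and the sets of complete (TAA, TAG) and incomplete (T, TA) stop signals are an assumed simplification rather than a statement of any study's annotation pipeline.

```python
import re

ATN = re.compile(r"AT[ACGT]")      # ATA, ATC, ATG or ATT
COMPLETE_STOPS = {"TAA", "TAG"}    # assumed set of complete stop codons
INCOMPLETE_STOPS = {"T", "TA"}     # assumed truncated stop signals


def classify_start(codon):
    """Label a start codon as 'ATN' or 'non-canonical'."""
    return "ATN" if ATN.fullmatch(codon.upper()) else "non-canonical"


def classify_stop(tail):
    """Label the terminal nucleotides of a gene as a complete or incomplete stop."""
    tail = tail.upper()
    if tail in COMPLETE_STOPS:
        return "complete"
    if tail in INCOMPLETE_STOPS:
        return "incomplete"
    return "other"


# Values quoted in the rice moth abstract above.
print(classify_start("CGA"))  # non-canonical (reported for cox1)
print(classify_stop("TAA"))   # complete
print(classify_stop("T"))     # incomplete (reported for nad2, cox1, cox2, nad4)
```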
The complete mitochondrial genome of the rice moth, Corcyra cephalonica.

PubMed

Wu, Yu-Peng; Li, Jie; Zhao, Jin-Liang; Su, Tian-Juan; Luo, A-Rong; Fan, Ren-Jun; Chen, Ming-Chang; Wu, Chun-Sheng; Zhu, Chao-Dong

2012-01-01

The complete mitochondrial genome (mitogenome) of the rice moth, Corcyra cephalonica Stainton (Lepidoptera: Pyralidae), was determined to be a circular molecule of 15,273 bp. The mitogenome composition (37 genes) and gene order are the same as in other lepidopterans. The nucleotide composition of the C. cephalonica mitogenome is highly A+T biased (80.43%), as in other insects. Twelve protein-coding genes start with a typical ATN codon; the exception is the cox1 gene, which uses CGA as the initiation codon. Nine protein-coding genes have the common stop codon TAA, whereas nad2, cox1, cox2, and nad4 have a single T as an incomplete stop codon. Twenty-two tRNA genes showed a cloverleaf secondary structure. The mitogenome has several large intergenic spacer regions, including spacer 1 between the trnQ and nad2 genes, which is common in Lepidoptera. Spacer 3, between trnE and trnF, includes the microsatellite-like repeat regions (AT)18 and (TTAT)3. Spacer 4 (16 bp), between the trnS2 and nad1 genes, has the motif ATACTAT; another species, Sesamia inferens, encodes ATCATAT at the same position, while other lepidopteran insects encode a similar ATACTAA motif. Spacer 6 is the A+T-rich region and includes the motif ATAGA, a 20-bp poly(T) stretch, and two microsatellite elements, (AT)9 and (AT)8.
The complete mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae).

PubMed

Zhou, Xuming; Chen, Yu; Zhu, Shanliang; Xu, Haigen; Liu, Yan; Chen, Lian

2016-01-01

The mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae) is the first complete mtDNA sequence reported in the genus Pomacea. The total length of the mtDNA is 15,707 bp, containing 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs, and a 359 bp non-coding region. The A + T content of the H-strand is 71.7% (T: 41%, C: 12.7%, A: 30.7%, G: 15.6%). The ATP6, ATP8, CO1, CO2, ND1-3, ND5, ND6, ND4L and Cyt b genes begin with ATG as the start codon, whereas CO3 and ND4 begin with ATA. The ATP8, CO2-3, ND4L, ND2-6 and Cyt b genes terminate with TAA as the stop codon, whereas ATP6, ND1, and CO1 end with TAG. A long non-coding region is found, in which a 23 bp unit is repeated 11 times.
Genetic Variation Linked to Lung Cancer Survival in White Smokers | Center for Cancer Research

Cancer.gov

CCR investigators have discovered evidence that links lung cancer survival with genetic variations (called single nucleotide polymorphisms) in the MBL2 gene, a key player in innate immunity. The variations in the gene, which codes for a protein called the mannose-binding lectin, occur in its promoter region, where the RNA polymerase molecule binds to start transcription, and in the first exon that is responsible for the correct structure of MBL. The findings appear in the September 19, 2007, issue of the Journal of the National Cancer Institute.
Neutropenia-associated ELANE mutations disrupting translation initiation produce novel neutrophil elastase isoforms

PubMed Central

Tidwell, Timothy; Wechsler, Jeremy; Nayak, Ramesh C.; Trump, Lisa; Salipante, Stephen J.; Cheng, Jerry C.; Donadieu, Jean; Glaubach, Taly; Corey, Seth J.; Grimes, H. Leighton; Lutzko, Carolyn; Cancelas, Jose A.

2014-01-01

Hereditary neutropenia is usually caused by heterozygous germline mutations in the ELANE gene encoding neutrophil elastase (NE). How mutations cause disease remains uncertain, but two hypotheses have been proposed. In one, ELANE mutations lead to mislocalization of NE. In the other, ELANE mutations disturb protein folding, inducing an unfolded protein response in the endoplasmic reticulum (ER). In this study, we describe new types of mutations that disrupt the translational start site. At first glance, they should block translation and are incompatible with either the mislocalization or misfolding hypotheses, which require mutant protein for pathogenicity. We find that start-site mutations, instead, force translation from downstream in-frame initiation codons, yielding amino-terminally truncated isoforms that lack ER-localizing (pre) and zymogen-maintaining (pro) sequences yet retain essential catalytic residues. Patient-derived induced pluripotent stem cells recapitulate hematopoietic and molecular phenotypes. Expression of the amino-terminally deleted isoforms in vitro reduces myeloid cell clonogenic capacity. We define an internal ribosome entry site (IRES) within ELANE and demonstrate that adjacent mutations modulate IRES activity, independently of protein-coding sequence alterations. Some ELANE mutations, therefore, appear to cause neutropenia via the production of amino-terminally deleted NE isoforms rather than by altering the coding sequence of the full-length protein. PMID:24184683
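The idea that a disrupted start codon shifts initiation to a downstream in-frame codon can be illustrated by scanning a coding sequence codon by codon. The Python sketch below is a toy illustration only: the function name is hypothetical, the sequences are invented and are not ELANE, and real initiation-site choice depends on sequence context that this scan ignores.

```python
def in_frame_starts(cds, codon="ATG"):
    """Return the 0-based nucleotide offsets of every in-frame `codon` in `cds`.

    The reading frame is defined by position 0 of `cds`, so only offsets
    that are multiples of three are examined.
    """
    cds = cds.upper()
    return [i for i in range(0, len(cds) - 2, 3) if cds[i:i + 3] == codon]


# Invented toy sequences (not ELANE). Codons: ATG GCA TGC ATG AAA TAA.
# The ATG at offset 5 is out of frame and is therefore ignored.
wild_type = "ATGGCATGCATGAAATAA"
start_mutant = "ATAGCATGCATGAAATAA"  # ATG -> ATA at the annotated start

print(in_frame_starts(wild_type))     # [0, 9]
print(in_frame_starts(start_mutant))  # [9]
```

In the mutant, the first remaining in-frame ATG lies at offset 9, so a product initiated there would lack the first three codons of the annotated protein.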
nGASP - the nematode genome annotation assessment project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coghlan, A; Fiedler, T J; McKay, S J

2008-12-19

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets for 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second place. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy as reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs were the most challenging for gene-finders.
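Gene-level sensitivity and specificity, as quoted above, can be computed by comparing a set of predicted gene models with a reference set. The following Python sketch assumes the convention common in gene-finder evaluation, where "specificity" means the fraction of predictions that are correct (elsewhere called precision); the abstract does not state the formula, so this is an assumption. The function name and the gene identifiers are hypothetical.

```python
def gene_level_accuracy(predicted, reference):
    """Return (sensitivity, specificity) for exact-match gene models.

    sensitivity = correct predictions / reference genes
    specificity = correct predictions / predicted genes
    (the gene-finding convention; an assumption for this listing)
    """
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical identifiers: each stands for one complete gene model.
predicted_models = {"g1", "g2", "g3", "g4"}
reference_models = {"g1", "g2", "g5"}

sn, sp = gene_level_accuracy(predicted_models, reference_models)
print(f"sensitivity={sn:.2f} specificity={sp:.2f}")  # sensitivity=0.67 specificity=0.50
```

In practice a gene model would be keyed by its full exon structure, for example a tuple of exon coordinates, so that a prediction counts as correct only when every exon boundary matches the reference.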
Phylogenomic analysis of the Chilean clade of Liolaemus lizards (Squamata: Liolaemidae) based on sequence capture data.

PubMed

Panzera, Alejandra; Leaché, Adam D; D'Elía, Guillermo; Victoriano, Pedro F

2017-01-01

The genus Liolaemus is one of the most ecologically diverse and species-rich genera of lizards worldwide. It currently includes more than 250 recognized species, which have been subject to many ecological and evolutionary studies. Nevertheless, Liolaemus lizards have a complex taxonomic history, mainly due to the incongruence between morphological and genetic data, incomplete taxon sampling, incomplete lineage sorting and hybridization. In addition, many species have restricted and remote distributions, which has hampered their examination and inclusion in molecular systematic studies. The aims of this study are to infer a robust phylogeny for a subsample of lizards representing the Chilean clade (subgenus Liolaemus sensu stricto), and to test the monophyly of several of the major species groups. We use a phylogenomic approach, targeting 541 ultra-conserved elements (UCEs) and 44 protein-coding genes for 16 taxa. We conduct a comparison of phylogenetic analyses using maximum-likelihood and several species tree inference methods. The UCEs provide stronger support for phylogenetic relationships compared to the protein-coding genes; however, the UCEs outnumber the protein-coding genes by 10-fold. On average, the protein-coding genes contain over twice the number of informative sites. Based on our phylogenomic analyses, all the groups sampled are polyphyletic. Liolaemus tenuis tenuis is difficult to place in the phylogeny, because only a few loci (nine) were recovered for this species. Topologies or support values did not change dramatically upon exclusion of L. t. tenuis from analyses, suggesting that missing data did not have a significant impact on phylogenetic inference in this data set. The phylogenomic analyses provide strong support for sister group relationships between L. fuscus, L. monticola, L. nigroviridis and L. nitidus, and L. platei and L. velosoi.
Despite our limited taxon sampling, we have provided a reliable starting hypothesis for the relationships among many major groups of the Chilean clade of Liolaemus that will help future work aimed at resolving the Liolaemus phylogeny.
An expanding universe of the non-coding genome in cancer biology.

PubMed

Xue, Bin; He, Lin

2014-06-01

Neoplastic transformation is caused by accumulation of genetic and epigenetic alterations that ultimately convert normal cells into tumor cells with uncontrolled proliferation and survival, unlimited replicative potential and invasive growth [Hanahan,D. et al. (2011) Hallmarks of cancer: the next generation. Cell, 144, 646-674]. Although the majority of the cancer studies have focused on the functions of protein-coding genes, emerging evidence has started to reveal the importance of the vast non-coding genome, which constitutes more than 98% of the human genome. A number of non-coding RNAs (ncRNAs) derived from the 'dark matter' of the human genome exhibit cancer-specific differential expression and/or genomic alterations, and it is increasingly clear that ncRNAs, including small ncRNAs and long ncRNAs (lncRNAs), play an important role in cancer development by regulating protein-coding gene expression through diverse mechanisms. In addition to ncRNAs, nearly half of the mammalian genomes consist of transposable elements, particularly retrotransposons. Once depicted as selfish genomic parasites that propagate at the expense of host fitness, retrotransposon elements could also confer regulatory complexity to the host genomes during development and disease. Reactivation of retrotransposons in cancer, while capable of causing insertional mutagenesis and genome rearrangements to promote oncogenesis, could also alter host gene expression networks to favor tumor development. Taken together, the functional significance of non-coding genome in tumorigenesis has been previously underestimated, and diverse transcripts derived from the non-coding genome could act as integral functional components of the oncogene and tumor suppressor network. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Structure and evolution of the mitochondrial genome of Exorista sorbillans: the Tachinidae (Diptera: Calyptratae) perspective.

PubMed

Shao, Yuan-jun; Hu, Xian-qiong; Peng, Guang-da; Wang, Rui-xian; Gao, Rui-na; Lin, Chao; Shen, Wei-de; Li, Rui; Li, Bing

2012-12-01

The first complete mitochondrial genome (mitogenome) of a tachinid fly, Exorista sorbillans (Diptera: Tachinidae), was sequenced using a PCR-based approach. The circular mitogenome is 14,960 bp long and has the mitochondrial gene (mt gene) organization and order typical of Diptera. All protein-coding sequences are initiated with an ATN codon except the Cox I gene, which has a putative 4-bp start codon, ATCG. Ten of the thirteen protein-coding genes have a complete termination codon (TAA); the remaining three, located on the H strand, have incomplete stop codons. The mitogenome of E. sorbillans is biased toward A+T (78.4%). The strand-specific bias is reflected in the third codon positions of mt genes, whose T/C ratios, used as a strand indicator, are higher on the H strand than on the L strand in each of the seven dipteran species examined. The A+T-rich region of E. sorbillans is 106 bp long and includes three tandem copies of a 13-bp fragment. Compared with Haematobia irritans, E. sorbillans is distantly related to Drosophila. Phylogenetic topologies based on amino acid sequences support the clustering of E. sorbillans (Tachinidae) with species of Calliphoridae and Oestridae, and indicate that the superfamily Oestroidea is polyphyletic, grouping with Muscidae in a clade.
The complete mitochondrial genome of the butterfly Apatura metis (Lepidoptera: Nymphalidae).

PubMed

Zhang, Min; Nie, Xinping; Cao, Tianwen; Wang, Juping; Li, Tao; Zhang, Xiaonan; Guo, Yaping; Ma, Enbo; Zhong, Yang

2012-06-01

Apatura metis, known as Freyer's purple emperor, is an important pest of the Slender Leaved Willow (Salix alba); its mitochondrial genome is 15,236 bp long. The 22 tRNA genes, two ribosomal RNA genes (rrnL and rrnS), 13 protein-coding genes (PCGs), and the control region of the A. metis mitochondrial genome are highly homologous to those of other lepidopteran species. The mitochondrial genome of A. metis is biased toward a high A + T content (A + T = 80.5%). All protein-coding genes start with a typical ATN initiation codon, except for COI, which begins with the CGA codon as observed in other lepidopterans. All tRNAs show the classic clover-leaf structure, except that the dihydrouridine (DHU) arm of tRNA(Ser(AGN)) forms a simple loop. The A. metis A + T-rich region contains some conserved structures, including a structure combining the motif 'ATAGA' and a 19 bp poly(T) stretch, which is similar to those found in other lepidopteran mitogenomes. The phylogenetic analyses of lepidopterans based on mitogenome sequences demonstrate that each of the six superfamilies is monophyletic, and that the relationship among them is (((Noctuoidea + (Geometroidea + Bombycoidea)) + Pyraloidea) + Papilionoidea) + Tortricoidea. Within Papilionoidea, our analysis supports the relationship ((Lycaenidae + Pieridae) + Nymphalidae) + Papilionidae.
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE PAGES

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui; ...

2014-10-02

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
Promoter analysis reveals globally differential regulation of human long non-coding RNA and protein-coding genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Alam, Tanvir; Medvedeva, Yulia A.; Jia, Hui

Transcriptional regulation of protein-coding genes is increasingly well-understood on a global scale, yet no comparable information exists for long non-coding RNA (lncRNA) genes, which were recently recognized to be as numerous as protein-coding genes in mammalian genomes. We performed a genome-wide comparative analysis of the promoters of human lncRNA and protein-coding genes, finding global differences in specific genetic and epigenetic features relevant to transcriptional regulation. These two groups of genes are hence subject to separate transcriptional regulatory programs, including distinct transcription factor (TF) proteins that significantly favor lncRNA, rather than coding-gene, promoters. We report a specific signature of promoter-proximal transcriptional regulation of lncRNA genes, including several distinct transcription factor binding sites (TFBS). Experimental DNase I hypersensitive site profiles are consistent with active configurations of these lncRNA TFBS sets in diverse human cell types. TFBS ChIP-seq datasets confirm the binding events that we predicted using computational approaches for a subset of factors. For several TFs known to be directly regulated by lncRNAs, we find that their putative TFBSs are enriched at lncRNA promoters, suggesting that the TFs and the lncRNAs may participate in a bidirectional feedback loop regulatory network. Accordingly, cells may be able to modulate lncRNA expression levels independently of mRNA levels via distinct regulatory pathways. Our results also raise the possibility that, given the historical reliance on protein-coding gene catalogs to define the chromatin states of active promoters, a revision of these chromatin signature profiles to incorporate expressed lncRNA genes is warranted in the future.
Sequence Analysis of Mitochondrial Genome of Toxascaris leonina from a South China Tiger.

PubMed

Li, Kangxin; Yang, Fang; Abdullahi, A Y; Song, Meiran; Shi, Xianli; Wang, Minwei; Fu, Yeqi; Pan, Weida; Shan, Fang; Chen, Wu; Li, Guoqing

2016-12-01

Toxascaris leonina is a common parasitic nematode of wild mammals and has significant impacts on the protection of rare wild animals. To analyze population genetic characteristics of T. leonina from South China tiger, its mitochondrial (mt) genome was sequenced. Its complete circular mt genome was 14,277 bp in length, including 12 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition was biased toward A and T. The most common start codon and stop codon were TTG and TAG, and 4 genes ended with an incomplete stop codon. There were 13 intergenic regions ranging from 1 to 10 bp in size. Phylogenetically, T. leonina from a South China tiger was close to canine T. leonina. This study reports for the first time a complete mt genome sequence of T. leonina from the South China tiger, and provides a scientific basis for studying the genetic diversity of nematodes between different hosts.
Long Non-Coding RNAs Differentially Expressed between Normal versus Primary Breast Tumor Tissues Disclose Converse Changes to Breast Cancer-Related Protein-Coding Genes

PubMed Central

Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.

2014-01-01

Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue; half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
Biallelic insertion of a transcriptional terminator via the CRISPR/Cas9 system efficiently silences expression of protein-coding and non-coding RNA genes.

PubMed

Liu, Yangyang; Han, Xiao; Yuan, Junting; Geng, Tuoyu; Chen, Shihao; Hu, Xuming; Cui, Isabelle H; Cui, Hengmi

2017-04-07

The type II bacterial CRISPR/Cas9 system is a simple, convenient, and powerful tool for targeted gene editing. Here, we describe a CRISPR/Cas9-based approach for inserting a poly(A) transcriptional terminator into both alleles of a targeted gene to silence protein-coding and non-protein-coding genes, which often play key roles in gene regulation but are difficult to silence via insertion or deletion of short DNA fragments. The integration of 225 bp of bovine growth hormone poly(A) signals into either the first intron or the first exon or behind the promoter of target genes caused efficient termination of expression of PPP1R12C, NSUN2 (protein-coding genes), and MALAT1 (non-protein-coding gene). Both NeoR and PuroR were used as markers in the selection of clonal cell lines with biallelic integration of a poly(A) signal. Genotyping analysis indicated that the cell lines displayed the desired biallelic silencing after a brief selection period. These combined results indicate that this CRISPR/Cas9-based approach offers an easy, convenient, and efficient novel technique for gene silencing in cell lines, especially for those in which gene integration is difficult because of a low efficiency of homology-directed repair. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Origins of De Novo Genes in Human and Chimpanzee.

PubMed

Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar

2015-12-01

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species--human, chimpanzee, macaque, and mouse--and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
Origins of De Novo Genes in Human and Chimpanzee

PubMed Central

Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M.Mar

2015-01-01

The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, it is yet unclear which is the prevalence and underlying mechanisms of de novo gene emergence. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species—human, chimpanzee, macaque, and mouse—and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the rest of species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins. PMID:26720152
De Novo Origin of Human Protein-Coding Genes

PubMed Central

Wu, Dong-Dong; Irwin, David M.; Zhang, Ya-Ping

2011-01-01

The de novo origin of a new protein-coding gene from non-coding DNA is considered to be a very rare occurrence in genomes. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee. The functionality of these genes is supported by both transcriptional and proteomic evidence. RNA–seq data indicate that these genes have their highest expression levels in the cerebral cortex and testes, which might suggest that these genes contribute to phenotypic traits that are unique to humans, such as improved cognitive ability. Our results are inconsistent with the traditional view that the de novo origin of new genes is very rare, thus there should be greater appreciation of the importance of the de novo origination of genes. PMID:22102831
Codon usage and expression level of human mitochondrial 13 protein coding genes across six continents.

PubMed

Chakraborty, Supriyo; Uddin, Arif; Mazumder, Tarikul Huda; Choudhury, Monisha Nath; Malakar, Arup Kumar; Paul, Prosenjit; Halder, Binata; Deka, Himangshu; Mazumder, Gulshana Akthar; Barbhuiya, Riazul Ahmed; Barbhuiya, Masuk Ahmed; Devi, Warepam Jesmi

2017-12-02

The study of codon usage coupled with phylogenetic analysis is an important tool to understand the genetic and evolutionary relationship of a gene. The 13 protein coding genes of human mitochondria are involved in the electron transport chain for the generation of energy currency (ATP). However, no work has yet been reported on the codon usage of the mitochondrial protein coding genes across six continents. To understand the patterns of codon usage in mitochondrial genes across six different continents, we used bioinformatic analyses to analyze the protein coding genes. The codon usage bias was low, as revealed by the high ENC value. Correlation between codon usage and GC3 suggested that all the codons ending with G/C were positively correlated with GC3 but vice versa for A/T ending codons with the exception of ND4L and ND5 genes. Neutrality plot revealed that for the genes ATP6, COI, COIII, CYB, ND4 and ND4L, natural selection might have played a major role while mutation pressure might have played a dominant role in the codon usage bias of ATP8, COII, ND1, ND2, ND3, ND5 and ND6 genes. Phylogenetic analysis indicated that evolutionary relationships in each of the 13 protein coding genes of human mitochondria were different across six continents and further suggested that geographical distance was an important factor for the origin and evolution of the 13 protein coding genes of human mitochondria. Copyright © 2017 Elsevier B.V. and Mitochondria Research Society. All rights reserved.
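The GC3 statistic used in the abstract above is the G+C fraction at third codon positions. A minimal sketch of how it can be computed, assuming an in-frame coding sequence; the function name `gc3` and the toy sequence are hypothetical illustrations and are not taken from the study.

```python
def gc3(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C."""
    seq = cds.upper()
    # Keep only complete codons; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    third_bases = seq[2:usable:3]
    if not third_bases:
        raise ValueError("sequence has no complete codon")
    return sum(base in "GC" for base in third_bases) / len(third_bases)


# Toy in-frame sequence (made up for illustration): ATG GCC TTA TAA
print(gc3("ATGGCCTTATAA"))  # third bases are G, C, A, A -> 0.5
```

Per-gene GC3 values computed this way could then be correlated with the usage of each codon, which is the kind of comparison the abstract describes.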
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements

PubMed Central

Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.

2008-01-01

X-chromosome inactivation, which occurs in female eutherian mammals, is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons, including those with simple tandem repeats detectable in their structure, have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Ribosome reinitiation at leader peptides increases translation of bacterial proteins.

PubMed

Korolev, Semen A; Zverkov, Oleg A; Seliverstov, Alexandr V; Lyubetsky, Vassily A

2016-04-16

Short leader genes usually do not encode stable proteins, although their importance in expression control of bacterial genomes is widely accepted. Such genes are often involved in the control of attenuation regulation. However, the abundance of leader genes suggests that their role in bacteria is not limited to regulation. Specifically, we hypothesize that leader genes increase the expression of protein-coding (structural) genes via ribosome reinitiation at the leader peptide in the case of a short distance between the stop codon of the leader gene and the start codon of the structural gene. For instance, in Actinobacteria, the frequency of leader genes at a distance of 10-11 bp is about 70 % higher than the mean frequency within the 1 to 65 bp range; and it gradually decreases as the range grows longer. A pronounced peak of this frequency-distance relationship is also observed in Proteobacteria, Bacteroidetes, Spirochaetales, Acidobacteria, the Deinococcus-Thermus group, and Planctomycetes. In contrast, this peak falls to the distance of 15-16 bp and is not very pronounced in Firmicutes; and no such peak is observed in cyanobacteria and tenericutes. Generally, this peak is typical for many bacteria. Some leader genes located close to a structural gene probably play a regulatory role as well.
Long non-coding RNAs and mRNAs profiling during spleen development in pig.

PubMed

Che, Tiandong; Li, Diyan; Jin, Long; Fu, Yuhua; Liu, Yingkai; Liu, Pengliang; Wang, Yixin; Tang, Qianzi; Ma, Jideng; Wang, Xun; Jiang, Anan; Li, Xuewei; Li, Mingzhou

2018-01-01

Genome-wide transcriptomic studies in humans and mice have become extensive and mature. However, a comprehensive and systematic understanding of protein-coding genes and long non-coding RNAs (lncRNAs) expressed during pig spleen development has not been achieved. LncRNAs are known to participate in regulatory networks for an array of biological processes. Here, we constructed 18 RNA libraries from developing fetal pig spleen (55 days before birth), postnatal pig spleens (0, 30, 180 days and 2 years after birth), and the samples from the 2-year-old Wild Boar. A total of 15,040 lncRNA transcripts were identified among these samples. We found that the temporal expression pattern of lncRNAs was more restricted than observed for protein-coding genes. Time-series analysis showed two large modules for protein-coding genes and lncRNAs. The up-regulated module was enriched for genes related to immune and inflammatory function, while the down-regulated module was enriched for cell proliferation processes such as cell division and DNA replication. Co-expression networks indicated the functional relatedness between protein-coding genes and lncRNAs, which were enriched for similar functions over the series of time points examined. We identified numerous differentially expressed protein-coding genes and lncRNAs in all five developmental stages. Notably, ceruloplasmin precursor (CP), a protein-coding gene participating in antioxidant and iron transport processes, was differentially expressed in all stages. This study provides the first catalog of the developing pig spleen, and contributes to a fuller understanding of the molecular mechanisms underpinning mammalian spleen development.
Mitochondrial genome and phylogenetic position of the tawny nurse shark (Nebrius ferrugineus).

PubMed

Wang, Junjie; Chen, Hao; Lin, Lingling; Ai, Weiming; Chen, Xiao

2017-01-01

The complete mitochondrial genome of the tawny nurse shark (Nebrius ferrugineus) is presented for the first time in this study. It was 16,693 bp in length with the typical gene order in vertebrates. The overall base composition was 33.6% A, 25.6% C, 12.7% G and 28.1% T. Two start (ATG and GTG) and two stop (TAG and TAA/T--) codons were found in the protein-coding genes. The size of 22 tRNA genes ranged from 67 to 75 bp. The origin of L-strand replication could form a hairpin structure. All nodes strongly supported that N. ferrugineus was placed as sister to Rhincodon typus in the Bayesian tree.
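The base composition quoted in the abstract above can be summarized with the usual A+T content and strand-skew statistics, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C). The following is only a worked-arithmetic sketch on the four reported percentages; the helper name `composition_summary` is hypothetical, and the skew values are derived here, not reported by the authors.

```python
def composition_summary(percent):
    """Derive A+T content, G+C content and skews from base percentages."""
    a, c, g, t = (percent[base] for base in "ACGT")
    return {
        "AT_content": a + t,
        "GC_content": g + c,
        "AT_skew": (a - t) / (a + t),
        "GC_skew": (g - c) / (g + c),
    }


# Percentages as reported for the N. ferrugineus mitochondrial genome.
nebrius = {"A": 33.6, "C": 25.6, "G": 12.7, "T": 28.1}
for name, value in composition_summary(nebrius).items():
    print(f"{name}: {value:.3f}")
```

With these inputs the A+T content comes to 61.7%, the AT skew to about 0.089 and the GC skew to about -0.337, i.e. the reported strand is A+T-rich and has more C than G.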
Comparison of the protein-coding gene content of Chlamydia trachomatis and Protochlamydia amoebophila using a Raspberry Pi computer.

PubMed

Robson, James F; Barker, Daniel

2015-10-13

To demonstrate the bioinformatics capabilities of a low-cost computer, the Raspberry Pi, we present a comparison of the protein-coding gene content of two species in phylum Chlamydiae: Chlamydia trachomatis, a common sexually transmitted infection of humans, and Candidatus Protochlamydia amoebophila, a recently discovered amoebal endosymbiont. Identifying species-specific proteins and differences in protein families could provide insights into the unique phenotypes of the two species. Using a Raspberry Pi computer, sequence similarity-based protein families were predicted across the two species, C. trachomatis and P. amoebophila, and their members counted. Examples include nine multi-protein families unique to C. trachomatis, 132 multi-protein families unique to P. amoebophila and one family with multiple copies in both. Most families unique to C. trachomatis were polymorphic outer-membrane proteins. Additionally, multiple protein families lacking functional annotation were found. Predicted functional interactions suggest one of these families is involved with the exodeoxyribonuclease V complex. The Raspberry Pi computer is adequate for a comparative genomics project of this scope. The protein families unique to P. amoebophila may provide a basis for investigating the host-endosymbiont interaction. However, additional species should be included; and further laboratory research is required to identify the functions of unknown or putative proteins. Multiple outer membrane proteins were found in C. trachomatis, suggesting importance for host evasion. The tyrosine transport protein family is shared between both species, with four proteins in C. trachomatis and two in P. amoebophila. Shared protein families could provide a starting point for discovery of wide-spectrum drugs against Chlamydiae.
Structural organization of the porcine and human genes coding for a leydig cell-specific insulin-like peptide (LEY I-L) and chromosomal localization of the human gene (INSL3)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burkhardt, E.; Adham, I.M.; Brosig, B.

1994-03-01

Leydig insulin-like protein (LEY I-L) is a member of the insulin-like hormone superfamily. The LEY I-L gene (designated INSL3) is expressed exclusively in prenatal and postnatal Leydig cells. The authors report here the cloning and nucleotide sequence of porcine and human LEY I-L genes including the 5' regions. Both genes consist of two exons and one intron. The organization of the LEY I-L gene is similar to that of insulin and relaxin. The transcription start site in the porcine and human LEY I-L gene is localized 13 and 14 bp upstream of the translation start site, respectively. Alignment of the 5' flanking regions of both genes reveals that the first 107 nucleotides upstream of the transcription start site exhibit an overall sequence similarity of 80%. This conserved region contains a consensus TATAA box, a CAAT-like element (GAAT), and a consensus SP1 sequence (GGGCGG) at equivalent positions in both genes and therefore may play a role in regulation of expression of the LEY I-L gene. The porcine and human genome contains a single copy of the LEY I-L gene. By in situ hybridization, the human gene was assigned to bands p13.2-p12 of the short arm of chromosome 19. 25 refs., 6 figs.
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nylund, Stian; Karlsen, Marius; Nylund, Are

2008-03-30

The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
Prediction of plant lncRNA by ensemble machine learning classifiers.

PubMed

Simopoulos, Caitlin M A; Weretilnyk, Elizabeth A; Golding, G Brian

2018-05-02

In plants, long non-protein coding RNAs are believed to have essential roles in development and stress responses. However, relative to advances on discerning biological roles for long non-protein coding RNAs in animal systems, this RNA class in plants is largely understudied. With comparatively few validated plant long non-coding RNAs, research on this potentially critical class of RNA is hindered by a lack of appropriate prediction tools and databases. Supervised learning models trained on data sets of mostly non-validated, non-coding transcripts have been previously used to identify this enigmatic RNA class with applications largely focused on animal systems. Our approach uses a training set comprised only of empirically validated long non-protein coding RNAs from plant, animal, and viral sources to predict and rank candidate long non-protein coding gene products for future functional validation. Individual stochastic gradient boosting and random forest classifiers trained on only empirically validated long non-protein coding RNAs were constructed. In order to use the strengths of multiple classifiers, we combined multiple models into a single stacking meta-learner. This ensemble approach benefits from the diversity of several learners to effectively identify putative plant long non-coding RNAs from transcript sequence features. When the predicted genes identified by the ensemble classifier were compared to those listed in GreeNC, an established plant long non-coding RNA database, overlap for predicted genes from Arabidopsis thaliana, Oryza sativa and Eutrema salsugineum ranged from 51 to 83% with the highest agreement in Eutrema salsugineum. Most of the highest ranking predictions from Arabidopsis thaliana were annotated as potential natural antisense genes, pseudogenes, transposable elements, or simply computationally predicted hypothetical protein. 
Due to the nature of this tool, the model can be updated as new long non-protein coding transcripts are identified and functionally verified. This ensemble classifier is an accurate tool that can be used to rank long non-protein coding RNA predictions for use in conjunction with gene expression studies. Selection of plant transcripts with a high potential for regulatory roles as long non-protein coding RNAs will advance research in the elucidation of long non-protein coding RNA function.

Network perturbation by recurrent regulatory variants in cancer

PubMed Central

Cho, Ara; Lee, Insuk; Choi, Jung Kyoon

2017-01-01

Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
A genetic screen for terminator function in yeast identifies a role for a new functional domain in termination factor Nab3.

PubMed

Loya, Travis J; O'Rourke, Thomas W; Reines, Daniel

2012-08-01

The yeast IMD2 gene encodes an enzyme involved in GTP synthesis. Its expression is controlled by guanine nucleotides through a set of alternate start sites and an intervening transcriptional terminator. In the off state, transcription results in a short non-coding RNA that starts upstream of the gene. Transcription terminates via the Nrd1-Nab3-Sen1 complex and is degraded by the nuclear exosome. Using a sensitive terminator read-through assay, we identified trans-acting Terminator Override (TOV) genes that operate this terminator. Four genes were identified: the RNA polymerase II phosphatase SSU72, the RNA polymerase II binding protein PCF11, the TRAMP subunit TRF4 and the hnRNP-like, NAB3. The TOV phenotype can be explained by the loss of function of these gene products as described in models in which termination and RNA degradation are coupled to the phosphorylation state of RNA polymerase II's repeat domain. The most interesting mutations were those found in NAB3, which led to the finding that the removal of merely three carboxy-terminal amino acids compromised Nab3's function. This region of previously unknown function is distant from the protein's well-known RNA binding and Nrd1 binding domains. Structural homology modeling suggests this Nab3 'tail' forms an α-helical multimerization domain that helps assemble it onto an RNA substrate.
Opposite GC skews at the 5' and 3' ends of genes in unicellular fungi

PubMed Central

2011-01-01

Background: GC-skews have previously been linked to transcription in some eukaryotes. They have been associated with transcription start sites, with the coding strand G-biased in mammals and C-biased in fungi and invertebrates. Results: We show a consistent and highly significant pattern of GC-skew within genes of almost all unicellular fungi. The pattern of GC-skew is asymmetrical: the coding strand of genes is typically C-biased at the 5' ends but G-biased at the 3' ends, with intermediate skews at the middle of genes. Thus, the initiation, elongation, and termination phases of transcription are associated with different skews. This pattern influences the encoded proteins by generating differential usage of amino acids at the 5' and 3' ends of genes. These biases also affect fourfold-degenerate positions and extend into promoters and 3' UTRs, indicating that skews cannot be accounted for by selection for protein function or translation. Conclusions: We propose two explanations: the mutational pressure hypothesis and the adaptive hypothesis. The mutational pressure hypothesis is that different co-factors bind to RNA pol II at different phases of transcription, producing different mutational regimes. The adaptive hypothesis is that cytidine triphosphate deficiency may lead to C-avoidance at the 3' ends of transcripts to control the flow of RNA pol II molecules and reduce their frequency of collisions. PMID:22208287
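GC skew is conventionally defined as (G - C)/(G + C), so a C-biased region has a negative skew and a G-biased region a positive one. The sketch below shows one simple way to compare the 5' end, middle and 3' end of a coding strand. Splitting the gene into equal thirds is an assumption made here for illustration and is not necessarily the windowing used in the study; the function names and the toy sequence are hypothetical.

```python
from typing import Tuple


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C); 0.0 when the segment has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def skew_by_thirds(coding_strand: str) -> Tuple[float, float, float]:
    """GC skew of the 5' third, middle third and 3' third of a gene."""
    n = len(coding_strand)
    cut1, cut2 = n // 3, 2 * n // 3
    return (
        gc_skew(coding_strand[:cut1]),
        gc_skew(coding_strand[cut1:cut2]),
        gc_skew(coding_strand[cut2:]),
    )


# Toy coding strand (made up): C-rich start, balanced middle, G-rich end.
print(skew_by_thirds("CCCACCTAGCATGCATGGGAGGTA"))  # (-1.0, 0.0, 1.0)
```

A real analysis would average such per-segment skews over many genes aligned at their start and stop positions before drawing any conclusion about a genome-wide pattern.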
Cloning, characterization and sequence comparison of the gene coding for IMP dehydrogenase from Pyrococcus furiosus.

PubMed

Collart, F R; Osipiuk, J; Trent, J; Olsen, G J; Huberman, E

1996-10-03

We have cloned and characterized the gene encoding inosine monophosphate dehydrogenase (IMPDH) from Pyrococcus furiosus (Pf), a hyperthermophilic archaeon. Sequence analysis of the Pf gene indicated an open reading frame specifying a protein of 485 amino acids (aa) with a calculated M(r) of 52900. Canonical archaeal promoter elements, Box A and Box B, are located -49 and -17 nucleotides (nt), respectively, upstream of the putative start codon. The sequence of the putative active-site region conforms to the IMPDH signature motif and contains a putative active-site cysteine. Phylogenetic relationships derived by using all available IMPDH sequences are consistent with trees developed for other molecules; they do not precisely resolve the history of Pf IMPDH but indicate a close similarity to bacterial IMPDH proteins. The phylogenetic analysis indicates that a gene duplication occurred prior to the division between rodents and humans, accounting for the Type I and II isoforms identified in mice and humans.
An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

PubMed

Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

2017-12-01

Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.
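The abstract mentions in silico ORFs derived from a six-frame scan that considers alternative start codons. The following is a minimal sketch of that general idea, not the iPtgxDB software itself: the start-codon set (ATG, GTG, TTG), the function names and the minimum-length parameter are assumptions chosen for illustration.

```python
STARTS = {"ATG", "GTG", "TTG"}  # assumed set of alternative start codons
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int = 2):
    """Yield (frame, start, end) for each start-to-stop ORF on one strand.

    Coordinates are 0-based and end-exclusive, relative to `seq`. Every
    in-frame start codon is reported, so nested ORFs that share a stop
    codon appear as separate entries.
    """
    for frame in range(3):
        open_starts = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STARTS:
                open_starts.append(i)
            elif codon in STOPS:
                for s in open_starts:
                    if (i - s) // 3 >= min_codons:
                        yield frame, s, i + 3
                open_starts = []


def six_frame_orfs(seq: str, min_codons: int = 2):
    """Collect ORFs from the three forward and three reverse frames."""
    seq = seq.upper()
    found = [("+", *orf) for orf in orfs_in_strand(seq, min_codons)]
    found += [("-", *orf) for orf in orfs_in_strand(reverse_complement(seq), min_codons)]
    return found


# Toy sequence (made up): an ATG start and a nested GTG start share one stop.
print(six_frame_orfs("ATGAAAGTGCCCTAA"))
```

Reporting every candidate start rather than only the longest ORF is a deliberate choice here: it keeps alternative start sites available for later peptide evidence to discriminate between them.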
Alternative splicing for members of human mosaic domain superfamilies. I. The CH and LIM domains containing group of proteins.

PubMed

Friedberg, Felix

2009-05-01

In this paper we examine (restricted to Homo sapiens) the products resulting from gene duplication and the subsequent alternative splicing for the members of a multidomain group of proteins which possess the evolutionarily conserved calponin homology (CH) domain, i.e. an "actin binding domain", as a singlet and which, in addition, contain the conserved cysteine-rich double Zn finger-possessing LIM domain, also as a singlet. Seven genes, resulting from gene duplications, were identified that code for seven group members for which pre-mRNAs appear to have undergone multiple alternative splicing: Mical 1, 2 and 3 are located on chromosomes 6q21, 11p15 and 22q11, respectively. The LMO7 gene is present on chromosome 13q22 and the LIMCH1 gene on chromosome 4p13. Micall1 is mapped to chromosome 22q13 and Micall2 to chromosome 7p22. Translated GenBank ESTs suggest the existence of multiple products alternatively spliced from the pre-mRNAs encoded by these genes. Characteristic indicators of such splicing among the proteins derived from one gene must include containment of some common extensive 100% identical regions. In some instances only one exon might be partly or completely eliminated. Sometimes alternative splicing is also associated with an increased frequency of creation of an exon or part of an exon from an intron. Not only coding regions for the body of the protein but also those for its N- or C-terminal ends could be affected by the splicing. If created forms merely begin at different starting points but remain identical in sequence thereafter, their existence as products of alternative splicing must be questioned. In the splicings described in this paper, multiple isoforms rather than a single isoform appear as products during gene expression.
Perspectives on the mechanism of transcriptional regulation by long non-coding RNAs.

PubMed

Roberts, Thomas C; Morris, Kevin V; Weinberg, Marc S

2014-01-01

Long non-coding RNAs (lncRNAs) are increasingly being recognized as epigenetic regulators of gene transcription. The diversity and complexity of lncRNA genes means that they exert their regulatory effects by a variety of mechanisms. Although there is still much to be learned about the mechanism of lncRNA function, general principles are starting to emerge. In particular, the application of high throughput (deep) sequencing methodologies has greatly advanced our understanding of lncRNA gene function. lncRNAs function as adaptors that link specific chromatin loci with chromatin-remodeling complexes and transcription factors. lncRNAs can act in cis or trans to guide epigenetic-modifier complexes to distinct genomic sites, or act as scaffolds which recruit multiple proteins simultaneously, thereby coordinating their activities. In this review we discuss the genomic organization of lncRNAs, the importance of RNA secondary structure to lncRNA functionality, the multitude of ways in which they interact with the genome, and what evolutionary conservation tells us about their function.
Nucleotide sequence analysis of the L gene of Newcastle disease virus: homologies with Sendai and vesicular stomatitis viruses.

PubMed Central

Yusoff, K; Millar, N S; Chambers, P; Emmerson, P T

1987-01-01

The nucleotide sequence of the L gene of the Beaudette C strain of Newcastle disease virus (NDV) has been determined. The L gene is 6704 nucleotides long and encodes a protein of 2204 amino acids with a calculated molecular weight of 248822. Mung bean nuclease mapping of the 5' terminus of the L gene mRNA indicates that the transcription of the L gene is initiated 11 nucleotides upstream of the translational start site. Comparison with the amino acid sequences of the L genes of Sendai virus and vesicular stomatitis virus (VSV) suggests that there are several regions of homology between the sequences. These data provide further evidence for an evolutionary relationship between the Paramyxoviridae and the Rhabdoviridae. A non-coding sequence of 46 nucleotides downstream of the presumed polyadenylation site of the L gene may be part of a negative strand leader RNA. PMID:3035486
The primary transcriptome of the marine diazotroph Trichodesmium erythraeum IMS101

NASA Astrophysics Data System (ADS)

Pfreundt, Ulrike; Kopf, Matthias; Belkin, Natalia; Berman-Frank, Ilana; Hess, Wolfgang R.

2014-08-01

Blooms of the dinitrogen-fixing marine cyanobacterium Trichodesmium considerably contribute to new nitrogen inputs into tropical oceans. Intriguingly, only 60% of the Trichodesmium erythraeum IMS101 genome sequence codes for protein, compared with ~85% in other sequenced cyanobacterial genomes. The extensive non-coding genome fraction suggests space for an unusually high number of unidentified, potentially regulatory non-protein-coding RNAs (ncRNAs). To identify the transcribed fraction of the genome, here we present a genome-wide map of transcriptional start sites (TSS) at single nucleotide resolution, revealing the activity of 6,080 promoters. We demonstrate that T. erythraeum has the highest number of actively splicing group II introns and the highest percentage of TSS yielding ncRNAs of any bacterium examined to date. We identified a highly transcribed retroelement that serves as template repeat for the targeted mutation of at least 12 different genes by mutagenic homing. Our findings explain the non-coding portion of the T. erythraeum genome by the transcription of an unusually high number of non-coding transcripts in addition to the known high incidence of transposable elements. We conclude that riboregulation and RNA maturation-dependent processes constitute a major part of the Trichodesmium regulatory apparatus.
The Glucuronic Acid Utilization Gene Cluster from Bacillus stearothermophilus T-6

PubMed Central

Shulami, Smadar; Gat, Orit; Sonenshein, Abraham L.; Shoham, Yuval

1999-01-01

A λ-EMBL3 genomic library of Bacillus stearothermophilus T-6 was screened for hemicellulolytic activities, and five independent clones exhibiting β-xylosidase activity were isolated. The clones overlap each other and together represent a 23.5-kb chromosomal segment. The segment contains a cluster of xylan utilization genes, which are organized in at least three transcriptional units. These include the gene for the extracellular xylanase, xylanase T-6; part of an operon coding for an intracellular xylanase and a β-xylosidase; and a putative 15.5-kb-long transcriptional unit, consisting of 12 genes involved in the utilization of α-d-glucuronic acid (GlcUA). The first four genes in the potential GlcUA operon (orf1, -2, -3, and -4) code for a putative sugar transport system with characteristic components of the binding-protein-dependent transport systems. The most likely natural substrate for this transport system is aldotetraouronic acid [2-O-α-(4-O-methyl-α-d-glucuronosyl)-xylotriose] (MeGlcUAXyl3). The following two genes code for an intracellular α-glucuronidase (aguA) and a β-xylosidase (xynB). Five more genes (kdgK, kdgA, uxaC, uxuA, and uxuB) encode proteins that are homologous to enzymes involved in galacturonate and glucuronate catabolism. The gene cluster also includes a potential regulatory gene, uxuR, the product of which resembles repressors of the GntR family. The apparent transcriptional start point of the cluster was determined by primer extension analysis and is located 349 bp from the initial ATG codon. The potential operator site is a perfect 12-bp inverted repeat located downstream from the promoter between nucleotides +170 and +181. Gel retardation assays indicated that UxuR binds specifically to this sequence and that this binding is efficiently prevented in vitro by MeGlcUAXyl3, the most likely molecular inducer. PMID:10368143
Maize GO annotation—methods, evaluation, and review (maize-GAMER)

USDA-ARS's Scientific Manuscript database

We created a new high-coverage, robust, and reproducible functional annotation of maize protein-coding genes based on Gene Ontology (GO) term assignments. Whereas the existing Phytozome and Gramene maize GO annotation sets only cover 41% and 56% of maize protein-coding genes, respectively, this stu...
How the Sequence of a Gene Specifies Structural Symmetry in Proteins

PubMed Central

Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

2015-01-01

Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668
Transcriptome profile of a bovine respiratory disease pathogen: Mannheimia haemolytica PHL213

PubMed Central

2012-01-01

Background: Computational methods for structural gene annotation have propelled gene discovery but face certain drawbacks with regard to prokaryotic genome annotation. Identification of transcriptional start sites, demarcation of overlapping gene boundaries, and identification of regulatory elements such as small RNA are not accurate using these approaches. In this study, we re-visit the structural annotation of Mannheimia haemolytica PHL213, a bovine respiratory disease pathogen. M. haemolytica is one of the causative agents of bovine respiratory disease that results in about $3 billion annual losses to the cattle industry. We used RNA-Seq and analyzed the data using freely-available computational methods and resources. The aim was to identify previously unannotated regions of the genome using RNA-Seq based expression profile to complement the existing annotation of this pathogen. Results: Using the Illumina Genome Analyzer, we generated 9,055,826 reads (average length ~76 bp) and aligned them to the reference genome using Bowtie. The transcribed regions were analyzed using SAMTOOLS and custom Perl scripts in conjunction with BLAST searches and available gene annotation information. The single nucleotide resolution map enabled the identification of 14 novel protein coding regions as well as 44 potential novel sRNA. The basal transcription profile revealed that 2,506 of the 2,837 annotated regions were expressed in vitro, at 95.25% coverage, representing all broad functional gene categories in the genome. The expression profile also helped identify 518 potential operon structures involving 1,086 co-expressed pairs. We also identified 11 proteins with mutated/alternate start codons. Conclusions: The application of RNA-Seq based transcriptome profiling to structural gene annotation helped correct existing annotation errors and identify potential novel protein coding regions and sRNA. We used computational tools to predict regulatory elements such as promoters and terminators associated with the novel expressed regions for further characterization of these novel functional elements. Our study complements the existing structural annotation of Mannheimia haemolytica PHL213 based on experimental evidence. Given the role of sRNA in virulence gene regulation and stress response, potential novel sRNA described in this study can form the framework for future studies to determine the role of sRNA, if any, in M. haemolytica pathogenesis. PMID:23046475
Complete mitochondrial genome sequence from an endangered Indian snake, Python molurus molurus (Serpentes, Pythonidae).

PubMed

Dubey, Bhawna; Meganathan, P R; Haque, Ikramul

2012-07-01

This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome, 17,258 bp in length and comprising 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 ribosomal RNA genes) along with duplicate control regions, is described herein. The P. molurus molurus mt genome is relatively similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A and C than T and G on the positive strand, as revealed by positive AT and CG skews. Comparison of individual protein-coding genes with other snake genomes suggests that the ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference for NNC codons over NNG codons in the mt genome of P. molurus. Also, the ratios of non-synonymous to synonymous substitution rates (Ka/Ks) suggest that most of the protein-coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein-coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

USGS Publications Warehouse

Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

2004-01-01

The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3' N-U-P-M-F-HN-L 5', coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.
A genetic screen for terminator function in yeast identifies a role for a new functional domain in termination factor Nab3

PubMed Central

Loya, Travis J.; O’Rourke, Thomas W.; Reines, Daniel

2012-01-01

The yeast IMD2 gene encodes an enzyme involved in GTP synthesis. Its expression is controlled by guanine nucleotides through a set of alternate start sites and an intervening transcriptional terminator. In the off state, transcription results in a short non-coding RNA that starts upstream of the gene. Transcription terminates via the Nrd1-Nab3-Sen1 complex and is degraded by the nuclear exosome. Using a sensitive terminator read-through assay, we identified trans-acting Terminator Override (TOV) genes that operate this terminator. Four genes were identified: the RNA polymerase II phosphatase SSU72, the RNA polymerase II binding protein PCF11, the TRAMP subunit TRF4 and the hnRNP-like, NAB3. The TOV phenotype can be explained by the loss of function of these gene products as described in models in which termination and RNA degradation are coupled to the phosphorylation state of RNA polymerase II's repeat domain. The most interesting mutations were those found in NAB3, which led to the finding that the removal of merely three carboxy-terminal amino acids compromised Nab3's function. This region of previously unknown function is distant from the protein's well-known RNA binding and Nrd1 binding domains. Structural homology modeling suggests this Nab3 ‘tail’ forms an α-helical multimerization domain that helps assemble it onto an RNA substrate. PMID:22564898
The mitochondrial genome of the quiet-calling katydids, Xizicus fascipes (Orthoptera: Tettigoniidae: Meconematinae).

PubMed

Yang, Ming Ru; Zhou, Zhi Jun; Chang, Yan Lin; Zhao, Le Hong

2012-08-01

To help determine whether the typical arthropod arrangement was a synapomorphy for the whole Tettigoniidae, we sequenced the mitochondrial genome (mitogenome) of the quiet-calling katydid, Xizicus fascipes (Orthoptera: Tettigoniidae: Meconematinae). The 16,166-bp nucleotide sequence of the X. fascipes mitogenome contains the typical gene content, gene order, base composition, and codon usage found in arthropod mitogenomes. As a whole, the X. fascipes mitogenome has a lower A+T content (70.2%) than the complete orthopteran mitogenomes determined to date. All protein-coding genes started with a typical ATN codon. Ten of the 13 protein-coding genes have a complete termination codon, but the remaining three genes (COIII, ND5 and ND4) terminate with an incomplete T. All tRNAs have the typical clover-leaf structure of mitogenome tRNA, except for tRNA(Ser(AGN)), which has a lengthened anticodon stem (9 bp) with a bulged nucleotide in the middle, an unusual T-stem (6 bp in contrast to the normal 5 bp), a mini DHU arm (2 bp) and no connector nucleotides. In the A+T-rich region, two (TA)n conserved blocks that were previously described in Ensifera and two 150-bp tandem repeats plus a partial copy composed of the 61 bp at its beginning were present. Phylogenetic analysis found: i) the monophyly of Conocephalinae was interrupted by Elimaea cheni from Phaneropterinae; and ii) Meconematinae was the most basal group among these five subfamilies.
Polymerization of non-complementary RNA: systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes.

PubMed

Seligmann, Hervé

2013-03-01

Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
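To make the exchange rules listed in this abstract concrete, the sketch below applies a symmetric nucleotide exchange such as C↔U, or a double exchange such as A↔U+C↔G, to an RNA string. The function name and the ASCII rule syntax ("C<->U", with "+" joining two pairs) are assumptions introduced for this example; the abstract does not describe any software.

```python
def exchange(rna: str, rule: str) -> str:
    """Apply a symmetric exchange rule, e.g. 'C<->U' or 'A<->U+C<->G', to an RNA string."""
    table = {}
    for pair in rule.split("+"):
        a, b = pair.split("<->")
        table[a], table[b] = b, a  # symmetric: a becomes b and b becomes a
    # Bases not named in the rule are left unchanged.
    return "".join(table.get(base, base) for base in rna.upper())


print(exchange("AUGCCU", "C<->U"))
print(exchange("AUGC", "A<->U+C<->G"))
```

Because each rule is symmetric, applying the same rule twice restores the original sequence, which gives a quick sanity check when comparing exchanged transcripts against a reference.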
Comparative analysis of human protein-coding and noncoding RNAs between brain and 10 mixed cell lines by RNA-Seq.

PubMed

Chen, Geng; Yin, Kangping; Shi, Leming; Fang, Yuanzhang; Qi, Ya; Li, Peng; Luo, Jian; He, Bing; Liu, Mingyao; Shi, Tieliu

2011-01-01

In their expression process, different genes can generate diverse functional products, including various protein-coding or noncoding RNAs. Here, we investigated the protein-coding capacities and isoform expression levels of known human genes, as well as the conservation and disease association of long noncoding RNAs (ncRNAs), using two transcriptome sequencing datasets from human brain tissues and 10 mixed cell lines. Comparative analysis revealed that about two-thirds of the genes expressed in brain and cell lines are the same, but less than one-third of their isoforms are identical. Besides those genes specifically expressed in brain or cell lines, about 66% of genes expressed in common encoded different isoforms. Moreover, most genes dominantly expressed one isoform, and some genes generated protein-coding (or noncoding) RNAs in one sample but not in the other. We found that 282 human genes could encode both protein-coding and noncoding RNAs through alternative splicing in the two samples. We also identified more than 1,000 long ncRNAs, most of which contain elements conserved across 46 vertebrates, 33 placental mammals or 10 primates. Further analysis showed that some long ncRNAs were differentially expressed in human breast cancer or lung cancer, and several of those differentially expressed long ncRNAs were validated by RT-PCR. In addition, the validated differentially expressed long ncRNAs were found to be significantly correlated with certain breast cancer or lung cancer related genes, indicating an important biological relationship between long ncRNAs and human cancers. Our findings reveal that differences in gene expression profiles between samples mainly result from the expressed gene isoforms, and highlight the importance of studying genes at the isoform level to fully characterize the intricate transcriptome.
Transcription Factor Binding Profiles Reveal Cyclic Expression of Human Protein-coding Genes and Non-coding RNAs

PubMed Central

Cheng, Chao; Ung, Matthew; Grant, Gavin D.; Whitfield, Michael L.

2013-01-01

The cell cycle is a complex and highly supervised process that must proceed with regulatory precision to achieve successful cellular division. Despite their wide application, microarray time course experiments have several limitations in identifying cell cycle genes. We thus propose a computational model to predict human cell cycle genes based on transcription factor (TF) binding and regulatory motif information in their promoters. We utilize ENCODE ChIP-seq data and motif information as predictors to discriminate cell cycle from non-cell cycle genes. Our results show that both the trans TF features and the cis motif features are predictive of cell cycle genes, and a combination of the two types of features can further improve prediction accuracy. We apply our model to a complete list of GENCODE promoters to predict novel cell cycle driving promoters for both protein-coding genes and non-coding RNAs such as lincRNAs. We find that a similar percentage of lincRNAs are cell cycle regulated as protein-coding genes, suggesting the importance of non-coding RNAs in cell cycle division. The model we propose here provides not only a practical tool for identifying novel cell cycle genes with high accuracy, but also new insights on cell cycle regulation by TFs and cis-regulatory elements. PMID:23874175

An operon from Lactobacillus helveticus composed of a proline iminopeptidase gene (pepI) and two genes coding for putative members of the ABC transporter family of proteins.

PubMed

Varmanen, P; Rantanen, T; Palva, A

1996-12-01

A proline iminopeptidase gene (pepI) of an industrial Lactobacillus helveticus strain was cloned and found to be organized in an operon-like structure of three open reading frames (ORF1, ORF2 and ORF3). ORF1 was preceded by a typical prokaryotic promoter region, and a putative transcription terminator was found downstream of ORF3, identified as the pepI gene. Using primer-extension analyses, only one transcription start site, upstream of ORF1, was identifiable in the predicted operon. Although the size of mRNA could not be judged by Northern analysis either with ORF1-, ORF2- or pepI-specific probes, reverse transcription-PCR analyses further supported the operon structure of the three genes. ORF1, ORF2 and ORF3 had coding capacities for 50.7, 24.5 and 33.8 kDa proteins, respectively. The ORF3-encoded PepI protein showed 65% identity with the PepI proteins from Lactobacillus delbrueckii subsp. bulgaricus and Lactobacillus delbrueckii subsp. lactis. The ORF1-encoded protein had significant homology with several members of the ABC transporter family but, with two distinct putative ATP-binding sites, it would represent an unusual type among the bacterial ABC transporters. ORF2 encoded a putative integral membrane protein also characteristic of the ABC transporter family. The pepI gene was overexpressed in Escherichia coli. Purified PepI hydrolysed only di- and tripeptides with proline in the first position. Optimum PepI activity was observed at pH 7.5 and 40 °C. A gel filtration analysis indicated that PepI is a dimer of M(r) 53,000. PepI was shown to be a metal-independent serine peptidase having thiol groups at or near the active site. Kinetic studies with proline-p-nitroanilide as substrate revealed Km and Vmax values of 0.8 mM and 350 mmol min⁻¹ mg⁻¹, respectively, and a very high turnover number of 135,000 s⁻¹.
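The Km and Vmax values reported above can be read through the standard Michaelis-Menten rate law, v = Vmax·[S] / (Km + [S]). The short sketch below evaluates that law with the reported constants; it assumes simple Michaelis-Menten behaviour and takes the units exactly as printed in the abstract, and the function name is chosen for this example only.

```python
def michaelis_menten(substrate_mM: float, km_mM: float = 0.8, vmax: float = 350.0) -> float:
    """Reaction rate v = Vmax * [S] / (Km + [S]); the result is in the units of `vmax`."""
    return vmax * substrate_mM / (km_mM + substrate_mM)


# At [S] = Km the rate is half of Vmax by definition.
half_max = michaelis_menten(0.8)
# At [S] = 9 * Km the rate reaches 90% of Vmax.
near_saturation = michaelis_menten(7.2)
print(half_max, near_saturation)
```

The default arguments are the reported Km (0.8 mM) and Vmax (350, in the abstract's units), so other substrate concentrations can be explored by changing only the first argument.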
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

PubMed

Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

2014-12-01

Proteogenomics involves the use of MS to refine the annotation of protein-coding genes and discover genes in a genome. We carried out a comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with the aim of improving annotation for methylotrophs, organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in the ME-AM1 genome. One such novel gene is identified with 75 peptides and lacks a homolog in other methylobacteria but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy related genes were also improved by the discovery of a short gene in the methylotrophy gene island and by redefining a gene important for pyrroquinoline quinone synthesis, which is essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the model methylotroph ME-AM1 genome. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Consistency of gene starts among Burkholderia genomes

PubMed Central

2011-01-01

Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528
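The closing point of this abstract, that single-digit error rates in gene calls can produce double-digit rates of inconsistent ortholog sets, can be illustrated with simple compounding. The sketch below is illustrative only. It assumes every gene start is mis-called independently and with the same probability, which the abstract does not state, and the function name and the 5% example rate are hypothetical.

```python
def inconsistent_set_fraction(per_gene_error: float, genomes_per_set: int) -> float:
    """Probability that at least one gene start in an ortholog set is mis-called.

    Assumes independent, identically likely errors per gene (an assumption
    made for illustration, not a claim from the study).
    """
    if not 0.0 <= per_gene_error <= 1.0:
        raise ValueError("per_gene_error must be between 0 and 1")
    if genomes_per_set < 1:
        raise ValueError("genomes_per_set must be at least 1")
    return 1.0 - (1.0 - per_gene_error) ** genomes_per_set


if __name__ == "__main__":
    # Hypothetical 5% per-gene error rate across ortholog sets of growing size.
    for n in (2, 5, 10):
        print(n, f"{inconsistent_set_fraction(0.05, n):.1%}")
```

Under these assumptions, a 5% per-gene error rate already leaves roughly four in ten ortholog sets of ten genomes with at least one wrong start, which is the qualitative effect the authors describe.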
Complete mitochondrial genome of the Freshwater Whipray Himantura dalyensis.

PubMed

Feutry, Pierre; Kyne, Peter M; Peng, Zaiqing; Pan, Lianghao; Chen, Xiao

2016-05-01

The complete mitochondrial genome of the Freshwater Whipray Himantura dalyensis is presented in this study. It is 17,693 bp in length and contains 37 genes in typical gene order and transcriptional orientation observed in vertebrates. There were a total of 86 bp short intergenic spacers and 22 bp overlaps in the genome. The overall base composition was 31.4% A, 25.5% C, 13.2% G and 29.9% T. Two start codons (GTG and ATG) and two stop codons (TAG and TAA/T) were found in 13 protein-coding genes. The length of 22 tRNA genes ranged from 68 (tRNA-Cys and tRNA-Ser2) to 75 bp (tRNA-Leu1). The origin of L-strand replication (OL) was found between the tRNA-Asn and tRNA-Cys genes. The base composition of the control region (1940 bp) was similar to the whole mitogenome.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.

PubMed

Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M

2010-12-15

Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Characterization of mitochondrial genome of sea cucumber Stichopus horrens: a novel gene arrangement in Holothuroidea.

PubMed

Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing

2011-05-01

The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
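The AT and GC skew values quoted in this abstract can be reproduced from the reported heavy-strand base composition. The sketch below assumes the commonly used definitions, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not spell these out, so treat the formulas as an assumption.

```python
def skew(x: float, y: float) -> float:
    """Strand skew between two base frequencies: (x - y) / (x + y)."""
    return (x - y) / (x + y)


# Heavy-strand base composition (percent) as reported in the abstract.
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])
gc_skew = skew(composition["G"], composition["C"])

print(f"AT skew = {at_skew:.3f}")
print(f"GC skew = {gc_skew:.3f}")
```

With these definitions the rounded results agree with the reported AT skew of 0.025 and GC skew of -0.188.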
De Novo ORFs in Drosophila Are Important to Organismal Fitness and Evolved Rapidly from Previously Non-coding Sequences

PubMed Central

Reinhardt, Josephine A.; Wanjiru, Betty M.; Brant, Alicia T.; Saelao, Perot; Begun, David J.; Jones, Corbin D.

2013-01-01

How non-coding DNA gives rise to new protein-coding genes (de novo genes) is not well understood. Recent work has revealed the origins and functions of a few de novo genes, but common principles governing the evolution or biological roles of these genes are unknown. To better define these principles, we performed a parallel analysis of the evolution and function of six putatively protein-coding de novo genes described in Drosophila melanogaster. Reconstruction of the transcriptional history of de novo genes shows that two de novo genes emerged from novel long non-coding RNAs that arose at least 5 MY prior to evolution of an open reading frame. In contrast, four other de novo genes evolved a translated open reading frame and transcription within the same evolutionary interval suggesting that nascent open reading frames (proto-ORFs), while not required, can contribute to the emergence of a new de novo gene. However, none of the genes arose from proto-ORFs that existed long before expression evolved. Sequence and structural evolution of de novo genes was rapid compared to nearby genes and the structural complexity of de novo genes steadily increases over evolutionary time. Despite the fact that these genes are transcribed at a higher level in males than females, and are most strongly expressed in testes, RNAi experiments show that most of these genes are essential in both sexes during metamorphosis. This lethality suggests that protein coding de novo genes in Drosophila quickly become functionally important. PMID:24146629
Methylation of miRNA genes and oncogenesis.

PubMed

Loginov, V I; Rykov, S V; Fridman, M V; Braga, E A

2015-02-01

Interaction between microRNA (miRNA) and messenger RNA of target genes at the posttranscriptional level provides fine-tuned dynamic regulation of cell signaling pathways. Each miRNA can be involved in regulating hundreds of protein-coding genes, and, conversely, a number of different miRNAs usually target a structural gene. Epigenetic gene inactivation associated with methylation of promoter CpG-islands is common to both protein-coding genes and miRNA genes. Here, data on functions of miRNAs in development of tumor-cell phenotype are reviewed. Genomic organization of promoter CpG-islands of the miRNA genes located in inter- and intragenic areas is discussed. The literature and our own results on frequency of CpG-island methylation in miRNA genes from tumors are summarized, and data regarding a link between such modification and changed activity of miRNA genes and, consequently, protein-coding target genes are presented. Moreover, the impact of miRNA gene methylation on key oncogenetic processes as well as affected signaling pathways is discussed.
Unraveling patterns of site-to-site synonymous rates variation and associated gene properties of protein domains and families.

PubMed

Dimitrieva, Slavica; Anisimova, Maria

2014-01-01

In protein-coding genes, synonymous mutations are often thought not to affect fitness and therefore not to be subject to natural selection. Yet increasingly, cases of non-neutral evolution at certain synonymous sites have been reported over the last decade. To evaluate the extent and the nature of site-specific selection on synonymous codons, we computed the site-to-site synonymous rate variation (SRV) and identified gene properties that make SRV more likely in a large database of protein-coding gene families and protein domains. To our knowledge, this is the first study that explores the determinants and patterns of the SRV in real data. We show that the SRV is widespread in the evolution of protein-coding sequences, putting in doubt the validity of the synonymous rate as a standard neutral proxy. While protein domains rarely undergo adaptive evolution, the SRV appears to play an important role in optimizing the domain function at the level of DNA. In contrast, protein families are more likely to evolve by positive selection, but are less likely to exhibit SRV. Stronger SRV was detected in genes with stronger codon bias and tRNA reusage, those coding for proteins with a larger number of interactions or forming a larger number of structures, those located in intracellular components, and those involved in typically conserved complex processes and functions. Genes with extreme SRV show higher expression levels in nearly all tissues. This indicates that codon bias in a gene, which often correlates with gene expression, may often be a site-specific phenomenon regulating the speed of translation along the sequence, consistent with the co-translational folding hypothesis. Strikingly, genes with SRV were strongly overrepresented for metabolic pathways and those associated with several genetic diseases, particularly cancers and diabetes.
Multiple copies of genes coding for electron transport proteins in the bacterium Nitrosomonas europaea.

PubMed

McTavish, H; LaQuier, F; Arciero, D; Logan, M; Mundfrom, G; Fuchs, J A; Hooper, A B

1993-04-01

The genome of Nitrosomonas europaea contains at least three copies each of the genes coding for hydroxylamine oxidoreductase (HAO) and cytochrome c554. A copy of an HAO gene is always located within 2.7 kb of a copy of a cytochrome c554 gene. Cytochrome P-460, a protein that shares very unusual spectral features with HAO, was found to be encoded by a gene separate from the HAO genes.
Morphometric Analysis of Recognized Genes for Autism Spectrum Disorders and Obesity in Relationship to the Distribution of Protein-Coding Genes on Human Chromosomes.

PubMed

McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G

2016-05-05

Mammalian chromosomes are comprised of complex chromatin architecture with the specific assembly and configuration of each chromosome influencing gene expression and function in yet undefined ways by varying degrees of heterochromatinization that result in Giemsa (G) negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out morphometric measurements of high-resolution chromosome ideograms for the first time to characterize the total euchromatic and heterochromatic chromosome band length, distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatin and G-positive heterochromatin chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, while the obesity gene distribution pattern did not significantly differ from protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions playing a role in biological processes including neurodevelopment and function, specifically genes associated with ASD.
Genomic Structure of an Economically Important Cyanobacterium, Arthrospira (Spirulina) platensis NIES-39

PubMed Central

Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

2010-01-01

A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Nearly the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa.

PubMed

Medvedeva, Irina V; Demenkov, Pavel S; Ivanisenko, Vladimir A

2015-01-01

Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for an understanding of the evolution of molecular systems and can provide new knowledge for many applications in designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are distantly located from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains an open problem. One approach to this analysis is the projection of amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the functional-site-encoding exons in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. We observed a tendency for exons that code functional sites to cluster, and such clusters could be considered a unit of protein evolution. We studied the characteristics of the structure of the exon boundaries in exons that code, and do not code, functional sites in 11 Metazoa species. Exons encoding functional-site amino acid residues show a reduced frequency of intercodon gaps (phase 0), which may be evidence of the existence of evolutionary limitations to exon shuffling. These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity.
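The exon phase statistic mentioned in this abstract can be made concrete with a short sketch. It uses the standard intron phase convention (phase 0 when an intron falls between two codons, phase 1 or 2 when it splits a codon), which matches the abstract's description of phase 0 as an intercodon gap. The function name and the toy exon lengths are hypothetical and are not taken from the study.

```python
from itertools import accumulate


def intron_phases(coding_exon_lengths):
    """Return the phase of each intron in a gene.

    The phase is the number of coding bases preceding the intron, modulo 3:
    0 means the intron lies between two codons, 1 or 2 means it splits a codon.
    """
    cumulative = list(accumulate(coding_exon_lengths))
    # The last cumulative value is the end of the CDS, not an intron position.
    return [length % 3 for length in cumulative[:-1]]


# Hypothetical coding lengths (bp) of four consecutive exons.
toy_exons = [90, 100, 77, 33]
print(intron_phases(toy_exons))  # [0, 1, 0]
```

Counting how often each phase occurs at the boundaries of exons that do or do not encode functional-site residues would give the kind of frequency comparison the abstract reports.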
RNAi mediates post-transcriptional repression of gene expression in fission yeast Schizosaccharomyces pombe

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smialowska, Agata, E-mail: smialowskaa@gmail.com; School of Life Sciences, Södertörn Högskola, Huddinge 141-89; Djupedal, Ingela

Highlights: • Protein coding genes accumulate anti-sense sRNAs in fission yeast S. pombe. • RNAi represses protein-coding genes in S. pombe. • RNAi-mediated gene repression is post-transcriptional. - Abstract: RNA interference (RNAi) is a gene silencing mechanism conserved from fungi to mammals. Small interfering RNAs are products and mediators of the RNAi pathway and act as specificity factors in recruiting effector complexes. The Schizosaccharomyces pombe genome encodes one of each of the core RNAi proteins, Dicer, Argonaute and RNA-dependent RNA polymerase (dcr1, ago1, rdp1). Even though the function of RNAi in heterochromatin assembly in S. pombe is established, its role in controlling gene expression is elusive. Here, we report the identification of small RNAs mapped anti-sense to protein coding genes in fission yeast. We demonstrate that these genes are up-regulated at the protein level in RNAi mutants, while their mRNA levels are not significantly changed. We show that the repression by RNAi is not a result of heterochromatin formation. Thus, we conclude that RNAi is involved in post-transcriptional gene silencing in S. pombe.
[Convergent origin of repeats in genes coding for globular proteins. An analysis of the factors determining the presence of inverted and symmetrical repeats].

PubMed

Solov'ev, V V; Kel', A E; Kolchanov, N A

1989-01-01

The factors determining the presence of inverted and symmetrical repeats in genes coding for globular proteins have been analysed. The analysis of symmetrical repeats revealed an interesting property of the genetic code: pairs of symmetrical codons correspond to pairs of amino acids with mostly similar physicochemical parameters. This property may explain why symmetrical repeats and palindromes are present only in genes coding for beta-structural proteins-polypeptides, where amino acids with similar physicochemical properties occupy symmetrical positions. A stochastic model of the evolution of polynucleotide sequences has been used to analyse inverted repeats. The modelling demonstrated that a constraint on sequences (uneven codon usage frequencies) is by itself enough for nonrandom inverted repeats to arise in genes.
The Human Cell Surfaceome of Breast Tumors

PubMed Central

da Cunha, Júlia Pinheiro Chagas; Galante, Pedro Alexandre Favoretto; de Souza, Jorge Estefano Santana; Pieprzyk, Martin; Carraro, Dirce Maria; Old, Lloyd J.; Camargo, Anamaria Aranha; de Souza, Sandro José

2013-01-01

Introduction. Cell surface proteins are ideal targets for cancer therapy and diagnosis. We have identified a set of more than 3700 genes that code for transmembrane proteins believed to be at the human cell surface. Methods. We used a high-throughput qPCR system for the analysis of 573 cell surface protein-coding genes in 12 primary breast tumors, 8 breast cell lines, and 21 normal human tissues including breast. To better understand the role of these genes in breast tumors, we used a series of bioinformatics strategies to integrate different types of datasets, such as KEGG, protein-protein interaction databases, ONCOMINE, and data from the literature. Results. We found that at least 77 genes are overexpressed in breast primary tumors while at least 2 of them also have a restricted expression pattern in normal tissues. We found common signaling pathways that may be regulated in breast tumors through the overexpression of these cell surface protein-coding genes. Furthermore, a comparison was made between the genes found in this report and other genes associated with features clinically relevant for breast tumorigenesis. Conclusions. The expression profiling generated in this study, together with an integrative bioinformatics analysis, allowed us to identify putative targets for breast tumors. PMID:24195083
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin

ERIC Educational Resources Information Center

Offner, Susan

2010-01-01

The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
Transcriptome analysis of thermophilic methylotrophic Bacillus methanolicus MGA3 using RNA-sequencing provides detailed insights into its previously uncharted transcriptional landscape.

PubMed

Irla, Marta; Neshat, Armin; Brautaset, Trygve; Rückert, Christian; Kalinowski, Jörn; Wendisch, Volker F

2015-02-14

Bacillus methanolicus MGA3 is a thermophilic, facultative ribulose monophosphate (RuMP) cycle methylotroph. Together with its ability to produce high yields of amino acids, the relevance of this microorganism as a promising candidate for biotechnological applications is evident. The B. methanolicus MGA3 genome consists of a 3,337,035 nucleotides (nt) circular chromosome, the 19,174 nt plasmid pBM19 and the 68,999 nt plasmid pBM69. 3,218 protein-coding regions were annotated on the chromosome, 22 on pBM19 and 82 on pBM69. In the present study, the RNA-seq approach was used to comprehensively investigate the transcriptome of B. methanolicus MGA3 in order to improve the genome annotation, identify novel transcripts, analyze conserved sequence motifs involved in gene expression and reveal operon structures. For this aim, two different cDNA library preparation methods were applied: one which allows characterization of the whole transcriptome and another which includes enrichment of primary transcript 5'-ends. Analysis of the primary transcriptome data enabled the detection of 2,167 putative transcription start sites (TSSs) which were categorized into 1,642 TSSs located in the upstream region (5'-UTR) of known protein-coding genes and 525 TSSs of novel antisense, intragenic, or intergenic transcripts. Firstly, 14 wrongly annotated translation start sites (TLSs) were corrected based on primary transcriptome data. Further investigation of the identified 5'-UTRs resulted in the detailed characterization of their length distribution and the detection of 75 hitherto unknown cis-regulatory RNA elements. Moreover, the exact TSSs positions were utilized to define conserved sequence motifs for translation start sites, ribosome binding sites and promoters in B. methanolicus MGA3. Based on the whole transcriptome data set, novel transcripts, operon structures and mRNA abundances were determined. The analysis of the operon structures revealed that almost half of the genes are transcribed monocistronically (940), whereas 1,164 genes are organized in 381 operons. Several of the genes related to methylotrophy had highly abundant transcripts. The extensive insights into the transcriptional landscape of B. methanolicus MGA3, gained in this study, represent a valuable foundation for further comparative quantitative transcriptome analyses and possibly also for the development of molecular biology tools which at present are very limited for this organism.
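The operon figures in this abstract can be checked with a few lines of arithmetic. The sketch below assumes that the "almost half" statement refers to the genes assigned to a transcription unit in the operon analysis (940 + 1,164), not to all annotated protein-coding regions; the abstract does not say which denominator was used, and the variable names are illustrative.

```python
# Counts quoted in the abstract.
monocistronic_genes = 940   # genes transcribed on their own
operon_genes = 1164         # genes organized in operons
operons = 381               # number of operons

# Assumed denominator: genes classified in the operon analysis.
classified_genes = monocistronic_genes + operon_genes
fraction_monocistronic = monocistronic_genes / classified_genes
mean_genes_per_operon = operon_genes / operons

print(f"classified genes:       {classified_genes}")
print(f"monocistronic fraction: {fraction_monocistronic:.1%}")
print(f"mean genes per operon:  {mean_genes_per_operon:.2f}")
```

Under this assumption about 45% of the classified genes are monocistronic, consistent with "almost half", and the operons contain about three genes each on average.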
TA-GC cloning: A new simple and versatile technique for the directional cloning of PCR products for recombinant protein expression.

PubMed

Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Lagoumintzis, George; Poulas, Konstantinos

2017-01-01

Over the last few decades, recombinant protein expression has found more and more applications. The cloning of protein-coding genes into expression vectors is required to be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors.
TA-GC cloning: A new simple and versatile technique for the directional cloning of PCR products for recombinant protein expression

PubMed Central

Niarchos, Athanasios; Siora, Anastasia; Konstantinou, Evangelia; Kalampoki, Vasiliki; Poulas, Konstantinos

2017-01-01

Over the last few decades, recombinant protein expression has found more and more applications. The cloning of protein-coding genes into expression vectors is required to be directional for proper expression, and versatile in order to facilitate gene insertion in multiple different vectors for expression tests. In this study, the TA-GC cloning method is proposed as a new, simple and efficient method for the directional cloning of protein-coding genes in expression vectors. The presented method features several advantages over existing methods, which tend to be relatively more labour intensive, inflexible or expensive. The proposed method relies on the complementarity between single A- and G-overhangs of the protein-coding gene, obtained after a short incubation with T4 DNA polymerase, and T and C overhangs of the novel vector pET-BccI, created after digestion with the restriction endonuclease BccI. The novel protein-expression vector pET-BccI also facilitates the screening of transformed colonies for recombinant transformants. Evaluation experiments of the proposed TA-GC cloning method showed that 81% of the transformed colonies contained recombinant pET-BccI plasmids, and 98% of the recombinant colonies expressed the desired protein. This demonstrates that TA-GC cloning could be a valuable method for cloning protein-coding genes in expression vectors. PMID:29091919

Translatomics combined with transcriptomics and proteomics reveals novel functional, recently evolved orphan genes in Escherichia coli O157:H7 (EHEC).

PubMed

Neuhaus, Klaus; Landstorfer, Richard; Fellner, Lea; Simon, Svenja; Schafferhans, Andrea; Goldberg, Tatyana; Marx, Harald; Ozoline, Olga N; Rost, Burkhard; Kuster, Bernhard; Keim, Daniel A; Scherer, Siegfried

2016-02-24

Genomes of E. coli, including that of the human pathogen Escherichia coli O157:H7 (EHEC) EDL933, still harbor undetected protein-coding genes which, apparently, have escaped annotation due to their small size and non-essential function. To find such genes, global gene expression of EHEC EDL933 was examined, using strand-specific RNAseq (transcriptome), ribosomal footprinting (translatome) and mass spectrometry (proteome). Using the above methods, 72 short, non-annotated protein-coding genes were detected. All of these showed signals in the ribosomal footprinting assay indicating mRNA translation. Seven were verified by mass spectrometry. Fifty-seven genes are annotated in other enterobacteriaceae, mainly as hypothetical genes; the remaining 15 genes constitute novel discoveries. In addition, protein structure and function were predicted computationally and compared between EHEC-encoded proteins and 100-times randomly shuffled proteins. Based on this comparison, 61 of the 72 novel proteins exhibit predicted structural and functional features similar to those of annotated proteins. Many of the novel genes show differential transcription when grown under eleven diverse growth conditions suggesting environmental regulation. Three genes were found to confer a phenotype in previous studies, e.g., decreased cattle colonization. These findings demonstrate that ribosomal footprinting can be used to detect novel protein coding genes, contributing to the growing body of evidence that hypothetical genes are not annotation artifacts and opening an additional way to study their functionality. All 72 genes are taxonomically restricted and, therefore, appear to have evolved relatively recently de novo.
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term.

PubMed

Romero, Roberto; Tarca, Adi L; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S; Kalita, Cynthia A; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-09-01

To identify differentially expressed long non-coding RNA (lncRNA) genes in human myometrium in women with spontaneous labor at term. Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n = 19) and women in spontaneous labor at term (n = 20). RNA was extracted and profiled using an Illumina® microarray platform. We have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. We identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an experimental method completely independent of the microarray analysis. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site, that lacked evolutionary conservation beyond primates. We provide, for the first time, evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term.
The Complete Mitochondrial DNA Sequence of Scenedesmus obliquus Reflects an Intermediate Stage in the Evolution of the Green Algal Mitochondrial Genome

PubMed Central

Nedelcu, Aurora M.; Lee, Robert W.; Lemieux, Claude; Gray, Michael W.; Burger, Gertraud

2000-01-01

Two distinct mitochondrial genome types have been described among the green algal lineages investigated to date: a reduced–derived, Chlamydomonas-like type and an ancestral, Prototheca-like type. To determine if this unexpected dichotomy is real or is due to insufficient or biased sampling and to define trends in the evolution of the green algal mitochondrial genome, we sequenced and analyzed the mitochondrial DNA (mtDNA) of Scenedesmus obliquus. This genome is 42,919 bp in size and encodes 42 conserved genes (i.e., large and small subunit rRNA genes, 27 tRNA and 13 respiratory protein-coding genes), four additional free-standing open reading frames with no known homologs, and an intronic reading frame with endonuclease/maturase similarity. No 5S rRNA or ribosomal protein-coding genes have been identified in Scenedesmus mtDNA. The standard protein-coding genes feature a deviant genetic code characterized by the use of UAG (normally a stop codon) to specify leucine, and the unprecedented use of UCA (normally a serine codon) as a signal for termination of translation. The mitochondrial genome of Scenedesmus combines features of both green algal mitochondrial genome types: the presence of a more complex set of protein-coding and tRNA genes is shared with the ancestral type, whereas the lack of 5S rRNA and ribosomal protein-coding genes as well as the presence of fragmented and scrambled rRNA genes are shared with the reduced–derived type of mitochondrial genome organization. Furthermore, the gene content and the fragmentation pattern of the rRNA genes suggest that this genome represents an intermediate stage in the evolutionary process of mitochondrial genome streamlining in green algae. [The sequence data described in this paper have been submitted to the GenBank data library under accession no. AF204057.] PMID:10854413
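The effect of the deviant code described above can be illustrated with a short sketch. It builds the standard codon table and then applies only the two reassignments the abstract names (UAG to leucine, UCA to stop); the function names and the example RNA string are hypothetical, and any further differences in the real Scenedesmus mitochondrial code are not modeled.

```python
from itertools import product

BASES = "UCAG"
# Amino acids of the standard genetic code, listed in UCAG codon order.
STANDARD_AAS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

def make_code(overrides=None):
    """Return a codon -> amino acid table, optionally with reassigned codons."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    table = dict(zip(codons, STANDARD_AAS))
    table.update(overrides or {})
    return table

def translate(rna, table):
    """Translate an RNA string codon by codon, stopping at the first stop ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

standard = make_code()
# Only the two changes named in the abstract (an assumption of this sketch).
scenedesmus_like = make_code({"UAG": "L", "UCA": "*"})

example = "AUGUAGUCAUUU"  # hypothetical sequence
print(translate(example, standard))          # M
print(translate(example, scenedesmus_like))  # ML
```

Under the standard table, translation of the example halts at UAG; under the modified table UAG is read as leucine and UCA terminates translation instead.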
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria.

PubMed

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R; Voß, Björn

2015-04-22

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5'UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5'UTR. Such an sRNA/mRNA structure, which we name 'actuaton', represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation.
Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins

PubMed Central

Delcourt, Vivian; Lucier, Jean-François; Gagnon, Jules; Beaudoin, Maxime C; Vanderperre, Benoît; Breton, Marc-André; Motard, Julie; Jacques, Jean-François; Brunelle, Mylène; Gagnon-Arsenault, Isabelle; Fournier, Isabelle; Ouangraoua, Aida; Hunting, Darel J; Cohen, Alan A; Landry, Christian R; Scott, Michelle S

2017-01-01

Recent functional, proteomic and ribosome profiling studies in eukaryotes have concurrently demonstrated the translation of alternative open-reading frames (altORFs) in addition to annotated protein coding sequences (CDSs). We show that a large number of small proteins could in fact be coded by these altORFs. The putative alternative proteins translated from altORFs have orthologs in many species and contain functional domains. Evolutionary analyses indicate that altORFs often show more extreme conservation patterns than their CDSs. Thousands of alternative proteins are detected in proteomic datasets by reanalysis using a database containing predicted alternative proteins. This is illustrated with specific examples, including altMiD51, a 70 amino acid mitochondrial fission-promoting protein encoded in MiD51/Mief1/SMCR7L, a gene encoding an annotated protein promoting mitochondrial fission. Our results suggest that many genes are multicoding genes and code for a large protein and one or several small proteins. PMID:29083303
ICAM-1-related long non-coding RNA: promoter analysis and expression in human retinal endothelial cells.

PubMed

Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy

2018-05-09

Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 Eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.
Protein and Genetic Composition of Four Chromatin Types in Drosophila melanogaster Cell Lines.

PubMed

Boldyreva, Lidiya V; Goncharov, Fyodor P; Demakova, Olga V; Zykova, Tatyana Yu; Levitsky, Victor G; Kolesnikov, Nikolay N; Pindyurin, Alexey V; Semeshin, Valeriy F; Zhimulev, Igor F

2017-04-01

Recently, we analyzed genome-wide protein binding data for the Drosophila cell lines S2, Kc, BG3 and Cl.8 (modENCODE Consortium) and identified a set of 12 proteins enriched in the regions corresponding to interbands of salivary gland polytene chromosomes. Using these data, we developed a bioinformatic pipeline that partitioned the Drosophila genome into four chromatin types that we hereby refer to as aquamarine, lazurite, malachite and ruby. Here, we describe the properties of these chromatin types across different cell lines. We show that aquamarine chromatin tends to harbor transcription start sites (TSSs) and 5' untranslated regions (5'UTRs) of the genes, is enriched in diverse "open" chromatin proteins, histone modifications, nucleosome remodeling complexes and transcription factors. It encompasses most of the tRNA genes and shows enrichment for non-coding RNAs and miRNA genes. Lazurite chromatin typically encompasses gene bodies. It is rich in proteins involved in transcription elongation. Frequency of both point mutations and natural deletion breakpoints is elevated within lazurite chromatin. Malachite chromatin shows higher frequency of insertions of natural transposons. Finally, ruby chromatin is enriched for proteins and histone modifications typical for the "closed" chromatin. Ruby chromatin has a relatively low frequency of point mutations and is essentially devoid of miRNA and tRNA genes. Aquamarine and ruby chromatin types are highly stable across cell lines and have contrasting properties. Lazurite and malachite chromatin types also display characteristic protein composition, as well as enrichment for specific genomic features. We found that two types of chromatin, aquamarine and ruby, retain their complementary protein patterns in four Drosophila cell lines.
Transcriptome interrogation of human myometrium identifies differentially expressed sense-antisense pairs of protein-coding and long non-coding RNA genes in spontaneous labor at term

PubMed Central

Romero, Roberto; Tarca, Adi; Chaemsaithong, Piya; Miranda, Jezid; Chaiworapongsa, Tinnakorn; Jia, Hui; Hassan, Sonia S.; Kalita, Cynthia A.; Cai, Juan; Yeo, Lami; Lipovich, Leonard

2014-01-01

Objective: The mechanisms responsible for normal and abnormal parturition are poorly understood. Myometrial activation leading to regular uterine contractions is a key component of labor. Dysfunctional labor (arrest of dilatation and/or descent) is a leading indication for cesarean delivery. Compelling evidence suggests that most of these disorders are functional in nature, and not the result of cephalopelvic disproportion. The methodology and the datasets afforded by the post-genomic era provide novel opportunities to understand and target gene functions in these disorders. In 2012, the ENCODE Consortium elucidated the extraordinary abundance and functional complexity of long non-coding RNA genes in the human genome. The purpose of the study was to identify differentially expressed long non-coding RNA genes in human myometrium in women in spontaneous labor at term. Materials and Methods: Myometrium was obtained from women undergoing cesarean deliveries who were not in labor (n=19) and women in spontaneous labor at term (n=20). RNA was extracted and profiled using an Illumina® microarray platform. The analysis of the protein coding genes from this study has been previously reported. Here, we have used computational approaches to bound the extent of long non-coding RNA representation on this platform, and to identify co-differentially expressed and correlated pairs of long non-coding RNA genes and protein-coding genes sharing the same genomic loci. Results: Upon considering more than 18,498 distinct lncRNA genes compiled nonredundantly from public experimental data sources, and interrogating 2,634 that matched Illumina microarray probes, we identified co-differential expression and correlation at two genomic loci that contain coding-lncRNA gene pairs: SOCS2-AK054607 and LMCD1-NR_024065 in women in spontaneous labor at term. This co-differential expression and correlation was validated by qRT-PCR, an independent experimental method. Intriguingly, one of the two lncRNA genes differentially expressed in term labor had a key genomic structure element, a splice site that lacked evolutionary conservation beyond primates. Conclusions: We provide for the first time evidence for coordinated differential expression and correlation of cis-encoded antisense lncRNAs and protein-coding genes with known, as well as novel roles in pregnancy in the myometrium of women in spontaneous labor at term. PMID:24168098
An Annotated Draft Genome for Radix auricularia (Gastropoda, Mollusca)

PubMed Central

Feldmeyer, Barbara; Schmidt, Hanno; Greshake, Bastian; Tills, Oliver; Truebano, Manuela; Rundle, Simon D.; Paule, Juraj; Ebersberger, Ingo; Pfenninger, Markus

2017-01-01

Molluscs are the second most species-rich phylum in the animal kingdom, yet only 11 genomes of this group have been published so far. Here, we present the draft genome sequence of the pulmonate freshwater snail Radix auricularia. Six whole genome shotgun libraries with different layouts were sequenced. The resulting assembly comprises 4,823 scaffolds with a cumulative length of 910 Mb and an overall read coverage of 72×. The assembly contains 94.6% of a metazoan core gene collection, indicating an almost complete coverage of the coding fraction. The discrepancy of ∼690 Mb compared with the estimated genome size of R. auricularia (1.6 Gb) results from a high repeat content of 70% mainly comprising DNA transposons. The annotation of 17,338 protein coding genes was supported by the use of publicly available transcriptome data. This draft will serve as starting point for further genomic and population genetic research in this scientifically important phylum. PMID:28204581
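The size discrepancy quoted above follows directly from the two figures in the abstract. A minimal sketch of that arithmetic is given here; the constant and function names are hypothetical, and 1.6 Gb is taken as 1,600 Mb.

```python
ESTIMATED_GENOME_MB = 1600  # 1.6 Gb estimated genome size, expressed in Mb
ASSEMBLY_MB = 910           # cumulative scaffold length reported in the abstract

def unassembled_mb(estimated_mb, assembled_mb):
    """Portion of the estimated genome not represented in the assembly (Mb)."""
    return estimated_mb - assembled_mb

def assembled_fraction(estimated_mb, assembled_mb):
    """Share of the estimated genome covered by the assembly (0..1)."""
    return assembled_mb / estimated_mb

print(unassembled_mb(ESTIMATED_GENOME_MB, ASSEMBLY_MB))  # 690
print(round(100 * assembled_fraction(ESTIMATED_GENOME_MB, ASSEMBLY_MB), 1))  # 56.9
```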
Long non-coding RNA expression patterns in lung tissues of chronic cigarette smoke induced COPD mouse model.

PubMed

Zhang, Haiyun; Sun, Dejun; Li, Defu; Zheng, Zeguang; Xu, Jingyi; Liang, Xue; Zhang, Chenting; Wang, Sheng; Wang, Jian; Lu, Wenju

2018-05-15

Long non-coding RNAs (lncRNAs) have critical regulatory roles in protein-coding gene expression. Aberrant expression profiles of lncRNAs have been observed in various human diseases. In this study, we investigated transcriptome profiles in lung tissues of a chronic cigarette smoke (CS)-induced COPD mouse model. We found that 109 lncRNAs and 260 mRNAs were significantly differentially expressed in lungs of the chronic CS-induced COPD mouse model compared with control animals. GO and KEGG analyses indicated that the protein-coding genes associated with differentially expressed lncRNAs were mainly involved in the protein processing in endoplasmic reticulum pathway and the taurine and hypotaurine metabolism pathway. The combination of high-throughput data analysis and qRT-PCR validation in lungs of the chronic CS-induced COPD mouse model, in 16HBE cells treated with CSE, and in PBMC from patients with COPD revealed that NR_102714 and its associated protein-coding gene UCHL1 might be involved in the development of COPD in both mouse and human. In conclusion, our study demonstrated aberrant expression profiles of lncRNAs and mRNAs in lungs of the chronic CS-induced COPD mouse model. From the perspective of animal models, these results might provide further clues for investigating the biological functions of lncRNAs and their potential target protein-coding genes in the pathogenesis of COPD.
Successful Recovery of Nuclear Protein-Coding Genes from Small Insects in Museums Using Illumina Sequencing.

PubMed

Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R

2015-01-01

In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Systematic asymmetric nucleotide exchanges produce human mitochondrial RNAs cryptically encoding for overlapping protein coding genes.

PubMed

Seligmann, Hervé

2013-05-07

GenBank's EST database includes RNAs that exactly match human mitochondrial sequences, assuming systematic asymmetric nucleotide exchange during transcription according to the exchange rules A→G→C→U/T→A (12 ESTs), A→U/T→C→G→A (4 ESTs), C→G→U/T→C (3 ESTs), and A→C→G→U/T→A (1 EST); no RNAs correspond to other potential asymmetric exchange rules. Hypothetical polypeptides translated from nucleotide-exchanged human mitochondrial protein-coding genes align with numerous GenBank proteins, and their predicted secondary structures resemble those of their putative GenBank homologues. Two independent methods designed to detect overlapping genes (one based on nucleotide content analyses in relation to replicative deamination gradients at third codon positions, the other on circular code analyses of codon contents based on frame redundancy) confirm nucleotide-exchange-encrypted overlapping genes. The methods converge on which genes are most probably active and which are not, for each of the exchange rules. Mean EST lengths produced by different nucleotide exchanges are proportional to (a) the extent to which various bioinformatics analyses confirm the protein-coding status of putative overlapping genes; (b) known kinetic chemistry parameters of the corresponding nucleotide substitutions by the human mitochondrial DNA polymerase gamma (nucleotide DNA misinsertion rates); (c) stop codon densities in predicted overlapping genes (stop codon readthrough and exchanging polymerization regulate gene expression by counterbalancing each other). Numerous rarely expressed proteins seem to be encoded within regular mitochondrial genes through asymmetric nucleotide exchange, which avoids lengthening genomes. Intersecting evidence from several independent approaches confirms the working hypothesis status of gene encryption by systematic nucleotide exchanges. Copyright © 2013 Elsevier Ltd. All rights reserved.
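One way to read the exchange rules listed above is as cyclic substitutions, where each nucleotide in the arrow chain is replaced by the next one and nucleotides outside the chain stay unchanged. The sketch below applies that reading to DNA strings; this interpretation, the function names, and the four-base example are assumptions made for illustration, not the author's code.

```python
def rule_to_map(cycle):
    """Turn a cycle such as "AGCT" (A->G->C->T->A) into a substitution map.

    Bases that do not appear in the cycle map to themselves.
    """
    mapping = {base: base for base in "ACGT"}
    for i, base in enumerate(cycle):
        mapping[base] = cycle[(i + 1) % len(cycle)]
    return mapping

def exchange(seq, cycle):
    """Apply a systematic nucleotide exchange to every position of seq."""
    mapping = rule_to_map(cycle)
    return "".join(mapping[base] for base in seq)

# The four rules named in the abstract, written with T in place of U/T.
RULES = ["AGCT", "ATCG", "CGT", "ACGT"]

for rule in RULES:
    print(rule, exchange("ACGT", rule))
```

For the three-base rule C→G→T→C, adenine is left untouched, which is what makes that exchange asymmetric with respect to the four bases.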
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.)

PubMed Central

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-01-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. PMID:25362073
Transcription factor GATA-1 regulates human HOXB2 gene expression in erythroid cells.

PubMed

Vieille-Grosjean, I; Huber, P

1995-03-03

The human HOXB2 gene is a member of the vertebrate Hox gene family that contains genes coding for specific developmental stage DNA-binding proteins. Remarkably, within the hematopoietic compartment, genes of the HOXB complex are expressed specifically in erythromegakaryocytic cell lines and, for some of them, in hematopoietic progenitors. Here, we report the study of HOXB2 gene transcriptional regulation in hematopoietic cells, an initial step in understanding the lineage-specific expression of the whole HOXB complex in these cells. We have isolated the HOXB2 5'-flanking sequence and have characterized a promoter fragment extending 323 base pairs upstream from the transcriptional start site, which, in transfection experiments, was sufficient to direct the tissue-specific expression of HOXB2 in the erythroid cell line K562. In this fragment, we have identified a potential GATA-binding site that is essential to the promoter activity as demonstrated by point mutation experiments. Gel shift analysis revealed the formation of a specific complex in both erythroleukemic lines K562 and HEL that could be prevented by the addition of a specific antiserum raised against GATA-1 protein. These findings suggest a regulatory hierarchy in which GATA-1 is upstream of the HOXB2 gene in erythroid cells.
Novel Rhizosphere Soil Alleles for the Enzyme 1-Aminocyclopropane-1-Carboxylate Deaminase Queried for Function with an In Vivo Competition Assay

PubMed Central

Jin, Zhao; Di Rienzi, Sara C.; Janzon, Anders; Werner, Jeff J.; Angenent, Largus T.; Dangl, Jeffrey L.; Fowler, Douglas M.

2015-01-01

Metagenomes derived from environmental microbiota encode a vast diversity of protein homologs. How this diversity impacts protein function can be explored through selection assays aimed to optimize function. While artificially generated gene sequence pools are typically used in selection assays, their usage may be limited because of technical or ethical reasons. Here, we investigate an alternative strategy, the use of soil microbial DNA as a starting point. We demonstrate this approach by optimizing the function of a widely occurring soil bacterial enzyme, 1-aminocyclopropane-1-carboxylate (ACC) deaminase. We identified a specific ACC deaminase domain region (ACCD-DR) that, when PCR amplified from the soil, produced a variant pool that we could swap into functional plasmids carrying ACC deaminase-encoding genes. Functional clones of ACC deaminase were selected for in a competition assay based on their capacity to provide nitrogen to Escherichia coli in vitro. The most successful ACCD-DR variants were identified after multiple rounds of selection by sequence analysis. We observed that previously identified essential active-site residues were fixed in the original unselected library and that additional residues went to fixation after selection. We identified a divergent essential residue whose presence hints at the possible use of alternative substrates and a cluster of neutral residues that did not influence ACCD performance. Using an artificial ACCD-DR variant library generated by DNA oligomer synthesis, we validated the same fixation patterns. Our study demonstrates that soil metagenomes are useful starting pools of protein-coding-gene diversity that can be utilized for protein optimization and functional characterization when synthetic libraries are not appropriate. PMID:26637602
Chromosome preference of disease genes and vectorization for the prediction of non-coding disease genes.

PubMed

Peng, Hui; Lan, Chaowang; Liu, Yuansheng; Liu, Tao; Blumenstein, Michael; Li, Jinyan

2017-10-03

Disease-related protein-coding genes have been widely studied, but disease-related non-coding genes remain largely unknown. This work introduces a new vector to represent diseases, and applies the newly vectorized data to a positive-unlabeled learning algorithm to predict and rank disease-related long non-coding RNA (lncRNA) genes. This novel vector representation for diseases consists of two sub-vectors. The first is composed of 45 elements, characterizing the information entropies of the disease gene distribution over 45 chromosome substructures. This idea is supported by our observation that some substructures (e.g., the chromosome 6 p-arm) are highly preferred by disease-related protein-coding genes, while some (e.g., the 21 p-arm) are not favored at all. The second sub-vector is 30-dimensional, characterizing the distribution of disease-gene-enriched KEGG pathways in comparison with our manually created pathway groups. The second sub-vector complements the first to differentiate between various diseases. Our prediction method outperforms the state-of-the-art methods on benchmark datasets for prioritizing disease-related lncRNA genes. The method also works well when only the sequence information of an lncRNA gene is known, or even when a given disease has no currently recognized long non-coding genes. PMID:29108274
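The abstract does not spell out how the entropy-based elements are computed, so the following is only a generic sketch of Shannon entropy over a count distribution, such as the number of disease genes falling in each chromosome substructure. The function name and the toy counts are hypothetical.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Toy example: genes spread evenly over four substructures vs. concentrated in one.
print(shannon_entropy([5, 5, 5, 5]))   # 2.0
print(shannon_entropy([20, 0, 0, 0]))  # 0.0
```

A concentrated distribution (strong preference for one substructure) gives low entropy, while an even spread gives the maximum for that number of bins.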
Computer analysis of protein functional sites projection on exon structure of genes in Metazoa

PubMed Central

2015-01-01

Background: Study of the relationship between the structural and functional organization of proteins and their coding genes is necessary for understanding the evolution of molecular systems and can provide new knowledge for applications such as designing proteins with improved medical and biological properties. It is well known that the functional properties of proteins are determined by their functional sites. Functional sites are usually represented by a small number of amino acid residues that are located far from each other in the amino acid sequence. They are highly conserved within their functional group and vary significantly in structure between such groups. Given these facts, analysis of the general properties of the structural organization of functional sites, both at the protein level and at the level of the exon-intron structure of the coding gene, remains a relevant problem. Results: One approach to this analysis is to project the amino acid residue positions of the functional sites, along with the exon boundaries, onto the gene structure. In this paper, we examined the discontinuity of the functional sites in the exon-intron structure of genes and the distribution of lengths and phases of the exons encoding functional sites in vertebrate genes. We have shown that the DNA fragments coding the functional sites were in the same exons, or in close exons. The observed tendency of exons that code functional sites to cluster suggests that such clusters could be considered a unit of protein evolution. We studied the characteristics of the structure of the exon boundaries that code, and do not code, functional sites in 11 Metazoa species. This is accompanied by a reduced frequency of intercodon gaps (phase 0) in exons encoding the amino acid residue functional site, which may be evidence of the existence of evolutionary limitations to exon shuffling. Conclusions: These results characterize the features of the coding exon-intron structure that affect the functionality of the encoded protein and allow a better understanding of the emergence of biological diversity. PMID:26693737
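The notion of boundary phase used above can be made concrete with a small sketch. It follows the common convention that a boundary has phase 0 when it falls between two codons, phase 1 after the first base of a codon, and phase 2 after the second. The function name and the exon lengths are hypothetical, and the lengths are assumed to count coding bases only.

```python
def boundary_phases(exon_cds_lengths):
    """Phase of each exon-exon boundary from the coding length of each exon.

    The phase is the cumulative coding length modulo 3 at the end of every
    exon except the last (which ends the coding sequence, not at a boundary).
    """
    phases = []
    total = 0
    for length in exon_cds_lengths[:-1]:
        total += length
        phases.append(total % 3)
    return phases

# Three hypothetical exons with 90, 100 and 50 coding bases.
print(boundary_phases([90, 100, 50]))  # [0, 1]
```

Counting how often phase 0 occurs among boundaries flanking functional-site exons, compared with all other boundaries, would be one way to tabulate the kind of frequency difference the abstract reports.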
nGASP--the nematode genome annotation assessment project.

PubMed

Coghlan, Avril; Fiedler, Tristan J; McKay, Sheldon J; Flicek, Paul; Harris, Todd W; Blasiar, Darin; Stein, Lincoln D

2008-12-19

While the C. elegans genome is extensively annotated, relatively little information is available for other Caenorhabditis species. The nematode genome annotation assessment project (nGASP) was launched to objectively assess the accuracy of protein-coding gene prediction software in C. elegans, and to apply this knowledge to the annotation of the genomes of four additional Caenorhabditis species and other nematodes. Seventeen groups worldwide participated in nGASP, and submitted 47 prediction sets across 10 Mb of the C. elegans genome. Predictions were compared to reference gene sets consisting of confirmed or manually curated gene models from WormBase. The most accurate gene-finders were 'combiner' algorithms, which made use of transcript- and protein-alignments and multi-genome alignments, as well as gene predictions from other gene-finders. Gene-finders that used alignments of ESTs, mRNAs and proteins came in second. There was a tie for third place between gene-finders that used multi-genome alignments and ab initio gene-finders. The median gene level sensitivity of combiners was 78% and their specificity was 42%, which is nearly the same accuracy reported for combiners in the human genome. C. elegans genes with exons of unusual hexamer content, as well as those with unusually many exons, short exons, long introns, a weak translation start signal, weak splice sites, or poorly conserved orthologs posed the greatest difficulty for gene-finders. This experiment establishes a baseline of gene prediction accuracy in Caenorhabditis genomes, and has guided the choice of gene-finders for the annotation of newly sequenced genomes of Caenorhabditis and other nematode species. We have created new gene sets for C. briggsae, C. remanei, C. brenneri, C. japonica, and Brugia malayi using some of the best-performing gene-finders.
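The gene-level sensitivity and specificity figures quoted above can be illustrated with a short sketch. It assumes the convention common in gene-prediction benchmarks, where sensitivity is the fraction of reference genes predicted exactly and specificity is the fraction of predicted genes that exactly match a reference gene; the abstract itself does not define these terms, and the function name and gene identifiers below are hypothetical.

```python
def gene_level_accuracy(reference, predicted):
    """Return (sensitivity, specificity) for exact-match gene predictions.

    sensitivity = TP / (TP + FN): share of reference genes recovered.
    specificity = TP / (TP + FP): share of predictions that are correct
    (the gene-finding usage of the term, assumed here).
    """
    ref, pred = set(reference), set(predicted)
    true_positives = len(ref & pred)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity

# Hypothetical gene models, identified here by simple labels.
reference_genes = {"g1", "g2", "g3", "g4"}
predicted_genes = {"g1", "g2", "g5"}

sn, sp = gene_level_accuracy(reference_genes, predicted_genes)
print(sn, sp)
```

In practice each gene model would be represented by its full exon structure (for example a tuple of exon coordinates) so that only predictions with every boundary correct count as matches.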

The Ever-Evolving Concept of the Gene: The Use of RNA/Protein Experimental Techniques to Understand Genome Functions

PubMed Central

Cipriano, Andrea; Ballarino, Monica

2018-01-01

The completion of the human genome sequence together with advances in sequencing technologies have shifted the paradigm of the genome, as composed of discrete and hereditable coding entities, and have shown the abundance of functional noncoding DNA. This part of the genome, previously dismissed as “junk” DNA, increases proportionally with organismal complexity and contributes to gene regulation beyond the boundaries of known protein-coding genes. Different classes of functionally relevant nonprotein-coding RNAs are transcribed from noncoding DNA sequences. Among them are the long noncoding RNAs (lncRNAs), which are thought to participate in the basal regulation of protein-coding genes at both transcriptional and post-transcriptional levels. Although knowledge of this field is still limited, the ability of lncRNAs to localize in different cellular compartments, to fold into specific secondary structures and to interact with different molecules (RNA or proteins) endows them with multiple regulatory mechanisms. It is becoming evident that lncRNAs may play a crucial role in most biological processes such as the control of development, differentiation and cell growth. This review places the evolution of the concept of the gene in its historical context, from Darwin's hypothetical mechanism of heredity to the post-genomic era. We discuss how the original idea of protein-coding genes as unique determinants of phenotypic traits has been reconsidered in light of the existence of noncoding RNAs. We summarize the technological developments which have been made in the genome-wide identification and study of lncRNAs and emphasize the methodologies that have aided our understanding of the complexity of lncRNA-protein interactions in recent years. PMID:29560353
Evaluation of the efficacy of twelve mitochondrial protein-coding genes as barcodes for mollusk DNA barcoding.

PubMed

Yu, Hong; Kong, Lingfeng; Li, Qi

2016-01-01

In this study, we evaluated the efficacy of 12 mitochondrial protein-coding genes from 238 mitochondrial genomes of 140 molluscan species as potential DNA barcodes for mollusks. Three barcoding methods (distance, monophyly and character-based methods) were used in species identification. The species recovery rates based on genetic distances for the 12 genes ranged from 70.83 to 83.33%. There were no significant differences in intra- or interspecific variability among the 12 genes. The monophyly and character-based methods provided higher resolution than the distance-based method in species delimitation. Especially in closely related taxa, the character-based method showed some advantages. The results suggested that, besides the standard COI barcode, the other 11 mitochondrial protein-coding genes could also potentially be used as molecular diagnostics for molluscan species discrimination. Our results also showed that combining mitochondrial genes did not enhance the efficacy of species identification and that a single mitochondrial gene would be fully competent.
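A distance-based identification of the kind mentioned above might be sketched as follows. This is only an illustration: it assumes pre-aligned sequences, uses the uncorrected p-distance (the abstract does not say which distance model was applied), and assigns a query to the species of its nearest reference. All names and sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nearest_species(query, references):
    """Return the species label of the reference closest to the query.

    references: list of (species_label, aligned_sequence) pairs.
    """
    return min(references, key=lambda item: p_distance(query, item[1]))[0]


# Made-up toy alignment, not real barcode data.
references = [
    ("species_A", "ACGTACGTAC"),
    ("species_B", "ACGTTCGAAC"),
]
query = "ACGTACGTAA"
print(nearest_species(query, references))
```

A species would count as "recovered" under this scheme when the nearest reference belongs to the same species as the query.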
Physical structure and chromosomal localization of a gene encoding human p58clk-1, a cell division control related protein kinase

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eipers, P.G.

1992-01-01

The gene for the human p58clk-1 protein kinase, a cell division control-related gene, has been mapped by somatic cell hybrid analyses, in situ localization with the chromosomal gene, and nested polymerase chain reaction amplification of microdissected chromosomes. These studies indicate that the expressed p58clk-1 chromosomal gene maps to 1p36, while a highly related p58clk-1 sequence of unknown nature maps to chromosome 15. Assignment of a p34cdc2-related gene to 1p36 region, including neuroblastoma, ductal carcinoma of the breast, malignant melanoma, Merkel cell carcinoma and endocrine neoplasia among others. Aberrant expression of this protein kinase negatively regulates normal cellular growth. The p58clk-1 protein contains a central domain of 299 amino acids that is 46% identical to human p34cdc2, the master mitotic protein kinase. This dissertation details the complete structure of the p58clk-1 chromosomal gene, including its putative promoter region, transcriptional start sites, exonic sequences, and intron/exon boundary sequences. The gene is 10 kb in size and contains 12 exons and 11 introns. Interestingly, the rather large 2.0 kb 3′ untranslated region is interrupted by an intron that separates a region containing numerous AUUUA destabilization motifs from the coding region. Furthermore, the expression of this gene in normal human tissues, as well as several human tumor cell samples and lines, is examined. The origin of multiple human transcripts from the same chromosomal gene, and the possible differential stability of these various transcripts, is discussed with regard to the transcriptional and post-transcriptional regulation of this gene. This is the first report of the chromosomal gene structure of a member of the p34cdc2 supergene family.
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks

PubMed Central

Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.

2012-01-01

While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
Using a Euclid distance discriminant method to find protein coding genes in the yeast genome.

PubMed

Zhang, Chun-Ting; Wang, Ju; Zhang, Ren

2002-02-01

The Euclid distance discriminant method is used to find protein coding genes in the yeast genome, based on the single nucleotide frequencies at three codon positions in the ORFs. The method is extremely simple and may be extended to find genes in prokaryotic genomes or eukaryotic genomes with fewer introns. Six-fold cross-validation tests have demonstrated that the accuracy of the algorithm is better than 93%. Based on this, it is found that the total number of protein coding genes in the yeast genome is at most 5579, about 3.8-7.0% less than the currently widely accepted 5800-6000. The base compositions at three codon positions are analyzed in detail using a graphic method. The result shows that the preferred codons adopted by yeast genes are of the RG̅W type, where R, G̅ and W indicate the bases of purine, non-G and A/T, whereas the 'codons' in the intergenic sequences are of the form NNN, where N denotes any base. This fact constitutes the basis of the algorithm to distinguish between coding and non-coding ORFs in the yeast genome. The names of putative non-coding ORFs are listed here in detail.
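A rough sketch of a Euclidean distance discriminant on codon-position base frequencies is given below. It is not the authors' implementation: it assumes a nearest-centroid rule over 12 frequencies (A, C, G, T at each of the three codon positions), and the training sequences are tiny made-up strings chosen only to make the example run.

```python
import math

BASES = "ACGT"


def codon_position_frequencies(orf):
    """Return 12 values: frequency of A, C, G, T at codon positions 1, 2, 3."""
    orf = orf.upper()
    n_codons = len(orf) // 3
    if n_codons == 0:
        raise ValueError("ORF is shorter than one codon")
    counts = [[0] * 4 for _ in range(3)]
    for i in range(n_codons * 3):
        base = orf[i]
        if base in BASES:
            counts[i % 3][BASES.index(base)] += 1
    return [c / n_codons for position in counts for c in position]


def centroid(vectors):
    """Component-wise mean of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def classify(orf, coding_centroid, noncoding_centroid):
    """Assign the ORF to whichever centroid is closer in Euclidean distance."""
    v = codon_position_frequencies(orf)
    d_coding = math.dist(v, coding_centroid)
    d_noncoding = math.dist(v, noncoding_centroid)
    return "coding" if d_coding < d_noncoding else "non-coding"


# Hypothetical training sets; real use would need many known genes and
# intergenic ORFs from the genome of interest.
coding_train = ["ATGGCTGATGCAGAAGCT", "ATGGATGCAGCTGAAGAT"]
noncoding_train = ["TTTTTTAAAAAATTTTTT", "AAATTTAAATTTAAATTT"]
coding_c = centroid([codon_position_frequencies(s) for s in coding_train])
noncoding_c = centroid([codon_position_frequencies(s) for s in noncoding_train])

print(classify("ATGGCAGATGCTGAAGCA", coding_c, noncoding_c))
print(classify("TTTAAATTTAAATTTAAA", coding_c, noncoding_c))
```

The design relies on the observation in the abstract that coding ORFs show position-specific base preferences while intergenic 'codons' do not, so the two classes separate in this frequency space.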
Massive Collection of Full-Length Complementary DNA Clones and Microarray Analyses: Keys to Rice Transcriptome Analysis

NASA Astrophysics Data System (ADS)

Kikuchi, Shoshi

2009-02-01

Completion of the high-precision genome sequence analysis of rice led to the collection of about 35,000 full-length cDNA clones and the determination of their complete sequences. Mapping of these full-length cDNA sequences has given us information on (1) the number of genes expressed in the rice genome; (2) the start and end positions and exon-intron structures of rice genes; (3) alternative transcripts; (4) possible encoded proteins; (5) non-protein-coding (np) RNAs; (6) the density of gene localization on the chromosome; (7) setting the parameters of gene prediction programs; and (8) the construction of a microarray system that monitors global gene expression. Manual curation for rice gene annotation by using mapping information on full-length cDNA and EST assemblies has revealed about 32,000 expressed genes in the rice genome. Analysis of major gene families, such as those encoding membrane transport proteins (pumps, ion channels, and secondary transporters), along with the evolution from bacteria to higher animals and plants, reveals how gene numbers have increased through adaptation to circumstances. Family-based gene annotation also gives us a new way of comparing organisms. Massive amounts of data on gene expression under many kinds of physiological conditions are being accumulated in rice oligoarrays (22K and 44K) based on full-length cDNA sequences. Cluster analyses of genes that have the same promoter cis-elements, that have similar expression profiles, or that encode enzymes in the same metabolic pathways or signal transduction cascades give us clues to understanding the networks of gene expression in rice. As a tool for that purpose, we recently developed "RiCES", a tool for searching for cis-elements in the promoter regions of clustered genes.
The complete mitochondrial genome of the diamondback moth, Plutella xylostella (Lepidoptera: Plutellidae).

PubMed

Dai, Li-Shang; Zhu, Bao-Jian; Qian, Cen; Zhang, Cong-Fen; Li, Jun; Wang, Lei; Wei, Guo-Qing; Liu, Chao-Liang

2016-01-01

The complete mitochondrial genome (mitogenome) of Plutella xylostella (Lepidoptera: Plutellidae) was determined (GenBank accession No. KM023645). The length of this mitogenome is 16,014 bp with 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and an A + T-rich region. It presents the typical gene organization and order for completely sequenced lepidopteran mitogenomes. The nucleotide composition of the genome is highly A + T biased, accounting for 81.48%, with a slightly positive AT skewness (0.005). All PCGs are initiated by typical ATN codons, except for the gene cox1, which uses CGA as its start codon. Some PCGs harbor TA (nad5) or incomplete termination codon T (cox1, cox2, nad2 and nad4), while others use TAA as their termination codons. The A + T-rich region is located between rrnS and trnM with a length of 888 bp.
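The composition statistics reported above can be computed as in the sketch below, which assumes the commonly used definition AT skew = (A − T) / (A + T) and treats any codon of the form ATN as a typical start codon. The function names and the toy sequence are illustrative only.

```python
def at_content_and_skew(seq):
    """Return (A+T fraction, AT skew) for a nucleotide sequence.

    AT skew is taken as (A - T) / (A + T); ambiguous bases are ignored.
    """
    seq = seq.upper()
    a, t, g, c = (seq.count(base) for base in "ATGC")
    total = a + t + g + c
    if total == 0 or a + t == 0:
        raise ValueError("sequence contains no A/T bases to evaluate")
    return (a + t) / total, (a - t) / (a + t)


def is_atn_start(codon):
    """True for ATA, ATC, ATG or ATT."""
    codon = codon.upper()
    return len(codon) == 3 and codon.startswith("AT") and codon[2] in "ACGT"


content, skew = at_content_and_skew("AAAATTTGC")  # toy sequence
print(f"A+T = {content:.3f}, AT skew = {skew:.3f}")
print(is_atn_start("ATG"), is_atn_start("CGA"))
```

A slightly positive skew, as reported for this mitogenome, means that A is marginally more frequent than T on the analyzed strand.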
Genome-wide characterization of differential transcript usage in Arabidopsis thaliana.

PubMed

Vaneechoutte, Dries; Estrada, April R; Lin, Ying-Chen; Loraine, Ann E; Vandepoele, Klaas

2017-12-01

Alternative splicing and the usage of alternate transcription start or stop sites allow a single gene to produce multiple transcript isoforms. Most plant genes express certain isoforms at a significantly higher level than others, but under specific conditions this expression dominance can change, resulting in a different set of dominant isoforms. These events of differential transcript usage (DTU) have been observed for thousands of Arabidopsis thaliana, Zea mays and Vitis vinifera genes, and have been linked to development and stress response. However, neither the characteristics of these genes, nor the implications of DTU for their protein coding sequences or functions, are currently well understood. Here we present a dataset of isoform dominance and DTU for all genes in the AtRTD2 reference transcriptome based on a protocol that was benchmarked on simulated data and validated through comparison with a published reverse transcriptase-polymerase chain reaction panel. We report DTU events for 8148 genes across 206 public RNA-Seq samples, and find that protein sequences are affected in 22% of the cases. The observed DTU events show high consistency across replicates, and reveal reproducible patterns in response to treatment and development. We also demonstrate that genes with different evolutionary ages, expression breadths and functions show large differences in the frequency at which they undergo DTU, and in the effect that these events have on their protein sequences. Finally, we showcase how the generated dataset can be used to explore DTU events for genes of interest or to find genes with specific DTU in samples of interest. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.
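The notion of isoform dominance and a change in the dominant isoform can be illustrated with a deliberately simplified sketch. It assumes per-isoform expression values (for example TPM) for one gene in two conditions and flags a switch whenever the top isoform differs; the published protocol involves benchmarking and statistical testing that are not reproduced here. Gene and isoform identifiers are hypothetical.

```python
def isoform_fractions(expression):
    """Convert per-isoform expression values into usage fractions."""
    total = sum(expression.values())
    if total == 0:
        return {isoform: 0.0 for isoform in expression}
    return {isoform: value / total for isoform, value in expression.items()}


def dominant_isoform(expression):
    """Return the isoform with the highest usage fraction."""
    fractions = isoform_fractions(expression)
    return max(fractions, key=fractions.get)


def has_dominance_switch(condition_a, condition_b):
    """True when the dominant isoform differs between two conditions."""
    return dominant_isoform(condition_a) != dominant_isoform(condition_b)


# Made-up expression values for a hypothetical two-isoform gene.
control = {"GENE.1": 30.0, "GENE.2": 10.0}
stress = {"GENE.1": 5.0, "GENE.2": 20.0}

print(isoform_fractions(control))
print(dominant_isoform(control), dominant_isoform(stress))
print(has_dominance_switch(control, stress))
```

In practice a fraction-difference threshold and replicate-aware testing would be needed before calling such a switch a DTU event.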
Integrative structural annotation of de novo RNA-Seq provides an accurate reference gene set of the enormous genome of the onion (Allium cepa L.).

PubMed

Kim, Seungill; Kim, Myung-Shin; Kim, Yong-Min; Yeom, Seon-In; Cheong, Kyeongchae; Kim, Ki-Tae; Jeon, Jongbum; Kim, Sunggil; Kim, Do-Sun; Sohn, Seong-Han; Lee, Yong-Hwan; Choi, Doil

2015-02-01

The onion (Allium cepa L.) is one of the most widely cultivated and consumed vegetable crops in the world. Although a considerable amount of onion transcriptome data has been deposited into public databases, the sequences of the protein-coding genes are not accurate enough to be used, owing to non-coding sequences intermixed with the coding sequences. We generated a high-quality, annotated onion transcriptome from de novo sequence assembly and intensive structural annotation using the integrated structural gene annotation pipeline (ISGAP), which identified 54,165 protein-coding genes among 165,179 assembled transcripts totalling 203.0 Mb by eliminating the intron sequences. ISGAP performed reliable annotation, recognizing accurate gene structures based on reference proteins, and ab initio gene models of the assembled transcripts. Integrative functional annotation and gene-based SNP analysis revealed a whole biological repertoire of genes and transcriptomic variation in the onion. The method developed in this study provides a powerful tool for the construction of reference gene sets for organisms based solely on de novo transcriptome data. Furthermore, the reference genes and their variation described here for the onion represent essential tools for molecular breeding and gene cloning in Allium spp. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Genomic evidence for genes encoding leucine-rich repeat receptors linked to resistance against the eukaryotic extra- and intracellular Brassica napus pathogens Leptosphaeria maculans and Plasmodiophora brassicae.

PubMed

Stotz, Henrik U; Harvey, Pascoe J; Haddadi, Parham; Mashanova, Alla; Kukol, Andreas; Larkan, Nicholas J; Borhan, M Hossein; Fitt, Bruce D L

2018-01-01

Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens. These findings demonstrate involvement of NLR genes in resistance against the intracellular pathogen P. brassicae and of a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans.
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.

PubMed

Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R

1999-12-16

The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).

PubMed

Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren

2016-04-01

Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke is not yet known. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using the maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum clustered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
In silico search for functionally similar proteins involved in meiosis and recombination in evolutionarily distant organisms.

PubMed

Bogdanov, Yuri F; Dadashev, Sergei Y; Grishaeva, Tatiana M

2003-01-01

Evolutionarily distant organisms have not only orthologs, but also nonhomologous proteins that build functionally similar subcellular structures. For instance, this is true of the protein components of the synaptonemal complex (SC), a universal ultrastructure that ensures the successful pairing and recombination of homologous chromosomes during meiosis. We aimed to develop a method to search databases for genes that code for such nonhomologous but functionally analogous proteins. Advantage was taken of the ultrastructural parameters of the SC and the conformation of the SC proteins responsible for these. Proteins involved in the SC central space are known to be similar in secondary structure. Using published data, we found a highly significant correlation between the width of the SC central space and the length of the rod-shaped central domain of mammalian and yeast intermediate proteins forming transversal filaments in the SC central space. Based on this, we suggested a method for searching genome databases of distant organisms for genes whose virtual proteins meet the above correlation requirement. Our recent finding of the Drosophila melanogaster CG17604 gene coding for synaptonemal complex transversal filament protein received experimental support from another lab. With the same strategy, we showed that the Arabidopsis thaliana and Caenorhabditis elegans genomes contain unique genes coding for such proteins.
Variations in the non-coding transcriptome as a driver of inter-strain divergence and physiological adaptation in bacteria

PubMed Central

Kopf, Matthias; Klähn, Stephan; Scholz, Ingeborg; Hess, Wolfgang R.; Voß, Björn

2015-01-01

In all studied organisms, a substantial portion of the transcriptome consists of non-coding RNAs that frequently execute regulatory functions. Here, we have compared the primary transcriptomes of the cyanobacteria Synechocystis sp. PCC 6714 and PCC 6803 under 10 different conditions. These strains share 2854 protein-coding genes and a 16S rRNA identity of 99.4%, indicating their close relatedness. Conserved major transcriptional start sites (TSSs) give rise to non-coding transcripts within the sigB gene, from the 5′UTRs of cmpA and isiA, and 168 loci in antisense orientation. Distinct differences include single nucleotide polymorphisms rendering promoters inactive in one of the strains, e.g., for cmpR and for the asRNA PsbA2R. Based on the genome-wide mapped location, regulation and classification of TSSs, non-coding transcripts were identified as the most dynamic component of the transcriptome. We identified a class of mRNAs that originate by read-through from an sRNA that accumulates as a discrete and abundant transcript while also serving as the 5′UTR. Such an sRNA/mRNA structure, which we name ‘actuaton’, represents another way for bacteria to remodel their transcriptional network. Our findings support the hypothesis that variations in the non-coding transcriptome constitute a major evolutionary element of inter-strain divergence and capability for physiological adaptation. PMID:25902393
Differential protein-coding gene and long noncoding RNA expression in smoking-related lung squamous cell carcinoma.

PubMed

Li, Shicheng; Sun, Xiao; Miao, Shuncheng; Liu, Jia; Jiao, Wenjie

2017-11-01

Cigarette smoking is one of the greatest preventable risk factors for developing cancer, and most cases of lung squamous cell carcinoma (lung SCC) are associated with smoking. The pathogenic mechanism of tumor progression is unclear. This study aimed to identify biomarkers in smoking-related lung cancer, including protein-coding genes, long noncoding RNAs, and transcription factors. We selected and obtained messenger RNA microarray datasets and clinical data from the Gene Expression Omnibus database to identify gene expression altered by cigarette smoking. Integrated bioinformatic analysis was used to clarify biological functions of the identified genes, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, the construction of a protein-protein interaction network, transcription factor, and statistical analyses. Subsequent quantitative real-time PCR was utilized to verify these bioinformatic analyses. Five hundred and ninety-eight differentially expressed genes and 21 long noncoding RNAs were identified in smoking-related lung SCC. GO and KEGG pathway analysis showed that identified genes were enriched in the cancer-related functions and pathways. The protein-protein interaction network revealed seven hub genes identified in lung SCC. Several transcription factors and their binding sites were predicted. The results of real-time quantitative PCR revealed that AURKA and BIRC5 were significantly upregulated and LINC00094 was downregulated in the tumor tissues of smoking patients. Further statistical analysis indicated that dysregulation of AURKA, BIRC5, and LINC00094 indicated poor prognosis in lung SCC. The protein-coding genes AURKA and BIRC5 and the long noncoding RNA LINC00094 could be biomarkers or therapeutic targets for smoking-related lung SCC. © 2017 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd.
A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

PubMed

Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P

2017-03-01

Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data

PubMed Central

Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

2015-01-01

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in a given GO annotation or KEGG pathway are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/ PMID:26363020
Genes from the medicinal leech (Hirudo medicinalis) coding for unusual enzymes that specifically cleave endo-epsilon (gamma-Glu)-Lys isopeptide bonds and help to dissolve blood clots.

PubMed

Zavalova, L; Lukyanov, S; Baskova, I; Snezhkov, E; Akopov, S; Berezhnoy, S; Bogdanova, E; Barsova, E; Sverdlov, E D

1996-11-27

We previously detected in salivary gland secretions of the medicinal leech (Hirudo medicinalis) a novel enzymatic activity, endo-epsilon(gamma-Glu)-Lys isopeptidase, which cleaves isopeptide bonds formed by transglutaminase (Factor XIIIa) between glutamine gamma-carboxamide and the epsilon-amino group of lysine. Such isopeptide bonds, either within or between protein polypeptide chains, are formed in many biological processes. However, before we started our work no enzymes were known to be capable of specifically splitting isopeptide bonds in proteins. The isopeptidase activity we detected was specific for isopeptide bonds. The enzyme was termed destabilase. Here we report the first purification of destabilase, part of its amino acid sequence, the isolation and sequencing of two related cDNAs derived from the gene family that encodes destabilase proteins, and the detection of isopeptidase activity encoded by one of these cDNAs cloned in a baculovirus expression vector. The deduced mature protein products of these cDNAs contain 115 and 116 amino acid residues, including 14 highly conserved Cys residues, and are formed from precursors containing specific leader peptides. No homologous sequences were found in public databases.
β-Glucuronidase as a Sensitive and Versatile Reporter in Actinomycetes

PubMed Central

Myronovskyi, Maksym; Welle, Elisabeth; Fedorenko, Viktor; Luzhetskyy, Andriy

2011-01-01

Here we describe a versatile and sensitive reporter system for actinomycetes that is based on gusA, which encodes the β-glucuronidase enzyme. A series of gusA-containing transcriptional and translational fusion vectors were constructed and utilized to study the regulatory cascade of the phenalinolactone biosynthetic gene cluster. Furthermore, these vectors were used to study the efficiency of translation initiation at the ATG, GTG, TTG, and CTG start codons. Surprisingly, constructs using a TTG start codon showed the best activity, whereas those using ATG or GTG were approximately one-half or one-third as active, respectively. The CTG fusion showed only 5% of the activity of the TTG fusion. A suicide vector, pKGLP2, carrying gusA in its backbone was used to visually detect merodiploid formation and resolution, making gene targeting in actinomycetes much faster and easier. Three regulatory genes, plaR1, plaR2, and plaR3, involved in phenalinolactone biosynthesis were efficiently replaced with an apramycin resistance marker using this system. Finally, we expanded the genetic code of actinomycetes by introducing the nonproteinogenic amino acid N-epsilon-cyclopentyloxycarbonyl-l-lysine with the GusA protein as a reporter. PMID:21685164

Identification and analysis of unitary loss of long-established protein-coding genes in Poaceae shows evidences for biased gene loss and putatively functional transcription of relics.

PubMed

Zhao, Yi; Tang, Liang; Li, Zhe; Jin, Jinpu; Luo, Jingchu; Gao, Ge

2015-04-18

Long-established protein-coding genes may lose their coding potential during evolution ("unitary gene loss"). Members of the Poaceae family are a major food source and represent an ideal model clade for plant evolution research. However, the global pattern of unitary gene loss in Poaceae genomes, as well as the evolutionary fate of lost genes, has been little investigated and remains largely elusive. Using a locally developed pipeline, we identified 129 unitary gene loss events for long-established protein-coding genes from four representative species of Poaceae, i.e. brachypodium, rice, sorghum and maize. Functional annotation suggested that the genes lost in all or most Poaceae species are enriched for genes involved in development and response to endogenous stimulus. We also found that 44 mutated genomic loci of lost genes, which we refer to as relics, were still actively transcribed, of which 84% (37 of 44) showed significantly differential expression across different tissues. More interestingly, we found that a total of five expressed relics may function as competitive endogenous RNAs in the brachypodium, rice and sorghum genomes. Based on comparative genomics and transcriptome data, we compiled, for the first time, a comprehensive catalogue of unitary gene loss events in Poaceae species and characterized a statistically significant functional preference for these lost genes, as well as showed the potential of relics functioning as competitive endogenous RNAs in Poaceae genomes.
Identification of a second flagellin gene and functional characterization of a sigma70-like promoter upstream of a Leptospira borgpetersenii flaB gene.

PubMed

Lin, Min; Dan, Hanhong; Li, Yijing

2004-02-01

Leptospira borgpetersenii, one of the causative agents of leptospirosis in both animals and humans, is a bacterial pathogen with characteristic motility that is mediated by the rotation of two periplasmic flagella (PF). The flaB gene coding for a core polypeptide subunit of PF was previously characterized by sequence analysis of its open reading frame (ORF) (M. Lin, J Biochem Mol Biol Biophys 2:181-187, 1999). The present study was undertaken to isolate and clone the uncharacterized sequence upstream of the flaB gene by using a PCR-based genome walking procedure. This has resulted in a 1470-bp genomic DNA sequence in which an 846-bp ORF coding for a 281-amino acid polypeptide (31.3 kDa) is identified 455 bp upstream from the flaB start codon. The encoded protein exhibits 72% amino acid identity to the deduced FlaB protein sequence of L. borgpetersenii and a high degree of sequence homology to the FlaB proteins of other spirochaetes. This has demonstrated for the first time that a second flaB gene homolog is present in a Leptospira species. The newly identified gene is designated flaB1, and the previously cloned flaB renamed flaB2. Within the intergenic sequence between flaB1 and flaB2, a potential stem-loop structure (12-bp inverted repeats) was identified 25 bp downstream of the flaB1 stop codon; this could serve as a transcription terminator for the flaB1 mRNA. Three E. coli-like promoter regions (I, II, and III) for binding Esigma(70), a regulatory sequence uncommonly found in flagellar genes, were predicted upstream of the flaB2 ORF. Only promoter region II contains a promoter that is functional in E. coli, as revealed at phenotypic and transcriptional levels by its capability of directing the expression of the chloramphenicol acetyltransferase (CAT) gene in the promoter probe vector pKK232-8. These observations may suggest that flaB1 and flaB2 are transcribed separately and do not form a transcriptional operon controlled by a single promoter.
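The ORF figures quoted in the record above (an 846-bp ORF coding for a 281-amino acid polypeptide of 31.3 kDa) can be cross-checked with simple codon arithmetic. The following is a minimal sketch, assuming the stated ORF length includes the stop codon; the function name `orf_to_protein_length` is hypothetical and is not taken from the study.

```python
def orf_to_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp base pairs.

    Assumption (not stated in the abstract): the ORF length counts the
    terminal stop codon, which does not encode an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Figures quoted in the abstract.
flaB1_residues = orf_to_protein_length(846)

# Average residue mass implied by the reported 31.3 kDa, in daltons.
mean_residue_mass_da = 31_300 / flaB1_residues
```

Under that assumption, 846 bp corresponds to 282 codons, that is 281 residues plus a stop codon, which agrees with the reported polypeptide length. The implied average of roughly 111 Da per residue is in the range usually seen for proteins.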
Mutant phenotypes for thousands of bacterial genes of unknown function

DOE PAGES

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan; ...

2018-05-16

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Mutant phenotypes for thousands of bacterial genes of unknown function

DOE Office of Scientific and Technical Information (OSTI.GOV)

Price, Morgan N.; Wetmore, Kelly M.; Waters, R. Jordan

One-third of all protein-coding genes from bacterial genomes cannot be annotated with a function. Here, to investigate the functions of these genes, we present genome-wide mutant fitness data from 32 diverse bacteria across dozens of growth conditions. We identified mutant phenotypes for 11,779 protein-coding genes that had not been annotated with a specific function. Many genes could be associated with a specific condition because the gene affected fitness only in that condition, or with another gene in the same bacterium because they had similar mutant phenotypes. Of the poorly annotated genes, 2,316 had associations that have high confidence because they are conserved in other bacteria. By combining these conserved associations with comparative genomics, we identified putative DNA repair proteins; in addition, we propose specific functions for poorly annotated enzymes and transporters and for uncharacterized protein families. Lastly, our study demonstrates the scalability of microbial genetics and its utility for improving gene annotations.
Analysis of protein-coding genetic variation in 60,706 humans.

PubMed

Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G

2016-08-18

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization.

PubMed

Sugita, Chieko; Ogata, Koretsugu; Shikata, Masamitsu; Jikuya, Hiroyuki; Takano, Jun; Furumichi, Miho; Kanehisa, Minoru; Omata, Tatsuo; Sugiura, Masahiro; Sugita, Mamoru

2007-01-01

The entire genome of the unicellular cyanobacterium Synechococcus elongatus PCC 6301 (formerly Anacystis nidulans Berkeley strain 6301) was sequenced. The genome consisted of a circular chromosome 2,696,255 bp long. A total of 2,525 potential protein-coding genes, two sets of rRNA genes, 45 tRNA genes representing 42 tRNA species, and several genes for small stable RNAs were assigned to the chromosome by similarity searches and computer predictions. The translated products of 56% of the potential protein-coding genes showed sequence similarities to experimentally identified and predicted proteins of known function, and the products of 35% of the genes showed sequence similarities to the translated products of hypothetical genes. The remaining 9% of genes lacked significant similarities to genes for predicted proteins in the public DNA databases. Some 139 genes coding for photosynthesis-related components were identified. Thirty-seven genes for two-component signal transduction systems were also identified. This is the smallest number of such genes identified in cyanobacteria, except for marine cyanobacteria, suggesting that only simple signal transduction systems are found in this strain. The gene arrangement and nucleotide sequence of Synechococcus elongatus PCC 6301 were nearly identical to those of a closely related strain Synechococcus elongatus PCC 7942, except for the presence of a 188.6 kb inversion. The sequences as well as the gene information shown in this paper are available in the Web database, CYORF (http://www.cyano.genome.jp/).
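The percentages in the record above can be turned into approximate gene counts, and the chromosome length into an average spacing per protein-coding gene. This is a back-of-the-envelope sketch using only the numbers quoted in the abstract; the category labels and the helper name `approx_counts` are illustrative, and the rounded counts are estimates, not values reported by the authors.

```python
GENOME_BP = 2_696_255   # circular chromosome length from the abstract
PROTEIN_GENES = 2_525   # potential protein-coding genes from the abstract

# Percentages quoted in the abstract; labels are shorthand chosen here.
SIMILARITY_PERCENT = {
    "similar to proteins of known function": 56,
    "similar to hypothetical gene products": 35,
    "no significant similarity": 9,
}


def approx_counts(total: int, percents: dict) -> dict:
    """Convert whole-number percentages into rounded gene counts."""
    return {label: round(total * pct / 100) for label, pct in percents.items()}


counts = approx_counts(PROTEIN_GENES, SIMILARITY_PERCENT)

# Average chromosome length per protein-coding gene (includes intergenic DNA).
bp_per_gene = GENOME_BP / PROTEIN_GENES
```

The three rounded categories add back up to 2,525 genes, and the chromosome carries on average roughly one protein-coding gene per 1.07 kb.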
Conserved small mRNA with an unique, extended Shine-Dalgarno sequence

PubMed Central

Hahn, Julia; Migur, Anzhela; von Boeselager, Raphael Freiherr; Kubatova, Nina; Kubareva, Elena; Schwalbe, Harald

2017-01-01

ABSTRACT Up to now, very small protein-coding genes have remained unrecognized in sequenced genomes. We identified an mRNA of 165 nucleotides (nt), which is conserved in Bradyrhizobiaceae and encodes a polypeptide with 14 amino acid residues (aa). The small mRNA harboring a unique Shine-Dalgarno sequence (SD) with a length of 17 nt was localized predominantly in the ribosome-containing P100 fraction of Bradyrhizobium japonicum USDA 110. Strong interaction between the mRNA and 30S ribosomal subunits was demonstrated by their co-sedimentation in sucrose density gradient. Using translational fusions with egfp, we detected weak translation and found that it is impeded by both the extended SD and the GTG start codon (instead of ATG). Biophysical characterization (CD- and NMR-spectroscopy) showed that synthesized polypeptide remained unstructured in physiological buffer. Replacement of the start codon by a stop codon increased the stability of the transcript, strongly suggesting additional posttranscriptional regulation at the ribosome. Therefore, the small gene was named rreB (ribosome-regulated expression in Bradyrhizobiaceae). Assuming that the unique ribosome binding site (RBS) is a hallmark of rreB homologs or similarly regulated genes, we looked for similar putative RBS in bacterial genomes and detected regions with at least 16 nt complementarity to the 3′-end of 16S rRNA upstream of sORFs in Caulobacterales, Rhizobiales, Rhodobacterales and Rhodospirillales. In the Rhodobacter/Roseobacter lineage of α-proteobacteria the corresponding gene (rreR) is conserved and encodes an 18 aa protein. This shows how specific RBS features can be used to identify new genes with presumably similar control of expression at the RNA level. PMID:27834614
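The genome scan described in the record above looks for stretches upstream of small ORFs that are complementary to the 3′-end of 16S rRNA. The following is a minimal sketch of that idea, not the authors' pipeline: it scores only contiguous Watson-Crick pairs (G-U wobble pairs are ignored), and both the function names and the example sequences are hypothetical. The 9-nt rRNA tail used in the example is an assumption chosen for illustration, not a sequence given in the abstract.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def longest_complementary_run(upstream: str, rrna_3prime_tail: str) -> int:
    """Length of the longest contiguous stretch of `upstream` (5'->3') that
    can base-pair with `rrna_3prime_tail` (5'->3') using Watson-Crick pairs.

    Implemented as a longest-common-substring search between the upstream
    region and the reverse complement of the rRNA tail.
    """
    target = reverse_complement(rrna_3prime_tail)
    best = 0
    prev = [0] * (len(target) + 1)
    for base in upstream:
        cur = [0] * (len(target) + 1)
        for j, t in enumerate(target, start=1):
            if base == t:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hypothetical inputs for illustration only.
example_tail = "ACCUCCUUA"             # assumed 3'-terminal rRNA segment
example_upstream = "AAAGGAGGAAUUCAUG"  # made-up leader ending in a start codon
run_length = longest_complementary_run(example_upstream, example_tail)
```

A screen in the spirit of the abstract would apply such a score to the region upstream of each candidate sORF and keep hits at or above a chosen threshold, such as the 16 nt mentioned in the record; the actual rRNA sequence and window size would have to come from the genome under study.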
The pnk/pnl gene (ORF 86) of Autographa californica nucleopolyhedrovirus is a non-essential, immediate early gene.

PubMed

Durantel, D; Croizier, L; Ayres, M D; Croizier, G; Possee, R D; López-Ferber, M

1998-03-01

Autographa californica nucleopolyhedrovirus (AcMNPV) ORF 86, located within the HindIII C fragment, potentially encodes a protein which shares sequence similarity with two T4 bacteriophage gene products, RNA ligase and polynucleotide kinase. This AcMNPV gene has been designated pnk/pnl but has yet to be assigned a function in virus replication. It has been classified as an immediate early virus gene, since the promoter was active in uninfected insect cells and mRNA transcripts were detectable from 4 to 48 h post-infection and in the presence of cycloheximide or aphidicolin in virus-infected cells. The extremities of the transcript have been mapped by primer extension and 3' RACE-PCR to positions -18 from the translational start codon and +15 downstream of the stop codon. The function of pnk/pnl was investigated by producing a recombinant virus (Acdel86lacZ) with the coding region replaced with that of lacZ. This virus replicated normally in Spodoptera frugiperda (Sf 21) cells, indicating that pnk/pnl is not essential for propagation in these cells. Virus protein production in Acdel86lacZ-infected Sf 21 cells also appeared to be unaffected, with normal synthesis of the IE-1, GP64, VP39 and polyhedrin proteins. Shut-down of host protein synthesis was not abolished in recombinant infection. When other baculovirus genomes were examined for the presence of pnk/pnl by restriction enzyme digestion and PCR, a deletion was found in AcMNPV 1.2, Galleria mellonella NPV (GmMNPV) and Bombyx mori NPV (BmNPV), suggesting that in many isolates this gene has either never been acquired or has been lost during genome evolution. This is one of the first baculovirus immediate early genes that appears to be nonessential for virus survival.
Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability.

PubMed

Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume

2017-07-19

Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones

PubMed Central

Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio

2004-01-01

The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
Multiplexed pyrosequencing of nine sea anemone (Cnidaria: Anthozoa: Hexacorallia: Actiniaria) mitochondrial genomes.

PubMed

Foox, Jonathan; Brugler, Mercer; Siddall, Mark Edward; Rodríguez, Estefanía

2016-07-01

Six complete and three partial actiniarian mitochondrial genomes were amplified in two semi-circles using long-range PCR and pyrosequenced in a single run on a 454 GS Junior, doubling the number of complete mitogenomes available within the order. Typical metazoan mtDNA features included circularity, 13 protein-coding genes, 2 ribosomal RNA genes, and length ranging from 17,498 to 19,727 bp. Several typical anthozoan mitochondrial genome features were also observed including the presence of only two transfer RNA genes, elevated A + T richness ranging from 54.9 to 62.4%, large intergenic regions, and group 1 introns interrupting NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit I, the latter of which possesses a homing endonuclease gene. Within the sea anemone Alicia sansibarensis, we report the first mitochondrial gene order rearrangement within the Actiniaria, as well as putative novel non-canonical protein-coding genes. Phylogenetic analyses of all 13 protein-coding and 2 ribosomal genes largely corroborated current hypotheses of sea anemone interrelatedness, with a few lower-level differences.
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.

PubMed

Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P

2015-04-23

With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Genes uniquely expressed in human growth plate chondrocytes uncover a distinct regulatory network.

PubMed

Li, Bing; Balasubramanian, Karthika; Krakow, Deborah; Cohn, Daniel H

2017-12-20

Chondrogenesis is the earliest stage of skeletal development and is a highly dynamic process, integrating the activities and functions of transcription factors, cell signaling molecules and extracellular matrix proteins. The molecular mechanisms underlying chondrogenesis have been extensively studied and multiple key regulators of this process have been identified. However, a genome-wide overview of the gene regulatory network in chondrogenesis has not been achieved. In this study, employing RNA sequencing, we identified 332 protein coding genes and 34 long non-coding RNA (lncRNA) genes that are highly selectively expressed in human fetal growth plate chondrocytes. Among the protein coding genes, 32 genes were associated with 62 distinct human skeletal disorders and 153 genes were associated with skeletal defects in knockout mice, confirming their essential roles in skeletal formation. These gene products formed a comprehensive physical interaction network and participated in multiple cellular processes regulating skeletal development. The data also revealed 34 transcription factors and 11,334 distal enhancers that were uniquely active in chondrocytes, functioning as transcriptional regulators for the cartilage-selective genes. Our findings revealed a complex gene regulatory network controlling skeletal development whereby transcription factors, enhancers and lncRNAs participate in chondrogenesis by transcriptional regulation of key genes. Additionally, the cartilage-selective genes represent candidate genes for unsolved human skeletal disorders.
Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana

PubMed Central

Itoh, Takeshi; Tanaka, Tsuyoshi; Barrero, Roberto A.; Yamasaki, Chisato; Fujii, Yasuyuki; Hilton, Phillip B.; Antonio, Baltazar A.; Aono, Hideo; Apweiler, Rolf; Bruskiewich, Richard; Bureau, Thomas; Burr, Frances; Costa de Oliveira, Antonio; Fuks, Galina; Habara, Takuya; Haberer, Georg; Han, Bin; Harada, Erimi; Hiraki, Aiko T.; Hirochika, Hirohiko; Hoen, Douglas; Hokari, Hiroki; Hosokawa, Satomi; Hsing, Yue; Ikawa, Hiroshi; Ikeo, Kazuho; Imanishi, Tadashi; Ito, Yukiyo; Jaiswal, Pankaj; Kanno, Masako; Kawahara, Yoshihiro; Kawamura, Toshiyuki; Kawashima, Hiroaki; Khurana, Jitendra P.; Kikuchi, Shoshi; Komatsu, Setsuko; Koyanagi, Kanako O.; Kubooka, Hiromi; Lieberherr, Damien; Lin, Yao-Cheng; Lonsdale, David; Matsumoto, Takashi; Matsuya, Akihiro; McCombie, W. Richard; Messing, Joachim; Miyao, Akio; Mulder, Nicola; Nagamura, Yoshiaki; Nam, Jongmin; Namiki, Nobukazu; Numa, Hisataka; Nurimoto, Shin; O’Donovan, Claire; Ohyanagi, Hajime; Okido, Toshihisa; OOta, Satoshi; Osato, Naoki; Palmer, Lance E.; Quetier, Francis; Raghuvanshi, Saurabh; Saichi, Naomi; Sakai, Hiroaki; Sakai, Yasumichi; Sakata, Katsumi; Sakurai, Tetsuya; Sato, Fumihiko; Sato, Yoshiharu; Schoof, Heiko; Seki, Motoaki; Shibata, Michie; Shimizu, Yuji; Shinozaki, Kazuo; Shinso, Yuji; Singh, Nagendra K.; Smith-White, Brian; Takeda, Jun-ichi; Tanino, Motohiko; Tatusova, Tatiana; Thongjuea, Supat; Todokoro, Fusano; Tsugane, Mika; Tyagi, Akhilesh K.; Vanavichit, Apichart; Wang, Aihui; Wing, Rod A.; Yamaguchi, Kaori; Yamamoto, Mayu; Yamamoto, Naoyuki; Yu, Yeisoo; Zhang, Hao; Zhao, Qiang; Higo, Kenichi; Burr, Benjamin; Gojobori, Takashi; Sasaki, Takuji

2007-01-01

We present here the annotation of the complete genome of rice Oryza sativa L. ssp. japonica cultivar Nipponbare. All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred in 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate based on these loci and an extrapolation suggested that the gene number of rice is ∼32,000, which is smaller than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possessed several lineage-specific genes, which might account for the observed differences between these species, while they had similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, the evolutionary process of protein-coding genes was examined. Our results suggest that natural selection may have played a role for duplicated genes in both species, so that duplication was suppressed or favored in a manner that depended on the function of a gene. PMID:17210932
RCDB: Renal Cancer Gene Database.

PubMed

Ramana, Jayashree

2012-05-18

Renal cell carcinoma or RCC is one of the common and most lethal urological cancers, with 40% of the patients succumbing to death because of metastatic progression of the disease. Treatment of metastatic RCC remains highly challenging because of its resistance to chemotherapy as well as radiotherapy, besides surgical resection. Whereas RCC comprises tumors with differing histological types, clear cell RCC remains the most common. A major problem in the clinical management of patients presenting with localized ccRCC is the inability to determine tumor aggressiveness and accurately predict the risk of metastasis following surgery. As a measure to improve the diagnosis and prognosis of RCC, researchers have identified several molecular markers through a number of techniques. However, the wealth of information available is scattered in the literature and not easily amenable to data-mining. To reduce this gap, this work describes a comprehensive repository called Renal Cancer Gene Database, as an integrated gateway to study renal cancer related data. Renal Cancer Gene Database is a manually curated compendium of 240 protein-coding and 269 miRNA genes contributing to the etiology and pathogenesis of various forms of renal cell carcinomas. The protein-coding genes have been classified according to the kind of gene alteration observed in RCC. RCDB also includes the miRNAs dysregulated in RCC, along with the corresponding information regarding the type of RCC and/or metastatic or prognostic significance. While some of the miRNA genes showed an association with other types of cancers, a few were unique to RCC. Users can query the database using keywords, category and chromosomal location of the genes. The knowledgebase can be freely accessed via a user-friendly web interface at http://www.juit.ac.in/attachments/jsr/rcdb/homenew.html. It is hoped that this database would serve as a useful complement to the existing public resources and as a good starting point for researchers and physicians interested in RCC genetics.
The complete mitochondrial genome of Plodia interpunctella (Lepidoptera: Pyralidae) and comparison with other Pyraloidea insects.

PubMed

Liu, Qiu-Ning; Chai, Xin-Yue; Bian, Dan-Dan; Zhou, Chun-Lin; Tang, Bo-Ping

2016-01-01

The mitochondrial (mt) genome can provide important information for the understanding of phylogenetic relationships. The complete mt genome of Plodia interpunctella (Lepidoptera: Pyralidae) has been sequenced. The circular genome is 15 287 bp in size, encoding 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The AT skew of this mt genome is slightly negative, and the nucleotide composition is biased toward A+T nucleotides (80.15%). All PCGs start with the typical ATN (ATA, ATC, ATG, and ATT) codons, except for the cox1 gene which may start with the CGA codon. Four of the 13 PCGs harbor the incomplete termination codon T or TA. All the tRNA genes are folded into the typical clover-leaf structure of mitochondrial tRNA, except for trnS1 (AGN) in which the DHU arm fails to form a stable stem-loop structure. The overlapping sequences are 35 bp in total and are found in seven different locations. A total of 240 bp of intergenic spacers are scattered in 16 regions. The control region of the mt genome is 327 bp in length and consists of several features common to the sequenced lepidopteran insects. Phylogenetic analysis based on 13 PCGs using the Maximum Likelihood method shows that P. interpunctella is placed within the Pyralidae.
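The A+T content and AT skew quoted in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used definition AT skew = (A - T) / (A + T), which the abstract itself does not spell out; the function name and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def at_content_and_skew(seq: str) -> tuple[float, float]:
    """Return (A+T fraction, AT skew) for a DNA sequence.

    Assumed definitions (not stated in the abstract):
      A+T fraction = (A + T) / (A + C + G + T)
      AT skew      = (A - T) / (A + T)
    Ambiguous bases such as N are ignored.
    """
    counts = Counter(seq.upper())
    a, t = counts["A"], counts["T"]
    total = sum(counts[base] for base in "ACGT")
    if total == 0 or a + t == 0:
        raise ValueError("sequence contains no A/T bases to measure")
    return (a + t) / total, (a - t) / (a + t)


# Toy sequence, purely illustrative (not real mitogenome data).
content, skew = at_content_and_skew("AATTTGC")
print(f"A+T content: {content:.2%}, AT skew: {skew:+.3f}")
```

A negative skew, as reported for this mitogenome, means that T is more frequent than A on the strand analyzed.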
First Mitochondrial Genome from Nemouridae (Plecoptera) Reveals Novel Features of the Elongated Control Region and Phylogenetic Implications

PubMed Central

Chen, Zhi-Teng; Du, Yu-Zhou

2017-01-01

The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer (AGN), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae. PMID:28475163
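The ATN start codon and TAN termination codon conventions mentioned in the abstract above can be checked mechanically. The sketch below is an illustration only: the function names and example sequences are hypothetical, and it does not handle the incomplete T or TA stop codons that some mitochondrial genes carry.

```python
def has_atn_start(cds: str) -> bool:
    """True if the coding sequence begins with ATA, ATC, ATG or ATT."""
    codon = cds[:3].upper()
    return len(codon) == 3 and codon.startswith("AT") and codon[2] in "ACGT"


def ends_with_tan(cds: str) -> bool:
    """True if the last three bases form a TAN codon (e.g. TAA or TAG).

    Incomplete stop codons (a bare T or TA) are deliberately not
    covered by this simple check.
    """
    codon = cds[-3:].upper()
    return len(codon) == 3 and codon.startswith("TA") and codon[2] in "ACGT"


# Hypothetical toy sequences, not taken from any annotated mitogenome.
for name, cds in {"geneA": "ATTGGCTAA", "geneB": "CGAAAATAG"}.items():
    print(name, has_atn_start(cds), ends_with_tan(cds))
```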
First Mitochondrial Genome from Nemouridae (Plecoptera) Reveals Novel Features of the Elongated Control Region and Phylogenetic Implications.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2017-05-05

The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer (AGN), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae.
The transcriptional activator ZNF143 is essential for normal development in zebrafish

PubMed Central

2012-01-01

Background: ZNF143 is a sequence-specific DNA-binding protein that stimulates transcription of both small RNA genes by RNA polymerase II or III and protein-coding genes by RNA polymerase II, using separable activating domains. We describe phenotypic effects following knockdown of this protein in developing Danio rerio (zebrafish) embryos by injection of morpholino antisense oligonucleotides that target znf143 mRNA. Results: The loss-of-function phenotype is pleiotropic and includes a broad array of abnormalities including defects in heart, blood, ear and midbrain-hindbrain boundary. Defects are rescued by coinjection of synthetic mRNA encoding full-length ZNF143 protein, but not by protein lacking the amino-terminal activation domains. Accordingly, expression of several marker genes is affected following knockdown, including GATA-binding protein 1 (gata1), cardiac myosin light chain 2 (cmlc2) and paired box gene 2a (pax2a). The zebrafish pax2a gene proximal promoter contains two binding sites for ZNF143, and reporter gene transcription driven by this promoter in transfected cells is activated by this protein. Conclusions: Normal development of zebrafish embryos requires ZNF143. Furthermore, the pax2a gene is probably one example of many protein-coding gene targets of ZNF143 during zebrafish development. PMID:22268977
The transcriptional activator ZNF143 is essential for normal development in zebrafish.

PubMed

Halbig, Kari M; Lekven, Arne C; Kunkel, Gary R

2012-01-23

ZNF143 is a sequence-specific DNA-binding protein that stimulates transcription of both small RNA genes by RNA polymerase II or III and protein-coding genes by RNA polymerase II, using separable activating domains. We describe phenotypic effects following knockdown of this protein in developing Danio rerio (zebrafish) embryos by injection of morpholino antisense oligonucleotides that target znf143 mRNA. The loss-of-function phenotype is pleiotropic and includes a broad array of abnormalities including defects in heart, blood, ear and midbrain-hindbrain boundary. Defects are rescued by coinjection of synthetic mRNA encoding full-length ZNF143 protein, but not by protein lacking the amino-terminal activation domains. Accordingly, expression of several marker genes is affected following knockdown, including GATA-binding protein 1 (gata1), cardiac myosin light chain 2 (cmlc2) and paired box gene 2a (pax2a). The zebrafish pax2a gene proximal promoter contains two binding sites for ZNF143, and reporter gene transcription driven by this promoter in transfected cells is activated by this protein. Normal development of zebrafish embryos requires ZNF143. Furthermore, the pax2a gene is probably one example of many protein-coding gene targets of ZNF143 during zebrafish development.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schriner, J.E.; Yi, W.; Hofmann, S.L.

Palmitoyl-protein thioesterase (PPT) is a small glycoprotein that removes palmitate groups from cysteine residues in lipid-modified proteins. We recently reported mutations in PPT in patients with infantile neuronal ceroid lipofuscinosis (INCL), a severe neurodegenerative disorder. INCL is characterized by the accumulation of proteolipid storage material in brain and other tissues, suggesting that the disease is a consequence of abnormal catabolism of acylated proteins. In the current paper, we report the sequence of the human PPT cDNA and the structure of the human PPT gene. The cDNA predicts a protein of 306 amino acids that contains a 25-amino-acid signal peptide, three N-linked glycosylation sites, and consensus motifs characteristic of thioesterases. Northern analysis of a human tissue blot revealed ubiquitous expression of a single 2.5-kb mRNA, with highest expression in lung, brain, and heart. The human PPT gene spans 25 kb and is composed of seven coding exons and a large eighth exon, containing the entire 3'-untranslated region of 1388 bp. An Alu repeat and promoter elements corresponding to putative binding sites for several general transcription factors were identified in the 1060 nucleotides upstream of the transcription start site. The human PPT cDNA sequence and gene structure will provide the means for the identification of further causative mutations in INCL and facilitate genetic screening in selected high-risk populations. 31 refs., 5 figs., 1 tab.
Rate heterogeneity in six protein-coding genes from the holoparasite Balanophora (Balanophoraceae) and other taxa of Santalales

PubMed Central

Su, Huei-Jiun; Hu, Jer-Ming

2012-01-01

Background and Aims: The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods: Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results: Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 is less dramatic. Significant dS increases were detected in Balanophora PI, TM6, RPB2 and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI is restrictively expressed in tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions: Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Transcriptomes of six mutants in the Sen1 pathway reveal combinatorial control of transcription termination across the Saccharomyces cerevisiae genome

PubMed Central

Carver, Melissa N.; Müller, Ulrika; Bekiranov, Stefan; Auble, David T.

2017-01-01

Transcriptome studies on eukaryotic cells have revealed an unexpected abundance and diversity of noncoding RNAs synthesized by RNA polymerase II (Pol II), some of which influence the expression of protein-coding genes. Yet, much less is known about biogenesis of Pol II non-coding RNA than mRNAs. In the budding yeast Saccharomyces cerevisiae, initiation of non-coding transcripts by Pol II appears to be similar to that of mRNAs, but a distinct pathway is utilized for termination of most non-coding RNAs: the Sen1-dependent or “NNS” pathway. Here, we examine the effect on the S. cerevisiae transcriptome of conditional mutations in the genes encoding six different essential proteins that influence Sen1-dependent termination: Sen1, Nrd1, Nab3, Ssu72, Rpb11, and Hrp1. We observe surprisingly diverse effects on transcript abundance for the different proteins that cannot be explained simply by differing severity of the mutations. Rather, we infer from our results that termination of Pol II transcription of non-coding RNA genes is subject to complex combinatorial control that likely involves proteins beyond those studied here. Furthermore, we identify new targets and functions of Sen1-dependent termination, including a role in repression of meiotic genes in vegetative cells. In combination with other recent whole-genome studies on termination of non-coding RNAs, our results provide promising directions for further investigation. PMID:28665995
Characterization of the Lymantria dispar nucleopolyhedrovirus 25K FP gene

Treesearch

David S. Bischoff; James M. Slavicek

1996-01-01

The Lymantria dispar nucleopolyhedrovirus (LdMNPV) gene encoding the 25K FP protein has been cloned and sequenced. The 25KFP gene codes for a 217 amino acid protein with a predicted molecular mass of 24870 Da. Expression of the 25K FP protein in a rabbit reticulocyte system generated a 27 kDa protein, in close agreement with the...
Role of genomic architecture in the expression dynamics of long noncoding RNAs during differentiation of human neuroblastoma cells.

PubMed

Batagov, Arsen O; Yarmishyn, Aliaksandr A; Jenjaroenpun, Piroon; Tan, Jovina Z; Nishida, Yuichiro; Kurochkin, Igor V

2013-10-16

Mammalian genomes are extensively transcribed, producing thousands of long non-protein-coding RNAs (lncRNAs). The biological significance and function of the vast majority of lncRNAs remain unclear. Recent studies have implicated several lncRNAs as playing important roles in embryonic development and cancer progression. LncRNAs are characterized by different genomic architectures in relation to their associated protein-coding genes. Our study aimed at linking lncRNA architecture with the dynamic patterns of their expression using a model of differentiating human neuroblastoma cells. LncRNA expression was studied in a 120-hour time course of differentiation of human neuroblastoma SH-SY5Y cells into neurons upon treatment with retinoic acid (RA), a compound used for the treatment of neuroblastoma. A custom microarray chip was utilized to interrogate expression levels of 9,267 lncRNAs in the course of differentiation. We categorized lncRNAs into 19 architecture classes according to their position relative to protein-coding genes. For each architecture class, the expression dynamics of lncRNAs were studied in association with their protein-coding partners. This allowed us to demonstrate positive correlation of lncRNAs with their associated protein-coding genes at bidirectional promoters and for sense-antisense transcript pairs. In contrast, lncRNAs located in the introns and downstream of the protein-coding genes were characterized by negative correlation modes. We further classified the lncRNAs by the temporal patterns of their expression dynamics. We found that intronic and bidirectional promoter architectures are associated with rapid RA-dependent induction or repression of the corresponding lncRNAs, followed by their constant expression. At the same time, lncRNAs expressed downstream of protein-coding genes are characterized by rapid induction, followed by transcriptional repression. Quantitative RT-PCR analysis confirmed the discovered functional modes for several selected lncRNAs associated with proteins involved in cancer and embryonic development. This is the first report detailing dynamic changes of multiple lncRNAs during RA-induced neuroblastoma differentiation. Integration of genomic and transcriptomic levels of information allowed us to demonstrate specific behavior of lncRNAs organized in different genomic architectures. This study also provides a list of lncRNAs with possible roles in neuroblastoma.
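The positive and negative correlation modes described in the abstract above can be illustrated with a plain Pearson correlation between a lncRNA time course and that of its protein-coding partner. This is a sketch under stated assumptions: the abstract does not say which correlation statistic the authors used, and the function name and expression values below are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical expression values over a time course (arbitrary units).
lncrna = [1.0, 2.1, 3.9, 4.2, 4.1]
partner_same_direction = [0.8, 1.9, 3.5, 4.0, 4.3]
partner_opposite_direction = [4.0, 3.1, 1.8, 1.2, 1.0]

print(pearson(lncrna, partner_same_direction))      # positive mode
print(pearson(lncrna, partner_opposite_direction))  # negative mode
```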
Identification of Methanococcus Jannaschii Proteins in 2-D Gel Electrophoresis Patterns by Mass Spectrometry

DOE R&D Accomplishments Database

Liang, X.

1998-06-10

The genome of Methanococcus jannaschii has been sequenced completely and has been found to contain approximately 1,770 predicted protein-coding regions. When these coding regions are expressed and how their expression is regulated, however, remain open questions. In this work, mass spectrometry was combined with two-dimensional gel electrophoresis to identify which proteins the genes produce under different growth conditions, and thus investigate the regulation of genes responsible for functions characteristic of this thermophilic representative of the methanogenic Archaea.
Mitochondrial and cytoplasmic isoleucyl-, glutamyl- and arginyl-tRNA synthetases of yeast are encoded by separate genes.

PubMed

Tzagoloff, A; Shtanko, A

1995-06-01

Three complementation groups of a pet mutant collection have been found to be composed of respiratory-deficient mutants with lesions in mitochondrial protein synthesis. Recombinant plasmids capable of restoring respiration were cloned by transformation of representatives of each complementation group with a yeast genomic library. The plasmids were used to characterize the complementing genes and to institute disruption of the chromosomal copies of each gene in respiratory-proficient yeast. The sequences of the cloned genes indicate that they code for isoleucyl-, arginyl- and glutamyl-tRNA synthetases. The properties of the mutants used to obtain the genes and of strains with the disrupted genes indicate that all three aminoacyl-tRNA synthetases function exclusively in mitochondrial protein synthesis. The ISM1 gene for mitochondrial isoleucyl-tRNA synthetase has been localized to chromosome XVI next to UME5. The MSR1 gene for the arginyl-tRNA synthetase was previously located on yeast chromosome VIII. The third gene, MSE1, for the mitochondrial glutamyl-tRNA synthetase has not been localized. The identification of three new genes coding for mitochondrial-specific aminoacyl-tRNA synthetases indicates that in Saccharomyces cerevisiae at least 11 members of this protein family are encoded by genes distinct from those coding for the homologous cytoplasmic enzymes.
Structural and functional studies of a family of Dictyostelium discoideum developmentally regulated, prestalk genes coding for small proteins.

PubMed

Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro

2008-01-03

The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13-nucleotide-long open reading frame and a second exon comprising the remainder of the putative coding region. The expression of these genes is induced at 10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multicellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

PubMed

Shen, Jinhui; Cong, Qian; Grishin, Nick V

2015-09-01

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.
Gene and genon concept: coding versus regulation

PubMed Central

2007-01-01

We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
A Transcriptome Map of Actinobacillus pleuropneumoniae at Single-Nucleotide Resolution Using Deep RNA-Seq

PubMed Central

Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun

2016-01-01

Actinobacillus pleuropneumoniae is the pathogen of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
A novel TBP-TAF complex on RNA polymerase II-transcribed snRNA genes.

PubMed

Zaborowska, Justyna; Taylor, Alice; Roeder, Robert G; Murphy, Shona

2012-01-01

Initiation of transcription of most human genes transcribed by RNA polymerase II (RNAP II) requires the formation of a preinitiation complex comprising TFIIA, B, D, E, F, H and RNAP II. The general transcription factor TFIID is composed of the TATA-binding protein and up to 13 TBP-associated factors. During transcription of snRNA genes, RNAP II does not appear to make the transition to long-range productive elongation, as happens during transcription of protein-coding genes. In addition, recognition of the snRNA gene-type specific 3' box RNA processing element requires initiation from an snRNA gene promoter. These characteristics may, at least in part, be driven by factors recruited to the promoter. For example, differences in the complement of TAFs might result in differential recruitment of elongation and RNA processing factors. As precedent, it already has been shown that the promoters of some protein-coding genes do not recruit all the TAFs found in TFIID. Although TAF5 has been shown to be associated with RNAP II-transcribed snRNA genes, the full complement of TAFs associated with these genes has remained unclear. Here we show, using a ChIP and siRNA-mediated approach, that the TBP/TAF complex on snRNA genes differs from that found on protein-coding genes. Interestingly, the largest TAF, TAF1, and the core TAFs, TAF10 and TAF4, are not detected on snRNA genes. We propose that this snRNA gene-specific TAF subset plays a key role in gene type-specific control of expression.
Problem-Based Test: An "In Vitro" Experiment to Analyze the Genetic Code

ERIC Educational Resources Information Center

Szeberenyi, Jozsef

2010-01-01

Terms to be familiar with before you start to solve the test: genetic code, translation, synthetic polynucleotide, leucine, serine, filter precipitation, radioactivity measurement, template, mRNA, tRNA, rRNA, aminoacyl-tRNA synthesis, ribosomes, degeneration of the code, wobble, initiation, and elongation of protein synthesis, initiation codon.…
Retrieval of Enterobacteriaceae drug targets using singular value decomposition.

PubMed

Silvério-Machado, Rita; Couto, Bráulio R G M; Dos Santos, Marcos A

2015-04-15

The identification of potential drug target proteins in bacteria is important in pharmaceutical research for the development of new antibiotics to combat bacterial agents that cause diseases. A new model that combines the singular value decomposition (SVD) technique with biological filters composed of a set of protein properties associated with bacterial drug targets and similarity to protein-coding essential genes of Escherichia coli (strain K12) has been created to predict potential antibiotic drug targets in the Enterobacteriaceae family. This model identified 99 potential drug target proteins in the studied family, which exhibit eight different functions and are protein-coding essential genes or similar to protein-coding essential genes of E.coli (strain K12), indicating that the disruption of the activities of these proteins is critical for cells. Proteins from bacteria with described drug resistance were found among the retrieved candidates. These candidates have no similarity to the human proteome, therefore exhibiting the advantage of causing no adverse effects or at least no known adverse effects on humans. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

PubMed

Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

2015-12-11

High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
MHC class I-associated peptides derive from selective regions of the human genome.

PubMed

Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude

2016-12-01

MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology.
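The abstract above mentions a logistic regression model that predicts whether a gene generates MAPs from transcript and protein features. As a rough illustration of that kind of model only, the following minimal sketch fits a logistic regression by gradient descent in plain Python. The two features (a scaled expression level and a scaled protein length), the toy values, and the function names are all hypothetical and are not taken from the study.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Predicted probability that a gene with these features generates MAPs."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: [scaled expression, scaled protein length] per gene.
X = [[0.2, 0.1], [0.4, 0.3], [0.3, 0.2], [1.5, 1.2], [1.8, 1.0], [2.0, 1.5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = gene generates at least one MAP (made-up labels)

weights, bias = fit_logistic(X, y)
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x in X]
```

On real data one would hold out genes for evaluation and scale features consistently; the sketch only shows the mechanics of fitting and thresholding at 0.5.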
MHC class I–associated peptides derive from selective regions of the human genome

PubMed Central

Pearson, Hillary; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Thibault, Pierre

2016-01-01

MHC class I–associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
Characterization of the complete mitochondrial genome of the hybrid Epinephelus moara♀ × Epinephelus lanceolatus♂, and phylogenetic analysis in subfamily epinephelinae

NASA Astrophysics Data System (ADS)

Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin

2017-06-01

This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.

PubMed

Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V

2016-01-01

Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha, an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
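The consensus DTD quoted above is written in IUPAC nucleotide ambiguity codes, in which D stands for "A, G or T". The sketch below shows one way such a consensus can be derived column by column from a set of start codons. The codon list used in the example is an illustrative assumption (start codons commonly seen in invertebrate mitochondrial genomes), not the set reported by the authors, and the function name is hypothetical.

```python
# IUPAC ambiguity codes, keyed by the set of bases observed at one position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def iupac_consensus(codons):
    """Collapse equal-length DNA strings into one IUPAC consensus string."""
    codons = [c.upper() for c in codons]
    if not codons or len({len(c) for c in codons}) != 1:
        raise ValueError("need a non-empty list of equal-length sequences")
    # zip(*codons) yields one tuple of bases per alignment column.
    return "".join(IUPAC[frozenset(column)] for column in zip(*codons))

# Assumed example set of start codons (not the authors' data).
example_starts = ["ATG", "ATA", "ATT", "GTG", "TTG"]
consensus = iupac_consensus(example_starts)  # "DTD"
```

Here the first position sees A, G and T (D), the second only T, and the third G, A and T (D), which gives DTD for this assumed set.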

Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals

PubMed Central

Popova, Olga V.; Mikhailov, Kirill V.; Nikitin, Mikhail A.; Logacheva, Maria D.; Penin, Aleksey A.; Muntyan, Maria S.; Kedrova, Olga S.; Petrov, Nikolai B.; Panchin, Yuri V.

2016-01-01

Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha—an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia. 
PMID:27755612
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.

PubMed

O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F

2011-07-01

Objective: To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. Animals: 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. Procedures: RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. Results: None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Conclusions and clinical relevance: Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
Ribosome profiling reveals changes in translational status of soybean transcripts during immature cotyledon development

PubMed Central

Shamimuzzaman, Md.

2018-01-01

To understand translational capacity on a genome-wide scale across three developmental stages of immature soybean seed cotyledons, ribosome profiling was performed in combination with RNA sequencing and cluster analysis. Transcripts representing 216 unique genes demonstrated a higher level of translational activity in at least one stage by exhibiting higher translational efficiencies (TEs) in which there were relatively more ribosome footprint sequence reads mapping to the transcript than were present in the control total RNA sample. The majority of these transcripts were more translationally active at the early stage of seed development and included 12 unique serine or cysteine proteases and 16 2S albumin and low molecular weight cysteine-rich proteins that may serve as substrates for turnover and mobilization early in seed development. It would appear that the serine proteases and 2S albumins play a vital role in the early stages. In contrast, our investigation of profiles of 19 genes encoding high abundance seed storage proteins, such as glycinins, beta-conglycinins, lectin, and Kunitz trypsin inhibitors, showed that they all had similar patterns in which the TE values started at low levels and increased approximately 2 to 6-fold during development. The highest levels of these seed protein transcripts were found at the mid-developmental stage, whereas the highest ribosome footprint levels of only up to 1.6 TE were found at the late developmental stage. These experimental findings suggest that the major seed storage protein coding genes are primarily regulated at the transcriptional level during normal soybean cotyledon development. Finally, our analyses also identified a total of 370 unique gene models that showed very low TE values including over 48 genes encoding ribosomal family proteins and 95 gene models that are related to energy and photosynthetic functions, many of which have homology to the chloroplast genome. 
Additionally, we showed that genes of the chloroplast were relatively translationally inactive during seed development. PMID:29570733
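Translational efficiency (TE) as described in this abstract compares ribosome footprint reads with total RNA reads for the same transcript. A minimal sketch of that ratio follows, assuming each library is first normalized to reads per million so that sequencing depth cancels out. The gene names, the read counts and the exact normalization are hypothetical; the abstract does not specify the authors' pipeline.

```python
def reads_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1e6 / total for gene, count in counts.items()}

def translational_efficiency(ribo_counts, rna_counts):
    """TE per gene = normalized footprint reads / normalized RNA-seq reads.

    Genes absent from either library, or with zero RNA reads, are skipped
    because the ratio is undefined for them.
    """
    ribo = reads_per_million(ribo_counts)
    rna = reads_per_million(rna_counts)
    return {
        gene: ribo[gene] / rna[gene]
        for gene in ribo
        if gene in rna and rna[gene] > 0
    }

# Hypothetical counts for three genes in one developmental stage.
ribo_counts = {"geneA": 300, "geneB": 100, "geneC": 600}
rna_counts = {"geneA": 100, "geneB": 400, "geneC": 500}

te = translational_efficiency(ribo_counts, rna_counts)
# geneA has TE 3.0 (relatively more footprints than mRNA), geneB 0.25.
```

Because both counts refer to the same transcript, transcript length cancels in the ratio, so the sketch does not normalize by length.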
Co-LncRNA: investigating the lncRNA combinatorial effects in GO annotations and KEGG pathways based on human RNA-Seq data.

PubMed

Zhao, Zheng; Bai, Jing; Wu, Aiwei; Wang, Yuan; Zhang, Jinwen; Wang, Zishan; Li, Yongsheng; Xu, Juan; Li, Xia

2015-01-01

Long non-coding RNAs (lncRNAs) are emerging as key regulators of diverse biological processes and diseases. However, the combinatorial effects of these molecules in a specific biological function are poorly understood. Identifying co-expressed protein-coding genes of lncRNAs would provide ample insight into lncRNA functions. To facilitate such an effort, we have developed Co-LncRNA, which is a web-based computational tool that allows users to identify GO annotations and KEGG pathways that may be affected by co-expressed protein-coding genes of a single or multiple lncRNAs. LncRNA co-expressed protein-coding genes were first identified in publicly available human RNA-Seq datasets, including 241 datasets across 6560 total individuals representing 28 tissue types/cell lines. Then, the lncRNA combinatorial effects in given GO annotations or KEGG pathways are taken into account by the simultaneous analysis of multiple lncRNAs in user-selected individual or multiple datasets, which is realized by enrichment analysis. In addition, this software provides a graphical overview of pathways that are modulated by lncRNAs, as well as a specific tool to display the relevant networks between lncRNAs and their co-expressed protein-coding genes. Co-LncRNA also supports users in uploading their own lncRNA and protein-coding gene expression profiles to investigate the lncRNA combinatorial effects. It will be continuously updated with more human RNA-Seq datasets on an annual basis. Taken together, Co-LncRNA provides a web-based application for investigating lncRNA combinatorial effects, which could shed light on their biological roles and could be a valuable resource for this community. Database URL: http://www.bio-bigdata.com/Co-LncRNA/. © The Author(s) 2015. Published by Oxford University Press.
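The abstract states that the combinatorial analysis is realized by enrichment analysis but does not say which statistical test is used. A common choice for GO or KEGG enrichment is the one-sided hypergeometric test, sketched below under that assumption. The function name and the example numbers are hypothetical, and this is not presented as Co-LncRNA's actual implementation.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of protein-coding genes in the background
    annotated:  background genes carrying the GO term / KEGG pathway
    selected:   genes co-expressed with the lncRNA(s) of interest
    overlap:    co-expressed genes that also carry the annotation
    """
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, selected)

# Hypothetical example: 20 background genes, 5 in the pathway,
# 4 co-expressed genes, 3 of which fall in the pathway.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)  # 155 / 4845, about 0.032
```

In practice such p-values are computed for many terms at once and then corrected for multiple testing, which the sketch leaves out.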
Decoding the disease-associated proteins encoded in the human chromosome 4.

PubMed

Chen, Lien-Chin; Liu, Mei-Ying; Hsiao, Yung-Chin; Choong, Wai-Kok; Wu, Hsin-Yi; Hsu, Wen-Lian; Liao, Pao-Chi; Sung, Ting-Yi; Tsai, Shih-Feng; Yu, Jau-Song; Chen, Yu-Ju

2013-01-04

Chromosome 4 is the fourth largest chromosome, containing approximately 191 megabases (~6.4% of the human genome) with 757 protein-coding genes. A number of marker genes for many diseases have been found in this chromosome, including genetic diseases (e.g., hepatocellular carcinoma) and biomedical research (cardiac system, aging, metabolic disorders, immune system, cancer and stem cell) related genes (e.g., oncogenes, growth factors). As a pilot study for the chromosome 4-centric human proteome project (Chr 4-HPP), we present here a systematic analysis of the disease association, protein isoforms, coding single nucleotide polymorphisms of these 757 protein-coding genes and their experimental evidence at the protein level. We also describe how the findings from the chromosome 4 project might be used to drive the biomarker discovery and validation study in disease-oriented projects, using the examples of secretomic and membrane proteomic approaches in cancer research. By integrating with cancer cell secretomes and several other existing databases in the public domain, we identified 141 chromosome 4-encoded proteins as cancer cell-secretable/sheddable proteins. Additionally, we identified 54 chromosome 4-encoded proteins that have been classified as cancer-associated proteins with successful selected or multiple reaction monitoring (SRM/MRM) assays developed. From literature annotation and topology analysis, 271 proteins were recognized as membrane proteins while 27.9% of the 757 proteins do not have any experimental evidence at the protein level. In summary, the analysis revealed that chromosome 4 is a rich resource for cancer-associated proteins for biomarker verification projects and for drug target discovery projects.
The compositional transition of vertebrate genomes: an analysis of the secondary structure of the proteins encoded by human genes.

PubMed

D'Onofrio, Giuseppe; Ghosh, Tapash Chandra

2005-01-17

Fluctuations and increments of both C(3) and G(3) levels along the human coding sequences were investigated comparing two sets of Xenopus/human orthologous genes. The first set of genes shows minor differences of the GC(3) levels, the second shows considerable increments of the GC(3) levels in the human genes. In both data sets, the fluctuations of C(3) and G(3) levels along the coding sequences correlated with the secondary structures of the encoded proteins. The human genes that underwent the compositional transition showed a different increment of the C(3) and G(3) levels within and among the structural units of the proteins. The relative synonymous codon usage (RSCU) of several amino acids were also affected during the compositional transition, showing that there exists a correlation between RSCU and protein secondary structures in human genes. The importance of natural selection for the formation of isochore organization of the human genome has been discussed on the basis of these results.
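Two quantities used in this abstract can be computed directly from a coding sequence: the GC level at third codon positions (GC3) and the relative synonymous codon usage (RSCU), that is, the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below assumes the standard genetic code; the example sequence and function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def split_codons(cds):
    """Split an in-frame coding sequence into codons, dropping any remainder."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def gc3(cds):
    """Fraction of codons whose third position is G or C."""
    codons = split_codons(cds)
    return sum(codon[2] in "GC" for codon in codons) / len(codons)

def rscu(cds):
    """RSCU per codon, for every amino acid that occurs in the sequence."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not used; RSCU undefined
        expected = total / len(family)  # equal-usage expectation
        for codon in family:
            values[codon] = counts[codon] / expected
    return values

# Hypothetical sequence encoding four alanines: GCC, GCC, GCT, GCA.
example = "GCCGCCGCTGCA"
example_rscu = rscu(example)   # GCC 2.0, GCT 1.0, GCA 1.0, GCG 0.0
example_gc3 = gc3(example)     # 0.5
```

C(3) and G(3) levels as used in the abstract can be obtained the same way by testing the third position against "C" or "G" alone.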
A global analysis of protein expression profiles in Sinorhizobium meliloti: discovery of new genes for nodule occupancy and stress adaptation.

PubMed

Djordjevic, Michael A; Chen, Han Cai; Natera, Siria; Van Noorden, Giel; Menzel, Christian; Taylor, Scott; Renard, Clotilde; Geiger, Otto; Weiller, Georg F

2003-06-01

A proteomic examination of Sinorhizobium meliloti strain 1021 was undertaken using a combination of 2-D gel electrophoresis, peptide mass fingerprinting, and bioinformatics. Our goal was to identify (i) putative symbiosis- or nutrient-stress-specific proteins, (ii) the biochemical pathways active under different conditions, (iii) potential new genes, and (iv) the extent of posttranslational modifications of S. meliloti proteins. In total, we identified the protein products of 810 genes (13.1% of the genome's coding capacity). The 810 genes generated 1,180 gene products, with chromosomal genes accounting for 78% of the gene products identified (18.8% of the chromosome's coding capacity). The activity of 53 metabolic pathways was inferred from bioinformatic analysis of proteins with assigned Enzyme Commission numbers. Of the remaining proteins that did not encode enzymes, ABC-type transporters composed 12.7% and regulatory proteins 3.4% of the total. Proteins with up to seven transmembrane domains were identified in membrane preparations. A total of 27 putative nodule-specific proteins and 35 nutrient-stress-specific proteins were identified and used as a basis to define genes and describe processes occurring in S. meliloti cells in nodules and under stress. Several nodule proteins from the plant host were present in the nodule bacteria preparations. We also identified seven potentially novel proteins not predicted from the DNA sequence. Post-translational modifications such as N-terminal processing could be inferred from the data. The posttranslational addition of UMP to the key regulator of nitrogen metabolism, PII, was demonstrated. This work demonstrates the utility of combining mass spectrometry with protein arraying or separation techniques to identify candidate genes involved in important biological processes and niche occupations that may be intransigent to other methods of gene expression profiling.
Long Non-Coding RNAs Responsive to Salt and Boron Stress in the Hyper-Arid Lluteño Maize from Atacama Desert.

PubMed

Huanca-Mamani, Wilson; Arias-Carrasco, Raúl; Cárdenas-Ninasivincha, Steffany; Rojas-Herrera, Marcelo; Sepúlveda-Hermosilla, Gonzalo; Caris-Maldonado, José Carlos; Bastías, Elizabeth; Maracaja-Coutinho, Vinicius

2018-03-20

Long non-coding RNAs (lncRNAs) have been defined as transcripts longer than 200 nucleotides, which lack significant protein coding potential and possess critical roles in diverse cellular processes. Long non-coding RNAs have recently been functionally characterized in plant stress-response mechanisms. In the present study, we perform a comprehensive identification of lncRNAs in response to combined stress induced by salinity and excess of boron in the Lluteño maize, a tolerant maize landrace from Atacama Desert, Chile. We use deep RNA sequencing to identify a set of 48,345 different lncRNAs, of which 28,012 (58.1%) are conserved with other maize (B73, Mo17 or Palomero), with the remaining 41.9% belonging to potentially Lluteño exclusive lncRNA transcripts. According to B73 maize reference genome sequence, most Lluteño lncRNAs correspond to intergenic transcripts. Interestingly, Lluteño lncRNAs present an unusual overall higher expression compared to protein coding genes under exposure to stressed conditions. In total, we identified 1710 lncRNAs putatively responsive to the combined stressed conditions of salt and boron exposure. We also identified a set of 848 stress responsive potential trans natural antisense transcripts (trans-NAT) lncRNAs, which seem to be regulating genes associated with regulation of transcription, response to stress, response to abiotic stimulus and participation in the nicotianamine metabolic process. Reverse transcription-quantitative PCR (RT-qPCR) experiments were performed in a subset of lncRNAs, validating their existence and expression patterns. Our results suggest that a diverse set of maize lncRNAs from leaves and roots is responsive to combined salt and boron stress, being the first effort to identify lncRNAs from a maize landrace adapted to extreme conditions such as the Atacama Desert. 
The information generated is a starting point to understand the genomic adaptabilities suffered by this maize to surpass this extremely stressed environment.
Long Non-Coding RNAs Responsive to Salt and Boron Stress in the Hyper-Arid Lluteño Maize from Atacama Desert

PubMed Central

Huanca-Mamani, Wilson; Arias-Carrasco, Raúl; Cárdenas-Ninasivincha, Steffany; Rojas-Herrera, Marcelo; Sepúlveda-Hermosilla, Gonzalo; Caris-Maldonado, José Carlos; Bastías, Elizabeth; Maracaja-Coutinho, Vinicius

2018-01-01

Long non-coding RNAs (lncRNAs) have been defined as transcripts longer than 200 nucleotides, which lack significant protein coding potential and possess critical roles in diverse cellular processes. Long non-coding RNAs have recently been functionally characterized in plant stress–response mechanisms. In the present study, we perform a comprehensive identification of lncRNAs in response to combined stress induced by salinity and excess of boron in the Lluteño maize, a tolerant maize landrace from Atacama Desert, Chile. We use deep RNA sequencing to identify a set of 48,345 different lncRNAs, of which 28,012 (58.1%) are conserved with other maize (B73, Mo17 or Palomero), with the remaining 41.9% belonging to potentially Lluteño exclusive lncRNA transcripts. According to B73 maize reference genome sequence, most Lluteño lncRNAs correspond to intergenic transcripts. Interestingly, Lluteño lncRNAs present an unusual overall higher expression compared to protein coding genes under exposure to stressed conditions. In total, we identified 1710 lncRNAs putatively responsive to the combined stressed conditions of salt and boron exposure. We also identified a set of 848 stress responsive potential trans natural antisense transcripts (trans-NAT) lncRNAs, which seem to be regulating genes associated with regulation of transcription, response to stress, response to abiotic stimulus and participation in the nicotianamine metabolic process. Reverse transcription-quantitative PCR (RT-qPCR) experiments were performed in a subset of lncRNAs, validating their existence and expression patterns. Our results suggest that a diverse set of maize lncRNAs from leaves and roots is responsive to combined salt and boron stress, being the first effort to identify lncRNAs from a maize landrace adapted to extreme conditions such as the Atacama Desert. 
The information generated is a starting point to understand the genomic adaptabilities suffered by this maize to surpass this extremely stressed environment. PMID:29558449
APPRIS 2017: principal isoforms for multiple gene sets

PubMed Central

Rodriguez-Rivas, Juan; Di Domenico, Tomás; Vázquez, Jesús; Valencia, Alfonso

2018-01-01

The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein coding genes and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database, new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. PMID:29069475
Satellite DNA Modulates Gene Expression in the Beetle Tribolium castaneum after Heat Stress

PubMed Central

Feliciello, Isidoro; Akrap, Ivana; Ugarković, Đurđica

2015-01-01

Non-coding repetitive DNAs have been proposed to perform a gene regulatory role, however for tandemly repeated satellite DNA no such role was defined until now. Here we provide the first evidence for a role of satellite DNA in the modulation of gene expression under specific environmental conditions. The major satellite DNA TCAST1 in the beetle Tribolium castaneum is preferentially located within pericentromeric heterochromatin but is also dispersed as single repeats or short arrays in the vicinity of protein-coding genes within euchromatin. Our results show enhanced suppression of activity of TCAST1-associated genes and slower recovery of their activity after long-term heat stress relative to the same genes without associated TCAST1 satellite DNA elements. The level of gene suppression is not influenced by the distance of TCAST1 elements from the associated genes up to 40 kb from the genes’ transcription start sites, but it does depend on the copy number of TCAST1 repeats within an element, being stronger for the higher number of copies. The enhanced gene suppression correlates with the enrichment of the repressive histone marks H3K9me2/3 at dispersed TCAST1 elements and their flanking regions as well as with increased expression of TCAST1 satellite DNA. The results reveal transient, RNAi based heterochromatin formation at dispersed TCAST1 repeats and their proximal regions as a mechanism responsible for enhanced silencing of TCAST1-associated genes. Differences in the pattern of distribution of TCAST1 elements contribute to gene expression diversity among T. castaneum strains after long-term heat stress and might have an impact on adaptation to different environmental conditions. PMID:26275223
The gene coding for the B cell surface protein CD19 is localized on human chromosome 16p11.

PubMed

Stapleton, P; Kozmik, Z; Weith, A; Busslinger, M

1995-02-01

The CD19 gene codes for one of the earliest markers of the human B cell lineage and is a target for the B lymphoid-specific transcription factor BSAP (Pax-5). The transmembrane protein CD19 has been implicated in controlling proliferation of mature B lymphocytes by modulating signal transduction through the antigen receptor. In this study, we have employed Southern blot and fluorescence in situ hybridization analyses to localize the CD19 gene to human chromosome 16p11.
A murC gene in Porphyromonas gingivalis 381.

PubMed

Ansai, T; Yamashita, Y; Awano, S; Shibata, Y; Wachi, M; Nagai, K; Takehara, T

1995-09-01

The gene encoding a 51 kDa polypeptide of Porphyromonas gingivalis 381 was isolated by immunoblotting using an antiserum raised against P. gingivalis alkaline phosphatase. DNA sequence analysis of a 2.5 kb DNA fragment containing the gene encoding the 51 kDa protein revealed one complete and two incomplete ORFs. Database searches using the FASTA program revealed significant homology between the P. gingivalis 51 kDa protein and the MurC protein of Escherichia coli, which functions in peptidoglycan synthesis. The cloned gene encoded a functional product that complemented an E. coli murC mutant. Moreover, the ORF just upstream of murC coded for a protein that was 31% homologous with the E. coli MurG protein. The ORF just downstream of murC coded for a protein that was 17% homologous with the Streptococcus pneumoniae penicillin-binding protein 2B (PBP2B), which functions in peptidoglycan synthesis and is responsible for antibiotic resistance. These results suggest that P. gingivalis contains a homologue of the E. coli peptidoglycan synthesis gene murC and indicate the possibility of a cluster of genes responsible for cell division and cell growth, as in the E. coli mra region.
Molecular cloning and sequence analysis of the gene coding for the 57kDa soluble antigen of the salmonid fish pathogen Renibacterium salmoninarum

USGS Publications Warehouse

Chien, Maw-Sheng; Gilbert, Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.

1992-01-01

The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen, Renibacterium salmoninarum, was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr value of 57190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting the protein is synthesized as a 557-amino acid precursor and processed to produce a mature protein of Mr 54505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat; the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene.
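The lengths quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the 1671-nucleotide figure excludes the stop codon, as the reported 557 amino acids imply.

```python
def codons_in_orf(nucleotides: int) -> int:
    """Return the number of complete codons in an ORF of the given length."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return nucleotides // 3


# Figures taken from the abstract.
precursor_residues = codons_in_orf(1671)   # 557 amino acids
signal_peptide_residues = 26

# Residues remaining after the signal peptide is removed (derived, not
# stated in the abstract).
mature_residues = precursor_residues - signal_peptide_residues

# Residues 27 to 61 inclusive should span the 35 microsequenced residues.
microsequenced_span = 61 - 27 + 1

print(precursor_residues, mature_residues, microsequenced_span)
```

The derived mature length of 531 residues follows from the stated precursor length and signal peptide; it is not a value reported by the authors.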
Characterization of the complete mitochondrial genome of Marshallagia marshalli and phylogenetic implications for the superfamily Trichostrongyloidea.

PubMed

Sun, Miao-Miao; Han, Liang; Zhang, Fu-Kai; Zhou, Dong-Hui; Wang, Shu-Qing; Ma, Jun; Zhu, Xing-Quan; Liu, Guo-Hua

2018-01-01

Marshallagia marshalli (Nematoda: Trichostrongylidae) infection can lead to serious parasitic gastroenteritis in sheep, goats, and wild ruminants, causing significant socioeconomic losses worldwide. To date, studies of the molecular biology of M. marshalli have been limited. Herein, we sequenced the complete mitochondrial (mt) genome of M. marshalli and examined its phylogenetic relationship with selected members of the superfamily Trichostrongyloidea using Bayesian inference (BI) based on concatenated mt amino acid sequence datasets. The complete mt genome sequence of M. marshalli is 13,891 bp, including 12 protein-coding genes, 22 transfer RNA genes, and 2 ribosomal RNA genes. All protein-coding genes are transcribed in the same direction. Phylogenetic analyses based on concatenated amino acid sequences of the 12 protein-coding genes supported the monophylies of the families Haemonchidae, Molineidae, and Dictyocaulidae with strong statistical support, but rejected the monophyly of the family Trichostrongylidae. The determination of the complete mt genome sequence of M. marshalli provides novel genetic markers for studying the systematics, population genetics, and molecular epidemiology of M. marshalli and its congeners.
RNA editing of non-coding RNA and its role in gene regulation.

PubMed

Daniel, Chammiran; Lagergren, Jens; Öhman, Marie

2015-10-01

It has long been known that repetitive elements, particularly Alu sequences in humans, are edited by the adenosine deaminase acting on RNA (ADAR) family. The functional interpretation of these events has been even more difficult than that of editing events in coding sequences, but today there is an emerging understanding of their downstream effects. A surprisingly large fraction of the human transcriptome contains inverted Alu repeats, often forming long double-stranded structures in RNA transcripts, typically occurring in introns and UTRs of protein-coding genes. Alu repeats are also common in other primates, and similar inverted repeats can frequently be found in non-primates, although the latter are less prone to duplex formation. In humans, as many as 700,000 Alu elements have been identified as substrates for RNA editing, of which many are edited at several sites. In fact, recent advancements in transcriptome sequencing techniques and bioinformatics have revealed that the human editome comprises at least a hundred million adenosine to inosine (A-to-I) editing sites in Alu sequences. Although substantial additional efforts are required in order to map the editome, present knowledge already provides an excellent starting point for studying cis-regulation of editing. In this review, we will focus on editing of long stem-loop structures in the human transcriptome and how it can affect gene expression. Copyright © 2015 Elsevier B.V. and Société Française de Biochimie et Biologie Moléculaire (SFBBM). All rights reserved.
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages

PubMed Central

Cambridge, Joshua M.; Blinkova, Alexandra L.; Salvador Rocha, Erick I.; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M.; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O.

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12–14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism. PMID:29293521
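As a quick consistency check on the annotation figures above, the four functional categories can be summed and compared with the reported total of 2,892 coding sequences. This is a minimal sketch; the dictionary keys are shorthand labels chosen here, not terms from the annotation itself.

```python
# Counts of potential protein coding sequences, as given in the abstract.
cds_by_category = {
    "specific function": 2124,
    "general function": 147,
    "conserved, unknown function": 228,
    "hypothetical": 393,
}
reported_total = 2892

computed_total = sum(cds_by_category.values())

# Share of each category relative to the reported total.
for name, count in cds_by_category.items():
    print(f"{name}: {count} ({count / reported_total:.1%})")

print("categories sum to reported total:", computed_total == reported_total)
```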
Genomics of Clostridium taeniosporum, an organism which forms endospores with ribbon-like appendages.

PubMed

Cambridge, Joshua M; Blinkova, Alexandra L; Salvador Rocha, Erick I; Bode Hernández, Addys; Moreno, Maday; Ginés-Candelaria, Edwin; Goetz, Benjamin M; Hunicke-Smith, Scott; Satterwhite, Ed; Tucker, Haley O; Walker, James R

2018-01-01

Clostridium taeniosporum, a non-pathogenic anaerobe closely related to the C. botulinum Group II members, was isolated from Crimean lake silt about 60 years ago. Its endospores are surrounded by an encasement layer which forms a trunk at one spore pole to which about 12-14 large, ribbon-like appendages are attached. The genome consists of one 3,264,813 bp, circular chromosome (with 26.6% GC) and three plasmids. The chromosome contains 2,892 potential protein coding sequences: 2,124 have specific functions, 147 have general functions, 228 are conserved but without known function and 393 are hypothetical based on the fact that no statistically significant orthologs were found. The chromosome also contains 101 genes for stable RNAs, including 7 rRNA clusters. Over 84% of the protein coding sequences and 96% of the stable RNA coding regions are oriented in the same direction as replication. The three known appendage genes are located within a single cluster with five other genes, the protein products of which are closely related, in terms of sequence, to the known appendage proteins. The relatedness of the deduced protein products suggests that all or some of the closely related genes might code for minor appendage proteins or assembly factors. The appendage genes might be unique among the known clostridia; no statistically significant orthologs were found within other clostridial genomes for which sequence data are available. The C. taeniosporum chromosome contains two functional prophages, one Siphoviridae and one Myoviridae, and one defective prophage. Three plasmids of 5.9, 69.7 and 163.1 Kbp are present. These data are expected to contribute to future studies of developmental, structural and evolutionary biology and to potential industrial applications of this organism.
Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

USGS Publications Warehouse

Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

2008-01-01

Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3′-N-P-M-F-HN-L-5′. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. © 2008 Elsevier B.V. All rights reserved.
Identification Of Protein Vaccine Candidates Using Comprehensive Proteomic Analysis Strategies

DTIC Science & Technology

2007-12-01

urease (URE) gene codes for a urea amidohydrolase protein that catalyzes urea hydrolysis. The protein was first isolated from C. immitis and...the Cu, Zn, Superoxide Dismutase (SOD), the Spherule Outer Wall glycoprotein (SOWgp), the T-Cell Reactive Protein (TCRP), and Urease (URE). It is...et al. 1997. Isolation and characterization of the urease gene (URE) from the pathogenic fungus Coccidioides immitis. Gene 198: 387-391. 54. Li, K

Cloning and identification of bacteriophage T4 gene 2 product gp2 and action of gp2 on infecting DNA in vivo.

PubMed Central

Lipinska, B; Rao, A S; Bolten, B M; Balakrishnan, R; Goldberg, E B

1989-01-01

We sequenced bacteriophage T4 genes 2 and 3 and the putative C-terminal portion of gene 50. They were found to have appropriate open reading frames directed counterclockwise on the T4 map. Mutations in genes 2 and 64 were shown to be in the same open reading frame, which we now call gene 2. This gene codes for a protein of 27,068 daltons. The open reading frame corresponding to gene 3 codes for a protein of 20,634 daltons. Appropriate bands on polyacrylamide gels were identified at 30 and 20 kilodaltons, respectively. We found that the product of the cloned gene 2 can protect T4 DNA double-stranded ends from exonuclease V action. PMID:2644202
Dynamic gene expression response to altered gravity in human T cells.

PubMed

Thiel, Cora S; Hauschild, Swantje; Huge, Andreas; Tauber, Svantje; Lauber, Beatrice A; Polzer, Jennifer; Paulsen, Katrin; Lier, Hartwin; Engelmann, Frank; Schmitz, Burkhard; Schütte, Andreas; Layer, Liliana E; Ullrich, Oliver

2017-07-12

We investigated the dynamics of immediate and initial gene expression response to different gravitational environments in human Jurkat T lymphocytic cells and compared expression profiles to identify potential gravity-regulated genes and adaptation processes. We used the Affymetrix GeneChip® Human Transcriptome Array 2.0 containing 44,699 protein-coding genes and 22,829 non-protein-coding genes and performed the experiments during a parabolic flight and a suborbital ballistic rocket mission to cross-validate gravity-regulated gene expression through independent research platforms and different sets of control experiments to exclude factors other than alteration of gravity. We found that gene expression in human T cells rapidly responded to altered gravity in the time frame of 20 s and 5 min. The initial response to microgravity involved mostly regulatory RNAs. We identified three gravity-regulated genes which could be cross-validated in both completely independent experiment missions: ATP6V1A/D, a vacuolar H+-ATPase (V-ATPase) responsible for acidification during bone resorption, IGHD3-3/IGHD3-10, diversity genes of the immunoglobulin heavy-chain locus participating in V(D)J recombination, and LINC00837, a long intergenic non-protein coding RNA. Due to the extensive and rapid alteration of gene expression associated with regulatory RNAs, we conclude that human cells are equipped with a robust and efficient adaptation potential when challenged with altered gravitational environments.
A systemic identification approach for primary transcription start site of Arabidopsis miRNAs from multidimensional omics data.

PubMed

You, Qi; Yan, Hengyu; Liu, Yue; Yi, Xin; Zhang, Kang; Xu, Wenying; Su, Zhen

2017-05-01

The 22-nucleotide non-coding microRNAs (miRNAs) are mostly transcribed by RNA polymerase II and are similar to protein-coding genes. Unlike the clear process from stem-loop precursors to mature miRNAs, the primary transcriptional regulation of miRNA, especially in plants, still needs to be further clarified, including the original transcription start site, functional cis-elements and primary transcript structures. Due to several well-characterized transcription signals in the promoter region, we proposed a systemic approach integrating multidimensional "omics" (including genomics, transcriptomics, and epigenomics) data to improve the genome-wide identification of primary miRNA transcripts. Here, we used the model plant Arabidopsis thaliana to improve the ability to identify candidate promoter locations in intergenic miRNAs and to determine rules for identifying primary transcription start sites of miRNAs by integrating high-throughput omics data, such as the DNase I hypersensitive sites, chromatin immunoprecipitation-sequencing of polymerase II and H3K4me3, as well as high throughput transcriptomic data. As a result, 93% of refined primary transcripts could be confirmed by the primer pairs from a previous study. Cis-element and secondary structure analyses also supported the feasibility of our results. This work will contribute to the primary transcriptional regulatory analysis of miRNAs, and the conserved regulatory pattern may be a suitable miRNA characteristic in other plant species.
Generation of a variety of stable Influenza A reporter viruses by genetic engineering of the NS gene segment

PubMed Central

Reuther, Peter; Göpfert, Kristina; Dudek, Alexandra H.; Heiner, Monika; Herold, Susanne; Schwemmle, Martin

2015-01-01

Influenza A viruses (IAV) pose a constant threat to the human population and therefore a better understanding of their fundamental biology and identification of novel therapeutics is of utmost importance. Various reporter-encoding IAV were generated to achieve these goals; however, one recurring difficulty was the genetic instability, especially of larger reporter genes. We employed the viral NS segment coding for the non-structural protein 1 (NS1) and nuclear export protein (NEP) for stable expression of diverse reporter proteins. This was achieved by converting the NS segment into a single open reading frame (ORF) coding for NS1, the respective reporter and NEP. To allow expression of individual proteins, the reporter genes were flanked by two porcine Teschovirus-1 2A peptide (PTV-1 2A)-coding sequences. The resulting viruses encoding luciferases, fluorescent proteins or a Cre recombinase are characterized by a high genetic stability in vitro and in mice and can be readily employed for antiviral compound screenings, visualization of infected cells or cells that survived acute infection. PMID:26068081
Human Immunodeficiency Virus-Type 1 LTR DNA contains an intrinsic gene producing antisense RNA and protein products

PubMed Central

Ludwig, Linda B; Ambrus, Julian L; Krawczyk, Kristie A; Sharma, Sanjay; Brooks, Stephen; Hsiao, Chiu-Bin; Schwartz, Stanley A

2006-01-01

Background: While viruses have long been shown to capitalize on their limited genomic size by utilizing both strands of DNA or complementary DNA/RNA intermediates to code for viral proteins, it has been assumed that human retroviruses have all their major proteins translated only from the plus or sense strand of RNA, despite their requirement for a dsDNA proviral intermediate. Several studies, however, have suggested the presence of antisense transcription for both HIV-1 and HTLV-1. More recently an antisense transcript responsible for the HTLV-1 bZIP factor (HBZ) protein has been described. In this study we investigated the possibility of an antisense gene contained within the human immunodeficiency virus type 1 (HIV-1) long terminal repeat (LTR). Results: Inspection of published sequences revealed a potential transcription initiator element (INR) situated downstream of, and in reverse orientation to, the usual HIV-1 promoter and transcription start site. This antisense initiator (HIVaINR) suggested the possibility of an antisense gene responsible for RNA and protein production. We show that antisense transcripts are generated, in vitro and in vivo, originating from the TAR DNA of the HIV-1 LTR. To test the possibility that protein(s) could be translated from this novel HIV-1 antisense RNA, recombinant HIV antisense gene-FLAG vectors were designed. Recombinant protein(s) were produced and isolated utilizing carboxy-terminal FLAG epitope (DYKDDDDK) sequences. In addition, affinity-purified antisera to an internal peptide derived from the HIV antisense protein (HAP) sequences identified HAPs from HIV+ human peripheral blood lymphocytes. Conclusion: HIV-1 contains an antisense gene in the U3-R regions of the LTR responsible for both an antisense RNA transcript and proteins. This antisense transcript has tremendous potential for intrinsic RNA regulation because of its overlap with the beginning of all HIV-1 sense RNA transcripts by 25 nucleotides. The novel HAPs are encoded in a region of the LTR that has already been shown to be deleted in some HIV-infected long-term survivors and represent new potential targets for vaccine development. PMID:17090330
Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt.

PubMed

AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

2014-10-07

The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes for proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap in the sequence UGAUGA (the initiation codon AUG occupies positions 3 to 5 of this combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
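The frame shift at the end of each round follows from the genome length: 220 is not a multiple of three, so a ribosome that continues around the circle re-enters the sequence one nucleotide out of register. The following sketch illustrates only that arithmetic and the position of AUG within UGAUGA; the function and variable names are hypothetical, and it makes no claim about the actual virusoid sequence.

```python
GENOME_LENGTH = 220  # nucleotides in the circular RNA, from the abstract


def frame_after_rounds(length: int, rounds: int) -> int:
    """Reading-frame offset (0, 1 or 2) after complete passes around a circle."""
    return (length * rounds) % 3


# Frame offset at the start of rounds 1 to 4 (after 0 to 3 completed passes).
frames = [frame_after_rounds(GENOME_LENGTH, r) for r in range(4)]
print(frames)  # [0, 1, 2, 0]

# Locate the start codon and the stop codons inside the overlapping motif.
motif = "UGAUGA"
aug_index = motif.find("AUG")  # 0-based index of the initiation codon
stop_indices = [i for i in range(len(motif) - 2) if motif[i:i + 3] == "UGA"]
print(aug_index, stop_indices)  # 2 [0, 3]
```

Three passes cover 660 nt, which is 220 codons, so the original frame is restored only after the third round; this is consistent with the "two (or three)" completely overlapping ORFs described above.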
Phylogenetic relationships within Echinococcus and Taenia tapeworms (Cestoda: Taeniidae): an inference from nuclear protein-coding genes.

PubMed

Knapp, Jenny; Nakao, Minoru; Yanagida, Tetsuya; Okamoto, Munehiro; Saarma, Urmas; Lavikainen, Antti; Ito, Akira

2011-12-01

The family Taeniidae of tapeworms is composed of two genera, Echinococcus and Taenia, which obligately parasitize mammals including humans. Inferring phylogeny via molecular markers is the only way to trace back their evolutionary histories. However, molecular dating approaches are lacking so far. Here we established new markers from nuclear protein-coding genes for RNA polymerase II second largest subunit (rpb2), phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold). Bayesian inference and maximum likelihood analyses of the concatenated gene sequences allowed us to reconstruct phylogenetic trees for taeniid parasites. The tree topologies clearly demonstrated that Taenia is paraphyletic and that the clade of Echinococcus oligarthrus and Echinococcus vogeli is sister to all other members of Echinococcus. Both species are endemic in Central and South America, and their definitive hosts originated from carnivores that immigrated from North America after the formation of the Panamanian land bridge about 3 million years ago (Ma). A time-calibrated phylogeny was estimated by a Bayesian relaxed-clock method based on the assumption that the most recent common ancestor of E. oligarthrus and E. vogeli existed during the late Pliocene (3.0 Ma). The results suggest that a clade of Taenia including human-pathogenic species diversified primarily in the late Miocene (11.2 Ma), whereas Echinococcus started to diversify later, in the end of the Miocene (5.8 Ma). Close genetic relationships among the members of Echinococcus imply that the genus is a young group in which speciation and global radiation occurred rapidly. Copyright © 2011 Elsevier Inc. All rights reserved.
The complete mitochondrial genomes of the Fenton's wood white, Leptidea morsei, and the lemon emigrant, Catopsilia pomona

PubMed Central

Hao, Juan-Juan; Hao, Jia-Sheng; Sun, Xiao-Yan; Zhang, Lan-Lan; Yang, Qun

2014-01-01

The complete mitochondrial genomes of Leptidea morsei Fenton (Lepidoptera: Pieridae: Dismorphiinae) and Catopsilia pomona (F.) (Lepidoptera: Pieridae: Coliadinae) were determined to be 15,122 and 15,142 bp in length, respectively, with that of L. morsei being the smallest among all known butterflies. Both mitogenomes contained 37 genes and an A+T-rich region, with the gene order identical to those of other butterflies, except for the presence of a tRNA-like insertion, tRNALeu(UUR), in C. pomona. The nucleotide compositions of both genomes were higher in A and T (80.2% for L. morsei and 81.3% for C. pomona) than C and G; the A+T bias had a significant effect on the codon usage and the amino acid composition. The protein-coding genes utilized the standard mitochondrial start codon ATN, except the COI gene using CGA as the initiation codon, as reported in other butterflies. The intergenic spacer sequence between the tRNASer(UCN) and ND1 genes contained the ATACTAA motif. The A+T-rich region harbored a poly-T stretch and a conserved ATAGA motif located at the end of the region. In addition, there was a triplicated 23 bp repeat and a microsatellite-like (TA)9(AT)3 element in the A+T-rich region of the L. morsei mitogenome, while in C. pomona, there was a duplicated 24 bp repeat element and a microsatellite-like (TA)9 element. The phylogenetic trees of the main butterfly lineages (Hesperiidae, Papilionidae, Pieridae, Nymphalidae, Lycaenidae, and Riodinidae) were reconstructed with maximum likelihood and Bayesian inference methods based on the 13 concatenated nucleotide sequences of protein-coding genes, and both trees showed that the Pieridae family is sister to Lycaenidae. Although this result contradicts the traditional morphologically based views, it agrees with other recent studies based on mitochondrial genomic data. PMID:25368074
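Two of the quantities described above, the A+T fraction of a sequence and whether a coding sequence begins with an ATN start codon, are straightforward to compute. The sketch below is an illustration under stated assumptions: the function names and the toy sequences are hypothetical and are not taken from either mitogenome.

```python
import re


def at_fraction(seq: str) -> float:
    """Fraction of A and T nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("A") + seq.count("T")) / len(seq)


def has_atn_start(cds: str) -> bool:
    """True if the coding sequence starts with ATA, ATC, ATG or ATT."""
    return re.match(r"AT[ACGT]", cds.upper()) is not None


# Toy sequences for illustration only (not real mitochondrial genes).
toy_atn_gene = "ATTAATTTAGCA"
toy_cga_gene = "CGAAAATTATTA"  # mimics a COI-style CGA initiation codon

print(f"{at_fraction(toy_atn_gene):.3f}")
print(has_atn_start(toy_atn_gene), has_atn_start(toy_cga_gene))
```

A gene flagged `False` by `has_atn_start` is not necessarily misannotated; as the abstract notes, COI in these butterflies uses CGA, so exceptions need case-by-case review.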
From Genomes to Protein Models and Back

NASA Astrophysics Data System (ADS)

Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico

2007-12-01

The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
Multiple Site-Directed and Saturation Mutagenesis by the Patch Cloning Method.

PubMed

Taniguchi, Naohiro; Murakami, Hiroshi

2017-01-01

Constructing protein-coding genes with desired mutations is a basic step in protein engineering. Herein, we describe a multiple site-directed and saturation mutagenesis method, termed MUPAC. This method has been used to introduce multiple site-directed mutations in the green fluorescent protein gene and in the Moloney murine leukemia virus reverse transcriptase gene. Moreover, this method was also successfully used to introduce randomized codons at five desired positions in the green fluorescent protein gene, and for simple DNA assembly for cloning.
The genome of Hyperthermus butylicus: a sulfur-reducing, peptide fermenting, neutrophilic Crenarchaeote growing up to 108 °C

PubMed Central

Brügger, Kim; Chen, Lanming; Stark, Markus; Zibat, Arne; Redder, Peter; Ruepp, Andreas; Awayez, Mariana; She, Qunxin; Garrett, Roger A.; Klenk, Hans-Peter

2007-01-01

Hyperthermus butylicus, a hyperthermophilic neutrophile and anaerobe, is a member of the archaeal kingdom Crenarchaeota. Its genome consists of a single circular chromosome of 1,667,163 bp with a 53.7% G+C content. A total of 1672 genes were annotated, of which 1602 are protein-coding, and up to a third are specific to H. butylicus. In contrast to some other crenarchaeal genomes, a high level of GUG and UUG start codons are predicted. Two cdc6 genes are present, but neither could be linked unambiguously to an origin of replication. Many of the predicted metabolic gene products are associated with the fermentation of peptide mixtures including several peptidases with diverse specificities, and there are many encoded transporters. Most of the sulfur-reducing enzymes, hydrogenases and electron-transfer proteins were identified which are associated with energy production by reducing sulfur to H2S. Two large clusters of regularly interspaced repeats (CRISPRs) are present, one of which is associated with a crenarchaeal-type cas gene superoperon; none of the spacer sequences yielded good sequence matches with known archaeal chromosomal elements. The genome carries no detectable transposable or integrated elements, no inteins, and introns are exclusive to tRNA genes. This suggests that the genome structure is quite stable, possibly reflecting a constant, and relatively uncompetitive, natural environment. PMID:17350933
The complete mitochondrial genome of a spiraling whitefly, Aleurodicus dispersus Russell (Hemiptera: Aleyrodidae).

PubMed

Ming-Xing, Lu; Zhi-Teng, Chen; Wei-Wei, Yu; Yu-Zhou, Du

2017-03-01

We report the complete mitochondrial genome (mitogenome) of a spiraling whitefly, Aleurodicus dispersus (Hemiptera: Aleyrodidae). The 16 170 bp long genome consists of 13 protein-coding genes, 20 transfer RNAs, 2 ribosomal RNAs, and a control region. The A. dispersus mitogenome also includes a cytb-like non-coding region and shows several variations relative to the typical insect mitogenome. A phylogenetic tree has been constructed using the 13 protein-coding genes of 12 related species from Hemiptera. Our results would contribute to further study of phylogeny in Aleyrodidae and Hemiptera.
Novel mutation in forkhead box G1 (FOXG1) gene in an Indian patient with Rett syndrome.

PubMed

Das, Dhanjit Kumar; Jadhav, Vaishali; Ghattargi, Vikas C; Udani, Vrajesh

2014-03-15

Rett syndrome (RTT) is a severe neurodevelopmental disorder characterized by the progressive loss of intellectual functioning, fine and gross motor skills and communicative abilities, deceleration of head growth, and the development of stereotypic hand movements, occurring after a period of normal development. The classic form of RTT involves mutation in MECP2, while the involvement of the CDKL5 and FOXG1 genes has been identified in the atypical RTT phenotype. The FOXG1 gene encodes forkhead box protein G1, a transcription factor acting primarily as a transcriptional repressor through DNA binding in the embryonic telencephalon as well as in a number of other neurodevelopmental processes. In this report we describe the molecular analysis of the FOXG1 gene in Indian patients with Rett syndrome. FOXG1 gene mutation analysis was done in a cohort of 34 MECP2/CDKL5 mutation-negative RTT patients. We identified a novel mutation (p. D263VfsX190) in the FOXG1 gene in a patient with the congenital variant of Rett syndrome. This mutation resulted in a frameshift, thereby altering the reading frame of the entire coding sequence downstream of the mutation. The start position of the frameshift (Asp263) and the amino acids towards the carboxyl-terminal end of the protein were found to be well conserved across species using multiple sequence alignment. Since the mutation is located in the forkhead binding domain, it disrupts the secondary structure of the protein, making it non-functional. This is the first report from India showing a mutation in the FOXG1 gene in Rett syndrome. Copyright © 2014 Elsevier B.V. All rights reserved.
EUGÈNE'HOM: a generic similarity-based gene finder using multiple homologous sequences

PubMed Central

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-01-01

EUGÈNE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGÈNE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGÈNE'HOM to handle sequences from a variety of organisms. The current target of EUGÈNE'HOM is plant sequences. The EUGÈNE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl. PMID:12824408
ChIPBase v2.0: decoding transcriptional regulatory networks of non-coding RNAs and protein-coding genes from ChIP-seq data.

PubMed

Zhou, Ke-Ren; Liu, Shun; Sun, Wen-Ju; Zheng, Ling-Ling; Zhou, Hui; Yang, Jian-Hua; Qu, Liang-Hu

2017-01-04

The abnormal transcriptional regulation of non-coding RNAs (ncRNAs) and protein-coding genes (PCGs) contributes to various biological processes and is linked with human diseases, but the underlying mechanisms remain elusive. In this study, we developed ChIPBase v2.0 (http://rna.sysu.edu.cn/chipbase/) to explore the transcriptional regulatory networks of ncRNAs and PCGs. ChIPBase v2.0 has been expanded with ∼10 200 curated ChIP-seq datasets, which represents an approximately 20-fold expansion compared with the previously released version. We identified thousands of binding motif matrices and their binding sites from ChIP-seq data of DNA-binding proteins and predicted millions of transcriptional regulatory relationships between transcription factors (TFs) and genes. We constructed the 'Regulator' module to predict hundreds of TFs and histone modifications that are involved in or affect the transcription of ncRNAs and PCGs. Moreover, we built a web-based tool, Co-Expression, to explore the co-expression patterns between DNA-binding proteins and various types of genes by integrating the gene expression profiles of ∼10 000 tumor samples and ∼9100 normal tissues and cell lines. ChIPBase also provides a ChIP-Function tool and a genome browser to predict functions of diverse genes and visualize various ChIP-seq data. This study will greatly expand our understanding of the transcriptional regulation of ncRNAs and PCGs. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Promoter analysis of the membrane protein gp64 gene of the cellular slime mold Polysphondylium pallidum.

PubMed

Takaoka, N; Fukuzawa, M; Saito, T; Sakaitani, T; Ochiai, H

1999-10-28

We cloned a genomic fragment of the membrane protein gp64 gene of the cellular slime mold Polysphondylium pallidum by inverse PCR. Primer extension analysis identified a major transcription start site 65 bp upstream of the translation start codon. The promoter region of the gp64 gene contains sequences homologous to a TATA box at position -47 to -37 and to an initiator (Inr, PyPyCAPyPyPyPy) at position -3 to +5 from the transcription start site. Successively truncated segments of the promoter were tested for their ability to drive expression of the beta-galactosidase reporter gene in transformed cells; also the difference in activity between growth conditions was compared. The results indicated that there are two positive vegetative regulatory elements extending between -187 and -62 bp from the transcription start site of the gp64 promoter; also their activity was two to three times higher in the cells grown with bacteria in shaken suspension than in the cells grown in an axenic medium.
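
The initiator consensus quoted above (PyPyCAPyPyPyPy, with Py standing for a pyrimidine, C or T) translates directly into a regular expression. The following is a minimal sketch of such a scan, not the authors' procedure; the function name and the example fragment are hypothetical and invented for illustration.

```python
import re

# Inr consensus PyPyCAPyPyPyPy, where Py is a pyrimidine (C or T).
# The lookahead lets overlapping candidate sites be reported as well.
INR_PATTERN = re.compile(r"(?=([CT]{2}CA[CT]{4}))")


def find_inr_sites(sequence):
    """Return (0-based start, matched 8-mer) for every Inr-like site."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in INR_PATTERN.finditer(seq)]


# Hypothetical promoter fragment, NOT the gp64 sequence.
example = "GGATATAAAAGGCGTTCATTCTGGA"
print(find_inr_sites(example))  # [(14, 'TTCATTCT')]
```

In a match, the A of the central CA corresponds to position +1, so the putative transcription start lies three bases after the reported start index.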
Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster

PubMed Central

Wang, Wen; Brunet, Frédéric G.; Nevo, Eviatar; Long, Manyuan

2002-01-01

Non-protein-coding RNA genes play an important role in various biological processes. How new RNA genes originated and whether this process is controlled by similar evolutionary mechanisms for the origin of protein-coding genes remains unclear. A young chimeric RNA gene that we term sphinx (spx) provides the first insight into the early stage of evolution of RNA genes. spx originated as an insertion of a retroposed sequence of the ATP synthase chain F gene at the cytological region 60DB since the divergence of Drosophila melanogaster from its sibling species 2–3 million years ago. This retrosequence, which is located at 102F on the fourth chromosome, recruited a nearby exon and intron, thereby evolving a chimeric gene structure. This molecular process suggests that the mechanism of exon shuffling, which can generate protein-coding genes, also plays a role in the origin of RNA genes. The subsequent evolutionary process of spx has been associated with a high nucleotide substitution rate, possibly driven by a continuous positive Darwinian selection for a novel function, as is shown in its sex- and development-specific alternative splicing. To test whether spx has adapted to different environments, we investigated its population genetic structure in the unique “Evolution Canyon” in Israel, revealing a similar haplotype structure in spx, and thus similar evolutionary forces operating on spx between environments. PMID:11904380
A multigene phylogenetic synthesis for the class Lecanoromycetes (Ascomycota): 1307 fungi representing 1139 infrageneric taxa, 317 genera and 66 families

PubMed Central

Miadlikowska, Jolanta; Kauff, Frank; Högnabba, Filip; Oliver, Jeffrey C.; Molnár, Katalin; Fraker, Emily; Gaya, Ester; Hafellner, Josef; Hofstetter, Valérie; Gueidan, Cécile; Otálora, Mónica A.G.; Hodkinson, Brendan; Kukwa, Martin; Lücking, Robert; Björk, Curtis; Sipman, Harrie J.M.; Burgaz, Ana Rosa; Thell, Arne; Passo, Alfredo; Myllys, Leena; Goward, Trevor; Fernández-Brime, Samantha; Hestmark, Geir; Lendemer, James; Lumbsch, H. Thorsten; Schmull, Michaela; Schoch, Conrad; Sérusiaux, Emmanuël; Maddison, David R.; Arnold, A. Elizabeth; Lutzoni, François; Stenroos, Soili

2014-01-01

The Lecanoromycetes is the largest class of lichenized Fungi, and one of the most species-rich classes in the kingdom. Here we provide a multigene phylogenetic synthesis (using three ribosomal RNA-coding and two protein-coding genes) of the Lecanoromycetes based on 642 newly generated and 3329 publicly available sequences representing 1139 taxa, 317 genera, 66 families, 17 orders and five subclasses (four currently recognized: Acarosporomycetidae, Lecanoromycetidae, Ostropomycetidae, Umbilicariomycetidae; and one provisionarily recognized, ‘Candelariomycetidae’). Maximum likelihood phylogenetic analyses on four multigene datasets assembled using a cumulative supermatrix approach with a progressively higher number of species and missing data (5-gene, 5+4-gene, 5+4+3-gene and 5+4+3+2-gene datasets) show that the current classification includes non-monophyletic taxa at various ranks, which need to be recircumscribed and require revisionary treatments based on denser taxon sampling and more loci. Two newly circumscribed orders (Arctomiales and Hymeneliales in the Ostropomycetidae) and three families (Ramboldiaceae and Psilolechiaceae in the Lecanorales, and Strangosporaceae in the Lecanoromycetes inc. sed.) are introduced. The potential resurrection of the families Eigleraceae and Lopadiaceae is considered here to alleviate phylogenetic and classification disparities. An overview of the photobionts associated with the main fungal lineages in the Lecanoromycetes based on available published records is provided. A revised schematic classification at the family level in the phylogenetic context of widely accepted and newly revealed relationships across Lecanoromycetes is included. 
The cumulative addition of taxa with an increasing amount of missing data (i.e., a cumulative supermatrix approach, starting with taxa for which sequences were available for all five targeted genes and ending with the addition of taxa for which only two genes have been sequenced) revealed relatively stable relationships for many families and orders. However, the increasing number of taxa without the addition of more loci also resulted in an expected substantial loss of phylogenetic resolving power and support (especially for deep phylogenetic relationships), potentially including the misplacements of several taxa. Future phylogenetic analyses should include additional single copy protein-coding markers in order to improve the tree of the Lecanoromycetes. As part of this study, a new module (“Hypha”) of the freely available Mesquite software was developed to compare and display the internodal support values derived from this cumulative supermatrix approach. PMID:24747130
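
The cumulative supermatrix idea described above (start with taxa sequenced for all five genes, then add taxa with progressively fewer genes) can be sketched as a simple nesting of taxon sets. This is only an illustration under stated assumptions: the function name, the input format (a mapping from taxon to number of genes sequenced) and the toy taxa are hypothetical, and no alignment or tree inference is performed.

```python
def cumulative_datasets(genes_per_taxon, max_genes=5, min_genes=2):
    """Build nested taxon lists: taxa with all genes first, then fewer and fewer."""
    datasets = {}
    included = []
    label_parts = []
    for n in range(max_genes, min_genes - 1, -1):
        # Add every taxon that has exactly n genes sequenced.
        included.extend(sorted(t for t, k in genes_per_taxon.items() if k == n))
        label_parts.append(str(n))
        datasets["+".join(label_parts) + "-gene"] = list(included)
    return datasets


# Hypothetical taxa and gene counts, invented for illustration.
toy = {"taxon_A": 5, "taxon_B": 4, "taxon_C": 5, "taxon_D": 2, "taxon_E": 3}
for label, taxa in cumulative_datasets(toy).items():
    print(label, taxa)
```

Each successive dataset contains all taxa of the previous one, which mirrors the trade-off noted in the abstract: more taxa, but a growing share of missing data.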
Complete mitochondrial genome of the Kwangtung skate: Dipturus kwangtungensis (Rajiformes, Rajidae).

PubMed

Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Lee, Youn-Ho

2015-01-01

The complete sequence of the mitochondrial DNA of a Kwangtung skate, Dipturus kwangtungensis, was determined to be a circular molecule of 16,912 bp including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs) and a control region. The arrangement of the PCGs is the same as that found in other Rajidae species. The nucleotide composition of the L-strand, which encodes most of the proteins, is 30.2% A, 27.4% C, 28.2% T and 14.2% G, with a slight bias toward A+T. Twelve of the 13 PCGs are initiated by the ATG codon, while COX1 starts with GTG. Only ND4 harbors the incomplete termination codon TA. All tRNA genes have the typical clover-leaf structure of mitochondrial tRNA with the exception of tRNA(Ser)AGY, which has a reduced DHU arm. This mitogenome is the first reported for a species of the genus Dipturus and will become an important source of information on the phylogenetic relationships and the evolution of the genus Dipturus within the family Rajidae.
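
The A+T bias mentioned above follows from the reported L-strand base composition. A short arithmetic check, using only the four percentages given in the abstract (the variable and function names are illustrative):

```python
# L-strand base composition (%) as reported in the abstract.
composition = {"A": 30.2, "C": 27.4, "T": 28.2, "G": 14.2}


def combined(comp, bases):
    """Sum the percentages of the given bases, rounded to one decimal place."""
    return round(sum(comp[b] for b in bases), 1)


print("A+T:", combined(composition, "AT"))      # 58.4
print("G+C:", combined(composition, "GC"))      # 41.6
print("total:", combined(composition, "ACGT"))  # 100.0
```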
Complete genome sequencing of the luminescent bacterium, Vibrio qinghaiensis sp. Q67 using PacBio technology

NASA Astrophysics Data System (ADS)

Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming

2018-01-01

Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.

cncRNAs: Bi-functional RNAs with protein coding and non-coding functions

PubMed Central

Kumari, Pooja; Sampath, Karuna

2015-01-01

For many decades, the major function of mRNA was thought to be to provide protein-coding information embedded in the genome. The advent of high-throughput sequencing has led to the discovery of pervasive transcription of eukaryotic genomes and opened the world of RNA-mediated gene regulation. Many regulatory RNAs have been found to be incapable of protein coding and are hence termed non-coding RNAs (ncRNAs). However, studies in recent years have shown that several previously annotated non-coding RNAs have the potential to encode proteins, and conversely, some coding RNAs have regulatory functions independent of the protein they encode. Such bi-functional RNAs, with both protein coding and non-coding functions, which we term ‘cncRNAs’, have emerged as new players in cellular systems. Here, we describe the functions of some cncRNAs identified from bacteria to humans. Because the functions of many RNAs across genomes remain unclear, we propose that RNAs be classified as coding, non-coding or both only after careful analysis of their functions. PMID:26498036
A human haploid gene trap collection to study lncRNAs with unusual RNA biology.

PubMed

Kornienko, Aleksandra E; Vlatkovic, Irena; Neesen, Jürgen; Barlow, Denise P; Pauler, Florian M

2016-01-01

Many thousand long non-coding (lnc) RNAs are mapped in the human genome. Time consuming studies using reverse genetic approaches by post-transcriptional knock-down or genetic modification of the locus demonstrated diverse biological functions for a few of these transcripts. The Human Gene Trap Mutant Collection in haploid KBM7 cells is a ready-to-use tool for studying protein-coding gene function. As lncRNAs show remarkable differences in RNA biology compared to protein-coding genes, it is unclear if this gene trap collection is useful for functional analysis of lncRNAs. Here we use the uncharacterized LOC100288798 lncRNA as a model to answer this question. Using public RNA-seq data we show that LOC100288798 is ubiquitously expressed, but inefficiently spliced. The minor spliced LOC100288798 isoforms are exported to the cytoplasm, whereas the major unspliced isoform is nuclear localized. This shows that LOC100288798 RNA biology differs markedly from typical mRNAs. De novo assembly from RNA-seq data suggests that LOC100288798 extends 289kb beyond its annotated 3' end and overlaps the downstream SLC38A4 gene. Three cell lines with independent gene trap insertions in LOC100288798 were available from the KBM7 gene trap collection. RT-qPCR and RNA-seq confirmed successful lncRNA truncation and its extended length. Expression analysis from RNA-seq data shows significant deregulation of 41 protein-coding genes upon LOC100288798 truncation. Our data shows that gene trap collections in human haploid cell lines are useful tools to study lncRNAs, and identifies the previously uncharacterized LOC100288798 as a potential gene regulator.
[Cloning, expression and transcriptional analysis of biotin carboxyl carrier protein gene (accA) from Amycolatopsis mediterranei U32].

PubMed

Lu, Jie; Yao, Yufeng; Jiang, Weihong; Jiao, Ruishen

2003-02-01

Acetyl CoA carboxylase (EC 6.4.1.2, ACC) catalyzes the ATP-dependent carboxylation of acetyl CoA to yield malonyl CoA, which is the first committed step in fatty acid synthesis. A pair of degenerate PCR primers was designed according to the conserved amino acid sequence of AccA from M. tuberculosis and S. coelicolor. The product of the PCR amplification, a DNA fragment of 250 bp, was used as a probe for screening the U32 genomic cosmid library, and the gene accA, encoding the biotinylated protein subunit of acetyl CoA carboxylase, was successfully cloned from U32. The accA ORF, with a G+C content of 70.1%, encodes a 598-amino-acid protein with a calculated molecular mass of 63.7 kDa. A typical Streptomyces RBS sequence, AGGAGG, was found at the -6 position upstream of the start codon GTG. Analysis of the deduced amino acid sequence showed the presence of a biotin-binding site and a putative ATP-bicarbonate interaction region, which suggested that the U32 AccA may act as a biotin carboxylase as well as a biotin carrier protein. The accA gene was then cloned into the pET28 (b) vector and expressed in soluble form in E. coli BL21 (DE3) by induction with 0.1 mmol/L IPTG. Western blotting confirmed the covalent binding of biotin to AccA. Northern blotting was used to analyze the transcriptional regulation of accA by five different nitrogen sources.
Quantifying the mechanisms of domain gain in animal proteins.

PubMed

Buljan, Marija; Frankish, Adam; Bateman, Alex

2010-01-01

Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
The first mitochondrial genome for the butterfly family Riodinidae (Abisara fylloides) and its systematic implications.

PubMed

Zhao, Fang; Huang, Dun-Yuan; Sun, Xiao-Yan; Shi, Qing-Hui; Hao, Jia-Sheng; Zhang, Lan-Lan; Yang, Qun

2013-10-01

The Riodinidae is one of the lepidopteran butterfly families. This study describes the complete mitochondrial genome of the butterfly species Abisara fylloides, the first mitochondrial genome of the Riodinidae family. The results show that the entire mitochondrial genome of A. fylloides is 15 301 bp in length, and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a 423 bp A+T-rich region. The gene content, orientation and order are identical to the majority of other lepidopteran insects. Phylogenetic reconstruction was conducted using the concatenated 13 protein-coding gene (PCG) sequences of 19 available butterfly species covering all the five butterfly families (Papilionidae, Nymphalidae, Peridae, Lycaenidae and Riodinidae). Both maximum likelihood and Bayesian inference analyses highly supported the monophyly of Lycaenidae+Riodinidae, which was standing as the sister of Nymphalidae. In addition, we propose that the riodinids be categorized into the family Lycaenidae as a subfamilial taxon.
Expression and function of AtMBD4L, the single gene encoding the nuclear DNA glycosylase MBD4L in Arabidopsis.

PubMed

Nota, Florencia; Cambiagno, Damián A; Ribone, Pamela; Alvarez, María E

2015-06-01

DNA glycosylases recognize and excise damaged or incorrect bases from DNA, initiating the base excision repair (BER) pathway. Methyl-binding domain protein 4 (MBD4) is a member of the HhH-GPD DNA glycosylase superfamily, which has been well studied in mammals but not in plants. Our knowledge of the plant enzyme is limited to the activity of the Arabidopsis recombinant protein MBD4L in vitro. To start evaluating MBD4L in its biological context, we here characterized the structure, expression and effects of its gene, AtMBD4L. Phylogenetic analysis indicated that AtMBD4L belongs to one of the seven families of HhH-GPD DNA glycosylase genes existing in plants, and is unique in its family. Two AtMBD4L transcripts coding for active enzymes were detected in leaves and flowers. Transgenic plants expressing the AtMBD4L:GUS gene confined GUS activity to perivascular leaf tissues (usually adjacent to hydathodes), flowers (anthers at particular stages of development), and the apex of immature siliques. MBD4L-GFP fusion proteins showed nuclear localization in planta. Interestingly, overexpression of the full-length MBD4L, but not a truncated enzyme lacking the DNA glycosylase domain, induced the BER gene LIG1 and enhanced tolerance to oxidative stress. These results suggest that endogenous MBD4L acts on particular tissues, is capable of activating BER, and may contribute to the repair of DNA damage caused by oxidative stress. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Murine homeobox-containing gene, Msx-1: analysis of genomic organization, promoter structure, and potential autoregulatory cis-acting elements.

PubMed

Kuzuoka, M; Takahashi, T; Guron, C; Raghow, R

1994-05-01

Detailed molecular organization of the coding and upstream regulatory regions of the murine homeodomain-containing gene, Msx-1, is reported. The protein-encoding portion of the gene is contained in two exons, 590 and 1214 bp in length, separated by a 2107-bp intron; the homeodomain is located in the second exon. The two-exon organization of the murine Msx-1 gene resembles a number of other homeodomain-containing genes. The 5'-(GTAAGT) and 3'-(CCCTAG) splicing junctions and the mRNA polyadenylation signal (UAUAA) of the murine Msx-1 gene are also characteristic of other vertebrate genes. By nuclease protection and primer extension assays, the start of transcription of the Msx-1 gene was located 256 bp upstream of the first AUG. Computer analysis of the promoter proximal 1280-bp sequence revealed a number of potentially important cis-regulatory sequences; these include the recognition elements for Ap-1, Ap-2, Ap-3, Sp-1, a possible binding site for RAR:RXR, and a number of TCF-1 consensus motifs. Importantly, a perfect reverse complement of (C/G)TTAATTG, which was recently shown to be an optimal binding sequence for the homeodomain of Msx-1 protein (K.M. Catron, N. Iler, and C. Abate (1993) Mol. Cell. Biol. 13:2354-2365), was also located in the murine Msx-1 promoter. Binding of bacterially expressed Msx-1 homeodomain polypeptide to Msx-1-specific oligonucleotide was experimentally demonstrated, raising a distinct possibility of autoregulation of this developmentally regulated gene.
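
The abstract reports a perfect reverse complement of the Msx-1 binding sequence (C/G)TTAATTG in the promoter. A minimal sketch of how such a site could be searched for on either strand is given below; it writes the (C/G) position with the IUPAC symbol S, and the helper names and the toy promoter string are hypothetical, not taken from the paper.

```python
import re

# Complements for the symbols used here; S (C or G) is its own complement.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "S": "S"}
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]"}


def reverse_complement(motif):
    """Reverse-complement a motif written with A, C, G, T and S."""
    return "".join(COMPLEMENT[base] for base in reversed(motif.upper()))


def motif_to_regex(motif):
    """Compile a motif into a regular expression, expanding S to [CG]."""
    return re.compile("".join(IUPAC_REGEX[base] for base in motif.upper()))


def find_sites(sequence, motif):
    """Return sorted (0-based start, strand) hits for the motif on either strand."""
    seq = sequence.upper()
    forward = [(m.start(), "+") for m in motif_to_regex(motif).finditer(seq)]
    rc_regex = motif_to_regex(reverse_complement(motif))
    reverse = [(m.start(), "-") for m in rc_regex.finditer(seq)]
    return sorted(forward + reverse)


MSX1_SITE = "STTAATTG"                 # (C/G)TTAATTG from the abstract
toy_promoter = "GGCACAATTAAGTCC"       # hypothetical sequence for illustration
print(reverse_complement(MSX1_SITE))   # CAATTAAS
print(find_sites(toy_promoter, MSX1_SITE))  # [(4, '-')]
```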
The Yersinia pestis gcvB gene encodes two small regulatory RNA molecules

PubMed Central

McArthur, Sarah D; Pulvermacher, Sarah C; Stauffer, George V

2006-01-01

Background: In recent years it has become clear that small non-coding RNAs function as regulatory elements in bacterial virulence and bacterial stress responses. We tested for the presence of the small non-coding GcvB RNAs in Y. pestis as possible regulators of gene expression in this organism. Results: In this study, we report that the Yersinia pestis KIM6 gcvB gene encodes two small RNAs. Transcription of gcvB is activated by the GcvA protein and repressed by the GcvR protein. The gcvB-encoded RNAs are required for repression of the Y. pestis dppA gene, encoding the periplasmic-binding protein component of the dipeptide transport system, showing that the GcvB RNAs have regulatory activity. A deletion of the gcvB gene from the Y. pestis KIM6 chromosome results in a decrease in the generation time of the organism as well as a change in colony morphology. Conclusion: The results of this study indicate that the Y. pestis gcvB gene encodes two small non-coding regulatory RNAs that repress dppA expression. A gcvB deletion is pleiotropic, suggesting that the sRNAs are likely involved in controlling genes in addition to dppA. PMID:16768793
The first draft genome of the aquatic model plant Lemna minor opens the route for future stress physiology research and biotechnological applications.

PubMed

Van Hoeck, Arne; Horemans, Nele; Monsieurs, Pieter; Cao, Hieu Xuan; Vandenhove, Hildegarde; Blust, Ronny

2015-01-01

Freshwater duckweed, comprising the smallest, fastest growing and simplest macrophytes has various applications in agriculture, phytoremediation and energy production. Lemna minor, the so-called common duckweed, is a model system of these aquatic plants for ecotoxicological bioassays, genetic transformation tools and industrial applications. Given the ecotoxic relevance and high potential for biomass production, whole-genome information of this cosmopolitan duckweed is needed. The 472 Mbp assembly of the L. minor genome (2n = 40; estimated 481 Mbp; 98.1 %) contains 22,382 protein-coding genes and 61.5 % repetitive sequences. The repeat content explains 94.5 % of the genome size difference in comparison with the greater duckweed, Spirodela polyrhiza (2n = 40; 158 Mbp; 19,623 protein-coding genes; and 15.79 % repetitive sequences). Comparison of proteins from other monocot plants, protein ortholog identification, OrthoMCL, suggests 1356 duckweed-specific groups (3367 proteins, 15.0 % total L. minor proteins) and 795 Lemna-specific groups (2897 proteins, 12.9 % total L. minor proteins). Interestingly, proteins involved in biosynthetic processes in response to various stimuli and hydrolase activities are enriched in the Lemna proteome in comparison with the Spirodela proteome. The genome sequence and annotation of L. minor protein-coding genes provide new insights in biological understanding and biomass production applications of Lemna species.
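
Several of the percentages quoted above can be re-derived from the counts given in the same abstract. The sketch below checks the assembly coverage (472 of an estimated 481 Mbp) and the two protein shares against the 22,382 predicted protein-coding genes; it assumes one protein per gene, which the abstract implies but does not state outright.

```python
def percent(part, whole):
    """Percentage of whole represented by part, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Figures taken from the abstract.
assembly_coverage = percent(472, 481)       # assembled vs. estimated genome size (Mbp)
duckweed_specific = percent(3367, 22382)    # proteins in duckweed-specific groups
lemna_specific = percent(2897, 22382)       # proteins in Lemna-specific groups

print(assembly_coverage, duckweed_specific, lemna_specific)  # 98.1 15.0 12.9
```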
Identification of a Novel Transcript and Regulatory Mechanism for Microsomal Triglyceride Transfer Protein

PubMed Central

Suzuki, Takashi; Brown, Judy J.; Swift, Larry L.

2016-01-01

Microsomal triglyceride transfer protein (MTP) is essential for the assembly of triglyceride-rich apolipoprotein B-containing lipoproteins. Previous studies in our laboratory identified a novel splice variant of MTP in mice that we named MTP-B. MTP-B has a unique first exon (1B) located 2.7 kb upstream of the first exon (1A) for canonical MTP (MTP-A). The two mature isoforms, though nearly identical in sequence and function, have different tissue expression patterns. In this study we report the identification of a second MTP splice variant (MTP-C), which contains both exons 1B and 1A. MTP-C is expressed in all the tissues we tested. In cells transfected with MTP-C, protein expression was less than 15% of that found when the cells were transfected with MTP-A or MTP-B. In silico analysis of the 5’-UTR of MTP-C revealed seven ATGs upstream of the start site for MTP-A, which is the only viable start site in frame with the main coding sequence. One of those ATGs was located in the 5’-UTR for MTP-A. We generated reporter constructs in which the 5’-UTRs of MTP-A or MTP-C were inserted between an SV40 promoter and the coding sequence of the luciferase gene and transfected these constructs into HEK 293 cells. Luciferase activity was significantly reduced by the MTP-C 5’-UTR, but not by the MTP-A 5’-UTR. We conclude that alternative splicing plays a key role in regulating MTP expression by introducing unique 5’-UTRs, which contain elements that alter translation efficiency, enabling the cell to optimize MTP levels and activity. PMID:26771188
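
The in silico step described above, counting ATGs in a 5'-UTR and asking whether each lies in frame with the main start codon, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, the main start codon is assumed to follow the UTR immediately, and the example UTR is invented.

```python
def upstream_atgs(utr5):
    """List (0-based position, in_frame) for every ATG in a 5'-UTR.

    in_frame is True when the distance from the upstream ATG to the main
    start codon (assumed to begin right after the UTR) is a multiple of 3.
    """
    utr = utr5.upper()
    hits = []
    pos = utr.find("ATG")
    while pos != -1:
        distance = len(utr) - pos
        hits.append((pos, distance % 3 == 0))
        pos = utr.find("ATG", pos + 1)
    return hits


# Hypothetical 18-nt UTR, NOT the MTP-C sequence.
toy_utr = "GCATGCCATGAAATGACC"
print(upstream_atgs(toy_utr))  # [(2, False), (7, False), (12, True)]
```

Whether an upstream ATG actually affects translation also depends on its sequence context and on any stop codon that follows it, which this sketch does not evaluate.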
Assessment of allelic diversity in intron-containing Mal d 1 genes and their association to apple allergenicity

PubMed Central

Gao, Zhongshan; Weg, Eric W van de; Matos, Catarina I; Arens, Paul; Bolhaar, Suzanne THP; Knulst, Andre C; Li, Yinghui; Hoffmann-Sommergruber, Karin; Gilissen, Luud JWJ

2008-01-01

Background: Mal d 1 is a major apple allergen causing food allergic symptoms of the oral allergy syndrome (OAS) in birch-pollen sensitised patients. The Mal d 1 gene family is known to have at least 7 intron-containing and 11 intronless members that have been mapped in clusters on three linkage groups. In this study, the allelic diversity of the seven intron-containing Mal d 1 genes was assessed among a set of apple cultivars by sequencing or indirectly through pedigree genotyping. Protein variant constitutions were subsequently compared with Skin Prick Test (SPT) responses to study the association of deduced protein variants with allergenicity in a set of 14 cultivars. Results: From the seven intron-containing Mal d 1 genes investigated, Mal d 1.01 and Mal d 1.02 were highly conserved, as nine out of ten cultivars coded for the same protein variant, while only one cultivar coded for a second variant. Mal d 1.04, Mal d 1.05 and Mal d 1.06 A, B and C were more variable, coding for three to six different protein variants. Comparison of Mal d 1 allelic composition between the high-allergenic cultivar Golden Delicious and the low-allergenic cultivars Santana and Priscilla, which are linked in pedigree, showed an association between the protein variants coded by the Mal d 1.04 and -1.06A genes (both located on linkage group 16) and allergenicity. This association was confirmed in 10 other cultivars. In addition, Mal d 1.06A allele dosage effects were associated with the degree of allergenicity based on prick-to-prick testing. Conversely, no associations were observed for the protein variants coded by the Mal d 1.01 (on linkage group 13), -1.02, -1.06B, -1.06C genes (all on linkage group 16), nor by the Mal d 1.05 gene (on linkage group 6). Conclusion: Protein variant compositions of Mal d 1.04 and -1.06A and, in the case of Mal d 1.06A, allele doses are associated with the differences in allergenicity among fourteen apple cultivars. 
This information indicates the involvement of qualitative as well as quantitative factors in allergenicity and warrants further research in the relative importance of quantitative and qualitative aspects of Mal d 1 gene expression on allergenicity. Results from this study have implications for medical diagnostics, immunotherapy, clinical research and breeding schemes for new hypo-allergenic cultivars. PMID:19014530
How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher-Wright mutation-selection framework.

PubMed

Dos Reis, Mario

2015-04-01

First principles of population genetics are used to obtain formulae relating the non-synonymous to synonymous substitution rate ratio to the selection coefficients acting at codon sites in protein-coding genes. Two theoretical cases are discussed and two examples from real data (a chloroplast gene and a virus polymerase) are given. The formulae give much insight into the dynamics of non-synonymous substitutions and may inform the development of methods to detect adaptive evolution. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
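The abstract does not reproduce the formulae themselves. As a minimal sketch of the kind of relationship involved, the code below uses the standard weak-mutation result that a mutation with scaled selection coefficient S fixes at a rate S / (1 - e^(-S)) relative to a neutral mutation. Treating synonymous changes as neutral and all non-synonymous mutations at a site as equally likely are simplifying assumptions made here, and the function names are hypothetical rather than taken from the paper.

```python
import math

def relative_fixation_rate(S: float) -> float:
    """Fixation rate of a mutation with scaled selection coefficient S,
    relative to a neutral mutation (S = 0 gives exactly 1)."""
    if abs(S) < 1e-8:
        return 1.0          # neutral limit of S / (1 - exp(-S))
    if S < -700.0:
        return 0.0          # avoid overflow; strongly deleterious changes never fix
    return S / (1.0 - math.exp(-S))

def site_omega(selection_coeffs) -> float:
    """Illustrative per-site omega: mean relative fixation rate over the
    non-synonymous mutations at a site, assuming (for this sketch only)
    that they are equally likely and that synonymous changes are neutral."""
    rates = [relative_fixation_rate(S) for S in selection_coeffs]
    return sum(rates) / len(rates)

if __name__ == "__main__":
    # Hypothetical scaled selection coefficients for one codon site.
    print(site_omega([-5.0, -2.0, 0.0]))
```

Under these assumptions a site where every non-synonymous change is deleterious gives a value below 1, and a site with advantageous changes gives a value above 1.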
Informational structure of genetic sequences and nature of gene splicing

NASA Astrophysics Data System (ADS)

Trifonov, E. N.

1991-10-01

Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequences are largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Complete mitochondrial genome sequence of the heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus).

PubMed

Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai

2016-05-01

In this study we sequenced, for the first time, the complete mitochondrial genome of a heart failure model, the cardiomyopathic Syrian hamster (Mesocricetus auratus). The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Novel coding, translation, and gene expression of a replicating covalently closed circular RNA of 220 nt

PubMed Central

AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer

2014-01-01

The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap in the sequence UGAUGA (the initiation codon AUG sits in the middle of this combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact “nanogenome.” PMID:25253891
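The frame shift at the end of each round follows from simple arithmetic: 220 is not a multiple of 3. The sketch below illustrates only that arithmetic. It assumes, hypothetically, that translation starts at coordinate 0 and proceeds continuously around the circle; it does not model the internal ribosome entry site or the real start position, and the names are illustrative.

```python
GENOME_LENGTH_NT = 220  # length of the circular RNA, from the abstract
CODON_LENGTH = 3

def codon_start_phase(pass_index: int, length: int = GENOME_LENGTH_NT) -> int:
    """Return the genome coordinate (modulo 3) at which codons begin
    during a given pass around the circle.

    Pass 0 is the first trip around the circle, with translation assumed
    (for illustration only) to start at coordinate 0.
    """
    return (-pass_index * length) % CODON_LENGTH

if __name__ == "__main__":
    for p in range(4):
        print(f"pass {p}: codons start at coordinates congruent to "
              f"{codon_start_phase(p)} mod 3")
```

Because 220 = 3 × 73 + 1, each of the first three passes reads the same nucleotides in a different frame, and the original frame recurs only on the fourth pass.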
Infection of capilloviruses requires subgenomic RNAs whose transcription is controlled by promoter-like sequences conserved among flexiviruses.

PubMed

Komatsu, Ken; Hirata, Hisae; Fukagawa, Takako; Yamaji, Yasuyuki; Okano, Yukari; Ishikawa, Kazuya; Adachi, Tatsushi; Maejima, Kensaku; Hashimoto, Masayoshi; Namba, Shigetou

2012-07-01

The first open-reading frame (ORF) of apple stem grooving virus (ASGV), of the genus Capillovirus, encodes an apparently chimeric polyprotein containing conserved regions for replicase (Rep) and coat protein (CP). However, our previous study revealed that ASGV mutants with distinct and discontinuous Rep- and CP-coding regions successfully infect plants, indicating that CP expressed via a subgenomic RNA (sgRNA) is sufficient for viability of the virus. Here we identified a transcription start site of the CP sgRNA and revealed that CP translated from the sgRNA is essential for ASGV infection. We mapped the transcription start sites of both the CP and the movement protein (MP) sgRNAs of ASGV and found a hexanucleotide motif, UUAGGU, conserved upstream from both sgRNA transcription start sites. Mutational analysis of the putative CP initiation codon and of the UUAGGU sequence upstream from the transcription start site of CP sgRNA demonstrated their importance for ASGV accumulation. Our results also demonstrated that potato virus T (PVT), an unassigned species closely related to ASGV, produces two sgRNAs putatively deployed for the CP and MP expression and that the same hexanucleotide motif as found in ASGV is located upstream from the transcription start sites of both sgRNAs. This motif, which constituted putative core elements of the sgRNA promoter, is broadly conserved among viruses in the families Alphaflexiviridae and Betaflexiviridae, suggesting that the gene expression strategy of the viruses in both families has been conserved throughout evolution. Copyright © 2012 Elsevier B.V. All rights reserved.
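A search for the conserved hexanucleotide upstream of a mapped transcription start site can be sketched as follows. The motif UUAGGU is from the abstract; the 30-nt window, the function name and the toy sequence are assumptions made for illustration and are not taken from the study.

```python
def motif_positions_upstream(seq: str, tss: int,
                             motif: str = "UUAGGU", window: int = 30):
    """Return 0-based start positions of `motif` that lie entirely within
    the `window` nucleotides immediately upstream of index `tss`.

    The sequence is treated as RNA; any T is converted to U first.
    """
    rna = seq.upper().replace("T", "U")
    start = max(0, tss - window)
    region = rna[start:tss]
    hits = []
    i = region.find(motif)
    while i != -1:
        hits.append(start + i)
        i = region.find(motif, i + 1)
    return hits

if __name__ == "__main__":
    # Hypothetical sequence: motif at position 4, start site at index 15.
    toy = "AAAA" + "UUAGGU" + "CCCCC" + "GAAA"
    print(motif_positions_upstream(toy, tss=15))
```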
PAR-CLIP data indicate that Nrd1-Nab3-dependent transcription termination regulates expression of hundreds of protein coding genes in yeast

PubMed Central

2014-01-01

Background Nrd1 and Nab3 are essential sequence-specific yeast RNA binding proteins that function as a heterodimer in the processing and degradation of diverse classes of RNAs. These proteins also regulate several mRNA coding genes; however, it remains unclear exactly what percentage of the mRNA component of the transcriptome these proteins control. To address this question, we used the pyCRAC software package developed in our laboratory to analyze CRAC and PAR-CLIP data for Nrd1-Nab3-RNA interactions. Results We generated high-resolution maps of Nrd1-Nab3-RNA interactions, from which we have uncovered hundreds of new Nrd1-Nab3 mRNA targets, representing between 20 and 30% of protein-coding transcripts. Although Nrd1 and Nab3 showed a preference for binding near 5′ ends of relatively short transcripts, they bound transcripts throughout coding sequences and 3′ UTRs. Moreover, our data for Nrd1-Nab3 binding to 3′ UTRs was consistent with a role for these proteins in the termination of transcription. Our data also support a tight integration of Nrd1-Nab3 with the nutrient response pathway. Finally, we provide experimental evidence for some of our predictions, using northern blot and RT-PCR assays. Conclusions Collectively, our data support the notion that Nrd1 and Nab3 function is tightly integrated with the nutrient response and indicate a role for these proteins in the regulation of many mRNA coding genes. Further, we provide evidence to support the hypothesis that Nrd1-Nab3 represents a failsafe termination mechanism in instances of readthrough transcription. PMID:24393166
Characterization and mapping of the mouse NDP (Norrie disease) locus (Ndp).

PubMed

Battinelli, E M; Boyd, Y; Craig, I W; Breakefield, X O; Chen, Z Y

1996-02-01

Norrie disease is a severe X-linked recessive neurological disorder characterized by congenital blindness with progressive loss of hearing. Over half of Norrie patients also manifest different degrees of mental retardation. The gene for Norrie disease (NDP) has recently been cloned and characterized. With the human NDP cDNA, mouse genomic phage libraries were screened for the homolog of the gene. Comparison between mouse and human genomic DNA blots hybridized with the NDP cDNA, as well as analysis of phage clones, shows that the mouse NDP gene is 29 kb in size (28 kb for the human gene). The organization in the two species is very similar. Both have three exons with similar-sized introns and identical exon-intron boundaries between exon 2 and 3. The mouse open reading frame is 393 bp and, like the human coding sequence, is encoded in exons 2 and 3. The absence of six nucleotides in the second mouse exon results in the encoded protein being two amino acids smaller than its human counterpart. The overall homology between the human and mouse NDP protein is 95% and is particularly high (99%) in exon 3, consistent with the apparent functional importance of this region. Analysis of transcription initiation sites suggests the presence of multiple start sites associated with expression of the mouse NDP gene. Pedigree analysis of an interspecific mouse backcross localizes the mouse NDP gene close to Maoa in the conserved segment, which runs from CYBB to PFC in both human and mouse.
Discovery of the First Germline-Restricted Gene by Subtractive Transcriptomic Analysis in the Zebra Finch, Taeniopygia guttata.

PubMed

Biederman, Michelle K; Nelson, Megan M; Asalone, Kathryn C; Pedersen, Alyssa L; Saldanha, Colin J; Bracht, John R

2018-05-21

Developmentally programmed genome rearrangements are rare in vertebrates, but have been reported in scattered lineages including the bandicoot, hagfish, lamprey, and zebra finch (Taeniopygia guttata) [1]. In the finch, a well-studied animal model for neuroendocrinology and vocal learning [2], one such programmed genome rearrangement involves a germline-restricted chromosome, or GRC, which is found in germlines of both sexes but eliminated from mature sperm [3, 4]. Transmitted only through the oocyte, it displays uniparental female-driven inheritance, and early in embryonic development is apparently eliminated from all somatic tissue in both sexes [3, 4]. The GRC comprises the longest finch chromosome at over 120 million base pairs [3], and previously the only known GRC-derived sequence was repetitive and non-coding [5]. Because the zebra finch genome project was sourced from male muscle (somatic) tissue [6], the remaining genomic sequence and protein-coding content of the GRC remain unknown. Here we report the first protein-coding gene from the GRC: a member of the α-soluble N-ethylmaleimide sensitive fusion protein (NSF) attachment protein (α-SNAP) family hitherto missing from zebra finch gene annotations. In addition to the GRC-encoded α-SNAP, we find an additional paralogous α-SNAP residing in the somatic genome (a somatolog), making the zebra finch the first example in which α-SNAP is not a single-copy gene. We show divergent, sex-biased expression for the paralogs and also that positive selection is detectable across the bird α-SNAP lineage, including the GRC-encoded α-SNAP. This study presents the identification and evolutionary characterization of the first protein-coding GRC gene in any organism. Copyright © 2018 Elsevier Ltd. All rights reserved.
Both coding exons of the c-myc gene contribute to its posttranscriptional regulation in the quiescent liver and regenerating liver and after protein synthesis inhibition.

PubMed Central

Lavenu, A; Pistoi, S; Pournin, S; Babinet, C; Morello, D

1995-01-01

In vivo, the steady-state level of c-myc mRNA is mainly controlled by posttranscriptional mechanisms. Using a panel of transgenic mice in which various versions of the human c-myc proto-oncogene were under the control of major histocompatibility complex H-2Kb class I regulatory sequences, we have shown that the 5' and the 3' noncoding sequences are dispensable for obtaining a regulated expression of the transgene in adult quiescent tissues, at the start of liver regeneration, and after inhibition of protein synthesis. These results indicated that the coding sequences were sufficient to ensure a regulated c-myc expression. In the present study, we have pursued this analysis with transgenes containing one or the other of the two c-myc coding exons either alone or in association with the c-myc 3' untranslated region. We demonstrate that each of the exons contains determinants which control c-myc mRNA expression. Moreover, we show that in the liver, c-myc exon 2 sequences are able to down-regulate an otherwise stable H-2K mRNA when embedded within it and to induce its transient accumulation after cycloheximide treatment and soon after liver ablation. Finally, the use of transgenes with different coding capacities has allowed us to postulate that the primary mRNA sequence itself and not c-Myc peptides is an important component of c-myc posttranscriptional regulation. PMID:7623834

Transcription of a protein-coding gene on B chromosomes of the Siberian roe deer (Capreolus pygargus)

PubMed Central

2013-01-01

Background Most eukaryotic species represent stable karyotypes with a particular diploid number. B chromosomes are additional to standard karyotypes and may vary in size, number and morphology even between cells of the same individual. For many years it was generally believed that B chromosomes found in some plant, animal and fungi species lacked active genes. Recently, molecular cytogenetic studies showed the presence of additional copies of protein-coding genes on B chromosomes. However, the transcriptional activity of these genes remained elusive. We studied karyotypes of the Siberian roe deer (Capreolus pygargus) that possess up to 14 B chromosomes to investigate the presence and expression of genes on supernumerary chromosomes. Results Here, we describe a 2 Mbp region homologous to cattle chromosome 3 and containing TNNI3K (partial), FPGT, LRRIQ3 and a large gene-sparse segment on B chromosomes of the Siberian roe deer. The presence of the copy of the autosomal region was demonstrated by B-specific cDNA analysis, PCR assisted mapping, cattle bacterial artificial chromosome (BAC) clone localization and quantitative polymerase chain reaction (qPCR). By comparative analysis of B-specific and non-B chromosomal sequences we discovered some B chromosome-specific mutations in protein-coding genes, which further enabled the detection of a FPGT-TNNI3K transcript expressed from duplicated genes located on B chromosomes in roe deer fibroblasts. Conclusions Discovery of a large autosomal segment in all B chromosomes of the Siberian roe deer further corroborates the view of an autosomal origin for these elements. Detection of a B-derived transcript in fibroblasts implies that the protein coding sequences located on Bs are not fully inactivated. 
The origin, evolution and effect on host of B chromosomal genes seem to be similar to autosomal segmental duplications, which reinforces the view that supernumerary chromosomal elements might play an important role in genome evolution. PMID:23915065
Xuhuai goat H-FABP gene clone, subcellular localization of expression products and the preparation of transgenic mice.

PubMed

Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang

2012-05-01

The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and to analyze its subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) of the Xuhuai goat H-FABP gene was 402 bp long, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarities were 89%, 76%, 85%, 84%, 93%, 91%, and 70%, respectively. For the corresponding amino acid sequences, the similarities were 90%, 79%, 88%, 97%, 95%, 94%, and 72%, respectively. No signal peptide region was found in the H-FABP protein, suggesting that H-FABP might be a non-secreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
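The reported CDS length and protein length are consistent if the 402 bp include the stop codon: 402 / 3 = 134 codons, one of which terminates translation. The sketch below makes that check explicit and adds a simple percent-identity helper for pre-aligned sequences. The abstract does not say how similarity was computed, so the identity function is only an assumed, generic definition, and all names are hypothetical.

```python
def protein_length_from_cds(cds_length_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of the given length."""
    if cds_length_bp % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    codons = cds_length_bp // 3
    # The stop codon occupies one codon but adds no amino acid.
    return codons - 1 if includes_stop else codons

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical, non-gap characters.

    Both inputs must already be aligned and of equal length.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned and non-empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)

if __name__ == "__main__":
    print(protein_length_from_cds(402))       # CDS length from the abstract
    print(percent_identity("ATGC", "ATGA"))   # toy aligned pair
```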
[Regulation of heat shock gene expression in response to stress].

PubMed

Garbuz, D G

2017-01-01

Heat shock (HS) genes, or stress genes, code for a number of proteins that collectively form the most ancient and universal stress defense system. The system determines the cell capability of adaptation to various adverse factors and performs a variety of auxiliary functions in normal physiological conditions. Common stress factors, such as higher temperatures, hypoxia, heavy metals, and others, suppress transcription and translation for the majority of genes, while HS genes are upregulated. Transcription of HS genes is controlled by transcription factors of the HS factor (HSF) family. Certain HSFs are activated on exposure to higher temperatures or other adverse factors to ensure stress-induced HS gene expression, while other HSFs are specifically activated at particular developmental stages. The regulation of the main mammalian stress-inducible factor HSF1 and Drosophila melanogaster HSF includes many components, such as a variety of early warning signals indicative of abnormal cell activity (e.g., increases in intracellular ceramide, cytosolic calcium ions, or partly denatured proteins); protein kinases, which phosphorylate HSFs at various Ser residues; acetyltransferases; and regulatory proteins, such as SUMO and HSBP1. Transcription factors other than HSFs are also involved in activating HS gene transcription; the set includes D. melanogaster GAF, mammalian Sp1 and NF-Y, and other factors. Transcription of several stress genes coding for molecular chaperones of the glucose-regulated protein (GRP) family is predominantly regulated by another stress-detecting system, which is known as the unfolded protein response (UPR) system and is activated in response to massive protein misfolding in the endoplasmic reticulum and mitochondrial matrix. A translational fine tuning of HS protein expression occurs via changing the phosphorylation status of several proteins involved in translation initiation. 
In addition, specific signal sequences in the 5'-UTRs of some HS protein mRNAs ensure their preferential translation in stress.
Mitochondrial genomes of the jungle crow Corvus macrorhynchos (Passeriformes: Corvidae) from shed feathers and a phylogenetic analysis of genus Corvus using mitochondrial protein-coding genes.

PubMed

Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M

2016-07-01

The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).

PubMed

Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping

2015-01-01

We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of the C. sphinx complete mitochondrial genome was 16,895 bp, with a base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long, with the sequence CATACG repeated 64 times. Three protein-coding genes (ND1, COI and ND4) ended with an incomplete stop codon, TA or T.
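Figures of this kind (base composition, tandem repeat counts) can be recomputed from a sequence with a few lines of code. The sketch below is a generic illustration with hypothetical function names and a toy sequence, not the authors' pipeline. It also shows that 64 copies of the 6-nt unit CATACG span 384 bp of the 1435-bp control region, assuming the copies are perfect and contiguous.

```python
import re

def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return {base: 100.0 * seq.count(base) / len(seq) for base in "ACGT"}

def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = (len(m.group(0)) // len(unit) for m in pattern.finditer(seq.upper()))
    return max(runs, default=0)

# Span of the repeat array reported in the abstract (64 copies of CATACG).
REPEAT_SPAN_BP = 64 * len("CATACG")

if __name__ == "__main__":
    toy = "TTCATACGCATACGCATACGAA"  # toy sequence, not real data
    print(base_composition(toy))
    print(longest_tandem_run(toy, "CATACG"))
    print(REPEAT_SPAN_BP, "bp of a 1435 bp control region")
```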
Not so bad after all: retroviruses and long terminal repeat retrotransposons as a source of new genes in vertebrates.

PubMed

Naville, M; Warren, I A; Haftek-Terreau, Z; Chalopin, D; Brunet, F; Levin, P; Galiana, D; Volff, J-N

2016-04-01

Viruses and transposable elements, once considered as purely junk and selfish sequences, have repeatedly been used as a source of novel protein-coding genes during the evolution of most eukaryotic lineages, a phenomenon called 'molecular domestication'. This is exemplified perfectly in mammals and other vertebrates, where many genes derived from long terminal repeat (LTR) retroelements (retroviruses and LTR retrotransposons) have been identified through comparative genomics and functional analyses. In particular, genes derived from gag structural protein and envelope (env) genes, as well as from the integrase-coding and protease-coding sequences, have been identified in humans and other vertebrates. Retroelement-derived genes are involved in many important biological processes including placenta formation, cognitive functions in the brain and immunity against retroelements, as well as in cell proliferation, apoptosis and cancer. These observations support an important role of retroelement-derived genes in the evolution and diversification of the vertebrate lineage. Copyright © 2016 European Society of Clinical Microbiology and Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
Complete mitochondrial genome of Cuora trifasciata (Chinese three-striped box turtle), and a comparative analysis with other box turtles.

PubMed

Li, Wei; Zhang, Xin-Cheng; Zhao, Jian; Shi, Yan; Zhu, Xin-Ping

2015-01-25

Cuora trifasciata has become one of the most critically endangered species in the world. The complete mitochondrial genome of C. trifasciata (Chinese three-striped box turtle) was determined in this study. Its mitochondrial genome is a 16,575-bp-long circular molecule that consists of 37 genes that are typically found in other vertebrates. The basic characteristics of the C. trifasciata mitochondrial genome were also determined. Moreover, a comparison of C. trifasciata with Cuora cyclornata, Cuora pani and Cuora aurocapitata indicated that the four mitogenomes differed in length, codons, overlaps, 13 protein-coding genes (PCGs), ND3, rRNA genes, control region, and other aspects. Phylogenetic analysis with Bayesian inference and maximum likelihood based on 12 protein-coding genes of the genus Cuora indicated the phylogenetic position of C. trifasciata within Cuora. The phylogenetic analysis also showed that C. trifasciata from Vietnam and China formed separate monophyletic clades with different Cuora species. The results of nucleotide base compositions, protein-coding genes and phylogenetic analysis showed that C. trifasciata from these two countries may represent different Cuora species. Copyright © 2014 Elsevier B.V. All rights reserved.
Current and future implications of basic and translational research on amyloid-β peptide production and removal pathways

PubMed Central

Bohm, C.; Chen, F.; Sevalle, J.; Qamar, S.; Dodd, R.; Li, Y.; Schmitt-Ulms, G.; Fraser, P.E.; St George-Hyslop, P.H.

2015-01-01

Inherited variants in multiple different genes are associated with increased risk for Alzheimer's disease (AD). In many of these genes, the inherited variants alter some aspect of the production or clearance of the neurotoxic amyloid β-peptide (Aβ). Thus missense, splice site or duplication mutants in the presenilin 1 (PS1), presenilin 2 (PS2) or the amyloid precursor protein (APP) genes, which alter the levels or shift the balance of Aβ produced, are associated with rare, highly penetrant autosomal dominant forms of Familial Alzheimer's Disease (FAD). Similarly, the more prevalent late-onset forms of AD are associated with both coding and non-coding variants in genes such as SORL1, PICALM and ABCA7 that affect the production and clearance of Aβ. This review summarises some of the recent molecular and structural work on the role of these genes and the proteins coded by them in the biology of Aβ. We also briefly outline how the emerging knowledge about the pathways involved in Aβ generation and clearance can be potentially targeted therapeutically. This article is part of Special Issue entitled "Neuronal Protein". PMID:25748120
Bio—Cryptography: A Possible Coding Role for RNA Redundancy

NASA Astrophysics Data System (ADS)

Regoli, M.

2009-03-01

The RNA-Crypto System (RCS for short) is a symmetric key algorithm to cipher data. The idea for this new algorithm starts from the observation of nature, in particular from the observation of RNA behavior and some of its properties. RNA sequences have some sections called introns. Introns, a term derived from "intragenic regions," are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and the role of introns in the pre-mRNA are not clear and remain under active investigation by biologists but, in our case, we will use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behavior in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message with non-coding information, as in the precursor mRNA.
Base composition and expression level of human genes.

PubMed

Arhondakis, Stilianos; Auletta, Fabio; Torelli, Giuseppe; D'Onofrio, Giuseppe

2004-01-21

It is well known that the gene distribution is non-uniform in the human genome, reaching the highest concentration in the GC-rich isochores. Also the amino acid frequencies, and the hydrophobicity, of the corresponding encoded proteins are affected by the high GC level of the genes localized in the GC-rich isochores. It was hypothesized that the gene expression level as well is higher in GC-rich compared to GC-poor isochores [Mol. Biol. Evol. 10 (1993) 186]. Several features of human genes and proteins, namely expression level, coding and non-coding lengths, and hydrophobicity were investigated in the present paper. The results support the hypothesis reported above, since all the parameters so far studied converge to the same conclusion, that the average expression level of the GC-rich genes is significantly higher than that of the GC-poor genes.
GenePRIMP: A Gene Prediction Improvement Pipeline For Prokaryotic Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kyrpides, Nikos C.; Ivanova, Natalia N.; Pati, Amrita

2010-07-08

GenePRIMP (Gene Prediction Improvement Pipeline, Http://geneprimp.jgi-psf.org) is a computational process that performs evidence-based evaluation of gene models in prokaryotic genomes and reports anomalies including inconsistent start sites, missing genes, and split genes. We show that manual curation of gene models using the anomaly reports generated by GenePRIMP improves their quality and demonstrate the applicability of GenePRIMP in improving finishing quality and comparing different genome sequencing and annotation technologies. Keywords in context: Gene model, Quality Control, Translation start sites, Automatic correction. Hardware requirements: PC, MAC; Operating System: UNIX/LINUX; Compiler/Version: Perl 5.8.5 or higher; Special requirements: NCBI Blast and nr installation; File Types: Source Code, Executable module(s), Sample problem input data; installation instructions other; programmer documentation. Location/transmission: http://geneprimp.jgi-psf.org/gp.tar.gz
Capturing the Biofuel Wellhead and Powerhouse: The Chloroplast and Mitochondrial Genomes of the Leguminous Feedstock Tree Pongamia pinnata

PubMed Central

Kazakoff, Stephen H.; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T.; Gresshoff, Peter M.

2012-01-01

Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® ‘Second Generation DNA Sequencing (2GS)’ and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites. PMID:23272141
Capturing the biofuel wellhead and powerhouse: the chloroplast and mitochondrial genomes of the leguminous feedstock tree Pongamia pinnata.

PubMed

Kazakoff, Stephen H; Imelfort, Michael; Edwards, David; Koehorst, Jasper; Biswas, Bandana; Batley, Jacqueline; Scott, Paul T; Gresshoff, Peter M

2012-01-01

Pongamia pinnata (syn. Millettia pinnata) is a novel, fast-growing arboreal legume that bears prolific quantities of oil-rich seeds suitable for the production of biodiesel and aviation biofuel. Here, we have used Illumina® 'Second Generation DNA Sequencing (2GS)' and a new short-read de novo assembler, SaSSY, to assemble and annotate the Pongamia chloroplast (152,968 bp; cpDNA) and mitochondrial (425,718 bp; mtDNA) genomes. We also show that SaSSY can be used to accurately assemble 2GS data, by re-assembling the Lotus japonicus cpDNA and in the process assemble its mtDNA (380,861 bp). The Pongamia cpDNA contains 77 unique protein-coding genes and is almost 60% gene-dense. It contains a 50 kb inversion common to other legumes, as well as a novel 6.5 kb inversion that is responsible for the non-disruptive, re-orientation of five protein-coding genes. Additionally, two copies of an inverted repeat firmly place the species outside the subclade of the Fabaceae lacking the inverted repeat. The Pongamia and L. japonicus mtDNA contain just 33 and 31 unique protein-coding genes, respectively, and like other angiosperm mtDNA, have expanded intergenic and multiple repeat regions. Through comparative analysis with Vigna radiata we measured the average synonymous and non-synonymous divergence of all three legume mitochondrial (1.59% and 2.40%, respectively) and chloroplast (8.37% and 8.99%, respectively) protein-coding genes. Finally, we explored the relatedness of Pongamia within the Fabaceae and showed the utility of the organellar genome sequences by mapping transcriptomic data to identify up- and down-regulated stress-responsive gene candidates and confirm in silico predicted RNA editing sites.
Nucleotide sequence of the gene for the Mr 32,000 thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38,950

PubMed Central

Zurawski, Gerard; Bohnert, Hans J.; Whitfeld, Paul R.; Bottomley, Warwick

1982-01-01

The gene for the so-called Mr 32,000 rapidly labeled photosystem II thylakoid membrane protein (here designated psbA) of spinach (Spinacia oleracea) chloroplasts is located on the chloroplast DNA in the large single-copy region immediately adjacent to one of the inverted repeat sequences. In this paper we show that the size of the mRNA for this protein is ≈ 1.25 kilobases and that the direction of transcription is towards the inverted repeat unit. The nucleotide sequence of the gene and its flanking regions is presented. The only large open reading frame in the sequence codes for a protein of Mr 38,950. The nucleotide sequence of psbA from Nicotiana debneyi also has been determined, and comparison of the sequences from the two species shows them to be highly conserved (>95% homology) throughout the entire reading frame. Conservation of the amino acid sequence is absolute, there being no changes in a total of 353 residues. This leads us to conclude that the primary translation product of psbA must be a protein of Mr 38,950. The protein is characterized by the complete absence of lysine residues and is relatively rich in hydrophobic amino acids, which tend to be clustered. Transcription of spinach psbA starts about 86 base pairs before the first ATG codon. Immediately upstream from this point there is a sequence typical of that found in E. coli promoters. An almost identical sequence occurs in the equivalent region of N. debneyi DNA. PMID:16593262
An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

PubMed Central

Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

1999-01-01

A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." (Milne 1926) PMID:10471707
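The density figures quoted above can be checked with a few lines of arithmetic. This is only a sanity-check sketch: it assumes the region length is the 2.9 Mb given in the title, and it assumes that the percentages are taken over the 218 predicted protein-coding genes, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract; the 2.9 Mb length is taken from the title.
REGION_KB = 2900
PROTEIN_CODING_GENES = 218
TRANSPOSABLE_ELEMENTS = 17

def kb_per_feature(region_kb: float, n_features: int) -> float:
    """Average spacing, in kb, between features spread over a region."""
    return region_kb / n_features

gene_spacing = kb_per_feature(REGION_KB, PROTEIN_CODING_GENES)
te_spacing = kb_per_feature(REGION_KB, TRANSPOSABLE_ELEMENTS)

# Assumption: the denominator for both percentages is the 218 predicted genes.
est_fraction = 95 / PROTEIN_CODING_GENES
similarity_fraction = 144 / PROTEIN_CODING_GENES

print(f"one gene every {gene_spacing:.1f} kb")
print(f"one transposable element every {te_spacing:.1f} kb")
print(f"EST matches: {est_fraction:.1%}, protein similarities: {similarity_fraction:.1%}")
```

Under these assumptions the computed values round to the 13 kb, 171 kb, 44% and 66% reported in the abstract.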
Analysis of polyglutamine-coding repeats in the TATA-binding protein in different human populations and in patients with schizophrenia and bipolar affective disorder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rubinsztein, D.C.; Leggo, J.; Crow, T.J.

A new class of disease (including Huntington disease, Kennedy disease, and spinocerebellar ataxias types 1 and 3) results from abnormal expansions of CAG trinucleotides in the coding regions of genes. In all of these diseases the CAG repeats are thought to be translated into polyglutamine tracts. There is accumulating evidence arguing for CAG trinucleotide expansions as one of the causative disease mutations in schizophrenia and bipolar affective disorder. We and others believe that the TATA-binding protein (TBP) is an important candidate to investigate in these diseases as it contains a highly polymorphic stretch of glutamine codons, which are close to the threshold length where the polyglutamine tracts start to be associated with disease. Thus, we examined the lengths of this polyglutamine repeat in normal unrelated East Anglians, South African Blacks, sub-Saharan Africans mainly from Nigeria, and Asian Indians. We also examined 43 bipolar affective disorder patients and 65 schizophrenic patients. The range of polyglutamine tract-lengths that we found in humans was from 26-42 codons. No patients with bipolar affective disorder and schizophrenia had abnormal expansions at this locus. 22 refs., 1 tab.
Different small, acid-soluble proteins of the alpha/beta type have interchangeable roles in the heat and UV radiation resistance of Bacillus subtilis spores.

PubMed Central

Mason, J M; Setlow, P

1987-01-01

Spores of Bacillus subtilis strains which carry deletion mutations in one gene (sspA) or two genes (sspA and sspB) which code for major alpha/beta-type small, acid-soluble spore proteins (SASP) are known to be much more sensitive to heat and UV radiation than wild-type spores. This heat- and UV-sensitive phenotype was cured completely or in part by introduction into these mutant strains of one or more copies of the sspA or sspB genes themselves; multiple copies of the B. subtilis sspD gene, which codes for a minor alpha/beta-type SASP; or multiple copies of the SASP-C gene, which codes for a major alpha/beta-type SASP of Bacillus megaterium. These findings suggest that alpha/beta-type SASP play interchangeable roles in the heat and UV radiation resistance of bacterial spores. PMID:3112127
A Catalogue of Putative cis-Regulatory Interactions Between Long Non-coding RNAs and Proximal Coding Genes Based on Correlative Analysis Across Diverse Human Tumors.

PubMed

Basu, Swaraj; Larsson, Erik

2018-05-31

Antisense transcripts and other long non-coding RNAs are pervasive in mammalian cells, and some of these molecules have been proposed to regulate proximal protein-coding genes in cis. For example, non-coding transcription can contribute to inactivation of tumor suppressor genes in cancer, and antisense transcripts have been implicated in the epigenetic inactivation of imprinted genes. However, our knowledge is still limited and more such regulatory interactions likely await discovery. Here, we make use of available gene expression data from a large compendium of human tumors to generate hypotheses regarding non-coding-to-coding cis-regulatory relationships with emphasis on negative associations, as these are less likely to arise for reasons other than cis-regulation. We document a large number of possible regulatory interactions, including 193 coding/non-coding pairs that show expression patterns compatible with negative cis-regulation. Importantly, by this approach we capture several known cases, and many of the involved coding genes have known roles in cancer. Our study provides a large catalog of putative non-coding/coding cis-regulatory pairs that may serve as a basis for further experimental validation and characterization. Copyright © 2018 Basu and Larsson.
Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

PubMed Central

Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

1985-01-01

Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512
Experimental Analysis of Mimivirus Translation Initiation Factor 4a Reveals Its Importance in Viral Protein Translation during Infection of Acanthamoeba polyphaga.

PubMed

Bekliz, Meriem; Azza, Said; Seligmann, Hervé; Decloquement, Philippe; Raoult, Didier; La Scola, Bernard

2018-05-15

The Acanthamoeba polyphaga mimivirus is the first giant virus ever described, with a 1.2-Mb genome which encodes 979 proteins, including central components of the translation apparatus. One of these proteins, R458, was predicted to initiate translation, although its specific role remains unknown. We silenced the R458 gene using small interfering RNA (siRNA) and compared levels of viral fitness and protein expression in silenced versus wild-type mimivirus. Silencing decreased the growth rate, but viral particle production at the end of the viral cycle was unaffected. A comparative proteomic approach using two-dimensional difference-in-gel electrophoresis (2D-DIGE) revealed deregulation of the expression of 32 proteins in silenced mimivirus, which were defined as up- or downregulated. Besides revealing proteins with unknown functions, silencing R458 also revealed deregulation in proteins associated with viral particle structures, transcriptional machinery, oxidative pathways, modification of proteins/lipids, and DNA topology/repair. Most of these proteins belong to genes transcribed at the end of the viral cycle. Overall, our data suggest that the R458 protein regulates the expression of mimivirus proteins and, thus, that mimivirus translational proteins may not be strictly redundant in relation to those from the amoeba host. As is the case for eukaryotic initiation factor 4a (eIF4a), the R458 protein is the prototypical member of the ATP-dependent DEAD box RNA helicase mechanism. We suggest that the R458 protein is required to unwind the secondary structures at the 5' ends of mRNAs and to bind the mRNA to the ribosome, making it possible to scan for the start codon. These data are the first experimental evidence of mimivirus translation-related genes, predicted to initiate protein biosynthesis. 
IMPORTANCE The presence in the genome of a mimivirus of genes coding for many translational processes, with the exception of ribosome constituents, has been the subject of debate since its discovery in 2003. In this work, we focused on the R458 mimivirus gene, predicted to initiate protein biosynthesis. After silencing was performed, we observed that it has no major effect on mimivirus multiplication but that it affects protein expression and fitness. This suggests that it is effectively used by mimivirus during its developmental cycle. Until large-scale genetic manipulation of giant viruses becomes possible, the silencing strategy used here on mimivirus translation-related factors will open the way to understanding the functions of these translational genes. Copyright © 2018 American Society for Microbiology.

New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation.

PubMed

McLysaght, Aoife; Guerzoni, Daniele

2015-09-26

The origin of novel protein-coding genes de novo was once considered so improbable as to be impossible. In less than a decade, and especially in the last five years, this view has been overturned by extensive evidence from diverse eukaryotic lineages. There is now evidence that this mechanism has contributed a significant number of genes to genomes of organisms as diverse as Saccharomyces, Drosophila, Plasmodium, Arabidopsis and human. From simple beginnings, these genes have in some instances acquired complex structure, regulated expression and important functional roles. New genes are often thought of as dispensable late additions; however, some recent de novo genes in human can play a role in disease. Rather than an extremely rare occurrence, it is now evident that there is a relatively constant trickle of proto-genes released into the testing ground of natural selection. It is currently unknown whether de novo genes arise primarily through an 'RNA-first' or 'ORF-first' pathway. Either way, evolutionary tinkering with this pool of genetic potential may have been a significant player in the origins of lineage-specific traits and adaptations. © 2015 The Authors.
A gene family for acidic ribosomal proteins in Schizosaccharomyces pombe: two essential and two nonessential genes.

PubMed Central

Beltrame, M; Bianchi, M E

1990-01-01

We have cloned the genes for small acidic ribosomal proteins (A-proteins) of the fission yeast Schizosaccharomyces pombe. S. pombe contains four transcribed genes for small A-proteins per haploid genome, as is the case for Saccharomyces cerevisiae. In contrast, multicellular eucaryotes contain two transcribed genes per haploid genome. The four proteins of S. pombe, besides sharing a high overall similarity, form two couples of nearly identical sequences. Their corresponding genes have a very conserved structure and are transcribed to a similar level. Surprisingly, of each couple of genes coding for nearly identical proteins, one is essential for cell growth, whereas the other is not. We suggest that the unequal importance of the four small A-proteins for cell survival is related to their physical organization in 60S ribosomal subunits. PMID:2325655
The analysis of the complete mitochondrial genome of Lecanicillium muscarium (synonym Verticillium lecanii) suggests a minimum common gene organization in mtDNAs of Sordariomycetes: phylogenetic implications.

PubMed

Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A

2004-10-01

The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499 bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs (representatives of all four phyla are included) revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes, namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
Phenome-genome association studies of pancreatic cancer: new targets for therapy and diagnosis.

PubMed

Narayanan, Ramaswamy

2015-01-01

Pancreatic cancer has a very high mortality rate and requires novel molecular targets for diagnosis and therapy. Genetic association studies over databases offer an attractive starting point for gene discovery. The National Center for Biotechnology Information (NCBI) Phenome Genome Integrator (PheGenI) tool was enriched for pancreatic cancer-associated traits. The genes associated with the trait were characterized using diverse bioinformatics tools for Genome-Wide Association (GWA), transcriptome and proteome profile and protein classes for motif and domain. Two hundred twenty-six genes were identified that had a genetic association with pancreatic cancer in the human genome. This included 25 uncharacterized open reading frames (ORFs). Bioinformatics analysis of these ORFs identified putative druggable proteins and biomarkers including enzymes, transporters and G-protein-coupled receptor signaling proteins. Secreted proteins including a neuroendocrine factor and a chemokine were identified. Five out of these ORFs encompassed non-coding RNAs. The ORF protein expression was detected in numerous body fluids, such as ascites, bile, pancreatic juice, milk, plasma, serum and saliva. Transcriptome and proteome analyses showed a correlation of mRNA and protein expression for nine ORFs. Analysis of the Catalogue of Somatic Mutations in Cancer (COSMIC) database revealed a strong correlation across copy number variations and mRNA over-expression for four ORFs. Mining of the International Cancer Gene Consortium (ICGC) database identified somatic mutations in a significant number of pancreatic patients' tumors for most of these ORFs. The pancreatic cancer-associated ORFs were also found to be genetically associated with other neoplasms, including leukemia, malignant melanoma, neuroblastoma and prostate carcinomas, as well as other unrelated diseases and disorders, such as Alzheimer's disease, Crohn's disease, coronary diseases, attention deficit disorder and addiction.
Based on Genome-Wide Association Studies (GWAS), copy number variations, somatic mutational status and correlation of gene expression in pancreatic tumors at the mRNA and protein level, expression specificity in normal tissues and detection in body fluids, six ORFs emerged as putative leads for pancreatic cancer. These six targets provide a basis for accelerated drug discovery and diagnostic marker development for pancreatic cancer. Copyright© 2015, International Institute of Anticancer Research (Dr. John G. Delinasios), All rights reserved.
Identification and characterization of an early gene in the Lymantria dispar multinucleocapsid nuclear polyhedrosis virus

Treesearch

David S. Bischoff; James M. Slavicek

1995-01-01

The Lymantria dispar multinucleocapsid nuclear polyhedrosis virus (LdMNPV) gene encoding G22 was cloned and sequenced. The G22 gene codes for a 191 amino acid protein with a predicted Mr of 22000. Expression of G22 in a rabbit reticulocyte system generated a protein with an M...
Identification of high-efficiency 3'GG gRNA motifs in indexed FASTA files with ngg2.

PubMed

Roberson, Elisha D O

CRISPR/Cas9 is emerging as one of the most-used methods of genome modification in organisms ranging from bacteria to human cells. However, the efficiency of editing varies tremendously site-to-site. A recent report identified a novel motif, called the 3'GG motif, which substantially increases the efficiency of editing at all sites tested in C. elegans. Furthermore, the authors highlighted that previously published gRNAs with high editing efficiency also had this motif. I designed a Python command-line tool, ngg2, to identify 3'GG gRNA sites from indexed FASTA files. As a proof-of-concept, I screened for these motifs in six model genomes: Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus, and Homo sapiens. I also scanned the genomes of pig (Sus scrofa) and African elephant (Loxodonta africana) to demonstrate the utility in non-model organisms. I identified more than 60 million single match 3'GG motifs in these genomes. Greater than 61% of all protein coding genes in the reference genomes had at least one unique 3'GG gRNA site overlapping an exon. In particular, more than 96% of mouse and 93% of human protein coding genes have at least one unique, overlapping 3'GG gRNA. These identified sites can be used as a starting point in gRNA selection, and the ngg2 tool provides an important ability to identify 3'GG editing sites in any species with an available genome sequence.
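The kind of motif scan described above can be sketched in a few lines. This is not the ngg2 implementation and does not use its interface. It is a minimal illustration that assumes a 3'GG site is a 20-nt protospacer ending in GG followed directly by an NGG PAM, scans a plain in-memory string rather than an indexed FASTA file, and uses the hypothetical names `find_3gg_sites` and `reverse_complement`. It also does not test whether a site is unique in the genome.

```python
import re

# 18 arbitrary bases + GG (the 3'GG end of a 20-nt protospacer), then an NGG PAM.
# The lookahead lets overlapping sites be reported.
SITE = re.compile(r"(?=([ACGT]{18}GG)([ACGT]GG))")
SITE_LEN = 23
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_3gg_sites(seq):
    """Yield (start, strand, protospacer, pam) for 3'GG sites on both strands.

    start is the 0-based forward-strand coordinate of the leftmost base of the
    23-nt site, whichever strand the site is on.
    """
    seq = seq.upper()
    n = len(seq)
    for m in SITE.finditer(seq):
        yield m.start(), "+", m.group(1), m.group(2)
    for m in SITE.finditer(reverse_complement(seq)):
        # Convert the offset on the reverse strand back to forward coordinates.
        yield n - (m.start() + SITE_LEN), "-", m.group(1), m.group(2)

example = "A" * 18 + "GG" + "TGG"
for site in find_3gg_sites(example):
    print(site)
```

A genome-scale tool would also need to handle ambiguous bases, stream sequence from disk, and count how often each protospacer occurs in order to report only single-match sites.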
Transcriptional regulation of the human mitochondrial peptide deformylase (PDF).

PubMed

Pereira-Castro, Isabel; Costa, Luís Teixeira da; Amorim, António; Azevedo, Luisa

2012-05-18

The last years of research have been particularly dynamic in establishing the importance of peptide deformylase (PDF), a protein of the N-terminal methionine excision (NME) pathway that removes formyl-methionine from mitochondrial-encoded proteins. The genomic sequence of the human PDF gene is shared with the COG8 gene, which encodes a component of the oligomeric Golgi complex, a very unusual case in eukaryotic genomes. Since PDF is crucial in maintaining mitochondrial function and given the atypical short distance between the end of COG8 coding sequence and the PDF initiation codon, we investigated whether the regulation of the human PDF is affected by the COG8 overlapping partner. Our data reveal that PDF has several transcription start sites, the most important of which is only 18 bp from the initiation codon. Furthermore, luciferase-activation assays using differently-sized fragments defined a 97 bp minimal promoter region for human PDF, which is capable of very strong transcriptional activity. This fragment contains a potential Sp1 binding site highly conserved in mammalian species. We show that this binding site, whose mutation significantly reduces transcription activation, is a target for the Sp1 transcription factor, and possibly of other members of the Sp family. Importantly, the entire minimal promoter region is located after the end of COG8's coding region, strongly suggesting that the human PDF preserves an independent regulation from its overlapping partner. Copyright © 2012 Elsevier Inc. All rights reserved.
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.

PubMed

Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka

2018-01-01

Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. An intact ORF of this de novo sequence is shared not by all cetaceans but only by delphinoids. However, this sequence is conserved in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
Chemical and Biological Tools for the Preparation of Modified Histone Proteins

PubMed Central

Howard, Cecil J.; Yu, Ruixuan R.; Gardner, Miranda L.; Shimko, John C.; Ottesen, Jennifer J.

2016-01-01

Eukaryotic chromatin is a complex and dynamic system in which the DNA double helix is organized and protected by interactions with histone proteins. This system is regulated through a large network of dynamic post-translational modifications (PTMs) that exists to ensure proper gene transcription, DNA repair, and other processes involving DNA. Homogeneous protein samples with precisely characterized modification sites are necessary to better understand the functions of modified histone proteins. Here, we discuss sets of chemical and biological tools that have been developed for the preparation of modified histones, with a focus on the appropriate choice of tool for a given target. We start with genetic approaches for the creation of modified histones, including the incorporation of genetic mimics of histone modifications, chemical installation of modification analogs, and the use of the expanded genetic code to incorporate modified amino acids. Additionally, we will cover the chemical ligation techniques that have been invaluable in the generation of complex modified histones that are indistinguishable from the natural counterparts. Finally, we will end with a prospectus on future directions of synthetic chromatin in living systems. PMID:25863817
Efficient analysis of mouse genome sequences reveal many nonsense variants

PubMed Central

Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude

2016-01-01

Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
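The in silico step described above (substitute the variant bases into the reference coding sequence, translate both versions, and compare the proteins) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, only single-base substitutions are handled (no indels), and the standard nuclear genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds):
    """Translate a CDS up to and including the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)

def apply_snps(cds, snps):
    """Return a copy of cds with {0-based position: alternate base} applied."""
    bases = list(cds)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)

def gains_stop(ref_cds, snps):
    """True if the variant CDS terminates earlier than the reference CDS."""
    ref_protein = translate(ref_cds)
    alt_protein = translate(apply_snps(ref_cds, snps))
    return alt_protein.endswith("*") and len(alt_protein) < len(ref_protein)

reference = "ATGCAAGGATAA"             # toy CDS, not a real mouse gene
print(translate(reference))
print(gains_stop(reference, {3: "T"})) # CAA -> TAA introduces a stop
print(gains_stop(reference, {5: "G"})) # CAA -> CAG is synonymous
```

Comparing translated proteins rather than codons one at a time means that several variants falling in the same codon are evaluated together, which matters when a strain differs from the reference at adjacent positions.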
Massively Convergent Evolution for Ribosomal Protein Gene Content in Plastid and Mitochondrial Genomes

PubMed Central

Maier, Uwe-G; Zauner, Stefan; Woehle, Christian; Bolte, Kathrin; Hempel, Franziska; Allen, John F.; Martin, William F.

2013-01-01

Plastid and mitochondrial genomes have undergone parallel evolution to encode the same functional set of genes. These encode conserved protein components of the electron transport chain in their respective bioenergetic membranes and genes for the ribosomes that express them. This highly convergent aspect of organelle genome evolution is partly explained by the redox regulation hypothesis, which predicts a separate plastid or mitochondrial location for genes encoding bioenergetic membrane proteins of either photosynthesis or respiration. Here we show that convergence in organelle genome evolution is far stronger than previously recognized, because the same set of genes for ribosomal proteins is independently retained by both plastid and mitochondrial genomes. A hitherto unrecognized selective pressure retains genes for the same ribosomal proteins in both organelles. On the Escherichia coli ribosome assembly map, the retained proteins are implicated in 30S and 50S ribosomal subunit assembly and initial rRNA binding. We suggest that ribosomal assembly imposes functional constraints that govern the retention of ribosomal protein coding genes in organelles. These constraints are subordinate to redox regulation for electron transport chain components, which anchor the ribosome to the organelle genome in the first place. As organelle genomes undergo reduction, the rRNAs also become smaller. Below size thresholds of approximately 1,300 nucleotides (16S rRNA) and 2,100 nucleotides (26S rRNA), all ribosomal protein coding genes are lost from organelles, while electron transport chain components remain organelle encoded as long as the organelles use redox chemistry to generate a proton motive force. PMID:24259312
Expression profiles of long non-coding RNAs located in autoimmune disease-associated regions reveal immune cell-type specificity.

PubMed

Hrdlickova, Barbara; Kumar, Vinod; Kanduri, Kartiek; Zhernakova, Daria V; Tripathi, Subhash; Karjalainen, Juha; Lund, Riikka J; Li, Yang; Ullah, Ubaid; Modderman, Rutger; Abdulahad, Wayel; Lähdesmäki, Harri; Franke, Lude; Lahesmaa, Riitta; Wijmenga, Cisca; Withoff, Sebo

2014-01-01

Although genome-wide association studies (GWAS) have identified hundreds of variants associated with a risk for autoimmune and immune-related disorders (AID), our understanding of the disease mechanisms is still limited. In particular, more than 90% of the risk variants lie in non-coding regions, and almost 10% of these map to long non-coding RNA transcripts (lncRNAs). lncRNAs are known to show more cell-type specificity than protein-coding genes. We aimed to characterize lncRNAs and protein-coding genes located in loci associated with nine AIDs which have been well-defined by Immunochip analysis and by transcriptome analysis across seven populations of peripheral blood leukocytes (granulocytes, monocytes, natural killer (NK) cells, B cells, memory T cells, naive CD4(+) and naive CD8(+) T cells) and four populations of cord blood-derived T-helper cells (precursor, primary, and polarized (Th1, Th2) T-helper cells). We show that lncRNAs mapping to loci shared between AID are significantly enriched in immune cell types compared to lncRNAs from the whole genome (α <0.005). We were not able to prioritize single cell types relevant for specific diseases, but we observed five different cell types enriched (α <0.005) in five AID (NK cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, and psoriasis; memory T and CD8(+) T cells in juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis; Th0 and Th2 cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis). Furthermore, we show that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved. 
The observed enrichment of lncRNA transcripts in AID loci implies lncRNAs play an important role in AID etiology and suggests that lncRNA genes should be studied in more detail to interpret GWAS findings correctly. The co-expression results strongly support a model in which the lncRNA and protein-coding genes function together in the same pathways.
A comprehensive catalog of human KRAB-associated zinc finger genes: Insights into the evolutionary history of a large family of transcriptional repressors

PubMed Central

Huntley, Stuart; Baggott, Daniel M.; Hamilton, Aaron T.; Tran-Gyamfi, Mary; Yang, Shan; Kim, Joomyeong; Gordon, Laurie; Branscomb, Elbert; Stubbs, Lisa

2006-01-01

Krüppel-type zinc finger (ZNF) motifs are prevalent components of transcription factor proteins in all eukaryotes. KRAB-ZNF proteins, in which a potent repressor domain is attached to a tandem array of DNA-binding zinc-finger motifs, are specific to tetrapod vertebrates and represent the largest class of ZNF proteins in mammals. To define the full repertoire of human KRAB-ZNF proteins, we searched the genome sequence for key motifs and then constructed and manually curated gene models incorporating those sequences. The resulting gene catalog contains 423 KRAB-ZNF protein-coding loci, yielding alternative transcripts that altogether predict at least 742 structurally distinct proteins. Active rounds of segmental duplication, involving single genes or larger regions and including both tandem and distributed duplication events, have driven the expansion of this mammalian gene family. Comparisons between the human genes and ZNF loci mined from the draft mouse, dog, and chimpanzee genomes not only identified 103 KRAB-ZNF genes that are conserved in mammals but also highlighted a substantial level of lineage-specific change; at least 136 KRAB-ZNF coding genes are primate specific, including many recent duplicates. KRAB-ZNF genes are widely expressed and clustered genes are typically not coregulated, indicating that paralogs have evolved to fill roles in many different biological processes. To facilitate further study, we have developed a Web-based public resource with access to gene models, sequences, and other data, including visualization tools to provide genomic context and interaction with other public data sets. PMID:16606702
Genome-Wide Discovery of Long Non-Coding RNAs in Rainbow Trout.

PubMed

Al-Tobasei, Rafet; Paneru, Bam; Salem, Mohamed

2016-01-01

The ENCODE project revealed that ~70% of the human genome is transcribed. While only 1-2% of the RNAs encode proteins, the rest are non-coding RNAs. Long non-coding RNAs (lncRNAs) form a diverse class of non-coding RNAs that are longer than 200 nt. Emerging evidence indicates that lncRNAs play critical roles in various cellular processes including regulation of gene expression. LncRNAs show low levels of gene expression and sequence conservation, which make their computational identification in genomes difficult. In this study, more than two billion Illumina sequence reads were mapped to the genome reference using the TopHat and Cufflinks software. Transcripts shorter than 200 nt, with an ORF longer than 83-100 amino acids, or with significant homologies to the NCBI nr-protein database were removed. In addition, a computational pipeline was used to filter the remaining transcripts based on a protein-coding-score test. Depending on the filtering stringency conditions, between 31,195 and 54,503 lncRNAs were identified, with only 421 matching known lncRNAs in other species. A digital gene expression atlas revealed 2,935 tissue-specific and 3,269 ubiquitously-expressed lncRNAs. This study annotates lncRNAs in the rainbow trout genome and provides a valuable resource for functional genomics research in salmonids.
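The length and ORF filters described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the 100-amino-acid cutoff is simply the upper end of the 83-100 range quoted above, and only complete ATG-to-stop ORFs in the three forward frames are scanned. The homology search and the protein-coding-score test are omitted.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq: str) -> int:
    """Return the length (in amino acids) of the longest complete
    ATG-to-stop ORF found in the three forward reading frames.
    ORFs that run off the end without a stop codon are ignored."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from the ATG up to, but excluding, the stop.
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq: str, min_len: int = 200, max_orf_aa: int = 100) -> bool:
    """Keep transcripts that are at least min_len nt long and whose
    longest ORF does not exceed max_orf_aa (both cutoffs are assumptions)."""
    return len(seq) >= min_len and longest_orf_aa(seq) <= max_orf_aa
```

A transcript passing this filter is only a candidate; the study additionally removed transcripts with protein database homology or a high coding score.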
Seed storage protein gene promoters contain conserved DNA motifs in Brassicaceae, Fabaceae and Poaceae

PubMed Central

Fauteux, François; Strömvik, Martina V

2009-01-01

Background Accurate computational identification of cis-regulatory motifs is difficult, particularly in eukaryotic promoters, which typically contain multiple short and degenerate DNA sequences bound by several interacting factors. Enrichment in combinations of rare motifs in the promoter sequence of functionally or evolutionarily related genes among several species is an indicator of conserved transcriptional regulatory mechanisms. This provides a basis for the computational identification of cis-regulatory motifs. Results We have used a discriminative seeding DNA motif discovery algorithm for an in-depth analysis of 54 seed storage protein (SSP) gene promoters from three plant families, namely Brassicaceae (mustards), Fabaceae (legumes) and Poaceae (grasses) using backgrounds based on complete sets of promoters from a representative species in each family, namely Arabidopsis (Arabidopsis thaliana (L.) Heynh.), soybean (Glycine max (L.) Merr.) and rice (Oryza sativa L.) respectively. We have identified three conserved motifs (two RY-like and one ACGT-like) in Brassicaceae and Fabaceae SSP gene promoters that are similar to experimentally characterized seed-specific cis-regulatory elements. Fabaceae SSP gene promoter sequences are also enriched in a novel, seed-specific E2Fb-like motif. Conserved motifs identified in Poaceae SSP gene promoters include a GCN4-like motif, two prolamin-box-like motifs and an Skn-1-like motif. Evidence of the presence of a variant of the TATA-box is found in the SSP gene promoters from the three plant families. Motifs discovered in SSP gene promoters were used to score whole-genome sets of promoters from Arabidopsis, soybean and rice. The highest-scoring promoters are associated with genes coding for different subunits or precursors of seed storage proteins. Conclusion Seed storage protein gene promoter motifs are conserved in diverse species, and different plant families are characterized by a distinct combination of conserved motifs. 
The majority of discovered motifs match experimentally characterized cis-regulatory elements. These results provide a good starting point for further experimental analysis of plant seed-specific promoters and our methodology can be used to unravel more transcriptional regulatory mechanisms in plants and other eukaryotes. PMID:19843335
Cloning and Expression Analysis of Genes Encoding Lytic Endopeptidases L1 and L5 from Lysobacter sp. Strain XL1

PubMed Central

Lapteva, Y. S.; Zolova, O. E.; Shlyapnikov, M. G.; Tsfasman, I. M.; Muranova, T. A.; Stepnaya, O. A.; Kulaev, I. S.

2012-01-01

Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5. PMID:22865082
Cloning and expression analysis of genes encoding lytic endopeptidases L1 and L5 from Lysobacter sp. strain XL1.

PubMed

Lapteva, Y S; Zolova, O E; Shlyapnikov, M G; Tsfasman, I M; Muranova, T A; Stepnaya, O A; Kulaev, I S; Granovsky, I E

2012-10-01

Lytic enzymes are the group of hydrolases that break down structural polymers of the cell walls of various microorganisms. In this work, we determined the nucleotide sequences of the Lysobacter sp. strain XL1 alpA and alpB genes, which code for, respectively, secreted lytic endopeptidases L1 (AlpA) and L5 (AlpB). In silico analysis of their amino acid sequences showed these endopeptidases to be homologous proteins synthesized as precursors similar in structural organization: the mature enzyme sequence is preceded by an N-terminal signal peptide and a pro region. On the basis of phylogenetic analysis, endopeptidases AlpA and AlpB were assigned to the S1E family [clan PA(S)] of serine peptidases. Expression of the alpA and alpB open reading frames (ORFs) in Escherichia coli confirmed that they code for functionally active lytic enzymes. Each ORF was predicted to have the Shine-Dalgarno sequence located at a canonical distance from the start codon and a potential Rho-independent transcription terminator immediately after the stop codon. The alpA and alpB mRNAs were experimentally found to be monocistronic; transcription start points were determined for both mRNAs. The synthesis of the alpA and alpB mRNAs was shown to occur predominantly in the late logarithmic growth phase. The amount of alpA mRNA in cells of Lysobacter sp. strain XL1 was much higher, which correlates with greater production of endopeptidase L1 than of L5.
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.

PubMed

Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni

2015-01-01

The complete mitochondrial genome of the giant manta ray (Manta birostris) consists of 18,075 bp with rich A + T and low G content. Gene organization and length are similar to those of other ray species. It comprises 13 protein-coding genes, 2 rRNA genes, 23 tRNA genes, 1 non-coding sequence, and the control region. We identified an AT tandem repeat region, similar to that reported in Mobula japanica.
The complete mitogenome sequence of the Japanese oak silkmoth, Antheraea yamamai (Lepidoptera: Saturniidae).

PubMed

Kim, Seong Ryeol; Kim, Man Il; Hong, Mee Yeon; Kim, Kee Young; Kang, Pil Don; Hwang, Jae Sam; Han, Yeon Soo; Jin, Byung Rae; Kim, Iksoo

2009-09-01

The 15,338-bp long complete mitochondrial genome (mitogenome) of the Japanese oak silkmoth, Antheraea yamamai (Lepidoptera: Saturniidae) was determined. This genome has a gene arrangement identical to those of all other sequenced lepidopteran insects, but differs from the most common type, as the result of the movement of tRNA(Met) to a position 5'-upstream of tRNA(Ile). No typical start codon of the A. yamamai COI gene is available. Instead, a tetranucleotide, TTAG, which is found at the beginning context of all sequenced lepidopteran insects was tentatively designated as the start codon for A. yamamai COI gene. Three of the 13 protein-coding genes (PCGs) harbor the incomplete termination codon, T or TA. All tRNAs formed stable stem-and-loop structures, with the exception of tRNA(Ser)(AGN), the DHU arm of which formed a simple loop as has been observed in many other metazoan mt tRNA(Ser)(AGN). The 334-bp long A + T-rich region is noteworthy in that it harbors tRNA-like structures, as has also been seen in the A + T-rich regions of other insect mitogenomes. Phylogenetic analyses of the available species of Bombycoidea, Pyraloidea, and Tortricidea bolstered the current morphology-based hypothesis that Bombycoidea and Pyraloidea are monophyletic (Obtectomera). As has been previously suggested, Bombycidae (Bombyx mori and B. mandarina) and Saturniidae (A. yamamai and Caligula boisduvalii) formed a reciprocal monophyletic group.
Origins of Genes: "Big Bang" or Continuous Creation?

NASA Astrophysics Data System (ADS)

Kesse, Paul K.; Gibbs, Adrian

1992-10-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
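The idea of overprinting, in which one nucleotide sequence yields different polypeptides depending on the reading frame, can be illustrated with a short Python sketch. The function names and the example sequence used in testing are hypothetical, and the standard genetic code is assumed; unknown or ambiguous codons are rendered as "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (third base varies fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq: str, frame: int = 0) -> str:
    """Translate seq starting at offset frame (0, 1 or 2).
    Trailing bases that do not fill a codon are dropped."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def three_frame_translation(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate(seq, frame) for frame in range(3)]
```

Comparing the three outputs for one sequence shows how an out-of-frame translation produces an unrelated polypeptide from the same nucleotides.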

Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis

NASA Astrophysics Data System (ADS)

Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong

2016-02-01

A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-lives of DDT increased as substrate concentration rose from 0.1 to 10.0 mg/l, whereas they decreased as temperature rose from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which DDT was sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis.

PubMed

Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong

2016-02-18

A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-lives of DDT increased as substrate concentration rose from 0.1 to 10.0 mg/l, whereas they decreased as temperature rose from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which DDT was sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment.
Nmf9 Encodes a Highly Conserved Protein Important to Neurological Function in Mice and Flies.

PubMed

Zhang, Shuxiao; Ross, Kevin D; Seidner, Glen A; Gorman, Michael R; Poon, Tiffany H; Wang, Xiaobo; Keithley, Elizabeth M; Lee, Patricia N; Martindale, Mark Q; Joiner, William J; Hamilton, Bruce A

2015-07-01

Many protein-coding genes identified by genome sequencing remain without functional annotation or biological context. Here we define a novel protein-coding gene, Nmf9, based on a forward genetic screen for neurological function. ENU-induced and genome-edited null mutations in mice produce deficits in vestibular function, fear learning and circadian behavior, which correlated with Nmf9 expression in inner ear, amygdala, and suprachiasmatic nuclei. Homologous genes from unicellular organisms and invertebrate animals predict interactions with small GTPases, but the corresponding domains are absent in mammalian Nmf9. Intriguingly, homozygotes for null mutations in the Drosophila homolog, CG45058, show profound locomotor defects and premature death, while heterozygotes show striking effects on sleep and activity phenotypes. These results link a novel gene orthology group to discrete neurological functions, and show conserved requirement across wide phylogenetic distance and domain level structural changes.
CodingQuarry: highly accurate hidden Markov model gene prediction in fungal genomes using RNA-seq transcripts.

PubMed

Testa, Alison C; Hane, James K; Ellwood, Simon R; Oliver, Richard P

2015-03-11

The impact of gene annotation quality on functional and comparative genomics makes gene prediction an important process, particularly in non-model species, including many fungi. Sets of homologous protein sequences are rarely complete with respect to the fungal species of interest and are often small or unreliable, especially when closely related species have not been sequenced or annotated in detail. In these cases, protein homology-based evidence fails to correctly annotate many genes, or significantly improve ab initio predictions. Generalised hidden Markov models (GHMM) have proven to be invaluable tools in gene annotation and, recently, RNA-seq has emerged as a cost-effective means to significantly improve the quality of automated gene annotation. As these methods do not require sets of homologous proteins, improving gene prediction from these resources is of benefit to fungal researchers. While many pipelines now incorporate RNA-seq data in training GHMMs, there has been relatively little investigation into additionally combining RNA-seq data at the point of prediction, and room for improvement in this area motivates this study. CodingQuarry is a highly accurate, self-training GHMM fungal gene predictor designed to work with assembled, aligned RNA-seq transcripts. RNA-seq data informs annotations both during gene-model training and in prediction. Our approach capitalises on the high quality of fungal transcript assemblies by incorporating predictions made directly from transcript sequences. Correct predictions are made despite transcript assembly problems, including those caused by overlap between the transcripts of adjacent gene loci. Stringent benchmarking against high-confidence annotation subsets showed CodingQuarry predicted 91.3% of Schizosaccharomyces pombe genes and 90.4% of Saccharomyces cerevisiae genes perfectly. These results are 4-5% better than those of AUGUSTUS, the next best performing RNA-seq driven gene predictor tested. 
Comparisons against whole genome Sc. pombe and S. cerevisiae annotations further substantiate a 4-5% improvement in the number of correctly predicted genes. We demonstrate the success of a novel method of incorporating RNA-seq data into GHMM fungal gene prediction. This shows that a high quality annotation can be achieved without relying on protein homology or a training set of genes. CodingQuarry is freely available ( https://sourceforge.net/projects/codingquarry/ ), and suitable for incorporation into genome annotation pipelines.
Silicon enhances suberization and lignification in roots of rice (Oryza sativa).

PubMed

Fleck, Alexander T; Nye, Thandar; Repenning, Cornelia; Stahl, Frank; Zahn, Marc; Schenk, Manfred K

2011-03-01

The beneficial element silicon (Si) may affect radial oxygen loss (ROL) of rice roots depending on suberization of the exodermis and lignification of sclerenchyma. Thus, the effect of Si nutrition on the oxidation power of rice roots, suberization and lignification was examined. In addition, Si-induced alterations of the transcript levels of 265 genes related to suberin and lignin synthesis were studied by custom-made microarray and quantitative real-time PCR. Without Si supply, the oxidation zone of 12 cm long adventitious roots extended along the entire root length but with Si supply the oxidation zone was restricted to 5 cm behind the root tip. This pattern coincided with enhanced suberization of the exodermis and lignification of sclerenchyma by Si supply. Suberization of the exodermis started, with and without Si supply, at 4-5 cm and 8-9 cm distance from the root tip (drt), respectively. Si significantly increased transcript abundance of 12 genes, while two genes had a reduced transcript level. A gene coding for a leucine-rich repeat protein exhibited a 25-fold higher transcript level with Si nutrition. Physiological, histochemical, and molecular-biological data showing that Si has an active impact on rice root anatomy and gene transcription are presented here.
Foxo3 activity promoted by non-coding effects of circular RNA and Foxo3 pseudogene in the inhibition of tumor growth and angiogenesis.

PubMed

Yang, W; Du, W W; Li, X; Yee, A J; Yang, B B

2016-07-28

It has recently been shown that the upregulation of a pseudogene specific to a protein-coding gene could function as a sponge to bind multiple potential targeting microRNAs (miRNAs), resulting in increased gene expression. Similarly, it was recently demonstrated that circular RNAs can function as sponges for miRNAs, and could upregulate expression of mRNAs containing an identical sequence. Furthermore, some mRNAs are now known to not only translate protein, but also function to sponge miRNA binding, facilitating gene expression. Collectively, these appear to be effective mechanisms to ensure gene expression and protein activity. Here we show that expression of a member of the forkhead family of transcription factors, Foxo3, is regulated by the Foxo3 pseudogene (Foxo3P), and Foxo3 circular RNA, both of which bind to eight miRNAs. We found that the ectopic expression of the Foxo3P, Foxo3 circular RNA and Foxo3 mRNA could all suppress tumor growth and cancer cell proliferation and survival. Our results showed that at least three mechanisms are used to ensure protein translation of Foxo3, which reflects an essential role of Foxo3 and its corresponding non-coding RNAs.
Regulation of cellulase expression, sporulation, and morphogenesis by velvet family proteins in Trichoderma reesei.

PubMed

Liu, Kuimei; Dong, Yanmei; Wang, Fangzhong; Jiang, Baojie; Wang, Mingyu; Fang, Xu

2016-01-01

Homologs of the velvet protein family are encoded by the ve1, vel2, and vel3 genes in Trichoderma reesei. To test their regulatory functions, the velvet protein-coding genes were disrupted, generating Δve1, Δvel2, and Δvel3 strains. The phenotypic features of these strains were examined to identify their functions in morphogenesis, sporulation, and cellulase expression. The three velvet-deficient strains produced more hyphal branches, indicating that velvet family proteins participate in morphogenesis in T. reesei. Deletion of ve1 and vel3 did not affect biomass accumulation, while deletion of vel2 significantly hampered growth when cellulose was used as the sole carbon source in the medium. The deletion of either ve1 or vel2 led to a sharp decrease in sporulation as well as a global downregulation of cellulase-coding genes. In contrast, although the expression of cellulase-coding genes in the Δvel3 strain was downregulated in the dark, their expression in the light was unaffected. Sporulation was hampered in the Δvel3 strain. These results suggest that Ve1 and Vel2 play major roles, whereas Vel3 plays a minor role in sporulation, morphogenesis, and cellulase expression.
Decoding sORF translation - from small proteins to gene regulation.

PubMed

Cabrera-Quio, Luis Enrique; Herberg, Sarah; Pauli, Andrea

2016-11-01

Translation is best known as the fundamental mechanism by which the ribosome converts a sequence of nucleotides into a string of amino acids. Extensive research over many years has elucidated the key principles of translation, and the majority of translated regions were thought to be known. The recent discovery of wide-spread translation outside of annotated protein-coding open reading frames (ORFs) came therefore as a surprise, raising the intriguing possibility that these newly discovered translated regions might have unrecognized protein-coding or gene-regulatory functions. Here, we highlight recent findings that provide evidence that some of these newly discovered translated short ORFs (sORFs) encode functional, previously missed small proteins, while others have regulatory roles. Based on known examples we will also speculate about putative additional roles and the potentially much wider impact that these translated regions might have on cellular homeostasis and gene regulation.
Deep developmental transcriptome sequencing uncovers numerous new genes and enhances gene annotation in the sponge Amphimedon queenslandica.

PubMed

Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M

2015-05-15

The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar to unicellular opisthokont genomes than to other animal genomes.
The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects

PubMed Central

2011-01-01

Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. 
The control region in Neuropterida, as in other insects, is a fast-evolving genomic region characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history. PMID:21569260
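The A+T content and AT/GC-skew statistics compared in this abstract can be computed as in the following Python sketch. It assumes the commonly used skew definitions, AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C), which the abstract itself does not spell out. The function name is hypothetical, and bases other than A, C, G and T are ignored.

```python
from collections import Counter


def composition_stats(seq: str) -> dict[str, float]:
    """Return A+T content, AT-skew and GC-skew for one strand.
    Ambiguous bases (anything other than A, C, G, T) are not counted."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {
        "at_content": (a + t) / total,
        # Skews are defined as 0.0 when the denominator would be zero.
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
    }
```

Skew values depend on which strand is analyzed, so comparisons across taxa only make sense when the same strand is used throughout.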
Analysis of the cbhE' plasmid gene from acute disease-causing isolates of Coxiella burnetii.

PubMed

Minnick, M F; Small, C L; Frazier, M E; Mallavia, L P

1991-07-15

A gene termed cbhE' was cloned from the QpH1 plasmid of Coxiella burnetii. Expression of recombinants containing cbhE' in vitro and in Escherichia coli maxicells produced an insert-encoded polypeptide of approx. 42 kDa. The CbhE protein was not cleaved when intact maxicells were treated with trypsin. Hybridizations of total DNA isolated from the six strains of C. burnetii indicate that this gene is unique to C. burnetii strains associated with acute disease, i.e., Hamilton[I], Vacca[II], and Rasche[III]. The cbhE' gene was not detected in strains associated with chronic disease (Biotzere[IV] and Corazon[V]) or the Dod[VI] strain. The cbhE' open reading frame (ORF) is 1022 bp in length and is preceded by a predicted promoter/Shine-Dalgarno (SD) region of TCAACT(-35)-N16-TAAAAT(-10)-N14-AGAAGGA (SD) located 10 nucleotides (nt) before the presumed AUG start codon. The ORF ends with a single UAA stop codon and has no apparent Rho-factor-independent terminator following it. The cbhE' gene codes for the CbhE protein of 341 amino acid (aa) residues with a deduced Mr of 39,442. CbhE is predominantly hydrophilic with a predicted pI of 4.43. The function of CbhE is unknown. No nt or aa sequences with homology to cbhE' or CbhE, respectively, were found in searches of a number of databases.
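The promoter/Shine-Dalgarno layout quoted in this abstract, TCAACT(-35)-N16-TAAAAT(-10)-N14-AGAAGGA, can be expressed as a regular expression, as in this Python sketch. It searches only for exact matches of the three elements with exactly the stated spacer lengths; real promoters tolerate mismatches and spacer variation, so this is an illustration rather than a promoter predictor. The names PROMOTER_SD and find_promoter_sd are hypothetical.

```python
import re

# -35 box, 16-nt spacer, -10 box, 14-nt spacer, Shine-Dalgarno sequence.
PROMOTER_SD = re.compile(
    r"(?P<minus35>TCAACT)[ACGT]{16}"
    r"(?P<minus10>TAAAAT)[ACGT]{14}"
    r"(?P<sd>AGAAGGA)"
)


def find_promoter_sd(seq: str) -> list[tuple[int, int, int]]:
    """Return 0-based start positions of the -35 box, the -10 box and the
    SD sequence for every non-overlapping exact match in seq."""
    seq = seq.upper()
    return [
        (m.start("minus35"), m.start("minus10"), m.start("sd"))
        for m in PROMOTER_SD.finditer(seq)
    ]
```

Given a DNA string, the function returns an empty list when the exact layout is absent.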
The mitochondrial genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae.

PubMed

Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H

2017-05-12

Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA

PubMed Central

Eden, E.; Brunak, S.

2004-01-01

Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
Lysis delay and burst shrinkage of coliphage T7 by deletion of terminator Tφ reversed by deletion of early genes.

PubMed

Nguyen, Huong Minh; Kang, Changwon

2014-02-01

Bacteriophage T7 terminator Tϕ is a class I intrinsic terminator coding for an RNA hairpin structure immediately followed by oligo(U), which has been extensively studied in terms of its transcription termination mechanism, but little is known about its physiological or regulatory functions. In this study, using a T7 mutant phage, where a 31-bp segment of Tϕ was deleted from the genome, we discovered that deletion of Tϕ from T7 reduces the phage burst size but delays lysis timing, both of which are disadvantageous for the phage. The burst downsizing could directly result from Tϕ deletion-caused upregulation of gene 17.5, coding for holin, among other Tϕ downstream genes, because infection of gp17.5-overproducing Escherichia coli by wild-type T7 phage showed similar burst downsizing. However, the lysis delay was not associated with cellular levels of holin or lysozyme or with rates of phage adsorption. Instead, when allowed to evolve spontaneously in five independent adaptation experiments, the Tϕ-lacking mutant phage, after 27 or 29 passages, recovered both burst size and lysis time reproducibly by deleting early genes 0.5, 0.6, and 0.7 of class I, among other mutations. Deletion of genes 0.5 to 0.7 from the Tϕ-lacking mutant phage decreased expression of several Tϕ downstream genes to levels similar to that of the wild-type phage. Accordingly, phage T7 lysis timing is associated with cellular levels of Tϕ downstream gene products. This suggests the involvement of unknown factor(s) besides the known lysis proteins, lysozyme and holin, and that Tϕ plays a role of optimizing burst size and lysis time during T7 infection. IMPORTANCE Bacteriophages are bacterium-infecting viruses. After producing numerous progenies inside bacteria, phages lyse bacteria using their lysis protein(s) to get out and start a new infection cycle. Normally, lysis is tightly controlled to ensure phage progenies are maximally produced and released at an optimal time. 
Here, we have discovered that phage T7, besides employing its known lysis proteins, additionally uses its transcription terminator Tϕ to guarantee the optimal lysis of the E. coli host. Tϕ, positioned in the middle of the T7 genome, must be inactivated at least partially to allow for transcription-driven translocation of T7 DNA into hosts and expression of Tϕ downstream but promoter-lacking genes. What role is played by Tϕ before inactivation? Without Tϕ, not only was lysis time delayed but also the number of progenies was reduced in this study. Furthermore, T7 can overcome Tϕ deletion by further deleting some genes, highlighting that a phage has multiple strategies for optimizing lysis.
Conservation of Transcription Start Sites within Genes across a Bacterial Genus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shao, Wenjun; Price, Morgan N.; Deutschbauer, Adam M.

Transcription start sites (TSSs) lying inside annotated genes, on the same or opposite strand, have been observed in diverse bacteria, but the function of these unexpected transcripts is unclear. Here, we use the metal-reducing bacterium Shewanella oneidensis MR-1 and its relatives to study the evolutionary conservation of unexpected TSSs. Using high-resolution tiling microarrays and 5'-end RNA sequencing, we identified 2,531 TSSs in S. oneidensis MR-1, of which 18% were located inside coding sequences (CDSs). Comparative transcriptome analysis with seven additional Shewanella species revealed that the majority (76%) of the TSSs within the upstream regions of annotated genes (gTSSs) were conserved. Thirty percent of the TSSs that were inside genes and on the sense strand (iTSSs) were also conserved. Sequence analysis around these iTSSs showed conserved promoter motifs, suggesting that many iTSS are under purifying selection. Furthermore, conserved iTSSs are enriched for regulatory motifs, suggesting that they are regulated, and they tend to eliminate polar effects, which confirms that they are functional. In contrast, the transcription of antisense TSSs located inside CDSs (aTSSs) was significantly less likely to be conserved (22%). However, aTSSs whose transcription was conserved often have conserved promoter motifs and drive the expression of nearby genes. Overall, our findings demonstrate that some internal TSSs are conserved and drive protein expression despite their unusual locations, but the majority are not conserved and may reflect noisy initiation of transcription rather than a biological function.
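The three TSS classes named in the abstract above (gTSS, iTSS, aTSS) are defined by position and strand relative to annotated genes. The sketch below shows one way such a classification could be coded. The `Gene` record, the `classify_tss` function, the 200-nt upstream window and the rule that internal positions take priority are all assumptions made for illustration; the abstract does not state the authors' actual cut-offs or procedure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    start: int   # 1-based, inclusive; start < end regardless of strand
    end: int
    strand: str  # "+" or "-"

def classify_tss(pos, strand, genes, upstream=200):
    """Label a TSS as 'iTSS', 'aTSS', 'gTSS' or 'other'.

    upstream is a hypothetical window length, not a value from the abstract.
    """
    # Assumed priority rule: a position inside any gene is internal.
    for g in genes:
        if g.start <= pos <= g.end:
            return "iTSS" if strand == g.strand else "aTSS"
    # Otherwise look for a same-strand gene whose 5' end lies just downstream.
    for g in genes:
        if strand != g.strand:
            continue
        if g.strand == "+" and g.start - upstream <= pos < g.start:
            return "gTSS"
        if g.strand == "-" and g.end < pos <= g.end + upstream:
            return "gTSS"
    return "other"

# Made-up annotation for illustration only.
genes = [Gene(1000, 2000, "+"), Gene(3000, 4000, "-")]
for pos, strand in [(950, "+"), (1500, "+"), (1500, "-"), (4100, "-"), (2500, "+")]:
    print(pos, strand, classify_tss(pos, strand, genes))
```

One consequence of this simplification is that a TSS sitting exactly on a start codon (a leaderless transcript) is labelled as internal; a real pipeline would probably treat that case separately.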
Identification of BSAP (Pax-5) target genes in early B-cell development by loss- and gain-of-function experiments.

PubMed Central

Nutt, S L; Morrison, A M; Dörfler, P; Rolink, A; Busslinger, M

1998-01-01

The Pax-5 gene codes for the transcription factor BSAP which is essential for the progression of adult B lymphopoiesis beyond an early progenitor (pre-BI) cell stage. Although several genes have been proposed to be regulated by BSAP, CD19 is to date the only target gene which has been genetically confirmed to depend on this transcription factor for its expression. We have now taken advantage of cultured pre-BI cells of wild-type and Pax-5 mutant bone marrow to screen a large panel of B lymphoid genes for additional BSAP target genes. Four differentially expressed genes were shown to be under the direct control of BSAP, as their expression was rapidly regulated in Pax-5-deficient pre-BI cells by a hormone-inducible BSAP-estrogen receptor fusion protein. The genes coding for the B-cell receptor component Ig-alpha (mb-1) and the transcription factors N-myc and LEF-1 are positively regulated by BSAP, while the gene coding for the cell surface protein PD-1 is efficiently repressed. Distinct regulatory mechanisms of BSAP were revealed by reconstituting Pax-5-deficient pre-BI cells with full-length BSAP or a truncated form containing only the paired domain. IL-7 signalling was able to efficiently induce the N-myc gene only in the presence of full-length BSAP, while complete restoration of CD19 synthesis was critically dependent on the BSAP protein concentration. In contrast, the expression of the mb-1 and LEF-1 genes was already reconstituted by the paired domain polypeptide lacking any transactivation function, suggesting that the DNA-binding domain of BSAP is sufficient to recruit other transcription factors to the regulatory regions of these two genes. In conclusion, these loss- and gain-of-function experiments demonstrate that BSAP regulates four newly identified target genes as a transcriptional activator, repressor or docking protein depending on the specific regulatory sequence context. PMID:9545244
Recurrent and functional regulatory mutations in breast cancer.

PubMed

Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad

2017-07-06

Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants that contribute to lipid levels and coronary artery disease.

PubMed

Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-Man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H-H; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B; Adair, Linda S; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, Xiao-Ou; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars G; Nielsen, Jonas Bille; Tse, Hung-Fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Kathiresan, Sekar; Mohlke, Karen L; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J

2017-12-01

Most genome-wide association studies have been of European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we used an exome array to examine protein-coding genetic variants in 47,532 East Asian individuals. We identified 255 variants at 41 loci that reached chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After a meta-analysis including >300,000 European samples, we identified an additional nine novel loci. Sixteen genes were identified by protein-altering variants in both East Asians and Europeans, and thus are likely to be functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci.
Exome chip meta-analysis identifies novel loci and East Asian-specific coding variants contributing to lipid levels and coronary artery disease

PubMed Central

Lu, Xiangfeng; Peloso, Gina M; Liu, Dajiang J.; Wu, Ying; Zhang, He; Zhou, Wei; Li, Jun; Tang, Clara Sze-man; Dorajoo, Rajkumar; Li, Huaixing; Long, Jirong; Guo, Xiuqing; Xu, Ming; Spracklen, Cassandra N.; Chen, Yang; Liu, Xuezhen; Zhang, Yan; Khor, Chiea Chuen; Liu, Jianjun; Sun, Liang; Wang, Laiyuan; Gao, Yu-Tang; Hu, Yao; Yu, Kuai; Wang, Yiqin; Cheung, Chloe Yu Yan; Wang, Feijie; Huang, Jianfeng; Fan, Qiao; Cai, Qiuyin; Chen, Shufeng; Shi, Jinxiu; Yang, Xueli; Zhao, Wanting; Sheu, Wayne H.-H.; Cherny, Stacey Shawn; He, Meian; Feranil, Alan B.; Adair, Linda S.; Gordon-Larsen, Penny; Du, Shufa; Varma, Rohit; Chen, Yii-Der Ida; Shu, XiaoOu; Lam, Karen Siu Ling; Wong, Tien Yin; Ganesh, Santhi K.; Mo, Zengnan; Hveem, Kristian; Fritsche, Lars; Nielsen, Jonas Bille; Tse, Hung-fat; Huo, Yong; Cheng, Ching-Yu; Chen, Y. Eugene; Zheng, Wei; Tai, E Shyong; Gao, Wei; Lin, Xu; Huang, Wei; Abecasis, Goncalo; Consortium, GLGC; Kathiresan, Sekar; Mohlke, Karen L.; Wu, Tangchun; Sham, Pak Chung; Gu, Dongfeng; Willer, Cristen J

2017-01-01

Most genome-wide association studies have been conducted in European individuals, even though most genetic variation in humans is seen only in non-European samples. To search for novel loci associated with blood lipid levels and clarify the mechanism of action at previously identified lipid loci, we examined protein-coding genetic variants in 47,532 East Asian individuals using an exome array. We identified 255 variants at 41 loci reaching chip-wide significance, including 3 novel loci and 14 East Asian-specific coding variant associations. After meta-analysis with > 300,000 European samples, we identified an additional 9 novel loci. The same 16 genes were identified by the protein-altering variants in both East Asians and Europeans, likely pointing to the functional genes. Our data demonstrate that most of the low-frequency or rare coding variants associated with lipids are population-specific, and that examining genomic data across diverse ancestries may facilitate the identification of functional genes at associated loci. PMID:29083407
Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene.

PubMed

Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil

2007-11-29

Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication; all three copies of the gene are transcriptionally active, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constraints.

Biased exonization of transposed elements in duplicated genes: A lesson from the TIF-IA gene

PubMed Central

Amit, Maayan; Sela, Noa; Keren, Hadas; Melamed, Ze'ev; Muler, Inna; Shomron, Noam; Izraeli, Shai; Ast, Gil

2007-01-01

Background Gene duplication and exonization of intronic transposed elements are two mechanisms that enhance genomic diversity. We examined whether there is less selection against exonization of transposed elements in duplicated genes than in single-copy genes. Results Genome-wide analysis of exonization of transposed elements revealed a higher rate of exonization within duplicated genes relative to single-copy genes. The gene for TIF-IA, an RNA polymerase I transcription initiation factor, underwent a humanoid-specific triplication; all three copies of the gene are transcriptionally active, although only one copy retains the ability to generate the TIF-IA protein. Prior to TIF-IA triplication, an Alu element was inserted into the first intron. In one of the non-protein coding copies, this Alu is exonized. We identified a single point mutation leading to exonization in one of the gene duplicates. When this mutation was introduced into the TIF-IA coding copy, exonization was activated and the level of the protein-coding mRNA was reduced substantially. A very low level of exonization was detected in normal human cells. However, this exonization was abundant in most leukemia cell lines evaluated, although the genomic sequence is unchanged in these cancerous cells compared to normal cells. Conclusion The definition of the Alu element within the TIF-IA gene as an exon is restricted to certain types of cancers; the element is not exonized in normal human cells. These results further our understanding of the delicate interplay between gene duplication and alternative splicing and of the molecular evolutionary mechanisms leading to genetic innovations. This implies the existence of purifying selection against exonization in single copy genes, with duplicate genes free from such constraints. PMID:18047649
Molecular cloning, expression and characterization of 100K gene of fowl adenovirus-4 for prevention and control of hydropericardium syndrome.

PubMed

Shah, M S; Ashraf, A; Khan, M I; Rahman, M; Habib, M; Qureshi, J A

2016-01-01

Fowl adenovirus-4 is an infectious agent causing hydropericardium syndrome in chickens. Adenoviruses are non-enveloped virions with linear, double-stranded DNA. The viral genome codes for a few structural and non-structural proteins. 100K is an important non-structural viral protein. The open reading frame for the coding sequence of the 100K protein was cloned with an oligo-histidine tag and expressed in Escherichia coli as a fusion protein. The nucleotide sequence of the gene revealed that the 100K gene of FAdV-4 has high homology (98%) with the respective gene of FAdV-10. Recombinant 100K protein was expressed in E. coli and purified by nickel affinity chromatography. Immunization of chickens with recombinant 100K protein elicited significant serum antibody titers. However, a challenge protection test revealed that 100K protein conferred little protection (40%) to the immunized chickens against pathogenic viral challenge. It was concluded that the 100K gene is 2397 bp long and that the recombinant 100K protein has a molecular weight of 95 kDa. It was also found that the recombinant protein has little capacity to affect the immune response because, in spite of having an important role in intracellular transport and folding of viral capsid proteins during viral replication, it is not exposed on the surface of the virus at any stage. Copyright © 2015 The International Alliance for Biological Standardization. All rights reserved.
Identification and characterization of smallest pore-forming protein in the cell wall of pathogenic Corynebacterium urealyticum DSM 7109.

PubMed

Abdali, Narges; Younas, Farhan; Mafakheri, Samaneh; Pothula, Karunakar R; Kleinekathöfer, Ulrich; Tauch, Andreas; Benz, Roland

2018-05-09

Corynebacterium urealyticum, a pathogenic, multidrug resistant member of the mycolata, is known as a causative agent of urinary tract infections although it is a bacterium of the skin flora. This pathogenic bacterium shares with the mycolata the property of having an unusual cell envelope composition and architecture, typical for the genus Corynebacterium. The cell wall of members of the mycolata contains channel-forming proteins for the uptake of solutes. In this study, we provide novel information on the identification and characterization of a pore-forming protein in the cell wall of C. urealyticum DSM 7109. Detergent extracts of whole C. urealyticum cultures formed slightly cation-selective pores in lipid bilayer membranes, with a single-channel conductance of 1.75 nS in 1 M KCl. Experiments with different salts and non-electrolytes suggested that the cell wall pore of C. urealyticum is wide and water-filled and has a diameter of about 1.8 nm. Molecular modelling and dynamics have been performed to obtain a model of the pore. To find the gene coding for the cell wall pore, we searched the known genome of C. urealyticum for a chromosomal localization similar to that of the known porH and porA genes of other Corynebacterium strains. Three genes are located between the genes coding for GroEL2 and polyphosphate kinase (PKK2). Two of the genes (cur_1714 and cur_1715) were expressed in different constructs in C. glutamicum ΔporAΔporH and in porin-deficient BL21 DE3 Omp8 E. coli strains. The results suggested that the gene cur_1714 codes alone for the cell wall channel. The cell wall porin of C. urealyticum termed PorACur was purified to homogeneity using different biochemical methods and had an apparent molecular mass of about 4 kDa on tricine-containing sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE). 
Biophysical characterization of the purified protein (PorACur) indeed suggested that cur_1714 is the gene coding for the pore-forming protein in C. urealyticum, because the protein formed the same pores in lipid bilayer experiments as the detergent extract of whole cells. The study is the first report of a cell wall channel in the pathogenic C. urealyticum.
Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

PubMed Central

2012-01-01

Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. 
Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
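The "CA bias" and "GT bias" mentioned in the abstract above are usually quantified with strand skew statistics. The sketch below assumes the common definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not say which measure the authors used, so both the formulas and the helper names `skews` and `bias_label` are assumptions for illustration.

```python
from collections import Counter

def skews(seq):
    """Return (AT skew, GC skew) for one strand of a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = counts["A"], counts["T"], counts["G"], counts["C"]
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew

def bias_label(seq):
    """Hypothetical labelling rule: 'CA' when A exceeds T and C exceeds G,
    'GT' when the reverse holds on both axes, otherwise 'mixed'."""
    at_skew, gc_skew = skews(seq)
    if at_skew > 0 and gc_skew < 0:
        return "CA"
    if at_skew < 0 and gc_skew > 0:
        return "GT"
    return "mixed"

# Toy strings, not real mitochondrial sequence.
print(skews("AAACCCGT"), bias_label("AAACCCGT"))
print(skews("TTTGGGAC"), bias_label("TTTGGGAC"))
```

In practice the skews would be computed on the major coding strand of the whole genome, or per gene, rather than on short strings like these.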
Gene discovery in Eimeria tenella by immunoscreening cDNA expression libraries of sporozoites and schizonts with chicken intestinal antibodies.

PubMed

Réfega, Susana; Girard-Misguich, Fabienne; Bourdieu, Christiane; Péry, Pierre; Labbé, Marie

2003-04-02

Specific antibodies were produced ex vivo from intestinal culture of Eimeria tenella infected chickens. The specificity of these intestinal antibodies was tested against different parasite stages. These antibodies were used to immunoscreen first generation schizont and sporozoite cDNA libraries permitting the identification of new E. tenella antigens. We obtained a total of 119 cDNA clones which were subjected to sequence analysis. The sequences coding for the proteins inducing local immune responses were compared with nucleotide or protein databases and with expressed sequence tags (ESTs) databases. We identified new Eimeria genes coding for heat shock proteins, a ribosomal protein, a pyruvate kinase and a pyridoxine kinase. Specific features of other sequences are discussed.
The nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae).

PubMed

Chen, Zhi-Teng; Wu, Hai-Yan; Du, Yu-Zhou

2016-07-01

We report the nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae), which is a circular molecule of 15,416 bp in length and consists of 13 protein-coding genes, 2 ribosomal RNAs, 20 transfer RNAs and a partial control region (645 bp). Using the 13 protein-coding genes of 8 stoneflies and 3 other related species, we constructed a phylogenetic tree to verify the accuracy of the newly determined mitogenome sequences. Our results provide basic data for further study of phylogeny in Plecoptera.
Proteomic Analysis and Identification of the Structural and Regulatory Proteins of the Rhodobacter capsulatus Gene Transfer Agent

PubMed Central

Chen, Frank; Spano, Anthony; Goodman, Benjamin E.; Blasier, Kiev R.; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F.; Lebedev, Nikolai

2010-01-01

The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf’s 3, 5, 6–9, 11, 13, and 15. PMID:19105630
Proteomic analysis and identification of the structural and regulatory proteins of the Rhodobacter capsulatus gene transfer agent.

PubMed

Chen, Frank; Spano, Anthony; Goodman, Benjamin E; Blasier, Kiev R; Sabat, Agnes; Jeffery, Erin; Norris, Andrew; Shabanowitz, Jeffrey; Hunt, Donald F; Lebedev, Nikolai

2009-02-01

The gene transfer agent of Rhodobacter capsulatus (GTA) is a unique phage-like particle that exchanges genetic information between members of this same species of bacterium. Besides being an excellent tool for genetic mapping, the GTA has a number of advantages for biotechnological and nanoengineering purposes. To facilitate the GTA purification and identify the proteins involved in GTA expression, assembly and regulation, in the present work we construct and transform into R. capsulatus Y262 a gene coding for a C-terminally His-tagged capsid protein. The constructed protein was expressed in the cells, assembled into chimeric GTA particles inside the cells and excreted from the cells into surrounding medium. Transmission electron micrographs of phosphotungstate-stained, NiNTA-purified chimeric GTA confirm that its structure is similar to normal GTA particles, with many particles composed both of a head and a tail. The mass spectrometric proteomic analysis of polypeptides present in the GTA recovered outside the cells shows that GTA is composed of at least 9 proteins represented in the GTA gene cluster including proteins coded for by Orf's 3, 5, 6-9, 11, 13, and 15.
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.

PubMed

Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D

2017-12-03

A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence.
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
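The frame comparison described in this abstract can be illustrated with a minimal Python sketch. It counts, for each of the three frames of a sequence, how many trinucleotides belong to a supplied code. The function name `frame_counts`, the three-member toy code and the toy sequence are assumptions made for illustration only; they are not the authors' set X, their data, or their analysis pipeline.

```python
from typing import Iterable, List


def frame_counts(seq: str, code: Iterable[str]) -> List[int]:
    """Count trinucleotides of `seq` that belong to `code`, per frame.

    Index 0 is the reading frame; indices 1 and 2 are the frames
    shifted by one and two nucleotides, respectively.
    """
    code_set = {t.upper() for t in code}
    seq = seq.upper()
    counts = []
    for shift in range(3):
        n = 0
        # Step through complete trinucleotides only.
        for i in range(shift, len(seq) - 2, 3):
            if seq[i:i + 3] in code_set:
                n += 1
        counts.append(n)
    return counts


# Hypothetical toy inputs, not taken from the paper.
toy_code = {"GAA", "GTC", "AAC"}
toy_seq = "GAAGTCAACGAA"
print(frame_counts(toy_seq, toy_code))  # [4, 0, 0]
```

In this toy case every trinucleotide of the reading frame is in the code and none in the shifted frames; real genes would show a statistical bias rather than such a clean separation.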
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.

PubMed Central

Borodovsky, M; Rudd, K E; Koonin, E V

1994-01-01

The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
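The proportions quoted in this abstract follow directly from the reported ORF counts, which a short Python check makes explicit. The variable names and the helper `percent` are chosen here for illustration; only the four counts come from the abstract.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts as reported in the abstract.
genemark_orfs = 354        # putative expressed ORFs predicted by GeneMark
blast_orfs = 208           # ORFs with significant BLAST similarity
supported_by_both = 182    # ORFs supported by both GeneMark and BLAST
conserved_families = 73    # ORFs in ancient conserved protein families

print(percent(supported_by_both, genemark_orfs))   # 51.4
print(percent(supported_by_both, blast_orfs))      # 87.5
print(percent(conserved_families, genemark_orfs))  # 20.6
```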
Decoding the genome beyond sequencing: the new phase of genomic research.

PubMed

Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

2011-10-01

While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.
Identification of a novel species of papillomavirus in giraffe lesions using nanopore sequencing.

PubMed

Vanmechelen, Bert; Bertelsen, Mads Frost; Rector, Annabel; Van den Oord, Joost J; Laenen, Lies; Vergote, Valentijn; Maes, Piet

2017-03-01

Papillomaviridae form a large family of viruses that are known to infect a variety of vertebrates, including mammals, reptiles, birds and fish. Infections usually give rise to minor skin lesions but can in some cases lead to the development of malignant neoplasia. In this study, we identified a novel species of papillomavirus (PV), isolated from warts of four giraffes (Giraffa camelopardalis). The sequence of the L1 gene was determined and found to be identical for all isolates. Using nanopore sequencing, the full sequence of the PV genome could be determined. The coding region of the genome was found to contain seven open reading frames (ORF), encoding the early proteins E1, E2 and E5-E7 as well as the late proteins L1 and L2. In addition to these ORFs, a region located within the E2 gene is thought, based on sequence similarities to other papillomaviruses, to encode an E4 protein, although no start codon could be identified. Based on the sequence of the L1 gene, this novel PV was found to be most similar to Capreolus capreolus papillomavirus 1 (CcaPV1), with 67.96% nucleotide identity. We therefore suggest that the virus identified here is given the name Giraffa camelopardalis papillomavirus 1 (GcPV1) and is classified as a novel species within the genus Deltapapillomavirus, in line with the current guidelines for the nomenclature and classification of PVs. Copyright © 2017 Elsevier B.V. All rights reserved.
Complete mitochondrial genome of a Asian lion (Panthera leo goojratensis).

PubMed

Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning

2016-01-01

The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length. Its gene composition and arrangement conformed to those of other lions, with the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristics of the mitochondrial genome were analyzed in detail.
A Plain English Map of the Human Glycolysis Enzymes.

ERIC Educational Resources Information Center

Offner, Susan

1999-01-01

Presents a plain English map of the gene coding for the glycolysis enzymes in humans to be used as a teaching tool. The map can be used to illustrate that every reaction in a cell requires an enzyme, and that every enzyme is a protein coded for by a gene somewhere on the chromosomes. (WRM)
Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis

PubMed Central

Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph

2017-01-01

Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
Plasticity of DNA methylation and gene expression under zinc deficiency in Arabidopsis roots.

PubMed

Chen, Xiaochao; Schönberger, Brigitte; Menz, Jochen; Ludewig, Uwe

2018-05-25

DNA methylation is a heritable chromatin modification that maintains chromosome stability, regulates transposon silencing and appears to be involved in gene expression in response to environmental conditions. Environmental stress alters DNA methylation patterns that are correlated with gene expression differences. Here, genome-wide differential DNA-methylation was identified upon prolonged Zn deficiency, leading to hypo- and hyper-methylated chromosomal regions. Preferential CpG methylation changes occurred in gene promoters and gene bodies, but did not overlap with transcriptional start sites. Methylation changes were also prominent in transposable elements. By contrast, non-CG methylation differences were exclusively found in promoters of protein coding genes and in transposable elements. Strongly Zn deficiency-induced genes and their promoters were mostly non-methylated, irrespective of Zn supply. Differential DNA methylation in the CpG and CHG, but not in the CHH context, was found close to a few up-regulated Zn-deficiency genes. However, the transcriptional Zn-deficiency response in roots appeared little correlated with associated DNA methylation changes in promoters or gene bodies. Furthermore, under Zn deficiency, developmental defects were identified in an Arabidopsis mutant lacking non-CpG methylation. The root methylome thus responds specifically to a micro-nutrient deficiency and is important for efficient Zn utilization at low availability, but the relationship of differential methylation and differentially expressed genes is surprisingly poor.
PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements.

PubMed

Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D

2017-01-04

The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Intragenome Diversity of Gene Families Encoding Toxin-like Proteins in Venomous Animals.

PubMed

Rodríguez de la Vega, Ricardo C; Giraud, Tatiana

2016-11-01

The evolution of venoms is the story of how toxins arise and of the processes that generate and maintain their diversity. For animal venoms these processes include recruitment for expression in the venom gland, neofunctionalization, paralogous expansions, and functional divergence. The systematic study of these processes requires the reliable identification of the venom components involved in antagonistic interactions. High-throughput sequencing has the potential of uncovering the entire set of toxins in a given organism, yet the existence of non-venom toxin paralogs and the misleading effects of a partial census of the molecular diversity of toxins make it necessary to collect complementary evidence to distinguish true toxins from their non-venom paralogs. Here, we analyzed the whole genomes of two scorpions, one spider and one snake, aiming at the identification of the full repertoires of genes encoding toxin-like proteins. We classified the entire set of protein-coding genes into paralogous groups and monotypic genes, identified genes encoding toxin-like proteins based on known toxin families, and quantified their expression in both venom-glands and pooled tissues. Our results confirm that genes encoding toxin-like proteins are part of multigene families, and that these families arise by recruitment events from non-toxin genes followed by limited expansions of the toxin-like protein coding genes. We also show that failing to account for sequence similarity with non-toxin proteins has a considerable misleading effect that can be greatly reduced by comparative transcriptomics. Our study overall contributes to the understanding of the evolutionary dynamics of proteins involved in antagonistic interactions. © The Author 2016. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
The Drosophila genes CG14593 and CG30106 code for G-protein-coupled receptors specifically activated by the neuropeptides CCHamide-1 and CCHamide-2.

PubMed

Hansen, Karina K; Hauser, Frank; Williamson, Michael; Weber, Stine B; Grimmelikhuijzen, Cornelis J P

2011-01-07

Recently, a novel neuropeptide, CCHamide, was discovered in the silkworm Bombyx mori (L. Roller et al., Insect Biochem. Mol. Biol. 38 (2008) 1147-1157). We have now found that all insects with a sequenced genome have two genes, each coding for a different CCHamide, CCHamide-1 and -2. We have also cloned and deorphanized two Drosophila G-protein-coupled receptors (GPCRs) coded for by genes CG14593 and CG30106 that are selectively activated by Drosophila CCH-amide-1 (EC(50), 2×10(-9) M) and CCH-amide-2 (EC(50), 5×10(-9) M), respectively. Gene CG30106 (symbol synonym CG14484) has in a previous publication (E.C. Johnson et al., J. Biol. Chem. 278 (2003) 52172-52178) been wrongly assigned to code for an allatostatin-B receptor. This conclusion is based on our findings that the allatostatins-B do not activate the CG30106 receptor and on the recent findings from other research groups that the allatostatins-B activate an unrelated GPCR coded for by gene CG16752. Comparative genomics suggests that a duplication of the CCHamide neuropeptide signalling system occurred after the split of crustaceans and insects, about 410 million years ago, because only one CCHamide neuropeptide gene is found in the water flea Daphnia pulex (Crustacea) and the tick Ixodes scapularis (Chelicerata). Copyright © 2010 Elsevier Inc. All rights reserved.
End Joining-Mediated Gene Expression in Mammalian Cells Using PCR-Amplified DNA Constructs that Contain Terminator in Front of Promoter.

PubMed

Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji

2015-12-01

Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs (DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region) could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for the particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulations of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
Accumulation of multiple mutations in linezolid-resistant Staphylococcus epidermidis causing bloodstream infections; in silico analysis of L3 amino acid substitutions that might confer high-level linezolid resistance.

PubMed

Ikonomidis, Alexandros; Grapsa, Anastasia; Pavlioglou, Charikleia; Demiri, Antonia; Batarli, Alexandra; Panopoulou, Maria

2016-12-01

Fifty-six Staphylococcus epidermidis clinical isolates, showing high-level linezolid resistance and causing bacteremia in critically ill patients, were studied. All isolates belonged to the ST22 clone and carried the T2504A and C2534T mutations in the gene coding for 23S rRNA as well as the C189A, G208A, C209T and G384C missense mutations in L3 protein which resulted in Asp159Tyr, Gly152Asp and Leu94Val substitutions. Other silent mutations were also detected in genes coding for ribosomal proteins L3 and L22. In silico analysis of missense mutations showed that although L3 protein retained the sequence of secondary motifs, the tertiary structure was influenced. The observed alteration in L3 protein folding provides an indication of the putative role of L3-coding gene mutations in high-level linezolid resistance. Furthermore, linezolid pressure in health care settings where linezolid consumption is high might lead to the selection of resistant mutants possessing L3 mutations that might confer high-level linezolid resistance.
Complete mitochondrial genome of the invasive brown alga Sargassum muticum (Sargassaceae, Phaeophyceae).

PubMed

Liu, Feng; Pang, Shaojun

2016-01-01

Sargassum muticum (Yendo) Fensholt is an invasive canopy-forming brown alga, expanding its presence from Northeast Asia to North America and Europe. The complete mitochondrial genome of S. muticum is characterized as a circular molecule of 34,720 bp. The overall AT content of S. muticum mitogenome is 63.41%. This mitogenome contains 65 genes typically found in brown algae, including 3 ribosomal RNA genes, 25 transfer RNA genes, 35 protein-coding genes, and 2 conserved open reading frames (ORFs). The gene order of mitogenome for S. muticum is identical to that for Sargassum horneri, Fucus vesiculosus and Desmarestia viridis. Phylogenetic analyses based on 35 protein-coding genes reveal that S. muticum has a close evolutionary relationship with S. horneri and a distant relationship with Dictyota dichotoma, supporting current taxonomic systems. The present investigation provides new molecular data for studies of S. muticum population diversity as well as comparative genomics in the Phaeophyceae.
Identification of Circular RNAs from the Parental Genes Involved in Multiple Aspects of Cellular Metabolism in Barley

PubMed Central

Darbani, Behrooz; Noeparvar, Shahin; Borg, Søren

2016-01-01

RNA circularization made by head-to-tail back-splicing events is involved in the regulation of gene expression from transcriptional to post-translational levels. By exploiting RNA-Seq data and down-stream analysis, we shed light on the importance of circular RNAs in plants. The results introduce circular RNAs as novel interactors in the regulation of gene expression in plants and imply the comprehensiveness of this regulatory pathway by identifying circular RNAs for a diverse set of genes. These genes are involved in several aspects of cellular metabolism as hormonal signaling, intracellular protein sorting, carbohydrate metabolism and cell-wall biogenesis, respiration, amino acid biosynthesis, transcription and translation, and protein ubiquitination. Additionally, these parental loci of circular RNAs, from both nuclear and mitochondrial genomes, encode for different transcript classes including protein coding transcripts, microRNA, rRNA, and long non-coding/microprotein coding RNAs. The results shed light on the mitochondrial exonic circular RNAs and imply the importance of circular RNAs for regulation of mitochondrial genes. Importantly, we introduce circular RNAs in barley and elucidate their cellular-level alterations across tissues and in response to micronutrients iron and zinc. In further support of circular RNAs' functional roles in plants, we report several cases where fluctuations of circRNAs do not correlate with the levels of their parental-loci encoded linear transcripts. PMID:27375638
The artificial zinc finger coding gene 'Jazz' binds the utrophin promoter and activates transcription.

PubMed

Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C

2000-06-01

Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
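Locating a nine base pair target such as 5'-GCT-GCT-GCG-3' in a promoter sequence can be sketched in a few lines of Python. The sketch below scans both strands for exact matches. The function names and the toy sequence are hypothetical, the toy sequence is not the utrophin promoter, and exact string matching is a simplification that says nothing about binding affinity.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, target: str) -> List[Tuple[int, str]]:
    """Find exact matches of `target` on either strand of `seq`.

    Returns (0-based start on the given strand of `seq`, strand) pairs,
    where strand is "+" for the target itself and "-" for its reverse
    complement.
    """
    seq = seq.upper()
    target = target.upper()
    hits = []
    for strand, pattern in (("+", target), ("-", reverse_complement(target))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


JAZZ_TARGET = "GCTGCTGCG"  # target sequence quoted in the abstract
toy_promoter = "TTGCTGCTGCGAACGCAGCAGCTT"  # hypothetical sequence
print(find_sites(toy_promoter, JAZZ_TARGET))  # [(2, '+'), (13, '-')]
```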
Probing the Boundaries of Orthology: The Unanticipated Rapid Evolution of Drosophila centrosomin

PubMed Central

Eisman, Robert C.; Kaufman, Thomas C.

2013-01-01

The rapid evolution of essential developmental genes and their protein products is both intriguing and problematic. The rapid evolution of gene products with simple protein folds and a lack of well-characterized functional domains typically result in a low discovery rate of orthologous genes. Additionally, in the absence of orthologs it is difficult to study the processes and mechanisms underlying rapid evolution. In this study, we have investigated the rapid evolution of centrosomin (cnn), an essential gene encoding centrosomal protein isoforms required during syncytial development in Drosophila melanogaster. Until recently the rapid divergence of cnn made identification of orthologs difficult and questionable because Cnn violates many of the assumptions underlying models for protein evolution. To overcome these limitations, we have identified a group of insect orthologs and present conserved features likely to be required for the functions attributed to cnn in D. melanogaster. We also show that the rapid divergence of Cnn isoforms is apparently due to frequent coding sequence indels and an accelerated rate of intronic additions and eliminations. These changes appear to be buffered by multi-exon and multi-reading frame maximum potential ORFs, simple protein folds, and the splicing machinery. These buffering features also occur in other genes in Drosophila and may help prevent potentially deleterious mutations due to indels in genes with large coding exons and exon-dense regions separated by small introns. This work promises to be useful for future investigations of cnn and potentially other rapidly evolving genes and proteins. PMID:23749319
Cohort-specific imputation of gene expression improves prediction of warfarin dose for African Americans.

PubMed

Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B

2017-11-24

Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation as well as variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
Biotin protein ligase from Corynebacterium glutamicum: role for growth and L-lysine production.

PubMed

Peters-Wendisch, P; Stansen, K C; Götker, S; Wendisch, V F

2012-03-01

Corynebacterium glutamicum is a biotin auxotrophic Gram-positive bacterium that is used for large-scale production of amino acids, especially of L-glutamate and L-lysine. It is known that biotin limitation triggers L-glutamate production and that L-lysine production can be increased by enhancing the activity of pyruvate carboxylase, one of two biotin-dependent proteins of C. glutamicum. The gene cg0814 (accession number YP_225000) has been annotated to code for putative biotin protein ligase BirA, but the protein has not yet been characterized. A discontinuous enzyme assay of biotin protein ligase activity was established using a 105aa peptide corresponding to the carboxyterminus of the biotin carboxylase/biotin carboxyl carrier protein subunit AccBC of the acetyl CoA carboxylase from C. glutamicum as acceptor substrate. Biotinylation of this biotin acceptor peptide was revealed with crude extracts of a strain overexpressing the birA gene and was shown to be ATP dependent. Thus, birA from C. glutamicum codes for a functional biotin protein ligase (EC 6.3.4.15). The gene birA from C. glutamicum was overexpressed and the transcriptome was compared with the control strain revealing no significant gene expression changes of the bio-genes. However, biotin protein ligase overproduction increased the level of the biotin-containing protein pyruvate carboxylase and entailed a significant growth advantage in glucose minimal medium. Moreover, birA overexpression resulted in a twofold higher L-lysine yield on glucose as compared with the control strain.
Forty-four novel protein-coding loci discovered using a proteomics informed by transcriptomics (PIT) approach in rat male germ cells.

PubMed

Chocu, Sophie; Evrard, Bertrand; Lavigne, Régis; Rolland, Antoine D; Aubry, Florence; Jégou, Bernard; Chalmel, Frédéric; Pineau, Charles

2014-11-01

Spermatogenesis is a complex process, dependent upon the successive activation and/or repression of thousands of gene products, and ends with the production of haploid male gametes. RNA sequencing of male germ cells in the rat identified thousands of novel testicular unannotated transcripts (TUTs). Although such RNAs are usually annotated as long noncoding RNAs (lncRNAs), it is possible that some of these TUTs code for protein. To test this possibility, we used a "proteomics informed by transcriptomics" (PIT) strategy combining RNA sequencing data with shotgun proteomics analyses of spermatocytes and spermatids in the rat. Among 3559 TUTs and 506 lncRNAs found in meiotic and postmeiotic germ cells, 44 encoded at least one peptide. We showed that these novel high-confidence protein-coding loci exhibit several genomic features intermediate between those of lncRNAs and mRNAs. We experimentally validated the testicular expression pattern of two of these novel protein-coding gene candidates, both highly conserved in mammals: one for a vesicle-associated membrane protein we named VAMP-9, and the other for an enolase domain-containing protein. This study confirms the potential of PIT approaches for the discovery of protein-coding transcripts initially thought to be untranslated or unknown transcripts. Our results contribute to the understanding of spermatogenesis by characterizing two novel proteins, implicated by their strong expression in germ cells. The mass spectrometry proteomics data have been deposited with the ProteomeXchange Consortium under the data set identifier PXD000872. © 2014 by the Society for the Study of Reproduction, Inc.
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.

PubMed

Craig, Catherine L; Riekel, Christian

2002-12-01

The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks (Flag or silk from flagelliform gland in spiders; Fhc or heavy chain fibroin silks produced by Lepidoptera larvae) are made up of hierarchically organized, repetitive arrays of amino acids. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differ. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated in the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
Creating reference gene annotation for the mouse C57BL6/J genome assembly.

PubMed

Mudge, Jonathan M; Harrow, Jennifer

2015-10-01

Annotation of the reference genome of the C57BL6/J mouse has been an ongoing project ever since the draft genome was first published. Initially, the principal focus was on the identification of all protein-coding genes, although today the importance of describing long non-coding RNAs, small RNAs, and pseudogenes is recognized. Here, we describe the progress of the GENCODE mouse annotation project, which combines manual annotation from the HAVANA group with Ensembl computational annotation, alongside experimental and in silico validation pipelines from other members of the consortium. We discuss the more recent incorporation of next-generation sequencing datasets into this workflow, including the usage of mass-spectrometry data to potentially identify novel protein-coding genes. Finally, we will outline how the C57BL6/J genebuild can be used to gain insights into the variant sites that distinguish different mouse strains and species.
Production and purification of recombinant human glucagon overexpressed as intein fusion protein in Escherichia coli.

PubMed

Esipov, Roman S; Stepanenko, Vasily N; Gurevich, Alexandr I; Chupova, Larisa A; Miroshnikov, Anatoly I

2006-01-01

Chemico-enzymatic synthesis and cloning in Escherichia coli of an artificial gene coding for human glucagon was performed. A recombinant plasmid containing a hybrid gene of glucagon and the intein Ssp dnaB from Synechocystis sp. was designed. Expression of the obtained hybrid gene in E. coli, properties of the formed hybrid protein, and conditions of its autocatalytic cleavage leading to glucagon formation were studied.
Pre-mRNA Introns as a Model for Cryptographic Algorithm: Theory and Experiments

NASA Astrophysics Data System (ADS)

Regoli, Massimo

2010-01-01

The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are under intensive study by biologists; in our case, however, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behaviour in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA.
A Simple Symmetric Algorithm Using a Likeness with Introns Behavior in RNA Sequences
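
The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea it describes: a symmetric cipher whose output is interleaved with "intron-like" filler bytes that carry no message information. It is not the RNA-Crypto System itself. The XOR masking, the use of Python's `random.Random` seeded with the key, and the names `encrypt`, `decrypt`, and `_schedule` are all assumptions made for illustration, and the construction is not cryptographically secure.

```python
import os
import random


def _schedule(key):
    """Yield (n_junk, mask) pairs derived from the shared key (hypothetical scheme)."""
    rng = random.Random(key)  # deterministic for a given key; NOT a secure generator
    while True:
        yield rng.randrange(3), rng.randrange(256)


def encrypt(message: bytes, key: int) -> bytes:
    """Mask each message byte and precede it with 0-2 random filler ("intron") bytes."""
    out = bytearray()
    for byte, (n_junk, mask) in zip(message, _schedule(key)):
        out += os.urandom(n_junk)   # non-coding filler, ignored on decryption
        out.append(byte ^ mask)     # the "coding" byte
    return bytes(out)


def decrypt(cipher: bytes, key: int) -> bytes:
    """Regenerate the schedule from the key, skip the filler, and unmask the rest."""
    out = bytearray()
    pos = 0
    schedule = _schedule(key)
    while pos < len(cipher):
        n_junk, mask = next(schedule)
        pos += n_junk               # "splice out" the filler bytes
        out.append(cipher[pos] ^ mask)
        pos += 1
    return bytes(out)
```

Because the filler length at each step depends only on the key, a holder of the key can splice the filler out again, while the ciphertext length and content vary from run to run.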

NASA Astrophysics Data System (ADS)

Regoli, Massimo

2009-02-01

The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are under intensive study by biologists; in our case, however, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behaviour in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA.
Mechanisms of radiation-induced gene responses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woloschak, G.E.; Paunesku, T.

1996-10-01

In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5' region (flanking region or UTR) of a gene, the sequence is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3' region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitinase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When the double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Structure and Expression of Hybrid Dysgenesis-Induced Alleles of the Ovarian Tumor (Otu) Gene in Drosophila Melanogaster

PubMed Central

Sass, G. L.; Mohler, J. D.; Walsh, R. C.; Kalfayan, L. J.; Searles, L. L.

1993-01-01

Mutations at the ovarian tumor (otu) gene of Drosophila melanogaster cause female sterility and generate a range of ovarian phenotypes. Quiescent (QUI) mutants exhibit reduced germ cell proliferation; in oncogenic (ONC) mutants germ cells undergo uncontrolled proliferation generating excessive numbers of undifferentiated cells; the egg chambers of differentiated (DIF) mutants differentiate to variable degrees but fail to complete oogenesis. We have examined mutations caused by insertion and deletion of P elements at the otu gene. The P element insertion sites are upstream of the major otu transcription start sites. In deletion derivatives, the P element, regulatory regions and/or protein coding sequences have been removed. In both insertion and deletion mutants, the level of otu expression correlates directly with the severity of the phenotype: the absence of otu function produces the most severe QUI phenotype while the ONC mutants express lower levels of otu than those which are DIF. The results of this study demonstrate that the diverse mutant phenotypes of otu are the consequence of different levels of otu function. PMID:8436274
Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.

PubMed

García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús

2017-08-10

Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
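
The abstract does not say how the 3'UTR-to-CDS ratio was measured. As a rough illustration of the quantity, the sketch below divides a length-normalized 3'UTR signal by a length-normalized CDS signal. The function name `utr_to_cds_ratio`, the use of read counts, and the example numbers are hypothetical and are not taken from the study.

```python
def utr_to_cds_ratio(utr_reads, utr_len, cds_reads, cds_len):
    """Ratio of per-base 3'UTR signal to per-base CDS signal (illustrative only)."""
    if utr_len <= 0 or cds_len <= 0:
        raise ValueError("region lengths must be positive")
    cds_density = cds_reads / cds_len
    if cds_density == 0:
        raise ValueError("CDS signal is zero; ratio undefined")
    return (utr_reads / utr_len) / cds_density


# Made-up numbers: the same total signal can give very different ratios.
balanced = utr_to_cds_ratio(utr_reads=100, utr_len=1000, cds_reads=100, cds_len=1000)
utr_heavy = utr_to_cds_ratio(utr_reads=150, utr_len=1000, cds_reads=50, cds_len=1000)
```

In this made-up example both cases total 200 reads, yet the ratio rises from 1 to 3, which is why the ratio can carry information that total mRNA level does not.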
Origins of genes: "big bang" or continuous creation?

PubMed Central

Keese, P K; Gibbs, A

1992-01-01

Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes. PMID:1329098
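
Overprinting relies on the fact that one nucleotide sequence yields entirely different polypeptides when read in different frames. A minimal sketch of that point, assuming the standard genetic code and a made-up nine-nucleotide sequence (the helper `translate` is illustrative and not from the paper):

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for codons with unknown bases
        for i in range(frame, len(seq) - 2, 3)
    )


seq = "ATGGCCTGA"  # hypothetical example sequence
frames = [translate(seq, f) for f in range(3)]
```

Here the three forward frames give `MA*`, `WP`, and `GL`, so a shift of a single nucleotide produces an unrelated amino acid sequence from the same DNA.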
Molecular cloning of chitinase 33 (chit33) gene from Trichoderma atroviride

PubMed Central

Matroudi, S.; Zamani, M.R.; Motallebi, M.

2008-01-01

In this study, Trichoderma atroviride was selected as an overproducer of chitinase among 30 different isolates of Trichoderma sp. on the basis of chitinase specific activity. From this isolate, the genomic and cDNA clones encoding chit33 have been isolated and sequenced. Comparison of genomic and cDNA sequences to define the gene structure indicates that this gene contains three short introns and an open reading frame coding for a protein of 321 amino acids. The deduced amino acid sequence includes a 19 aa putative signal peptide. Homology between this sequence and other reported Trichoderma Chit33 proteins is discussed. The coding sequence of the chit33 gene was cloned in the pET26b(+) expression vector and expressed in E. coli. PMID:24031242
Complete mitochondrial genome of Yangtze River wild common carp (Cyprinus carpio haematopterus) and Russian scattered scale mirror carp (Cyprinus carpio carpio).

PubMed

Hu, Guang Fu; Liu, Xiang Jiang; Zou, Gui Wei; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na

2016-01-01

We sequenced the complete mitogenomes of Yangtze River wild common carp (Cyprinus carpio haematopterus) and Russian scattered scale mirror carp (Cyprinus carpio carpio). Comparison revealed that the mitogenomes of these two common carp strains were remarkably similar in genome length, gene order and content, and AT content. There were only 55 bp variations in 16,581 nucleotides: 1 bp was located in rRNAs, 2 bp in tRNAs, 9 bp in the control region and 43 bp in protein-coding genes. Furthermore, the forty-three variable nucleotides in the protein-coding genes of the two strains led to four variable amino acids, which were located in the ND2, ATPase 6, ND5 and ND6 genes, respectively.
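
The per-region counts reported above can be checked against the stated total and expressed as an overall divergence. This is a small arithmetic sketch using only the numbers in the abstract; the variable names are illustrative.

```python
# Variable sites per region, as reported in the abstract.
variable_sites = {
    "rRNAs": 1,
    "tRNAs": 2,
    "control region": 9,
    "protein-coding genes": 43,
}
genome_length = 16_581  # nucleotides compared

total_variable = sum(variable_sites.values())
percent_divergence = 100 * total_variable / genome_length
protein_coding_share = variable_sites["protein-coding genes"] / total_variable
```

The regions sum to the reported 55 sites, which is about 0.33% of the 16,581 nucleotides, and roughly 78% of the variable sites fall in protein-coding genes.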
Next-Generation Sequencing of Protein-Coding and Long Non-protein-Coding RNAs in Two Types of Exosomes Derived from Human Whole Saliva.

PubMed

Ogawa, Yuko; Tsujimoto, Masafumi; Yanoshita, Ryohei

2016-01-01

Exosomes are small extracellular vesicles containing microRNAs and mRNAs that are produced by various types of cells. We previously used ultrafiltration and size-exclusion chromatography to isolate two types of human salivary exosomes (exosomes I, II) that are different in size and proteomes. We showed that salivary exosomes contain large repertoires of small RNAs. However, precise information regarding long RNAs in salivary exosomes has not been fully determined. In this study, we investigated the compositions of protein-coding RNAs (pcRNAs) and long non-protein-coding RNAs (lncRNAs) of exosome I, exosome II and whole saliva (WS) by next-generation sequencing technology. Although 11% of all RNAs were commonly detected among the three samples, the compositions of reads mapping to known RNAs were similar. The most abundant pcRNA is ribosomal RNA protein, and pcRNAs of some salivary proteins such as S100 calcium-binding protein A8 (protein S100-A8) were present in salivary exosomes. Interestingly, lncRNAs of pseudogenes (presumably, processed pseudogenes) were abundant in exosome I, exosome II and WS. Translationally controlled tumor protein gene, which plays an important role in cell proliferation, cell death and immune responses, was highly expressed as pcRNA and pseudogenes in salivary exosomes. Our results show that salivary exosomes contain various types of RNAs such as pseudogenes and small RNAs, and may mediate intercellular communication by transferring these RNAs to target cells as gene expression regulators.

Biodegradation of DDT by Stenotrophomonas sp. DDT-1: Characterization and genome functional analysis

PubMed Central

Pan, Xiong; Lin, Dunli; Zheng, Yuan; Zhang, Qian; Yin, Yuanming; Cai, Lin; Fang, Hua; Yu, Yunlong

2016-01-01

A novel bacterium capable of utilizing 1,1,1-trichloro-2,2-bis(p-chlorophenyl)ethane (DDT) as the sole carbon and energy source was isolated from a contaminated soil and identified as Stenotrophomonas sp. DDT-1 based on morphological characteristics, BIOLOG GN2 microplate profile, and 16S rDNA phylogeny. Genome sequencing and functional annotation of the isolate DDT-1 showed a 4,514,569 bp genome size, 66.92% GC content, 4,033 protein-coding genes, and 76 RNA genes including 8 rRNA genes. In total, 2,807 protein-coding genes were assigned to Clusters of Orthologous Groups (COGs), and 1,601 protein-coding genes were mapped to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. The degradation half-lives of DDT increased with substrate concentration from 0.1 to 10.0 mg/l, whereas they decreased with temperature from 15 °C to 35 °C. Neutral conditions were the most favorable for DDT biodegradation. Based on genome annotation of DDT degradation genes and the metabolites detected by GC-MS, a mineralization pathway was proposed for DDT biodegradation in which it was sequentially converted into DDE/DDD, DDMU, DDOH, and DDA via dechlorination, hydroxylation, and carboxylation, and ultimately mineralized to carbon dioxide. The results indicate that the isolate DDT-1 is a promising bacterial resource for the removal or detoxification of DDT residues in the environment. PMID:26888254
Expression and regulation of long noncoding RNAs during the osteogenic differentiation of periodontal ligament stem cells in the inflammatory microenvironment.

PubMed

Zhang, Qingbin; Chen, Li; Cui, Shiman; Li, Yan; Zhao, Qi; Cao, Wei; Lai, Shixiang; Yin, Sanjun; Zuo, Zhixiang; Ren, Jian

2017-10-25

Although long noncoding RNAs (lncRNAs) have been emerging as critical regulators in various tissues and biological processes, little is known about their expression and regulation during the osteogenic differentiation of periodontal ligament stem cells (PDLSCs) in an inflammatory microenvironment. In this study, we have identified 63 lncRNAs that are not annotated in previous databases. These novel lncRNAs were not randomly located in the genome but preferentially located near protein-coding genes related to particular functions and diseases, such as stem cell maintenance and differentiation, developmental disorders and inflammatory diseases. Moreover, we have identified 650 differentially expressed lncRNAs among different subsets of PDLSCs. Pathway enrichment analysis for neighboring protein-coding genes of these differentially expressed lncRNAs revealed stem cell differentiation related functions. Many of these differentially expressed lncRNAs function as competing endogenous RNAs that regulate protein-coding transcripts through competing shared miRNAs.
First complete mitochondrial genome of the South American annual fish Austrolebias charrua (Cyprinodontiformes: Rivulidae): peculiar features among cyprinodontiforms mitogenomes.

PubMed

Gutiérrez, Verónica; Rego, Natalia; Naya, Hugo; García, Graciela

2015-10-28

Among teleosts, the South American genus Austrolebias (Cyprinodontiformes: Rivulidae) includes 42 taxa of annual fishes divided into five different species groups. It is a monophyletic genus, but morphological and molecular data do not resolve the relationship among intrageneric clades and high rates of substitution have been previously described in some mitochondrial genes. In this work, the complete mitogenome of a species of the genus was determined for the first time. We determined its structure, gene order and peculiar evolutionary features, which will allow us to evaluate the performance of mitochondrial genes in the phylogenetic resolution at different taxonomic levels. Regarding gene content and order, the circular mitogenome of A. charrua (17,271 bp) presents the typical pattern of vertebrate mitogenomes. It contains the full complement of 13 protein-coding genes, 22 tRNA, 2 rRNA and one non-coding control region. Notably, the tRNA-Cys was only 57 bp in length and lacks the D-loop arm. In three full sibling individuals, a heteroplasmic condition was detected due to a total of 12 variable sites in seven protein-coding genes. Among cyprinodontiforms, the mitogenome of A. charrua exhibits the lowest G+C content (37 %) and GC skew, as well as the highest strand asymmetry with a net difference of T over A at 1st and 3rd codon positions. Considering the 12 coding genes of the H strand, correspondence analyses of nucleotide composition and codon usage show that A and T at 1st and 3rd codon positions have the highest weight in the first axis, and segregate annual species from the other cyprinodontiforms analyzed. Given the annual life-style, their mitogenomes could be under different selective pressures. All 13 protein-coding genes are under strong purifying selection and we did not find any significant evidence of nucleotide sites showing episodic selection (dN > dS) in annual lineages. When fast-evolving third codon positions were removed from alignments, the "supergene" tree recovers our reference species phylogeny, as do the Cytb, ND4L and ND6 genes. Therefore, third codon positions seem to be saturated in the aforementioned coding regions in intergeneric Cyprinodontiformes comparisons. The complete mitogenome obtained in the present work offers relevant data for further comparative studies on molecular phylogeny and systematics of this taxonomically controversial endemic genus of annual fishes.
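
The composition statistics mentioned above, G+C content and GC skew, are simple functions of base counts. The sketch below uses the usual definitions, GC content = (G + C) / N and GC skew = (G - C) / (G + C), on a made-up sequence. The function names are illustrative, and the strand and codon-position conventions used in the study are not reproduced here.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """(G - C) / (G + C); positive when G outnumbers C on the strand analyzed."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    if g + c == 0:
        raise ValueError("no G or C in sequence; skew undefined")
    return (g - c) / (g + c)


example = "GGGCAT"  # hypothetical sequence: 3 G, 1 C, 1 A, 1 T
content = gc_content(example)
skew = gc_skew(example)
```

For the example string, 4 of 6 bases are G or C, and the skew is (3 - 1) / (3 + 1) = 0.5.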
Development-related expression patterns of protein-coding and miRNA genes involved in porcine muscle growth.

PubMed

Wang, F J; Jin, L; Guo, Y Q; Liu, R; He, M N; Li, M Z; Li, X W

2014-11-27

Muscle growth and development are associated with remarkable changes in protein-coding and microRNA (miRNA) gene expression. To determine the expression patterns of genes and miRNAs related to muscle growth and development, we measured the expression levels of 25 protein-coding and 16 miRNA genes in skeletal and cardiac muscles throughout 5 developmental stages by quantitative reverse transcription-polymerase chain reaction. The Short Time-Series Expression Miner (STEM) software clustering results showed that growth-related genes were downregulated at all developmental stages in both the psoas major and longissimus dorsi muscles, indicating their involvement in early developmental stages. Furthermore, genes related to muscle atrophy, such as forkhead box 1 and muscle ring finger, showed upregulated expression with increasing age, suggesting a decrease in protein synthesis during the later stages of skeletal muscle development. We found that development of the cardiac muscle was a complex process in which growth-related genes were highly expressed during embryonic development, but they did not show uniform postnatal expression patterns. Moreover, the expression level of miR-499, which enhances the expression of the β-myosin heavy chain, was significantly different in the psoas major and longissimus dorsi muscles, suggesting the involvement of miR-499 in the determination of skeletal muscle fiber types. We also performed correlation analyses of messenger RNA and miRNA expression. We found negative relationships between miR-486 and forkhead box 1, and miR-133a and serum response factor at all developmental stages, suggesting that forkhead box 1 and serum response factor are potential targets of miR-486 and miR-133a, respectively.
Draft Genome Sequence of the Deinococcus-Thermus Bacterium Meiothermus ruber Strain A

DOE PAGES

Thiel, Vera; Tomsho, Lynn P.; Burhans, Richard; ...

2015-03-26

The draft genome sequence of the Deinococcus-Thermus group bacterium Meiothermus ruber strain A, isolated from a cyanobacterial enrichment culture obtained from Octopus Spring (Yellowstone National Park, WY), comprises 2,968,099 bp in 170 contigs. It is predicted to contain 2,895 protein-coding genes, 44 tRNA-coding genes, and 2 rRNA operons.
Amino- and carboxyl-terminal amino acid sequences of proteins coded by gag gene of murine leukemia virus

PubMed Central

Oroszlan, Stephen; Henderson, Louis E.; Stephenson, John R.; Copeland, Terry D.; Long, Cedric W.; Ihle, James N.; Gilden, Raymond V.

1978-01-01

The amino- and carboxyl-terminal amino acid sequences of proteins (p10, p12, p15, and p30) coded by the gag gene of Rauscher and AKR murine leukemia viruses were determined. Among these proteins, p15 from both viruses appears to have a blocked amino end. Proline was found to be the common NH2 terminus of both p30s and both p12s, and alanine of both p10s. The amino-terminal sequences of p30s are identical, as are those of p10s, while the p12 sequences are clearly distinctive but also show substantial homology. The carboxyl-terminal amino acids of both viral p30s and p12s are leucine and phenylalanine, respectively. Rauscher leukemia virus p15 has tyrosine as the carboxyl terminus while AKR virus p15 has phenylalanine in this position. The compositional and sequence data provide definite chemical criteria for the identification of analogous gag gene products and for the comparison of viral proteins isolated in different laboratories. On the basis of amino acid sequences and the previously proposed H-p15-p12-p30-p10-COOH peptide sequence in the precursor polyprotein, a model for cleavage sites involved in the post-translational processing of the precursor coded for by the gag gene is proposed. PMID:206897
FunGene: the functional gene pipeline and repository.

PubMed

Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

2013-01-01

Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
On the evolution of primitive genetic codes.

PubMed

Weberndorfer, Günter; Hofacker, Ivo L; Stadler, Peter F

2003-10-01

The primordial genetic code was probably a drastically simplified ancestor of the canonical code that is used by contemporary cells. In order to understand how the present-day code came about we first need to explain how the language of the building plan can change without destroying the encoded information. In this work we introduce a minimal organism model that is based on biophysically reasonable descriptions of RNA and protein, namely secondary structure folding and knowledge-based potentials. The evolution of a population of such organisms under competition for a common resource is simulated explicitly at the level of individual replication events. Starting with very simple codes, and hence greatly reduced amino acid alphabets, we observe a diversification of the codes in most simulation runs. The driving force behind this effect is the possibility to produce fitter proteins when the repertoire of amino acids is enlarged.
Expression of the Long Intergenic Non-Protein Coding RNA 665 (LINC00665) Gene and the Cell Cycle in Hepatocellular Carcinoma Using The Cancer Genome Atlas, the Gene Expression Omnibus, and Quantitative Real-Time Polymerase Chain Reaction.

PubMed

Wen, Dong-Yue; Lin, Peng; Pang, Yu-Yan; Chen, Gang; He, Yun; Dang, Yi-Wu; Yang, Hong

2018-05-05

BACKGROUND Long non-coding RNAs (lncRNAs) have a role in physiological and pathological processes, including cancer. The aim of this study was to investigate the expression of the long intergenic non-protein coding RNA 665 (LINC00665) gene and the cell cycle in hepatocellular carcinoma (HCC) using database analysis including The Cancer Genome Atlas (TCGA), the Gene Expression Omnibus (GEO), and quantitative real-time polymerase chain reaction (qPCR). MATERIAL AND METHODS Expression levels of LINC00665 were compared between human tissue samples of HCC and adjacent normal liver, clinicopathological correlations were made using TCGA and the GEO, and qPCR was performed to validate the findings. Other public databases were searched for other genes associated with LINC00665 expression, including The Atlas of Noncoding RNAs in Cancer (TANRIC), the Multi Experiment Matrix (MEM), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and protein-protein interaction (PPI) networks. RESULTS Overexpression of LINC00665 in patients with HCC was significantly associated with gender, tumor grade, stage, and tumor cell type. Overexpression of LINC00665 in patients with HCC was significantly associated with overall survival (OS) (HR=1.477; 95% CI: 1.046-2.086). Bioinformatics analysis identified 469 related genes and further analysis supported a hypothesis that LINC00665 regulates pathways in the cell cycle to facilitate the development and progression of HCC through ten identified core genes: CDK1, BUB1B, BUB1, PLK1, CCNB2, CCNB1, CDC20, ESPL1, MAD2L1, and CCNA2. CONCLUSIONS Overexpression of the lncRNA, LINC00665 may be involved in the regulation of cell cycle pathways in HCC through ten identified hub genes.
Combining Shigella Tn-seq data with gold-standard E. coli gene deletion data suggests rare transitions between essential and non-essential gene functionality.

PubMed

Freed, Nikki E; Bumann, Dirk; Silander, Olin K

2016-09-06

Gene essentiality - whether or not a gene is necessary for cell growth - is a fundamental component of gene function. It is not well established how quickly gene essentiality can change, as few studies have compared empirical measures of essentiality between closely related organisms. Here we present the results of a Tn-seq experiment designed to detect essential protein coding genes in the bacterial pathogen Shigella flexneri 2a 2457T on a genome-wide scale. Superficial analysis of this data suggested that 481 protein-coding genes in this Shigella strain are critical for robust cellular growth on rich media. Comparison of this set of genes with a gold-standard data set of essential genes in the closely related Escherichia coli K12 BW25113 revealed that an excessive number of genes appeared essential in Shigella but non-essential in E. coli. Importantly, and in converse to this comparison, we found no genes that were essential in E. coli and non-essential in Shigella, implying that many genes were artefactually inferred as essential in Shigella. Controlling for such artefacts resulted in a much smaller set of discrepant genes. Among these, we identified three sets of functionally related genes, two of which have previously been implicated as critical for Shigella growth, but which are dispensable for E. coli growth. The data presented here highlight the small number of protein coding genes for which we have strong evidence that their essentiality status differs between the closely related bacterial taxa E. coli and Shigella. A set of genes involved in acetate utilization provides a canonical example. These results leave open the possibility of developing strain-specific antibiotic treatments targeting such differentially essential genes, but suggest that such opportunities may be rare in closely related bacteria.
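
The comparison described above amounts to set operations on two lists of essential genes, restricted to genes present in both organisms. The sketch below shows that bookkeeping only. The gene identifiers are placeholders, and `compare_essentiality` is a hypothetical helper, not the authors' pipeline, which also had to control for Tn-seq artefacts.

```python
def compare_essentiality(essential_a, essential_b, shared_genes):
    """Classify shared genes by where they were called essential."""
    a = set(essential_a) & set(shared_genes)
    b = set(essential_b) & set(shared_genes)
    return {
        "both": a & b,     # essential in both organisms
        "only_a": a - b,   # essential in A, dispensable in B
        "only_b": b - a,   # essential in B, dispensable in A
    }


# Placeholder identifiers, not real gene names.
shigella_essential = {"g1", "g2", "g3", "g4"}
ecoli_essential = {"g1", "g2"}
orthologs = {"g1", "g2", "g3", "g5"}

result = compare_essentiality(shigella_essential, ecoli_essential, orthologs)
```

A strongly lopsided result, with many genes in `only_a` and none in `only_b`, is the pattern that led the authors to suspect artefactual essentiality calls rather than genuine biological differences.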
Influence of Translation Initiation on Organellar Protein Targeting in Arabidopsis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sally A. Mackenzie

2011-04-18

A primary focus of the Mackenzie laboratory is the elucidation of processes and machinery for mitochondrial genome maintenance and transmission in higher plants. We have found that numerous organellar DNA maintenance components in plants appear to be dual targeted to mitochondria and plastids. Of particular interest was the observation that some twin (tandemly arrayed) dual targeting presequences appeared to utilize non-AUG alternative translation initiation, allowing for multiple translation starts at a single gene. Two aspects of this phenomenon were of particular interest: (1) Alternative translation initiation might provide a mechanism to regulate protein targeting temporally and spatially, a possibility that had not been demonstrated previously, and (2) alternative translation initiation might occur in genes involved in nuclear-controlled mitochondrial genome recombination, thought to be exclusively mitochondrial in their function. During the course of this research, we pursued three aims, with an emphasis on two specific genes of interest: POLgamma2, an organellar DNA polymerase, and MSH1, a MutS homolog thought to participate in mitochondrial, but not plastid, genome recombination surveillance. Our aims were to (1) Identify additional genes within Arabidopsis and other genomes that employ non-AUG alternative translation initiation, (2) Locate sequences upstream to the annotated AUG that confer alternative non-AUG translation initiation activity, and (3) Identify cis and trans factors that influence start site selection in genes with non-AUG starts. Toward these ends, we have shown that non-AUG initiation occurs in a number of genes, likely influencing targeting behavior of the protein. We have also shown that start site selection is strongly influenced by Kozak consensus sequence environment, indicating that alternative translation initiation in plants occurs by relaxation of ribosome scanning.
MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002.

PubMed

Attimonelli, Marcella; Catalano, Domenico; Gissi, Carmela; Grillo, Giorgio; Licciulli, Flavio; Liuni, Sabino; Santamaria, Monica; Pesole, Graziano; Saccone, Cecilia

2002-01-01

Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented.
Analysis of the complete genome of peach chlorotic mottle virus: identification of non-AUG start codons, in vitro coat protein expression, and elucidation of serological cross-reactions.

PubMed

James, D; Varga, A; Croft, H

2007-01-01

The entire genome of peach chlorotic mottle virus (PCMV), originally identified as Prunus persica cv. Agua virus (4N6), was sequenced and analysed. PCMV cross-reacts with antisera to diverse viruses, such as plum pox virus (PPV), genus Potyvirus, family Potyviridae; and apple stem pitting virus (ASPV), genus Foveavirus, family Flexiviridae. The PCMV genome consists of 9005 nucleotides (nts), excluding a poly(A) tail at the 3' end of the genome. Five open reading frames (ORFs) were identified with four untranslated regions (UTR) including a 5', a 3', and two intergenic UTRs. The genome organisation of PCMV is similar to that of ASPV and the two genomes share a nucleotide (nt) sequence identity of 58%. PCMV ORF1 encodes the replication-associated protein complex (Mr 241,503), ORF2-ORF4 code for the triple gene block proteins (TGBp; Mr 24,802, 12,370, and 7320, respectively), and ORF5 encodes the coat protein (CP) (Mr 42,505). Two non-AUG start codons participate in the initiation of translation: 35AUC and 7676AUA initiate translation of ORF1 and ORF5. In vitro expression with subsequent Western blot analysis confirmed ORF5 as the CP-encoding gene and confirmed that the codon AUA is able to initiate translation of the CP. Expression of a truncated CP fragment (Mr 39,689) was demonstrated, and both proteins are expressed in vivo, since both were observed in Western blot analysis of PCMV-infected peach and Nicotiana occidentalis. The expressed proteins cross-reacted with an antiserum against ASPV. The amino acid sequences of the CPs of PCMV and ASPV share only 37% identity, but there are 11 shared peptides 4-8 aa residues long. These may constitute linear epitopes responsible for ASPV antiserum cross reactions. No significant common linear epitopes were associated with PPV. Extensive phylogenetic analysis indicates that PCMV is closely related to ASPV and is a new and distinct member of the genus Foveavirus.
Genetics of PCOS: A systematic bioinformatics approach to unveil the proteins responsible for PCOS.

PubMed

Panda, Pritam Kumar; Rane, Riya; Ravichandran, Rahul; Singh, Shrinkhla; Panchal, Hetalkumar

2016-06-01

Polycystic ovary syndrome (PCOS) is a hormonal imbalance in women, which causes problems during the menstrual cycle and in pregnancy that sometimes result in fatality. Though the genetics of PCOS is not fully understood, early diagnosis and treatment can prevent long-term effects. In this study, we have studied the proteins involved in PCOS and the structural aspects of these proteins using computational tools. The proteins involved were modeled using Modeller 9v14 and Ab-initio programs. All 43 proteins responsible for PCOS were subjected to phylogenetic analysis to identify the relatedness of the proteins. Further, microarray data from PCOS datasets downloaded from GEO were analyzed to find the significant protein-coding genes responsible for PCOS, in addition to the previously reported protein-coding genes. Various statistical analyses were done using R programming to get an insight into the structural aspects of PCOS that can be used as drug targets to treat PCOS and other related reproductive diseases.
Permanent draft genome of Thermithiobacillus tepidarius DSM 3134 T, a moderately thermophilic, obligately chemolithoautotrophic member of the Acidithiobacillia

DOE PAGES

Boden, Rich; Hutt, Lee P.; Huntemann, Marcel; ...

2016-09-26

Thermithiobacillus tepidarius DSM 3134 T was originally isolated (1983) from the waters of a sulfidic spring entering the Roman Baths (Temple of Sulis-Minerva) at Bath, United Kingdom and is an obligate chemolithoautotroph growing at the expense of reduced sulfur species. This strain has a genome size of 2,958,498 bp. Here we report the genome sequence, annotation and characteristics. The genome comprises 2,902 protein coding and 66 RNA coding genes. Genes responsible for the transaldolase variant of the Calvin-Benson-Bassham cycle were identified along with a biosynthetic horseshoe in lieu of Krebs' cycle sensu stricto. Terminal oxidases were identified, viz. cytochrome c oxidase (cbb3, EC 1.9.3.1) and ubiquinol oxidase (bd, EC 1.10.3.10). Metalloresistance genes involved in pathways of arsenic and cadmium resistance were found. Evidence of horizontal gene transfer accounting for 5.9 % of the protein-coding genes was found, including transfer from Thiobacillus spp. and Methylococcus capsulatus Bath, isolated from the same spring. A sox gene cluster was found, similar in structure to those from other Acidithiobacillia - by comparison with Thiobacillus thioparus and Paracoccus denitrificans, an additional gene between soxA and soxB was found, annotated as a DUF302-family protein of unknown function. As the Kelly-Friedrich pathway of thiosulfate oxidation (encoded by sox) is not used in Thermithiobacillus spp., the role of the operon (if any) in this species remains unknown. We speculate that DUF302 and sox genes may have a role in periplasmic trithionate oxidation.
Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prochnik, Simon E.; Umen, James; Nedelcu, Aurora

2010-07-01

Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its approximately 14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.
[FOXP2: from the specific disorder to the molecular biology of language. I. Aetiological, neuroanatomical, neurophysiological and molecular aspects].

PubMed

Benítez-Burraco, A

The task of cloning the genes whose products are involved in the organisation and functioning of the nerve centres that enable language tasks to be executed must necessarily start with the identification and the cognitive, linguistic, neuroanatomical and neurophysiological analysis of individuals with hereditary (specific) language impairment (SLI). The first of these genes to be characterised in this way--a gene called FOXP2--codes for a regulating factor that acts as a transcriptional repressor in the central nervous system. It is expressed in neuronal populations mainly situated in the basal ganglia, but also in the cortex, cerebellum and the thalamus, which are presumably involved in the development and/or functioning of the thalamic-cortical-striatal circuits associated with motor planning and learning. The protein FOXP2 shows several structural patterns that, when altered in other proteins, also give rise to different disorders in the central nervous system. The pattern of expression of the gene is preserved phylogenetically, although this does not happen in the case of the pattern of mRNA maturation. In individuals with a mutated version of FOXP2, morphological and functional anomalies are detected in those areas in which the gene is expressed. These abnormalities can be correlated satisfactorily with the phenotypic characteristics of the disorder, which are at the same time of both a motor and linguistic nature. The fact that other variations of SLI are not linked to the FOXP2 gene raises the need for further research into the genetic bases of the disorder, while also suggesting that it would be advisable to reassess the phenotypic scope of the variant associated to the mutation of this gene.
The complete mitochondrial genome of Octopus conispadiceus (Sasaki, 1917) (Cephalopoda: Octopodidae).

PubMed

Ma, Yuanyuan; Zheng, Xiaodong; Cheng, Rubin; Li, Qi

2016-01-01

In this paper, we determined the complete mitochondrial genome of Octopus conispadiceus (Cephalopoda: Octopodidae). The whole mitogenome of O. conispadiceus is 16,027 basepairs (bp) in length with a base composition of 41.4% A, 34.8% T, 16.1% C, 7.7% G and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a major non-coding region (MNR). The gene arrangements of O. conispadiceus showed remarkable similarity to that of O. vulgaris, Amphioctopus fangsiao, Cistopus chinensis and C. taiwanicus.
The complete mitochondrial genome of Conus tulipa (Neogastropoda: Conidae).

PubMed

Chen, Po-Wei; Hsiao, Sheng-Tai; Huang, Chih-Wei; Chen, Kao-Sung; Tseng, Chen-Te; Wu, Wen-Lung; Hwang, Deng-Fwu

2016-07-01

The complete mitogenome of the cone snail Conus tulipa (Linnaeus, 1758) has been sequenced by a next-generation sequencing method. The assembled mitogenome is 16,599 bp in length, including 13 protein-coding genes, 22 transfer RNA genes and 2 ribosomal RNA genes. The overall base composition of C. tulipa is 28.7% A, 15.2% C, 18.4% G and 37.7% T. It shows 81.1% identity to the cone snail C. consors, 78.5% to C. borgesi and 77.5% to C. textile. Using the 13 protein-coding genes and 2 ribosomal RNA genes of C. tulipa in this study, together with 18 other closely related species, we constructed the species phylogenetic tree to verify the accuracy and utility of the newly determined mitogenome sequence. The complete mitogenome of C. tulipa provides essential DNA molecular data for further phylogeographic and evolutionary analysis of cone snail phylogeny.

Complete mitochondrial genome of the mottled skate: Raja pulchra (Rajiformes, Rajidae).

PubMed

Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Myoung, Jung-Goo; Lee, Youn-Ho

2016-05-01

The complete mitochondrial DNA of a mottled skate, Raja pulchra, was sequenced and found to be a circular molecule of 16,907 bp including 2 rRNA, 22 tRNA, 13 protein-coding genes (PCGs), and an AT-rich control region. The organization of the PCGs is the same as that found in other Rajidae species. The nucleotide composition of the L-strand is 29.8% A, 28.0% C, 27.9% T, and 14.3% G, with a slight bias toward A + T. Twelve of 13 PCGs are initiated by the ATG codon while COX1 starts with GTG. Only ND4 harbors the incomplete termination codon, TA. All tRNA genes have a typical clover-leaf structure of mitochondrial tRNA with the exception of [Formula: see text] which has a reduced DHU arm. This mitogenome will provide essential information for better phylogenetic resolution and precision of the family Rajidae and the genus Raja as well as for establishment of a fish stock recovery plan of the species.
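The "bias toward A + T" reported above can be checked directly from the four percentages in the abstract. The following is only a minimal sketch: the helper names `at_content` and `gc_content` are illustrative and do not come from the paper.

```python
# L-strand base composition (percent) as reported in the abstract.
composition = {"A": 29.8, "C": 28.0, "T": 27.9, "G": 14.3}


def at_content(comp):
    """Return the combined A + T percentage from a base-composition mapping."""
    return comp["A"] + comp["T"]


def gc_content(comp):
    """Return the combined G + C percentage from a base-composition mapping."""
    return comp["G"] + comp["C"]


# The four percentages should account for the whole strand.
total = sum(composition.values())
print(f"total = {total:.1f}%")
print(f"A+T = {at_content(composition):.1f}%, G+C = {gc_content(composition):.1f}%")
```

An A + T share above 50% is what the abstract describes as a slight A + T bias.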
The complete mitochondrial genome of Chrysopa pallens (Insecta, Neuroptera, Chrysopidae).

PubMed

He, Kun; Chen, Zhe; Yu, Dan-Na; Zhang, Jia-Yong

2012-10-01

The complete mitochondrial genome of Chrysopa pallens (Neuroptera, Chrysopidae) was sequenced. It consists of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA (rRNA) genes, and a control region (AT-rich region). The total length of C. pallens mitogenome is 16,723 bp with 79.5% AT content, and the length of control region is 1905 bp with 89.1% AT content. The non-coding regions of C. pallens include control region between 12S rRNA and trnI genes, and a 75-bp space region between trnI and trnQ genes.
Structure and expression of canary myc family genes.

PubMed Central

Collum, R G; Clayton, D F; Alt, F W

1991-01-01

We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons. PMID:1996121
Expression-Linked Patterns of Codon Usage, Amino Acid Frequency, and Protein Length in the Basally Branching Arthropod Parasteatoda tepidariorum

PubMed Central

Whittle, Carrie A.; Extavour, Cassandra G.

2016-01-01

Spiders belong to the Chelicerata, the most basally branching arthropod subphylum. The common house spider, Parasteatoda tepidariorum, is an emerging model and provides a valuable system to address key questions in molecular evolution in an arthropod system that is distinct from traditionally studied insects. Here, we provide evidence suggesting that codon usage, amino acid frequency, and protein lengths are each influenced by expression-mediated selection in P. tepidariorum. First, highly expressed genes exhibited preferential usage of T3 codons in this spider, suggestive of selection. Second, genes with elevated transcription favored amino acids with low or intermediate size/complexity (S/C) scores (glycine and alanine) and disfavored those with large S/C scores (such as cysteine), consistent with the minimization of biosynthesis costs of abundant proteins. Third, we observed a negative correlation between expression level and coding sequence length. Together, we conclude that protein-coding genes exhibit signals of expression-related selection in this emerging, noninsect, arthropod model. PMID:27017527
Orpinomyces cellulase celf protein and coding sequences

DOEpatents

Li, Xin-Liang; Chen, Huizhong; Ljungdahl, Lars G.

2000-09-05

A cDNA (1,520 bp), designated celF, consisting of an open reading frame (ORF) encoding a polypeptide (CelF) of 432 amino acids was isolated from a cDNA library of the anaerobic rumen fungus Orpinomyces PC-2 constructed in Escherichia coli. Analysis of the deduced amino acid sequence showed that, starting from the N-terminus, CelF consists of a signal peptide and a cellulose binding domain (CBD), followed by an extremely Asn-rich linker region which separates the CBD and the catalytic domain. The latter is located at the C-terminus. The catalytic domain of CelF is highly homologous to CelA and CelC of Orpinomyces PC-2, to CelA of Neocallimastix patriciarum and also to cellobiohydrolase IIs (CBHIIs) from aerobic fungi. However, like CelA of Neocallimastix patriciarum, CelF does not have the noncatalytic repeated peptide domain (NCRPD) found in CelA and CelC from the same organism. The recombinant protein CelF hydrolyzes cellooligosaccharides in the pattern of CBHII, yielding only cellobiose as product with cellotetraose as the substrate. The genomic celF is interrupted by a 111 bp intron, located within the region coding for the CBD. The intron of celF has features in common with genes from aerobic filamentous fungi.
Identification of the Operon for the Sorbitol (Glucitol) Phosphoenolpyruvate:Sugar Phosphotransferase System in Streptococcus mutans

PubMed Central

Boyd, David A.; Thevenot, Tracy; Gumbmann, Markus; Honeyman, Allen L.; Hamilton, Ian R.

2000-01-01

Transposon mutagenesis and marker rescue were used to isolate and identify an 8.5-kb contiguous region containing six open reading frames constituting the operon for the sorbitol P-enolpyruvate phosphotransferase transport system (PTS) of Streptococcus mutans LT11. The first gene, srlD, codes for sorbitol-6-phosphate dehydrogenase, followed downstream by srlR, coding for a transcriptional regulator; srlM, coding for a putative activator; and the srlA, srlE, and srlB genes, coding for the EIIC, EIIBC, and EIIA components of the sorbitol PTS, respectively. Among all sorbitol PTS operons characterized to date, the srlD gene is found after the genes coding for the EII components; thus, the location of the gene in S. mutans is unique. The SrlR protein is similar to several transcriptional regulators found in Bacillus spp. that contain PTS regulator domains (J. Stülke, M. Arnaud, G. Rapoport, and I. Martin-Verstraete, Mol. Microbiol. 28:865–874, 1998), and its gene overlaps the srlM gene by 1 bp. The arrangement of these two regulatory genes is unique, having not been reported for other bacteria. PMID:10639465
[Identification of proteins interacting with the circadian clock protein PER1 in tumors using bacterial two-hybrid system technique].

PubMed

Zhang, Yu; Yao, Youlin; Jiang, Siyuan; Lu, Yilu; Liu, Yunqiang; Tao, Dachang; Zhang, Sizhong; Ma, Yongxin

2015-04-01

To identify protein-protein interaction partners of PER1 (period circadian protein homolog 1), a key component of the molecular oscillation system of the circadian rhythm, in tumors using the bacterial two-hybrid system technique. A human cervical carcinoma HeLa cell library was adopted. Recombinant bait plasmid pBT-PER1 and the pTRG cDNA plasmid library were cotransformed into the two-hybrid system reporter strain cultured in a special selective medium. Target clones were screened. After isolating the positive clones, the target clones were sequenced and analyzed. Fourteen protein coding genes were identified, 4 of which were found to contain whole coding regions of genes, which included optic atrophy 3 protein (OPA3) associated with mitochondrial dynamics and Homo sapiens cutA divalent cation tolerance homolog of E. coli (CUTA) associated with copper metabolism. There were also proteins related to cellular events, proteins involved in biochemical reactions, and signal transduction-related proteins. Identification of potential interacting proteins with PER1 in tumors may provide us new insights into the functions of the circadian clock protein PER1 during tumorigenesis.
The chloroplast tRNALys(UUU) gene from mustard (Sinapis alba) contains a class II intron potentially coding for a maturase-related polypeptide.

PubMed

Neuhaus, H; Link, G

1987-01-01

The trnK gene encoding the tRNALys(UUU) has been located on mustard (Sinapis alba) chloroplast DNA, 263 bp upstream of the psbA gene on the same strand. The nucleotide sequence of the trnK gene and its flanking regions as well as the putative transcription start and termination sites are shown. The 5' end of the transcript lies 121 bp upstream of the 5' tRNA coding region and is preceded by procaryotic-type "-10" and "-35" sequence elements, while the 3' end maps 2.77 kb downstream to a DNA region with possible stem-loop secondary structure. The anticodon loop of the tRNALys is interrupted by a 2,574 bp intron containing a long open reading frame, which codes for 524 amino acids. Based on conserved stem and loop structures, this intron has characteristic features of a class II intron. A region near the carboxyl terminus of the derived polypeptide appears structurally related to maturases.
Molecular identification and transcriptional regulation of porcine IFIT2 gene.

PubMed

Yang, Xiuqin; Jing, Xiaoyan; Song, Yanfang; Zhang, Caixia; Liu, Di

2018-04-06

IFN-induced protein with tetratricopeptide repeats 2 (IFIT2) plays important roles in host defense against viral infection as revealed by studies in humans and mice. However, little is known about porcine IFIT2 (pIFIT2). Here, we performed molecular cloning, expression profiling, and transcriptional regulation analysis of pIFIT2. The pIFIT2 gene, located on chromosome 14, is composed of two exons and has a complete coding sequence of 1407 bp. The encoded polypeptide, 468 aa in length, has three tetratricopeptide repeat motifs. The pIFIT2 gene was expressed unevenly across all eleven tissues studied, with the highest abundance in spleen. Poly(I:C) treatment strongly upregulated the mRNA level and promoter activity of the pIFIT2 gene. The upstream sequence of 1759 bp from the start codon, which was assigned +1 here, has promoter activity, and deltaEF1 acts as a transcription repressor through binding to sequences at position -1774 to -1764. A minimal promoter region exists between nucleotide positions -162 and -126. Two adjacent interferon-stimulated response elements (ISREs) and two nuclear factor (NF)-κB binding sites were identified between positions -310 and -126. The ISRE elements act alone and in synergy, with the one closer to the start codon having more strength, as do the NF-κB binding sites. A synergistic effect was also found between the ISRE and NF-κB binding sites. Additionally, a third ISRE element was identified within position -1661 to -1579. These findings will contribute to clarifying the antiviral effect and underlying mechanisms of pIFIT2.
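The reported coding sequence length (1407 bp) and polypeptide length (468 aa) are consistent with each other if the coding sequence is counted with its stop codon. The abstract does not state this, so the sketch below treats it as an assumption; `protein_length` is a hypothetical helper written for illustration only.

```python
def protein_length(cds_bp, includes_stop=True):
    """Return the number of amino acids encoded by a coding sequence.

    cds_bp        -- length of the coding sequence in base pairs
    includes_stop -- assumption: the reported length counts the stop codon
    """
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = cds_bp // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons


# pIFIT2 coding sequence length as reported in the abstract.
print(protein_length(1407))
```

The same check applies to any record that gives both a coding sequence length and a protein length.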
Gene cloning and prokaryotic expression of recombinant outer membrane protein from Vibrio parahaemolyticus

NASA Astrophysics Data System (ADS)

Yuan, Ye; Wang, Xiuli; Guo, Sheping; Qiu, Xuemei

2011-06-01

Gram-negative Vibrio parahaemolyticus is a common pathogen in humans and marine animals. The outer membrane protein of bacteria plays an important role in the infection and pathogenicity to the host. Thus, the outer membrane proteins are an ideal target for vaccines. We amplified a complete outer membrane protein gene (ompW) from V. parahaemolyticus ATCC 17802. We then cloned and expressed the gene into Escherichia coli BL21 (DE3) cells. The gene coded for a protein that was 42.78 kDa. We purified the protein using Ni-NTA affinity chromatography and Anti-His antibody Western blotting, respectively. Our results provide a basis for future application of the OmpW protein as a vaccine candidate against infection by V. parahaemolyticus. In addition, the purified OmpW protein can be used for further functional and structural studies.
A Molecular Portrait of De Novo Genes in Yeasts.

PubMed

Vakirlis, Nikolaos; Hebert, Alex S; Opulente, Dana A; Achaz, Guillaume; Hittinger, Chris Todd; Fischer, Gilles; Coon, Joshua J; Lafontaine, Ingrid

2018-03-01

New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
Evolution at protein ends: major contribution of alternative transcription initiation and termination to the transcriptome and proteome diversity in mammals

PubMed Central

Shabalina, Svetlana A.; Ogurtsov, Aleksey Y.; Spiridonov, Nikolay A.; Koonin, Eugene V.

2014-01-01

Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5′ and 3′ transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5′-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3′-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns. PMID:24792168
Mechanisms generating long range correlation in nucleotide composition of the Borrelia burgdorferi genome

NASA Astrophysics Data System (ADS)

Mackiewicz, P.; Gierlik, A.; Kowalczuk, M.; Szczepanik, D.; Dudek, M. R.; Cebrat, S.

1999-12-01

We have analysed protein coding and intergenic sequences in the genome of Borrelia burgdorferi (the Lyme disease bacterium) using different kinds of DNA walks. Genes occupying the leading strand of DNA have significantly different nucleotide composition from genes occupying the lagging strand. Nucleotide compositional bias of the two DNA strands reflects the amino acid composition of proteins. 96% of genes coding for ribosomal proteins lie on the leading DNA strand, which suggests that the positions of these as well as other genes are non-random. In the B. burgdorferi genome, the asymmetry in intergenic DNA sequences is lower than the asymmetry in the third positions in codons. All these characters of the B. burgdorferi genome suggest that both replication-associated mutational pressure and recombination mechanisms have established the specific structure of the genome and now any recombination leading to inversion of a gene with respect to the direction of replication is forbidden. This property of the genome allows us to assume that it is in a steady state, which enables us to fix some parameters for simulations of DNA evolution.
Proteogenomics produces comprehensive and highly accurate protein-coding gene annotation in a complete genome assembly of Malassezia sympodialis.

PubMed

Zhu, Yafeng; Engström, Pär G; Tellgren-Roth, Christian; Baudo, Charles D; Kennell, John C; Sun, Sheng; Billmyre, R Blake; Schröder, Markus S; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L; Heitman, Joseph; Scheynius, Annika; Lehtiö, Janne

2017-03-17

Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
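The percentage increases quoted in this abstract can be checked from the gene counts it reports. The following is a minimal Python sketch; the helper name `percent_increase` is illustrative and does not come from the paper.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative increase from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Protein-coding gene counts quoted in the abstract.
transcriptomics_only = 3612   # annotation from transcriptomics evidence alone
with_proteomics = 4113        # after integrating proteomics data
after_curation = 4493         # after manual curation

print(round(percent_increase(transcriptomics_only, with_proteomics)))  # 14
print(round(percent_increase(with_proteomics, after_curation)))        # 9
```

Each percentage is taken relative to the preceding annotation step, not to the original 3612 genes.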
Directed Shotgun Proteomics Guided by Saturated RNA-seq Identifies a Complete Expressed Prokaryotic Proteome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Omasits, U.; Quebatte, Maxime; Stekhoven, Daniel J.

2013-11-01

Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ~90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.
Directed shotgun proteomics guided by saturated RNA-seq identifies a complete expressed prokaryotic proteome

PubMed Central

Omasits, Ulrich; Quebatte, Maxime; Stekhoven, Daniel J.; Fortes, Claudia; Roschitzki, Bernd; Robinson, Mark D.; Dehio, Christoph; Ahrens, Christian H.

2013-01-01

Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level, were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor. PMID:23878158
Non-coding RNAs in lung cancer

PubMed Central

Ricciuti, Biagio; Mecca, Carmen; Crinò, Lucio; Baglivo, Sara; Cenci, Matteo; Metro, Giulio

2014-01-01

The discovery that protein-coding genes represent less than 2% of the human genome, together with the evidence that more than 90% of the genome is actively transcribed, changed the classical view of the central dogma of molecular biology, which had always assumed that RNA functions mainly as an intermediate bridge between DNA sequences and the protein synthesis machinery. Accumulating data indicate that non-coding RNAs are involved in different physiological processes and help maintain cellular homeostasis. They are important regulators of gene expression, cellular differentiation, proliferation, migration, apoptosis, and stem cell maintenance. Alterations and disruptions of their expression or activity have increasingly been associated with pathological changes of cancer cells. This evidence, and the prospect of using these molecules as diagnostic markers and therapeutic targets, currently make non-coding RNAs among the most relevant molecules in cancer research. In this paper we provide an overview of non-coding RNA function and disruption in lung cancer biology, also focusing on their potential as diagnostic, prognostic and predictive biomarkers. PMID:25593996
Molecular cloning and evolutionary analysis of captive forest musk deer bitter taste receptor gene T2R16.

PubMed

Zhao, G J; Wu, N; Li, D Y; Zeng, D J; Chen, Q; Lu, L; Feng, X L; Zhang, C L; Zheng, C L; Jie, H

2015-12-08

Sensing bitter tastes is crucial for most animals because it can prevent them from ingesting harmful food. This process is mainly mediated by the bitter taste receptors (T2R) that are largely expressed in the taste buds. Previous studies have identified some T2R gene repertoires. Marked variation in repertoire size has been noted among species. However, research on T2Rs is still limited and the mechanisms underlying the evolution of vertebrate T2Rs remain poorly understood. In the present study, we analyzed the structure and features of the protein encoded by the forest musk deer (Moschus berezovskii) T2R16 and submitted the gene sequence to NCBI GenBank. The results showed that the full coding DNA sequence (CDS) of musk deer T2R16 (GenBank accession No. KP677279) was 906 bp, encoding 301 amino acids, which contained ATG start codon and TGA stop codon, with a calculated molecular weight of 35.03 kDa and an isoelectric point of 9.56. The T2R16 protein receptor had seven conserved transmembrane regions. Hydrophobicity analysis showed that most amino acid residues in T2R16 protein were hydrophobic, and the grand average of hydrophobicity (GRAVY) was 0.657. Phylogenetic analysis based on this gene revealed that forest musk deer had the closest association with sheep (Ovis aries), as compared to cow (Bos taurus), Tursiops truncatus, and other species, whereas it was genetically farthest from humans (Homo sapiens). We hope these results would complement the existing data on T2R16 and encourage further research in this respect.
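The relationship between the reported CDS length (906 bp) and protein length (301 amino acids) follows from codon arithmetic: 906 / 3 = 302 codons, one of which is the stop codon. The Python sketch below illustrates that check under stated assumptions: the function `cds_summary` is hypothetical, the standard nuclear stop codons are assumed, and the sequence is a placeholder of the reported length, not the actual T2R16 sequence from GenBank.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def cds_summary(cds: str) -> dict:
    """Basic sanity checks on a coding sequence given as a DNA string."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    has_stop = codons[-1] in STOP_CODONS
    return {
        "length_bp": len(cds),
        "starts_with_atg": codons[0] == "ATG",
        "ends_with_stop": has_stop,
        # The stop codon is not translated into an amino acid.
        "amino_acids": len(codons) - (1 if has_stop else 0),
    }


# Placeholder sequence: ATG + 300 alanine codons + TGA = 906 bp.
placeholder_cds = "ATG" + "GCT" * 300 + "TGA"
print(cds_summary(placeholder_cds))
```

For the placeholder, the summary reports 906 bp, an ATG start, a TGA stop and 301 amino acids, which matches the figures given in the abstract.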
SGP-1: Prediction and Validation of Homologous Genes Based on Sequence Alignments

PubMed Central

Wiehe, Thomas; Gebauer-Jung, Steffi; Mitchell-Olds, Thomas; Guigó, Roderic

2001-01-01

Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors. PMID:11544202
The First Mitochondrial Genomes of Antlion (Neuroptera: Myrmeleontidae) and Split-footed Lacewing (Neuroptera: Nymphidae), with Phylogenetic Implications of Myrmeleontiformia

PubMed Central

Yan, Yan; Wang, Yuyu; Liu, Xingyue; Winterton, Shaun L.; Yang, Ding

2014-01-01

In the holometabolous insect order Neuroptera (lacewings), the cosmopolitan Myrmeleontidae (antlions) are the most species-rich family, while the closely related Nymphidae (split-footed lacewings) are a small endemic family from the Australian-Malesian region. Both families belong to the suborder Myrmeleontiformia, within which controversial hypotheses on the interfamilial phylogenetic relationships exist. Herein, we describe the complete mitochondrial (mt) genomes of an antlion (Myrmeleon immanis Walker, 1853) and a split-footed lacewing (Nymphes myrmeleonoides Leach, 1814), representing the first mt genomes for both families. These mt genomes are relatively small (respectively composed of 15,799 and 15,713 bp) compared to other lacewing mt genomes, and comprise 37 genes (13 protein coding genes, 22 tRNA genes and two rRNA genes). The arrangement of these two mt genomes is the same as in most derived Neuroptera mt genomes previously sequenced, specifically with a translocation of trnC. All PCGs start with ATN codons, with the exception of cox1, which starts with ACG in the M. immanis mt genome and TCG in N. myrmeleonoides. All tRNA genes have a typical clover-leaf structure of mitochondrial tRNA, with the exception of trnS1(AGN). The secondary structures of rrnL and rrnS are similar to those proposed for other insects, and domain I contains nine helices rather than eight, which is common within Neuroptera. A phylogenetic analysis based on the mt genomic data for all Neuropterida sequenced thus far supports the monophyly of Myrmeleontiformia and the sister relationship between Ascalaphidae and Myrmeleontidae. PMID:25170303

Characterization of the complete mitochondrial genome of Acanthoscelides obtectus (Coleoptera: Chrysomelidae: Bruchinae) with phylogenetic analysis.

PubMed

Yao, Jie; Yang, Hong; Dai, Renhuai

2017-10-01

Acanthoscelides obtectus is a common species of the subfamily Bruchinae and a worldwide-distributed seed-feeding beetle. The complete mitochondrial genome of A. obtectus is 16,130 bp in length with an A + T content of 76.4%. It contains a positive AT skew and a negative GC skew. The mitogenome of A. obtectus contains 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes and a non-coding region (D-loop). All PCGs start with an ATN codon. Seven of them (ND2, ATP6, COIII, ND3, ND4L, ND6, and Cytb) terminate with TAA, five (COI, COII, ND1, ND4, and ND5) terminate with a single T, and ATP8 terminates with TGA. Except for tRNASer, the secondary structures of the 21 tRNAs that can be folded into a typical clover-leaf structure were identified. The secondary structures of lrRNA and srRNA were also predicted in this study. There are six domains with 48 helices in lrRNA and three domains with 32 helices in srRNA. The control region of A. obtectus is 1354 bp in size with the highest A + T content (83.5%) in a mitochondrial gene. Thirteen PCGs in 19 species have been used to infer their phylogenetic relationships. Our results show that A. obtectus belongs to the family Chrysomelidae (subfamily Bruchinae). This is the first study on phylogenetic analyses involving the mitochondrial genes of A. obtectus and could provide basic data for future studies of mitochondrial genome diversities and the evolution of related insect lineages.
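The A + T content, AT skew and GC skew mentioned in this abstract are conventionally computed from base counts as AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C). The Python sketch below applies those formulas to a short made-up sequence; the function name `composition_stats` and the toy sequence are illustrative assumptions, not data from the A. obtectus mitogenome.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """A+T content and strand skews computed from a DNA string.

    Assumes the sequence contains at least one A or T and at least one
    G or C; otherwise the skew denominators would be zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    return {
        "at_content": (a + t) / (a + t + g + c),
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Toy sequence: 4 A, 3 T, 1 G, 2 C.
stats = composition_stats("AAAATTTGCC")
print(stats)
```

For the toy sequence the A + T content is 0.7, the AT skew is positive (1/7) and the GC skew is negative (-1/3), the same sign pattern the abstract reports for the mitogenome.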
Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants.

PubMed

Fu, Wenqing; O'Connor, Timothy D; Jun, Goo; Kang, Hyun Min; Abecasis, Goncalo; Leal, Suzanne M; Gabriel, Stacey; Rieder, Mark J; Altshuler, David; Shendure, Jay; Nickerson, Deborah A; Bamshad, Michael J; Akey, Joshua M

2013-01-10

Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history and will help to facilitate the development of new approaches for disease-gene discovery. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth, notable for an excess of rare genetic variants, suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European American and African American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that approximately 73% of all protein-coding SNVs and approximately 86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs than other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the Out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, show the profound effect of recent human history on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease-gene discovery.
The complete nucleotide sequence of the domestic dog (Canis familiaris) mitochondrial genome.

PubMed

Kim, K S; Lee, S E; Jeong, H W; Ha, J H

1998-10-01

The complete nucleotide sequence of the mitochondrial genome of the domestic dog, Canis familiaris, was determined. The length of the sequence was 16,728 bp; however, the length was not absolute due to the variation (heteroplasmy) caused by differing numbers of the repetitive motif, 5'-GTACACGT(A/G)C-3', in the control region. The genome organization, gene contents, and codon usage conformed to those of other mammalian mitochondrial genomes. Although its features were unknown, the "CTAGA" duplication event which followed the translational stop codon of the COII gene was not observed in other mammalian mitochondrial genomes. In order to determine the possible differences between mtDNAs in carnivores, two rRNA and 13 protein-coding genes from the cat, dog, and seal were compared. The combined molecular differences, in two rRNA genes as well as in the inferred amino acid sequences of the mitochondrial 13 protein-coding genes, suggested that there is a closer relationship between the dog and the seal than there is between either of these species and the cat. Based on the molecular differences of the mtDNA, the evolutionary divergence between the cat, the dog, and the seal was dated to approximately 50 +/- 4 million years ago. The degree of difference between carnivore mtDNAs varied according to the individual protein-coding gene applied, showing that the evolutionary relationships of distantly related species should be presented in an extended study based on ample sequence data like complete mtDNA molecules. Copyright 1998 Academic Press.
The complete mitochondrial genome of Rapana venosa (Gastropoda, Muricidae).

PubMed

Sun, Xiujun; Yang, Aiguo

2016-01-01

The complete mitochondrial (mt) genome of the veined rapa whelk, Rapana venosa, was determined using genome walking techniques in this study. The total length of the mt genome sequence of R. venosa was 15,271 bp, which is comparable to the reported Muricidae mitogenomes to date. It contained 13 protein-coding genes, 21 transfer RNA genes, and two ribosomal RNA genes. A bias towards a higher representation of nucleotides A and T (69%) was detected in the mt genome of R. venosa. A small number of non-coding nucleotides (302 bp) was detected, and the largest non-coding region was 74 bp in length.
Towards a complete map of the human long non-coding RNA transcriptome.

PubMed

Uszczynska-Ratajczak, Barbara; Lagarde, Julien; Frankish, Adam; Guigó, Roderic; Johnson, Rory

2018-05-23

Gene maps, or annotations, enable us to navigate the functional landscape of our genome. They are a resource upon which virtually all studies depend, from single-gene to genome-wide scales and from basic molecular biology to medical genetics. Yet present-day annotations suffer from trade-offs between quality and size, with serious but often unappreciated consequences for downstream studies. This is particularly true for long non-coding RNAs (lncRNAs), which are poorly characterized compared to protein-coding genes. Long-read sequencing technologies promise to improve current annotations, paving the way towards a complete annotation of lncRNAs expressed throughout a human lifetime.
The mitochondrial gene encoding ribosomal protein S12 has been translocated to the nuclear genome in Oenothera.

PubMed Central

Grohmann, L; Brennicke, A; Schuster, W

1992-01-01

The Oenothera mitochondrial genome contains only a gene fragment for ribosomal protein S12 (rps12), while other plants encode a functional gene in the mitochondrion. The complete Oenothera rps12 gene is located in the nucleus. The transit sequence necessary to target this protein to the mitochondrion is encoded by a 5'-extension of the open reading frame. Comparison of the amino acid sequence encoded by the nuclear gene with the polypeptides encoded by edited mitochondrial cDNA and genomic sequences of other plants suggests that gene transfer between mitochondrion and nucleus started from edited mitochondrial RNA molecules. Mechanisms and requirements of gene transfer and activation are discussed. PMID:1454526
Long Non-Coding RNAs (lncRNAs) of Sea Cucumber: Large-Scale Prediction, Expression Profiling, Non-Coding Network Construction, and lncRNA-microRNA-Gene Interaction Analysis of lncRNAs in Apostichopus japonicus and Holothuria glaberrima During LPS Challenge and Radial Organ Complex Regeneration.

PubMed

Mu, Chuang; Wang, Ruijia; Li, Tianqi; Li, Yuqiang; Tian, Meilin; Jiao, Wenqian; Huang, Xiaoting; Zhang, Lingling; Hu, Xiaoli; Wang, Shi; Bao, Zhenmin

2016-08-01

Long non-coding RNA (lncRNA) structurally resembles mRNA but cannot be translated into protein. Although the systematic identification and characterization of lncRNAs have been increasingly reported in model species, information concerning non-model species is still lacking. Here, we report the first systematic identification and characterization of lncRNAs in two sea cucumber species: (1) Apostichopus japonicus during lipopolysaccharide (LPS) challenge and in healthy tissues and (2) Holothuria glaberrima during radial organ complex regeneration, using RNA-seq datasets and bioinformatics analysis. We identified A. japonicus and H. glaberrima lncRNAs that were differentially expressed during LPS challenge and radial organ complex regeneration, respectively. Notably, the predicted lncRNA-microRNA-gene trinities revealed that, in addition to targeting protein-coding transcripts, miRNAs might also target lncRNAs, thereby participating in a potential novel layer of regulatory interactions among non-coding RNA classes in echinoderms. Furthermore, the constructed coding-non-coding network implied the potential involvement of lncRNA-gene interactions during the regulation of several important genes (e.g., Toll-like receptor 1 [TLR1] and transglutaminase-1 [TGM1]) in response to LPS challenge and radial organ complex regeneration in sea cucumbers. Overall, this pioneering systematic identification, annotation, and characterization of lncRNAs in echinoderms paves the way for similar studies and future genetic, genomic, and evolutionary research in non-model species.
HippDB: a database of readily targeted helical protein-protein interactions.

PubMed

Bergey, Christina M; Watkins, Andrew M; Arora, Paramjit S

2013-11-01

HippDB catalogs every protein-protein interaction whose structure is available in the Protein Data Bank and which exhibits one or more helices at the interface. The Web site accepts queries on variables such as helix length and sequence, and it provides computational alanine scanning and change in solvent-accessible surface area values for every interfacial residue. HippDB is intended to serve as a starting point for structure-based small molecule and peptidomimetic drug development. HippDB is freely available on the web at http://www.nyu.edu/projects/arora/hippdb. The Web site is implemented in PHP, MySQL and Apache. Source code is freely available for download at http://code.google.com/p/helidb, implemented in Perl and supported on Linux. Contact: arora@nyu.edu.
The first two mitochondrial genomes from Taeniopterygidae (Insecta: Plecoptera): Structural features and phylogenetic implications.

PubMed

Chen, Zhi-Teng; Du, Yu-Zhou

2018-05-01

The complete mitochondrial genomes (mitogenomes) of Taeniopteryx ugola and Doddsia occidentalis (Plecoptera: Taeniopterygidae) are the first to be sequenced from the family Taeniopterygidae. The 15,353-bp long mitogenome of T. ugola and the 16,020-bp long mitogenome of D. occidentalis each contained 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs) and a control region (CR). The mitochondrial gene arrangement of the two taeniopterygids and other stoneflies was identical to that of the putative ancestral mitogenome of Drosophila yakuba. Most PCGs used standard ATN start codons and TAN termination codons. Twenty-one of the 22 tRNAs in each mitogenome could fold into the cloverleaf secondary structure, while the dihydrouridine (DHU) arm of trnSer (AGN) was reduced or absent. Stem-loop (SL) structures, a poly-T stretch, a poly-[AT]n stretch and tandem repeats were found in the CRs of the two mitogenomes. The phylogenetic analyses using Bayesian inference (BI) and maximum likelihood (ML) methods generated identical results, both supporting the monophyly of all stonefly families and the two infraorders, Systellognatha and Euholognatha. Taeniopterygidae was grouped with another two families from Euholognatha. The relationships within Plecoptera were recovered as (((Perlidae+Peltoperlidae)+((Pteronarcyidae+Chloroperlidae)+Styloperlidae))+((Capniidae+Taeniopterygidae)+Nemouridae))+Gripopterygidae. Copyright © 2017 Elsevier B.V. All rights reserved.
The complete mitochondrial genome of the styloperlid stonefly species Styloperla spinicercia Wu (Insecta: Plecoptera) with family-level phylogenetic analyses of the Pteronarcyoidea.

PubMed

Wang, Ying; Cao, Jinjun; Li, Weihai

2017-03-13

We present the complete mitochondrial (mt) genome sequence of the stonefly, Styloperla spinicercia Wu, 1935 (Plecoptera: Styloperlidae), the type species of the genus Styloperla and the first complete mt genome for the family Styloperlidae. The genome is circular, 16,129 base pairs long, has an A+T content of 70.7%, and contains 37 genes including the large and small ribosomal RNA (rRNA) subunits, 13 protein coding genes (PCGs), 22 tRNA genes and a large non-coding region (CR). All of the PCGs use the standard initiation codon ATN except ND1 and ND5, which start with TTG and GTG, respectively. Twelve of the PCGs stop with the conventional termination codons TAA and TAG, whereas ND5 shows an incomplete termination signal T. All tRNAs have the classic clover-leaf structures with the dihydrouridine (DHU) arm of tRNASer(AGN) forming a simple loop. Secondary structures of the two ribosomal RNAs are presented with reference to previous models. The structural elements and the variable numbers of tandem repeats are described within the control region. Phylogenetic analyses using both Bayesian (BI) and Maximum Likelihood (ML) methods support the previous hypotheses regarding family level relationships within the Pteronarcyoidea. The genetic distance calculated based on 13 PCGs and two rRNAs between Styloperla sp. and S. spinicercia is provided and interspecific divergence is discussed.
Comparative Mitogenomics of the Assassin Bug Genus Peirates (Hemiptera: Reduviidae: Peiratinae) Reveal Conserved Mitochondrial Genome Organization of P. atromaculatus, P. fulvescens and P. turpis

PubMed Central

Zhao, Guangyu; Li, Hu; Zhao, Ping; Cai, Wanzhi

2015-01-01

In this study, we sequenced four new mitochondrial genomes and presented comparative mitogenomic analyses of five species in the genus Peirates (Hemiptera: Reduviidae). Mitochondrial genomes of these five assassin bugs had a typical set of 37 genes and retained the ancestral gene arrangement of insects. The A+T content, AT- and GC-skews were similar to the common base composition biases of insect mtDNA. Genomic size ranges from 15,702 bp to 16,314 bp and most of the size variation was due to length and copy number of the repeat unit in the putative control region. All of the control region sequences included large tandem repeats present in two or more copies. Our result revealed similarity in mitochondrial genomes of P. atromaculatus, P. fulvescens and P. turpis, as well as the highly conserved genomic-level characteristics of these three species, e.g., the same start and stop codons of protein-coding genes, conserved secondary structure of tRNAs, identical location and length of non-coding and overlapping regions, and conservation of structural elements and tandem repeat unit in control region. Phylogenetic analyses also supported a close relationship between P. atromaculatus, P. fulvescens and P. turpis, which might be recently diverged species. The present study indicates that mitochondrial genome has important implications on phylogenetics, population genetics and speciation in the genus Peirates. PMID:25689825
Chimeric mitochondrial minichromosomes of the human body louse, Pediculus humanus: evidence for homologous and non-homologous recombination.

PubMed

Shao, Renfu; Barker, Stephen C

2011-02-15

The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Molecular cloning of a Candida albicans gene (SSB1) coding for a protein related to the Hsp70 family.

PubMed

Maneu, V; Cervera, A M; Martinez, J P; Gozalbo, D

1997-06-15

We have cloned and sequenced a Candida albicans gene (SSB1) encoding a potential member of the heat-shock protein seventy (hsp70) family. The protein encoded by this gene contains 613 amino acids and shows a high degree (85%) of sequence identity to the ssb subfamily (ssb1 and ssb2) of the Saccharomyces cerevisiae hsp70 family. The transcribed mRNA (2.1 kb) is present in similar amounts both in yeast and germ tube cells of C. albicans.
A High-Resolution Gene Map of the Chloroplast Genome of the Red Alga Porphyra purpurea.

PubMed Central

Reith, M; Munholland, J

1993-01-01

Extensive DNA sequencing of the chloroplast genome of the red alga Porphyra purpurea has resulted in the detection of more than 125 genes. Fifty-eight (approximately 46%) of these genes are not found on the chloroplast genomes of land plants. These include genes encoding 17 photosynthetic proteins, three tRNAs, and nine ribosomal proteins. In addition, nine genes encoding proteins related to biosynthetic functions, six genes encoding proteins involved in gene expression, and at least five genes encoding miscellaneous proteins are among those not known to be located on land plant chloroplast genomes. The increased coding capacity of the P. purpurea chloroplast genome, along with other characteristics such as the absence of introns and the conservation of ancestral operons, demonstrate the primitive nature of the P. purpurea chloroplast genome. In addition, evidence for a monophyletic origin of chloroplasts is suggested by the identification of two groups of genes that are clustered in chloroplast genomes but not in cyanobacteria. PMID:12271072
Nucleic acids encoding plant glutamine phenylpyruvate transaminase (GPT) and uses thereof

DOEpatents

Unkefer, Pat J.; Anderson, Penelope S.; Knight, Thomas J.

2016-03-29

Glutamine phenylpyruvate transaminase (GPT) proteins, nucleic acid molecules encoding GPT proteins, and uses thereof are disclosed. Provided herein are various GPT proteins and GPT gene coding sequences isolated from a number of plant species. As disclosed herein, GPT proteins share remarkable structural similarity within plant species, and are active in catalyzing the synthesis of 2-hydroxy-5-oxoproline (2-oxoglutaramate), a powerful signal metabolite which regulates the function of a large number of genes involved in the photosynthesis apparatus, carbon fixation and nitrogen metabolism.
Horizontal gene acquisitions contributed to genome expansion in insect-symbiotic Spiroplasma clarkii.

PubMed

Tsai, Yi-Ming; Chang, An; Kuo, Chih-Horng

2018-06-01

Genome reduction is a recurring theme of symbiont evolution. The genus Spiroplasma contains species that are mostly facultative insect symbionts. The typical genome sizes of those species within the Apis clade were estimated to be ∼1.0-1.4 Mb. Intriguingly, Spiroplasma clarkii was found to have a genome size that is >30% larger than the median of other species within the same clade. To investigate the molecular evolution events that led to the genome expansion of this bacterium, we determined its complete genome sequence and inferred the evolutionary origin of each protein-coding gene based on the phylogenetic distribution of homologs. Among the 1,346 annotated protein-coding genes, 641 originated from within the Apis clade while 233 were putatively acquired from outside of the clade (including 91 high-confidence candidates). Additionally, 472 were specific to S. clarkii without homologs in the current database (i.e., the origins remained unknown). The acquisition of protein-coding genes, rather than mobile genetic elements, appeared to be a major contributing factor to genome expansion. Notably, >50% of the high-confidence acquired genes are related to carbohydrate transport and metabolism, suggesting that these acquired genes contributed to the expansion of both genome size and metabolic capability. The findings of this work provided an interesting case against the general evolutionary trend observed among symbiotic bacteria and further demonstrated the flexibility of Spiroplasma genomes. For future studies, investigation of the functional integration of these acquired genes, as well as inference of their contribution to fitness, could improve our knowledge of symbiont evolution.
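The gene-origin counts in the record above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the study; the dictionary keys and the helper name `percent_of_total` are illustrative assumptions, while the counts themselves are taken from the abstract.

```python
# Gene counts as reported in the abstract above.
total_genes = 1346
origin_counts = {
    "within_apis_clade": 641,       # originated within the Apis clade
    "acquired_outside_clade": 233,  # putatively acquired from outside the clade
    "s_clarkii_specific": 472,      # no homologs in the database used
}

# The three categories should partition the annotated gene set.
assert sum(origin_counts.values()) == total_genes


def percent_of_total(count, total=total_genes):
    """Return a count as a percentage of the total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


origin_percent = {name: percent_of_total(n) for name, n in origin_counts.items()}
for name, pct in origin_percent.items():
    print(f"{name}: {pct}%")
```

On these figures, roughly 17% of the annotated genes fall in the putatively acquired category, which gives a sense of scale for the abstract's statement that gene acquisition was a major factor in the expansion.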
Necessities for the First Life to Emerge

NASA Astrophysics Data System (ADS)

Ikehara, K.

2017-07-01

For the first life to emerge, the first protein must be produced by random joining of amino acids in protein 0th-order structure. In addition, the first genetic code and the first double-stranded gene must encode the protein 0th-order structure.
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)

PubMed Central

Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan

2014-01-01

Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247,696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus mtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Analysis of evolutionary relationships suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Computational prediction of over-annotated protein-coding genes in the genome of Agrobacterium tumefaciens strain C58

NASA Astrophysics Data System (ADS)

Yu, Jia-Feng; Sui, Tian-Xiang; Wang, Hong-Mei; Wang, Chun-Ling; Jing, Li; Wang, Ji-Hua

2015-12-01

Agrobacterium tumefaciens strain C58 is a pathogen that can cause tumors in some dicotyledonous plants. Ever since the genome of A. tumefaciens strain C58 was sequenced, the quality of annotation of its protein-coding genes has been queried continually, because the annotation varies greatly among different databases. In this paper, the questionable hypothetical genes were re-predicted by integrating the TN curve and Z curve methods. As a result, 30 genes originally annotated as “hypothetical” were discriminated as being non-coding sequences. By testing the re-prediction program 10 times on data sets composed of the function-known genes, a mean accuracy of 99.99% and a mean Matthews correlation coefficient of 0.9999 were obtained. Further sequence analysis and COG analysis showed that the re-annotation results were very reliable. This work can provide an efficient tool and data resources for future studies of A. tumefaciens strain C58. Project supported by the National Natural Science Foundation of China (Grant Nos. 61302186 and 61271378) and funding from the State Key Laboratory of Bioelectronics of Southeast University.
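The record above reports accuracy and the Matthews correlation coefficient (MCC) for a coding versus non-coding classifier. As a reminder of how these two metrics are computed from a confusion matrix, here is a minimal sketch using the standard definitions. The counts in the example are hypothetical and are not taken from the study, and the function names are illustrative.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts; 0.0 if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrix (NOT data from the paper):
# 45 coding sequences called coding, 40 non-coding called non-coding,
# 10 non-coding called coding, 5 coding called non-coding.
example_acc = accuracy(tp=45, tn=40, fp=10, fn=5)
example_mcc = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
print(f"accuracy={example_acc:.2f}, MCC={example_mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the coding and non-coding classes are unbalanced.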
The complete sequence of mitochondrial genome of polled yak (Bos grunniens).

PubMed

Chu, Min; Wu, Xiaoyun; Liang, Chunnian; Pei, Jie; Ding, Xuezhi; Guo, Xian; Bao, Pengjia; Yan, Ping

2016-05-01

The hornless trait is commonly known as polled. Although the POLL locus has been assigned to a 1.36-Mb interval in the centromeric region of BTA1 (Georges et al., 1993; Drögemüller et al., 2005), and Liu et al. (2014) reported that a 147-kb segment containing three protein-coding genes was the most likely location of the POLL mutation in domestic yaks, the underlying genetic basis for the polled trait is still unknown. In this work, the complete mitochondrial genome sequence of polled yak was determined for the first time. The mitogenome is 16,324 bp long, with a base composition of 33.72% A, 27.25% T, 25.83% C, and 13.20% G. It contains 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 non-coding region (D-loop region). The gene order of the polled yak mitogenome is identical to that observed in most other vertebrates. The complete mitogenome sequence of polled yak will provide useful data for further studies on the protection of genetic resources and phylogenetic relationships within Bos grunniens.
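The base composition reported in the record above can be turned into G+C and A+T content with simple arithmetic. This is a minimal sketch; the variable names are illustrative, and the percentages are those given in the abstract.

```python
# Base composition (percent) as reported in the abstract above.
base_percent = {"A": 33.72, "T": 27.25, "C": 25.83, "G": 13.20}

# The four percentages should account for the whole sequence.
total_percent = round(sum(base_percent.values()), 2)

# G+C and A+T content, rounded to the precision of the reported values.
gc_content = round(base_percent["G"] + base_percent["C"], 2)
at_content = round(base_percent["A"] + base_percent["T"], 2)

print(f"total={total_percent}%, G+C={gc_content}%, A+T={at_content}%")
```

The resulting A+T bias (about 61%) is derived here from the reported figures; the abstract itself does not state a G+C value.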

Chromosome mapping of the human arrestin (SAG), β-arrestin 2 (ARRB2), and β-adrenergic receptor kinase 2 (ADRBK2) genes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Calabrese, G.; Sallese, M.; Stornaiuolo, A.

1994-09-01

Two types of proteins play a major role in determining homologous desensitization of G-coupled receptors: β-adrenergic receptor kinase (βARK), which phosphorylates the agonist-occupied receptor, and its functional cofactor, β-arrestin. Both βARK and β-arrestin are members of multigene families. The family of G-protein-coupled receptor kinases includes rhodopsin kinase, βARK1, βARK2, IT11-A (GRK4), GRK5, and GRK6. The arrestin/β-arrestin gene family includes arrestin (also known as S-antigen), β-arrestin 1, and β-arrestin 2. Here we report the chromosome mapping of the human genes for arrestin (SAG), β-arrestin 2 (ARRB2), and βARK2 (ADRBK2) by fluorescence in situ hybridization (FISH). FISH results confirmed the assignment of the gene coding for arrestin (SAG) to chromosome 2 and allowed us to refine its localization to band q37. The gene coding for β-arrestin 2 (ARRB2) was mapped to chromosome 17p13 and that coding for βARK2 (ADRBK2) to chromosome 22q11. 17 refs., 1 fig.
Genetic relatedness among human rotavirus genes coding for VP7, a major neutralization protein, and its application to serotype identification.

PubMed Central

Midthun, K; Flores, J; Taniguchi, K; Urasawa, S; Kapikian, A Z; Chanock, R M

1987-01-01

Antigenic characterization of human rotaviruses by plaque reduction neutralization assay has revealed four distinct serotypes. The outer capsid protein VP7, coded for by gene 8 or 9, is a major neutralization protein; however, studies of rotaviruses derived from genetic reassortment between two strains have confirmed that another outer capsid protein, VP3, is in some cases equally important in neutralization. In this study, the genetic relatedness of the genes coding for VP7 of human rotaviruses belonging to serotypes 1 through 4 was examined by hybridization of their denatured double-stranded genomic RNAs to labeled single-stranded mRNA probes derived from human-animal rotavirus reassortants containing only the VP7 gene of their human rotavirus parent. A high degree of homology was demonstrated between the VP7 genes of strain D and other serotype 1 human rotaviruses, strain DS-1 and other serotype 2 human rotaviruses, strain P and other serotype 3 human rotaviruses, and strain ST3 and other serotype 4 human rotaviruses. Hybrid bands could not be demonstrated between the VP7 gene of D, DS-1, P, or ST3 and the corresponding gene of human rotaviruses belonging to a different serotype. RNA specimens extracted from the stools of 15 Venezuelan children hospitalized with rotavirus diarrhea were hybridized to each of the reassortant probes representing the four human serotypes. All five viruses with short RNA patterns showed homology with the DS-1 strain VP7 gene; two of these were previously adapted to tissue culture and shown to be serotype 2 strains by tissue culture neutralization. Of the remaining 10 viruses with long RNA patterns, 2 hybridized only to the D strain VP7 gene, 6 hybridized only to the P strain VP7 gene, and 2 hybridized only to the ST3 strain VP7 gene. 
Hybridization using single human rotavirus gene substitution reassortants as probes may provide an alternative method for identifying the VP7 serotype of field isolates that would circumvent the need for tissue culture adaptation. PMID:3038948
Origin and evolution of spliceosomal introns

PubMed Central

2012-01-01

Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded ‘introns first’ held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa. The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that in particular triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. 
There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence but exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers’ Reports section. PMID:22507701
The complete mitochondrial DNA of endemic Eastern Pacific coral (Porites panamensis).

PubMed

Del Río-Portilla, Miguel A; Vargas-Peralta, Carmen E; Paz-García, David A; Lafarga De La Cruz, Fabiola; Balart, Eduardo F; García-de-León, Francisco J

2016-01-01

The mitogenome of the endemic coral Porites panamensis (GenBank accession number KJ546638) has a total length of 18,628 bp and consists of 13 protein-coding genes, 2 ribosomal RNA (rRNA) genes and 2 transfer RNA (tRNA) genes. The gene order is identical to that of other scleractinian coral mitogenomes.
A specific indel marker for the Philippines Schistosoma japonicum revealed by analysis of mitochondrial genome sequences.

PubMed

Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan

2015-07-01

In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
CHIR99021 promotes self-renewal of mouse embryonic stem cells by modulation of protein-encoding gene and long intergenic non-coding RNA expression

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Yongyan; Key Laboratory of Animal Biotechnology, Ministry of Agriculture, Northwest A and F University, Yangling 712100, Shaanxi; Ai, Zhiying

2013-10-15

Embryonic stem cells (ESCs) can proliferate indefinitely in vitro and differentiate into cells of all three germ layers. These unique properties make them exceptionally valuable for drug discovery and regenerative medicine. However, the practical application of ESCs is limited because it is difficult to derive and culture ESCs. It has been demonstrated that CHIR99021 (CHIR) promotes self-renewal and enhances the derivation efficiency of mouse (m)ESCs. However, the downstream targets of CHIR are not fully understood. In this study, we identified CHIR-regulated genes in mESCs using microarray analysis. Our microarray data demonstrated that CHIR not only influenced the Wnt/β-catenin pathway by stabilizing β-catenin, but also modulated several other pluripotency-related signaling pathways such as TGF-β, Notch and MAPK signaling pathways. More detailed analysis demonstrated that CHIR inhibited Nodal signaling, while activating bone morphogenetic protein signaling in mESCs. In addition, we found that pluripotency-maintaining transcription factors were up-regulated by CHIR, while several development-related genes were down-regulated. Furthermore, we found that CHIR altered the expression of epigenetic regulatory genes and long intergenic non-coding RNAs. Quantitative real-time PCR results were consistent with microarray data, suggesting that CHIR alters the expression pattern of protein-encoding genes (especially transcription factors), epigenetic regulatory genes and non-coding RNAs to establish a relatively stable pluripotency-maintaining network. Highlights: • Combined use of CHIR with LIF promotes self-renewal of J1 mESCs. • CHIR-regulated genes are involved in multiple pathways. • CHIR inhibits Nodal signaling and promotes Bmp4 expression to activate BMP signaling. • Expression of epigenetic regulatory genes and lincRNAs is altered by CHIR.
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research

Cancer.gov

The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Molecular cloning of low-temperature-inducible ribosomal proteins from soybean.

PubMed

Kim, Kee-Young; Park, Seong-Whan; Chung, Young-Soo; Chung, Chung-Han; Kim, Jung-In; Lee, Jai-Heon

2004-05-01

Three ribosomal protein genes induced by low-temperature treatment were isolated from soybean. GmRPS13 (742 bp) encodes a 17.1 kDa protein which has 95% identity with the 40S ribosomal protein S13 of Panax ginseng (AB043974). GmRPS6 (925 bp) encodes a 28.1 kDa protein which has 94% identity with the 40S ribosomal protein S6 of Asparagus officinalis (AJ277533). GmRPL37 (494 bp) encodes a 10.7 kDa protein which has 85% identity with the 60S ribosomal protein L37 of Arabidopsis thaliana (AF370216). The expression of these ribosomal protein genes started to increase 3 d after low-temperature treatment, whereas the cold-stress protein src1 was highly induced from the first day. Such late response of these ribosomal protein genes may be due to secondary signals during cold adaptation. The induction of ribosomal protein genes might enhance the translation process or help proper ribosome functioning under low-temperature conditions.
Td4IN2: A drought-responsive durum wheat (Triticum durum Desf.) gene coding for a resistance like protein with serine/threonine protein kinase, nucleotide binding site and leucine rich domains.

PubMed

Rampino, Patrizia; De Pascali, Mariarosaria; De Caroli, Monica; Luvisi, Andrea; De Bellis, Luigi; Piro, Gabriella; Perrotta, Carla

2017-11-01

Wheat, the main food source for a third of the world population, appears to be strongly under threat because of predicted increasing temperatures coupled with drought. The complex molecular response of plants to drought stress relies on the gene network controlling cell reactions to abiotic stress. In the natural environment, plants are subjected to a combination of abiotic and biotic stresses. The response of plants to biotic stress, to cope with pathogens, also involves the activation of a molecular network. Investigations on combinations of abiotic and biotic stresses indicate the existence of cross-talk between the two networks, and some overlap can be hypothesized. In this work we describe the isolation and characterization of a drought-related durum wheat (Triticum durum Desf.) gene, identified in a previous study, coding for a protein that combines features of an NBS-LRR type resistance protein with an S/TPK domain and is involved in the drought stress response. This is one of the few examples reported where all three domains are present in a single protein and, to our knowledge, it is the first report of a gene with this particular structure that is specifically induced by drought stress and drought-related conditions. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins.

PubMed

Dehal, Paramvir; Satou, Yutaka; Campbell, Robert K; Chapman, Jarrod; Degnan, Bernard; De Tomaso, Anthony; Davidson, Brad; Di Gregorio, Anna; Gelpke, Maarten; Goodstein, David M; Harafuji, Naoe; Hastings, Kenneth E M; Ho, Isaac; Hotta, Kohji; Huang, Wayne; Kawashima, Takeshi; Lemaire, Patrick; Martinez, Diego; Meinertzhagen, Ian A; Necula, Simona; Nonaka, Masaru; Putnam, Nik; Rash, Sam; Saiga, Hidetoshi; Satake, Masanobu; Terry, Astrid; Yamada, Lixy; Wang, Hong-Gang; Awazu, Satoko; Azumi, Kaoru; Boore, Jeffrey; Branno, Margherita; Chin-Bow, Stephen; DeSantis, Rosaria; Doyle, Sharon; Francino, Pilar; Keys, David N; Haga, Shinobu; Hayashi, Hiroko; Hino, Kyosuke; Imai, Kaoru S; Inaba, Kazuo; Kano, Shungo; Kobayashi, Kenji; Kobayashi, Mari; Lee, Byung-In; Makabe, Kazuhiro W; Manohar, Chitra; Matassi, Giorgio; Medina, Monica; Mochizuki, Yasuaki; Mount, Steve; Morishita, Tomomi; Miura, Sachiko; Nakayama, Akie; Nishizaka, Satoko; Nomoto, Hisayo; Ohta, Fumiko; Oishi, Kazuko; Rigoutsos, Isidore; Sano, Masako; Sasaki, Akane; Sasakura, Yasunori; Shoguchi, Eiichi; Shin-i, Tadasu; Spagnuolo, Antoinetta; Stainier, Didier; Suzuki, Miho M; Tassy, Olivier; Takatori, Naohito; Tokuoka, Miki; Yagi, Kasumi; Yoshizaki, Fumiko; Wada, Shuichi; Zhang, Cindy; Hyatt, P Douglas; Larimer, Frank; Detter, Chris; Doggett, Norman; Glavina, Tijana; Hawkins, Trevor; Richardson, Paul; Lucas, Susan; Kohara, Yuji; Levine, Michael; Satoh, Nori; Rokhsar, Daniel S

2002-12-13

The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains approximately 16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.
Behind the curtain of non-coding RNAs; long non-coding RNAs regulating hepatocarcinogenesis

PubMed Central

El Khodiry, Aya; Afify, Menna; El Tayebi, Hend M

2018-01-01

Hepatocellular carcinoma (HCC) is one of the most common and aggressive cancers worldwide. HCC is the fifth most common malignancy in the world and the second leading cause of cancer death in Asia. Long non-coding RNAs (lncRNAs) are RNAs with a length greater than 200 nucleotides that do not encode proteins. lncRNAs can regulate gene expression and protein synthesis in several ways by interacting with DNA, RNA and proteins in a sequence-specific manner. They could regulate cellular and developmental processes through either gene inhibition or gene activation. Many studies have shown that dysregulation of lncRNAs is related to many human diseases such as cardiovascular diseases, genetic disorders, neurological diseases, immune-mediated disorders and cancers. However, the study of lncRNAs is challenging because they are poorly conserved between species, their expression levels are lower than those of mRNAs, and they show great interpatient variation. The study of lncRNA expression in cancers has been a breakthrough, as it unveils potential biomarkers and drug targets for cancer therapy and helps explain the mechanism of pathogenesis. This review discusses many long non-coding RNAs and their contribution to HCC, their role in the development, metastasis, and prognosis of HCC, and how these lncRNAs might be regulated and targeted as a therapeutic tool in future HCC treatment. PMID:29434445
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production.

PubMed

Roth, Melissa S; Cokus, Shawn J; Gallaher, Sean D; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A; Merchant, Sabeeha S; Pellegrini, Matteo; Niyogi, Krishna K

2017-05-23

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

DOE PAGES

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; ...

2017-05-08

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. Here, to advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ~58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (~6%), and a high fraction of protein-coding sequence (~39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (~73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. Finally, the high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production.
Chromosome-level genome assembly and transcriptome of the green alga Chromochloris zofingiensis illuminates astaxanthin production

PubMed Central

Roth, Melissa S.; Cokus, Shawn J.; Gallaher, Sean D.; Walter, Andreas; Lopez, David; Erickson, Erika; Endelman, Benjamin; Westcott, Daniel; Larabell, Carolyn A.; Merchant, Sabeeha S.; Pellegrini, Matteo

2017-01-01

Microalgae have potential to help meet energy and food demands without exacerbating environmental problems. There is interest in the unicellular green alga Chromochloris zofingiensis, because it produces lipids for biofuels and a highly valuable carotenoid nutraceutical, astaxanthin. To advance understanding of its biology and facilitate commercial development, we present a C. zofingiensis chromosome-level nuclear genome, organelle genomes, and transcriptome from diverse growth conditions. The assembly, derived from a combination of short- and long-read sequencing in conjunction with optical mapping, revealed a compact genome of ∼58 Mbp distributed over 19 chromosomes containing 15,274 predicted protein-coding genes. The genome has uniform gene density over chromosomes, low repetitive sequence content (∼6%), and a high fraction of protein-coding sequence (∼39%) with relatively long coding exons and few coding introns. Functional annotation of gene models identified orthologous families for the majority (∼73%) of genes. Synteny analysis uncovered localized but scrambled blocks of genes in putative orthologous relationships with other green algae. Two genes encoding beta-ketolase (BKT), the key enzyme synthesizing astaxanthin, were found in the genome, and both were up-regulated by high light. Isolation and molecular analysis of astaxanthin-deficient mutants showed that BKT1 is required for the production of astaxanthin. Moreover, the transcriptome under high light exposure revealed candidate genes that could be involved in critical yet missing steps of astaxanthin biosynthesis, including ABC transporters, cytochrome P450 enzymes, and an acyltransferase. The high-quality genome and transcriptome provide insight into the green algal lineage and carotenoid production. PMID:28484037
Identification and Validation of Selected Universal Stress Protein Domain Containing Drought-Responsive Genes in Pigeonpea (Cajanus cajan L.)

PubMed Central

Sinha, Pallavi; Pazhamala, Lekha T.; Singh, Vikas K.; Saxena, Rachit K.; Krishnamurthy, L.; Azam, Sarwar; Khan, Aamir W.; Varshney, Rajeev K.

2016-01-01

Pigeonpea is a resilient crop, which is relatively more drought tolerant than many other legume crops. To understand the molecular mechanisms of this unique feature of pigeonpea, 51 genes coding for proteins with close similarity to the universal stress protein domain were selected using hidden Markov models (HMMs). Validation of these genes was conducted on three pigeonpea genotypes (ICPL 151, ICPL 8755, and ICPL 227) having different levels of drought tolerance. Gene expression analysis using qRT-PCR revealed 6, 8, and 18 genes to be ≥2-fold differentially expressed in ICPL 151, ICPL 8755, and ICPL 227, respectively. A total of 10 differentially expressed genes showed ≥2-fold up-regulation in the more drought tolerant genotype, which encoded four different classes of proteins. These include plant U-box protein (four genes), universal stress protein A-like protein (four genes), cation/H(+) antiporter protein (one gene) and an uncharacterized protein (one gene). Genes C.cajan_29830 and C.cajan_33874, belonging to uspA, were found to be significantly expressed in all three genotypes with ≥2-fold expression variations. Expression profiling of these two genes in four other legume crops revealed their specific role in pigeonpea. Therefore, these genes seem to be promising candidates for conferring drought tolerance specifically to pigeonpea. PMID:26779199
Comparative Genome Analysis of “Candidatus Phytoplasma australiense” (Subgroup tuf-Australia I; rp-A) and “Ca. Phytoplasma asteris” Strains OY-M and AY-WB

PubMed Central

Tran-Nguyen, L. T. T.; Kube, M.; Schneider, B.; Reinhardt, R.; Gibb, K. S.

2008-01-01

The chromosome sequence of “Candidatus Phytoplasma australiense” (subgroup tuf-Australia I; rp-A), associated with dieback in papaya, Australian grapevine yellows in grapevine, and several other important plant diseases, was determined. The circular chromosome is represented by 879,324 nucleotides, a GC content of 27%, and 839 protein-coding genes. Five hundred two of these protein-coding genes were functionally assigned, while 337 genes were hypothetical proteins with unknown function. Potential mobile units (PMUs) containing clusters of DNA repeats comprised 12.1% of the genome. These PMUs encoded genes involved in DNA replication, repair, and recombination; nucleotide transport and metabolism; translation; and ribosomal structure. Elements with similarities to phage integrases found in these mobile units were difficult to classify, as they were similar to both insertion sequences and bacteriophages. Comparative analysis of “Ca. Phytoplasma australiense” with “Ca. Phytoplasma asteris” strains OY-M and AY-WB showed that the gene order was more conserved between the closely related “Ca. Phytoplasma asteris” strains than to “Ca. Phytoplasma australiense.” Differences observed between “Ca. Phytoplasma australiense” and “Ca. Phytoplasma asteris” strains included the chromosome size (18,693 bp larger than OY-M), a larger number of genes with assigned function, and hypothetical proteins with unknown function. PMID:18359806
Plasmid-encoded hygromycin B resistance: the sequence of hygromycin B phosphotransferase gene and its expression in Escherichia coli and Saccharomyces cerevisiae.

PubMed

Gritz, L; Davies, J

1983-11-01

The plasmid-borne gene hph coding for hygromycin B phosphotransferase (HPH) in Escherichia coli has been identified and its nucleotide sequence determined. The hph gene is 1026 nucleotides long, coding for a protein with a predicted Mr of 39 000. The hph gene was placed in a shuttle plasmid vector, downstream from the promoter region of the cyc 1 gene of Saccharomyces cerevisiae, and an hph construction containing a single AUG in the 5' noncoding region allowed direct selection following transformation in yeast and in E. coli. Thus the hph gene can be used in cloning vectors for both pro- and eukaryotes.
Regulation of the spoVM gene of Bacillus subtilis.

PubMed

Le, Ai Thi Thuy; Schumann, Wolfgang

2008-11-01

The spoVM gene of Bacillus subtilis codes for a 26 amino-acid peptide that is essential for sporulation. Analysis of the expression of the spoVM gene revealed that wild-type cells started to synthesize a spoVM-specific transcript at t2, whereas the SpoVM peptide accumulated at t4. Both the transcript and the peptide were absent from an spoVM knockout strain. The 5' untranslated region of the spoVM transcript increased expression of SpoVM. Possible regulation mechanisms are discussed.
Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses

PubMed Central

Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael

2013-01-01

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343

Characterization of the orf1glnKamtB operon of Herbaspirillum seropedicae.

PubMed

Noindorf, Lilian; Rego, Fabiane G M; Baura, Valter A; Monteiro, Rose A; Wassem, Roseli; Cruz, Leonardo M; Rigo, Liu U; Souza, Emanuel M; Steffens, Maria B R; Pedrosa, Fabio O; Chubatsu, Leda S

2006-03-01

Herbaspirillum seropedicae is an endophytic nitrogen-fixing bacterium that colonizes economically important grasses. In this organism, the amtB gene is co-transcribed with two other genes: glnK that codes for a PII-like protein and orf1 that codes for a probable periplasmic protein of unknown function. The expression of the orf1glnKamtB operon is increased under nitrogen-limiting conditions and is dependent on NtrC. An amtB mutant failed to transport methylammonium. Post-translational control of nitrogenase was also partially impaired in this mutant, since a complete switch-off of nitrogenase after ammonium addition was not observed. This result suggests that the AmtB protein is involved in the signaling pathway for the reversible inactivation of nitrogenase in H. seropedicae.
The Membrane-Bound C Subunit of Reductive Dehalogenases: Topology Analysis and Reconstitution of the FMN-Binding Domain of PceC

PubMed Central

Buttet, Géraldine F.; Willemin, Mathilde S.; Hamelin, Romain; Rupakula, Aamani; Maillard, Julien

2018-01-01

Organohalide respiration (OHR) is the energy metabolism of anaerobic bacteria able to use halogenated organic compounds as terminal electron acceptors. While the terminal enzymes in OHR, so-called reductive dehalogenases, are well-characterized, the identity of proteins potentially involved in electron transfer to the terminal enzymes remains elusive. Among the accessory genes identified in OHR gene clusters, the C subunit (rdhC) could well code for the missing redox protein between the quinol pool and the reductive dehalogenase, although it was initially proposed to act as transcriptional regulator. RdhC sequences are characterized by the presence of multiple transmembrane segments, a flavin mononucleotide (FMN) binding motif and two conserved CX3CP motifs. Based on these features, we propose a curated selection of RdhC proteins identified in general sequence databases. Besides the Firmicutes from which RdhC sequences were initially identified, the identified sequences belong to three additional phyla, the Chloroflexi, the Proteobacteria, and the Bacteroidetes. The diversity of RdhC sequences mostly respects the phylogenetic distribution, suggesting that rdhC genes emerged relatively early in the evolution of the OHR metabolism. PceC, the C subunit of the tetrachloroethene (PCE) reductive dehalogenase, is encoded by the conserved pceABCT gene cluster identified in Dehalobacter restrictus PER-K23 and in several strains of Desulfitobacterium hafniense. Surfaceome analysis of D. restrictus cells confirmed the predicted topology of the FMN-binding domain (FBD) of PceC, which is located at the exocytoplasmic face of the membrane. Starting from inclusion bodies of a recombinant FBD protein, strategies for successful assembly of the FMN cofactor and refolding were achieved with the use of the flavin-trafficking protein from D. hafniense TCE1. Mass spectrometry analysis and site-directed mutagenesis of rFBD revealed that threonine-168 of PceC binds FMN covalently. Our results suggest that PceC, and more generally RdhC proteins, may play a role in electron transfer in the metabolism of OHR. PMID:29740408
Structure of the human gene encoding the protein repair L-isoaspartyl (D-aspartyl) O-methyltransferase.

PubMed

DeVry, C G; Tsai, W; Clarke, S

1996-11-15

The protein L-isoaspartyl/D-aspartyl O-methyltransferase (EC 2.1.1.77) catalyzes the first step in the repair of proteins damaged in the aging process by isomerization or racemization reactions at aspartyl and asparaginyl residues. A single gene has been localized to human chromosome 6 and multiple transcripts arising through alternative splicing have been identified. Restriction enzyme mapping, subcloning, and DNA sequence analysis of three overlapping clones from a human genomic library in bacteriophage P1 indicate that the gene spans approximately 60 kb and is composed of 8 exons interrupted by 7 introns. Analysis of intron/exon splice junctions reveals that all of the donor and acceptor splice sites are in agreement with the mammalian consensus splicing sequence. Determination of transcription initiation sites by primer extension analysis of poly(A)+ mRNA from human brain identifies multiple start sites, with a major site 159 nucleotides upstream from the ATG start codon. Sequence analysis of the 5'-untranslated region demonstrates several potential cis-acting DNA elements including SP1, ETF, AP1, AP2, ARE, XRE, CREB, MED-1, and half-palindromic ERE motifs. The promoter of this methyltransferase gene lacks an identifiable TATA box but is characterized by a CpG island which begins approximately 723 nucleotides upstream of the major transcriptional start site and extends through exon 1 and into the first intron. These features are characteristic of housekeeping genes and are consistent with the wide tissue distribution observed for this methyltransferase activity.
Complete nucleotide sequence of pig (Sus scrofa) mitochondrial genome and dating evolutionary divergence within Artiodactyla.

PubMed

Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C

1999-08-05

The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16,613 bp, is presented in this report. The genome is not a specific length because of the presence of variable numbers of the tandem repeat 5'-CGTGCGTACA in the displacement loop (D-loop). Genes responsible for 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides, with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. For evaluating the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than between either of these species and the fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.

PubMed

Mehmood, Tahir; Bohlin, Jon; Snipen, Lars

2015-01-01

The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and start site initiation in genomic DNA. Motivated by a recently conducted study, where a multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in the analysis, where conserved coding genes were found by using the pan-genomics concept for each considered prokaryotic species. PLS uses a position-specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS-based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), which is a much-used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and the suggested approach identifies the prokaryotic upstream region significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
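To make the notion of a position-specific scoring matrix concrete, the following minimal sketch builds a log-odds PSSM from a handful of aligned sequences and scores candidate sequences against it. It is not the authors' PLS procedure: the function names (build_pssm, score), the pseudocount, the uniform background, and the toy hexamers are all assumptions chosen for illustration.

```python
import math

def build_pssm(seqs, alphabet="ACGT", pseudocount=1.0):
    """Build a log2-odds PSSM from equal-length aligned sequences.

    Assumes a uniform background frequency (hypothetical choice).
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(alphabet)
    total = len(seqs) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        column = [s[pos] for s in seqs]
        # Smoothed frequency of each base at this position, as log-odds
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pssm

def score(pssm, seq):
    """Sum the per-position log-odds scores for a sequence."""
    return sum(col[base] for col, base in zip(pssm, seq))

# Toy aligned upstream hexamers (illustrative data, not from the study)
toy_seqs = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
toy_pssm = build_pssm(toy_seqs)

if __name__ == "__main__":
    for candidate in ("TATAAT", "GCGCGC"):
        print(candidate, round(score(toy_pssm, candidate), 3))
```

Positions where the aligned sequences agree receive positive scores for the shared base and negative scores for the others, so a sequence resembling the alignment scores higher than an unrelated one. A per-position matrix of this kind is one way sequence data can be turned into numeric features for a classifier.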
Natural variation of rice blast resistance gene Pi-d2

USDA-ARS's Scientific Manuscript database

Studying natural variation of rice resistance (R) genes in cultivated and wild rice relatives can predict resistance stability to rice blast fungus. In the present study, the protein coding regions of rice R gene Pi-d2 in 35 rice accessions of subgroups, aus (AUS), indica (IND), temperate japonica (...
Kinetic models of gene expression including non-coding RNAs

NASA Astrophysics Data System (ADS)

Zhdanov, Vladimir P.

2011-03-01

In cells, genes are transcribed into mRNAs, and the latter are translated into proteins. Due to the feedbacks between these processes, the kinetics of gene expression may be complex even in the simplest genetic networks. The corresponding models have already been reviewed in the literature. A new avenue in this field is related to the recognition that the conventional scenario of gene expression is fully applicable only to prokaryotes whose genomes consist of tightly packed protein-coding sequences. In eukaryotic cells, in contrast, such sequences are relatively rare, and the rest of the genome includes numerous transcript units representing non-coding RNAs (ncRNAs). During the past decade, it has become clear that such RNAs play a crucial role in gene expression and accordingly influence a multitude of cellular processes both in the normal state and during diseases. The numerous biological functions of ncRNAs are based primarily on their abilities to silence genes via pairing with a target mRNA and subsequently preventing its translation or facilitating degradation of the mRNA-ncRNA complex. Many other abilities of ncRNAs have been discovered as well. Our review is focused on the available kinetic models describing the mRNA, ncRNA and protein interplay. In particular, we systematically present the simplest models without kinetic feedbacks, models containing feedbacks and predicting bistability and oscillations in simple genetic networks, and models describing the effect of ncRNAs on complex genetic networks. Mathematically, the presentation is based primarily on temporal mean-field kinetic equations. The stochastic and spatio-temporal effects are also briefly discussed.
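As an illustration of the kind of temporal mean-field kinetic equations this review refers to, the following minimal sketch integrates a generic three-species scheme in which an mRNA and an ncRNA are synthesized at constant rates, degrade, and are removed together on pairing, while protein is translated from free mRNA. The specific equations, the rate-constant names (k_r, k_n, k_p, g_r, g_n, g_p, k_a), their values, and the forward-Euler integrator are all assumptions chosen for illustration and are not taken from the review.

```python
def simulate(k_r=1.0, k_n=0.5, k_p=2.0,
             g_r=0.1, g_n=0.1, g_p=0.05,
             k_a=1.0, dt=0.01, steps=100_000):
    """Integrate an assumed mean-field model with forward Euler.

    dR/dt = k_r - g_r*R - k_a*R*N   (mRNA)
    dN/dt = k_n - g_n*N - k_a*R*N   (ncRNA)
    dP/dt = k_p*R - g_p*P           (protein)

    Returns the (R, N, P) levels after steps*dt time units.
    """
    r = n = p = 0.0
    for _ in range(steps):
        pairing = k_a * r * n          # loss of both RNAs through pairing
        dr = k_r - g_r * r - pairing
        dn = k_n - g_n * n - pairing
        dp = k_p * r - g_p * p
        r += dt * dr
        n += dt * dn
        p += dt * dp
    return r, n, p

if __name__ == "__main__":
    print("without ncRNA:", simulate(k_n=0.0))
    print("with ncRNA:   ", simulate(k_n=0.5))
```

With ncRNA synthesis switched off (k_n = 0) the model relaxes to R = k_r/g_r and P = k_p R/g_p. Switching it on lowers the steady-state mRNA and protein levels, which is the silencing effect described above in its simplest form, without feedbacks.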
Mu-Like Prophage in Serogroup B Neisseria meningitidis Coding for Surface-Exposed Antigens

PubMed Central

Masignani, Vega; Giuliani, Marzia Monica; Tettelin, Hervé; Comanducci, Maurizio; Rappuoli, Rino; Scarlato, Vincenzo

2001-01-01

Sequence analysis of the genome of Neisseria meningitidis serogroup B revealed the presence of an ∼35-kb region inserted within a putative gene coding for an ABC-type transporter. The region contains 46 open reading frames, 29 of which are colinear and homologous to the genes of Escherichia coli Mu phage. Two prophages with similar organizations were also found in serogroup A meningococcus, and one was found in Haemophilus influenzae. Early and late phage functions are well preserved in this family of Mu-like prophages. Several regions of atypical nucleotide content were identified. These likely represent genes acquired by horizontal transfer. Three of the acquired genes are shown to code for surface-associated antigens, and the encoded proteins are able to induce bactericidal antibodies. PMID:11254622
The Complete Mitochondrial Genome of the Land Snail Cornu aspersum (Helicidae: Mollusca): Intra-Specific Divergence of Protein-Coding Genes and Phylogenetic Considerations within Euthyneura

PubMed Central

Gaitán-Espitia, Juan Diego; Nespolo, Roberto F.; Opazo, Juan C.

2013-01-01

The complete sequences of three mitochondrial genomes from the land snail Cornu aspersum were determined. The mitogenome has a length of 14,050 bp, and it encodes 13 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes. It also includes nine small intergenic spacers and a large AT-rich intergenic spacer. The intra-specific divergence analysis revealed that COX1 has the lowest genetic differentiation, while the most divergent genes were NADH1, NADH3 and NADH4. With the exception of Euhadra herklotsi, the structural comparisons showed the same gene order within the family Helicidae, and a gene organization nearly identical to that found in the order Pulmonata. Phylogenetic reconstruction recovered Basommatophora as a polyphyletic group, and Eupulmonata and Pulmonata as paraphyletic groups. Bayesian and Maximum Likelihood analyses showed that C. aspersum is a close relative of Cepaea nemoralis and that, together with the other Helicidae species, it forms a sister group of Albinaria caerulea, supporting the monophyly of the Stylommatophora clade. PMID:23826260
High quality draft genome sequence of Olivibacter sitiensis type strain (AW-6T), a diphenol degrader with genes involved in the catechol pathway

PubMed Central

Ntougias, Spyridon; Lapidus, Alla; Han, James; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Klenk, Hans-Peter; Woyke, Tanja; Fasseas, Constantinos; Kyrpides, Nikos C.; Zervakis, Georgios I.

2014-01-01

Olivibacter sitiensis Ntougias et al. 2007 is a member of the family Sphingobacteriaceae, phylum Bacteroidetes. Members of the genus Olivibacter are phylogenetically diverse and of significant interest. They occur in diverse habitats, such as rhizosphere and contaminated soils, viscous wastes, composts, biofilter clean-up facilities on contaminated sites and cave environments, and they are involved in the degradation of complex and toxic compounds. Here we describe the features of O. sitiensis AW-6T, together with the permanent-draft genome sequence and annotation. The organism was sequenced under the Genomic Encyclopedia for Bacteria and Archaea (GEBA) project at the DOE Joint Genome Institute and is the first genome sequence of a species within the genus Olivibacter. The genome is 5,053,571 bp long and is comprised of 110 scaffolds with an average GC content of 44.61%. Of the 4,565 genes predicted, 4,501 were protein-coding genes and 64 were RNA genes. Most protein-coding genes (68.52%) were assigned to a putative function. The identification of 2-keto-4-pentenoate hydratase/2-oxohepta-3-ene-1,7-dioic acid hydratase-coding genes indicates involvement of this organism in the catechol catabolic pathway. In addition, genes encoding for β-1,4-xylanases and β-1,4-xylosidases reveal the xylanolytic action of O. sitiensis. PMID:25197463
The mitochondrial genome of the phytopathogenic basidiomycete Moniliophthora perniciosa is 109 kb in size and contains a stable integrated plasmid.

PubMed

Formighieri, Eduardo F; Tiburcio, Ricardo A; Armas, Eduardo D; Medrano, Francisco J; Shimo, Hugo; Carels, Nicolas; Góes-Neto, Aristóteles; Cotomacci, Carolina; Carazzolle, Marcelo F; Sardinha-Pinto, Naiara; Thomazella, Daniela P T; Rincones, Johana; Digiampietri, Luciano; Carraro, Dirce M; Azeredo-Espin, Ana M; Reis, Sérgio F; Deckmann, Ana C; Gramacho, Karina; Gonçalves, Marilda S; Moura Neto, José P; Barbosa, Luciana V; Meinhardt, Lyndel W; Cascardo, Júlio C M; Pereira, Gonçalo A G

2008-10-01

We present here the sequence of the mitochondrial genome of the basidiomycete phytopathogenic hemibiotrophic fungus Moniliophthora perniciosa, causal agent of the Witches' Broom Disease in Theobroma cacao. The DNA is a circular molecule of 109,103 base pairs, with 31.9% GC, and is the largest sequenced so far. This size is due essentially to the presence of numerous non-conserved hypothetical ORFs. It contains the 14 genes coding for proteins involved in the oxidative phosphorylation, the two rRNA genes, one ORF coding for a ribosomal protein (rps3), and a set of 26 tRNA genes that recognize codons for all amino acids. Seven homing endonucleases are located inside introns. Except atp8, all conserved known genes are in the same orientation. Phylogenetic analysis based on the cox genes agrees with the commonly accepted fungal taxonomy. An uncommon feature of this mitochondrial genome is the presence of a region that contains a set of four, relatively small, nested, inverted repeats enclosing two genes coding for polymerases with an invertron-type structure and three conserved hypothetical genes interpreted as the stable integration of a mitochondrial linear plasmid. The integration of this plasmid seems to be a recent evolutionary event that could have implications in fungal biology. This sequence is available under GenBank accession number AY376688.
A global view of the nonprotein-coding transcriptome in Plasmodium falciparum

PubMed Central

Raabe, Carsten A.; Sanchez, Cecilia P.; Randau, Gerrit; Robeck, Thomas; Skryabin, Boris V.; Chinni, Suresh V.; Kube, Michael; Reinhardt, Richard; Ng, Guey Hooi; Manickam, Ravichandran; Kuryshev, Vladimir Y.; Lanzer, Michael; Brosius, Juergen; Tang, Thean Hock; Rozhdestvensky, Timofey S.

2010-01-01

Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense–antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors. PMID:19864253
Complete sequence and gene organization of the mitochondrial genome of Asio flammeus (Strigiformes, Strigidae).

PubMed

Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei

2016-07-01

The complete sequence of the mitochondrial genome was determined for Asio flammeus, a geographically widespread species. The length of the complete mitochondrial genome was 18,966 bp, containing 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were distributed on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny placed A. flammeus as the sister group to A. capensis and supported Asio as a monophyletic group.
Comparative genomic analysis reveals a novel mitochondrial isoform of human rTS protein and unusual phylogenetic distribution of the rTS gene

PubMed Central

Liang, Ping; Nair, Jayakumar R; Song, Lei; McGuire, John J; Dolnick, Bruce J

2005-01-01

Background: The rTS gene (ENOSF1), first identified in Homo sapiens as a gene complementary to the thymidylate synthase (TYMS) mRNA, is known to encode two protein isoforms, rTSα and rTSβ. The rTSβ isoform appears to be an enzyme responsible for the synthesis of signaling molecules involved in the down-regulation of thymidylate synthase, but the exact cellular functions of rTS genes are largely unknown. Results: Through comparative genomic sequence analysis, we predicted the existence of a novel protein isoform, rTSγ, which has a 27-residue longer N-terminus by virtue of utilizing an alternative start codon located upstream of the start codon in rTSβ. We observed that a similar extended N-terminus could be predicted in all rTS genes for which genomic sequences are available and that the extended regions are conserved from bacteria to human. Therefore, we reasoned that the protein with the extended N-terminus might represent an ancestral form of the rTS protein. Sequence analysis strongly predicts a mitochondrial signal sequence in the extended N-terminus of human rTSγ, which is absent in rTSβ. We confirmed the existence of rTS in human mitochondria experimentally by demonstrating the presence of both rTSγ and rTSβ proteins in mitochondria isolated by subcellular fractionation. In addition, our comprehensive analysis of rTS orthologous sequences reveals an unusual phylogenetic distribution of this gene, which suggests the occurrence of one or more horizontal gene transfer events. Conclusion: The presence of two rTS isoforms in mitochondria suggests that the rTS signaling pathway may be active within mitochondria. Our report also presents an example of identifying novel protein isoforms and of improving gene annotation through comparative genomic analysis. PMID:16162288
Efficiency of VIGS and gene expression in a novel bipartite potexvirus vector delivery system as a function of strength of TGB1 silencing suppression.

PubMed

Lim, Hyoun-Sub; Vaira, Anna Maria; Domier, Leslie L; Lee, Sung Chul; Kim, Hong Gi; Hammond, John

2010-06-20

We have developed plant virus-based vectors for virus-induced gene silencing (VIGS) and protein expression, based on Alternanthera mosaic virus (AltMV), for infection of a wide range of host plants including Nicotiana benthamiana and Arabidopsis thaliana by either mechanical inoculation of in vitro transcripts or via agroinfiltration. In vivo transcripts produced by co-agroinfiltration of bacteriophage T7 RNA polymerase resulted in T7-driven AltMV infection from a binary vector in the absence of the Cauliflower mosaic virus 35S promoter. An artificial bipartite viral vector delivery system was created by separating the AltMV RNA-dependent RNA polymerase and Triple Gene Block (TGB)123-Coat protein (CP) coding regions into two constructs each bearing the AltMV 5' and 3' non-coding regions, which recombined in planta to generate a full-length AltMV genome. Substitution of TGB1 L(88)P, and equivalent changes in other potexvirus TGB1 proteins, affected RNA silencing suppression efficacy and suitability of the vectors from protein expression to VIGS. Published by Elsevier Inc.
A resource of vectors and ES cells for targeted deletion of microRNAs in mice

PubMed Central

Prosser, Haydn M.; Koike-Yusa, Hiroko; Cooper, James D.; Law, Frances C.; Bradley, Allan

2011-01-01

The 21-23 nucleotide single-stranded RNAs classified as microRNAs (miRNA) perform fundamental roles in a wide range of cellular and developmental processes. miRNAs regulate protein expression through sequence-specific base pairing with target messenger RNAs (mRNA), reducing both their stability and the process of protein translation [1, 2]. At least 30% of protein coding genes appear to be conserved targets for miRNAs [1]. In contrast to the protein coding genes [3, 4], no public resource of miRNA mouse mutant alleles exists. We have generated a library of highly germ-line transmissible C57BL/6N mouse mutant embryonic stem (ES) cells with targeted deletions for the majority of miRNA genes currently annotated within the miRBase registry [5]. These alleles have been designed to be highly adaptable research tools that can be efficiently altered to create reporter, conditional and other allelic variants. This ES cell resource can be searched electronically and is available from ES cell repositories for distribution to the scientific community [6]. PMID:21822254
Histone-derived piRNA biogenesis depends on the ping-pong partners Piwi5 and Ago3 in Aedes aegypti

PubMed Central

Girardi, Erika; Miesen, Pascal; Pennings, Bas; Frangeul, Lionel; Saleh, Maria-Carla

2017-01-01

The piRNA pathway is of key importance in controlling transposable elements in most animal species. In the vector mosquito Aedes aegypti, the presence of eight PIWI proteins and the accumulation of viral piRNAs upon arbovirus infection suggest additional functions of the piRNA pathway beyond genome defense. To better understand the regulatory potential of this pathway, we analyzed in detail host-derived piRNAs in A. aegypti Aag2 cells. We show that a large repertoire of protein-coding genes and non-retroviral integrated RNA virus elements are processed into genic piRNAs by different combinations of PIWI proteins. Among these, we identify a class of genes that produces piRNAs from coding sequences in an Ago3- and Piwi5-dependent fashion. We demonstrate that the replication-dependent histone gene family is a genic source of ping-pong dependent piRNAs and that histone-derived piRNAs are dynamically expressed throughout the cell cycle, suggesting a role for the piRNA pathway in the regulation of histone gene expression. Moreover, our results establish the Aag2 cell line as an accessible experimental model to study gene-derived piRNAs. PMID:28115625
Use of fluorescent proteins and color-coded imaging to visualize cancer cells with different genetic properties.

PubMed

Hoffman, Robert M

2016-03-01

Fluorescent proteins are very bright and available in spectrally distinct colors, enabling the imaging of color-coded cancer cells growing in vivo and therefore the distinction of cancer cells with different genetic properties. Non-invasive and intravital imaging of cancer cells with fluorescent proteins allows the visualization of distinct genetic variants of cancer cells down to the cellular level in vivo. Cancer cells with increased or decreased ability to metastasize can be distinguished in vivo. Gene exchange in vivo, which enables low-metastatic cancer cells to convert to high-metastatic cells, can be visualized by color-coded imaging. Cancer stem-like and non-stem cells can be distinguished in vivo by color-coded imaging. These properties also demonstrate the vast superiority of imaging cancer cells in vivo with fluorescent proteins over photon counting of luciferase-labeled cancer cells.
The DNA Methylome of Human Peripheral Blood Mononuclear Cells

PubMed Central

Ye, Mingzhi; Zheng, Hancheng; Yu, Jian; Wu, Honglong; Sun, Jihua; Zhang, Hongyu; Chen, Quan; Luo, Ruibang; Chen, Minfeng; He, Yinghua; Jin, Xin; Zhang, Qinghui; Yu, Chang; Zhou, Guangyu; Sun, Jinfeng; Huang, Yebo; Zheng, Huisong; Cao, Hongzhi; Zhou, Xiaoyu; Guo, Shicheng; Hu, Xueda; Li, Xin; Kristiansen, Karsten; Bolund, Lars; Xu, Jiujin; Wang, Wen; Yang, Huanming; Wang, Jian; Li, Ruiqiang; Beck, Stephan; Wang, Jun; Zhang, Xiuqing

2010-01-01

DNA methylation plays an important role in biological processes in human health and disease. Recent technological advances allow unbiased whole-genome DNA methylation (methylome) analysis to be carried out on human cells. Using whole-genome bisulfite sequencing at 24.7-fold coverage (12.3-fold per strand), we report a comprehensive (92.62%) methylome and analysis of the unique sequences in human peripheral blood mononuclear cells (PBMC) from the same Asian individual whose genome was deciphered in the YH project. PBMC constitute an important source for clinical blood tests world-wide. We found that 68.4% of CpG sites and <0.2% of non-CpG sites were methylated, demonstrating that non-CpG cytosine methylation is minor in human PBMC. Analysis of the PBMC methylome revealed a rich epigenomic landscape for 20 distinct genomic features, including regulatory, protein-coding, non-coding, RNA-coding, and repeat sequences. Integration of our methylome data with the YH genome sequence enabled a first comprehensive assessment of allele-specific methylation (ASM) between the two haploid methylomes of any individual and allowed the identification of 599 haploid differentially methylated regions (hDMRs) covering 287 genes. Of these, 76 genes had hDMRs within 2 kb of their transcriptional start sites of which >80% displayed allele-specific expression (ASE). These data demonstrate that ASM is a recurrent phenomenon and is highly correlated with ASE in human PBMCs. Together with recently reported similar studies, our study provides a comprehensive resource for future epigenomic research and confirms new sequencing technology as a paradigm for large-scale epigenomics studies. PMID:21085693

Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

PubMed Central

Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

2008-01-01

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Complete mitochondrial genome of the brown alga Sargassum fusiforme (Sargassaceae, Phaeophyceae): genome architecture and taxonomic consideration.

PubMed

Liu, Feng; Pang, Shaojun; Luo, Minbo

2016-01-01

Sargassum fusiforme (Harvey) Setchell (=Hizikia fusiformis (Harvey) Okamura) is one of the most important economic seaweeds for mariculture in China. In this study, we present the complete mitochondrial genome of S. fusiforme. The genome is 34,696 bp in length with circular organization, encoding the standard set of three ribosomal RNA genes (rRNA), 25 transfer RNA genes (tRNA), 35 protein-coding genes, and two conserved open reading frames (ORFs). Its total AT content is 62.47%, lower than that of other brown algae except Pylaiella littoralis. The mitogenome carries 1571 bp of intergenic region, constituting 4.53% of the genome, and 13 pairs of overlapping genes with overlap sizes from 1 to 90 bp. The phylogenetic analyses based on 35 protein-coding genes reveal that S. fusiforme has a closer evolutionary relationship with Sargassum muticum than with Sargassum horneri, indicating that Hizikia is not a distinct evolutionary entity and should be reduced to synonymy with Sargassum.
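The intergenic fraction quoted in this abstract follows directly from the two lengths it reports. A minimal Python sketch of that arithmetic, in which the variable names are illustrative and not taken from the paper:

```python
# Both figures come from the abstract; the variable names are illustrative.
genome_length_bp = 34_696
intergenic_bp = 1_571

# Share of the mitogenome occupied by intergenic sequence, in percent.
intergenic_percent = 100 * intergenic_bp / genome_length_bp
print(f"{intergenic_percent:.2f}%")  # prints 4.53%
```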
The PE/PPE multigene family codes for virulence factors and is a possible source of mycobacterial antigenic variation: perhaps more?

PubMed

Akhter, Yusuf; Ehebauer, Matthias T; Mukhopadhyay, Sangita; Hasnain, Seyed E

2012-01-01

The PE/PPE multigene family codes for approximately 10% of the Mycobacterium tuberculosis proteome and is encoded by 176 open reading frames. These proteins possess, and have been named after, the conserved proline-glutamate (PE) or proline-proline-glutamate (PPE) motifs at their N-terminus. Their genes have a conserved structure and repeat motifs that could be a potential source of antigenic variation in M. tuberculosis. PE/PPE genes are scattered throughout the genome and PE/PPE pairs are usually encoded in bicistronic operons although this is not universally so. This gene family has evolved by specific gene duplication events. PE/PPE proteins are either secreted or localized to the cell surface. Several are thought to be virulence factors, which participate in evasion of the host immune response. This review summarizes the current knowledge about the gene family in order to better understand its biological function. Copyright © 2011 Elsevier Masson SAS. All rights reserved.
Sequencing and comparative genomic analysis of 1227 Felis catus cDNA sequences enriched for developmental, clinical and nutritional phenotypes

PubMed Central

2012-01-01

Background: The feline genome is valuable to the veterinary and model organism genomics communities because the cat is an obligate carnivore and a model for endangered felids. The initial public release of the Felis catus genome assembly provided a framework for investigating the genomic basis of feline biology. However, the entire set of protein coding genes has not been elucidated. Results: We identified and characterized 1227 protein coding feline sequences, of which 913 map to public sequences and 314 are novel. These sequences have been deposited into NCBI's GenBank database and complement public genomic resources by providing additional protein coding sequences that fill in some of the gaps in the feline genome assembly. Through functional and comparative genomic analyses, we gained an understanding of the role of these sequences in feline development, nutrition and health. Specifically, we identified 104 orthologs of human genes associated with Mendelian disorders. We detected negative selection within sequences with gene ontology annotations associated with intracellular trafficking, cytoskeleton and muscle functions. We detected relatively less negative selection on protein sequences encoding extracellular networks, apoptotic pathways and mitochondrial gene ontology annotations. Additionally, we characterized feline cDNA sequences that have mouse orthologs associated with clinical, nutritional and developmental phenotypes. Together, this analysis provides an overview of the value of our cDNA sequences and enhances our understanding of how the feline genome is similar to, and different from, other mammalian genomes. Conclusions: The cDNA sequences reported here expand existing feline genomic resources by providing high-quality sequences annotated with comparative genomic information providing functional, clinical, nutritional and orthologous gene information. PMID:22257742
A second gene for acyl-(acyl-carrier-protein): glycerol-3-phosphate acyltransferase in squash, Cucurbita moschata cv. Shirogikuza(*), codes for an oleate-selective isozyme: molecular cloning and protein purification studies.

PubMed

Nishida, I; Sugiura, M; Enju, A; Nakamura, M

2000-12-01

A new isogene for acyl-(acyl-carrier-protein):glycerol-3-phosphate acyltransferase (GPAT; EC 2.3.1.15) in squash has been cloned and the gene product was identified as oleate-selective GPAT. Using PCR primers that could hybridise with exons for a previously cloned squash GPAT, we obtained two PCR products of different size: one coded for a previously cloned squash GPAT corresponding to non-selective isoforms AT2 and AT3, and the other for a new isozyme, probably the oleate-selective isoform AT1. Full-length amino acid sequences of respective isozymes were deduced from the nucleotide sequences of genomic genes and cDNAs, which were cloned by a series of PCR-based methods. Thus, we designated the new gene CmATS1;1 and the other one CmATS1;2. Genome blot analysis revealed that the squash genome contained the two isogenes at non-allelic loci. AT1-active fractions were partially purified, and three polypeptide bands were identified as being AT1 polypeptides, which exhibited relative molecular masses of 39.5-40.5 kDa, pI values of 6.75-7.15, and oleate selectivity over palmitate. Partial amino-terminal sequences obtained from two of these bands verified that the new isogene codes for AT1 polypeptides.
Maternal transcription of non-protein coding RNAs from the PWS-critical region rescues growth retardation in mice.

PubMed

Rozhdestvensky, Timofey S; Robeck, Thomas; Galiveti, Chenna R; Raabe, Carsten A; Seeger, Birte; Wolters, Anna; Gubar, Leonid V; Brosius, Jürgen; Skryabin, Boris V

2016-02-05

Prader-Willi syndrome (PWS) is a neurogenetic disorder caused by loss of paternally expressed genes on chromosome 15q11-q13. The PWS-critical region (PWScr) contains an array of non-protein coding IPW-A exons hosting intronic SNORD116 snoRNA genes. Deletion of PWScr is associated with PWS in humans and growth retardation in mice exhibiting ~15% postnatal lethality in C57BL/6 background. Here we analysed a knock-in mouse containing a 5'HPRT-LoxP-Neo(R) cassette (5'LoxP) inserted upstream of the PWScr. When the insertion was inherited maternally in a paternal PWScr-deletion mouse model (PWScr(p-/m5'LoxP)), we observed compensation of growth retardation and postnatal lethality. Genomic methylation pattern and expression of protein-coding genes remained unaltered at the PWS-locus of PWScr(p-/m5'LoxP) mice. Interestingly, ubiquitous Snord116 and IPW-A exon transcription from the originally silent maternal chromosome was detected. In situ hybridization indicated that PWScr(p-/m5'LoxP) mice expressed Snord116 in brain areas similar to wild type animals. Our results suggest that the lack of PWScr RNA expression in certain brain areas could be a primary cause of the growth retardation phenotype in mice. We propose that activation of disease-associated genes on imprinted regions could lead to general therapeutic strategies in associated diseases.
Diversification and Expression of the PIN, AUX/LAX, and ABCB Families of Putative Auxin Transporters in Populus

PubMed Central

Carraro, Nicola; Tisdale-Orr, Tracy Eizabeth; Clouse, Ronald Matthew; Knöller, Anne Sophie; Spicer, Rachel

2012-01-01

Intercellular transport of the plant hormone auxin is mediated by three families of membrane-bound protein carriers, with the PIN and ABCB families coding primarily for efflux proteins and the AUX/LAX family coding for influx proteins. In the last decade our understanding of gene and protein function for these transporters in Arabidopsis has expanded rapidly but very little is known about their role in woody plant development. Here we present a comprehensive account of all three families in the model woody species Populus, including chromosome distribution, protein structure, quantitative gene expression, and evolutionary relationships. The PIN and AUX/LAX gene families in Populus comprise 16 and 8 members respectively and show evidence for the retention of paralogs following a relatively recent whole genome duplication. There is also differential expression across tissues within many gene pairs. The ABCB family is previously undescribed in Populus and includes 20 members, showing a much deeper evolutionary history, including both tandem and whole genome duplication as well as probable gene loss. A striking number of these transporters are expressed in developing Populus stems and we suggest that evolutionary and structural relationships with known auxin transporters in Arabidopsis can point toward candidate genes for further study in Populus. This is especially important for the ABCBs, which is a large family and includes members in Arabidopsis that are able to transport other substrates in addition to auxin. Protein modeling, sequence alignment and expression data all point to ABCB1.1 as a likely auxin transport protein in Populus. Given that basipetal auxin flow through the cambial zone shapes the development of woody stems, it is important that we identify the full complement of genes involved in this process. This work should lay the foundation for studies targeting specific proteins for functional characterization and in situ localization. PMID:22645571
Chamber Specific Gene Expression Landscape of the Zebrafish Heart

PubMed Central

Singh, Angom Ramcharan; Sivadas, Ambily; Sabharwal, Ankit; Vellarikal, Shamsudheen Karuthedath; Jayarajan, Rijith; Verma, Ankit; Kapoor, Shruti; Joshi, Adita; Scaria, Vinod; Sivasubbu, Sridhar

2016-01-01

The organization of structure and function of cardiac chambers in vertebrates is defined by chamber-specific distinct gene expression. This peculiarity and uniqueness of the genetic signatures demonstrates functional resolution attributed to the different chambers of the heart. Altered expression of the cardiac chamber genes can lead to individual chamber related dysfunctions and disease patho-physiologies. Information on the transcriptional repertoire of cardiac compartments is important to understand the spectrum of chamber specific anomalies. We have carried out a genome wide transcriptome profiling study of the three cardiac chambers in the zebrafish heart using RNA sequencing. We have captured the gene expression patterns of 13,396 protein coding genes in the three cardiac chambers: atrium, ventricle and bulbus arteriosus. Of these, 7,260 known protein coding genes are highly expressed (≥10 FPKM) in the zebrafish heart. Thus, this study provides nearly all-inclusive information on the zebrafish cardiac transcriptome. In this study, a total of 96 differentially expressed genes across the three cardiac chambers in zebrafish were identified. The atrium, ventricle and bulbus arteriosus displayed 20, 32 and 44 uniquely expressed genes respectively. We validated the expression of predicted chamber-restricted genes using independent semi-quantitative and qualitative experimental techniques. In addition, we identified 23 putative novel protein coding genes that are restricted to the ventricle and not expressed in the atrium or bulbus arteriosus. To our knowledge, these 23 novel genes have either not been investigated in detail or are sparsely studied. The transcriptome identified in this study includes 68 differentially expressed zebrafish cardiac chamber genes that have a human ortholog.
We also carried out spatiotemporal gene expression profiling of the 96 differentially expressed genes throughout the three cardiac chambers in 11 developmental stages and 6 tissue types of zebrafish. We hypothesize that clustering the differentially expressed genes with both known and unknown functions will deliver detailed insights on fundamental gene networks that are important for the development and specification of the cardiac chambers. It is also postulated that this transcriptome atlas will make zebrafish a more useful model for studying cardiac development and for exploring the functional role of gene networks in cardiac disease pathogenesis. PMID:26815362
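The expression threshold in this abstract is stated in FPKM (fragments per kilobase of transcript per million mapped fragments). The sketch below shows the standard FPKM formula and the ≥10 cut-off in Python; the function name and the input counts are hypothetical, and the abstract does not say which software the authors used to compute their values.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical counts for a single gene, for illustration only.
value = fpkm(fragments=500, gene_length_bp=2_000, total_mapped_fragments=20_000_000)
highly_expressed = value >= 10  # threshold quoted in the abstract

# The per-chamber counts reported in the abstract add up to the stated total.
chamber_specific = {"atrium": 20, "ventricle": 32, "bulbus arteriosus": 44}
total_differential = sum(chamber_specific.values())
```

With these made-up inputs the gene scores 12.5 FPKM and would pass the threshold; the three chamber counts sum to the 96 differentially expressed genes reported.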
Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny

PubMed Central

2009-01-01

Background: Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. Results: In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. Conclusion: This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny. PMID:19922605
Tunicate mitogenomics and phylogenetics: peculiarities of the Herdmania momus mitochondrial genome and support for the new chordate phylogeny.

PubMed

Singh, Tiratha Raj; Tsagkogeorga, Georgia; Delsuc, Frédéric; Blanquart, Samuel; Shenkar, Noa; Loya, Yossi; Douzery, Emmanuel Jp; Huchon, Dorothée

2009-11-17

Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny.
Complete mitochondrial genomes of the ‘intermediate form’ of Fasciola and Fasciola gigantica, and their comparison with F. hepatica

PubMed Central

2014-01-01

Background: Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. (‘intermediate form’) is unclear. Methods: Single specimens inferred to represent Fasciola sp. (‘intermediate form’; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). Results: The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. Conclusions: The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries. PMID:24685294
Complete mitochondrial genomes of the 'intermediate form' of Fasciola and Fasciola gigantica, and their comparison with F. hepatica.

PubMed

Liu, Guo-Hua; Gasser, Robin B; Young, Neil D; Song, Hui-Qun; Ai, Lin; Zhu, Xing-Quan

2014-03-31

Fascioliasis is an important and neglected disease of humans and other mammals, caused by trematodes of the genus Fasciola. Fasciola hepatica and F. gigantica are valid species that infect humans and animals, but the specific status of Fasciola sp. ('intermediate form') is unclear. Single specimens inferred to represent Fasciola sp. ('intermediate form'; Heilongjiang) and F. gigantica (Guangxi) from China were genetically identified and characterized using PCR-based sequencing of the first and second internal transcribed spacer regions of nuclear ribosomal DNA. The complete mitochondrial (mt) genomes of these representative specimens were then sequenced. The relationships of these specimens with selected members of the Trematoda were assessed by phylogenetic analysis of concatenated amino acid sequence datasets by Bayesian inference (BI). The complete mt genomes of representatives of Fasciola sp. and F. gigantica were 14,453 bp and 14,478 bp in size, respectively. Both mt genomes contain 12 protein-coding genes, 22 transfer RNA genes and two ribosomal RNA genes, but lack an atp8 gene. All protein-coding genes are transcribed in the same direction, and the gene order in both mt genomes is the same as that published for F. hepatica. Phylogenetic analysis of the concatenated amino acid sequence data for all 12 protein-coding genes showed that the specimen of Fasciola sp. was more closely related to F. gigantica than to F. hepatica. The mt genomes characterized here provide a rich source of markers, which can be used in combination with nuclear markers and imaging techniques, for future comparative studies of the biology of Fasciola sp. from China and other countries.
Ribosome profiling: a Hi-Def monitor for protein synthesis at the genome-wide scale

PubMed Central

Michel, Audrey M; Baranov, Pavel V

2013-01-01

Ribosome profiling or ribo-seq is a new technique that provides genome-wide information on protein synthesis (GWIPS) in vivo. It is based on the deep sequencing of ribosome protected mRNA fragments allowing the measurement of ribosome density along all RNA molecules present in the cell. At the same time, the high resolution of this technique allows detailed analysis of ribosome density on individual RNAs. Since its invention, the ribosome profiling technique has been utilized in a range of studies in both prokaryotic and eukaryotic organisms. Several studies have adapted and refined the original ribosome profiling protocol for studying specific aspects of translation. Ribosome profiling of initiating ribosomes has been used to map sites of translation initiation. These studies revealed the surprisingly complex organization of translation initiation sites in eukaryotes. Multiple initiation sites are responsible for the generation of N-terminally extended and truncated isoforms of known proteins as well as for the translation of numerous open reading frames (ORFs), upstream of protein coding ORFs. Ribosome profiling of elongating ribosomes has been used for measuring differential gene expression at the level of translation, the identification of novel protein coding genes and ribosome pausing. It has also provided data for developing quantitative models of translation. Although only a dozen or so ribosome profiling datasets have been published so far, they have already dramatically changed our understanding of translational control and have led to new hypotheses regarding the origin of protein coding genes. © 2013 John Wiley & Sons, Ltd. PMID:23696005
A dehydration-inducible gene in the truffle Tuber borchii identifies a novel group of dehydrins

PubMed Central

Abba', Simona; Ghignone, Stefano; Bonfante, Paola

2006-01-01

Background: The expressed sequence tag M6G10 was originally isolated from a screening for differentially expressed transcripts during the reproductive stage of the white truffle Tuber borchii. mRNA levels for M6G10 increased dramatically during fruiting body maturation compared to the vegetative mycelial stage. Results: Bioinformatics tools, phylogenetic analysis and expression studies were used to support the hypothesis that this sequence, named TbDHN1, is the first dehydrin (DHN)-like coding gene isolated in fungi. Homologs of this gene, all defined as "coding for hypothetical proteins" in public databases, were exclusively found in ascomycetous fungi and in plants. Although complete (or almost complete) fungal genomes and EST collections of some Basidiomycota and Glomeromycota are already available, DHN-like proteins appear to be represented only in Ascomycota. A new and previously uncharacterized conserved signature pattern was identified and proposed to the UniProt database as the main distinguishing feature of this new group of DHNs. Expression studies provide experimental evidence of a transcript induction of TbDHN1 during cellular dehydration. Conclusion: Expression pattern and sequence similarities to known plant DHNs indicate that TbDHN1 is the first characterized DHN-like protein in fungi. The high similarity of TbDHN1 with homolog coding sequences implies the existence of a novel fungal/plant group of LEA Class II proteins characterized by a previously undescribed signature pattern. PMID:16512918
Natural selection in avian protein-coding genes expressed in brain.

PubMed

Axelsson, Erik; Hultin-Rosenberg, Lina; Brandström, Mikael; Zwahlén, Martin; Clayton, David F; Ellegren, Hans

2008-06-01

The evolution of birds from theropod dinosaurs took place approximately 150 million years ago, and was associated with a number of specific adaptations that are still evident among extant birds, including feathers, song and extravagant secondary sexual characteristics. Knowledge about the molecular evolutionary background to such adaptations is lacking. Here, we analyse the evolution of > 5000 protein-coding gene sequences expressed in zebra finch brain by comparison to orthologous sequences in chicken. Mean d(N)/d(S) is 0.085 and genes with their maximal expression in the eye and central nervous system have the lowest mean d(N)/d(S) value, while those expressed in digestive and reproductive tissues exhibit the highest. We find that fast-evolving genes (those which have higher than expected rate of nonsynonymous substitution, indicative of adaptive evolution) are enriched for biological functions such as fertilization, muscle contraction, defence response, response to stress, wounding and endogenous stimulus, and cell death. After alignment to mammalian orthologues, we identify a catalogue of 228 genes that show a significantly higher rate of protein evolution in the two bird lineages than in mammals. These accelerated bird genes, representing candidates for avian-specific adaptations, include genes implicated in vocal learning and other cognitive processes. Moreover, colouration genes evolve faster in birds than in mammals, which may have been driven by sexual selection for extravagant plumage characteristics.
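The d(N)/d(S) statistic reported here is the ratio of the nonsynonymous to the synonymous substitution rate for each gene, averaged across genes. The Python sketch below shows only that final ratio-and-average step; the gene names and rate estimates are hypothetical, and estimating d(N) and d(S) from aligned sequences (the substantive part of such an analysis) is not shown.

```python
from statistics import mean

# Hypothetical per-gene (dN, dS) estimates; not data from the study.
estimates = {
    "geneA": (0.010, 0.20),
    "geneB": (0.030, 0.25),
    "geneC": (0.002, 0.10),
}

# Per-gene ratio; genes with dS == 0 are skipped to avoid division by zero.
omega = {gene: dn / ds for gene, (dn, ds) in estimates.items() if ds > 0}
mean_omega = mean(omega.values())

# Ratios well below 1 are usually read as evidence of purifying selection.
constrained = [gene for gene, ratio in omega.items() if ratio < 1]
```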
Plastid and mitochondrial genomes of Coccophora langsdorfii (Fucales, Phaeophyceae) and the utility of molecular markers

PubMed Central

Graf, Louis; Kim, Yae Jin; Cho, Ga Youn; Miller, Kathy Ann

2017-01-01

Coccophora langsdorfii (Turner) Greville (Fucales) is an intertidal brown alga that is endemic to Northeast Asia and increasingly endangered by habitat loss and climate change. We sequenced the complete circular plastid and mitochondrial genomes of C. langsdorfii. The circular plastid genome is 124,450 bp and contains 139 protein-coding, 28 tRNA and 6 rRNA genes. The circular mitochondrial genome is 35,660 bp and contains 38 protein-coding, 25 tRNA and 3 rRNA genes. The structure and gene content of the C. langsdorfii plastid genome is similar to those of other species in the Fucales. The plastid genomes of brown algae in other orders share similar gene content but exhibit large structural recombination. The large in-frame insert in the cox2 gene in the mitochondrial genome of C. langsdorfii is typical of other brown algae. We explored the effect of this insertion on the structure and function of the cox2 protein. We estimated the usefulness of 135 plastid genes and 35 mitochondrial genes for developing molecular markers. This study shows that 29 organellar genes will prove efficient for resolving brown algal phylogeny. In addition, we propose a new molecular marker suitable for the study of intraspecific genetic diversity that should be tested in a large survey of populations of C. langsdorfii. PMID:29095864
Cloning and characterization of a mouse gene with homology to the human von Hippel-Lindau disease tumor suppressor gene: implications for the potential organization of the human von Hippel-Lindau disease gene.

PubMed

Gao, J; Naglich, J G; Laidlaw, J; Whaley, J M; Seizinger, B R; Kley, N

1995-02-15

The human von Hippel-Lindau disease (VHL) gene has recently been identified and, based on the nucleotide sequence of a partial cDNA clone, has been predicted to encode a novel protein with as yet unknown functions [F. Latif et al., Science (Washington DC), 260: 1317-1320, 1993]. The length of the encoded protein and the characteristics of the cellular expressed protein are as yet unclear. Here we report the cloning and characterization of a mouse gene (mVHLh1) that is widely expressed in different mouse tissues and shares high homology with the human VHL gene. It predicts a protein 181 residues long (and/or 162 amino acids, considering a potential alternative start codon), which across a core region of approximately 140 residues displays a high degree of sequence identity (98%) to the predicted human VHL protein. High stringency DNA and RNA hybridization experiments and protein expression analyses indicate that this gene is the most highly VHL-related mouse gene, suggesting that it represents the mouse VHL gene homologue rather than a related gene sharing a conserved functional domain. These findings provide new insights into the potential organization of the VHL gene and nature of its encoded protein.
Dissecting non-coding RNA mechanisms in cellulo by single-molecule high-resolution localization and counting

PubMed Central

Pitchiaya, Sethuramasundaram; Krishnan, Vishalakshi; Custer, Thomas C.; Walter, Nils G.

2013-01-01

Non-coding RNAs (ncRNAs) recently were discovered to outnumber their protein-coding counterparts, yet their diverse functions are still poorly understood. Here we report on a method for the intracellular Single-molecule High Resolution Localization and Counting (iSHiRLoC) of microRNAs (miRNAs), a conserved, ubiquitous class of regulatory ncRNAs that controls the expression of over 60% of all mammalian protein coding genes post-transcriptionally, by a mechanism shrouded by seemingly contradictory observations. We present protocols to execute single particle tracking (SPT) and single-molecule counting of functional microinjected, fluorophore-labeled miRNAs and thereby extract diffusion coefficients and molecular stoichiometries of micro-ribonucleoprotein (miRNP) complexes from living and fixed cells, respectively. This probing of miRNAs at the single molecule level sheds new light on the intracellular assembly/disassembly of miRNPs, thus beginning to unravel the dynamic nature of this important gene regulatory pathway and facilitating the development of a parsimonious model for their obscured mechanism of action. PMID:23820309
Long non-coding RNA discovery across the genus Anopheles reveals conserved secondary structures within and beyond the Gambiae complex.

PubMed

Jenkins, Adam M; Waterhouse, Robert M; Muskavitch, Marc A T

2015-04-23

Long non-coding RNAs (lncRNAs) have been defined as mRNA-like transcripts longer than 200 nucleotides that lack significant protein-coding potential, and many of them constitute scaffolds for ribonucleoprotein complexes with critical roles in epigenetic regulation. Various lncRNAs have been implicated in the modulation of chromatin structure, transcriptional and post-transcriptional gene regulation, and regulation of genomic stability in mammals, Caenorhabditis elegans, and Drosophila melanogaster. The purpose of this study is to identify the lncRNA landscape in the malaria vector An. gambiae and assess the evolutionary conservation of lncRNAs and their secondary structures across the Anopheles genus. Using deep RNA sequencing of multiple Anopheles gambiae life stages, we have identified 2,949 lncRNAs and more than 300 previously unannotated putative protein-coding genes. The lncRNAs exhibit differential expression profiles across life stages and adult genders. We find that across the genus Anopheles, lncRNAs display much lower sequence conservation than protein-coding genes. Additionally, we find that lncRNA secondary structure is highly conserved within the Gambiae complex, but diverges rapidly across the rest of the genus Anopheles. This study offers one of the first lncRNA secondary structure analyses in vector insects. Our description of lncRNAs in An. gambiae offers the most comprehensive genome-wide insights to date into lncRNAs in this vector mosquito, and defines a set of potential targets for the development of vector-based interventions that may further curb the human malaria burden in disease-endemic countries.
Growth of Rhodococcus sp. strain BCP1 on gaseous n-alkanes: new metabolic insights and transcriptional analysis of two soluble di-iron monooxygenase genes

PubMed Central

Cappelletti, Martina; Presentato, Alessandro; Milazzo, Giorgio; Turner, Raymond J.; Fedi, Stefano; Frascari, Dario; Zannoni, Davide

2015-01-01

Rhodococcus sp. strain BCP1 was initially isolated for its ability to grow on gaseous n-alkanes, which act as inducers for the co-metabolic degradation of low-chlorinated compounds. Here, both molecular and metabolic features of BCP1 cells grown on gaseous and short-chain n-alkanes (up to n-heptane) were examined in detail. We show that propane metabolism generated terminal and sub-terminal oxidation products such as 1- and 2-propanol, whereas 1-butanol was the only terminal oxidation product detected from n-butane metabolism. Two gene clusters, prmABCD and smoABCD, coding for Soluble Di-Iron Monooxygenases (SDIMOs) involved in gaseous n-alkanes oxidation, were detected in the BCP1 genome. By means of Reverse Transcriptase-quantitative PCR (RT-qPCR) analysis, a set of substrates inducing the expression of the sdimo genes in BCP1 were assessed as well as their transcriptional repression in the presence of sugars, organic acids, or during the cell growth on rich medium (Luria–Bertani broth). The transcriptional start sites of both the sdimo gene clusters were identified by means of primer extension experiments. Finally, proteomic studies revealed changes in the protein pattern induced by growth on gaseous- (n-butane) and/or liquid (n-hexane) short-chain n-alkanes as compared to growth on succinate. Among the differently expressed protein spots, two chaperonins and an isocitrate lyase were identified along with oxidoreductases involved in oxidation reactions downstream of the initial monooxygenase reaction step. PMID:26029173
