Lazzarato, F; Franceschinis, G; Botta, M; Cordero, F; Calogero, R A
2004-11-01
RRE allows the extraction of non-coding regions surrounding a coding sequence [i.e. gene upstream region, 5'-untranslated region (5'-UTR), introns, 3'-UTR, downstream region] from annotated genomic datasets available at NCBI. RRE parser and web-based interface are accessible at http://www.bioinformatica.unito.it/bioinformatics/rre/rre.html
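As a rough illustration of the kind of extraction described above, the sketch below slices the upstream region, 5'-UTR, 3'-UTR and downstream region around a forward-strand coding sequence. The function name `flanking_regions`, the 0-based end-exclusive coordinate convention, the `flank` parameter and the toy sequence are all assumptions made for this example. It is not RRE's parser, and it ignores introns and reverse-strand genes.

```python
def flanking_regions(seq, tx_start, cds_start, cds_end, tx_end, flank=5):
    """Slice non-coding regions around a forward-strand coding sequence.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    if not (0 <= tx_start <= cds_start <= cds_end <= tx_end <= len(seq)):
        raise ValueError("coordinates must be ordered and lie within the sequence")
    return {
        # Up to `flank` bases before the transcript start.
        "upstream": seq[max(0, tx_start - flank):tx_start],
        # Transcribed but untranslated bases before the CDS.
        "utr5": seq[tx_start:cds_start],
        # Transcribed but untranslated bases after the CDS.
        "utr3": seq[cds_end:tx_end],
        # Up to `flank` bases after the transcript end.
        "downstream": seq[tx_end:tx_end + flank],
    }


# Toy layout: 5 upstream bases, 3-base 5'-UTR, 9-base CDS, 3-base 3'-UTR, 5 downstream bases.
toy = "AAAAA" + "CCC" + "ATGGGGTAA" + "TTT" + "GGGGG"
regions = flanking_regions(toy, tx_start=5, cds_start=8, cds_end=17, tx_end=20, flank=5)
```

A real annotated record would supply the coordinates from its feature table rather than hard-coded integers.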
The complete mitochondrial genome of Chrysopa pallens (Insecta, Neuroptera, Chrysopidae).
He, Kun; Chen, Zhe; Yu, Dan-Na; Zhang, Jia-Yong
2012-10-01
The complete mitochondrial genome of Chrysopa pallens (Neuroptera, Chrysopidae) was sequenced. It consists of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA (rRNA) genes, and a control region (AT-rich region). The total length of C. pallens mitogenome is 16,723 bp with 79.5% AT content, and the length of control region is 1905 bp with 89.1% AT content. The non-coding regions of C. pallens include control region between 12S rRNA and trnI genes, and a 75-bp space region between trnI and trnQ genes.
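The AT-content figures quoted above are simple base-composition percentages. A minimal sketch of that calculation follows; the function name `at_content` and the choice to skip ambiguous bases such as N are assumptions for illustration, not details taken from the paper.

```python
def at_content(seq):
    """Return the percentage of A and T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # ambiguous symbols are skipped
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


# Toy fragment, not real mitogenome data: 4 of 6 bases are A or T.
example = at_content("ATTAGC")
```

Applied to the full mitogenome and to the control region separately, a function like this would yield the two percentages the abstract reports.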
Novel variants of the 5S rRNA genes in Eruca sativa.
Singh, K; Bhatia, S; Lakshmikumaran, M
1994-02-01
The 5S ribosomal RNA (rRNA) genes of Eruca sativa were cloned and characterized. They are organized into clusters of tandemly repeated units. Each repeat unit consists of a 119-bp coding region followed by a noncoding spacer region that separates it from the coding region of the next repeat unit. Our study reports novel gene variants of the 5S rRNA genes in plants. Two families of the 5S rDNA, the 0.5-kb size family and the 1-kb size family, coexist in the E. sativa genome. The 0.5-kb size family consists of the 5S rRNA genes (S4) that have coding regions similar to those of other reported plant 5S rDNA sequences, whereas the 1-kb size family consists of the 5S rRNA gene variants (S1) that exist as 1-kb BamHI tandem repeats. S1 is made up of two variant units (V1 and V2) of 5S rDNA where the BamHI site between the two units is mutated. Sequence heterogeneity among S4, V1, and V2 units exists throughout the sequence and is not limited to the noncoding spacer region only. The coding regions of V1 and V2 show approximately 20% dissimilarity to the coding regions of S4 and other reported plant 5S rDNA sequences. Such a large variation in the coding regions of the 5S rDNA units within the same plant species has been observed for the first time. Restriction site variation is observed between the two size classes of 5S rDNA in E. sativa. (ABSTRACT TRUNCATED AT 250 WORDS)
Pietan, Lucas L.; Spradling, Theresa A.
2016-01-01
In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
Structure and expression of canary myc family genes.
Collum, R G; Clayton, D F; Alt, F W
1991-01-01
We found that the canary N-myc gene is highly related to mammalian N-myc genes in both the protein-coding region and the long 3' untranslated region. Examined coding regions of the canary c-myc gene were also highly related to their mammalian counterparts, but in contrast to N-myc, the canary and mammalian c-myc genes were quite divergent in their 3' untranslated regions. We readily detected N-myc and c-myc expression in the adult canary brain and found N-myc expression both at sites of proliferating neuronal precursors and in mature neurons. PMID:1996121
Network perturbation by recurrent regulatory variants in cancer
Cho, Ara; Lee, Insuk; Choi, Jung Kyoon
2017-01-01
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes. PMID:28333928
Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Altemus, M.; Murphy, D.L.; Greenberg, B.
1996-07-26
Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a role for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.
Using the NCBI Genome Databases to Compare the Genes for Human & Chimpanzee Beta Hemoglobin
ERIC Educational Resources Information Center
Offner, Susan
2010-01-01
The beta hemoglobin protein is identical in humans and chimpanzees. In this tutorial, students see that even though the proteins are identical, the genes that code for them are not. There are many more differences in the introns than in the exons, which indicates that coding regions of DNA are more highly conserved than non-coding regions.
Zorc, Minja; Kunej, Tanja
2016-05-01
MicroRNAs (miRNAs) are a class of non-coding RNAs involved in posttranscriptional regulation of target genes. Regulation requires complementarity between target mRNA and the mature miRNA seed region, responsible for their recognition and binding. It has been estimated that each miRNA targets approximately 200 genes, and genetic variability of miRNA genes has been reported to affect phenotypic variability and disease susceptibility in humans, livestock species, and model organisms. Polymorphisms in miRNA genes could therefore represent biomarkers for phenotypic traits in livestock animals. In our previous study, we collected polymorphisms within miRNA genes in chicken. In the present study, we identified miRNA-related genomic overlaps to prioritize genomic regions of interest for further functional studies and biomarker discovery. Overlapping genomic regions in chicken were analyzed using the following bioinformatics tools and databases: miRNA SNiPer, Ensembl, miRBase, NCBI Blast, and QTLdb. Out of 740 known pre-miRNA genes, 263 (35.5%) contain polymorphisms; among them, 35 contain more than three polymorphisms. The most polymorphic miRNA genes in chicken are gga-miR-6662, containing 23 single nucleotide polymorphisms (SNPs) within the pre-miRNA region, including five consecutive SNPs, and gga-miR-6688, containing ten polymorphisms including three consecutive polymorphisms. Several miRNA-related genomic hotspots have been revealed in the chicken genome; polymorphic miRNA genes are located within protein-coding and/or non-coding transcription units and quantitative trait loci (QTL) associated with production traits. The present study includes the first description of an exonic miRNA in a chicken genome, an overlap between the miRNA gene and the exon of the protein-coding gene (gga-miR-6578/HADHB), and the first report of a missense polymorphism located within a mature miRNA seed region.
Identified miRNA-related genomic hotspots in chicken can serve researchers as a starting point for further functional studies and association studies with poultry production and health traits and the basis for systematic screening of exonic miRNAs and missense/miRNA seed polymorphisms in other genomes.
Dushyanth, K; Bhattacharya, T K; Shukla, R; Chatterjee, R N; Sitaramamma, T; Paswan, C; Guru Vishnu, P
2016-10-01
Myostatin is a member of the TGF-β superfamily and is directly involved in regulation of body growth through limiting muscular growth. A study was carried out in three chicken lines to identify polymorphism in the coding region of the myostatin gene through SSCP and DNA sequencing. A total of 12 haplotypes were observed in the myostatin coding region of chicken. Significant associations of haplogroups with body weight at 1, 14, 28, and 42 days, and with carcass traits at 42 days, were observed across the lines. It is concluded that the coding region of the myostatin gene was polymorphic, with varied levels of expression among lines, and had significant effects on growth traits. The expression of the MSTN gene varied during the embryonic and post-hatch developmental stages.
GeneMachine: gene prediction and sequence annotation.
Makalowska, I; Ryan, J F; Baxevanis, A D
2001-09-01
A number of free-standing programs have been developed in order to help researchers find potential coding regions and deduce gene structure for long stretches of what is essentially 'anonymous DNA'. As these programs apply inherently different criteria to the question of what is and is not a coding region, multiple algorithms should be used in the course of positional cloning and positional candidate projects to assure that all potential coding regions within a previously-identified critical region are identified. We have developed a gene identification tool called GeneMachine which allows users to query multiple exon and gene prediction programs in an automated fashion. BLAST searches are also performed in order to see whether a previously-characterized coding region corresponds to a region in the query sequence. A suite of Perl programs and modules are used to run MZEF, GENSCAN, GRAIL 2, FGENES, RepeatMasker, Sputnik, and BLAST. The results of these runs are then parsed and written into ASN.1 format. Output files can be opened using NCBI Sequin, in essence using Sequin as both a workbench and as a graphical viewer. The main feature of GeneMachine is that the process is fully automated; the user is only required to launch GeneMachine and then open the resulting file with Sequin. Annotations can then be made to these results prior to submission to GenBank, thereby increasing the intrinsic value of these data. GeneMachine is freely-available for download at http://genome.nhgri.nih.gov/genemachine. A public Web interface to the GeneMachine server for academic and not-for-profit users is available at http://genemachine.nhgri.nih.gov. The Web supplement to this paper may be found at http://genome.nhgri.nih.gov/genemachine/supplement/.
Reiche, Kristin; Kasack, Katharina; Schreiber, Stephan; Lüders, Torben; Due, Eldri U.; Naume, Bjørn; Riis, Margit; Kristensen, Vessela N.; Horn, Friedemann; Børresen-Dale, Anne-Lise; Hackermüller, Jörg; Baumbusch, Lars O.
2014-01-01
Breast cancer, the second leading cause of cancer death in women, is a highly heterogeneous disease, characterized by distinct genomic and transcriptomic profiles. Transcriptome analyses prevalently assessed protein-coding genes; however, the majority of the mammalian genome is expressed in numerous non-coding transcripts. Emerging evidence supports that many of these non-coding RNAs are specifically expressed during development, tumorigenesis, and metastasis. The focus of this study was to investigate the expression features and molecular characteristics of long non-coding RNAs (lncRNAs) in breast cancer. We investigated 26 breast tumor and 5 normal tissue samples utilizing a custom expression microarray enclosing probes for mRNAs as well as novel and previously identified lncRNAs. We identified more than 19,000 unique regions significantly differentially expressed between normal versus breast tumor tissue, half of these regions were non-coding without any evidence for functional open reading frames or sequence similarity to known proteins. The identified non-coding regions were primarily located in introns (53%) or in the intergenic space (33%), frequently orientated in antisense-direction of protein-coding genes (14%), and commonly distributed at promoter-, transcription factor binding-, or enhancer-sites. Analyzing the most diverse mRNA breast cancer subtypes Basal-like versus Luminal A and B resulted in 3,025 significantly differentially expressed unique loci, including 682 (23%) for non-coding transcripts. A notable number of differentially expressed protein-coding genes displayed non-synonymous expression changes compared to their nearest differentially expressed lncRNA, including an antisense lncRNA strongly anticorrelated to the mRNA coding for histone deacetylase 3 (HDAC3), which was investigated in more detail. 
Previously identified chromatin-associated lncRNAs (CARs) were predominantly downregulated in breast tumor samples, including CARs located in the protein-coding genes for CALD1, FTX, and HNRNPH1. In conclusion, a number of differentially expressed lncRNAs have been identified with relation to cancer-related protein-coding genes. PMID:25264628
The complete mitochondrial genome of the Giant Manta ray, Manta birostris.
Hinojosa-Alvarez, Silvia; Díaz-Jaimes, Pindaro; Marcet-Houben, Marina; Gabaldón, Toni
2015-01-01
The complete mitochondrial genome of the giant manta ray (Manta birostris) consists of 18,075 bp with rich A + T and low G content. Gene organization and length are similar to those of other species of ray. It comprises 13 protein-coding genes, 2 rRNA genes, 23 tRNA genes and 1 non-coding sequence, and the control region. We identified an AT tandem repeat region, similar to that reported in Mobula japanica.
Recurrent and functional regulatory mutations in breast cancer.
Rheinbay, Esther; Parasuraman, Prasanna; Grimsby, Jonna; Tiao, Grace; Engreitz, Jesse M; Kim, Jaegil; Lawrence, Michael S; Taylor-Weiner, Amaro; Rodriguez-Cuevas, Sergio; Rosenberg, Mara; Hess, Julian; Stewart, Chip; Maruvka, Yosef E; Stojanov, Petar; Cortes, Maria L; Seepo, Sara; Cibulskis, Carrie; Tracy, Adam; Pugh, Trevor J; Lee, Jesse; Zheng, Zongli; Ellisen, Leif W; Iafrate, A John; Boehm, Jesse S; Gabriel, Stacey B; Meyerson, Matthew; Golub, Todd R; Baselga, Jose; Hidalgo-Miranda, Alfredo; Shioda, Toshi; Bernards, Andre; Lander, Eric S; Getz, Gad
2017-07-06
Genomic analysis of tumours has led to the identification of hundreds of cancer genes on the basis of the presence of mutations in protein-coding regions. By contrast, much less is known about cancer-causing mutations in non-coding regions. Here we perform deep sequencing in 360 primary breast cancers and develop computational methods to identify significantly mutated promoters. Clear signals are found in the promoters of three genes. FOXA1, a known driver of hormone-receptor positive breast cancer, harbours a mutational hotspot in its promoter leading to overexpression through increased E2F binding. RMRP and NEAT1, two non-coding RNA genes, carry mutations that affect protein binding to their promoters and alter expression levels. Our study shows that promoter regions harbour recurrent mutations in cancer with functional consequences and that the mutations occur at similar frequencies as in coding regions. Power analyses indicate that more such regions remain to be discovered through deep sequencing of adequately sized cohorts of patients.
Mu-Like Prophage in Serogroup B Neisseria meningitidis Coding for Surface-Exposed Antigens
Masignani, Vega; Giuliani, Marzia Monica; Tettelin, Hervé; Comanducci, Maurizio; Rappuoli, Rino; Scarlato, Vincenzo
2001-01-01
Sequence analysis of the genome of Neisseria meningitidis serogroup B revealed the presence of an ∼35-kb region inserted within a putative gene coding for an ABC-type transporter. The region contains 46 open reading frames, 29 of which are colinear and homologous to the genes of Escherichia coli Mu phage. Two prophages with similar organizations were also found in serogroup A meningococcus, and one was found in Haemophilus influenzae. Early and late phage functions are well preserved in this family of Mu-like prophages. Several regions of atypical nucleotide content were identified. These likely represent genes acquired by horizontal transfer. Three of the acquired genes are shown to code for surface-associated antigens, and the encoded proteins are able to induce bactericidal antibodies. PMID:11254622
Expressed gene sequence of the IFN-gamma-response chemokine CXCL9 of cattle, horses, and swine
USDA-ARS's Scientific Manuscript database
This report describes the cloning and characterization of expressed gene sequences of bovine, equine, and swine CXCL9 from RNA obtained from peripheral blood mononuclear cell (PBMC) or other tissues. The bovine coding region was 378 nucleotides in length, while the equine and swine coding regions w...
XGC developments for a more efficient XGC-GENE code coupling
NASA Astrophysics Data System (ADS)
Dominski, Julien; Hager, Robert; Ku, Seung-Hoe; Chang, Cs
2017-10-01
In the Exascale Computing Program, the High-Fidelity Whole Device Modeling project initially aims at delivering a tightly-coupled simulation of plasma neoclassical and turbulence dynamics from the core to the edge of the tokamak. To permit such simulations, the gyrokinetic codes GENE and XGC will be coupled together. Numerical efforts are made to improve the agreement of the numerical schemes in the coupling region. One of the difficulties of coupling those codes together is the incompatibility of their grids. GENE is a continuum grid-based code and XGC is a Particle-In-Cell code using an unstructured triangular mesh. A field-aligned filter is thus implemented in XGC. Even though XGC originally had an approximately field-following mesh, this field-aligned filter yields a perturbation discretization closer to the one solved in the field-aligned code GENE. Additionally, new XGC gyro-averaging matrices are implemented on a velocity grid adapted to the plasma properties, thus ensuring the same accuracy from the core to the edge regions.
Evidence of translation efficiency adaptation of the coding regions of the bacteriophage lambda.
Goz, Eli; Mioduser, Oriah; Diament, Alon; Tuller, Tamir
2017-08-01
Deciphering the way gene expression regulatory aspects are encoded in viral genomes is a challenging mission with ramifications related to all biomedical disciplines. Here, we aimed to understand how evolution shapes the bacteriophage lambda genes by performing a high resolution analysis of ribosomal profiling data and gene expression related synonymous/silent information encoded in bacteriophage coding regions. We demonstrated evidence of selection for distinct compositions of synonymous codons in early and late viral genes related to the adaptation of translation efficiency to different bacteriophage developmental stages. Specifically, we showed that evolution of viral coding regions is driven, among others, by selection for codons with higher decoding rates; during the initial/progressive stages of infection the decoding rates in early/late genes were found to be superior to those in late/early genes, respectively. Moreover, we argued that selection for translation efficiency could be partially explained by adaptation to Escherichia coli tRNA pool and the fact that it can change during the bacteriophage life cycle. An analysis of additional aspects related to the expression of viral genes, such as mRNA folding and more complex/longer regulatory signals in the coding regions, is also reported. The reported conclusions are likely to be relevant also to additional viruses. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
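The detrended fluctuation analysis mentioned above can be sketched as follows: map the sequence to a walk, split the walk into boxes of size n, remove a least-squares line from each box, and read the scaling exponent from the slope of log F(n) against log n. The purine/pyrimidine step mapping, the box sizes, the function names and the random test sequence are assumptions chosen for illustration; this is a minimal sketch, not the authors' implementation.

```python
import math
import random


def dna_walk(seq):
    """Integrate +1 (purine: A, G) and -1 (pyrimidine: C, T) steps after removing the mean step."""
    steps = [1 if b in "AG" else -1 for b in seq.upper() if b in "ACGT"]
    mean = sum(steps) / len(steps)
    walk, total = [], 0.0
    for s in steps:
        total += s - mean
        walk.append(total)
    return walk


def dfa_fluctuation(walk, n):
    """RMS deviation of the walk from a per-box linear fit, using non-overlapping boxes of size n."""
    boxes = len(walk) // n
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sq_sum = 0.0
    for b in range(boxes):
        ys = walk[b * n:(b + 1) * n]
        y_mean = sum(ys) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys)) / sxx
        # Accumulate squared residuals around the local trend line.
        sq_sum += sum((y - (y_mean + slope * (x - x_mean))) ** 2 for x, y in enumerate(ys))
    return math.sqrt(sq_sum / (boxes * n))


def dfa_exponent(seq, box_sizes=(8, 16, 32, 64, 128)):
    """Least-squares slope of log F(n) versus log n."""
    walk = dna_walk(seq)
    pts = [(math.log(n), math.log(dfa_fluctuation(walk, n))) for n in box_sizes]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    return sum((x - mx) * (y - my) for x, y in pts) / sum((x - mx) ** 2 for x, _ in pts)


# Uncorrelated toy sequence (not GenBank data); its exponent is expected to lie near 0.5.
rng = random.Random(0)
random_seq = "".join(rng.choice("ACGT") for _ in range(4096))
alpha = dfa_exponent(random_seq)
```

An exponent clearly above 0.5 over a wide range of box sizes is what would indicate the long-range correlation the abstract attributes to sequences containing non-coding regions.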
Omeire, Destiny; Abdin, Shaunte; Brooks, Daniel M; Miranda, Hector C
2015-04-01
The Germain's Peacock-Pheasant Polyplectron germaini (Aves, Galliformes, Phasianidae) is classified as Near Threatened on the IUCN Red List. The complete mitochondrial genome of P. germaini is 16,699 bp, consisting of 13 protein-coding genes, 2 rRNA, 22 tRNA genes and 1 control region. All of the 13 protein-coding genes have ATG as start codon. Eight of the 13 protein-coding genes have TAA as stop codon.
New PAH gene promoter KLF1 and 3'-region C/EBPalpha motifs influence transcription in vitro.
Klaassen, Kristel; Stankovic, Biljana; Kotur, Nikola; Djordjevic, Maja; Zukic, Branka; Nikcevic, Gordana; Ugrin, Milena; Spasovski, Vesna; Srzentic, Sanja; Pavlovic, Sonja; Stojiljkovic, Maja
2017-02-01
Phenylketonuria (PKU) is a metabolic disease caused by mutations in the phenylalanine hydroxylase (PAH) gene. Although the PAH genotype remains the main determinant of PKU phenotype severity, genotype-phenotype inconsistencies have been reported. In this study, we focused on unanalysed sequences in non-coding PAH gene regions to assess their possible influence on the PKU phenotype. We transiently transfected HepG2 cells with various chloramphenicol acetyl transferase (CAT) reporter constructs which included PAH gene non-coding regions. Selected non-coding regions were indicated by in silico prediction to contain transcription factor binding sites. Furthermore, electrophoretic mobility shift assay (EMSA) and supershift assays were performed to identify which transcription factors were engaged in the interaction. We found a novel KLF1 motif in the PAH promoter, which decreases CAT activity by 50% in comparison to basal transcription in vitro. The cytosine at the c.-170 promoter position creates an additional binding site for the protein complex involving the KLF1 transcription factor. Moreover, we assessed for the first time the role of a multivariant variable number tandem repeat (VNTR) region located in the 3'-region of the PAH gene. We found that the VNTR3, VNTR7 and VNTR8 constructs had approximately 60% of CAT activity. The regulation is mediated by the C/EBPalpha transcription factor, present in the protein complex binding to VNTR3. Our study highlighted two novel motifs in the PAH gene, a promoter KLF1 motif and a 3'-region C/EBPalpha motif, which decrease transcription in vitro and, thus, could be considered as PAH expression modifiers. New transcription motifs in non-coding regions will contribute to better understanding of the PKU phenotype complexity and may become important for the optimisation of PKU treatment.
Yong, Hoi-Sen; Song, Sze-Looi; Lim, Phaik-Eem; Chan, Kok-Gan; Chow, Wan-Loo; Eamsobhana, Praphathip
2015-01-01
The whole mitochondrial genome of the pest fruit fly Bactrocera arecae was obtained from next-generation sequencing of genomic DNA. It had a total length of 15,900 bp, consisting of 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The control region (952 bp) was flanked by rrnS and trnI genes. The start codons included 6 ATG, 3 ATT and 1 each of ATA, ATC, GTG and TCG. Eight TAA, two TAG, one incomplete TA and two incomplete T stop codons were represented in the protein-coding genes. The cloverleaf structure for trnS1 lacked the D-loop, and that of trnN and trnF lacked the TΨC-loop. Molecular phylogeny based on 13 protein-coding genes was concordant with 37 mitochondrial genes, with B. arecae having closest genetic affinity to B. tryoni. The subgenus Bactrocera of Dacini tribe and the Dacinae subfamily (Dacini and Ceratitidini tribes) were monophyletic. The whole mitogenome of B. arecae will serve as a useful dataset for studying the genetics, systematics and phylogenetic relationships of the many species of Bactrocera genus in particular, and tephritid fruit flies in general. PMID:26472633
Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas.
Mathelier, Anthony; Lefebvre, Calvin; Zhang, Allen W; Arenillas, David J; Ding, Jiarui; Wasserman, Wyeth W; Shah, Sohrab P
2015-04-23
With the rapid increase of whole-genome sequencing of human cancers, an important opportunity to analyze and characterize somatic mutations lying within cis-regulatory regions has emerged. A focus on protein-coding regions to identify nonsense or missense mutations disruptive to protein structure and/or function has led to important insights; however, the impact on gene expression of mutations lying within cis-regulatory regions remains under-explored. We analyzed somatic mutations from 84 matched tumor-normal whole genomes from B-cell lymphomas with accompanying gene expression measurements to elucidate the extent to which these cancers are disrupted by cis-regulatory mutations. We characterize mutations overlapping a high quality set of well-annotated transcription factor binding sites (TFBSs), covering a similar portion of the genome as protein-coding exons. Our results indicate that cis-regulatory mutations overlapping predicted TFBSs are enriched in promoter regions of genes involved in apoptosis or growth/proliferation. By integrating gene expression data with mutation data, our computational approach culminates with identification of cis-regulatory mutations most likely to participate in dysregulation of the gene expression program. The impact can be measured along with protein-coding mutations to highlight key mutations disrupting gene expression and pathways in cancer. Our study yields specific genes with disrupted expression triggered by genomic mutations in either the coding or the regulatory space. It implies that mutated regulatory components of the genome contribute substantially to cancer pathways. Our analyses demonstrate that identifying genomically altered cis-regulatory elements coupled with analysis of gene expression data will augment biological interpretation of mutational landscapes of cancers.
Neuhaus, H; Link, G
1987-01-01
The trnK gene encoding the tRNALys(UUU) has been located on mustard (Sinapis alba) chloroplast DNA, 263 bp upstream of the psbA gene on the same strand. The nucleotide sequence of the trnK gene and its flanking regions as well as the putative transcription start and termination sites are shown. The 5' end of the transcript lies 121 bp upstream of the 5' tRNA coding region and is preceded by procaryotic-type "-10" and "-35" sequence elements, while the 3' end maps 2.77 kb downstream to a DNA region with possible stem-loop secondary structure. The anticodon loop of the tRNALys is interrupted by a 2,574 bp intron containing a long open reading frame, which codes for 524 amino acids. Based on conserved stem and loop structures, this intron has characteristic features of a class II intron. A region near the carboxyl terminus of the derived polypeptide appears structurally related to maturases.
Analysis and recognition of 5′ UTR intron splice sites in human pre-mRNA
Eden, E.; Brunak, S.
2004-01-01
Prediction of splice sites in non-coding regions of genes is one of the most challenging aspects of gene structure recognition. We perform a rigorous analysis of such splice sites embedded in human 5′ untranslated regions (UTRs), and investigate correlations between this class of splice sites and other features found in the adjacent exons and introns. By restricting the training of neural network algorithms to ‘pure’ UTRs (not extending partially into protein coding regions), we for the first time investigate the predictive power of the splicing signal proper, in contrast to conventional splice site prediction, which typically relies on the change in sequence at the transition from protein coding to non-coding. By doing so, the algorithms were able to pick up subtler splicing signals that were otherwise masked by ‘coding’ noise, thus enhancing significantly the prediction of 5′ UTR splice sites. For example, the non-coding splice site predicting networks pick up compositional and positional bias in the 3′ ends of non-coding exons and 5′ non-coding intron ends, where cytosine and guanine are over-represented. This compositional bias at the true UTR donor sites is also visible in the synaptic weights of the neural networks trained to identify UTR donor sites. Conventional splice site prediction methods perform poorly in UTRs because the reading frame pattern is absent. The NetUTR method presented here performs 2–3-fold better compared with NetGene2 and GenScan in 5′ UTRs. We also tested the 5′ UTR trained method on protein coding regions, and discovered, surprisingly, that it works quite well (although it cannot compete with NetGene2). This indicates that the local splicing pattern in UTRs and coding regions is largely the same. The NetUTR method is made publicly available at www.cbs.dtu.dk/services/NetUTR. PMID:14960723
Singh, Kh Dhanachandra; Karthikeyan, Muthusamy
2014-12-01
The renin-angiotensin-aldosterone system (RAAS) plays a key role in the regulation of blood pressure (BP). Mutations in the genes that encode components of the RAAS contribute significantly to genetic susceptibility to hypertension and have been intensively scrutinized. Identifying such probably causal mutations not only provides insight into the RAAS but may also yield antihypertensive therapeutic targets and diagnostic markers. Testing every SNP experimentally to determine its biological significance is challenging because the available SNP datasets are huge and contain both functional and neutral variants. To explore the functional significance of these SNPs, we adopted a combined sequence-based and sequence-structure-based SNP analysis algorithm. Out of 3864 SNPs reported in dbSNP, we found 108 missense SNPs in the coding region; the remainder lie in the non-coding region. In this study, we report a coding-region SNP as deleterious only when three or more tools predict it to be deleterious and it shows a high RMSD from the native structure. Based on these analyses, we identified as deleterious two SNPs of the REN gene, eight SNPs of the AGT gene, three SNPs of the ACE gene, two SNPs of the AT1R gene, three SNPs of the CYP11B2 gene and three SNPs of the CMA1 gene, all in the coding region. This type of study should help reduce the cost and time needed to identify potential SNPs and to select candidates from the SNP pool for experimental study.
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).
Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang
2016-11-01
In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
Origin and evolution of the long non-coding genes in the X-inactivation center.
Romito, Antonio; Rougeulle, Claire
2011-11-01
Random X chromosome inactivation (XCI), the eutherian mechanism of X-linked gene dosage compensation, is controlled by a cis-acting locus termed the X-inactivation center (Xic). One of the striking features that characterize the Xic landscape is the abundance of loci transcribing non-coding RNAs (ncRNAs), including Xist, the master regulator of the inactivation process. Recent comparative genomic analyses have depicted the evolutionary scenario behind the origin of the X-inactivation center, revealing that this locus evolved from a region harboring protein-coding genes. During mammalian radiation, this ancestral protein-coding region was disrupted in the marsupial group, whilst it provided in eutherian lineage the starting material for the non-translated RNAs of the X-inactivation center. The emergence of non-coding genes occurred by a dual mechanism involving loss of protein-coding function of the pre-existing genes and integration of different classes of mobile elements, some of which modeled the structure and sequence of the non-coding genes in a species-specific manner. The rising genes started to produce transcripts that acquired function in regulating the epigenetic status of the X chromosome, as shown for Xist, its antisense Tsix, Jpx, and recently suggested for Ftx. Thus, the appearance of the Xic, which occurred after the divergence between eutherians and marsupials, was the basis for the evolution of random X inactivation as a strategy to achieve dosage compensation. Copyright © 2011. Published by Elsevier Masson SAS.
Nowacka-Woszuk, Joanna; Switonski, Marek
2009-01-01
The sex determination process is under the control of several genes of which two (SRY and SOX9), encoding transcription factors, play a crucial role. It is well-known that mutations at these genes may cause the development of an intersexual phenotype. The aim of this study was to conduct a comparative analysis of the coding sequence and 5'-flanking regions of both genes in four species of the family Canidae (the dog, red fox, arctic fox and Chinese raccoon dog). Similarity of the coding sequence of the SOX9 gene among the studied species was higher (99.7-99.9%) than in the case of the SRY gene (96.7-97.3%). Only single nucleotide changes were found in the compared coding sequences, whereas in the 5'-flanking region of both genes nucleotide substitutions, as well as insertions and deletions were observed. None of the changes detected in the 5'-flanking region occurred within the potential consensus sequences for transcription factors. No polymorphism was found for either of these genes in any of the analyzed species.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.
Mehmood, Tahir; Bohlin, Jon; Snipen, Lars
2015-01-01
The upstream region of coding genes is important for several reasons, for instance for locating transcription factor binding sites and initiation start sites in genomic DNA. Motivated by a recent study in which a multivariate approach was successfully applied to coding sequence modeling, we introduce a partial least squares (PLS) based procedure for classifying true upstream prokaryotic sequences against background upstream sequences. The upstream sequences of coding genes conserved across genomes were considered in the analysis, where conserved coding genes were found using the pan-genomics concept for each prokaryotic species considered. PLS uses a position-specific scoring matrix (PSSM) to study the characteristics of the upstream region. Results obtained by the PLS-based method were compared with the Gini importance of random forest (RF) and with support vector machine (SVM), a widely used method for sequence classification. Upstream sequence classification performance was evaluated by cross-validation, and the suggested approach identifies prokaryotic upstream regions significantly better than RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
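As an illustration of the PSSM idea mentioned in the abstract above, the following is a minimal sketch, not the authors' procedure: it builds a log-odds position-specific scoring matrix from a handful of aligned sequences and scores candidates against it. The function names (`build_pssm`, `score`), the pseudocount, the uniform background of 0.25, and the toy sequences are all assumptions made for the example; the PLS classification step itself is not shown.

```python
import math

BASES = "ACGT"

def build_pssm(aligned, pseudocount=1.0, background=0.25):
    """Return one dict of log2-odds scores per alignment column.

    `aligned` is a list of equal-length DNA strings (A, C, G, T only).
    """
    length = len(aligned[0])
    pssm = []
    for pos in range(length):
        column = [seq[pos] for seq in aligned]
        # Pseudocounts keep unseen bases from producing log(0).
        total = len(column) + pseudocount * len(BASES)
        scores = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, seq):
    """Sum the per-position log-odds scores of `seq` under `pssm`."""
    return sum(col[base] for col, base in zip(pssm, seq))

# Hypothetical toy alignment of upstream motifs, for illustration only.
upstream = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(upstream)

# A motif-like sequence should outscore an unrelated one.
print(score(pssm, "TATAAT") > score(pssm, "GCGCGC"))
```

In a classification setting, per-position scores of this kind would typically serve as input features; how the original study encodes them for PLS is not stated in the abstract.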
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine- and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Gritz, L; Davies, J
1983-11-01
The plasmid-borne gene hph coding for hygromycin B phosphotransferase (HPH) in Escherichia coli has been identified and its nucleotide sequence determined. The hph gene is 1026 nucleotides long, coding for a protein with a predicted Mr of 39 000. The hph gene was placed in a shuttle plasmid vector, downstream from the promoter region of the cyc 1 gene of Saccharomyces cerevisiae, and an hph construction containing a single AUG in the 5' noncoding region allowed direct selection following transformation in yeast and in E. coli. Thus the hph gene can be used in cloning vectors for both pro- and eukaryotes.
Multiple copies of a bile acid-inducible gene in Eubacterium sp. strain VPI 12708.
Gopal-Srivastava, R; Mallonee, D H; White, W B; Hylemon, P B
1990-01-01
Eubacterium sp. strain VPI 12708 is an anaerobic intestinal bacterium which possesses inducible bile acid 7-dehydroxylation activity. Several new polypeptides are produced in this strain following induction with cholic acid. Genes coding for two copies of a bile acid-inducible 27,000-dalton polypeptide (baiA1 and baiA2) have been previously cloned and sequenced. We now report on a gene coding for a third copy of this 27,000-dalton polypeptide (baiA3). The baiA3 gene has been cloned in lambda DASH on an 11.2-kilobase DNA fragment from a partial Sau3A digest of the Eubacterium DNA. DNA sequence analysis of the baiA3 gene revealed 100% homology with the baiA1 gene within the coding region of the 27,000-dalton polypeptides. The baiA2 gene shares 81% sequence identity with the other two genes at the nucleotide level. The flanking nucleotide sequences associated with the baiA1 and baiA3 genes are identical for 930 bases in the 5' direction from the initiation codon and for at least 325 bases in the 3' direction from the stop codon, including the putative promoter regions for the genes. An additional open reading frame (occupying from 621 to 648 bases, depending on the correct start codon) was found in the identical 5' regions associated with the baiA1 and baiA3 clones. The 5' sequence 930 bases upstream from the baiA1 and baiA3 genes was totally divergent. The baiA2 gene, which is part of a large bile acid-inducible operon, showed no homology with the other two genes either in the 5' or 3' direction from the polypeptide coding region, except for a 15-base-pair presumed ribosome-binding site in the 5' region. These studies strongly suggest that a gene duplication (baiA1 and baiA3) has occurred and is stably maintained in this bacterium. PMID:2376563
Fan, SiGang; Hu, ChaoQun; Wen, Jing; Zhang, LvPing
2011-05-01
The complete mitochondrial DNA sequence contains useful information for phylogenetic analyses of metazoa. In this study, the complete mitochondrial DNA sequence of sea cucumber Stichopus horrens (Holothuroidea: Stichopodidae: Stichopus) is presented. The complete sequence was determined using normal and long PCRs. The mitochondrial genome of Stichopus horrens is a circular molecule 16257 bps long, composed of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes. Most of these genes are coded on the heavy strand except for one protein-coding gene (nad6) and five tRNA genes (tRNA ( Ser(UCN) ), tRNA ( Gln ), tRNA ( Ala ), tRNA ( Val ), tRNA ( Asp )) which are coded on the light strand. The composition of the heavy strand is 30.8% A, 23.7% C, 16.2% G, and 29.3% T bases (AT skew=0.025; GC skew=-0.188). A non-coding region of 675 bp was identified as a putative control region because of its location and AT richness. The intergenic spacers range from 1 to 50 bp in size, totaling 227 bp. A total of 25 overlapping nucleotides, ranging from 1 to 10 bp in size, exist among 11 genes. All 13 protein-coding genes are initiated with an ATG. The TAA codon is used as the stop codon in all the protein coding genes except nad3 and nad4 that use TAG as their termination codon. The most frequently used amino acids are Leu (16.29%), Ser (10.34%) and Phe (8.37%). All of the tRNA genes have the potential to fold into typical cloverleaf secondary structures. We also compared the order of the genes in the mitochondrial DNA from the five holothurians that are now available and found a novel gene arrangement in the mitochondrial DNA of Stichopus horrens.
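The skew values quoted in the abstract above can be reproduced from the reported heavy-strand composition. The sketch below assumes the conventional definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); the abstract does not state which formula was used, and the helper name `skew` is purely illustrative.

```python
def skew(x: float, y: float) -> float:
    """Return the strand skew (x - y) / (x + y) for two base frequencies."""
    return (x - y) / (x + y)

# Heavy-strand base composition (percent) as reported in the abstract.
composition = {"A": 30.8, "C": 23.7, "G": 16.2, "T": 29.3}

at_skew = skew(composition["A"], composition["T"])  # (A - T) / (A + T)
gc_skew = skew(composition["G"], composition["C"])  # (G - C) / (G + C)

print(f"AT skew = {at_skew:.3f}")
print(f"GC skew = {gc_skew:.3f}")
```

Under these definitions the computed values agree with the reported AT skew of 0.025 and GC skew of -0.188 to three decimal places.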
Scaling features of noncoding DNA
NASA Technical Reports Server (NTRS)
Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.
1999-01-01
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
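To make the redundancy measure in the abstract above concrete, here is a minimal sketch under stated assumptions. It takes redundancy as R = 1 - H_n / (n log2 k), where H_n is the Shannon entropy of overlapping n-tuples and k is the alphabet size; the review's exact definition is not given in the abstract, so this form is an assumption. The purine-pyrimidine rule maps A and G to one symbol and C and T to the other. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter

PURINES = set("AG")

def to_binary(seq: str) -> str:
    """Map purines (A, G) to '1' and every other character to '0'.

    Assumes the input contains only A, C, G, T, so '0' means pyrimidine.
    """
    return "".join("1" if base in PURINES else "0" for base in seq.upper())

def redundancy(symbols: str, n: int, alphabet_size: int) -> float:
    """Return 1 - H_n / (n * log2(alphabet_size)) over overlapping n-tuples."""
    tuples = [symbols[i:i + n] for i in range(len(symbols) - n + 1)]
    counts = Counter(tuples)
    total = len(tuples)
    # Shannon entropy (bits) of the observed n-tuple distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - entropy / (n * math.log2(alphabet_size))

toy = "ATATATATGCGCGCGC"  # hypothetical toy sequence, not real data
print(redundancy(to_binary(toy), n=2, alphabet_size=2))
```

Because the binary mapping pools G with A and C with T, the resulting symbol frequencies do not depend on CG content in the way the four-letter frequencies do, which is the property the abstract relies on. Comparing coding and noncoding sequences would require applying the same function to real annotated data.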
McGuire, Austen B; Rafi, Syed K; Manzardo, Ann M; Butler, Merlin G
2016-05-05
Mammalian chromosomes are comprised of complex chromatin architecture with the specific assembly and configuration of each chromosome influencing gene expression and function in yet undefined ways by varying degrees of heterochromatinization that result in Giemsa (G) negative euchromatic (light) bands and G-positive heterochromatic (dark) bands. We carried out morphometric measurements of high-resolution chromosome ideograms for the first time to characterize the total euchromatic and heterochromatic chromosome band length, distribution and localization of 20,145 known protein-coding genes, 790 recognized autism spectrum disorder (ASD) genes and 365 obesity genes. The individual lengths of G-negative euchromatin and G-positive heterochromatin chromosome bands were measured in millimeters and recorded from scaled and stacked digital images of 850-band high-resolution ideograms supplied by the International Society of Chromosome Nomenclature (ISCN) 2013. Our overall measurements followed established banding patterns based on chromosome size. G-negative euchromatic band regions contained 60% of protein-coding genes while the remaining 40% were distributed across the four heterochromatic dark band sub-types. ASD genes were disproportionately overrepresented in the darker heterochromatic sub-bands, while the obesity gene distribution pattern did not significantly differ from protein-coding genes. Our study supports recent trends implicating genes located in heterochromatin regions playing a role in biological processes including neurodevelopment and function, specifically genes associated with ASD.
RPS8—a New Informative DNA Marker for Phylogeny of Babesia and Theileria Parasites in China
Tian, Zhan-Cheng; Liu, Guang-Yuan; Yin, Hong; Luo, Jian-Xun; Guan, Gui-Quan; Luo, Jin; Xie, Jun-Ren; Shen, Hui; Tian, Mei-Yuan; Zheng, Jin-feng; Yuan, Xiao-song; Wang, Fang-fang
2013-01-01
Piroplasmosis is a serious debilitating and sometimes fatal disease. Phylogenetic relationships within piroplasmida are complex and remain unclear. We compared the intron–exon structure and DNA sequences of the RPS8 gene from Babesia and Theileria spp. isolates in China. Similar to 18S rDNA, the 40S ribosomal protein S8 gene, RPS8, including both coding and non-coding regions is a useful and novel genetic marker for defining species boundaries and for inferring phylogenies because it tends to have little intra-specific variation but considerable inter-specific difference. However, more samples are needed to verify the usefulness of the RPS8 (coding and non-coding regions) gene as a marker for the phylogenetic position and detection of most Babesia and Theileria species, particularly for some closely related species. PMID:24244571
Schmouth, Jean-François; Castellarin, Mauro; Laprise, Stéphanie; Banks, Kathleen G; Bonaguro, Russell J; McInerny, Simone C; Borretta, Lisa; Amirabbasi, Mahsa; Korecki, Andrea J; Portales-Casamar, Elodie; Wilson, Gary; Dreolini, Lisa; Jones, Steven J M; Wasserman, Wyeth W; Goldowitz, Daniel; Holt, Robert A; Simpson, Elizabeth M
2013-10-14
The next big challenge in human genetics is understanding the 98% of the genome that comprises non-coding DNA. Hidden in this DNA are sequences critical for gene regulation, and new experimental strategies are needed to understand the functional role of gene-regulation sequences in health and disease. In this study, we build upon our HuGX ('high-throughput human genes on the X chromosome') strategy to expand our understanding of human gene regulation in vivo. In all, ten human genes known to express in therapeutically important brain regions were chosen for study. For eight of these genes, human bacterial artificial chromosome clones were identified, retrofitted with a reporter, knocked single-copy into the Hprt locus in mouse embryonic stem cells, and mouse strains derived. Five of these human genes expressed in mouse, and all expressed in the adult brain region for which they were chosen. This defined the boundaries of the genomic DNA sufficient for brain expression, and refined our knowledge regarding the complexity of gene regulation. We also characterized for the first time the expression of human MAOA and NR2F2, two genes for which the mouse homologs have been extensively studied in the central nervous system (CNS), and AMOTL1 and NOV, for which roles in CNS have been unclear. We have demonstrated the use of the HuGX strategy to functionally delineate non-coding-regulatory regions of therapeutically important human brain genes. Our results also show that a careful investigation, using publicly available resources and bioinformatics, can lead to accurate predictions of gene expression.
Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong
2017-01-01
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
The complete mitochondrial genome of the Border Collie dog.
Wu, An-Quan; Zhang, Yong-Liang; Li, Li-Li; Chen, Long; Yang, Tong-Wen
2016-01-01
The Border Collie is one of the best-known dog breeds. In the present work we report the complete mitochondrial genome sequence of the Border Collie for the first time. The total length of the mitogenome was 16,730 bp, with a base composition of 31.6% A, 28.7% T, 25.5% C, and 14.2% G, showing an A-T-rich (60.3%) feature. It harbored 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and one non-coding control region (D-loop region). The arrangement of all genes was identical to that of typical dog mitochondrial genomes.
Complete mitochondrial genome of Eagle Owl (Bubo bubo, Strigiformes; Strigidae) from China.
Hengjiu, Tian; Jianwei, Ji; Shi, Yang; Zhiming, Zhang; Laghari, Muhammad Younis; Narejo, Naeem Tariq; Lashari, Punhal
2016-01-01
In the present study, the complete mitochondrial genome sequence of Bubo bubo was obtained for the first time by PCR amplification, sequencing and assembly. The total length of the mitochondrial genome was 16,250 bp, with the base composition of 29.88% A, 34.16% C, 14.35% G, and 21.58% T. It contained 37 genes (2 ribosomal RNA genes, 13 protein-coding genes and 22 transfer RNA genes) and a major non-coding control region (D-loop region). The complete mitochondrial genome sequence of Bubo bubo provides an important data set for further investigation on the phylogenetic relationships within Strigiformes.
Wang, Jiajia; Li, Hu; Dai, Renhuai
2017-12-01
Here, we describe the first complete mitochondrial genome (mitogenome) sequence of the leafhopper Taharana fasciana (Coelidiinae). The mitogenome sequence contains 15,161 bp with an A + T content of 77.9%. It includes 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and one non-coding (A + T-rich) region; in addition, a repeat region is also present (GenBank accession no. KY886913). These genes/regions are in the same order as in the inferred insect ancestral mitogenome. All protein-coding genes have ATN as the start codon, and TAA or single T as the stop codons, except the gene ND3, which ends with TAG. Furthermore, we predicted the secondary structures of the rRNAs in T. fasciana. Six domains (domain III is absent in arthropods) and 41 helices were predicted for 16S rRNA, and 12S rRNA comprised three structural domains and 24 helices. Phylogenetic tree analysis confirmed that T. fasciana and other members of the Cicadellidae are clustered into a clade, and it identified the relationships among the subfamilies Deltocephalinae, Coelidiinae, Idiocerinae, Cicadellinae, and Typhlocybinae.
Ming-Xing, Lu; Zhi-Teng, Chen; Wei-Wei, Yu; Yu-Zhou, Du
2017-03-01
We report the complete mitochondrial genome (mitogenome) of a spiraling whitefly, Aleurodicus dispersus (Hemiptera: Aleyrodidae). The 16 170 bp long genome consists of 13 protein-coding genes, 20 transfer RNAs, 2 ribosomal RNAs, and a control region. The A. dispersus mitogenome also includes a cytb-like non-coding region and shows several variations relative to the typical insect mitogenome. A phylogenetic tree has been constructed using the 13 protein-coding genes of 12 related species from Hemiptera. Our results would contribute to further study of phylogeny in Aleyrodidae and Hemiptera.
Juul, Malene; Bertl, Johanna; Guo, Qianyun; Nielsen, Morten Muhlig; Świtnicki, Michał; Hornshøj, Henrik; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou
2017-01-01
Non-coding mutations may drive cancer development. Statistical detection of non-coding driver regions is challenged by a varying mutation rate and uncertainty of functional impact. Here, we develop a statistically founded non-coding driver-detection method, ncdDetect, which includes sample-specific mutational signatures, long-range mutation rate variation, and position-specific impact measures. Using ncdDetect, we screened non-coding regulatory regions of protein-coding genes across a pan-cancer set of whole-genomes (n = 505), which top-ranked known drivers and identified new candidates. For individual candidates, presence of non-coding mutations associates with altered expression or decreased patient survival across an independent pan-cancer sample set (n = 5454). This includes an antigen-presenting gene (CD1A), where 5’UTR mutations correlate significantly with decreased survival in melanoma. Additionally, mutations in a base-excision-repair gene (SMUG1) correlate with a C-to-T mutational-signature. Overall, we find that a rich model of mutational heterogeneity facilitates non-coding driver identification and integrative analysis points to candidates of potential clinical relevance. DOI: http://dx.doi.org/10.7554/eLife.21778.001 PMID:28362259
Galián, José A; Rosato, Marcela; Rosselló, Josep A
2014-03-01
Multigene families have provided opportunities for evolutionary biologists to assess molecular evolution processes and phylogenetic reconstructions at deep and shallow systematic levels. However, the use of these markers is not free of technical and analytical challenges. Many evolutionary studies that used the nuclear 5S rDNA gene family rarely used contiguous 5S coding sequences due to the routine use of head-to-tail polymerase chain reaction primers that are anchored to the coding region. Moreover, the 5S coding sequences have been concatenated with independent, adjacent gene units in many studies, creating simulated chimeric genes as the raw data for evolutionary analysis. This practice is based on the tacitly assumed, but rarely tested, hypothesis that strict intra-locus concerted evolution processes are operating in 5S rDNA genes, without any empirical evidence as to whether it holds for the recovered data. The potential pitfalls of analysing the patterns of molecular evolution and reconstructing phylogenies based on these chimeric genes have not been assessed to date. Here, we compared the sequence integrity and phylogenetic behavior of entire versus concatenated 5S coding regions from a real data set obtained from closely related plant species (Medicago, Fabaceae). Our results suggest that within arrays sequence homogenization is partially operating in the 5S coding region, which is traditionally assumed to be highly conserved. Consequently, concatenating 5S genes increases haplotype diversity, generating novel chimeric genotypes that most likely do not exist within the genome. In addition, the patterns of gene evolution are distorted, leading to incorrect haplotype relationships in some evolutionary reconstructions.
Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai
2016-05-01
In this study we sequenced, for the first time, the complete mitochondrial genome of a heart failure model, the cardiomyopathic Syrian hamster (Mesocricetus auratus). The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
Vicente, Juan J; Galardi-Castilla, María; Escalante, Ricardo; Sastre, Leandro
2008-01-03
The social amoeba Dictyostelium discoideum executes a multicellular development program upon starvation. This morphogenetic process requires the differential regulation of a large number of genes and is coordinated by extracellular signals. The MADS-box transcription factor SrfA is required for several stages of development, including slug migration and spore terminal differentiation. Subtractive hybridization allowed the isolation of a gene, sigN (SrfA-induced gene N), that was dependent on the transcription factor SrfA for expression at the slug stage of development. Homology searches detected the existence of a large family of sigN-related genes in the Dictyostelium discoideum genome. The 13 most similar genes are grouped in two regions of chromosome 2 and have been named Group1 and Group2 sigN genes. The putative encoded proteins are 87-89 amino acids long. All these genes have a similar structure, composed of a first exon containing a 13-nucleotide-long open reading frame and a second exon comprising the remainder of the putative coding region. The expression of these genes is induced at 10 hours of development. Analyses of their promoter regions indicate that these genes are expressed in the prestalk region of developing structures. The addition of antibodies raised against SigN Group 2 proteins induced disintegration of multi-cellular structures at the mound stage of development. A large family of genes coding for small proteins has been identified in D. discoideum. Two groups of very similar genes from this family have been shown to be specifically expressed in prestalk cells during development. Functional studies using antibodies raised against Group 2 SigN proteins indicate that these genes could play a role during multicellular development.
Lafuente, M J; Petit, T; Gancedo, C
1997-12-22
We have constructed a series of plasmids to facilitate the fusion of promoters with or without coding regions of genes of Schizosaccharomyces pombe to the lacZ gene of Escherichia coli. These vectors carry a multiple cloning region in which fission yeast DNA may be inserted in three different reading frames with respect to the coding region of lacZ. The plasmids were constructed with the ura4+ or the his3+ marker of S. pombe. Functionality of the plasmids was tested measuring in parallel the expression of fructose 1,6-bisphosphatase and beta-galactosidase under the control of the fbp1+ promoter in different conditions.
Complete mitochondrial genome of the Tyto longimembris (Strigiformes: Tytonidae).
Xu, Peng; Li, Yankuo; Miao, Lujun; Xie, Guangyong; Huang, Yan
2016-07-01
The complete mitochondrial genome of Tyto longimembris has been determined in this study. It is 18,466 bp in length and consists of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (rRNA) genes and a non-coding control region (D-loop). The overall base composition of the heavy strand of the T. longimembris mitochondrial genome is A: 30.1%, T: 23.5%, C: 31.8% and G: 14.6%. The control region is characterized by tandem repeats: two clearly separated clusters of tandem repeats were found. This study provides an important data set for phylogenetic and taxonomic analyses of Tyto species.
Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S
1999-01-01
A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." Milne 1926. PMID:10471707
González, Carolina; Tabernero, David; Cortese, Maria Francesca; Gregori, Josep; Casillas, Rosario; Riveiro-Barciela, Mar; Godoy, Cristina; Sopena, Sara; Rando, Ariadna; Yll, Marçal; Lopez-Martinez, Rosa; Quer, Josep; Esteban, Rafael; Buti, Maria; Rodríguez-Frías, Francisco
2018-05-21
To detect hyper-conserved regions in the hepatitis B virus (HBV) X gene ( HBX ) 5' region that could be candidates for gene therapy. The study included 27 chronic hepatitis B treatment-naive patients in various clinical stages (from chronic infection to cirrhosis and hepatocellular carcinoma, both HBeAg-negative and HBeAg-positive), and infected with HBV genotypes A-F and H. In a serum sample from each patient with viremia > 3.5 log IU/mL, the HBX 5' end region [nucleotide (nt) 1255-1611] was PCR-amplified and submitted to next-generation sequencing (NGS). We assessed genotype variants by phylogenetic analysis, and evaluated conservation of this region by calculating the information content of each nucleotide position in a multiple alignment of all unique sequences (haplotypes) obtained by NGS. Conservation at the HBx protein amino acid (aa) level was also analyzed. NGS yielded 1333069 sequences from the 27 samples, with a median of 4578 sequences/sample (2487-9279, IQR 2817). In 14/27 patients (51.8%), phylogenetic analysis of viral nucleotide haplotypes showed a complex mixture of genotypic variants. Analysis of the information content in the haplotype multiple alignments detected 2 hyper-conserved nucleotide regions, one in the HBX upstream non-coding region (nt 1255-1286) and the other in the 5' end coding region (nt 1519-1603). This last region coded for a conserved amino acid region (aa 63-76) that partially overlaps a Kunitz-like domain. Two hyper-conserved regions detected in the HBX 5' end may be of value for targeted gene therapy, regardless of the patients' clinical stage or HBV genotype.
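The per-position information content used above to find hyper-conserved regions can be illustrated with a minimal Python sketch. It assumes the common definition (maximum entropy of the alphabet minus the Shannon entropy of the column, in bits) and an unweighted alignment; the function name column_information is hypothetical, and the abstract does not say whether the authors weighted haplotypes by frequency or applied a small-sample correction.

```python
import math
from collections import Counter


def column_information(alignment, alphabet="ACGT"):
    """Return the information content (bits) of each alignment column.

    Assumed definition: log2(len(alphabet)) minus the Shannon entropy of
    the column. Characters outside the alphabet (gaps, N) are ignored.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")

    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] in alphabet)
        total = sum(counts.values())
        if total == 0:
            info.append(0.0)  # column holds only gaps or ambiguous bases
            continue
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Toy alignment (made up for illustration, not HBV data).
toy = ["ACGT", "ACGA", "ACCT", "ACTC"]
print(column_information(toy))
```

A fully conserved column scores 2 bits for a four-letter alphabet, so a run of consecutive positions at or near 2 bits is what a "hyper-conserved region" would look like under this definition.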
Systematic screening for mutations in the promoter and the coding region of the 5-HT1A gene
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erdmann, J.; Shimron-Abarbanell, D.; Cichon, S.
1995-10-09
In the present study we sought to identify genetic variation in the 5-HT1A receptor gene which through alteration of protein function or level of expression might contribute to the genetic predisposition to neuropsychiatric diseases. Genomic DNA samples from 159 unrelated subjects (including 45 schizophrenic, 46 bipolar affective, and 43 patients with Tourette's syndrome, as well as 25 healthy controls) were investigated by single-strand conformation analysis. Overlapping PCR (polymerase chain reaction) fragments covered the whole coding sequence as well as the 5' untranslated region of the 5-HT1A gene. The region upstream to the coding sequence we investigated contains a functional promoter. We found two rare nucleotide sequence variants. Both mutations are located in the coding region of the gene: a coding mutation (A→G) in nucleotide position 82 which leads to an amino acid exchange (Ile→Val) in position 28 of the receptor protein and a silent mutation (C→T) in nucleotide position 549. The occurrence of the Ile-28-Val substitution was studied in an extended sample of patients (n = 352) and controls (n = 210) but was found in similar frequencies in all groups. Thus, this mutation is unlikely to play a significant role in the genetic predisposition to the diseases investigated. In conclusion, our study does not provide evidence that the 5-HT1A gene plays either a major or a minor role in the genetic predisposition to schizophrenia, bipolar affective disorder, or Tourette's syndrome. 29 refs., 4 figs., 1 tab.
Takagi, M; Kobayashi, N; Sugimoto, M; Fujii, T; Watari, J; Yano, K
1987-01-01
The expression of a LEU gene from Candida maltosa (designated as C-LEU2) isolated previously (Kawamura et al. 1983) was shown to be regulated, when transferred into Saccharomyces cerevisiae, by leucine and threonine in the medium, as in the case of LEU2 gene of S. cerevisiae. The coding region together with the regulatory region was subcloned and the nucleotide sequence was determined. When the sequence of the coding region was compared with that of LEU2, the homology was 72% for base pairs and 76% for deduced amino acids. Comparison of the regulatory region of C-LEU2 with those of LEU1 and LEU2 suggested a few short consensus sequences which are involved in regulation of gene expression by leucine and threonine in the medium.
Dual CRISPR-Cas9 Cleavage Mediated Gene Excision and Targeted Integration in Yarrowia lipolytica.
Gao, Difeng; Smith, Spencer; Spagnuolo, Michael; Rodriguez, Gabriel; Blenner, Mark
2018-05-29
CRISPR-Cas9 technology has been successfully applied in Yarrowia lipolytica for targeted genomic editing including gene disruption and integration; however, disruptions by existing methods typically result from small frameshift mutations caused by indels within the coding region, which usually result in an unnatural protein. In this study, a dual cleavage strategy directed by paired sgRNAs is developed for gene knockout. This method allows fast and robust gene excision, demonstrated on six genes of interest. The targeted regions for excision vary in length from 0.3 kb up to 3.5 kb and contain both non-coding and coding regions. The majority of the gene excisions are repaired by perfect nonhomologous end-joining without indels. Based on this dual cleavage system, two targeted markerless integration methods are developed by providing repair templates. While both strategies are effective, the homology-mediated end joining (HMEJ)-based method is twice as efficient as the homologous recombination (HR)-based method. In both cases, dual cleavage leads to similar or improved gene integration efficiencies compared to gene excision without integration. This dual cleavage strategy will be useful not only for generating more predictable and robust gene knockouts, but also for efficient targeted markerless integration, and simultaneous knockout and integration in Y. lipolytica. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Sugai, Akihiro; Sato, Hiroki; Yoneda, Misako; Kai, Chieko
2017-08-01
The regulation of transcription during Nipah virus (NiV) replication is poorly understood. Using a bicistronic minigenome system, we investigated the involvement of non-coding regions (NCRs) in the transcriptional re-initiation efficiency of NiV RNA polymerase. Reporter assays revealed that attenuation of NiV gene expression was not constant at each gene junction, and that the attenuating property was controlled by the 3' NCR. However, this regulation was independent of the gene-end, gene-start and intergenic regions. Northern blot analysis indicated that regulation of viral gene expression by the phosphoprotein (P) and large protein (L) 3' NCRs occurred at the transcription level. We identified uridine-rich tracts within the L 3' NCR that are similar to gene-end signals. These gene-end-like sequences were recognized as weak transcription termination signals by the viral RNA polymerase, thereby reducing downstream gene transcription. Thus, we suggest that NiV has a unique mechanism of transcriptional regulation. Copyright © 2017 Elsevier Inc. All rights reserved.
Shao, Renfu; Barker, Stephen C
2011-02-15
The mitochondrial (mt) genome of the human body louse, Pediculus humanus, consists of 18 minichromosomes. Each minichromosome is 3 to 4 kb long and has 1 to 3 genes. There is unequivocal evidence for recombination between different mt minichromosomes in P. humanus. It is not known, however, how these minichromosomes recombine. Here, we report the discovery of eight chimeric mt minichromosomes in P. humanus. We classify these chimeric mt minichromosomes into two groups: Group I and Group II. Group I chimeric minichromosomes contain parts of two different protein-coding genes that are from different minichromosomes. The two parts of protein-coding genes in each Group I chimeric minichromosome are joined at a microhomologous nucleotide sequence; microhomologous nucleotide sequences are hallmarks of non-homologous recombination. Group II chimeric minichromosomes contain all of the genes and the non-coding regions of two different minichromosomes. The conserved sequence blocks in the non-coding regions of Group II chimeric minichromosomes resemble the "recombination repeats" in the non-coding regions of the mt genomes of higher plants. These repeats are essential to homologous recombination in higher plants. Our analyses of the nucleotide sequences of chimeric mt minichromosomes indicate both homologous and non-homologous recombination between minichromosomes in the mitochondria of the human body louse. Copyright © 2010 Elsevier B.V. All rights reserved.
Next generation sequencing and analysis of a conserved transcriptome of New Zealand's kiwi.
Subramanian, Sankar; Huynen, Leon; Millar, Craig D; Lambert, David M
2010-12-15
Kiwi is a highly distinctive, flightless and endangered ratite bird endemic to New Zealand. To understand the patterns of molecular evolution of the nuclear protein-coding genes in brown kiwi (Apteryx australis mantelli) and to determine the timescale of avian history we sequenced a transcriptome obtained from a kiwi embryo using next generation sequencing methods. We then assembled the conserved protein-coding regions using the chicken proteome as a scaffold. Using 1,543 conserved protein coding genes we estimated the neutral evolutionary divergence between the kiwi and chicken to be ~45%, which is approximately equal to the divergence computed for the human-mouse pair using the same set of genes. A large fraction of genes was found to be under high selective constraint, as most of the expressed genes appeared to be involved in developmental gene regulation. Our study suggests a significant relationship between gene expression levels and protein evolution. Using sequences from over 700 nuclear genes we estimated the divergence between the two basal avian groups, Palaeognathae and Neognathae to be 132 million years, which is consistent with previous studies using mitochondrial genes. The results of this investigation revealed patterns of mutation and purifying selection in conserved protein coding regions in birds. Furthermore this study suggests a relatively cost-effective way of obtaining a glimpse into the fundamental molecular evolutionary attributes of a genome, particularly when no closely related genomic sequence is available.
Identification of coding and non-coding mutational hotspots in cancer genomes.
Piraino, Scott W; Furney, Simon J
2017-01-05
The identification of mutations that play a causal role in tumour development, so called "driver" mutations, is of critical importance for understanding how cancers form and how they might be treated. Several large cancer sequencing projects have identified genes that are recurrently mutated in cancer patients, suggesting a role in tumourigenesis. While the landscape of coding drivers has been extensively studied and many of the most prominent driver genes are well characterised, comparatively less is known about the role of mutations in the non-coding regions of the genome in cancer development. The continuing fall in genome sequencing costs has resulted in a concomitant increase in the number of cancer whole genome sequences being produced, facilitating systematic interrogation of both the coding and non-coding regions of cancer genomes. To examine the mutational landscapes of tumour genomes we have developed a novel method to identify mutational hotspots in tumour genomes using both mutational data and information on evolutionary conservation. We have applied our methodology to over 1300 whole cancer genomes and show that it identifies prominent coding and non-coding regions that are known or highly suspected to play a role in cancer. Importantly, we applied our method to the entire genome, rather than relying on predefined annotations (e.g. promoter regions) and we highlight recurrently mutated regions that may have resulted from increased exposure to mutational processes rather than selection, some of which have been identified previously as targets of selection. Finally, we implicate several pan-cancer and cancer-specific candidate non-coding regions, which could be involved in tumourigenesis. We have developed a framework to identify mutational hotspots in cancer genomes, which is applicable to the entire genome. 
This framework identifies known and novel coding and non-coding mutational hotspots and can be used to differentiate candidate driver regions from likely passenger regions susceptible to somatic mutation.
Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).
Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping
2015-01-01
We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of the C. sphinx complete mitochondrial genome was 16,895 bp, with a base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long, with the sequence CATACG repeated 64 times. Three protein-coding genes (ND1, COI and ND4) ended with an incomplete stop codon, TA or T.
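Base-composition figures like those reported for the mitogenomes in this listing come from a simple count over the sequence. The Python sketch below shows one way to compute them and to check that a reported composition sums to 100%. The helper names base_composition and at_content are hypothetical, the toy sequence is made up, and only the four percentages in the reported dictionary come from the C. sphinx record above.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in seq (other symbols ignored)."""
    counts = Counter(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T characters")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the combined A+T percentage of seq."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Toy sequence, for illustration only.
print(base_composition("AACGTT"))

# Sanity check on the percentages reported for C. sphinx above.
reported = {"A": 32.54, "G": 14.05, "T": 25.82, "C": 27.59}
print(round(sum(reported.values()), 2))         # should be 100.0
print(round(reported["A"] + reported["T"], 2))  # implied A+T content
```

The same check applied to the other mitogenome records in this listing is a quick way to spot transcription errors in reported compositions.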
Evaluation of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy.
Stabej, Polona; Leegwater, Peter A; Stokhof, Arnold A; Domanjko-Petric, Aleksandra; van Oost, Bernard A
2005-03-01
To evaluate the role of the phospholamban gene in purebred large-breed dogs with dilated cardiomyopathy (DCM). 6 dogs with DCM, including 2 Doberman Pinschers, 2 Newfoundlands, and 2 Great Danes. All dogs had clinical signs of congestive heart failure, and a diagnosis of DCM was made on the basis of echocardiographic findings. Blood samples were collected from each dog, and genomic DNA was isolated by a salt extraction method. Specific oligonucleotides were designed to amplify the promoter, exon 1, the 5'-part of exon 2 including the complete coding region, and part of intron 1 of the canine phospholamban gene via polymerase chain reaction procedures. These regions were screened for mutations in DNA obtained from the 6 dogs with DCM. No mutations were identified in the promoter, 5' untranslated region, part of intron 1, part of the 3' untranslated region, and the complete coding region of the phospholamban gene in dogs with DCM. Results indicate that mutations in the phospholamban gene are not a frequent cause of DCM in Doberman Pinschers, Newfoundlands, and Great Danes.
The complete sequence of mitochondrial genome of polled yak (Bos grunniens).
Chu, Min; Wu, Xiaoyun; Liang, Chunnian; Pei, Jie; Ding, Xuezhi; Guo, Xian; Bao, Pengjia; Yan, Ping
2016-05-01
Generally speaking, the hornless trait is also known as polled. Although the POLL locus could be assigned to a 1.36-Mb interval in the centromeric region of BTA1 (Georges et al., 1993; Drögemüller et al., 2005), and Liu et al. (2014) reported that a 147-kb segment including three protein-coding genes was the most likely location of the POLL mutation in domestic yaks, the underlying genetic basis for the polled trait is still unknown. In this work, the complete mitochondrial genome sequence of polled yak was determined for the first time. The total length of the mitogenome is 16,324 bp, with a base composition of 33.72% A, 27.25% T, 25.83% C, and 13.20% G. It contained 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 non-coding region (D-loop region). The gene order of the polled yak mitogenome is identical to that observed in most other vertebrates. The complete mitogenome sequence information of polled yak will provide useful data for further studies on protection of genetic resources and phylogenetic relationships within Bos grunniens.
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
Elisaphenko, Eugeny A.; Kolesnikov, Nikolay N.; Shevchenko, Alexander I.; Rogozin, Igor B.; Nesterova, Tatyana B.; Brockdorff, Neil; Zakian, Suren M.
2008-01-01
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA. PMID:18575625
Diehl, William E.; Johnson, Welkin E.; Hunter, Eric
2013-01-01
All genes in the TRIM6/TRIM34/TRIM5/TRIM22 locus are type I interferon inducible, with TRIM5 and TRIM22 possessing antiviral properties. Evolutionary studies involving the TRIM6/34/5/22 locus have predominantly focused on the coding sequence of the genes, finding that TRIM5 and TRIM22 have undergone high rates of both non-synonymous nucleotide replacements and in-frame insertions and deletions. We sought to understand if divergent evolutionary pressures on TRIM6/34/5/22 coding regions have selected for modifications in the non-coding regions of these genes and explore whether such non-coding changes may influence the biological function of these genes. The transcribed genomic regions, including the introns, of TRIM6, TRIM34, TRIM5, and TRIM22 from ten Haplorhini primates and one prosimian species were analyzed for transposable element content. In Haplorhini species, TRIM5 displayed an exaggerated interspecies variability, predominantly resulting from changes in the composition of transposable elements in the large first and fourth introns. Multiple lineage-specific endogenous retroviral long terminal repeats (LTRs) were identified in the first intron of TRIM5 and TRIM22. In the prosimian genome, we identified a duplication of TRIM5 with a concomitant loss of TRIM22. The transposable element content of the prosimian TRIM5 genes appears to largely represent the shared Haplorhini/prosimian ancestral state for this gene. Furthermore, we demonstrated that one such differentially fixed LTR provides for species-specific transcriptional regulation of TRIM22 in response to p53 activation. Our results identify a previously unrecognized source of species-specific variation in the antiviral TRIM genes, which can lead to alterations in their transcriptional regulation. These observations suggest that there has existed long-term pressure for exaptation of retroviral LTRs in the non-coding regions of these genes. 
This likely resulted from serial viral challenges and provided a mechanism for rapid alteration of transcriptional regulation. To our knowledge, this represents the first report of persistent evolutionary pressure for the capture of retroviral LTR insertions. PMID:23516500
Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F
1985-01-01
Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512
Evaluation of 10 genes encoding cardiac proteins in Doberman Pinschers with dilated cardiomyopathy.
O'Sullivan, M Lynne; O'Grady, Michael R; Pyle, W Glen; Dawson, John F
2011-07-01
To identify a causative mutation for dilated cardiomyopathy (DCM) in Doberman Pinschers by sequencing the coding regions of 10 cardiac genes known to be associated with familial DCM in humans. 5 Doberman Pinschers with DCM and congestive heart failure and 5 control mixed-breed dogs that were euthanized or died. RNA was extracted from frozen ventricular myocardial samples from each dog, and first-strand cDNA was synthesized via reverse transcription, followed by PCR amplification with gene-specific primers. Ten cardiac genes were analyzed: cardiac actin, α-actinin, α-tropomyosin, β-myosin heavy chain, metavinculin, muscle LIM protein, myosin-binding protein C, tafazzin, titin-cap (telethonin), and troponin T. Sequences for DCM-affected and control dogs and the published canine genome were compared. None of the coding sequences yielded a common causative mutation among all Doberman Pinscher samples. However, 3 variants were identified in the α-actinin gene in the DCM-affected Doberman Pinschers. One of these variants, identified in 2 of the 5 Doberman Pinschers, resulted in an amino acid change in the rod-forming triple coiled-coil domain. Mutations in the coding regions of several genes associated with DCM in humans did not appear to consistently account for DCM in Doberman Pinschers. However, an α-actinin variant was detected in some Doberman Pinschers that may contribute to the development of DCM given its potential effect on the structure of this protein. Investigation of additional candidate gene coding and noncoding regions and further evaluation of the role of α-actinin in development of DCM in Doberman Pinschers are warranted.
2014-01-01
The linear-algebraic concept of a subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is rising. Several DNA feature extraction techniques that draw on multiple fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method has been compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is a better and more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
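The 3-base periodicity mentioned above can be illustrated with a short Python sketch of the baseline sliding-DFT idea: map the sequence to four binary indicator sequences (one per base) and sum their squared DFT magnitudes at frequency 1/3 inside a moving window. This is a plain periodogram-style sketch, not the least-norm estimator of the paper. The function names, the default window length of 351 and the step of 3 are assumptions chosen for illustration.

```python
import cmath

# The three cube roots of unity: exp(-2*pi*i*n/3) depends only on n mod 3.
_ROOTS = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]


def period3_power(window):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of the indicator sequences."""
    total = 0.0
    for base in "ACGT":
        coeff = sum(_ROOTS[n % 3] for n, ch in enumerate(window) if ch == base)
        total += abs(coeff) ** 2
    return total


def sliding_period3(seq, window=351, step=3):
    """Return (start, power) pairs for each window position along seq.

    window and step are illustrative defaults, not values from the paper.
    """
    seq = seq.upper()
    return [(start, period3_power(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


# Toy input: a perfectly 3-periodic stretch versus a homopolymer.
print(period3_power("ATG" * 30))  # strong period-3 signal
print(period3_power("A" * 90))    # essentially zero
```

In practice the power profile from sliding_period3 would be plotted along the sequence, with peaks suggesting candidate coding segments; windows are usually chosen as a multiple of 3 so that the frequency 1/3 falls exactly on a DFT bin.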
DOE R&D Accomplishments Database
Liang, X.
1998-06-10
The genome of Methanococcus jannaschii has been sequenced completely and has been found to contain approximately 1,770 predicted protein-coding regions. When these coding regions are expressed and how their expression is regulated, however, remain open questions. In this work, mass spectrometry was combined with two-dimensional gel electrophoresis to identify which proteins the genes produce under different growth conditions, and thus investigate the regulation of genes responsible for functions characteristic of this thermophilic representative of the methanogenic Archaea.
Feng, X; Happ, G M
1996-11-14
The cDNA for Sp23, a structural protein of the spermatophore of Tenebrio molitor, had been previously cloned and characterized (Paesen, G.C., Schwartz, M.B., Peferoen, M., Weyda, F. and Happ, G.M. (1992a) Amino acid sequence of Sp23, a structure protein of the spermatophore of the mealworm beetle, Tenebrio molitor. J. Biol. Chem. 257, 18852-18857). Using the labeled cDNA for Sp23 as a probe to screen a library of genomic DNA from Tenebrio molitor, we isolated a genomic clone for Sp23. A 5373-base pair (bp) restriction fragment containing the Sp23 gene was sequenced. The coding region is separated by a 55-bp intron which is located close to the translation start site. Three putative ecdysone response elements (EcRE) are identified in the 5' flanking region of the Sp23 gene. Comparison of the flanking regions of the Sp23 gene with those of the D-protein gene expressed in the accessory glands of Tenebrio reveals similar sequences present in the flanking regions of the two genes. The genomic organization of the coding region of the Sp23 gene shares similarities with that of the D-protein gene, three Drosophila accessory gland genes and two Drosophila 20-OH ecdysone-responsive genes.
Chen, Zhi-Teng; Du, Yu-Zhou
2015-03-01
The complete mitochondrial genome of the stonefly, Sweltsa longistyla Wu (Plecoptera: Chloroperlidae), was sequenced in this study. The mitogenome of S. longistyla is 16,151 bp and contains 37 genes including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a large non-coding region. S. longistyla, Pteronarcys princeps Banks, Kamimuria wangi Du and Cryptoperla stilifera Sivec belong to the Plecoptera, and the gene order and orientation of their mitogenomes were similar. The overall AT content for the four stoneflies was below 72%, and the AT content of tRNA genes was above 69%. The four genomes were compact and contained only 65-127 bp of non-coding intergenic DNAs. Overlapping nucleotides existed in all four genomes and ranged from 24 (P. princeps) to 178 bp (K. wangi). There was a 7-bp motif (ATGATAA) of overlapping DNA and an 8-bp motif (AAGCCTTA) conserved in three stonefly species (P. princeps, K. wangi and C. stilifera). The control regions of four stoneflies contained a stem-loop structure. Four conserved sequence blocks (CSBs) were present in the A+T-rich regions of all four stoneflies. Copyright © 2014 Elsevier B.V. All rights reserved.
Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae.
Michel, Christian J; Ngoune, Viviane Nguefack; Poch, Olivier; Ripp, Raymond; Thompson, Julie D
2017-12-03
A set X of 20 trinucleotides has been found to have the highest average occurrence in the reading frame, compared to the two shifted frames, of genes of bacteria, archaea, eukaryotes, plasmids and viruses. This set X has an interesting mathematical property, since X is a maximal C3 self-complementary trinucleotide circular code. Furthermore, any motif obtained from this circular code X has the capacity to retrieve, maintain and synchronize the original (reading) frame. Since 1996, the theory of circular codes in genes has mainly been developed by analysing the properties of the 20 trinucleotides of X, using combinatorics and statistical approaches. For the first time, we test this theory by analysing the X motifs, i.e., motifs from the circular code X, in the complete genome of the yeast Saccharomyces cerevisiae. Several properties of X motifs are identified by basic statistics (at the frequency level), and evaluated by comparison to R motifs, i.e., random motifs generated from 30 different random codes R. We first show that the frequency of X motifs is significantly greater than that of R motifs in the genome of S. cerevisiae. We then verify that no significant difference is observed between the frequencies of X and R motifs in the non-coding regions of S. cerevisiae, but that the occurrence number of X motifs is significantly higher than R motifs in the genes (protein-coding regions). This property is true for all cardinalities of X motifs (from 4 to 20) and for all 16 chromosomes. We further investigate the distribution of X motifs in the three frames of S. cerevisiae genes and show that they occur more frequently in the reading frame, regardless of their cardinality or their length. Finally, the ratio of X genes, i.e., genes with at least one X motif, to non-X genes, in the set of verified genes is significantly different to that observed in the set of putative or dubious genes with no experimental evidence. 
These results, taken together, represent the first evidence for a significant enrichment of X motifs in the genes of an extant organism. They raise two hypotheses: the X motifs may be evolutionary relics of the primitive codes used for translation, or they may continue to play a functional role in the complex processes of genome decoding and protein synthesis.
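The frame comparison described in this abstract can be illustrated with a minimal Python sketch. It only counts single trinucleotides of a code in each of the three frames, which is simpler than the motif-level analysis the authors report. The four-trinucleotide set `TOY_CODE` and the sequence `toy_gene` are hypothetical placeholders, not the 20-trinucleotide set X or real yeast data.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every trinucleotide is also in the code."""
    return all(reverse_complement(t) in code for t in code)


def frame_counts(seq, code):
    """Count trinucleotides belonging to `code` in frames 0, 1 and 2 of `seq`."""
    counts = {0: 0, 1: 0, 2: 0}
    for frame in range(3):
        # Step through non-overlapping triplets starting at this frame offset.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in code:
                counts[frame] += 1
    return counts


# Hypothetical toy code and sequence, for illustration only.
TOY_CODE = {"AAC", "GTT", "GAT", "ATC"}
toy_gene = "AACGATGTTATC"

print(is_self_complementary(TOY_CODE))
print(frame_counts(toy_gene, TOY_CODE))
```

In a real analysis the counts for frame 0 (the reading frame) would be compared against the two shifted frames across many genes, and against counts obtained with random codes.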
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
GeneBuilder: interactive in silico prediction of gene structure.
Milanesi, L; D'Angelo, D; Rogozin, I B
1999-01-01
Prediction of gene structure in newly sequenced DNA becomes very important in large genome sequencing projects. This problem is complicated due to the exon-intron structure of eukaryotic genes and because gene expression is regulated by many different short nucleotide domains. In order to be able to analyse the full gene structure in different organisms, it is necessary to combine information about potential functional signals (promoter region, splice sites, start and stop codons, 3' untranslated region) together with the statistical properties of coding sequences (coding potential), information about homologous proteins, ESTs and repeated elements. We have developed the GeneBuilder system, which is based on prediction of functional signals and coding regions by different approaches in combination with similarity searches in protein and EST databases. The potential gene structure models are obtained by using a dynamic programming method. The program permits the use of several parameters for gene structure prediction and refinement. During gene model construction, selecting different exon homology levels with a protein sequence selected from a list of homologous proteins can improve the accuracy of the gene structure prediction. In the case of low homology, GeneBuilder is still able to predict the gene structure. The GeneBuilder system has been tested by using the standard set (Burset and Guigo, Genomics, 34, 353-367, 1996) and the performances are: 0.89 sensitivity and 0.91 specificity at the nucleotide level. The total correlation coefficient is 0.88. The GeneBuilder system is implemented as a part of WebGene at the URL: http://www.itba.mi.cnr.it/webgene and the TRADAT (TRAnscription Database and Analysis Tools) launcher at the URL: http://www.itba.mi.cnr.it/tradat.
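The nucleotide-level accuracy figures quoted above can be made concrete with a short Python sketch. It assumes the definitions commonly used with this kind of benchmark: sensitivity = TP/(TP+FN), specificity = TP/(TP+FP), and a correlation coefficient computed from the four confusion counts. The function name `nucleotide_accuracy` and the two label strings are hypothetical; they are not GeneBuilder output.

```python
import math


def nucleotide_accuracy(actual, predicted):
    """Compare two equal-length label strings ('C' = coding, 'N' = non-coding).

    Returns (sensitivity, specificity, correlation_coefficient).
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have the same length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == "C" and p == "C":
            tp += 1          # coding nucleotide predicted as coding
        elif a == "N" and p == "C":
            fp += 1          # non-coding nucleotide predicted as coding
        elif a == "C" and p == "N":
            fn += 1          # coding nucleotide missed
        else:
            tn += 1          # non-coding nucleotide predicted as non-coding
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tp / (tp + fp) if tp + fp else float("nan")
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else float("nan")
    return sn, sp, cc


# Hypothetical annotation and prediction for a 10-nucleotide stretch.
sn, sp, cc = nucleotide_accuracy("NNCCCCNNNN", "NCCCCNNNNN")
print(round(sn, 2), round(sp, 2), round(cc, 3))
```

Note that "specificity" in this convention is the fraction of predicted coding nucleotides that are truly coding, which other fields call precision.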
Phylogenetic Network for European mtDNA
Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari
2001-01-01
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
Calin, George; Ranzani, Guglielmina N; Amadori, Dino; Herlea, Vlad; Matei, Irina; Barbanti-Brodano, Giuseppe; Negrini, Massimo
2001-01-01
Background Genomic instability has been reported at microsatellite tracts in few coding sequences. We have shown that the Bloom syndrome BLM gene may be a target of microsatellite instability (MSI) in a short poly-adenine repeat located in its coding region. To further characterize the involvement of BLM in tumorigenesis, we have investigated mutations in nine genes containing coding microsatellites in microsatellite mutator phenotype (MMP) positive and negative gastric carcinomas (GCs). Methods We analyzed 50 gastric carcinomas (GCs) for mutations in the BLM poly(A) tract as well as in the coding microsatellites of the TGFβ1-RII, IGFIIR, hMSH3, hMSH6, BAX, WRN, RECQL and CBL genes. Results BLM mutations were found in 27% of MMP+ GCs (4/15 cases) but not in any of the MMP negative GCs (0/35 cases). The mutation frequencies in the other eight coding-region microsatellites were as follows: TGFβ1-RII (60%), BAX (27%), hMSH6 (20%), hMSH3 (13%), CBL (13%), IGFIIR (7%), RECQL (0%) and WRN (0%). Mutations in BLM appear to be more frequently associated with frameshifts in BAX and in hMSH6 and/or hMSH3. Tumors with BLM alterations present a higher frequency of unstable mono- and trinucleotide repeats located in coding regions as compared with mutator phenotype tumors without BLM frameshifts. Conclusions BLM frameshifts are frequent alterations in GCs specifically associated with MMP+ tumors. We suggest that BLM loss of function by MSI may increase the genetic instability of a pre-existent unstable genotype in gastric tumors. PMID:11532193
Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo
2014-06-01
The complete nucleotide sequence of the mitochondrial genome of the white-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was due to 28 copies of a 117-bp tandem repeat and seven copies of an 82-bp tandem repeat. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.
Khrustalev, Vladislav Victorovich
2009-01-01
Guanine is the most mutable nucleotide in HIV genes because of frequently occurring G to A transitions, which are caused by cytosine deamination in viral DNA minus strands catalyzed by APOBEC enzymes. The distribution of guanine among the three codon positions should influence the probability that a G to A mutation is nonsynonymous (i.e., occurs in the first or second codon position). We discovered that nucleotide sequences of env genes coding for third variable regions (V3 loops) of gp120 from HIV1 and HIV2 have different kinds of guanine usage biases. In the HIV1 reference strain and 100 additionally analyzed HIV1 strains, the guanine usage bias in V3 loop coding regions (2G>1G>3G) should lead to elevated rates of nonsynonymous G to A transitions. In the HIV2 reference strain and 100 other HIV2 strains, the guanine usage bias in V3 loop coding regions (3G>2G>1G) should protect V3 loops from hypermutability. According to the HIV1 and HIV2 V3 alignment, insertion of a sequence enriched with 2G (21 codons in length) occurred during the evolution of the HIV1 predecessor, while insertion of a different sequence enriched with 3G (19 codons in length) occurred during the evolution of the HIV2 predecessor. The higher the level of 3G in the V3 coding region, the lower the rate of immune-escape mutations should be. This hypothesis was tested in this study by comparing the guanine usage in V3 loop coding regions from HIV1 fast and slow progressors. All calculations have been performed by our algorithms "VVK In length", "VVK Dinucleotides" and "VVK Consensus" (www.barkovsky.hotmail.ru).
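The notion of guanine usage by codon position (1G, 2G, 3G) can be sketched in a few lines of Python. This is not the authors' "VVK" software; it assumes that 1G, 2G and 3G denote the fraction of codons carrying G at the first, second and third position, and the function names and the toy sequence are hypothetical.

```python
def guanine_by_codon_position(cds):
    """Return the fraction of codons with G at positions 1, 2 and 3.

    `cds` must be an in-frame coding sequence; trailing bases that do not
    complete a codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence is shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] == "G":
            counts[i % 3] += 1   # i % 3 is the zero-based codon position
    return {"1G": counts[0] / n_codons,
            "2G": counts[1] / n_codons,
            "3G": counts[2] / n_codons}


def usage_bias(freqs):
    """Format the ranking of positions, e.g. '2G>3G>1G' (ties keep input order)."""
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    return ">".join(ranked)


# Hypothetical toy sequence: codons AGA, AGC, TGG.
freqs = guanine_by_codon_position("AGAAGCTGG")
print(freqs, usage_bias(freqs))
```

Because G to A changes at the third codon position are more often synonymous, a ranking with 3G first corresponds to the "protected" pattern described in the abstract.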
The complete mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae).
Zhou, Xuming; Chen, Yu; Zhu, Shanliang; Xu, Haigen; Liu, Yan; Chen, Lian
2016-01-01
The mitochondrial genome of Pomacea canaliculata (Gastropoda: Ampullariidae) is the first complete mtDNA sequence reported in the genus Pomacea. The total length of the mtDNA is 15,707 bp, containing 13 protein-coding genes, 2 ribosomal RNAs, 22 transfer RNAs, and a 359 bp non-coding region. The A + T content of the overall base composition of the H-strand is 71.7% (T: 41%, C: 12.7%, A: 30.7%, G: 15.6%). The ATP6, ATP8, CO1, CO2, ND1-3, ND5, ND6, ND4L and Cyt b genes begin with ATG as the start codon; CO3 and ND4 begin with ATA. The ATP8, CO2-3, ND4L, ND2-6 and Cyt b genes are terminated with TAA as the stop codon; ATP6, ND1, and CO1 end with TAG. A long non-coding region is found, in which a 23 bp repeat unit is repeated 11 times.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Umans, L.; Serneels, L.; Hilliker, C.
1994-08-01
The authors have cloned the mouse gene coding for α2-macroglobulin in overlapping λ clones and have analyzed its structure. The gene contains 36 exons, coding for the 4.8-kb cDNA that we cloned previously. Including putative control elements in the 5′ flanking region, the gene covers about 45 kb. A region of 3.8 kb, stretching from 835 bases upstream of the cDNA start site to exon 4, including all intervening sequences, was sequenced completely. The analysis demonstrated that the putative promoter region of the mouse A2M gene differed considerably from the known promoter sequences of the human A2M gene and of the rat acute-phase A2M gene. Comparison of the exon-intron structure of all known genes of the A2M family confirmed that the rat acute-phase A2M gene is more closely related to the human gene than to the mouse A2M gene. To generate mice with the A2M gene inactivated, an insertion type of construct containing 7.5 kb of genomic DNA of the mouse strain 129/J, encompassing exons 16 to 19, was synthesized. A hygromycin marker gene was embedded in intron 17. After electroporation, 198 hygromycin-resistant ES cell lines were isolated and analyzed by Southern blotting. Five ES cell lines were obtained with one allele of the mouse A2M gene targeted by this insertion construct, demonstrating that the position and the characteristics of the vector served the intended goal.
Ou, Jing; Liu, Jin-Bo; Yao, Fu-Jiao; Wang, Xin-Guo; Wei, Zhao-Ming
2016-01-01
Flour beetles of the genus Tribolium are all pests of stored products and cause severe economic losses every year. The American black flour beetle Tribolium audax is one of the important pest species of flour beetle, and it is also an important quarantine insect. Here we sequenced and characterized the complete mitochondrial genome of T. audax, which was intercepted by Huangpu Customs in maize from America. The complete circular mitochondrial genome (mitogenome) of T. audax was 15,924 bp in length, containing 37 typical coding genes and one non-coding AT-rich region. The mitogenome of T. audax exhibits a gene arrangement and content identical to the most common type in insects. All protein-coding genes (PCGs) start with a typical ATN initiation codon, except for cox1, which uses AAC as its start codon instead of ATN. Eleven genes use a standard complete termination codon (nine TAA, two TAG), whereas the nad4 and nad5 genes end with a single T. Except for trnS1 (AGN), all tRNA genes display typical cloverleaf secondary structures like those of other insects. The sizes of the large and small ribosomal RNA genes are 1288 and 780 bp, respectively. The AT content of the AT-rich region is 81.36%. The 5 bp conserved motif TACTA was found in the intergenic region between trnS2 (UCN) and nad1.
Characterization of the complete mitochondrial genome sequence of wild yak (Bos mutus).
Chunnian, Liang; Wu, Xiaoyun; Ding, Xuezhi; Wang, Hongbo; Guo, Xian; Chu, Min; Bao, Pengjia; Yan, Ping
2016-11-01
Wild yak is a special breed in China and it is regarded as an important genetic resource for sustainably developing animal husbandry in the Tibetan area and enriching the region's biodiversity. The complete mitochondrial genome of wild yak (16,322 bp in length) contains 37 typical animal mitochondrial genes and is A + T-rich (61.01%), with an overall G + C content of only 38.99%. It contains a non-coding control region (D-loop), 13 protein-coding genes, two rRNA genes, and 22 tRNA genes. Most of the genes have ATG initiation codons, whereas the ND2, ND3, and ND5 genes start with ATA and are encoded on the H-strand. The gene order of the wild yak mitogenome is identical to that observed in most other vertebrates. The complete mitochondrial genome sequence of wild yak reported here could provide valuable information for developing genetic markers and for phylogenetic analysis in yak.
Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila
2010-07-16
Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. 
A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.
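The abstract above dates the haplotype divergence from the synonymous substitution rate without giving the calculation. A minimal Python sketch of the standard molecular-clock relation T = Ks / (2r) is shown here, where Ks is the number of synonymous substitutions per synonymous site between the two sequences and r is the substitution rate per site per year. Whether the authors used exactly this relation is an assumption, and the Ks and r values below are hypothetical placeholders, not values from the study.

```python
def divergence_time_years(ks, rate_per_site_per_year):
    """Estimate divergence time as T = Ks / (2 * r).

    The factor 2 reflects that substitutions accumulate independently
    on both lineages since their split.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    return ks / (2.0 * rate_per_site_per_year)


# Hypothetical inputs, for illustration only.
example_ks = 0.02        # synonymous substitutions per synonymous site
example_rate = 5e-9      # substitutions per site per year
print(divergence_time_years(example_ks, example_rate))
```

The estimate scales linearly with Ks and inversely with the assumed rate, so the choice of rate calibration dominates the uncertainty of such dates.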
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed to ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When in the 5′ region (flanking region or UTR) of a gene, the sequence is predominantly in the +/+ orientation with respect to the coding DNA strand, while in the coding region and the 3′ region (UTR), the sequence is most frequently in the +/- orientation with respect to the coding DNA strand. In two genes, the element is split into two parts; however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double-stranded oligonucleotide. When the double-stranded oligomer was used, the size shift demonstrated an additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA in a single-stranded orientation for transcription or, alternatively, that this element is important in the transcription-coupled DNA repair process.
Evolution of the snake body form reveals homoplasy in amniote Hox gene function.
Head, Jason J; Polly, P David
2015-04-02
Hox genes regulate regionalization of the axial skeleton in vertebrates, and changes in their expression have been proposed to be a fundamental mechanism driving the evolution of new body forms. The origin of the snake-like body form, with its deregionalized pre-cloacal axial skeleton, has been explained as either homogenization of Hox gene expression domains, or retention of standard vertebrate Hox domains with alteration of downstream expression that suppresses development of distinct regions. Both models assume a highly regionalized ancestor, but the extent of deregionalization of the primaxial domain (vertebrae, dorsal ribs) of the skeleton in snake-like body forms has never been analysed. Here we combine geometric morphometrics and maximum-likelihood analysis to show that the pre-cloacal primaxial domain of elongate, limb-reduced lizards and snakes is not deregionalized compared with limbed taxa, and that the phylogenetic structure of primaxial morphology in reptiles does not support a loss of regionalization in the evolution of snakes. We demonstrate that morphometric regional boundaries correspond to mapped gene expression domains in snakes, suggesting that their primaxial domain is patterned by a normally functional Hox code. Comparison of primaxial osteology in fossil and modern amniotes with Hox gene distributions within Amniota indicates that a functional, sequentially expressed Hox code patterned a subtle morphological gradient along the anterior-posterior axis in stem members of amniote clades and extant lizards, including snakes. The highly regionalized skeletons of extant archosaurs and mammals result from independent evolution in the Hox code and do not represent ancestral conditions for clades with snake-like body forms. The developmental origin of snakes is best explained by decoupling of the primaxial and abaxial domains and by increases in somite number, not by changes in the function of primaxial Hox genes.
A subset of conserved mammalian long non-coding RNAs are fossils of ancestral protein-coding genes.
Hezroni, Hadas; Ben-Tov Perry, Rotem; Meir, Zohar; Housman, Gali; Lubelsky, Yoav; Ulitsky, Igor
2017-08-30
Only a small portion of human long non-coding RNAs (lncRNAs) appear to be conserved outside of mammals, but the events underlying the birth of new lncRNAs in mammals remain largely unknown. One potential source is remnants of protein-coding genes that transitioned into lncRNAs. We systematically compare lncRNA and protein-coding loci across vertebrates, and estimate that up to 5% of conserved mammalian lncRNAs are derived from lost protein-coding genes. These lncRNAs have specific characteristics, such as broader expression domains, that set them apart from other lncRNAs. Fourteen lncRNAs have sequence similarity with the loci of the contemporary homologs of the lost protein-coding genes. We propose that selection acting on enhancer sequences is mostly responsible for retention of these regions. As an example of an RNA element from a protein-coding ancestor that was retained in the lncRNA, we describe in detail a short translated ORF in the JPX lncRNA that was derived from an upstream ORF in a protein-coding gene and retains some of its functionality. We estimate that ~ 55 annotated conserved human lncRNAs are derived from parts of ancestral protein-coding genes, and loss of coding potential is thus a non-negligible source of new lncRNAs. Some lncRNAs inherited regulatory elements influencing transcription and translation from their protein-coding ancestors and those elements can influence the expression breadth and functionality of these lncRNAs.
The complete mitochondrial genome of the Feral Rock Pigeon (Columba livia breed feral).
Li, Chun-Hong; Liu, Fang; Wang, Li
2014-10-01
In the present work, we report the complete mitochondrial genome sequence of feral rock pigeon for the first time. The total length of the mitogenome was 17,239 bp with the base composition of 30.3% for A, 24.0% for T, 31.9% for C, and 13.8% for G and an A-T (54.3%)-rich feature was detected. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region (D-loop region). The arrangement of all genes was identical to the typical mitochondrial genomes of pigeon. The complete mitochondrial genome sequence of feral rock pigeon would serve as an important data set of the germplasm resources for further study.
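Base composition and A + T content of the kind reported in these mitogenome abstracts can be computed directly from a sequence string. The Python sketch below is illustrative only; the function names and the 10-base toy sequence are hypothetical, and ambiguous symbols such as N are ignored by assumption.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(seq):
    """Return the combined A + T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Hypothetical toy sequence, not a real mitogenome.
toy = "AAATTCCCGG"
print(base_composition(toy))
print(at_content(toy))
```

For a real genome the sequence would be read from a FASTA or GenBank record, and the A + T figure is simply the sum of the A and T percentages, as in 30.3% + 24.0% = 54.3% above.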
The complete mitochondrial genome sequence of the Datong yak (Bos grunniens).
Wu, Xiaoyun; Chu, Min; Liang, Chunnian; Ding, Xuezhi; Guo, Xian; Bao, Pengjia; Yan, Ping
2016-01-01
Datong yak is a famous artificially cultivated breed in China. In the present work, we report the complete mitochondrial genome sequence of Datong yak for the first time. The total length of the mitogenome is 16,323 bp, containing 13 protein-coding genes, 22 tRNA genes, two rRNA genes and one non-coding region (D-loop region). The gene order of the Datong yak mitogenome is identical to that observed in most other vertebrates. The overall base composition is 33.71% A, 25.80% C, 13.21% G and 27.27% T, with an A + T content of 60.98%. The complete mitogenome sequence information of Datong yak can provide useful data for further studies on molecular breeding and taxonomic status.
Characterization of the complete mitochondrial genome sequence of Gannan yak (Bos grunniens).
Wu, Xiaoyun; Ding, Xuezhi; Chu, Min; Guo, Xian; Bao, Pengjia; Liang, Chunnian; Yan, Ping
2016-01-01
Gannan yak is the native breed of Gansu province in China. In this work, the complete mitochondrial genome sequence of Gannan yak was determined for the first time. The total length of the mitogenome is 16,322 bp, with the base composition of 33.74% A, 25.84% T, 13.18% C, and 27.24% G. It contained 13 protein-coding genes, 22 tRNA genes, two rRNA genes and one non-coding region (D-loop region). The gene order of the Gannan yak mitogenome is identical to that observed in most other vertebrates. The complete mitogenome sequence information of Gannan yak can provide useful data for further studies on protection of genetic resources and phylogenetic relationships within Bos grunniens.
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).
Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo
2013-12-01
The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequence of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.
Wang, Shuo; Gao, Li-Zhi
2016-09-01
The complete chloroplast genome of green foxtail (Setaria viridis), a promising model system for C4 photosynthesis, is first reported in this study. The genome harbors a large single copy (LSC) region of 81 016 bp and a small single copy (SSC) region of 12 456 bp separated by a pair of inverted repeat (IRa and IRb) regions of 22 315 bp. GC content is 38.92%. The proportion of coding sequence is 57.97%, comprising 111 unique genes (19 duplicated in IR regions), 71 of which are protein-coding genes, four are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated that S. viridis was clustered with its cultivated species S. italica in the tribe Paniceae of the family Poaceae. This newly determined chloroplast genome will provide valuable genetic resources to assist future studies on C4 photosynthesis in grasses.
Complete mitochondrial genome of an Asian lion (Panthera leo goojratensis).
Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning
2016-01-01
The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length. Its gene composition and arrangement conformed to those of other lions, with the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristics of the mitochondrial genome were analyzed in detail.
Stotz, Henrik U; Harvey, Pascoe J; Haddadi, Parham; Mashanova, Alla; Kukol, Andreas; Larkan, Nicholas J; Borhan, M Hossein; Fitt, Bruce D L
2018-01-01
Genes coding for nucleotide-binding leucine-rich repeat (LRR) receptors (NLRs) control resistance against intracellular (cell-penetrating) pathogens. However, evidence for a role of genes coding for proteins with LRR domains in resistance against extracellular (apoplastic) fungal pathogens is limited. Here, the distribution of genes coding for proteins with eLRR domains but lacking kinase domains was determined for the Brassica napus genome. Predictions of signal peptide and transmembrane regions divided these genes into 184 coding for receptor-like proteins (RLPs) and 121 coding for secreted proteins (SPs). Together with previously annotated NLRs, a total of 720 LRR genes were found. Leptosphaeria maculans-induced expression during a compatible interaction with cultivar Topas differed between RLP, SP and NLR gene families; NLR genes were induced relatively late, during the necrotrophic phase of pathogen colonization. Seven RLP, one SP and two NLR genes were found in Rlm1 and Rlm3/Rlm4/Rlm7/Rlm9 loci for resistance against L. maculans on chromosome A07 of B. napus. One NLR gene at the Rlm9 locus was positively selected, as was the RLP gene on chromosome A10 with LepR3 and Rlm2 alleles conferring resistance against L. maculans races with corresponding effectors AvrLm1 and AvrLm2, respectively. Known loci for resistance against L. maculans (extracellular hemi-biotrophic fungus), Sclerotinia sclerotiorum (necrotrophic fungus) and Plasmodiophora brassicae (intracellular, obligate biotrophic protist) were examined for presence of RLPs, SPs and NLRs in these regions. Whereas loci for resistance against P. brassicae were enriched for NLRs, no such signature was observed for the other pathogens. These findings demonstrate involvement of (i) NLR genes in resistance against the intracellular pathogen P. brassicae and a putative NLR gene in Rlm9-mediated resistance against the extracellular pathogen L. maculans.
Cenik, Can; Chua, Hon Nian; Zhang, Hui; Tarnawsky, Stefan P.; Akef, Abdalla; Derti, Adnan; Tasan, Murat; Moore, Melissa J.; Palazzo, Alexander F.; Roth, Frederick P.
2011-01-01
In higher eukaryotes, messenger RNAs (mRNAs) are exported from the nucleus to the cytoplasm via factors deposited near the 5′ end of the transcript during splicing. The signal sequence coding region (SSCR) can support an alternative mRNA export (ALREX) pathway that does not require splicing. However, most SSCR–containing genes also have introns, so the interplay between these export mechanisms remains unclear. Here we support a model in which the furthest upstream element in a given transcript, be it an intron or an ALREX–promoting SSCR, dictates the mRNA export pathway used. We also experimentally demonstrate that nuclear-encoded mitochondrial genes can use the ALREX pathway. Thus, ALREX can also be supported by nucleotide signals within mitochondrial-targeting sequence coding regions (MSCRs). Finally, we identified and experimentally verified novel motifs associated with the ALREX pathway that are shared by both SSCRs and MSCRs. Our results show strong correlation between 5′ untranslated region (5′UTR) intron presence/absence and sequence features at the beginning of the coding region. They also suggest that genes encoding secretory and mitochondrial proteins share a common regulatory mechanism at the level of mRNA export. PMID:21533221
Identification of three novel NHS mutations in families with Nance-Horan syndrome.
Huang, Kristen M; Wu, Junhua; Brooks, Simon P; Hardcastle, Alison J; Lewis, Richard Alan; Stambolian, Dwight
2007-03-27
Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. We identified three unique NHS coding region mutations in these NHS families. This report extends the number of unique identified NHS mutations to 14.
Negligible impact of rare autoimmune-locus coding-region variants on missing heritability.
Hunt, Karen A; Mistry, Vanisha; Bockett, Nicholas A; Ahmad, Tariq; Ban, Maria; Barker, Jonathan N; Barrett, Jeffrey C; Blackburn, Hannah; Brand, Oliver; Burren, Oliver; Capon, Francesca; Compston, Alastair; Gough, Stephen C L; Jostins, Luke; Kong, Yong; Lee, James C; Lek, Monkol; MacArthur, Daniel G; Mansfield, John C; Mathew, Christopher G; Mein, Charles A; Mirza, Muddassar; Nutland, Sarah; Onengut-Gumuscu, Suna; Papouli, Efterpi; Parkes, Miles; Rich, Stephen S; Sawcer, Steven; Satsangi, Jack; Simmonds, Matthew J; Trembath, Richard C; Walker, Neil M; Wozniak, Eva; Todd, John A; Simpson, Michael A; Plagnol, Vincent; van Heel, David A
2013-06-13
Genome-wide association studies (GWAS) have identified common variants of modest-effect size at hundreds of loci for common autoimmune diseases; however, a substantial fraction of heritability remains unexplained, to which rare variants may contribute. To discover rare variants and test them for association with a phenotype, most studies re-sequence a small initial sample size and then genotype the discovered variants in a larger sample set. This approach fails to analyse a large fraction of the rare variants present in the entire sample set. Here we perform simultaneous amplicon-sequencing-based variant discovery and genotyping for coding exons of 25 GWAS risk genes in 41,911 UK residents of white European origin, comprising 24,892 subjects with six autoimmune disease phenotypes and 17,019 controls, and show that rare coding-region variants at known loci have a negligible role in common autoimmune disease susceptibility. These results do not support the rare-variant synthetic genome-wide-association hypothesis (in which unobserved rare causal variants lead to association detected at common tag variants). Many known autoimmune disease risk loci contain multiple, independently associated, common and low-frequency variants, and so genes at these loci are a priori stronger candidates for harbouring rare coding-region variants than other genes. Our data indicate that the missing heritability for common autoimmune diseases may not be attributable to the rare coding-region variant portion of the allelic spectrum, but perhaps, as others have proposed, may be a result of many common-variant loci of weak effect.
Li, Weijun; Wang, Zongqing; Che, Yanli
2017-11-12
In this study, the complete mitochondrial genome of Cryptocercus meridianus was sequenced. The circular mitochondrial genome is 15,322 bp in size and contains 13 protein-coding genes, two ribosomal RNA genes (12S rRNA and 16S rRNA), 22 transfer RNA genes, and one D-loop region. We compare the mitogenome of C. meridianus with that of C. relictus and C. kyebangensis. The base composition of the whole genome was 45.20%, 9.74%, 16.06%, and 29.00% for A, G, C, and T, respectively; it shows a high AT content (74.2%), similar to the mitogenomes of C. relictus and C. kyebangensis. The protein-coding genes are initiated with typical mitochondrial start codons except for cox1 with TTG. The gene order of the C. meridianus mitogenome differs from the typical insect pattern for the translocation of tRNA-Ser(AGN), while the mitogenomes of the other two Cryptocercus species, C. relictus and C. kyebangensis, are consistent with the typical insect pattern. There are two very long non-coding intergenic regions lying on both sides of the rearranged gene tRNA-Ser(AGN). The phylogenetic relationships were constructed based on the nucleotide sequence of 13 protein-coding genes and two ribosomal RNA genes. The mitogenome of C. meridianus is the first representative of the order Blattodea that demonstrates rearrangement, and it will contribute to the further study of the phylogeny and evolution of the genus Cryptocercus and related taxa.
Boyd, David A.; Thevenot, Tracy; Gumbmann, Markus; Honeyman, Allen L.; Hamilton, Ian R.
2000-01-01
Transposon mutagenesis and marker rescue were used to isolate and identify an 8.5-kb contiguous region containing six open reading frames constituting the operon for the sorbitol P-enolpyruvate phosphotransferase transport system (PTS) of Streptococcus mutans LT11. The first gene, srlD, codes for sorbitol-6-phosphate dehydrogenase, followed downstream by srlR, coding for a transcriptional regulator; srlM, coding for a putative activator; and the srlA, srlE, and srlB genes, coding for the EIIC, EIIBC, and EIIA components of the sorbitol PTS, respectively. Among all sorbitol PTS operons characterized to date, the srlD gene is found after the genes coding for the EII components; thus, the location of the gene in S. mutans is unique. The SrlR protein is similar to several transcriptional regulators found in Bacillus spp. that contain PTS regulator domains (J. Stülke, M. Arnaud, G. Rapoport, and I. Martin-Verstraete, Mol. Microbiol. 28:865–874, 1998), and its gene overlaps the srlM gene by 1 bp. The arrangement of these two regulatory genes is unique, having not been reported for other bacteria. PMID:10639465
DOE Office of Scientific and Technical Information (OSTI.GOV)
Proia, R.L.
1988-03-01
Lysosomal β-hexosaminidase is composed of two structurally similar chains, α and β, that are the products of different genes. Mutations in either gene causing β-hexosaminidase deficiency result in the lysosomal storage disease GM2-gangliosidosis. To enable the investigation of the molecular lesions in this disorder and to study the evolutionary relationship between the α and β chains, the β-chain gene was isolated, and its organization was characterized. The β-chain coding region is divided into 14 exons distributed over approximately 40 kilobases of DNA. Comparison with the α-chain gene revealed that 12 of the 13 introns interrupt the coding regions at homologous positions. This extensive sharing of intron placement demonstrates that the α and β chains evolved by way of the duplication of a common ancestor.
Song, Sheng-Nan; Chen, Peng-Yan; Wei, Shu-Jun; Chen, Xue-Xin
2016-07-01
The mitochondrial genome sequence of Polistes jokahamae (Radoszkowski, 1887) (Hymenoptera: Vespidae) (GenBank accession no. KR052468) was sequenced. The current length with partial A + T-rich region of this mitochondrial genome is 16,616 bp. All the typical mitochondrial genes were sequenced except for three tRNAs (trnI, trnQ, and trnY) located between the A + T-rich region and nad2. At least three rearrangement events occurred in the sequenced region compared with the putative ancestral arrangement of insects, corresponding to the shuffling of trnK and trnD, translocation or remote inversion of trnY and translocation of trnL1. All protein-coding genes start with ATN codons. Eleven protein-coding genes stop with the termination codon TAA, one with TA, and one with T. Phylogenetic analysis using the Bayesian method based on all codon positions of the 13 protein-coding genes supports the monophyly of Vespidae and Formicidae. Within the Formicidae, the Myrmicinae and Formicinae form a sister lineage and then sister to the Dolichoderinae, while within the Vespidae, the Eumeninae is sister to the lineage of Vespinae + Polistinae.
Boeri, Eduardo J.; Wanke, María M.; Madariaga, María J.; Teijeiro, María L.; Elena, Sebastian A.; Trangoni, Marcos D.
2018-01-01
Aim: This study aimed to compare the sensitivity (S), specificity (Sp), and positive likelihood ratios (LR+) of four polymerase chain reaction (PCR) assays for the detection of Brucella spp. in canine clinical samples. Materials and Methods: A total of 595 samples of whole blood, urine, and genital fluids were evaluated between October 2014 and November 2016. To compare PCR assays, the gold standard was defined using a combination of different serological and microbiological tests. Bacterial isolation from urine and blood cultures was carried out. Serological methods such as rapid slide agglutination test, indirect enzyme-linked immunosorbent assay, agar gel immunodiffusion test, and buffered plate antigen test were performed. Four genes were evaluated: (i) The gene coding for the BCSP31 protein, (ii) the ribosomal gene coding for the 16S-23S intergenic spacer region, (iii) the gene coding for porins omp2a/omp2b, and (iv) the gene coding for the insertion sequence IS711. Results: The results obtained were as follows: (1) For the primers that amplify the gene coding for the BCSP31 protein: S: 45.64% (confidence interval [CI] 39.81-51.46), Sp: 95.62% (CI 93.13-98.12), and LR+: 10.43 (CI 6.04-18); (2) for the primers that amplify the ribosomal gene of the 16S-23S rDNA intergenic spacer region: S: 69.80% (CI 64.42-75.18), Sp: 95.62% (CI 93.13-98.12), and LR+: 11.52 (CI 7.31-18.13); (3) for the primers that amplify the omp2a and omp2b genes: S: 39.26% (CI 33.55-44.97), Sp: 97.31% (CI 95.30-99.32), and LR+: 14.58 (CI 7.25-29.29); and (4) for the primers that amplify the insertion sequence IS711: S: 22.82% (CI 17.89-27.75), Sp: 99.66% (CI 98.84-100), and LR+: 67.77 (CI 9.47-484.89). Conclusion: We concluded that the gene coding for the 16S-23S rDNA intergenic spacer region was the one that best detected Brucella spp. in canine clinical samples. PMID:29657404
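As a reading aid for the figures above, the positive likelihood ratio is conventionally defined as LR+ = S / (1 - Sp). The following is a minimal sketch of that formula only; the function name is hypothetical, and the abstract does not say how the authors computed their values or confidence intervals. Feeding it the rounded BCSP31 percentages gives a value close to, but not exactly, the reported one, presumably because the published figure was derived from unrounded data.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Return LR+ = sensitivity / (1 - specificity).

    Both arguments are fractions in [0, 1], not percentages.
    The name and signature are illustrative, not taken from the study.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0.0 <= specificity < 1.0:
        # specificity == 1 would divide by zero (LR+ is unbounded)
        raise ValueError("specificity must be in [0, 1)")
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    # Rounded BCSP31 figures from the abstract: S = 45.64%, Sp = 95.62%
    print(round(positive_likelihood_ratio(0.4564, 0.9562), 2))
```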
2018-01-01
FAM230C, a long intergenic non-coding RNA (lincRNA) gene in human chromosome 13 (chr13), is a member of the lincRNA gene family termed family with sequence similarity 230. An analysis using bioinformatics search tools and alignment programs was undertaken to determine properties of FAM230C and its related genes. Results reveal that the DNA translocation element, the Translocation Breakpoint Type A (TBTA) sequence, which consists of satellite DNA, Alu elements, and AT-rich sequences, is embedded in the FAM230C gene. Eight lincRNA genes related to FAM230C also carry the TBTA sequences. These genes were formed from a large segment of the 3' half of the FAM230C sequence duplicated in chr22, and are specifically in regions of low copy repeats (LCR22s), in or close to the 22q11.2 region. 22q11.2 is a chromosomal segment that undergoes a high rate of DNA translocation and is prone to genetic deletions. FAM230C-related genes present in other chromosomes do not carry the TBTA motif and were formed from the 5' half region of the FAM230C sequence. These findings identify a high specificity in lincRNA gene formation by gene sequence duplication in different chromosomes. PMID:29668722
Marra, M A; Prasad, S S; Baillie, D L
1993-01-01
A previous study of genomic organization described the identification of nine potential coding regions in 150 kb of genomic DNA from the unc-22(IV) region of Caenorhabditis elegans. In this study, we focus on the genomic organization of a small interval of 0.1 map unit bordered on the right by unc-22 and on the left by the left-hand breakpoints of the deficiencies sDf9, sDf19 and sDf65. This small interval at present contains a single mutagenically defined locus, the essential gene let-56. The cosmid C11F2 has previously been used to rescue let-56. Therefore, at least some of C11F2 must reside in the interval. In this paper, we report the characterization of two coding elements that reside on C11F2. Analysis of nucleotide sequence data obtained from cDNAs and cosmid subclones revealed that one of the coding elements closely resembles aromatic amino acid decarboxylases from several species. The other of these coding elements was found to closely resemble a human growth factor activatable Na+/H+ antiporter. Pairs of oligonucleotide primers, predicted from both coding elements, have been used in PCR experiments to position these coding elements between the left breakpoint of sDf19 and the left breakpoint of sDf65, between the essential genes let-653 and let-56.
Higashi, Koichi; Tobe, Toru; Kanai, Akinori; Uyar, Ebru; Ishikawa, Shu; Suzuki, Yutaka; Ogasawara, Naotake; Kurokawa, Ken; Oshima, Taku
2016-01-01
Bacteria can acquire new traits through horizontal gene transfer. Inappropriate expression of transferred genes, however, can disrupt the physiology of the host bacteria. To reduce this risk, Escherichia coli expresses the nucleoid-associated protein, H-NS, which preferentially binds to horizontally transferred genes to control their expression. Once expression is optimized, the horizontally transferred genes may actually contribute to E. coli survival in new habitats. Therefore, we investigated whether and how H-NS contributes to this optimization process. A comparison of H-NS binding profiles on common chromosomal segments of three E. coli strains belonging to different phylogenetic groups indicated that the positions of H-NS-bound regions have been conserved in E. coli strains. The sequences of the H-NS-bound regions appear to have diverged more so than H-NS-unbound regions only when H-NS-bound regions are located upstream or in coding regions of genes. Because these regions generally contain regulatory elements for gene expression, sequence divergence in these regions may be associated with alteration of gene expression. Indeed, nucleotide substitutions in H-NS-bound regions of the ybdO promoter and coding regions have diversified the potential for H-NS-independent negative regulation among E. coli strains. The ybdO expression in these strains was still negatively regulated by H-NS, which reduced the effect of H-NS-independent regulation under normal growth conditions. Hence, we propose that, during E. coli evolution, the conservation of H-NS binding sites resulted in the diversification of the regulation of horizontally transferred genes, which may have facilitated E. coli adaptation to new ecological niches. PMID:26789284
Complete mitochondrial genome of the Yellow-spotted skate Okamejei hollandi (Rajiformes: Rajidae).
Li, Weidong; Chen, Xiao; Liu, Wenai; Sun, Renjie; Zhou, Haolang
2016-07-01
The complete mitochondrial genome of the Yellow-spotted skate Okamejei hollandi was determined in this study. It is 16,974 bp in length and contains 13 protein-coding genes, two rRNA genes, 22 tRNA genes, and one putative control region. The overall base composition is 30.5% A, 27.8% C, 14.0% G, and 27.8% T. There are 28 bp short intergenic spaces located in 12 gene junctions and 31 bp overlaps located in nine gene junctions in the whole mitogenome. Two start codons (ATG and GTG) and two stop codons (TAG and TAA/T) were used in the protein-coding genes. The lengths of 22 tRNA genes range from 68 (tRNA-Ser2) to 75 (tRNA-Leu1) bp. The origin of L-strand replication (OL) sequence (37 bp) was identified between the tRNA-Asn and tRNA-Cys genes. The control region is 1311 bp in length with high A + T and poor G content.
Whole mitochondrial genome sequence for an osteoarthritis model of Guinea pig (Caviidae; Cavia).
Cui, Xin-Gang; Liu, Cheng-Yao; Wei, Bo; Zhao, Wen-Jian; Zhang, Wen-Feng
2016-11-01
Animal models play an important role in osteoarthritis studies. Here, the complete mitochondrial genome sequence of the Guinea pig is reported for the first time. The total length of the mitogenome was 16,797 bp. It contained the typical structure, including two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one non-coding control region (D-loop region). The overall composition of the mitogenome was estimated to be 34.9% for A, 26.1% for T, 26.0% for C and 13.0% for G, showing an A-T-rich (61.0%) feature. This mitochondrial genome sequence will provide a new genetic resource for osteoarthritis research.
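Several of the mitogenome records in this listing report per-base percentages and a combined A + T figure (here, 34.9% A plus 26.1% T gives 61.0%). A minimal sketch of how such figures can be derived from a nucleotide string follows; the function names and the toy sequence are hypothetical, and ambiguous symbols such as N are simply ignored, which is an assumption rather than a documented convention of any of these studies.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined A + T percentage."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


if __name__ == "__main__":
    toy = "ATTAGCATAT"  # hypothetical 10-base example, not real mitogenome data
    print(base_composition(toy))
    print(at_content(toy))
```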
Howard, David M; Adams, Mark J; Clarke, Toni-Kim; Wigmore, Eleanor M; Zeng, Yanni; Hagenaars, Saskia P; Lyall, Donald M; Thomson, Pippa A; Evans, Kathryn L; Porteous, David J; Nagy, Reka; Hayward, Caroline; Haley, Chris S; Smith, Blair H; Murray, Alison D; Batty, G David; Deary, Ian J; McIntosh, Andrew M
2017-01-01
Cognitive ability is a heritable trait with a polygenic architecture, for which several associated variants have been identified using genotype-based and candidate gene approaches. Haplotype-based analyses are a complementary technique that take phased genotype data into account, and potentially provide greater statistical power to detect lower frequency variants. In the present analysis, three cohort studies (total n = 48,002) were utilised: Generation Scotland: Scottish Family Health Study (GS:SFHS), the English Longitudinal Study of Ageing (ELSA), and the UK Biobank. A genome-wide haplotype-based meta-analysis of cognitive ability was performed, as well as a targeted meta-analysis of several gene coding regions. None of the assessed haplotypes provided evidence of a statistically significant association with cognitive ability in either the individual cohorts or the meta-analysis. Within the meta-analysis, the haplotype with the lowest observed P-value overlapped with the D-amino acid oxidase activator (DAOA) gene coding region. This coding region has previously been associated with bipolar disorder, schizophrenia and Alzheimer's disease, which have all been shown to impact upon cognitive ability. Another potentially interesting region highlighted within the current genome-wide association analysis (GS:SFHS: P = 4.09 × 10^-7) was the butyrylcholinesterase (BCHE) gene coding region. The protein encoded by BCHE has been shown to influence the progression of Alzheimer's disease and its role in cognitive ability merits further investigation. Although no evidence was found for any haplotypes with a statistically significant association with cognitive ability, our results did provide further evidence that the genetic variants contributing to the variance of cognitive ability are likely to be of small effect.
Regions of extreme synonymous codon selection in mammalian genes
Schattner, Peter; Diekhans, Mark
2006-01-01
Recently there has been increasing evidence that purifying selection occurs among synonymous codons in mammalian genes. This selection appears to be a consequence of either cis-regulatory motifs, such as exonic splicing enhancers (ESEs), or mRNA secondary structures, being superimposed on the coding sequence of the gene. We have developed a program to identify regions likely to be enriched for such motifs by searching for extended regions of extreme codon conservation between homologous genes of related species. Here we present the results of applying this approach to five mammalian species (human, chimpanzee, mouse, rat and dog). Even with very conservative selection criteria, we find over 200 regions of extreme codon conservation, ranging in length from 60 to 178 codons. The regions are often found within genes involved in DNA-binding, RNA-binding or zinc-ion-binding. They are highly depleted for synonymous single nucleotide polymorphisms (SNPs) but not for non-synonymous SNPs, further indicating that the observed codon conservation is being driven by negative selection. Forty-three percent of the regions overlap conserved alternative transcript isoforms and are enriched for known ESEs. Other regions are enriched for TpA dinucleotides and may contain conserved motifs/structures relating to mRNA stability and/or degradation. We anticipate that this tool will be useful for detecting regions enriched in other classes of coding-sequence motifs and structures as well. PMID:16556911
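The search described above, for extended stretches where homologous coding sequences keep exactly the same codons, can be illustrated in a much simplified form. The sketch below is not the authors' program: it assumes two already aligned, in-frame, gap-free coding sequences of equal length, compares only a single pair of species, and treats a "region" as a run of consecutive identical codons. The function name and the default threshold of 60 codons (the shortest region length mentioned in the abstract) are illustrative choices.

```python
def conserved_codon_runs(cds_a: str, cds_b: str, min_codons: int = 60):
    """Find runs of consecutive identical codons in two aligned CDSs.

    Returns a list of (start, end) codon indices, end exclusive.
    Assumes equal-length, in-frame, gap-free input (a simplification).
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must have equal length")
    if len(cds_a) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")

    a, b = cds_a.upper(), cds_b.upper()
    n_codons = len(a) // 3
    runs = []
    run_start = None  # codon index where the current identical run began

    for i in range(n_codons):
        same = a[3 * i:3 * i + 3] == b[3 * i:3 * i + 3]
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            if i - run_start >= min_codons:
                runs.append((run_start, i))
            run_start = None

    # Close a run that extends to the end of the sequence.
    if run_start is not None and n_codons - run_start >= min_codons:
        runs.append((run_start, n_codons))
    return runs


if __name__ == "__main__":
    # Toy sequences differing only by a synonymous change in codon 3 (GCA vs GCG)
    print(conserved_codon_runs("ATGGCTGCAAAATTT", "ATGGCTGCGAAATTT", min_codons=2))
```

A real analysis would also need multi-species alignments, gap handling, and a statistical criterion for what counts as "extreme" conservation, none of which this sketch attempts.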
DOE Office of Scientific and Technical Information (OSTI.GOV)
Teumer, J.; Green, H.
1989-02-01
The gene for involucrin, an epidermal protein, has been remodeled in the higher primates. Most of the coding region of the human gene consists of a modern segment of repeats derived from a 10-codon sequence present in the ancestral segment of the gene. The modern segment can be divided into early, middle, and late regions. The authors report here the nucleotide sequence of three alleles of the gorilla involucrin gene. Each possesses a modern segment homologous to that of the human and consisting of 10-codon repeats. The early and middle regions are similar to the corresponding regions of the human allele and are nearly identical among the different gorilla alleles. The late region consists of recent duplications whose pattern is unique in each of the gorilla alleles and in the human allele. The early region is located in what is now the 3′ third of the modern segment, and the late, polymorphic region is located in what is now the 5′ third. Therefore, as the modern segment expanded during evolution, its 3′ end became stabilized, and continuing duplications became confined to its 5′ end. The expansion of the involucrin coding region, which began long before the separation of the gorilla and human, has continued in both species after their separation.
Quach, Tommy; Brooks, Daniel M; Miranda, Hector C
2016-01-01
The complete mitochondrial genome of the Palawan peacock-pheasant Polyplectron napoleonis is 16,710 bp and contains 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes and a control region. All protein-coding genes use the standard ATG start codon, except for cox1, which has a GTG start codon. Seven out of 13 PCGs have TAA stop codons, two have AGG (cox1 and nd6), and three PCGs (nd2, cox2 and nd4) have an incomplete stop codon consisting of a single T nucleotide (T--).
Ma, Yuanyuan; Zheng, Xiaodong; Cheng, Rubin; Li, Qi
2016-01-01
In this paper, we determined the complete mitochondrial genome of Octopus conispadiceus (Cephalopoda: Octopodidae). The whole mitogenome of O. conispadiceus is 16,027 base pairs (bp) in length with a base composition of 41.4% A, 34.8% T, 16.1% C, 7.7% G and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a major non-coding region (MNR). The gene arrangement of O. conispadiceus showed remarkable similarity to those of O. vulgaris, Amphioctopus fangsiao, Cistopus chinensis and C. taiwanicus.
Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K
2008-01-01
Background: The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. Results: We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach where sense, GHRL derived RNAs are highly expressed. Conclusion: GHRLOS RNA transcripts display several distinctive features of non-coding (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis. PMID:18954468
Seim, Inge; Carter, Shea L; Herington, Adrian C; Chopin, Lisa K
2008-10-28
The peptide hormone ghrelin has many important physiological and pathophysiological roles, including the stimulation of growth hormone (GH) release, appetite regulation, gut motility and proliferation of cancer cells. We previously identified a gene on the opposite strand of the ghrelin gene, ghrelinOS (GHRLOS), which spans the promoter and untranslated regions of the ghrelin gene (GHRL). Here we further characterise GHRLOS. We have described GHRLOS mRNA isoforms that extend over 1.4 kb of the promoter region and 106 nucleotides of exon 4 of the ghrelin gene, GHRL. These GHRLOS transcripts initiate 4.8 kb downstream of the terminal exon 4 of GHRL and are present in the 3' untranslated exon of the adjacent gene TATDN2 (TatD DNase domain containing 2). Interestingly, we have also identified a putative non-coding TATDN2-GHRLOS chimaeric transcript, indicating that GHRLOS RNA biogenesis is extremely complex. Moreover, we have discovered that the 3' region of GHRLOS is also antisense, in a tail-to-tail fashion to a novel terminal exon of the neighbouring SEC13 gene, which is important in protein transport. Sequence analyses revealed that GHRLOS is riddled with stop codons, and that there is little nucleotide and amino-acid sequence conservation of the GHRLOS gene between vertebrates. The gene spans 44 kb on 3p25.3, is extensively spliced and harbours multiple variable exons. We have also investigated the expression of GHRLOS and found evidence of differential tissue expression. It is highly expressed in tissues which are emerging as major sites of non-coding RNA expression (the thymus, brain, and testis), as well as in the ovary and uterus. In contrast, very low levels were found in the stomach where sense, GHRL derived RNAs are highly expressed. GHRLOS RNA transcripts display several distinctive features of non-coding (ncRNA) genes, including 5' capping, polyadenylation, extensive splicing and short open reading frames. The gene is also non-conserved, with differential and tissue-restricted expression. The overlapping genomic arrangement of GHRLOS with the ghrelin gene indicates that it is likely to have interesting regulatory and functional roles in the ghrelin axis.
Phylogeny of flowering plants by the chloroplast genome sequences: in search of a "lucky gene".
Logacheva, M D; Penin, A A; Samigullin, T H; Vallejo-Roman, C M; Antonov, A S
2007-12-01
One of the most complicated remaining problems of molecular-phylogenetic analysis is choosing an appropriate genome region. In an ideal case, such a region should have two specific properties: (i) results of analysis using this region should be similar to the results of multigene analysis using the maximal number of regions; (ii) this region should be arranged compactly and be significantly shorter than the multigene set. The second condition is necessary to facilitate sequencing and to extend the set of taxa under analysis, the number of which is also crucial for molecular phylogenetic analysis. Such regions have been revealed for some groups of animals and have been designated as "lucky genes". We have carried out a computational experiment on analysis of 41 complete chloroplast genomes of flowering plants aimed at searching for a "lucky gene" for reconstruction of their phylogeny. It is shown that the phylogenetic tree inferred from a combination of translated nucleotide sequences of genes encoding subunits of plastid RNA polymerase is closest to the tree constructed using all protein coding sites of the chloroplast genome. The only node for which a contradiction is observed is unstable across the different types of analysis. For all the other genes or their combinations, the coincidence is significantly worse. The RNA polymerase genes are compactly arranged in the genome and are fourfold shorter than the total length of protein coding genes used for phylogenetic analysis. The combination of all necessary features makes this group of genes the main candidate for the role of "lucky gene" in studying phylogeny of flowering plants.
Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders.
Craig, Catherine L; Riekel, Christian
2002-12-01
The known silk fibroins and fibrous glues are thought to be encoded by members of the same gene family. All silk fibroins sequenced to date contain regions of long-range order (crystalline regions) and/or short-range order (non-crystalline regions). All of the sequenced fibroin silks (Flag or silk from flagelliform gland in spiders; Fhc or heavy chain fibroin silks produced by Lepidoptera larvae) are made up of hierarchically organized, repetitive arrays of amino acids. Fhc fibroin genes are characterized by a similar molecular genetic architecture of two exons and one intron, but the organization and size of these units differs. The Flag, Ser (sericin gene) and BR (Balbiani ring genes; both fibrous proteins) genes are made up of multiple exons and introns. Sequences coding for crystalline and non-crystalline protein domains are integrated in the repetitive regions of Fhc and MA exons, but not in the protein glues Ser1 and BR-1. Genetic 'hot-spots' promote recombination errors in Fhc, MA, and Flag. Codon bias, structural constraint, point mutations, and shortened coding arrays may be alternative means of stabilizing precursor mRNA transcripts. Differential regulation of gene expression and selective splicing of the mRNA transcript may allow rapid adaptation of silk functional properties to different physical environments.
Usein, C R; Damian, M; Tatu-Chitoiu, D; Capusa, C; Fagaras, R; Tudorache, D; Nica, M; Le Bouguénec, C
2001-01-01
A total of 78 E. coli strains isolated from adults with different types of urinary tract infections were screened by polymerase chain reaction for prevalence of genetic regions coding for virulence factors. The targeted genetic determinants were those coding for type 1 fimbriae (fimH), pili associated with pyelonephritis (pap), S and F1C fimbriae (sfa and foc), afimbrial adhesins (afa), hemolysin (hly), cytotoxic necrotizing factor (cnf), and aerobactin (aer). Among the studied strains, the prevalence of genes coding for fimbrial adhesive systems was 86%, 36%, and 23% for fimH, pap, and sfa/foc, respectively. The operons coding for Afa afimbrial adhesins were identified in 14% of strains. The hly and cnf genes coding for toxins were amplified in 23% and 13% of strains, respectively. A prevalence of 54% was found for the aer gene. The various combinations of detected genes were designated as virulence patterns. The strains isolated from the hospitalized patients displayed a greater number of virulence genes and a greater diversity of gene associations compared to the strains isolated from the ambulatory subjects. A rapid assessment of the bacterial pathogenicity characteristics may contribute to a better medical approach to patients with urinary tract infections.
Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter
2017-01-01
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Peng, Rui; Zeng, Bo; Meng, Xiuxiang; Yue, Bisong; Zhang, Zhihe; Zou, Fangdong
2007-08-01
The complete mitochondrial genome sequence of the giant panda, Ailuropoda melanoleuca, was determined by the long and accurate polymerase chain reaction (LA-PCR) with conserved primers and primer walking sequence methods. The complete mitochondrial DNA is 16,805 nucleotides in length and contains two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one control region. The 13 protein-coding genes are together longer than those of the American black bear, brown bear and polar bear by 3 amino acids at the end of the ND5 gene. The codon usage also followed the typical vertebrate pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 5 (ND5) gene. The molecular phylogenetic analysis was performed on the sequences of 12 concatenated heavy-strand encoded protein-coding genes, and suggested that the giant panda is most closely related to bears.
Simard, Frédéric; Licht, Monica; Besansky, Nora J.; Lehmann, Tovi
2007-01-01
Genetic variation in defensin, a gene encoding a major effector molecule of the insect immune response, was analyzed within and between populations of three members of the Anopheles gambiae complex. The species selected included the two anthropophilic species, An. gambiae and An. arabiensis, and the most zoophilic species of the complex, An. quadriannulatus. The first species was represented by four populations spanning its extreme genetic and geographical ranges, whereas each of the other two species was represented by a single population. We found (i) reduced overall polymorphism in the mature peptide region and in the total coding region, together with specific reductions in rare and moderately frequent mutations (sites) in the coding region compared with non-coding regions, (ii) markedly reduced rate of nonsynonymous diversity compared with synonymous variation in the mature peptide and virtually identical mature peptide across the three species, and (iii) increased divergence between species in the mature peptide together with reduced differentiation between populations of An. gambiae in the same DNA region. These patterns suggest a strong purifying selection on the mature peptide and probably the whole coding region. Because An. quadriannulatus is not exposed to human pathogens, identical mature peptide and similar pattern of polymorphism across species implies that human pathogens played no role as selective agents on this peptide. PMID:17161659
A Molecular Portrait of De Novo Genes in Yeasts.
Vakirlis, Nikolaos; Hebert, Alex S; Opulente, Dana A; Achaz, Guillaume; Hittinger, Chris Todd; Fischer, Gilles; Coon, Joshua J; Lafontaine, Ingrid
2018-03-01
New genes, with novel protein functions, can evolve "from scratch" out of intergenic sequences. These de novo genes can integrate the cell's genetic network and drive important phenotypic innovations. Therefore, identifying de novo genes and understanding how the transition from noncoding to coding occurs are key problems in evolutionary biology. However, identifying de novo genes is a difficult task, hampered by the presence of remote homologs, fast evolving sequences and erroneously annotated protein coding genes. To overcome these limitations, we developed a procedure that handles the usual pitfalls in de novo gene identification and predicted the emergence of 703 de novo gene candidates in 15 yeast species from 2 genera whose phylogeny spans at least 100 million years of evolution. We validated 85 candidates by proteomic data, providing new translation evidence for 25 of them through mass spectrometry experiments. We also unambiguously identified the mutations that enabled the transition from noncoding to coding for 30 Saccharomyces de novo genes. We established that de novo gene origination is a widespread phenomenon in yeasts, only a few being ultimately maintained by selection. We also found that de novo genes preferentially emerge next to divergent promoters in GC-rich intergenic regions where the probability of finding a fortuitous and transcribed ORF is the highest. Finally, we found a more than 3-fold enrichment of de novo genes at recombination hot spots, which are GC-rich and nucleosome-free regions, suggesting that meiotic recombination contributes to de novo gene emergence in yeasts.
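The idea of a "fortuitous ORF" in an intergenic region can be illustrated with a minimal open-reading-frame scan. This is only a sketch: the function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand and the standard stop codons are assumptions made for illustration, and the sketch does not reproduce the homology filtering or annotation checks that the study's procedure relies on.

```python
# Hypothetical helper: scan the forward strand of a DNA sequence for ATG...stop ORFs.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def find_orfs(seq, min_codons=30):
    """Return (start, end) index pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons from ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the ORF and keep scanning this frame
    return orfs

# Toy intergenic fragment with one short ORF: ATG AAA TTT TAG
print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))
```

A fuller scan would also cover the reverse complement and would need transcription and translation evidence before any ORF is called a de novo gene candidate.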
Identification of three novel NHS mutations in families with Nance-Horan syndrome
Wu, Junhua; Brooks, Simon P.; Hardcastle, Alison J.; Lewis, Richard Alan; Stambolian, Dwight
2007-01-01
Purpose Nance-Horan Syndrome (NHS) is an infrequent and often overlooked X-linked disorder characterized by dense congenital cataracts, microphthalmia, and dental abnormalities. The syndrome is caused by mutations in the NHS gene, whose function is not known. The purpose of this study was to identify the frequency and distribution of NHS gene mutations and compare genotype with Nance-Horan phenotype in five North American NHS families. Methods Genomic DNA was isolated from white blood cells from NHS patients and family members. The NHS gene coding region and its splice site donor and acceptor regions were amplified from genomic DNA by PCR, and the amplicons were sequenced directly. Results We identified three unique NHS coding region mutations in these NHS families. Conclusions This report extends the number of unique identified NHS mutations to 14. PMID:17417607
The complete mitochondrial genome of Rapana venosa (Gastropoda, Muricidae).
Sun, Xiujun; Yang, Aiguo
2016-01-01
The complete mitochondrial (mt) genome of the veined rapa whelk, Rapana venosa, was determined using genome walking techniques in this study. The total length of the mt genome sequence of R. venosa was 15,271 bp, which is comparable to the reported Muricidae mitogenomes to date. It contained 13 protein-coding genes, 21 transfer RNA genes, and two ribosomal RNA genes. A bias towards a higher representation of nucleotides A and T (69%) was detected in the mt genome of R. venosa. A small number of non-coding nucleotides (302 bp) was detected, and the largest non-coding region was 74 bp in length.
The structure of the human interferon alpha/beta receptor gene.
Lutfalla, G; Gardiner, K; Proudhon, D; Vielh, E; Uzé, G
1992-02-05
Using the cDNA coding for the human interferon alpha/beta receptor (IFNAR), the IFNAR gene has been physically mapped relative to the other loci of the chromosome 21q22.1 region. 32,906 base pairs covering the IFNAR gene have been cloned and sequenced. Primer extension and solution hybridization-ribonuclease protection have been used to determine that the transcription of the gene is initiated in a broad region of 20 base pairs. Some aspects of the polymorphism of the gene, including noncoding sequences, have been analyzed; some are allelic differences in the coding sequence that induce amino acid variations in the resulting protein. The exon structure of the IFNAR gene and of that of the available genes for the receptors of the cytokine/growth hormone/prolactin/interferon receptor family have been compared with the predictions for the secondary structure of those receptors. From this analysis, we postulate a common origin and propose an hypothesis for the divergence from the immunoglobulin superfamily.
The complete mitochondrial genome of Gobiobotia filifer (Teleostei, Cypriniformes: Cyprinidae).
Li, Qiang; Liu, Ya; Zhou, Jian; Gong, Quan; Li, Hua; Lai, Jiansheng; Li, Lianman
2016-09-01
Gobiobotia filifer is a small, economically important fish distributed in the upstream of the Yangtze River and its distributaries. Because of environmental pollution and overfishing, its population has declined drastically in recent decades, so it is essential to protect this resource. In this study, the complete mitochondrial genome sequence of G. filifer was determined by PCR. It has a total length of 16,613 bp and contains 13 protein-coding genes, 22 tRNA genes, two rRNA genes, and a non-coding control region. The order and composition of the genes are similar to those of most other teleost fish. Most of the genes are encoded on the heavy strand, except for ND6 and eight tRNA genes. As in most other vertebrates, a bias in G and C content was found among different genes/regions. The complete mitochondrial genome sequence of G. filifer will contribute to a better understanding of the evolution and population genetics of this lineage, and will help administrative departments to make rules and laws to protect it.
The complete mitochondrial genome of Liobagrus marginatus (Teleostei, Siluriformes: Amblycipitidae).
Li, Qiang; Du, Jun; Liu, Ya; Zhou, Jian; Ke, Hongyu; Liu, Chao; Liu, Guangxun
2014-04-01
Liobagrus marginatus is an economically important fish distributed in the upstream of the Yangtze River and its distributaries. Owing to its fresh taste, environmental pollution and overfishing, its population has declined drastically and its body size has decreased in recent decades, so it is essential to protect this resource. In this study, the complete mitochondrial genome of Liobagrus marginatus was sequenced. It has a total length of 16,497 bp and contains 22 tRNA genes, 13 protein-coding genes, 2 rRNA genes, and a non-coding control region. The gene arrangement and composition are similar to those of most other fish. Most of the genes are encoded on the heavy strand, except for eight tRNA genes and the ND6 gene. As in most other vertebrates, a bias in G and C content was found in the statistics for different genes/regions. The complete mitochondrial genome sequence of Liobagrus marginatus will contribute to a better understanding of the population genetics and evolution of this lineage, and will help administrative departments to make rules and laws to protect it.
Development of Plant Gene Vectors for Tissue-Specific Expression Using GFP as a Reporter Gene
NASA Technical Reports Server (NTRS)
Jackson, Jacquelyn; Egnin, Marceline; Xue, Qi-Han; Prakash, C. S.
1997-01-01
Reporter genes are widely employed in plant molecular biology research to analyze gene expression and to identify promoters. Gus (UidA) is currently the most popular reporter gene but its detection requires a destructive assay. The use of the jellyfish green fluorescent protein (GFP) gene from Aequorea victoria holds promise for noninvasive detection of in vivo gene expression. To study how various plant promoters are expressed in sweet potato (Ipomoea batatas), we are transcriptionally fusing the intron-modified (mGFP) or synthetic (modified for codon-usage) GFP coding regions to these promoters: double cauliflower mosaic virus 35S (CaMV 35S) with AMV translational enhancer, ubiquitin7-intron-ubiquitin coding region (ubi7-intron-UQ) and sporaminA. A few of these vectors have been constructed and introduced into E. coli DH5α and Agrobacterium tumefaciens EHA105. Transient expression studies are underway using protoplast-electroporation and particle bombardment of leaf tissues.
IL-TIF/IL-22: genomic organization and mapping of the human and mouse genes.
Dumoutier, L; Van Roost, E; Ameye, G; Michaux, L; Renauld, J C
2000-12-01
IL-TIF is a new cytokine originally identified as a gene induced by IL-9 in murine T lymphocytes, and showing 22% amino acid identity with IL-10. Here, we report the sequence and organization of the mouse and human IL-TIF genes, which both consist of 6 exons spreading over approximately 6 Kb. The IL-TIF gene is a single copy gene in humans, and is located on chromosome 12q15, at 90 Kb from the IFN gamma gene, and at 27 Kb from the AK155 gene, which codes for another IL-10-related cytokine. In the mouse, the IL-TIF gene is located on chromosome 10, also in the same region as the IFN gamma gene. Although it is a single copy gene in BALB/c and DBA/2 mice, the IL-TIF gene is duplicated in other strains such as C57Bl/6, FVB and 129. The two copies, which show 98% nucleotide identity in the coding region, were named IL-TIF alpha and IL-TIF beta. Beside single nucleotide variations, they differ by a 658 nucleotide deletion in IL-TIF beta, including the first non-coding exon and 603 nucleotides from the promoter. A DNA fragment corresponding to this deletion was sufficient to confer IL-9-regulated expression of a luciferase reporter plasmid, suggesting that the IL-TIF beta gene is either differentially regulated, or not expressed at all.
2010-01-01
Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079
Genomic Structure of the Luciferase Gene from the Bioluminescent Beetle, Nyctophila cf. Caucasica
Day, John C.; Chaichi, Mohammad J.; Najafil, Iraj; Whiteley, Andrew S.
2006-01-01
The gene coding for beetle luciferase, the enzyme responsible for bioluminescence in over two thousand coleopteran species has, to date, only been characterized from one Palearctic species of Lampyridae. Here we report the characterization of the luciferase gene from a female beetle of an Iranian lampyrid species, Nyctophila cf. caucasica (Coleoptera:Lampyridae). The luciferase gene was composed of seven exons, coding for 547 amino acids, separated by six introns spanning 1976 bp of genomic DNA. The deduced amino acid sequences of the luciferase gene of N. caucasica showed 98.9% homology to that of the Palearctic species Lampyris noctiluca. Analysis of the 810 bp upstream region of the luciferase gene revealed three TATA boxes and several other consensus transcriptional factor recognition sequences presenting evidence for a putative core promoter region conserved in Lampyrinae from -190 through to -155 upstream of the luciferase start codon. Along with the core promoter region the luciferase gene was compared with orthologous sequences from other lampyrid species and found to have greatest identity to Lampyris turkistanicus and Lampyris noctiluca. The significant sequence identity to the former is discussed in relation to taxonomic issues of Iranian lampyrids. PMID:20298115
Zeng, Fan-chun; Gao, Cheng-wen; Gao, Li-zhi
2016-01-01
The complete chloroplast genome sequence of American bird pepper (Capsicum annuum var. glabriusculum) is reported and characterized in this study. The genome size is 156,612 bp, containing a pair of inverted repeats (IRs) of 25,776 bp separated by a large single-copy region of 87,213 bp and a small single-copy region of 17,851 bp. The chloroplast genome harbors 130 known genes, including 89 protein-coding genes, 8 ribosomal RNA genes, and 37 tRNA genes. A total of 18 of these genes are duplicated in the inverted repeat regions, 16 genes contain 1 intron, and 2 genes and one ycf have 2 introns.
The 5S RNA gene minichromosome of Euplotes.
Roberson, A E; Wolffe, A P; Hauser, L J; Olins, D E
1989-01-01
The macronucleus of the ciliated protozoan Euplotes eurystomus contains about 10^6 copies of a single type of 5S ribosomal RNA gene. This 5S gene DNA is only 930 bp long, is flanked by telomeres, and contains a single coding region of 120 bp which serves as a template for transcription in vivo and in vitro. The 5S gene minichromatin possesses four positioned nucleosomes and hypersensitive cleavage sites in the telomeric regions. PMID:2501759
Analysis of alternative cleavage and polyadenylation by 3′ region extraction and deep sequencing
Hoque, Mainul; Ji, Zhe; Zheng, Dinghai; Luo, Wenting; Li, Wencheng; You, Bei; Park, Ji Yeon; Yehia, Ghassan; Tian, Bin
2012-01-01
Alternative cleavage and polyadenylation (APA) leads to mRNA isoforms with different coding sequences (CDS) and/or 3′ untranslated regions (3′UTRs). Using 3′ Region Extraction And Deep Sequencing (3′READS), a method which addresses the internal priming and oligo(A) tail issues that commonly plague polyA site (pA) identification, we comprehensively mapped pAs in the mouse genome, thoroughly annotating 3′ ends of genes and revealing over five thousand pAs (~8% of total) flanked by A-rich sequences, which have hitherto been overlooked. About 79% of mRNA genes and 66% of long non-coding RNA (lncRNA) genes have APA; but these two gene types have distinct usage patterns for pAs in introns and upstream exons. Promoter-distal pAs become relatively more abundant during embryonic development and cell differentiation, a trend affecting pAs in both 3′-most exons and upstream regions. Upregulated isoforms generally have stronger pAs, suggesting global modulation of the 3′ end processing activity in development and differentiation. PMID:23241633
Dubey, Bhawna; Meganathan, P R; Haque, Ikramul
2012-07-01
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17,258 bp length comprising 37 genes, including the 13 protein coding genes, 22 tRNA genes, and 2 ribosomal RNA genes, along with duplicate control regions is described herein. The P. molurus molurus mt genome is relatively similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that there are more A and C than T and G on the positive strand, as revealed by positive AT and CG skews. Comparison of individual protein coding genes with other snake genomes suggests that ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference of NNC codons over NNG codons in the mt genome of P. molurus. Also, the synonymous and non-synonymous substitution rates (ka/ks) suggest that most of the protein coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
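The strand skews and the NNC versus NNG comparison mentioned in the preceding abstract can be sketched as follows. The function names are hypothetical, and the definitions used are the commonly cited ones, AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C); under that convention an excess of C over G appears as a negative GC skew, which corresponds to what the abstract calls a positive CG skew. The abstract does not state which software or exact formulas the authors used.

```python
from collections import Counter

def base_skews(seq):
    """Return AT and GC skew for one strand of a DNA sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    # Guard against empty denominators (e.g. a sequence with no A or T).
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"at_skew": at_skew, "gc_skew": gc_skew}

def third_position_counts(cds):
    """Count bases at the third position of each complete codon of an in-frame CDS."""
    cds = cds.upper()
    return Counter(cds[i + 2] for i in range(0, len(cds) - 2, 3))

# Toy inputs, not data from the study.
print(base_skews("AAATCCG"))
print(third_position_counts("ATCGGCTTG"))
```

Comparing the `C` and `G` entries of `third_position_counts` over all protein-coding genes is one simple way to look for an NNC over NNG preference.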
Holland, M J; Holland, J P; Thill, G P; Jackson, K A
1981-02-10
Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing messenger RNA. 
Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5'-noncoding portions of these glycolytic genes.
Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.
Suyama, Mikita; Lathe, Warren C; Bork, Peer
2005-10-10
We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.
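In DNA, a palindromic element is one whose sequence equals its own reverse complement. The sketch below checks that property and lists exact palindromic windows; the function names and window sizes are illustrative assumptions. The abstract does not describe how the MJRE elements were detected, and real repetitive elements are often imperfect palindromes that this exact-match scan would miss.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def is_palindromic(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)

def find_palindromes(seq, min_len=6, max_len=12):
    """Return (start, window) pairs for exact palindromic windows."""
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1):
        if length % 2:
            continue  # an exact DNA palindrome must have even length
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if is_palindromic(window):
                hits.append((start, window))
    return hits

# GAATTC is a classic exact palindrome.
print(find_palindromes("TTGAATTCAA", min_len=6, max_len=6))
```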
Véliz, David; Vega-Retter, Caren; Quezada-Romegialli, Claudio
2016-01-01
The complete sequence of the mitochondrial genome for the Chilean silverside Basilichthys microlepidotus is reported for the first time. The entire mitochondrial genome was 16,544 bp in length (GenBank accession no. KM245937); gene composition and arrangement conformed to that reported for most fishes, with the typical structure of 2 rRNAs, 13 protein-coding genes, 22 tRNAs and a non-coding region. The assembled mitogenome was validated against sequences of COI and Control Region previously sequenced in our lab, functional genes from RNA-Seq data for the same species and the mitogenomes of two other atherinopsid species available in GenBank.
Clinical application of antenatal genetic diagnosis of osteogenesis imperfecta type IV.
Yuan, Jing; Li, Song; Xu, YeYe; Cong, Lin
2015-04-02
Clinical analysis and genetic testing of a family with osteogenesis imperfecta type IV were conducted, aiming to discuss antenatal genetic diagnosis of osteogenesis imperfecta type IV. Preliminary genotyping was performed based on clinical characteristics of the family members and then high-throughput sequencing was applied to rapidly and accurately detect the changes in candidate genes. Genetic testing of the III5 fetus and other family members revealed a missense mutation (c.2746G>A, p.Gly916Arg) in the COL1A2 gene coding region and missense and synonymous mutations in the COL1A1 gene coding region. Application of antenatal genetic diagnosis provides fast and accurate genetic counseling and eugenics suggestions for patients with osteogenesis imperfecta type IV and their families.
The complete mitochondrial genome of the ice pigeon (Columba livia breed ice).
Zhang, Rui-Hua; He, Wen-Xiao
2015-02-01
The ice pigeon is a breed of fancy pigeon developed over many years of selective breeding. In the present work, we report the complete mitochondrial genome sequence of ice pigeon for the first time. The total length of the mitogenome was 17,236 bp with the base composition of 30.2% for A, 24.0% for T, 31.9% for C, and 13.9% for G, and an A-T-rich (54.2%) feature was detected. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region (D-loop region). The arrangement of all genes was identical to the typical mitochondrial genomes of pigeon. The complete mitochondrial genome sequence of ice pigeon would serve as an important data set of the germplasm resources for further study.
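The A-T figure in the preceding abstract is the sum of the reported A and T percentages (30.2% + 24.0% = 54.2%). A minimal sketch of how such a base composition might be computed from a sequence is shown below; the function names are hypothetical, and ignoring ambiguous bases such as N is an assumption rather than something the abstract specifies.

```python
def base_composition(seq):
    """Return the percentage of A, C, G and T, ignoring any other symbols."""
    seq = seq.upper()
    counts = {b: seq.count(b) for b in "ACGT"}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * n / total for b, n in counts.items()}

def at_content(composition):
    """A+T percentage from a composition dictionary."""
    return composition["A"] + composition["T"]

# Percentages reported for the ice pigeon mitogenome in the abstract above.
reported = {"A": 30.2, "T": 24.0, "C": 31.9, "G": 13.9}
print(round(at_content(reported), 1))
```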
The complete mitochondrial genome of the Fancy Pigeon, Columba livia (Columbiformes: Columbidae).
Zhang, Rui-Hua; Xu, Ming-Ju; Wang, Cun-Lian; Xu, Tong; Wei, Dong; Liu, Bao-Jian; Wang, Guo-Hua
2015-02-01
The fancy pigeons are domesticated varieties of the rock pigeon developed over many years of selective breeding. In the present work, we report the complete mitochondrial genome sequence of fancy pigeon for the first time. The total length of the mitogenome was 17,233 bp with the base composition of 30.1% for A, 24.0% for T, 31.9% for C, and 14.0% for G, and an A-T-rich (54.2%) feature was detected. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region (D-loop region). The arrangement of all genes was identical to the typical mitochondrial genomes of pigeon. The complete mitochondrial genome sequence of fancy pigeon would serve as an important data set of the germplasm resources for further study.
Stable chromosome condensation revealed by chromosome conformation capture
Eagen, Kyle P.; Hartl, Tom A.; Kornberg, Roger D.
2015-01-01
SUMMARY Chemical cross-linking and DNA sequencing have revealed regions of intra-chromosomal interaction, referred to as topologically associating domains (TADs), interspersed with regions of little or no interaction, in interphase nuclei. We find that TADs and the regions between them correspond with the bands and interbands of polytene chromosomes of Drosophila. We further establish the conservation of TADs between polytene and diploid cells of Drosophila. From direct measurements on light micrographs of polytene chromosomes, we then deduce the states of chromatin folding in the diploid cell nucleus. Two states of folding, fully extended fibers containing regulatory regions and promoters, and fibers condensed up to ten-fold containing coding regions of active genes, constitute the euchromatin of the nuclear interior. Chromatin fibers condensed up to 30-fold, containing coding regions of inactive genes, represent the heterochromatin of the nuclear periphery. A convergence of molecular analysis with direct observation thus reveals the architecture of interphase chromosomes. PMID:26544940
Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred
2014-11-20
Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria.
Four out of eight genes in a mouse chromosome 7 congenic donor region are candidate obesity genes.
Sarahan, Kari A; Fisler, Janis S; Warden, Craig H
2011-09-22
We previously identified a region of mouse chromosome 7 that influences body fat mass in F2 littermates of congenic × background intercrosses. Current analyses revealed that alleles in the donor region of the subcongenic B6.C-D7Mit318 (318) promoted a twofold increase in adiposity in homozygous lines of 318 compared with background C57BL/6ByJ (B6By) mice. Parent-of-origin effects were discounted through cross-fostering studies and an F1 reciprocal cross. Mapping of the donor region revealed that it has a maximal size of 2.8 Mb (minimum 1.8 Mb) and contains a maximum of eight protein coding genes. Quantitative PCR in whole brain, liver, and gonadal white adipose tissue (GWAT) revealed differential expression between genotypes for three genes in females and two genes in males. Alpha-2,8-sialyltransferase 8B (St8sia2) showed reduced 318 mRNA levels in brain for females and males and in GWAT for females only. Both sexes of 318 mice had reduced Repulsive guidance molecule-a (Rgma) expression in GWAT. In brain, Family with sequence similarity 174 member b (Fam174b) had increased expression in 318 females, whereas Chromodomain helicase DNA binding protein 2 (Chd2-2) had reduced expression in 318 males. No donor region genes were differentially expressed in liver. Sequence analysis of coding exons for all genes in the 318 donor region revealed only one single nucleotide polymorphism that produced a nonsynonymous missense mutation, Gln7Pro, in Fam174b. Our findings highlight the difficulty of using expression and sequence to identify quantitative trait genes underlying obesity even in small genomic regions.
Niu, Fang-Fang; Zhu, Liang; Wang, Su; Wei, Shu-Jun
2016-07-01
Here, we report the mitochondrial genome sequence of the multicolored Asian lady beetle Harmonia axyridis (Pallas, 1773) (Coleoptera: Coccinellidae) (GenBank accession No. KR108208). This is the first species with a sequenced mitochondrial genome from the genus Harmonia. The current length of this mitochondrial genome, with a partial A + T-rich region, is 16,387 bp. All the typical genes were sequenced except trnI and trnQ. As in most other sequenced mitochondrial genomes of Coleoptera, there is no rearrangement in the sequenced region compared with the putative ancestral arrangement of insects. All protein-coding genes start with ATN codons. Five, five and three protein-coding genes stop with the termination codons TAA, TA and T, respectively. Phylogenetic analysis using the Bayesian method based on the first and second codon positions of the protein-coding genes supported Scirtidae as a basal lineage of Polyphaga. Harmonia and Coccinella form a sister lineage. The monophyly of Staphyliniformia, Scarabaeiformia and Cucujiformia was supported. Buprestidae was found to be a sister group to Bostrichiformia.
NASA Astrophysics Data System (ADS)
Gao, Fengtao; Wei, Min; Zhu, Ying; Guo, Hua; Chen, Songlin; Yang, Guanpin
2017-06-01
This study presents the complete mitochondrial genome of the hybrid Epinephelus moara♀× Epinephelus lanceolatus♂. The genome is 16886 bp in length, and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes, a light-strand replication origin and a control region. Additionally, phylogenetic analysis based on the nucleotide sequences of 13 conserved protein-coding genes using the maximum likelihood method indicated that the mitochondrial genome is maternally inherited. This study presents genomic data for studying phylogenetic relationships and breeding of hybrid Epinephelinae.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leong, JoAnn Ching
The nucleotide sequence of the IHNV glycoprotein gene has been determined from a cDNA clone containing the entire coding region. The glycoprotein cDNA clone contained a leader sequence of 48 bases, a coding region of 1524 nucleotides, and 39 bases at the 3' end. The entire cDNA clone contains 1609 nucleotides and encodes a protein of 508 amino acids. The deduced amino acid sequence gave a translated molecular weight of 56,795 daltons. A hydropathicity profile of the deduced amino acid sequence indicated that there were two major hydrophobic domains: one, at the N-terminus, delineating a signal peptide of 18 amino acids and the other, at the C-terminus, delineating the transmembrane region. Five possible sites of N-linked glycosylation were identified. Although no nucleic acid homology existed between the IHNV glycoprotein gene and the glycoprotein genes of rabies and VSV, there was significant homology at the amino acid level between all three rhabdovirus glycoproteins.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionasescu, V.; Ionasescu, R.; Searby, C.
1996-06-14
We studied the relationship between the genotype and clinical phenotype in 27 families with dominant X-linked Charcot-Marie-Tooth (CMTX1) neuropathy. Twenty-two families showed mutations in the coding region of the connexin32 (cx32) gene. The mutations include four nonsense mutations, eight missense mutations, two medium size deletions, and one insertion. Most missense mutations showed a mild clinical phenotype (five out of eight), whereas all nonsense mutations, the larger of the two deletions, and the insertion that produced frameshifts showed severe phenotypes. Five CMTX1 families with mild clinical phenotype showed no point mutations of the cx32 gene coding region. Three of these families showed positive genetic linkage with the markers of the Xq13.1 region. The genetic linkage of the remaining two families could not be evaluated because of their small size. 25 refs., 1 fig., 1 tab.
Decoding sORF translation - from small proteins to gene regulation.
Cabrera-Quio, Luis Enrique; Herberg, Sarah; Pauli, Andrea
2016-11-01
Translation is best known as the fundamental mechanism by which the ribosome converts a sequence of nucleotides into a string of amino acids. Extensive research over many years has elucidated the key principles of translation, and the majority of translated regions were thought to be known. The recent discovery of wide-spread translation outside of annotated protein-coding open reading frames (ORFs) came therefore as a surprise, raising the intriguing possibility that these newly discovered translated regions might have unrecognized protein-coding or gene-regulatory functions. Here, we highlight recent findings that provide evidence that some of these newly discovered translated short ORFs (sORFs) encode functional, previously missed small proteins, while others have regulatory roles. Based on known examples we will also speculate about putative additional roles and the potentially much wider impact that these translated regions might have on cellular homeostasis and gene regulation.
The complete chloroplast genome sequence of Dendrobium nobile.
Yan, Wenjin; Niu, Zhitao; Zhu, Shuying; Ye, Meirong; Ding, Xiaoyu
2016-11-01
The complete chloroplast (cp) genome sequence of Dendrobium nobile, an endangered species used in traditional Chinese medicine with important economic value, is presented in this article. The total genome size is 150,793 bp, containing a large single-copy (LSC) region (84,939 bp) and a small single-copy (SSC) region (13,310 bp), which are separated by two inverted repeat (IR) regions (26,272 bp each). The overall GC content of the plastid genome was 38.8%. In total, 130 unique genes were annotated, and they consisted of 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Fourteen genes contained one or two introns.
Bacteriophage 5' untranslated regions for control of plastid transgene expression.
Yang, Huijun; Gray, Benjamin N; Ahner, Beth A; Hanson, Maureen R
2013-02-01
Expression of foreign proteins from transgenes incorporated into plastid genomes requires regulatory sequences that can be recognized by the plastid transcription and translation machinery. Translation signals harbored by the 5' untranslated region (UTR) of plastid transcripts can profoundly affect the level of accumulation of proteins expressed from chimeric transgenes. Both endogenous 5' UTRs and the bacteriophage T7 gene 10 (T7g10) 5' UTR have been found to be effective in combination with particular coding regions to mediate high-level expression of foreign proteins. We investigated whether two other bacteriophage 5' UTRs could be utilized in plastid transgenes by fusing them to the aadA (aminoglycoside-3'-adenyltransferase) coding region that is commonly used as a selectable marker in plastid transformation. Transplastomic plants containing either the T7g1.3 or T4g23 5' UTRs fused to Myc-epitope-tagged aadA were successfully obtained, demonstrating the ability of these 5' UTRs to regulate gene expression in plastids. Placing the Thermobifida fusca cel6A gene under the control of the T7g1.3 or T4g23 5' UTRs, along with a tetC downstream box, resulted in poor expression of the cellulase in contrast with high-level accumulation while using the T7g10 5' UTR. However, transplastomic plants with the bacteriophage 5' UTRs controlling the aadA coding region exhibited fewer undesired recombinant species than plants containing the same marker gene regulated by the Nicotiana tabacum psbA 5' UTR. Furthermore, expression of the T7g1.3 and T4g23 5' UTR::aadA fusions downstream of the cel6A gene provided sufficient spectinomycin resistance to allow selection of homoplasmic transgenic plants and had no effect on Cel6A accumulation.
Li, Juan; Chen, Fen; Sugiyama, Hiromu; Blair, David; Lin, Rui-Qing; Zhu, Xing-Quan
2015-07-01
In the present study, near-complete mitochondrial (mt) genome sequences for Schistosoma japonicum from different regions in the Philippines and Japan were amplified and sequenced. Comparisons among S. japonicum from the Philippines, Japan, and China revealed a geographically based length difference in mt genomes, but the mt genomic organization and gene arrangement were the same. Sequence differences among samples from the Philippines and all samples from the three endemic areas were 0.57-2.12 and 0.76-3.85 %, respectively. The most variable part of the mt genome was the non-coding region. In the coding portion of the genome, protein-coding genes varied more than rRNA genes and tRNAs. The near-complete mt genome sequences for Philippine specimens were identical in length (14,091 bp) which was 4 bp longer than those of S. japonicum samples from Japan and China. This indel provides a unique genetic marker for S. japonicum samples from the Philippines. Phylogenetic analyses based on the concatenated amino acids of 12 protein-coding genes showed that samples of S. japonicum clustered according to their geographical origins. The identified mitochondrial indel marker will be useful for tracing the source of S. japonicum infection in humans and animals in Southeast Asia.
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using PacBio sequencing technology. In this study, the high error rate of PacBio long reads of A. comosus total genomic DNA was reduced by leveraging highly accurate but short Illumina reads for error correction via the latest error correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome, 159,636 bp in length, features the conserved quadripartite structure of chloroplasts, containing a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, 30 of which were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both the coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection exerted on the rps7 gene of the pineapple chloroplast, with P less than 0.05.
Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, which support the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among the members of the subclass Commelinidae. The complete pineapple chloroplast will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make feasible other molecular applications of the pineapple chloroplast for plant genetic improvement.
Novel mutations in the CHST6 gene associated with macular corneal dystrophy in southern India.
Warren, John F; Aldave, Anthony J; Srinivasan, M; Thonar, Eugene J; Kumar, Abha B; Cevallos, Vicky; Whitcher, John P; Margolis, Todd P
2003-11-01
To further characterize the role of the carbohydrate sulfotransferase (CHST6) gene in macular corneal dystrophy (MCD) through identification of causative mutations in a cohort of affected patients from southern India. Genomic DNA was extracted from buccal epithelium of 75 patients (51 families) with MCD, 33 unaffected relatives, and 48 healthy volunteers. The coding region of the CHST6 gene was evaluated by means of polymerase chain reaction amplification and direct sequencing. Subtyping of MCD into types I and II was performed by measuring serum levels of antigenic keratan sulfate. Seventy patients were classified as having type I MCD, and 5 patients as having type II MCD. Analysis of the CHST6 coding region in patients with type I MCD identified 11 homozygous missense mutations (Leu22Arg, His42Tyr, Arg50Cys, Arg50Leu, Ser53Leu, Arg97Pro, Cys102Tyr, Arg127Cys, Arg205Gln, His249Pro, and Glu274Lys), 2 compound heterozygous missense mutations (Arg93His and Ala206Thr), 5 homozygous deletion mutations (delCG707-708, delC890, delA1237, del1748-1770, and delORF), and 2 homozygous replacement mutations (ACCTAC 1273 GGT, and GCG 1304 AT). One patient with type II MCD was heterozygous for the C890 deletion mutation, whereas 4 possessed no CHST6 coding region mutations. A variety of previously unreported mutations in the coding region of the CHST6 gene are associated with type I MCD in a cohort of patients in southern India. An improved understanding of the genetic basis of MCD allows for earlier, more accurate diagnosis of affected individuals, and may provide the foundation for the development of novel disease treatments.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ansong, Charles; Tolic, Nikola; Purvine, Samuel O.
Complete and accurate genome annotation is crucial for comprehensive and systematic studies of biological systems. For example, systems biology-oriented genome-scale modeling efforts greatly benefit from accurate annotation of protein-coding genes to develop properly functioning models. However, determining protein-coding genes for most new genomes is almost completely performed by inference, using computational predictions with significant documented error rates (> 15%). Furthermore, gene prediction programs provide no information on biologically important post-translational processing events critical for protein function. With the ability to directly measure peptides arising from expressed proteins, mass spectrometry-based proteomics approaches can be used to augment and verify coding regions of a genomic sequence and, importantly, detect post-translational processing events. In this study we utilized “shotgun” proteomics to guide accurate primary genome annotation of the bacterial pathogen Salmonella Typhimurium 14028 to facilitate a systems-level understanding of Salmonella biology. The data provide protein-level experimental confirmation for 44% of predicted protein-coding genes, suggest revisions to 48 genes assigned incorrect translational start sites, and uncover 13 non-annotated genes missed by gene prediction programs. We also present a comprehensive analysis of post-translational processing events in Salmonella, revealing a wide range of complex chemical modifications (70 distinct modifications) and confirming more than 130 signal peptide and N-terminal methionine cleavage events in Salmonella. This study highlights several ways in which proteomics data applied during the primary stages of annotation can improve the quality of genome annotations, especially with regard to the annotation of mature protein products.
Hall, L; Laird, J E; Craig, R K
1984-01-01
Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. PMID:6548375
Divergent transcription is associated with promoters of transcriptional regulators
2013-01-01
Background Divergent transcription is a wide-spread phenomenon in mammals. For instance, short bidirectional transcripts are a hallmark of active promoters, while longer transcripts can be detected antisense from active genes in conditions where the RNA degradation machinery is inhibited. Moreover, many described long non-coding RNAs (lncRNAs) are transcribed antisense from coding gene promoters. However, the general significance of divergent lncRNA/mRNA gene pair transcription is still poorly understood. Here, we used strand-specific RNA-seq with high sequencing depth to thoroughly identify antisense transcripts from coding gene promoters in primary mouse tissues. Results We found that a substantial fraction of coding-gene promoters sustain divergent transcription of long non-coding RNA (lncRNA)/mRNA gene pairs. Strikingly, upstream antisense transcription is significantly associated with genes related to transcriptional regulation and development. Their promoters share several characteristics with those of transcriptional developmental genes, including very large CpG islands, high degree of conservation and epigenetic regulation in ES cells. In-depth analysis revealed a unique GC skew profile at these promoter regions, while the associated coding genes were found to have large first exons, two genomic features that might enforce bidirectional transcription. Finally, genes associated with antisense transcription harbor specific H3K79me2 epigenetic marking and RNA polymerase II enrichment profiles linked to an intensified rate of early transcriptional elongation. Conclusions We concluded that promoters of a class of transcription regulators are characterized by a specialized transcriptional control mechanism, which is directly coupled to relaxed bidirectional transcription. PMID:24365181
Statistical and linguistic features of DNA sequences
NASA Technical Reports Server (NTRS)
Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
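The Detrended Fluctuation Analysis named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the purine/pyrimidine step rule, the first-order (linear) detrending, the box sizes, and all function names below are assumptions chosen for illustration. The sequence is turned into a "DNA walk", the walk is detrended in non-overlapping boxes of size n, and the slope of log F(n) against log n gives the scaling exponent, which is expected to be close to 0.5 for an uncorrelated sequence and larger when long-range correlations are present.

```python
import math
import random


def dna_walk(seq):
    """Cumulative sum of +1 (pyrimidine C/T) and -1 (anything else) steps.

    Assumption: ambiguous bases such as N are counted as -1 steps.
    """
    total, walk = 0, []
    for base in seq.upper():
        total += 1 if base in "CT" else -1
        walk.append(total)
    return walk


def fluctuation(walk, n):
    """RMS residual of the walk after a linear fit in each box of size n."""
    if len(walk) < n:
        raise ValueError("sequence shorter than box size")
    xs = range(n)
    x_mean = (n - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sq_sum, count = 0.0, 0
    for start in range(0, len(walk) - n + 1, n):
        box = walk[start:start + n]
        y_mean = sum(box) / n
        slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, box)) / sxx
        for x, y in zip(xs, box):
            resid = y - (y_mean + slope * (x - x_mean))
            sq_sum += resid * resid
            count += 1
    return math.sqrt(sq_sum / count)


def dfa_exponent(seq, box_sizes=(8, 16, 32, 64, 128, 256)):
    """Least-squares slope of log F(n) versus log n."""
    walk = dna_walk(seq)
    logs = [(math.log(n), math.log(fluctuation(walk, n))) for n in box_sizes]
    x_mean = sum(x for x, _ in logs) / len(logs)
    y_mean = sum(y for _, y in logs) / len(logs)
    num = sum((x - x_mean) * (y - y_mean) for x, y in logs)
    den = sum((x - x_mean) ** 2 for x, _ in logs)
    return num / den


# Synthetic, uncorrelated test sequence (not real GenBank data).
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(8192))
alpha = dfa_exponent(random_seq)
print(f"DFA exponent for a random sequence: {alpha:.2f}")
```

On real data one would run the same function separately on coding and noncoding sequences and compare the exponents; the box sizes should stay well below the sequence length so that each scale averages over many boxes.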
Chien, Maw-Sheng; Gilbert, Teresa L.; Huang, Chienjin; Landolt, Marsha L.; O'Hara, Patrick J.; Winton, James R.
1992-01-01
The complete sequence coding for the 57-kDa major soluble antigen of the salmonid fish pathogen, Renibacterium salmoninarum, was determined. The gene contained an open reading frame of 1671 nucleotides coding for a protein of 557 amino acids with a calculated Mr value of 57190. The first 26 amino acids constituted a signal peptide. The deduced sequence for amino acid residues 27–61 was in agreement with the 35 N-terminal amino acid residues determined by microsequencing, suggesting the protein is synthesized as a 557-amino acid precursor and processed to produce a mature protein of Mr 54505. Two regions of the protein contained imperfect direct repeats. The first region contained two copies of an 81-residue repeat; the second contained five copies of an unrelated 25-residue repeat. Also, a perfect inverted repeat (including three in-frame UAA stop codons) was observed at the carboxyl-terminus of the gene.
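The lengths quoted in this abstract are related by simple arithmetic, sketched below. The function name is hypothetical, and the sketch assumes that the 1671-nucleotide figure counts sense codons only (no stop codon), which is what the reported 557 residues imply.

```python
def codons_in_orf(nucleotides):
    """Number of codons in an ORF whose length is a multiple of three."""
    if nucleotides % 3 != 0:
        raise ValueError("ORF length is not a multiple of 3")
    return nucleotides // 3


# Figures taken from the abstract above.
precursor_residues = codons_in_orf(1671)      # precursor length in residues
signal_peptide_residues = 26                  # cleaved signal peptide
mature_residues = precursor_residues - signal_peptide_residues

print(precursor_residues, mature_residues)
```

The mature protein therefore begins at residue 27, consistent with the microsequenced N-terminus matching residues 27–61.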
Zhang, Yanan; Song, Tao; Pan, Tao; Sun, Xiaonan; Sun, Zhonglou; Qian, Lifu; Zhang, Baowei
2016-07-01
The complete sequence of the mitochondrial genome was determined for Asio flammeus, a geographically widespread species. The length of the complete mitochondrial genome was 18,966 bp, containing 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes (PCGs), and 1 non-coding region (D-loop). All the genes were distributed on the H-strand, except for the ND6 subunit gene and eight tRNA genes, which were encoded on the L-strand. The D-loop of A. flammeus contained many tandem repeats of varying lengths and repeat numbers. The molecular phylogeny showed that A. flammeus is the sister group to A. capensis and supported the monophyly of Asio.
The Complete Mitochondrial Genome of the Rice Moth, Corcyra cephalonica
Wu, Yu-Peng; Li, Jie; Zhao, Jin-Liang; Su, Tian-Juan; Luo, A-Rong; Fan, Ren-Jun; Chen, Ming-Chang; Wu, Chun-Sheng; Zhu, Chao-Dong
2012-01-01
The complete mitochondrial genome (mitogenome) of the rice moth, Corcyra cephalonica Stainton (Lepidoptera: Pyralidae) was determined as a circular molecule of 15,273 bp in size. The mitogenome composition (37 genes) and gene order are the same as in other lepidopterans. The nucleotide composition of the C. cephalonica mitogenome is highly A+T biased (80.43%), as in other insects. Twelve protein-coding genes start with a typical ATN codon, the exception being the cox1 gene, which uses CGA as the initial codon. Nine protein-coding genes have the common stop codon TAA, and nad2, cox1, cox2, and nad4 have a single T as the incomplete stop codon. The 22 tRNA genes showed cloverleaf secondary structure. The mitogenome has several large intergenic spacer regions, including spacer 1 between the trnQ and nad2 genes, which is common in Lepidoptera. Spacer 3, between trnE and trnF, includes the microsatellite-like repeat regions (AT)18 and (TTAT)3. Spacer 4 (16 bp), between the trnS2 and nad1 genes, has the motif ATACTAT; another species, Sesamia inferens, encodes ATCATAT at the same position, while other lepidopteran insects encode a similar ATACTAA motif. Spacer 6 is the A+T-rich region and includes the motif ATAGA, a 20-bp poly(T) stretch and two microsatellite elements, (AT)9 and (AT)8. PMID:23413968
The complete chloroplast genome of Sinopodophyllum hexandrum (Berberidaceae).
Li, Huie; Guo, Qiqiang
2016-07-01
The complete chloroplast (cp) genome of the Sinopodophyllum hexandrum (Berberidaceae) was determined in this study. The circular genome is 157,940 bp in size, and comprises a pair of inverted repeat (IR) regions of 26,077 bp each, a large single-copy (LSC) region of 86,460 bp and a small single-copy (SSC) region of 19,326 bp. The GC content of the whole cp genome was 38.5%. A total of 133 genes were identified, including 88 protein-coding genes, 37 tRNA genes and eight rRNA genes. The whole cp genome consists of 114 unique genes, and 19 genes are duplicated in the IR regions. The phylogenetic analysis revealed that S. hexandrum is closely related to Nandina domestica within the family Berberidaceae.
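The quadripartite structure described in this abstract implies two arithmetic identities that can be checked directly against the reported figures. The helper below is a hypothetical sketch for illustration, not part of any annotation tool; the numbers are those quoted in the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) and gene counts reported for S. hexandrum.
total_bp = quadripartite_length(lsc=86_460, ssc=19_326, ir=26_077)
genes_by_type = 88 + 37 + 8        # protein-coding + tRNA + rRNA
genes_by_copy = 114 + 19           # unique genes + genes duplicated in the IRs

print(total_bp, genes_by_type, genes_by_copy)
```

Both the region sizes and the gene counts reproduce the totals stated in the abstract (157,940 bp and 133 genes), which is a quick sanity check worth applying to any reported plastome.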
DOE Office of Scientific and Technical Information (OSTI.GOV)
Helfenbein, Kevin G.; Brown, Wesley M.; Boore, Jeffrey L.
We have sequenced the complete mitochondrial DNA (mtDNA) of the articulate brachiopod Terebratalia transversa. The circular genome is 14,291 bp in size, relatively small compared to other published metazoan mtDNAs. The 37 genes commonly found in animal mtDNA are present; the size decrease is due to the truncation of several tRNA, rRNA, and protein genes, to some nucleotide overlaps, and to a paucity of non-coding nucleotides. Although the gene arrangement differs radically from those reported for other metazoans, some gene junctions are shared with two other articulate brachiopods, Laqueus rubellus and Terebratulina retusa. All genes in the T. transversa mtDNA, unlike those in most metazoan mtDNAs reported, are encoded by the same strand. The A+T content (59.1 percent) is low for a metazoan mtDNA, and there is a high propensity for homopolymer runs and a strong base-compositional strand bias. The coding strand is quite G+T-rich, a skew that is shared by the confamilial (laqueid) species L. rubellus, but opposite to that found in T. retusa, a cancellothyridid. These compositional skews are strongly reflected in the codon usage patterns and the amino acid compositions of the mitochondrial proteins, with markedly different usage observed between T. retusa and the two laqueids. This observation, plus the similarity of the laqueid non-coding regions to the reverse complement of the non-coding region of the cancellothyridid, suggest that an inversion that resulted in a reversal in the direction of first-strand replication has occurred in one of the two lineages. In addition to the presence of one non-coding region in T. transversa that is comparable to those in the other brachiopod mtDNAs, there are two others with the potential to form secondary structures; one or both of these may be involved in the process of transcript cleavage.
Song, Yuepeng; Tian, Min; Ci, Dong; Zhang, Deqiang
2015-04-01
Previous studies showed sex-specific DNA methylation and expression of candidate genes in bisexual flowers of andromonoecious poplar, but the regulatory relationship between methylation and microRNAs (miRNAs) remains unclear. To investigate whether the methylation of miRNA genes regulates gene expression in bisexual flower development, the methylome, microRNA, and transcriptome were examined in female and male flowers of andromonoecious poplar. 27 636 methylated coding genes and 113 methylated miRNA genes were identified. In the coding genes, 64.5% of the methylated reads mapped to the gene body region; by contrast, 60.7% of methylated reads in miRNA genes mainly mapped in the 5' and 3' flanking regions. CHH methylation showed the highest methylation levels and CHG showed the lowest methylation levels. Correlation analysis showed a significant, negative, strand-specific correlation of methylation and miRNA gene expression (r=0.79, P <0.05). The methylated miRNA genes included eight long miRNAs (lmiRNAs) of 24 nucleotides and 11 miRNAs related to flower development. miRNA172b might play an important role in the regulation of bisexual flower development-related gene expression in andromonoecious poplar, via modification of methylation. Gynomonoecious, female, and male poplars were used to validate the methylation patterns of the miRNA172b gene, implying that hyper-methylation in andromonoecious and gynomonoecious poplar might function as an important regulator in bisexual flower development. Our data provide a useful resource for the study of flower development in poplar and improve our understanding of the effect of epigenetic regulation on genes other than protein-coding genes.
Song, Yuepeng; Tian, Min; Ci, Dong; Zhang, Deqiang
2015-01-01
Previous studies showed sex-specific DNA methylation and expression of candidate genes in bisexual flowers of andromonoecious poplar, but the regulatory relationship between methylation and microRNAs (miRNAs) remains unclear. To investigate whether the methylation of miRNA genes regulates gene expression in bisexual flower development, the methylome, microRNA, and transcriptome were examined in female and male flowers of andromonoecious poplar. 27 636 methylated coding genes and 113 methylated miRNA genes were identified. In the coding genes, 64.5% of the methylated reads mapped to the gene body region; by contrast, 60.7% of methylated reads in miRNA genes mainly mapped in the 5′ and 3′ flanking regions. CHH methylation showed the highest methylation levels and CHG showed the lowest methylation levels. Correlation analysis showed a significant, negative, strand-specific correlation of methylation and miRNA gene expression (r=0.79, P <0.05). The methylated miRNA genes included eight long miRNAs (lmiRNAs) of 24 nucleotides and 11 miRNAs related to flower development. miRNA172b might play an important role in the regulation of bisexual flower development-related gene expression in andromonoecious poplar, via modification of methylation. Gynomonoecious, female, and male poplars were used to validate the methylation patterns of the miRNA172b gene, implying that hyper-methylation in andromonoecious and gynomonoecious poplar might function as an important regulator in bisexual flower development. Our data provide a useful resource for the study of flower development in poplar and improve our understanding of the effect of epigenetic regulation on genes other than protein-coding genes. PMID:25617468
The gene coding for glial cell line derived neurotrophic factor (GDNF) maps to chromosome 5p12-p13.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schindelhauer, D.; Schuffenhauer, S.; Meitinger, T.
1995-08-10
The gene coding for glial cell line derived neurotrophic factor (GDNF) has biological properties that may have potential as a treatment for Parkinson's and motoneuron diseases. Using the NIGMS Mapping Panel 2, we have localized the GDNF gene to human chromosome 5p12-p13.1. Large NruI and NotI fragments on chromosome 5 will facilitate the construction of a long-range map of the region. 26 refs., 1 fig., 1 tab.
J Genes for Heavy Chain Immunoglobulins of Mouse
NASA Astrophysics Data System (ADS)
Newell, Nanette; Richards, Julia E.; Tucker, Philip W.; Blattner, Frederick R.
1980-09-01
A 15.8-kilobase pair fragment of BALB/c mouse liver DNA, cloned in the Charon 4Aλ phage vector system, was shown to contain the μ heavy chain constant region (CHμ) gene for the mouse immunoglobulin M. In addition, this fragment of DNA contains at least two J genes, used to code for the carboxyl terminal portion of heavy chain variable regions. These genes are located in genomic DNA about eight kilobase pairs to the 5' side of the CHμ gene. The complete nucleotide sequence of a 1120-base pair stretch of DNA that includes the two J genes has been determined.
Tau mRNA 3'UTR-to-CDS ratio is increased in Alzheimer disease.
García-Escudero, Vega; Gargini, Ricardo; Martín-Maestro, Patricia; García, Esther; García-Escudero, Ramón; Avila, Jesús
2017-08-10
Neurons frequently show an imbalance in expression of the 3' untranslated region (3'UTR) relative to the coding DNA sequence (CDS) region of mature messenger RNAs (mRNA). The ratio varies among different cells or parts of the brain. The Map2 protein levels per cell depend on the 3'UTR-to-CDS ratio rather than the total mRNA amount, which suggests powerful regulation of protein expression by 3'UTR sequences. Here we found that MAPT (the microtubule-associated protein tau gene) 3'UTR levels are particularly high with respect to other genes; indeed, the 3'UTR-to-CDS ratio of MAPT is balanced in healthy brain in mouse and human. The tau protein accumulates in Alzheimer diseased brain. We nonetheless observed that the levels of RNA encoding MAPT/tau were diminished in these patients' brains. To explain this apparently contradictory result, we studied MAPT mRNA stoichiometry in coding and non-coding regions, and found that the 3'UTR-to-CDS ratio was higher in the hippocampus of Alzheimer disease patients, with higher tau protein but lower total mRNA levels. Our data indicate that changes in the 3'UTR-to-CDS ratio have a regulatory role in the disease. Future research should thus consider not only mRNA levels, but also the ratios between coding and non-coding regions. Copyright © 2017 Elsevier B.V. All rights reserved.
Characterization of the complete mitochondrial genome of the king pigeon (Columba livia breed king).
Zhang, Rui-Hua; He, Wen-Xiao; Xu, Tong
2015-06-01
The king pigeon is a breed of pigeon developed over many years of selective breeding, primarily as a utility breed. In the present work, we report the complete mitochondrial genome sequence of the king pigeon for the first time. The total length of the mitogenome was 17,221 bp, with a base composition of 30.14% A, 24.05% T, 31.82% C, and 13.99% G; an A-T-rich (54.22%) feature was detected. It harbored 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and one non-coding control region (D-loop region). The arrangement of all genes was identical to that of typical pigeon mitochondrial genomes. The complete mitochondrial genome sequence of the king pigeon will serve as an important data set of germplasm resources for further study.
Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G
1995-01-01
The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Wang, Shuo; Gao, Li-Zhi
2016-11-01
The complete chloroplast genome sequence of foxtail millet (Setaria italica), an important food and fodder crop in the family Poaceae, is reported for the first time in this study. The genome consists of 135 516 bp containing a pair of inverted repeats (IRs) of 21 804 bp separated by a large single-copy (LSC) region and a small single-copy (SSC) region of 79 896 bp and 12 012 bp, respectively. Coding sequences constitute 58.8% of the genome, which harbors 111 unique genes: 71 are protein-coding genes, 4 are rRNA genes, and 36 are tRNA genes. Phylogenetic analysis indicated that foxtail millet clustered with Panicum virgatum and Echinochloa crus-galli, which belong to the tribe Paniceae of the subfamily Panicoideae. This newly determined chloroplast genome will provide valuable information for future breeding programs of valuable cereal crops in the family Poaceae.
Luna, M G; Martins, M M; Newton, S M; Costa, S O; Almeida, D F; Ferreira, L C
1997-01-01
Oligonucleotides coding for linear epitopes of the fimbrial colonization factor antigen I (CFA/I) of enterotoxigenic Escherichia coli (ETEC) were cloned and expressed in a deleted form of the Salmonella muenchen flagellin fliC (H1-d) gene. Four synthetic oligonucleotide pairs coding for regions corresponding to amino acids 1 to 15 (region I), amino acids 11 to 25 (region II), amino acids 32 to 45 (region III) and amino acids 88 to 102 (region IV) were synthesized and cloned in the Salmonella flagellin-coding gene. All four hybrid flagellins were exported to the bacterial surface where they produced flagella, but only three constructs were fully motile. Sera recovered from mice immunized with intraperitoneal injections of purified flagella containing region II (FlaII) or region IV (FlaIV) showed high titres against dissociated solid-phase-bound CFA/I subunits. Hybrid flagellins containing region I (FlaI) or region III (FlaIII) elicited a weak immune response as measured in enzyme-linked immunosorbent assay (ELISA) with dissociated CFA/I subunits. None of the sera prepared with purified hybrid flagella were able to agglutinate or inhibit haemagglutination promoted by CFA/I-positive strains. Moreover, inhibition ELISA tests indicated that antisera directed against region I, II, III or IV cloned in flagellin were not able to recognize surface-exposed regions on the intact CFA/I fimbriae.
Li, Wan; Zhu, Lina; Huang, Hao; He, Yuehan; Lv, Junjie; Li, Weimin; Chen, Lina; He, Weiming
2017-10-01
Complex chronic diseases are caused by the effects of genetic and environmental factors. Single nucleotide polymorphisms (SNPs), one common type of genetic variations, played vital roles in diseases. We hypothesized that disease risk functional SNPs in coding regions and protein interaction network modules were more likely to contribute to the identification of disease susceptible genes for complex chronic diseases. This could help to further reveal the pathogenesis of complex chronic diseases. Disease risk SNPs were first recognized from public SNP data for coronary heart disease (CHD), hypertension (HT) and type 2 diabetes (T2D). SNPs in coding regions that were classified into nonsense and missense by integrating several SNP functional annotation databases were treated as functional SNPs. Then, regions significantly associated with each disease were screened using random permutations for disease risk functional SNPs. Corresponding to these regions, 155, 169 and 173 potential disease susceptible genes were identified for CHD, HT and T2D, respectively. A disease-related gene product interaction network in environmental context was constructed for interacting gene products of both disease genes and potential disease susceptible genes for these diseases. After functional enrichment analysis for disease associated modules, 5 CHD susceptible genes, 7 HT susceptible genes and 3 T2D susceptible genes were finally identified, some of which had pleiotropic effects. Most of these genes were verified to be related to these diseases in literature. This was similar for disease genes identified from another method proposed by Lee et al. from a different aspect. This research could provide novel perspectives for diagnosis and treatment of complex chronic diseases and susceptible genes identification for other diseases. Copyright © 2017 Elsevier Inc. All rights reserved.
Urantowka, Adam Dawid; Hajduk, Kacper; Kosowska, Barbara
2013-08-01
Amazona barbadensis is an endangered species of parrot living in northern coastal Venezuela and on several Caribbean islands. In this study, we sequenced the full mitochondrial genome of this species. The mitogenome was 18,983 bp long and contained 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, a duplicated control region, and degenerate copies of the ND6 and tRNA (Glu) genes. The high degree of identity between the two copies of the control region suggests their coincident evolution and functionality. Comparative analysis of both control region sequences from four Amazona species revealed 89.1% identity over a region of 1300 bp and indicates the presence of distinctive parts in the two control region copies.
Natural variation of rice blast resistance gene Pi-d2
USDA-ARS's Scientific Manuscript database
Studying natural variation of rice resistance (R) genes in cultivated and wild rice relatives can predict resistance stability to rice blast fungus. In the present study, the protein coding regions of rice R gene Pi-d2 in 35 rice accessions of subgroups, aus (AUS), indica (IND), temperate japonica (...
Insights into HLA-G Genetics Provided by Worldwide Haplotype Diversity
Castelli, Erick C.; Ramalho, Jaqueline; Porto, Iane O. P.; Lima, Thálitta H. A.; Felício, Leandro P.; Sabbagh, Audrey; Donadi, Eduardo A.; Mendes-Junior, Celso T.
2014-01-01
Human leukocyte antigen G (HLA-G) belongs to the family of non-classical HLA class I genes, located within the major histocompatibility complex (MHC). HLA-G has been the target of most recent research regarding the function of class I non-classical genes. The main features that distinguish HLA-G from classical class I genes are (a) limited protein variability, (b) alternative splicing generating several membrane bound and soluble isoforms, (c) short cytoplasmic tail, (d) modulation of immune response (immune tolerance), and (e) restricted expression to certain tissues. In the present work, we describe the HLA-G gene structure and address the HLA-G variability and haplotype diversity among several populations around the world, considering each of its major segments [promoter, coding, and 3′ untranslated region (UTR)]. For this purpose, we developed a pipeline to reevaluate the 1000Genomes data and recover miscalled or missing genotypes and haplotypes. It became clear that the overall structure of the HLA-G molecule has been maintained during the evolutionary process and that most of the variation sites found in the HLA-G coding region are either coding synonymous or intronic mutations. In addition, only a few frequent and divergent extended haplotypes are found when the promoter, coding, and 3′UTRs are evaluated together. The divergence is particularly evident for the regulatory regions. The population comparisons confirmed that most of the HLA-G variability has originated before human dispersion from Africa and that the allele and haplotype frequencies have probably been shaped by strong selective pressures. PMID:25339953
The complete chloroplast genome sequence of the medicinal plant Andrographis paniculata.
Ding, Ping; Shao, Yanhua; Li, Qian; Gao, Junli; Zhang, Runjing; Lai, Xiaoping; Wang, Deqin; Zhang, Huiye
2016-07-01
The complete chloroplast genome of Andrographis paniculata, an important medicinal plant with great economic value, has been studied in this article. The genome is 150,249 bp in length, with 38.3% GC content. A pair of inverted repeats (IRs, 25,300 bp) is separated by a large single-copy region (LSC, 82,459 bp) and a small single-copy region (SSC, 17,190 bp). The chloroplast genome contains 114 unique genes, including 80 protein-coding genes, 30 tRNA genes and 4 rRNA genes. Of these genes, 15 contain 1 intron and 3 contain 2 introns.
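The region sizes reported above can be cross-checked arithmetically, because the length of a quadripartite chloroplast genome is LSC + SSC + 2 × IR. The following is a minimal Python sketch of that check; the function name `quadripartite_length` is a hypothetical helper introduced for illustration, and the figures are those given in the abstract.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length (bp) of a circular genome with LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for Andrographis paniculata.
total = quadripartite_length(lsc=82_459, ssc=17_190, ir=25_300)
print(total)  # 150249, which agrees with the reported genome size
```

The same check can be applied to any chloroplast genome record that reports all three region sizes; a mismatch usually points to a transcription error in one of the figures.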
Luo, Arong; Zhang, Aibing; Ho, Simon Yw; Xu, Weijun; Zhang, Yanzhou; Shi, Weifeng; Cameron, Stephen L; Zhu, Chaodong
2011-01-28
A well-informed choice of genetic locus is central to the efficacy of DNA barcoding. Current DNA barcoding in animals involves the use of the 5' half of the mitochondrial cytochrome oxidase 1 gene (CO1) to diagnose and delimit species. However, there is no compelling a priori reason for the exclusive focus on this region, and it has been shown that it performs poorly for certain animal groups. To explore alternative mitochondrial barcoding regions, we compared the efficacy of the universal CO1 barcoding region with the other mitochondrial protein-coding genes in eutherian mammals. Four criteria were used for this comparison: the number of recovered species, sequence variability within and between species, resolution to taxonomic levels above that of species, and the degree of mutational saturation. Based on 1,179 mitochondrial genomes of eutherians, we found that the universal CO1 barcoding region is a good representative of mitochondrial genes as a whole because the high species-recovery rate (> 90%) was similar to that of other mitochondrial genes, and there were no significant differences in intra- or interspecific variability among genes. However, an overlap between intra- and interspecific variability was still problematic for all mitochondrial genes. Our results also demonstrated that any choice of mitochondrial gene for DNA barcoding failed to offer significant resolution at higher taxonomic levels. We suggest that the CO1 barcoding region, the universal DNA barcode, is preferred among the mitochondrial protein-coding genes as a molecular diagnostic at least for eutherian species identification. Nevertheless, DNA barcoding with this marker may still be problematic for certain eutherian taxa and our approach can be used to test potential barcoding loci for such groups.
2011-01-01
Background A well-informed choice of genetic locus is central to the efficacy of DNA barcoding. Current DNA barcoding in animals involves the use of the 5' half of the mitochondrial cytochrome oxidase 1 gene (CO1) to diagnose and delimit species. However, there is no compelling a priori reason for the exclusive focus on this region, and it has been shown that it performs poorly for certain animal groups. To explore alternative mitochondrial barcoding regions, we compared the efficacy of the universal CO1 barcoding region with the other mitochondrial protein-coding genes in eutherian mammals. Four criteria were used for this comparison: the number of recovered species, sequence variability within and between species, resolution to taxonomic levels above that of species, and the degree of mutational saturation. Results Based on 1,179 mitochondrial genomes of eutherians, we found that the universal CO1 barcoding region is a good representative of mitochondrial genes as a whole because the high species-recovery rate (> 90%) was similar to that of other mitochondrial genes, and there were no significant differences in intra- or interspecific variability among genes. However, an overlap between intra- and interspecific variability was still problematic for all mitochondrial genes. Our results also demonstrated that any choice of mitochondrial gene for DNA barcoding failed to offer significant resolution at higher taxonomic levels. Conclusions We suggest that the CO1 barcoding region, the universal DNA barcode, is preferred among the mitochondrial protein-coding genes as a molecular diagnostic at least for eutherian species identification. Nevertheless, DNA barcoding with this marker may still be problematic for certain eutherian taxa and our approach can be used to test potential barcoding loci for such groups. PMID:21276253
The whole chloroplast genome of wild rice (Oryza australiensis).
Wu, Zhiqiang; Ge, Song
2016-01-01
The whole chloroplast genome of wild rice (Oryza australiensis) is characterized in this study. The genome size is 135,224 bp, exhibiting a typical circular structure including a pair of 25,776 bp inverted repeats (IRa,b) separated by a large single-copy region (LSC) of 82,212 bp and a small single-copy region (SSC) of 12,470 bp. The overall GC content of the genome is 38.95%. 110 unique genes were annotated, including 76 protein-coding genes, 4 ribosomal RNA genes, and 30 tRNA genes. Among these, 18 are duplicated in the inverted repeat regions, 13 genes contain one intron, and 2 genes (rps12 and ycf3) have two introns.
Prader-Willi Syndrome: Obesity due to Genomic Imprinting
Butler, Merlin G
2011-01-01
Prader-Willi syndrome (PWS) is a complex neurodevelopmental disorder due to errors in genomic imprinting with loss of imprinted genes that are paternally expressed from the chromosome 15q11-q13 region. Approximately 70% of individuals with PWS have a de novo deletion of the paternally derived 15q11-q13 region in which there are two subtypes (i.e., larger Type I or smaller Type II), maternal disomy 15 (both 15s from the mother) in about 25% of cases, and the remaining subjects have either defects in the imprinting center controlling the activity of imprinted genes or due to other chromosome 15 rearrangements. PWS is characterized by a particular facial appearance, infantile hypotonia, a poor suck and feeding difficulties, hypogonadism and hypogenitalism in both sexes, short stature and small hands and feet due to growth hormone deficiency, mild learning and behavioral problems (e.g., skin picking, temper tantrums) and hyperphagia leading to early childhood obesity. Obesity is a significant health problem, if uncontrolled. PWS is considered the most common known genetic cause of morbid obesity in children. The chromosome 15q11-q13 region contains approximately 100 genes and transcripts in which about 10 are imprinted and paternally expressed. This region can be divided into four groups: 1) a proximal non-imprinted region; 2) a PWS paternal-only expressed region containing protein-coding and non-coding genes; 3) an Angelman syndrome region containing maternally expressed genes and 4) a distal non-imprinted region. This review summarizes the current understanding of the genetic causes, the natural history and clinical presentation of individuals with PWS. PMID:22043168
Chang, Chia-Hao; Shao, Kwang-Tsao; Lin, Yeong-Shin; Fang, Yi-Chiao; Ho, Hsuan-Ching
2014-10-01
The complete mitochondrial genome of the great white shark has 16,744 bp and includes 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, 1 replication origin region and 1 control region. The mitochondrial gene arrangement of the great white shark is the same as that observed in most vertebrates. Base composition of the genome is A (30.6%), T (28.7%), C (26.9%) and G (13.9%).
Genetic structure of the mating-type locus of Chlamydomonas reinhardtii.
Ferris, Patrick J; Armbrust, E Virginia; Goodenough, Ursula W
2002-01-01
Portions of the cloned mating-type (MT) loci (mt(+) and mt(-)) of Chlamydomonas reinhardtii, defined as the approximately 1-Mb domains of linkage group VI that are under recombinational suppression, were subjected to Northern analysis to elucidate their coding capacity. The four central rearranged segments of the loci were found to contain both housekeeping genes (expressed during several life-cycle stages) and mating-related genes, while the sequences unique to mt(+) or mt(-) carried genes expressed only in the gametic or zygotic phases of the life cycle. One of these genes, Mtd1, is a candidate participant in gametic cell fusion; two others, Mta1 and Ezy2, are candidate participants in the uniparental inheritance of chloroplast DNA. The identified housekeeping genes include Pdk, encoding pyruvate dehydrogenase kinase, and GdcH, encoding glycine decarboxylase complex subunit H. Unusual genetic configurations include three genes whose sequences overlap, one gene that has inserted into the coding region of another, several genes that have been inactivated by rearrangements in the region, and genes that have undergone tandem duplication. This report extends our original conclusion that the MT locus has incurred high levels of mutational change. PMID:11805055
Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang
2018-01-01
Qat (Catha edulis, Celastraceae) is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies of interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy areas. The sizes of the small single-copy and the large single-copy regions were 18,491 bp and 86,315 bp, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein-coding genes. Of those genes, 112 are single-copy genes and 17 are duplicated in the two inverted regions: seven tRNAs, four rRNAs, and six protein-coding genes. The phylogenetic relationships resolved from the cp genomes of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates that an intron loss took place in an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
Vouille, V; Amiche, M; Nicolas, P
1997-09-01
We cloned the genes of two members of the dermaseptin family, broad-spectrum antimicrobial peptides isolated from the skin of the arboreal frog Phyllomedusa bicolor. The dermaseptin gene Drg2 has a 2-exon coding structure interrupted by a small 137-bp intron, wherein exon 1 encodes a 22-residue hydrophobic signal peptide and the first three amino acids of the acidic propiece; exon 2 contains the 18 additional acidic residues of the propiece plus a typical prohormone processing signal Lys-Arg and a 32-residue dermaseptin progenitor sequence. The dermaseptin genes Drg2 and Drg1g2 have conserved sequences at both untranslated ends and in the first and second coding exons. In contrast, Drg1g2 comprises a third coding exon for a short version of the acidic propiece and a second dermaseptin progenitor sequence. Structural conservation between the two genes suggests that Drg1g2 arose recently from an ancestral Drg2-like gene through amplification of part of the second coding exon and 3'-untranslated region. Analysis of the cDNAs coding precursors for several frog skin peptides of highly different structures and activities demonstrates that the signal peptides and part of the acidic propieces are encoded by conserved nucleotides encompassed by the first coding exon of the dermaseptin genes. The organization of the genes that belong to this family, with the signal peptide and the progenitor sequence on separate exons, permits strikingly different peptides to be directed into the secretory pathway. The recruitment of such a homologous 'secretory' exon by otherwise non-homologous genes may have been an early event in the evolution of amphibians.
Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline.
Toffano-Nioche, Claire; Luo, Yufei; Kuchly, Claire; Wallon, Claire; Steinbach, Delphine; Zytnicki, Matthias; Jacq, Annick; Gautheret, Daniel
2013-09-01
RNA-seq experiments are now routinely used for the large scale sequencing of transcripts. In bacteria or archaea, such deep sequencing experiments typically produce 10-50 million fragments that cover most of the genome, including intergenic regions. In this context, the precise delineation of the non-coding elements is challenging. Non-coding elements include untranslated regions (UTRs) of mRNAs, independent small RNA genes (sRNAs) and transcripts produced from the antisense strand of genes (asRNA). Here we present a computational pipeline (DETR'PROK: detection of ncRNAs in prokaryotes) based on the Galaxy framework that takes as input a mapping of deep sequencing reads and performs successive steps of clustering, comparison with existing annotation and identification of transcribed non-coding fragments classified into putative 5' UTRs, sRNAs and asRNAs. We provide a step-by-step description of the protocol using real-life example data sets from Vibrio splendidus and Escherichia coli. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
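The final classification step described above (sorting transcribed non-coding fragments into putative 5' UTRs, sRNAs and asRNAs by comparison with existing annotation) can be illustrated with a small interval-based sketch. This is not the DETR'PROK implementation: the `Interval` type, the `classify` function, and the `utr_gap` threshold are all hypothetical names and assumptions introduced here to show the general idea in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one position."""
    return a.start <= b.end and b.start <= a.end


def classify(fragment: Interval, genes: list, utr_gap: int = 10) -> str:
    """Label a transcribed fragment relative to annotated genes.

    Assumed rules (illustrative only):
      - overlaps a gene on the same strand      -> "coding"
      - overlaps a gene on the opposite strand  -> "asRNA"
      - ends within utr_gap bp upstream of a
        same-strand gene                        -> "5' UTR"
      - anything else                           -> "sRNA"
    """
    if any(g.strand == fragment.strand and overlaps(fragment, g) for g in genes):
        return "coding"
    if any(g.strand != fragment.strand and overlaps(fragment, g) for g in genes):
        return "asRNA"
    for g in genes:
        if g.strand != fragment.strand:
            continue
        # Upstream means lower coordinates on "+", higher coordinates on "-".
        if g.strand == "+":
            gap = g.start - fragment.end - 1
        else:
            gap = fragment.start - g.end - 1
        if 0 <= gap <= utr_gap:
            return "5' UTR"
    return "sRNA"


# Toy annotation: one gene on each strand.
genes = [Interval(100, 500, "+"), Interval(800, 1200, "-")]
for frag in (Interval(60, 99, "+"), Interval(850, 900, "+"), Interval(600, 680, "+")):
    print(frag, classify(frag, genes))
```

A real pipeline would work from read-coverage clusters and handle operons, 3' UTRs and overlapping annotations; the sketch only shows how strand and position relative to annotated genes can separate the three classes named in the abstract.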
Wang, Aishuai; Sun, Yuena; Wu, Changwen
2016-11-01
The complete mitochondrial genome of Cheilodactylus quadricornis was determined for the first time in the present study. The mitochondrial genome of C. quadricornis is 16 521 nucleotides long, comprising 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes and 2 main non-coding regions (the control region and the origin of light-strand replication). The overall base composition was T, 26.3%; C, 29.6%; A, 27.8% and G, 16.3%. The gene arrangement, base composition, and tRNA structures of the complete mitochondrial genome of C. quadricornis are similar to those of other teleosts. Only two central conserved sequence blocks (CSB-2 and CSB-3) were identified in the control region. In addition, the conserved motif 5'-GCCGG-3' was identified in the origin of light-strand replication of C. quadricornis. The complete mitochondrial genome of C. quadricornis was used to construct a phylogenetic tree, which shows that C. quadricornis and C. variegatus clustered in a clade and formed a sister relationship. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Cheilodactylidae.
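Base-composition figures of the kind reported above are simple per-nucleotide percentages. As a rough illustration, the Python sketch below computes them for a short toy sequence; the function name `base_composition` and the example sequence are assumptions for illustration and are not taken from the study.

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


# Toy sequence, not real mitogenome data.
comp = base_composition("ATGCGCATTA")
at_content = comp["A"] + comp["T"]
print(comp, at_content)
```

Applied to the percentages reported for C. quadricornis, the A+T content works out to 27.8% + 26.3% = 54.1%.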
Rozhdestvensky, Timofey S; Robeck, Thomas; Galiveti, Chenna R; Raabe, Carsten A; Seeger, Birte; Wolters, Anna; Gubar, Leonid V; Brosius, Jürgen; Skryabin, Boris V
2016-02-05
Prader-Willi syndrome (PWS) is a neurogenetic disorder caused by loss of paternally expressed genes on chromosome 15q11-q13. The PWS-critical region (PWScr) contains an array of non-protein coding IPW-A exons hosting intronic SNORD116 snoRNA genes. Deletion of PWScr is associated with PWS in humans and growth retardation in mice exhibiting ~15% postnatal lethality in C57BL/6 background. Here we analysed a knock-in mouse containing a 5'HPRT-LoxP-Neo(R) cassette (5'LoxP) inserted upstream of the PWScr. When the insertion was inherited maternally in a paternal PWScr-deletion mouse model (PWScr(p-/m5'LoxP)), we observed compensation of growth retardation and postnatal lethality. Genomic methylation pattern and expression of protein-coding genes remained unaltered at the PWS-locus of PWScr(p-/m5'LoxP) mice. Interestingly, ubiquitous Snord116 and IPW-A exon transcription from the originally silent maternal chromosome was detected. In situ hybridization indicated that PWScr(p-/m5'LoxP) mice expressed Snord116 in brain areas similar to wild type animals. Our results suggest that the lack of PWScr RNA expression in certain brain areas could be a primary cause of the growth retardation phenotype in mice. We propose that activation of disease-associated genes on imprinted regions could lead to general therapeutic strategies in associated diseases.
A candidate gene for choanal atresia in alpaca.
Reed, Kent M; Bauer, Miranda M; Mendoza, Kristelle M; Armién, Aníbal G
2010-03-01
Choanal atresia (CA) is a common nasal craniofacial malformation in New World domestic camelids (alpaca and llama). CA results from abnormal development of the nasal passages and is especially debilitating to newborn crias. CA in camelids shares many of the clinical manifestations of a similar condition in humans (CHARGE syndrome). Herein we report on the regulatory gene CHD7 of alpaca, whose homologue in humans is most frequently associated with CHARGE. Sequence of the CHD7 coding region was obtained from a non-affected cria. The complete coding region was 9003 bp, corresponding to a translated amino acid sequence of 3000 aa. Additional genomic sequences corresponding to a significant portion of the CHD7 gene were identified and assembled from the 2x alpaca whole genome sequence, providing confirmatory sequence for much of the CHD7 coding region. The alpaca CHD7 mRNA sequence was 97.9% similar to the human sequence, with the greatest sequence difference being an insertion in exon 38 that results in a polyalanine repeat (A12). Polymorphism in this repeat was tested for association with CA in alpaca by cloning and sequencing the repeat from both affected and non-affected individuals. Variation in length of the poly-A repeat was not associated with CA. Complete sequencing of the CHD7 gene will be necessary to determine whether other mutations in CHD7 are the cause of CA in camelids.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-04-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Dweep, Harsh; Sticht, Carsten; Pandey, Priyanka; Gretz, Norbert
2011-10-01
MicroRNAs are small, non-coding RNA molecules that can complementarily bind to the mRNA 3'-UTR region to regulate the gene expression by transcriptional repression or induction of mRNA degradation. Increasing evidence suggests a new mechanism by which miRNAs may regulate target gene expression by binding in promoter and amino acid coding regions. Most of the existing databases on miRNAs are restricted to mRNA 3'-UTR region. To address this issue, we present miRWalk, a comprehensive database on miRNAs, which hosts predicted as well as validated miRNA binding sites, information on all known genes of human, mouse and rat. All mRNAs, mitochondrial genes and 10 kb upstream flanking regions of all known genes of human, mouse and rat were analyzed by using a newly developed algorithm named 'miRWalk' as well as with eight already established programs for putative miRNA binding sites. An automated and extensive text-mining search was performed on PubMed database to extract validated information on miRNAs. Combined information was put into a MySQL database. miRWalk presents predicted and validated information on miRNA-target interaction. Such a resource enables researchers to validate new targets of miRNA not only on 3'-UTR, but also on the other regions of all known genes. The 'Validated Target module' is updated every month and the 'Predicted Target module' is updated every 6 months. miRWalk is freely available at http://mirwalk.uni-hd.de/.
MHC class I-associated peptides derive from selective regions of the human genome.
Pearson, Hillary; Daouda, Tariq; Granados, Diana Paola; Durette, Chantal; Bonneil, Eric; Courcelles, Mathieu; Rodenbrock, Anja; Laverdure, Jean-Philippe; Côté, Caroline; Mader, Sylvie; Lemieux, Sébastien; Thibault, Pierre; Perreault, Claude
2016-12-01
MHC class I-associated peptides (MAPs) define the immune self for CD8+ T lymphocytes and are key targets of cancer immunosurveillance. Here, the goals of our work were to determine whether the entire set of protein-coding genes could generate MAPs and whether specific features influence the ability of discrete genes to generate MAPs. Using proteogenomics, we have identified 25,270 MAPs isolated from the B lymphocytes of 18 individuals who collectively expressed 27 high-frequency HLA-A,B allotypes. The entire MAP repertoire presented by these 27 allotypes covered only 10% of the exomic sequences expressed in B lymphocytes. Indeed, 41% of expressed protein-coding genes generated no MAPs, while 59% of genes generated up to 64 MAPs, often derived from adjacent regions and presented by different allotypes. We next identified several features of transcripts and proteins associated with efficient MAP production. From these data, we built a logistic regression model that predicts with good accuracy whether a gene generates MAPs. Our results show preferential selection of MAPs from a limited repertoire of proteins with distinctive features. The notion that the MHC class I immunopeptidome presents only a small fraction of the protein-coding genome for monitoring by the immune system has profound implications in autoimmunity and cancer immunology. PMID:27841757
Transcriptional mapping of the ribosomal RNA region of mouse L-cell mitochondrial DNA.
Nagley, P; Clayton, D A
1980-01-01
The map positions in mouse mitochondrial DNA of the two ribosomal RNA genes and adjacent genes coding several small transcripts have been determined precisely by application of a procedure in which DNA-RNA hybrids have been subjected to digestion by S1 nuclease under conditions of varying severity. Digestion of the DNA-RNA hybrids with S1 nuclease yielded a series of species which were shown to contain ribosomal RNA molecules together with adjacent transcripts hybridized conjointly to a continuous segment of mitochondrial DNA. There is one small transcript about 60 bases long whose gene adjoins the sequences coding the 5'-end of the small ribosomal RNA (950 bases) and which lies approximately 200 nucleotides from the D-loop origin of heavy strand mitochondrial DNA synthesis. An 80-base transcript lies between the small and large ribosomal RNA genes, and genes for two further short transcripts (each about 80 bases in length) abut the sequences coding the 3'-end of the large ribosomal RNA (approximately 1500 bases). The ability to isolate a discrete DNA-RNA hybrid species approximately 2700 base pairs in length containing all these transcripts suggests that there can be few nucleotides in this region of mouse mitochondrial DNA which are not represented as stable RNA species. PMID:6253898
The complete chloroplast genome sequence of Dendrobium officinale.
Yang, Pei; Zhou, Hong; Qian, Jun; Xu, Haibin; Shao, Qingsong; Li, Yonghua; Yao, Hui
2016-01-01
The complete chloroplast sequence of Dendrobium officinale, an endangered and economically important traditional Chinese medicine, was reported and characterized. The genome size is 152,018 bp, with 37.5% GC content. A pair of inverted repeats (IRs) of 26,284 bp are separated by a large single-copy region (LSC, 84,944 bp) and a small single-copy region (SSC, 14,506 bp). The complete cp DNA contains 83 protein-coding genes, 39 tRNA genes and 8 rRNA genes. Fourteen genes contained one or two introns.
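The reported region sizes can be cross-checked against the total genome size, because a quadripartite chloroplast genome consists of one LSC, one SSC and two copies of the IR. The sketch below uses only the figures quoted in the abstract; the function name is an illustrative choice.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir


# Region sizes for Dendrobium officinale as given in the abstract (bp)
total = quadripartite_length(lsc=84_944, ssc=14_506, ir=26_284)
print(total)  # 152018, matching the reported genome size of 152,018 bp
```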
Zhao, Fang; Huang, Dun-Yuan; Sun, Xiao-Yan; Shi, Qing-Hui; Hao, Jia-Sheng; Zhang, Lan-Lan; Yang, Qun
2013-10-01
The Riodinidae is one of the lepidopteran butterfly families. This study describes the complete mitochondrial genome of the butterfly species Abisara fylloides, the first mitochondrial genome of the Riodinidae family. The results show that the entire mitochondrial genome of A. fylloides is 15 301 bp in length, and contains 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a 423 bp A+T-rich region. The gene content, orientation and order are identical to the majority of other lepidopteran insects. Phylogenetic reconstruction was conducted using the concatenated 13 protein-coding gene (PCG) sequences of 19 available butterfly species covering all the five butterfly families (Papilionidae, Nymphalidae, Pieridae, Lycaenidae and Riodinidae). Both maximum likelihood and Bayesian inference analyses highly supported the monophyly of Lycaenidae+Riodinidae, which was standing as the sister of Nymphalidae. In addition, we propose that the riodinids be categorized into the family Lycaenidae as a subfamilial taxon.
Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren
2015-01-01
There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
Gan, Han Ming; Tan, Mun Hua; Lee, Yin Peng; Austin, Christopher M
2016-05-01
The mitogenome of the Australian freshwater blackfish, Gadopsis marmoratus, was recovered by genome skimming using the MiSeq sequencer (GenBank Accession Number: NC_024436). The blackfish mitogenome has 16,407 base pairs made up of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and an 819 bp non-coding AT-rich region. This is the 5th mitogenome sequence to be reported for the family Percichthyidae.
Corbi, N; Libri, V; Fanciulli, M; Tinsley, J M; Davies, K E; Passananti, C
2000-06-01
Up-regulation of utrophin gene expression is recognized as a plausible therapeutic approach in the treatment of Duchenne muscular dystrophy (DMD). We have designed and engineered new zinc finger-based transcription factors capable of binding and activating transcription from the promoter of the dystrophin-related gene, utrophin. Using the recognition 'code' that proposes specific rules between zinc finger primary structure and potential DNA binding sites, we engineered a new gene named 'Jazz' that encodes for a three-zinc finger peptide. Jazz belongs to the Cys2-His2 zinc finger type and was engineered to target the nine base pair DNA sequence: 5'-GCT-GCT-GCG-3', present in the promoter region of both the human and mouse utrophin gene. The entire zinc finger alpha-helix region, containing the amino acid positions that are crucial for DNA binding, was specifically chosen on the basis of the contacts more frequently represented in the available list of the 'code'. Here we demonstrate that Jazz protein binds specifically to the double-stranded DNA target, with a dissociation constant of about 32 nM. Band shift and super-shift experiments confirmed the high affinity and specificity of Jazz protein for its DNA target. Moreover, we show that chimeric proteins, named Gal4-Jazz and Sp1-Jazz, are able to drive the transcription of a test gene from the human utrophin promoter.
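To illustrate what it means for a nine base pair target to be "present in the promoter region", the sketch below scans a sequence for 5'-GCT-GCT-GCG-3' on both strands. The promoter string is an invented toy sequence, not the utrophin promoter, and the helper names are hypothetical.

```python
TARGET = "GCTGCTGCG"  # Jazz target site from the abstract, written 5'->3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, target: str = TARGET) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every occurrence of target."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", target), ("-", reverse_complement(target))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)


# Toy sequence invented for illustration only
toy_promoter = "TTAGCTGCTGCGAACCGCAGCAGCTT"
print(find_sites(toy_promoter))  # [(3, '+'), (15, '-')]
```

An exact string match is only a first-pass filter; the binding affinity reported in the abstract was measured experimentally and cannot be inferred from sequence identity alone.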
Fan, Zenghua; Zhao, Meng; Joshi, Parth D.; Li, Ping; Zhang, Yan; Guo, Weimin; Xu, Yichi; Wang, Haifang; Zhao, Zhihu
2017-01-01
Circadian rhythm exerts its influence on animal physiology and behavior by regulating gene expression at various levels. Here we systematically explored circadian long non-coding RNAs (lncRNAs) in mouse liver and examined their circadian regulation. We found that a significant proportion of circadian lncRNAs are expressed at enhancer regions, mostly bound by two key circadian transcription factors, BMAL1 and REV-ERBα. These circadian lncRNAs showed similar circadian phases with their nearby genes. The extent of their nuclear localization is higher than protein coding genes but less than enhancer RNAs. The association between enhancer and circadian lncRNAs is also observed in tissues other than liver. Comparative analysis between mouse and rat circadian liver transcriptomes showed that circadian transcription at lncRNA loci tends to be conserved despite the low sequence conservation of lncRNAs. One such circadian lncRNA termed lnc-Crot led us to identify a super-enhancer region interacting with a cluster of genes involved in circadian regulation of metabolism through long-range interactions. Further experiments showed that lnc-Crot locus has enhancer function independent of lnc-Crot's transcription. Our results suggest that the enhancer-associated circadian lncRNAs mark the genomic loci modulating long-range circadian gene regulation and shed new light on the evolutionary origin of lncRNAs. PMID:28335007
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
The complete chloroplast genome sequence of Dianthus superbus var. longicalycinus.
Gurusamy, Raman; Lee, Do-Hyung; Park, SeonJoo
2016-05-01
The complete chloroplast genome (cpDNA) sequence of Dianthus superbus var. longicalycinus, an economically important traditional Chinese medicine, was reported and characterized. The cpDNA of Dianthus superbus var. longicalycinus is 149,539 bp, with 36.3% GC content. A pair of inverted repeats (IRs) of 24,803 bp is separated by a large single-copy region (LSC, 82,805 bp) and a small single-copy region (SSC, 17,128 bp). It encodes 85 protein-coding genes, 36 tRNA genes and 8 rRNA genes. Of 129 individual genes, 13 genes contain one intron and three genes contain two introns.
The complete chloroplast genome of a medicinal plant Epimedium koreanum Nakai (Berberidaceae).
Lee, Jung-Hoon; Kim, Kyunghee; Kim, Na-Rae; Lee, Sang-Choon; Yang, Tae-Jin; Kim, Young-Dong
2016-11-01
Epimedium koreanum is a perennial medicinal plant distributed in Eastern Asia. The complete chloroplast genome sequence of E. koreanum was obtained by de novo assembly using whole genome next-generation sequences. The chloroplast genome of E. koreanum was 157 218 bp in length and was separated into four distinct regions: a large single copy region (89 600 bp), a small single copy region (17 222 bp) and a pair of inverted repeat regions (25 198 bp each). The genome contained a total of 112 genes including 78 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Phylogenetic analysis with the reported chloroplast genomes revealed that E. koreanum is most closely related to Berberis bealei, a traditional medicinal plant in the Berberidaceae family.
USDA-ARS's Scientific Manuscript database
Newcastle disease virus (NDV), avian paramyxovirus type 1, has been developed as a vector to express foreign genes for vaccine and gene therapy purposes. The foreign genes are usually inserted into a non-coding region of the NDV genome as an independent transcription unit (ITU), which potentially a...
Gan, Han Ming; Tan, Mun Hua; Lee, Yin Peng; Austin, Christopher M
2016-05-01
The mitochondrial genome sequence of the Australian tadpole shrimp, Triops australiensis is presented (GenBank Accession Number: NC_024439) and compared with other Triops species. Triops australiensis has a mitochondrial genome of 15,125 base pairs consisting of 13 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a non-coding AT-rich region. The T. australiensis mitogenome is composed of 36.4% A, 16.1% C, 12.3% G and 35.1% T. The mitogenome gene order conforms to the primitive arrangement for Branchiopod crustaceans, which is also conserved within the Pancrustacean.
Austin, Christopher M; Tan, Mun Hua; Lee, Yin Peng; Croft, Laurence J; Meekan, Mark G; Pierce, Simon J; Gan, Han Ming
2016-01-01
The complete mitochondrial genome of the parasitic copepod Pandarus rhincodonicus was obtained from a partial genome scan using the HiSeq sequencing system. The Pandarus rhincodonicus mitogenome has 14,480 base pairs (62% A+T content) made up of 12 protein-coding genes, 2 ribosomal subunit genes, 22 transfer RNAs, and a putative 384 bp non-coding AT-rich region. This Pandarus mitogenome sequence is the first for the family Pandaridae, the second for the order Siphonostomatoida and the sixth for the Copepoda.
NASA Technical Reports Server (NTRS)
Chang, Dong Kyung; Metzgar, David; Wills, Christopher; Boland, C. Richard
2003-01-01
All "minor" components of the human DNA mismatch repair (MMR) system (MSH3, MSH6, PMS2, and the recently discovered MLH3) contain mononucleotide microsatellites in their coding sequences. This intriguing finding contrasts with the situation found in the major components of the DNA MMR system (MSH2 and MLH1) and, in fact, most human genes. Although eukaryotic genomes are rich in microsatellites, non-triplet microsatellites are rare in coding regions. The recurring presence of exonal mononucleotide repeat sequences within a single family of human genes would therefore be considered exceptional.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feder, J.N.; Jan, L.Y.; Jan, Y.N.
The Drosophila hairy gene encodes a basic helix-loop-helix protein that functions in at least two steps during Drosophila development: (1) during embryogenesis, when it partakes in the establishment of segments, and (2) during the larval stage, when it functions negatively in determining the pattern of sensory bristles on the adult fly. In the rat, a structurally homologous gene (RHL) behaves as an immediate-early gene in its response to growth factors and can, like that in Drosophila, suppress neuronal differentiation events. Here, the authors report the genomic cloning of the human hairy gene homolog (HRY). The coding region of the gene is contained within four exons. The predicted amino acid sequence reveals only four amino acid differences between the human and rat genes. Analysis of the DNA sequence 5' to the coding region reveals a putative untranslated exon. To increase the value of the HRY gene as a genetic marker and to assess its potential involvement in genetic disorders, they sublocalized the locus to chromosome 3q28-q29 by fluorescence in situ hybridization. 34 refs., 4 figs., 1 tab.
Mitochondrial genome sequence of Egyptian swift Rock Pigeon (Columba livia breed Egyptian swift).
Li, Chun-Hong; Shi, Wei; Shi, Wan-Yu
2015-06-01
The Egyptian swift Rock Pigeon is a breed of fancy pigeon developed over many years of selective breeding. In this work, we report the complete mitochondrial genome sequence of Egyptian swift Rock Pigeon. The total length of the mitogenome was 17,239 bp and its overall base composition was estimated to be 30.2% for A, 24.0% for T, 31.9% for C and 13.9% for G, indicating an A-T (54.2%)-rich feature in the mitogenome. It contained the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a non-coding control region (D-loop region). The complete mitochondrial genome sequence of Egyptian swift Rock Pigeon would serve as an important data set of the germplasm resources for further study.
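The A-T figure quoted above follows directly from the reported base composition. A small sketch of the check, using only the percentages from the abstract (the function names are illustrative):

```python
# Base composition of the mitogenome as reported in the abstract (percent)
composition = {"A": 30.2, "T": 24.0, "C": 31.9, "G": 13.9}


def at_content(comp: dict[str, float]) -> float:
    """A+T percentage, rounded to one decimal place as in the abstract."""
    return round(comp["A"] + comp["T"], 1)


def composition_total(comp: dict[str, float]) -> float:
    """Sum of all four base percentages; should be close to 100."""
    return round(sum(comp.values()), 1)


print(at_content(composition))         # 54.2
print(composition_total(composition))  # 100.0
```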
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes
Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J
2008-01-01
Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
Gutiérrez, Verónica; Rego, Natalia; Naya, Hugo; García, Graciela
2015-10-28
Among teleosts, the South American genus Austrolebias (Cyprinodontiformes: Rivulidae) includes 42 taxa of annual fishes divided into five different species groups. It is a monophyletic genus, but morphological and molecular data do not resolve the relationship among intrageneric clades and high rates of substitution have been previously described in some mitochondrial genes. In this work, the complete mitogenome of a species of the genus was determined for the first time. We determined its structure, gene order and evolutionarily peculiar features, which will allow us to evaluate the performance of mitochondrial genes in the phylogenetic resolution at different taxonomic levels. Regarding gene content and order, the circular mitogenome of A. charrua (17,271 bp) presents the typical pattern of vertebrate mitogenomes. It contains the full complement of 13 protein-coding genes, 22 tRNA, 2 rRNA and one non-coding control region. Notably, the tRNA-Cys was only 57 bp in length and lacks the D-loop arm. In three full sibling individuals, a heteroplasmic condition was detected due to a total of 12 variable sites in seven protein-coding genes. Among cyprinodontiforms, the mitogenome of A. charrua exhibits the lowest G+C content (37 %) and GC skew, as well as the highest strand asymmetry with a net difference of T over A at 1st and 3rd codon positions. Considering the 12 coding genes of the H strand, correspondence analyses of nucleotide composition and codon usage show that A and T at 1st and 3rd codon positions have the highest weight in the first axis, and segregate annual species from the other cyprinodontiforms analyzed. Given the annual life-style, their mitogenomes could be under different selective pressures. All 13 protein-coding genes are under strong purifying selection and we did not find any significant evidence of nucleotide sites showing episodic selection (dN > dS) at annual lineages. 
When fast evolving third codon positions were removed from alignments, the "supergene" tree recovers our reference species phylogeny as well as the Cytb, ND4L and ND6 genes. Therefore, third codon positions seem to be saturated in the aforementioned coding regions at intergeneric Cyprinodontiformes comparisons. The complete mitogenome obtained in the present work offers relevant data for further comparative studies on molecular phylogeny and systematics of this taxonomically controversial endemic genus of annual fishes.
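G+C content and GC skew, the two composition statistics cited in this abstract, can be computed from a sequence as sketched below. This assumes the common definition of GC skew, (G - C) / (G + C); the abstract does not state which formula the authors used, and the example sequence is an invented toy string.

```python
from collections import Counter


def gc_stats(seq: str) -> tuple[float, float]:
    """Return (G+C fraction, GC skew) for a DNA sequence.

    GC skew is taken here as (G - C) / (G + C), a common convention and
    an assumption about the definition used in the paper.
    """
    counts = Counter(seq.upper())
    g, c = counts["G"], counts["C"]
    total = sum(counts[base] for base in "ACGT")  # ignore ambiguous symbols
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc_content = (g + c) / total
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return gc_content, gc_skew


# Toy sequence invented for illustration: 3 A, 4 T, 2 C, 1 G
content, skew = gc_stats("ATTACCATTG")
print(f"G+C = {content:.2f}, GC skew = {skew:.3f}")  # G+C = 0.30, GC skew = -0.333
```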
Mitochondrial genome evolution in the Saccharomyces sensu stricto complex.
Ruan, Jiangxing; Cheng, Jian; Zhang, Tongcun; Jiang, Huifeng
2017-01-01
Exploring the evolutionary patterns of mitochondrial genomes is important for our understanding of the Saccharomyces sensu stricto (SSS) group, which is a model system for genomic evolution and ecological analysis. In this study, we first obtained the complete mitochondrial sequences of two important species, Saccharomyces mikatae and Saccharomyces kudriavzevii. We then compared the mitochondrial genomes in the SSS group with those of close relatives, and found that the non-coding regions evolved rapidly, including dramatic expansion of intergenic regions, fast evolution of introns and almost 20-fold higher rearrangement rates than those of the nuclear genomes. However, the coding regions, and especially the protein-coding genes, are more conserved than those in the nuclear genomes of the SSS group. The different evolutionary patterns of coding and non-coding regions in the mitochondrial and nuclear genomes may be related to the origin of the aerobic fermentation lifestyle in this group. Our analysis thus provides novel insights into the evolution of mitochondrial genomes.
Complete mitochondrial genome of Chuanzhong black goat in southwest of China (Capra hircus).
Huang, Yong-Fu; Chen, Li-Peng; Zhao, Yong-Ju; Zhang, Hao; Na, Ri-Su; Zhao, Zhong-Quan; Zhang, Jia-Hua; Jiang, Cao-De; Ma, Yue-Hui; Sun, Ya-Wang; E, Guang-Xin
2016-09-01
The Chuanzhong black goat (Capra hircus) is a breed native to southwest China. Its complete mitochondrial genome is 16,641 nt in length, consisting of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, two ribosomal RNA (rRNA) genes, and a non-coding control region. As in other mammals, most mitochondrial genes are encoded on the heavy strand, except for ND6 and eight tRNA genes, which are encoded on the light strand. Its overall base composition is A: 33.5%, T: 27.3%, C: 26.1%, and G: 13.1%. The complete mitogenome of this Chinese indigenous goat breed could provide basic data for further phylogenetic analysis.
2013-01-01
Background Most eukaryotic species represent stable karyotypes with a particular diploid number. B chromosomes are additional to standard karyotypes and may vary in size, number and morphology even between cells of the same individual. For many years it was generally believed that B chromosomes found in some plant, animal and fungi species lacked active genes. Recently, molecular cytogenetic studies showed the presence of additional copies of protein-coding genes on B chromosomes. However, the transcriptional activity of these genes remained elusive. We studied karyotypes of the Siberian roe deer (Capreolus pygargus) that possess up to 14 B chromosomes to investigate the presence and expression of genes on supernumerary chromosomes. Results Here, we describe a 2 Mbp region homologous to cattle chromosome 3 and containing TNNI3K (partial), FPGT, LRRIQ3 and a large gene-sparse segment on B chromosomes of the Siberian roe deer. The presence of the copy of the autosomal region was demonstrated by B-specific cDNA analysis, PCR assisted mapping, cattle bacterial artificial chromosome (BAC) clone localization and quantitative polymerase chain reaction (qPCR). By comparative analysis of B-specific and non-B chromosomal sequences we discovered some B chromosome-specific mutations in protein-coding genes, which further enabled the detection of a FPGT-TNNI3K transcript expressed from duplicated genes located on B chromosomes in roe deer fibroblasts. Conclusions Discovery of a large autosomal segment in all B chromosomes of the Siberian roe deer further corroborates the view of an autosomal origin for these elements. Detection of a B-derived transcript in fibroblasts implies that the protein coding sequences located on Bs are not fully inactivated. 
The origin, evolution and effect on host of B chromosomal genes seem to be similar to autosomal segmental duplications, which reinforces the view that supernumerary chromosomal elements might play an important role in genome evolution. PMID:23915065
Seligmann, Hervé
2013-03-01
Usual DNA→RNA transcription exchanges T→U. Assuming different systematic symmetric nucleotide exchanges during translation, some GenBank RNAs match exactly human mitochondrial sequences (exchange rules listed in decreasing transcript frequencies): C↔U, A↔U, A↔U+C↔G (two nucleotide pairs exchanged), G↔U, A↔G, C↔G, none for A↔C, A↔G+C↔U, and A↔C+G↔U. Most unusual transcripts involve exchanging uracil. Independent measures of rates of rare replicational enzymatic DNA nucleotide misinsertions predict frequencies of RNA transcripts systematically exchanging the corresponding misinserted nucleotides. Exchange transcripts self-hybridize less than other gene regions, self-hybridization increases with length, suggesting endoribonuclease-limited elongation. Blast detects stop codon depleted putative protein coding overlapping genes within exchange-transcribed mitochondrial genes. These align with existing GenBank proteins (mainly metazoan origins, prokaryotic and viral origins underrepresented). These GenBank proteins frequently interact with RNA/DNA, are membrane transporters, or are typical of mitochondrial metabolism. Nucleotide exchange transcript frequencies increase with overlapping gene densities and stop densities, indicating finely tuned counterbalancing regulation of expression of systematic symmetric nucleotide exchange-encrypted proteins. Such expression necessitates combined activities of suppressor tRNAs matching stops, and nucleotide exchange transcription. Two independent properties confirm predicted exchanged overlap coding genes: discrepancy of third codon nucleotide contents from replicational deamination gradients, and codon usage according to circular code predictions. Predictions from both properties converge, especially for frequent nucleotide exchange types. Nucleotide exchanging transcription apparently increases coding densities of protein coding genes without lengthening genomes, revealing unsuspected functional DNA coding potential. 
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
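The symmetric exchange rules listed in the abstract above (for example C↔U, or the double exchange A↔U+C↔G) can be illustrated with a minimal Python sketch. The function name `exchange` and the example sequences are hypothetical; this only shows what a systematic symmetric exchange does to a sequence, not the author's actual analysis pipeline.

```python
def exchange(rna, pairs):
    """Apply systematic symmetric nucleotide exchanges to an RNA string.

    pairs is a list of 2-tuples; ("C", "U") swaps every C with U and
    every U with C. Several pairs may be given for double exchanges.
    """
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return rna.translate(str.maketrans(table))

# Single exchange C<->U
print(exchange("AUGCCU", [("C", "U")]))          # ACGUUC
# Double exchange A<->U + C<->G
print(exchange("AUGC", [("A", "U"), ("C", "G")]))  # UACG
```

Because each rule is symmetric, applying the same exchange twice returns the original sequence.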
The complete chloroplast genome sequence of Euonymus japonicus (Celastraceae).
Choi, Kyoung Su; Park, SeonJoo
2016-09-01
The complete chloroplast (cp) genome sequence of Euonymus japonicus, the first sequenced in the genus Euonymus, is reported in this study. The total length was 157 637 bp, containing a pair of 26 678 bp inverted repeat (IR) regions, which were separated by a small single copy (SSC) region and a large single copy (LSC) region of 18 340 bp and 85 941 bp, respectively. This genome contains 107 unique genes, including 74 coding genes, four rRNA genes, and 29 tRNA genes. Seventeen genes of E. japonicus contain introns, of which three (clpP, ycf3, and rps12) include two introns. The maximum likelihood (ML) phylogenetic analysis revealed that E. japonicus was closely related to Manihot and Populus.
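The region lengths quoted above can be checked against the reported total, since a quadripartite chloroplast genome consists of one LSC, one SSC and two IR copies. The sketch below is only an arithmetic check with illustrative variable names.

```python
# Region lengths in bp, as quoted in the abstract above.
lsc = 85_941   # large single copy region
ssc = 18_340   # small single copy region
ir = 26_678    # one inverted repeat copy (the genome carries two)

genome_length = lsc + ssc + 2 * ir
print(genome_length)  # 157637

# Gene counts: coding genes + rRNA genes + tRNA genes
unique_genes = 74 + 4 + 29
print(unique_genes)  # 107
```

Both results agree with the totals given in the abstract (157 637 bp and 107 unique genes).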
Wang, Pei; Song, Fan; Cai, Wanzhi
2014-01-01
Insect mitochondrial genomes are very important for understanding molecular evolution as well as for phylogenetic and phylogeographic studies of insects. The Miridae are the largest family of Heteroptera, encompassing more than 11,000 described species, and are of great economic importance. To better understand the diversity and evolution of plant bugs, we sequenced five new mitochondrial genomes and present the first comparative analysis of the nine mirid mitochondrial genomes available to date. Our results showed that gene content, gene arrangement, base composition and sequences of the mitochondrial transcription termination factor were conserved in plant bugs. Intra-genus species shared more conserved genomic characteristics, such as nucleotide and amino acid composition of protein-coding genes, secondary structure and anticodon mutations of tRNAs, and non-coding sequences. The control region possessed several distinct characteristics, including variable size, abundant tandem repetitions, and intra-genus conservation, and was useful in evolutionary and population genetic studies. The AGG codon reassignments between serine and lysine were investigated in the genus Adelphocoris and other cimicomorphans. Our analysis revealed correlated evolution between reassignments of the AGG codon and specific point mutations at the anticodons of tRNALys and tRNASer(AGN). Phylogenetic analysis indicated that mitochondrial genome sequences were useful in resolving family-level relationships of Cimicomorpha. Comparative evolutionary analysis of plant bug mitochondrial genomes allowed the identification of previously neglected coding genes or non-coding regions as potential molecular markers. The finding of the AGG codon reassignments between serine and lysine indicated the parallel evolution of the genetic code in Hemiptera mitochondrial genomes. PMID:24988409
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Polymorphism of BMP4 gene in Indian goat breeds differing in prolificacy.
Sharma, Rekha; Ahlawat, Sonika; Maitra, A; Roy, Manoranjan; Mandakmale, S; Tantia, M S
2013-12-10
Bone morphogenetic proteins (BMPs) are members of the TGF-β (transforming growth factor-beta) superfamily, of which BMP4 is the most important due to its crucial role in follicular growth and differentiation, cumulus expansion and ovulation. Reproduction is a crucial trait in goat breeding and based on the important role of BMP4 gene in reproduction it was considered as a possible candidate gene for the prolificacy of goats. The objective of the present study was to detect polymorphism in intronic, exonic and 3' un-translated regions of BMP4 gene in Indian goats. Nine different goat breeds (Barbari, Beetal, Black Bengal, Malabari, Jakhrana (Twinning>40%), Osmanabadi, Sangamneri (Twinning 20-30%), Sirohi and Ganjam (Twinning<10%)) differing in prolificacy and geographic distribution were employed for polymorphism scanning. Cattle sequence (AC_000167.1) was used to design primers for the amplification of a targeted region followed by direct DNA sequencing to identify the genetic variations. Single nucleotide polymorphisms (SNPs) were not detected in exon 3, the intronic region and the 3' flanking region. A SNP (G1534A) was identified in exon 2. It was a non-synonymous mutation resulting in an arginine to lysine change in a corresponding protein sequence. G to A transition at the 1534 locus revealed two genotypes GG and GA in the nine investigated goat breeds. The GG genotype was predominant with a genotype frequency of 0.98. The GA genotype was present in the Black Bengal as well as Jakhrana breed with a genotype frequency of 0.02. A microsatellite was identified in the 3' flanking region, only 20 nucleotides downstream from the termination site of the coding region, as a short sequence with more than nineteen continuous and repeated CA dinucleotides. Since the gene is highly evolutionarily conserved, identification of a non-synonymous SNP (G1534A) in the coding region gains further importance. 
To our knowledge, this is the first report of a mutation in the coding region of the caprine BMP4 gene. But whether the reproduction trait of goat is associated with the BMP4 polymorphism, needs to be further defined by association studies in more populations so as to delineate an effect on it. © 2013 Elsevier B.V. All rights reserved.
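The genotype frequencies reported above for the G1534A SNP (GG 0.98, GA 0.02, no AA observed) imply allele frequencies that can be derived by simple allele counting. The sketch below is an illustration under that assumption; the function name is hypothetical and the calculation is not taken from the paper.

```python
def allele_frequencies(gg, ga, aa=0.0):
    """Return (freq_G, freq_A) from genotype frequencies.

    Each heterozygote (GA) contributes half of its frequency
    to each allele.
    """
    freq_g = gg + ga / 2
    freq_a = aa + ga / 2
    return round(freq_g, 4), round(freq_a, 4)

freq_g, freq_a = allele_frequencies(gg=0.98, ga=0.02)
print(freq_g, freq_a)  # 0.99 0.01
```

With these figures the A allele has a frequency of about 0.01, which is consistent with the abstract's description of the GG genotype as predominant.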
Foox, Jonathan; Brugler, Mercer; Siddall, Mark Edward; Rodríguez, Estefanía
2016-07-01
Six complete and three partial actiniarian mitochondrial genomes were amplified in two semi-circles using long-range PCR and pyrosequenced in a single run on a 454 GS Junior, doubling the number of complete mitogenomes available within the order. Typical metazoan mtDNA features included circularity, 13 protein-coding genes, 2 ribosomal RNA genes, and length ranging from 17,498 to 19,727 bp. Several typical anthozoan mitochondrial genome features were also observed including the presence of only two transfer RNA genes, elevated A + T richness ranging from 54.9 to 62.4%, large intergenic regions, and group 1 introns interrupting NADH dehydrogenase subunit 5 and cytochrome c oxidase subunit I, the latter of which possesses a homing endonuclease gene. Within the sea anemone Alicia sansibarensis, we report the first mitochondrial gene order rearrangement within the Actiniaria, as well as putative novel non-canonical protein-coding genes. Phylogenetic analyses of all 13 protein-coding and 2 ribosomal genes largely corroborated current hypotheses of sea anemone interrelatedness, with a few lower-level differences.
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total 612 kb of the genome comprise group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Liu, Zhongliang; Hui, Yi; Shi, Lei; Chen, Zhenyu; Xu, Xiangjie; Chi, Liankai; Fan, Beibei; Fang, Yujiang; Liu, Yang; Ma, Lin; Wang, Yiran; Xiao, Lei; Zhang, Quanbin; Jin, Guohua; Liu, Ling; Zhang, Xiaoqing
2016-09-13
Loss-of-function studies in human pluripotent stem cells (hPSCs) require efficient methodologies for lesion of genes of interest. Here, we introduce a donor-free paired gRNA-guided CRISPR/Cas9 knockout strategy (paired-KO) for efficient and rapid gene ablation in hPSCs. Through paired-KO, we succeeded in targeting all genes of interest with high biallelic targeting efficiencies. More importantly, during paired-KO, the cleaved DNA was repaired mostly through direct end joining without insertions/deletions (precise ligation), and thus makes the lesion product predictable. The paired-KO remained highly efficient for one-step targeting of multiple genes and was also efficient for targeting of microRNA, while for long non-coding RNA over 8 kb, cleavage of a short fragment of the core promoter region was sufficient to eradicate downstream gene transcription. This work suggests that the paired-KO strategy is a simple and robust system for loss-of-function studies for both coding and non-coding genes in hPSCs. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
The mitochondrial genome of Pomacea maculata (Gastropoda: Ampullariidae).
Yang, Qianqian; Liu, Suwen; Song, Fan; Li, Hu; Liu, Jinpeng; Liu, Guangfu; Yu, Xiaoping
2016-07-01
The golden apple snail, Pomacea maculata Perry, 1810 (Gastropoda: Ampullariidae) is one of the most serious invasive alien species from the native range of South America. The mitochondrial genome of P. maculata (15 516 bp) consists of 37 genes (13 protein-coding genes, two rRNAs, and 22 tRNAs) and a non-coding region with a 16 bp repeat unit. Most mitochondrial genes of P. maculata are distributed on the H-strand, except eight tRNA genes, which are encoded on the L-strand. A phylogenetic analysis showed that there was a close relationship between P. maculata and another invasive golden apple snail species, Pomacea canaliculata (Lamarck, 1822).
Li, S.-F.; Xu, J.-W.; Yang, Q.-L.; Wang, C.H.; Chen, Q.; Chapman, D.C.; Lu, G.
2009-01-01
Based upon morphological characters, silver carp Hypophthalmichthys molitrix and bighead carp Hypophthalmichthys nobilis (or Aristichthys nobilis) have been classified into either the same genus or two distinct genera. Consequently, the taxonomic relationship of the two species at the generic level remains equivocal. This issue is addressed by sequencing complete mitochondrial genomes of H. molitrix and H. nobilis, comparing their mitogenome organization, structure and sequence similarity, and conducting a comprehensive phylogenetic analysis of cyprinid species. As with other cyprinid fishes, the mitogenomes of the two species were structurally conserved, containing 37 genes including 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA (tRNA) genes and a putative control region (D-loop). Sequence similarity between the two mitogenomes varied in different genes or regions, being highest in the tRNA genes (98.8%), lowest in the control region (89.4%) and intermediate in the protein-coding genes (94.2%). Analyses of the sequence comparison and phylogeny using concatenated protein sequences support the view that the two species belong to the genus Hypophthalmichthys. Further studies using nuclear markers and involving more closely related species, and the systematic combination of traditional biology and molecular biology are needed in order to confirm this conclusion. © 2009 The Fisheries Society of the British Isles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lappalainen, J.; Dean, M.; Virkkunen, M.
1995-04-24
Abnormal brain serotonin function may be characteristic of several neuropsychiatric disorders. Thus, it is important to identify polymorphic genes and screen for functional variants at loci coding for genes that control normal serotonin functions. 5-HT1Dβ is a terminal serotonin autoreceptor which may play a role in regulating serotonin synthesis and release. Using an SSCP technique we screened for 5-HT1Dβ coding sequence variants in psychiatrically interviewed populations, which included controls, alcoholics, and alcoholic arsonists and alcoholic violent offenders with low CSF concentrations of the main serotonin metabolite 5-HIAA. A common polymorphism was identified in the 5-HT1Dβ gene with allele frequencies of 0.72 and 0.28. The SSCP variant was caused by a silent G to C substitution at nucleotide 861 of the coding region. This polymorphism could also be detected as a HincII RFLP of amplified DNA. DNAs from informative CEPH families were typed for the HincII RFLP and analyzed with respect to 20 linked markers on chromosome 6. Multipoint analysis placed the 5-HT1Dβ receptor gene between markers D6S286 and D6S275. A maximum two-point lod score of 10.90 was obtained to D6S26, which had been previously localized on 6q14-15. Chromosomal aberrations involving this region have been previously shown to cause retinal anomalies, developmental delay, and abnormal brain development. This region also contains the gene for North Carolina-type macular dystrophy. 34 refs., 3 figs., 1 tab.
Effects of Nickel Treatment on H3K4 Trimethylation and Gene Expression
Tchou-Wong, Kam-Meng; Kluz, Thomas; Arita, Adriana; Smith, Phillip R.; Brown, Stuart; Costa, Max
2011-01-01
Occupational exposure to nickel compounds has been associated with lung and nasal cancers. We have previously shown that exposure of the human lung adenocarcinoma A549 cells to NiCl2 for 24 hr significantly increased global levels of trimethylated H3K4 (H3K4me3), a transcriptional activating mark that maps to the promoters of transcribed genes. To further understand the potential epigenetic mechanism(s) underlying nickel carcinogenesis, we performed genome-wide mapping of H3K4me3 by chromatin immunoprecipitation and direct genome sequencing (ChIP-seq) and correlated with transcriptome genome-wide mapping of RNA transcripts by massive parallel sequencing of cDNA (RNA-seq). The effect of NiCl2 treatment on H3K4me3 peaks within 5,000 bp of transcription start sites (TSSs) on a set of genes highly induced by nickel in both A549 cells and human peripheral blood mononuclear cells were analyzed. Nickel exposure increased the level of H3K4 trimethylation in both the promoters and coding regions of several genes including CA9 and NDRG1 that were increased in expression in A549 cells. We have also compared the extent of the H3K4 trimethylation in the absence and presence of formaldehyde crosslinking and observed that crosslinking of chromatin was required to observe H3K4 trimethylation in the coding regions immediately downstream of TSSs of some nickel-induced genes including ADM and IGFBP3. This is the first genome-wide mapping of trimethylated H3K4 in the promoter and coding regions of genes induced after exposure to NiCl2. This study may provide insights into the epigenetic mechanism(s) underlying the carcinogenicity of nickel compounds. PMID:21455298
New genetic variants of LATS1 detected in urinary bladder and colon cancer.
Saadeldin, Mona K; Shawer, Heba; Mostafa, Ahmed; Kassem, Neemat M; Amleh, Asma; Siam, Rania
2014-01-01
LATS1, the large tumor suppressor 1 gene, encodes a serine/threonine kinase and is implicated in cell cycle progression. LATS1 is down-regulated in various human cancers, such as breast cancer and astrocytoma. Point mutations in LATS1 have been reported in human sarcomas. Additionally, loss of heterozygosity of the LATS1 chromosomal region predisposes to breast, ovarian, and cervical tumors. In the current study, we investigated LATS1 genetic variations, including single nucleotide polymorphisms (SNPs), in 28 Egyptian patients with either urinary bladder or colon cancer. The LATS1 gene was amplified and sequenced, and the expression of LATS1 at the RNA level was assessed in 12 urinary bladder cancer samples. We report the identification of a total of 29 variants, including previously identified SNPs, within LATS1 coding and non-coding sequences. A total of 18 variants were novel. The majority of the novel variants, 13, were mapped to intronic sequences and un-translated regions of the gene. Four of the five novel variants located in the coding region of the gene represented missense mutations within the serine/threonine kinase catalytic domain. Interestingly, LATS1 RNA steady-state levels were lost in urinary bladder cancerous tissue harboring four specific SNPs (16045 + 41736 + 34614 + 56177) positioned in the 5'UTR, intron 6, and two silent mutations within exon 4 and exon 8, respectively. This study identifies novel single-base-sequence alterations in the LATS1 gene. These newly identified variants could potentially be used as novel diagnostic or prognostic tools in cancer.
The complete mitochondrial genome of the stomatopod crustacean Squilla mantis
Cook, Charles E
2005-01-01
Background Animal mitochondrial genomes are physically separate from the much larger nuclear genomes and have proven useful both for phylogenetic studies and for understanding genome evolution. Within the phylum Arthropoda the subphylum Crustacea includes over 50,000 named species with immense variation in body plans and habitats, yet only 23 complete mitochondrial genomes are available from this subphylum. Results I describe here the complete mitochondrial genome of the crustacean Squilla mantis (Crustacea: Malacostraca: Stomatopoda). This 15,994-nucleotide genome, the first described from a hoplocarid, contains the standard complement of 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a non-coding AT-rich region that is found in most other metazoans. The gene order is identical to that considered ancestral for hexapods and crustaceans. The 70% AT base composition is within the range described for other arthropods. A single unusual feature of the genome is a 230 nucleotide non-coding region between a serine transfer RNA and the nad1 gene, which has no apparent function. I also compare gene order, nucleotide composition, and codon usage of the S. mantis genome and eight other malacostracan crustaceans. A translocation of the histidine transfer RNA gene is shared by three taxa in the order Decapoda, infraorder Brachyura: Callinectes sapidus, Portunus trituberculatus and Pseudocarcinus gigas. This translocation may be diagnostic for the Brachyura. For all nine taxa nucleotide composition is biased towards AT-richness, as expected for arthropods, and is within the range reported for other arthropods. Codon usage is biased, and much of this bias is probably due to the skew in nucleotide composition towards AT-richness. Conclusion The mitochondrial genome of Squilla mantis contains one unusual feature, a 230 base pair non-coding region that has so far not been described in any other malacostracan. 
Comparisons with other Malacostraca show that all nine genomes, like most other mitochondrial genomes, share a bias toward AT-richness and a related bias in codon usage. The nine malacostracans included in this analysis are not representative of the diversity of the class Malacostraca, and additional malacostracan sequences would surely reveal other unusual genomic features that could be useful in understanding mitochondrial evolution in this taxon. PMID:16091132
Chen, Zhi-Teng; Du, Yu-Zhou
2017-01-01
The complete mitochondrial genome (mitogenome) of Nemoura nankinensis (Plecoptera: Nemouridae) was sequenced as the first reported mitogenome from the family Nemouridae. The N. nankinensis mitogenome was the longest (16,602 bp) among reported plecopteran mitogenomes, and it contains 37 genes including 13 protein-coding genes (PCGs), 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes. Most PCGs used standard ATN as start codons, and TAN as termination codons. All tRNA genes of N. nankinensis could fold into the cloverleaf secondary structures except for trnSer (AGN), whose dihydrouridine (DHU) arm was reduced to a small loop. There was also a large non-coding region (control region, CR) in the N. nankinensis mitogenome. The 1751 bp CR was the longest and had the highest A+T content (81.8%) among stoneflies. A large tandem repeat region, five potential stem-loop (SL) structures, four tRNA-like structures and four conserved sequence blocks (CSBs) were detected in the elongated CR. The presence of these tRNA-like structures in the CR has never been reported in other plecopteran mitogenomes. These novel features of the elongated CR in N. nankinensis may have functions associated with the process of replication and transcription. Finally, phylogenetic reconstruction suggested that Nemouridae was the sister-group of Capniidae. PMID:28475163
Tetrahymena thermophila acidic ribosomal protein L37 contains an archaebacterial type of C-terminus.
Hansen, T S; Andreasen, P H; Dreisig, H; Højrup, P; Nielsen, H; Engberg, J; Kristiansen, K
1991-09-15
We have cloned and characterized a Tetrahymena thermophila macronuclear gene (L37) encoding the acidic ribosomal protein (A-protein) L37. The gene contains a single intron located in the 3'-part of the coding region. Two major and three minor transcription start points (tsp) were mapped 39 to 63 nucleotides upstream from the translational start codon. The uppermost tsp mapped to the first T in a putative T. thermophila RNA polymerase II initiator element, TATAA. The coding region of L37 predicts a protein of 109 amino acid (aa) residues. A substantial part of the deduced aa sequence was verified by protein sequencing. The T. thermophila L37 clearly belongs to the P1-type family of eukaryotic A-proteins, but the C-terminal region has the hallmarks of archaebacterial A-proteins.
Sun, Jiajie; Gao, Yuan; Liu, Dong; Ma, Wei; Xue, Jing; Zhang, Chunlei; Lan, Xianyong; Lei, Chuzhao; Chen, Hong
2012-06-01
The insulin-induced gene 1 (INSIG1) encodes a protein that blocks proteolytic activation of sterol regulatory element binding proteins, which are transcription factors that activate genes that regulate cholesterol, fatty acid, and glucose metabolism. However, similar research for the bovine INSIG1 gene is lacking. Therefore, in this study, polymorphisms of the bovine INSIG1 gene were detected in 643 individuals from four cattle breeds by DNA pooling, forced PCR-RFLP, PCR-SSCP, and DNA sequencing methods. Only 10 novel SNPs were identified, which included four mutations in the coding region and the others in the introns. In Nanyang individuals, seven common haplotypes were identified based on four coding region SNPs. The haplotype GACT, with a frequency of 75.4%, was the most prevalent haplotype, and the SNPs formed two linkage disequilibrium blocks with strong multi-allelic D' (D' = 1). Additionally, association analysis between mutations of the bovine INSIG1 gene and growth traits in Nanyang cattle at 6, 12, 18, and 24 months old was performed, and the results indicated that the polymorphisms were not significantly associated with body mass.
USDA-ARS's Scientific Manuscript database
Objectives: Newcastle disease virus (NDV), a member of the Paramyxoviridae family, has been developed as a vector to express foreign genes for vaccine and gene therapy purposes. The foreign genes are usually inserted into a non-coding region of the NDV genome as an independent transcription unit (ITU...
Vermaas, Willem F J.
2014-06-17
Disclosed is a modified photoautotrophic bacterium comprising genes of interest that are modified in terms of their expression and/or coding region sequence, wherein modification of the genes of interest increases production of a desired product in the bacterium relative to the amount of the desired product production in a photoautotrophic bacterium that is not modified with respect to the genes of interest.
Javierre, Biola M; Burren, Oliver S; Wilder, Steven P; Kreuzhuber, Roman; Hill, Steven M; Sewitz, Sven; Cairns, Jonathan; Wingett, Steven W; Várnai, Csilla; Thiecke, Michiel J; Burden, Frances; Farrow, Samantha; Cutler, Antony J; Rehnström, Karola; Downes, Kate; Grassi, Luigi; Kostadima, Myrto; Freire-Pritchett, Paula; Wang, Fan; Stunnenberg, Hendrik G; Todd, John A; Zerbino, Daniel R; Stegle, Oliver; Ouwehand, Willem H; Frontini, Mattia; Wallace, Chris; Spivakov, Mikhail; Fraser, Peter
2016-11-17
Long-range interactions between regulatory elements and gene promoters play key roles in transcriptional regulation. The vast majority of interactions are uncharted, constituting a major missing link in understanding genome control. Here, we use promoter capture Hi-C to identify interacting regions of 31,253 promoters in 17 human primary hematopoietic cell types. We show that promoter interactions are highly cell type specific and enriched for links between active promoters and epigenetically marked enhancers. Promoter interactomes reflect lineage relationships of the hematopoietic tree, consistent with dynamic remodeling of nuclear architecture during differentiation. Interacting regions are enriched in genetic variants linked with altered expression of genes they contact, highlighting their functional role. We exploit this rich resource to connect non-coding disease variants to putative target promoters, prioritizing thousands of disease-candidate genes and implicating disease pathways. Our results demonstrate the power of primary cell promoter interactomes to reveal insights into genomic regulatory mechanisms underlying common diseases. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
Sequence Analysis of Mitochondrial Genome of Toxascaris leonina from a South China Tiger.
Li, Kangxin; Yang, Fang; Abdullahi, A Y; Song, Meiran; Shi, Xianli; Wang, Minwei; Fu, Yeqi; Pan, Weida; Shan, Fang; Chen, Wu; Li, Guoqing
2016-12-01
Toxascaris leonina is a common parasitic nematode of wild mammals and has significant impacts on the protection of rare wild animals. To analyze population genetic characteristics of T. leonina from the South China tiger, its mitochondrial (mt) genome was sequenced. Its complete circular mt genome was 14,277 bp in length, including 12 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 2 non-coding regions. The nucleotide composition was biased toward A and T. The most common start codon and stop codon were TTG and TAG, and 4 genes ended with an incomplete stop codon. There were 13 intergenic regions ranging from 1 to 10 bp in size. Phylogenetically, T. leonina from a South China tiger was close to canine T. leonina. This study reports for the first time a complete mt genome sequence of T. leonina from the South China tiger, and provides a scientific basis for studying the genetic diversity of nematodes between different hosts.
The GENCODE exome: sequencing the complete human exome
Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno
2011-01-01
Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
Henne, Karsten; Li, Jing; Stoneking, Mark; Kessler, Olga; Schilling, Hildegard; Sonanini, Anne; Conrads, Georg; Horz, Hans-Peter
2014-08-22
The genetic diversity of the human microbiome holds great potential for shedding light on the history of our ancestors. Helicobacter pylori is the most prominent example as its analysis allowed a fine-scale resolution of past migration patterns including some that could not be distinguished using human genetic markers. However studies of H. pylori require stomach biopsies, which severely limits the number of samples that can be analysed. By focussing on the house-keeping gene gdh (coding for the glucose-6-phosphate dehydrogenase), on the virulence gene gtf (coding for the glucosyltransferase) of mitis-streptococci and on the 16S-23S rRNA internal transcribed spacer (ITS) region of the Fusobacterium nucleatum/periodonticum-group we here tested the hypothesis that bacterial genes from human saliva have the potential for distinguishing human populations. Analysis of 10 individuals from each of seven geographic regions, encompassing Africa, Asia and Europe, revealed that the genes gdh and ITS exhibited the highest number of polymorphic sites (59% and 79%, respectively) and most OTUs (defined at 99% identity) were unique to a given country. In contrast, the gene gtf had the lowest number of polymorphic sites (21%), and most OTUs were shared among countries. Most of the variation in the gdh and ITS genes was explained by the high clonal diversity within individuals (around 80%) followed by inter-individual variation of around 20%, leaving the geographic region as providing virtually no source of sequence variation. Conversely, for gtf the variation within individuals accounted for 32%, between individuals for 57% and among geographic regions for 11%. This geographic signature persisted upon extension of the analysis to four additional locations from the American continent. 
Pearson correlation analysis, pairwise Fst-cluster analysis as well as UniFrac analyses consistently supported a tree structure in which the European countries clustered tightly together and branched with American countries and South Africa, to the exclusion of Asian countries and the Congo. This study shows that saliva harbours protein-coding bacterial genes that are geographically structured, and which could potentially be used for addressing previously unresolved human migration events.
The complete chloroplast genome sequence of Hibiscus syriacus.
Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin
2016-09-01
The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp each, separated by a large single-copy region of 89 698 bp and a small single-copy region of 19 831 bp. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
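The reported region lengths can be cross-checked with simple arithmetic: in a quadripartite circular chloroplast genome, the total length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. A minimal Python sketch of that check follows; the function name is illustrative and does not come from the study.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total plastome length: LSC + SSC + two inverted-repeat copies (all in bp)."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract for Hibiscus syriacus.
total = quadripartite_length(lsc=89_698, ssc=19_831, ir=25_745)
print(total)  # 161019, which matches the reported genome length
```

The same check can be applied to any plastome abstract that reports all three region lengths.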
Discover mouse gene coexpression landscapes using dictionary learning and sparse coding.
Li, Yujie; Chen, Hanbo; Jiang, Xi; Li, Xiang; Lv, Jinglei; Peng, Hanchuan; Tsien, Joe Z; Liu, Tianming
2017-12-01
Gene coexpression patterns carry rich information regarding enormously complex brain structures and functions. Characterization of these patterns in an unbiased, integrated, and anatomically comprehensive manner will illuminate the higher-order transcriptome organization and offer genetic foundations of functional circuitry. Here using dictionary learning and sparse coding, we derived coexpression networks from the space-resolved anatomical comprehensive in situ hybridization data from Allen Mouse Brain Atlas dataset. The key idea is that if two genes use the same dictionary to represent their original signals, then their gene expressions must share similar patterns, thereby considering them as "coexpressed." For each network, we have simultaneous knowledge of spatial distributions, the genes in the network and the extent a particular gene conforms to the coexpression pattern. Gene ontologies and the comparisons with published gene lists reveal biologically identified coexpression networks, some of which correspond to major cell types, biological pathways, and/or anatomical regions.
Anisha, Shashidharan; Bhasker, Salini; Mohankumar, Chinnamma
2012-03-01
Vechur cow, categorized as a critically maintained breed by the FAO, is a unique breed of Bos indicus due to its extremely small size, less fodder intake, adaptability, easy domestication and traditional medicinal property of the milk. Lactoferrin (Lf) is an iron-binding glycoprotein that is found predominantly in the milk of mammals. The full coding region of Lf gene of Vechur cow was cloned, sequenced and expressed in a prokaryotic system. Antibacterial activity of the recombinant Lf showed suppression of bacterial growth. To the best of our knowledge this is the first time that the full coding region of Lf gene of B. indicus Vechur breed is sequenced, successfully expressed in a prokaryotic system and characterized. Comparative analysis of Lf gene sequence of five Vechur cows with B. taurus revealed 15 SNPs in the exon region associated with 11 amino acid substitutions. The amino acid arginine was noticed as a pronounced substitution and the tertiary structure analysis of the BLfV protein confirmed the positions of arginine in the β sheet region, random coil and helix region 1. Based on the recent reports on the nutritional therapies of arginine supplementation for wound healing and for cardiovascular diseases, the higher level of arginine in the lactoferrin protein of Vechur cow milk provides enormous scope for further therapeutic studies. Copyright © 2011 Elsevier B.V. All rights reserved.
Masingue, Marion; Perrot, Jimmy; Carlier, Robert-Yves; Piguet-Lacroix, Guenaelle; Latour, Philippe; Stojkovic, Tanya
2018-05-01
Charcot-Marie-Tooth disease (CMT) refers to a group of clinically and genetically heterogeneous inherited neuropathies. Ganglioside-induced differentiation-associated protein 1 GDAP1-related CMT has been reported in an autosomal dominant or recessive form in patients presenting either axonal or demyelinating neuropathy. We report two Sri Lankan sisters born to consanguineous parents and presenting with a severe axonal sensorimotor neuropathy. The early onset of the disease, the distal and proximal weakness and atrophy leading to major disability, along with areflexia, and, most notably, vocal cord and diaphragm paralysis were highly evocative of a GDAP1-related CMT. However, sequencing of the coding regions of the gene was normal. Whole-exome sequencing (WES) was performed and revealed that the largest region of homozygosity was around GDAP1 with several variants, mostly in non-coding regions. In view of the high clinical suspicion of GDAP1 gene involvement, we examined the variants in this gene and this, along with functional studies, allowed us to identify an alternative splicing site revealing a cryptic in-frame stop codon in intron 4 responsible for a severe loss of wild-type GDAP1. This work is the first to describe a deleterious mutation in GDAP1 gene outside of coding sequences or intronic junctions and emphasizes the importance of interpreting molecular analysis, and in particular WES results, in light of the clinical and electrophysiological phenotype.
Rozhdestvensky, Timofey S.; Robeck, Thomas; Galiveti, Chenna R.; Raabe, Carsten A.; Seeger, Birte; Wolters, Anna; Gubar, Leonid V.; Brosius, Jürgen; Skryabin, Boris V.
2016-01-01
Prader-Willi syndrome (PWS) is a neurogenetic disorder caused by loss of paternally expressed genes on chromosome 15q11-q13. The PWS-critical region (PWScr) contains an array of non-protein coding IPW-A exons hosting intronic SNORD116 snoRNA genes. Deletion of PWScr is associated with PWS in humans and growth retardation in mice exhibiting ~15% postnatal lethality in C57BL/6 background. Here we analysed a knock-in mouse containing a 5′HPRT-LoxP-NeoR cassette (5′LoxP) inserted upstream of the PWScr. When the insertion was inherited maternally in a paternal PWScr-deletion mouse model (PWScrp−/m5′LoxP), we observed compensation of growth retardation and postnatal lethality. Genomic methylation pattern and expression of protein-coding genes remained unaltered at the PWS-locus of PWScrp−/m5′LoxP mice. Interestingly, ubiquitous Snord116 and IPW-A exon transcription from the originally silent maternal chromosome was detected. In situ hybridization indicated that PWScrp−/m5′LoxP mice expressed Snord116 in brain areas similar to wild type animals. Our results suggest that the lack of PWScr RNA expression in certain brain areas could be a primary cause of the growth retardation phenotype in mice. We propose that activation of disease-associated genes on imprinted regions could lead to general therapeutic strategies in associated diseases. PMID:26848093
Complete mitogenome sequencing and phylogenetic analysis of PaLi yak (Bos grunniens).
Bao, Pengjia; Guo, Xian; Pei, Jie; Liang, Chunnian; Ding, Xuezhi; Min, Chu; Wang, Hongbo; Wu, Xiaoyun; Yan, Ping
2016-11-01
PaLi yak is a very important local breed in China; as a year-round grazing animal, it plays a very important role for the local economy and native herdsmen. The complete mitochondrial DNA of the PaLi yak was sequenced in this study; its total length is 16,324 bp, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and a non-coding control region (D-loop region). The gene order and composition are similar to those of most other vertebrates. The base contents are: 33.72% A, 25.80% C, 13.21% G and 27.27% T; A + T (60.99%) was higher than G + C (39.01%). The phylogenetic relationships were analyzed using the complete mitogenome sequence; the results showed that the genetic relationship between yak and cattle is distinct. This information provides useful data for further study on the protection of genetic resources and the taxonomy of Bovinae.
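The A + T and G + C figures follow directly from the four reported base percentages. A small Python sketch of the arithmetic, using only the values stated in the abstract (the variable names are illustrative):

```python
# Base composition (%) as reported in the abstract for the PaLi yak mitogenome.
base_percent = {"A": 33.72, "C": 25.80, "G": 13.21, "T": 27.27}

# Sum complementary pairs; round to two decimals to match the reported precision.
at_content = round(base_percent["A"] + base_percent["T"], 2)
gc_content = round(base_percent["G"] + base_percent["C"], 2)
total = round(sum(base_percent.values()), 2)

print(at_content, gc_content, total)  # 60.99 39.01 100.0
```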
Mitochondrial genomes of parasitic flatworms.
Le, Thanh H; Blair, David; McManus, Donald P
2002-05-01
Complete or near-complete mitochondrial genomes are now available for 11 species or strains of parasitic flatworms belonging to the Trematoda and the Cestoda. The organization of these genomes is not strikingly different from those of other eumetazoans, although one gene (atp8) commonly found in other phyla is absent from flatworms. The gene order in most flatworms has similarities to those seen in higher protostomes such as annelids. However, the gene order has been drastically altered in Schistosoma mansoni, which obscures this possible relationship. Among the sequenced taxa, base composition varies considerably, creating potential difficulties for phylogeny reconstruction. Long non-coding regions are present in all taxa, but these vary in length from only a few hundred to approximately 10000 nucleotides. Among Schistosoma spp., the long non-coding regions are rich in repeats and length variation among individuals is known. Data from mitochondrial genomes are valuable for studies on species identification, phylogenies and biogeography.
Alternative Polyadenylation of Tumor Suppressor Genes in Small Intestinal Neuroendocrine Tumors
Rehfeld, Anders; Plass, Mireya; Døssing, Kristina; Knigge, Ulrich; Kjær, Andreas; Krogh, Anders; Friis-Hansen, Lennart
2014-01-01
The tumorigenesis of small intestinal neuroendocrine tumors (SI-NETs) is poorly understood. Recent studies have associated alternative polyadenylation (APA) with proliferation, cell transformation, and cancer. Polyadenylation is the process in which the pre-messenger RNA is cleaved at a polyA site and a polyA tail is added. Genes with two or more polyA sites can undergo APA. This produces two or more distinct mRNA isoforms with different 3′ untranslated regions. Additionally, APA can also produce mRNAs containing different 3′-terminal coding regions. Therefore, APA alters both the repertoire and the expression level of proteins. Here, we used high-throughput sequencing data to map polyA sites and characterize polyadenylation genome-wide in three SI-NETs and a reference sample. In the tumors, 16 genes showed significant changes of APA pattern, which lead to either the 3′ truncation of mRNA coding regions or 3′ untranslated regions. Among these, 11 genes had been previously associated with cancer, with 4 genes being known tumor suppressors: DCC, PDZD2, MAGI1, and DACT2. We validated the APA in three out of three cases with quantitative real-time-PCR. Our findings suggest that changes of APA pattern in these 16 genes could be involved in the tumorigenesis of SI-NETs. Furthermore, they also point to APA as a new target for both diagnosis and treatment of SI-NETs. The identified genes with APA specific to the SI-NETs could be further tested as diagnostic markers and drug targets for disease prevention and treatment. PMID:24782827
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Oh, Juliana J.; Koegel, Ashley; Phan, Diana T.; Razfar, Ali; Slamon, Dennis J.
2007-01-01
Summary Allele loss and genetic alteration in chromosome 3p, particularly in the 3p21.3 region, are the most frequent and the earliest genomic abnormalities found in lung cancer. Multiple 3p21.3 genes exhibit various degrees of tumour suppression activity, suggesting that 3p21.3 genes may function as an integrated tumour suppressor region through their diverse biological activities. We have previously demonstrated growth inhibitory effects and the tumour suppression mechanism of the H37/RBM5 gene, which is one of the 19 genes residing in the 370kb minimal overlap region at 3p21.3. In the current study, to look for mutations in the H37 coding region in lung cancer cells, we compared nucleotide sequences of the entire H37 gene in tumour vs. adjacent normal tissues from 17 non-small cell lung cancer (NSCLC) patients. No mutations were detected; instead, we found two silent single nucleotide polymorphisms (SNPs), C1138T and C2185T, within the coding region of the H37 gene. In addition, we found that specific allele types at these SNP positions are correlated with different histological subtypes of NSCLC; tumours containing heterozygous alleles (C+T) at these SNP positions are more likely to be associated with adenocarcinoma (AC), whereas homozygous alleles (either C or T) are associated with squamous cell carcinoma (SCC) (p=0.0098). We postulate that these two silent polymorphisms may be in linkage disequilibrium (LD) with a disease causative allele in the 3p21.3 tumour suppressor region, which is packed with a large number of important genes affecting lung cancer development. In addition, because of the prevalent loss of heterozygosity (LOH) detected at 3p21.3, which precedes lung cancer initiation, these SNPs may be developed into a screening marker for high-risk individuals. PMID:17606309
Hrdlickova, Barbara; Kumar, Vinod; Kanduri, Kartiek; Zhernakova, Daria V; Tripathi, Subhash; Karjalainen, Juha; Lund, Riikka J; Li, Yang; Ullah, Ubaid; Modderman, Rutger; Abdulahad, Wayel; Lähdesmäki, Harri; Franke, Lude; Lahesmaa, Riitta; Wijmenga, Cisca; Withoff, Sebo
2014-01-01
Although genome-wide association studies (GWAS) have identified hundreds of variants associated with a risk for autoimmune and immune-related disorders (AID), our understanding of the disease mechanisms is still limited. In particular, more than 90% of the risk variants lie in non-coding regions, and almost 10% of these map to long non-coding RNA transcripts (lncRNAs). lncRNAs are known to show more cell-type specificity than protein-coding genes. We aimed to characterize lncRNAs and protein-coding genes located in loci associated with nine AIDs which have been well-defined by Immunochip analysis and by transcriptome analysis across seven populations of peripheral blood leukocytes (granulocytes, monocytes, natural killer (NK) cells, B cells, memory T cells, naive CD4(+) and naive CD8(+) T cells) and four populations of cord blood-derived T-helper cells (precursor, primary, and polarized (Th1, Th2) T-helper cells). We show that lncRNAs mapping to loci shared between AID are significantly enriched in immune cell types compared to lncRNAs from the whole genome (α <0.005). We were not able to prioritize single cell types relevant for specific diseases, but we observed five different cell types enriched (α <0.005) in five AID (NK cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, and psoriasis; memory T and CD8(+) T cells in juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis; Th0 and Th2 cells for inflammatory bowel disease, juvenile idiopathic arthritis, primary biliary cirrhosis, psoriasis, and rheumatoid arthritis). Furthermore, we show that co-expression analyses of lncRNAs and protein-coding genes can predict the signaling pathways in which these AID-associated lncRNAs are involved. 
The observed enrichment of lncRNA transcripts in AID loci implies lncRNAs play an important role in AID etiology and suggests that lncRNA genes should be studied in more detail to interpret GWAS findings correctly. The co-expression results strongly support a model in which the lncRNA and protein-coding genes function together in the same pathways.
Jheng, Cheng-Fong; Chen, Tien-Chih; Lin, Jhong-Yi; Chen, Ting-Chieh; Wu, Wen-Luan; Chang, Ching-Chun
2012-07-01
The chloroplast genome of Phalaenopsis equestris was determined and compared to those of Phalaenopsis aphrodite and Oncidium Gower Ramsey in Orchidaceae. The chloroplast genome of P. equestris is 148,959 bp, and a pair of inverted repeats (25,846 bp) separates the genome into large single-copy (85,967 bp) and small single-copy (11,300 bp) regions. The genome encodes 109 genes, including 4 rRNA, 30 tRNA and 75 protein-coding genes, but loses four ndh genes (ndhA, E, F and H) and seven other ndh genes are pseudogenes. The rate of inter-species variation between the two moth orchids was 0.74% (1107 sites) for single nucleotide substitution and 0.24% for insertions (161 sites; 1388 bp) and deletions (189 sites; 1393 bp). The IR regions have a lower rate of nucleotide substitution (3.5-5.8-fold) and indels (4.3-7.1-fold) than single-copy regions. The intergenic spacers are the most divergent, and based on the length variation of the three intergenic spacers, 11 native Phalaenopsis orchids could be successfully distinguished. The coding genes, IR junction and RNA editing sites are relatively more conserved between the two moth orchids than between those of Phalaenopsis and Oncidium spp. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Luethi, E.; Jasmat, N.B.; Grayling, R.A.
1991-03-01
A λ recombinant phage expressing β-mannanase activity in Escherichia coli has been isolated from a genomic library of the extremely thermophilic anaerobe Caldocellum saccharolyticum. The gene was cloned into pBR322 on a 5-kb BamHI fragment, and its location was obtained by deletion analysis. The sequence of a 2.1-kb fragment containing the mannanase gene has been determined. One open reading frame was found which could code for a protein of M_r 38,904. The mannanase gene (manA) was overexpressed in E. coli by cloning the gene downstream from the lacZ promoter of pUC18. The enzyme was most active at pH 6 and 80 °C and degraded locust bean gum, guar gum, Pinus radiata glucomannan, and konjak glucomannan. The noncoding region downstream from the mannanase gene showed strong homology to celB, a gene coding for a cellulase from the same organism, suggesting that the manA gene might have been inserted into its present position on the C. saccharolyticum genome by homologous recombination.
Zhu, Shiyou; Li, Wei; Liu, Jingze; Chen, Chen-Hao; Liao, Qi; Xu, Ping; Xu, Han; Xiao, Tengfei; Cao, Zhongzheng; Peng, Jingyu; Yuan, Pengfei; Brown, Myles; Liu, Xiaole Shirley; Wei, Wensheng
2017-01-01
CRISPR/Cas9 screens have been widely adopted to analyse coding gene functions, but high throughput screening of non-coding elements using this method is more challenging, because indels caused by a single cut in non-coding regions are unlikely to produce a functional knockout. A high-throughput method to produce deletions of non-coding DNA is needed. Herein, we report a high throughput genomic deletion strategy to screen for functional long non-coding RNAs (lncRNAs) that is based on a lentiviral paired-guide RNA (pgRNA) library. Applying our screening method, we identified 51 lncRNAs that can positively or negatively regulate human cancer cell growth. We individually validated 9 lncRNAs using CRISPR/Cas9-mediated genomic deletion and functional rescue, CRISPR activation or inhibition, and gene expression profiling. Our high-throughput pgRNA genome deletion method should enable rapid identification of functional mammalian non-coding elements. PMID:27798563
Zhang, Wenping; Yue, Bisong; Wang, Xiaofang; Zhang, Xiuyue; Xie, Zhong; Liu, Nonglin; Fu, Wenyuan; Yuan, Yaohua; Chen, Daqing; Fu, Danghua; Zhao, Bo; Yin, Yuzhong; Yan, Xiahui; Wang, Xinjing; Zhang, Rongying; Liu, Jie; Li, Maoping; Tang, Yao; Hou, Rong; Zhang, Zhihe
2011-10-01
In order to investigate the mitochondrial genome of Panthera tigris amoyensis, two South China tigers (P25 and P27) were analyzed following 15 cymt-specific primer sets. The entire mtDNA sequence was found to be 16,957 bp and 17,001 bp long for P25 and P27 respectively, and this difference in length between P25 and P27 occurred in the number of tandem repeats in the RS-3 segment of the control region. The structural characteristics of complete P. t. amoyensis mitochondrial genomes were also highly similar to those of P. uncia. Additionally, the rate of point mutation was only 0.3% and a total of 59 variable sites between P25 and P27 were found. Out of the 59 variable sites, 6 were located in 6 different tRNA genes, 6 in the 2 rRNA genes, 7 in non-coding regions (one located between tRNA-Asn and tRNA-Tyr and six in the D-loop), and 40 in 10 protein-coding genes. COI held the largest number of variable sites (9 sites) and Cytb had the highest rate of variable sites (0.7%) in the complete sequences. Moreover, out of the 40 variable sites located in 10 protein-coding genes, 12 sites were nonsynonymous.
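The site counts in this abstract can be tallied to confirm that the per-region breakdown adds up to the 59 variable sites reported. The Python sketch below also estimates an overall rate. The abstract does not state which length was used as the denominator, so dividing by the shorter genome (P25, 16,957 bp) is an assumption made here for illustration only.

```python
# Variable sites between P25 and P27 by region, as listed in the abstract.
variable_sites = {
    "tRNA genes": 6,
    "rRNA genes": 6,
    "non-coding regions": 7,
    "protein-coding genes": 40,
}
total_sites = sum(variable_sites.values())

# Assumption (not stated in the abstract): use the P25 genome length as denominator.
genome_length_bp = 16_957
rate_percent = 100 * total_sites / genome_length_bp

# Share of protein-coding variable sites that are nonsynonymous (12 of 40).
nonsynonymous_share = 12 / variable_sites["protein-coding genes"]

print(total_sites, round(rate_percent, 2), nonsynonymous_share)
```

Under this assumption the rate comes out at roughly 0.35%, which is of the same order as the 0.3% quoted in the abstract.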
Toyoda, N; Kleinhaus, N; Larsen, P R
1996-06-01
We analyzed the exon-intron structure of the human type 1 deiodinase gene (dio1) and compared it with that of a patient with suspected congenital type 1 deiodinase (D1) deficiency. The hdio1 gene is identical in exon-intron arrangement to the mouse gene, with coding sequences and a selenocysteine insertion sequence (SECIS) element contained in four exons. There were no mutations in the sequences of exons 1-4 of the patient's genomic DNA. Functional studies by transient expression techniques showed no difference in basal promoter activity or T3 responsiveness between the patient's and the normal dio1 gene. A structural abnormality in the dio1 gene is not a likely explanation for this patient's D1-deficient phenotype.
The complete chloroplast genome of Sinopodophyllum hexandrum Ying (Berberidaceae).
Meng, Lihua; Liu, Ruijuan; Chen, Jianbing; Ding, Chenxu
2017-05-01
The complete nucleotide sequence of the Sinopodophyllum hexandrum Ying chloroplast genome (cpDNA) was determined based on next-generation sequencing technologies in this study. The genome was 157 203 bp in length, containing a pair of inverted repeat (IRa and IRb) regions of 25 960 bp each, which were separated by a large single-copy (LSC) region of 87 065 bp and a small single-copy (SSC) region of 18 218 bp. The cpDNA contained 148 genes, including 96 protein-coding genes, 8 ribosomal RNA genes, and 44 tRNA genes. Of these genes, eight harbored a single intron, and two (ycf3 and clpP) contained two introns. The AT content of the S. hexandrum cpDNA is 61.5%.
Evolution of the IgA Heavy Chain Gene in the Genus Mus
Osborne, B. A.; Golde, T. E.; Schwartz, R. L.; Rudikoff, S.
1988-01-01
To examine questions of immunoglobulin gene evolution, the IgA α heavy chain gene from Mus pahari, an evolutionarily distant relative to Mus musculus domesticus, was cloned and sequenced. The sequence, when compared to the IgA gene of BALB/c or human, demonstrated that the IgA gene is evolving in a mosaic fashion with the hinge region accumulating mutations most rapidly and the third domain at a considerably lower frequency. In spite of this pronounced accumulation of mutations, the hinge region appears to maintain the conformation of a random coil. A marked propensity to accumulate replacement over silent site changes in the coding regions was noted, as was a definite codon bias. The possibility that these two phenomena are interrelated is discussed. PMID:2842228
Shaffer, Christopher D.; Chen, Elizabeth J.; Quisenberry, Thomas J.; Ko, Kevin; Braverman, John M.; Giarla, Thomas C.; Mortimer, Nathan T.; Reed, Laura K.; Smith, Sheryl T.; Robic, Srebrenka; McCartha, Shannon R.; Perry, Danielle R.; Prescod, Lindsay M.; Sheppard, Zenyth A.; Saville, Ken J.; McClish, Allison; Morlock, Emily A.; Sochor, Victoria R.; Stanton, Brittney; Veysey-White, Isaac C.; Revie, Dennis; Jimenez, Luis A.; Palomino, Jennifer J.; Patao, Melissa D.; Patao, Shane M.; Himelblau, Edward T.; Campbell, Jaclyn D.; Hertz, Alexandra L.; McEvilly, Maddison F.; Wagner, Allison R.; Youngblom, James; Bedi, Baljit; Bettincourt, Jeffery; Duso, Erin; Her, Maiye; Hilton, William; House, Samantha; Karimi, Masud; Kumimoto, Kevin; Lee, Rebekah; Lopez, Darryl; Odisho, George; Prasad, Ricky; Robbins, Holly Lyn; Sandhu, Tanveer; Selfridge, Tracy; Tsukashima, Kara; Yosif, Hani; Kokan, Nighat P.; Britt, Latia; Zoellner, Alycia; Spana, Eric P.; Chlebina, Ben T.; Chong, Insun; Friedman, Harrison; Mammo, Danny A.; Ng, Chun L.; Nikam, Vinayak S.; Schwartz, Nicholas U.; Xu, Thomas Q.; Burg, Martin G.; Batten, Spencer M.; Corbeill, Lindsay M.; Enoch, Erica; Ensign, Jesse J.; Franks, Mary E.; Haiker, Breanna; Ingles, Judith A.; Kirkland, Lyndsay D.; Lorenz-Guertin, Joshua M.; Matthews, Jordan; Mittig, Cody M.; Monsma, Nicholaus; Olson, Katherine J.; Perez-Aragon, Guillermo; Ramic, Alen; Ramirez, Jordan R.; Scheiber, Christopher; Schneider, Patrick A.; Schultz, Devon E.; Simon, Matthew; Spencer, Eric; Wernette, Adam C.; Wykle, Maxine E.; Zavala-Arellano, Elizabeth; McDonald, Mitchell J.; Ostby, Kristine; Wendland, Peter; DiAngelo, Justin R.; Ceasrine, Alexis M.; Cox, Amanda H.; Docherty, James E.B.; Gingras, Robert M.; Grieb, Stephanie M.; Pavia, Michael J.; Personius, Casey L.; Polak, Grzegorz L.; Beach, Dale L.; Cerritos, Heaven L.; Horansky, Edward A.; Sharif, Karim A.; Moran, Ryan; Parrish, Susan; Bickford, Kirsten; Bland, Jennifer; Broussard, Juliana; Campbell, Kerry; Deibel, 
Katelynn E.; Forka, Richard; Lemke, Monika C.; Nelson, Marlee B.; O'Keeffe, Catherine; Ramey, S. Mariel; Schmidt, Luke; Villegas, Paola; Jones, Christopher J.; Christ, Stephanie L.; Mamari, Sami; Rinaldi, Adam S.; Stity, Ghazal; Hark, Amy T.; Scheuerman, Mark; Silver Key, S. Catherine; McRae, Briana D.; Haberman, Adam S.; Asinof, Sam; Carrington, Harriette; Drumm, Kelly; Embry, Terrance; McGuire, Richard; Miller-Foreman, Drew; Rosen, Stella; Safa, Nadia; Schultz, Darrin; Segal, Matt; Shevin, Yakov; Svoronos, Petros; Vuong, Tam; Skuse, Gary; Paetkau, Don W.; Bridgman, Rachael K.; Brown, Charlotte M.; Carroll, Alicia R.; Gifford, Francesca M.; Gillespie, Julie Beth; Herman, Susan E.; Holtcamp, Krystal L.; Host, Misha A.; Hussey, Gabrielle; Kramer, Danielle M.; Lawrence, Joan Q.; Martin, Madeline M.; Niemiec, Ellen N.; O'Reilly, Ashleigh P.; Pahl, Olivia A.; Quintana, Guadalupe; Rettie, Elizabeth A.S.; Richardson, Torie L.; Rodriguez, Arianne E.; Rodriguez, Mona O.; Schiraldi, Laura; Smith, Joanna J.; Sugrue, Kelsey F.; Suriano, Lindsey J.; Takach, Kaitlyn E.; Vasquez, Arielle M.; Velez, Ximena; Villafuerte, Elizabeth J.; Vives, Laura T.; Zellmer, Victoria R.; Hauke, Jeanette; Hauser, Charles R.; Barker, Karolyn; Cannon, Laurie; Parsamian, Perouza; Parsons, Samantha; Wichman, Zachariah; Bazinet, Christopher W.; Johnson, Diana E.; Bangura, Abubakarr; Black, Jordan A.; Chevee, Victoria; Einsteen, Sarah A.; Hilton, Sarah K.; Kollmer, Max; Nadendla, Rahul; Stamm, Joyce; Fafara-Thompson, Antoinette E.; Gygi, Amber M.; Ogawa, Emmy E.; Van Camp, Matt; Kocsisova, Zuzana; Leatherman, Judith L.; Modahl, Cassie M.; Rubin, Michael R.; Apiz-Saab, Susana S.; Arias-Mejias, Suzette M.; Carrion-Ortiz, Carlos F.; Claudio-Vazquez, Patricia N.; Espada-Green, Debbie M.; Feliciano-Camacho, Marium; Gonzalez-Bonilla, Karina M.; Taboas-Arroyo, Mariela; Vargas-Franco, Dorianmarie; Montañez-Gonzalez, Raquel; Perez-Otero, Joseph; Rivera-Burgos, Myrielis; Rivera-Rosario, Francisco J.; Eisler, 
Heather L.; Alexander, Jackie; Begley, Samatha K.; Gabbard, Deana; Allen, Robert J.; Aung, Wint Yan; Barshop, William D.; Boozalis, Amanda; Chu, Vanessa P.; Davis, Jeremy S.; Duggal, Ryan N.; Franklin, Robert; Gavinski, Katherine; Gebreyesus, Heran; Gong, Henry Z.; Greenstein, Rachel A.; Guo, Averill D.; Hanson, Casey; Homa, Kaitlin E.; Hsu, Simon C.; Huang, Yi; Huo, Lucy; Jacobs, Sarah; Jia, Sasha; Jung, Kyle L.; Wai-Chee Kong, Sarah; Kroll, Matthew R.; Lee, Brandon M.; Lee, Paul F.; Levine, Kevin M.; Li, Amy S.; Liu, Chengyu; Liu, Max Mian; Lousararian, Adam P.; Lowery, Peter B.; Mallya, Allyson P.; Marcus, Joseph E.; Ng, Patrick C.; Nguyen, Hien P.; Patel, Ruchik; Precht, Hashini; Rastogi, Suchita; Sarezky, Jonathan M.; Schefkind, Adam; Schultz, Michael B.; Shen, Delia; Skorupa, Tara; Spies, Nicholas C.; Stancu, Gabriel; Vivian Tsang, Hiu Man; Turski, Alice L.; Venkat, Rohit; Waldman, Leah E.; Wang, Kaidi; Wang, Tracy; Wei, Jeffrey W.; Wu, Dennis Y.; Xiong, David D.; Yu, Jack; Zhou, Karen; McNeil, Gerard P.; Fernandez, Robert W.; Menzies, Patrick Gomez; Gu, Tingting; Buhler, Jeremy; Mardis, Elaine R.; Elgin, Sarah C.R.
2017-01-01
The discordance between genome size and the complexity of eukaryotes can partly be attributed to differences in repeat density. The Muller F element (∼5.2 Mb) is the smallest chromosome in Drosophila melanogaster, but it is substantially larger (>18.7 Mb) in D. ananassae. To identify the major contributors to the expansion of the F element and to assess their impact, we improved the genome sequence and annotated the genes in a 1.4-Mb region of the D. ananassae F element, and a 1.7-Mb region from the D element for comparison. We find that transposons (particularly LTR and LINE retrotransposons) are major contributors to this expansion (78.6%), while Wolbachia sequences integrated into the D. ananassae genome are minor contributors (0.02%). Both D. melanogaster and D. ananassae F-element genes exhibit distinct characteristics compared to D-element genes (e.g., larger coding spans, larger introns, more coding exons, and lower codon bias), but these differences are exaggerated in D. ananassae. Compared to D. melanogaster, the codon bias observed in D. ananassae F-element genes can primarily be attributed to mutational biases instead of selection. The 5′ ends of F-element genes in both species are enriched in dimethylation of lysine 4 on histone 3 (H3K4me2), while the coding spans are enriched in H3K9me2. Despite differences in repeat density and gene characteristics, D. ananassae F-element genes show a similar range of expression levels compared to genes in euchromatic domains. This study improves our understanding of how transposons can affect genome size and how genes can function within highly repetitive domains. PMID:28667019
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xie, Enzhong; Zhu, Lingyu; Zhao, Lingyun
1996-08-01
The complete 4775-nt cDNA encoding the human serotonin 5-HT2C receptor (5-HT2CR), a G-protein-coupled receptor, has been isolated. It contains a 1377-nt coding region flanked by a 728-nt 5′-untranslated region and a 2670-nt 3′-untranslated region. By using the cloned 5-HT2CR cDNA probe, the complete human gene for this receptor has been isolated and shown to contain six exons and five introns spanning at least 230 kb of DNA. The coding region of the human 5-HT2CR gene is interrupted by three introns, and the positions of the intron/exon junctions are conserved between the human and the rodent genes. In addition, an alternatively spliced 5-HT2CR RNA that contains a 95-nt deletion in the region coding for the second intracellular loop and the fourth transmembrane domain of the receptor has been identified. This deletion leads to a frameshift and premature termination so that the short isoform RNA encodes a putative protein of 248 amino acids. The ratio of the short isoform to the 5-HT2CR RNA was found to be higher in choroid plexus tumor than in normal brain tissue, suggesting the possibility of differential regulation of the 5-HT2CR gene in different neural tissues or during tumorigenesis. Transcription of the human 5-HT2CR gene was found to be initiated at multiple sites. No classical TATA-box sequence was found at the appropriate location, and the 5′-flanking sequence contains many potential transcription factor-binding sites. A 7.3-kb 5′-flanking 5-HT2CR DNA directed the efficient expression of a luciferase reporter gene in SK-N-SH and IMR32 neuroblastoma cells, indicating that it contains a functional promoter. 69 refs., 8 figs., 1 tab.
Erturk, Elif; Cecener, Gulsah; Polatkan, Volkan; Gokgoz, Sehsuvar; Egeli, Unal; Tunca, Berrin; Tezcan, Gulcin; Demirdogen, Elif; Ak, Secil; Tasdelen, Ismet
2014-01-01
Although genetic markers identifying women at an increased risk of developing breast cancer exist, the majority of inherited risk factors remain elusive. Mutations in the BRCA1/BRCA2 genes confer a substantial increase in breast cancer risk, yet routine clinical genetic screening is limited to the coding regions and intron-exon boundaries, precluding the identification of mutations in noncoding and untranslated regions. Because 3' untranslated region (3'UTR) polymorphisms disrupting microRNA (miRNA) binding can be functional and can act as genetic markers of cancer risk, we aimed to determine genetic variation in the 3'UTR of BRCA1/BRCA2 in familial and early-onset breast cancer patients with and without mutations in the coding regions of BRCA1/BRCA2 and to identify specific 3'UTR variants that may be risk factors for cancer development. The 3'UTRs of the BRCA1 and BRCA2 genes were screened by heteroduplex analysis and DNA sequencing in 100 patients from 46 BRCA1/2 families, 54 non-BRCA1/2 families, and 47 geographically matched controls. Two polymorphisms were identified. SNPs c.*1287C>T (rs12516) (BRCA1) and c.*105A>C (rs15869) (BRCA2) were identified in 27% and 24% of patients, respectively. These 2 variants were also identified in controls with no family history of cancer (23.4% and 23.4%, respectively). When variation in the 3'UTR of the BRCA1/2 genes was compared with the BRCA1/2 mutational status of patients, there was a statistically significant relationship between the BRCA1 gene polymorphism c.*1287C>T (rs12516) and BRCA1 mutations (p=0.035, Fisher's exact test). SNP c.*1287C>T (rs12516) of the BRCA1 gene may have potential use as a genetic marker of an increased risk of developing breast cancer and likely represents a non-coding sequence variation in BRCA1 that impacts BRCA1 function and leads to increased early-onset and/or familial breast cancer risk in the Turkish population.
Activity-Dependent Human Brain Coding/Noncoding Gene Regulatory Networks
Lipovich, Leonard; Dachet, Fabien; Cai, Juan; Bagla, Shruti; Balan, Karina; Jia, Hui; Loeb, Jeffrey A.
2012-01-01
While most gene transcription yields RNA transcripts that code for proteins, a sizable proportion of the genome generates RNA transcripts that do not code for proteins, but may have important regulatory functions. The brain-derived neurotrophic factor (BDNF) gene, a key regulator of neuronal activity, is overlapped by a primate-specific, antisense long noncoding RNA (lncRNA) called BDNFOS. We demonstrate reciprocal patterns of BDNF and BDNFOS transcription in highly active regions of human neocortex removed as a treatment for intractable seizures. A genome-wide analysis of activity-dependent coding and noncoding human transcription using a custom lncRNA microarray identified 1288 differentially expressed lncRNAs, of which 26 had expression profiles that matched activity-dependent coding genes and an additional 8 were adjacent to or overlapping with differentially expressed protein-coding genes. The functions of most of these protein-coding partner genes, such as ARC, include long-term potentiation, synaptic activity, and memory. The nuclear lncRNAs NEAT1, MALAT1, and RPPH1, composing an RNAse P-dependent lncRNA-maturation pathway, were also upregulated. As a means to replicate human neuronal activity, repeated depolarization of SY5Y cells resulted in sustained CREB activation and produced an inverse pattern of BDNF-BDNFOS co-expression that was not achieved with a single depolarization. RNAi-mediated knockdown of BDNFOS in human SY5Y cells increased BDNF expression, suggesting that BDNFOS directly downregulates BDNF. Temporal expression patterns of other lncRNA-messenger RNA pairs validated the effect of chronic neuronal activity on the transcriptome and implied various lncRNA regulatory mechanisms. lncRNAs, some of which are unique to primates, thus appear to have potentially important regulatory roles in activity-dependent human brain plasticity. PMID:22960213
The complete mitochondrial genome of Octopus bimaculatus Verrill, 1883 from the Gulf of California.
Domínguez-Contreras, José Francisco; Munguia-Vega, Adrian; Ceballos-Vázquez, Bertha Patricia; García-Rodriguez, Francisco Javier; Arellano-Martinez, Marcial
2016-11-01
The complete mitochondrial genome of Octopus bimaculatus is 16,085 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a control region. The base composition of the genome is A (40.9%), T (34.7%), C (16.9%), and G (7.5%). The control region of O. bimaculatus contains a VNTR locus not present in the genomes of other octopus species. A phylogenetic analysis shows a closer relationship between the mitogenomes of O. bimaculatus and O. vulgaris.
Many human accelerated regions are developmental enhancers
Capra, John A.; Erwin, Genevieve D.; McKinsey, Gabriel; Rubenstein, John L. R.; Pollard, Katherine S.
2013-01-01
The genetic changes underlying the dramatic differences in form and function between humans and other primates are largely unknown, although it is clear that gene regulatory changes play an important role. To identify regulatory sequences with potentially human-specific functions, we and others used comparative genomics to find non-coding regions conserved across mammals that have acquired many sequence changes in humans since divergence from chimpanzees. These regions are good candidates for performing human-specific regulatory functions. Here, we analysed the DNA sequence, evolutionary history, histone modifications, chromatin state and transcription factor (TF) binding sites of a combined set of 2649 non-coding human accelerated regions (ncHARs) and predicted that at least 30% of them function as developmental enhancers. We prioritized the predicted ncHAR enhancers using analysis of TF binding site gain and loss, along with the functional annotations and expression patterns of nearby genes. We then tested both the human and chimpanzee sequence for 29 ncHARs in transgenic mice, and found 24 novel developmental enhancers active in both species, 17 of which had very consistent patterns of activity in specific embryonic tissues. Of these ncHAR enhancers, five drove expression patterns suggestive of different activity for the human and chimpanzee sequence at embryonic day 11.5. The changes to human non-coding DNA in these ncHAR enhancers may modify the complex patterns of gene expression necessary for proper development in a human-specific manner and are thus promising candidates for understanding the genetic basis of human-specific biology. PMID:24218637
Sequence and analysis of chromosome 4 of the plant Arabidopsis thaliana.
Mayer, K; Schüller, C; Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansorge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Boutry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiaens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Ramsperger, U; Hilbert, H; Braun, M; Holzer, E; Brandt, A; Peters, S; van Staveren, M; Dirske, W; Mooijman, P; Klein Lankhorst, R; Rose, M; Hauf, J; Kötter, P; Berneiser, S; Hempel, S; Feldpausch, M; Lamberth, S; Van den Daele, H; De Keyser, A; Buysshaert, C; Gielen, J; Villarroel, R; De Clercq, R; Van Montagu, M; Rogers, J; Cronin, A; Quail, M; Bray-Allen, S; Clark, L; Doggett, J; Hall, S; Kay, M; Lennard, N; McLay, K; Mayes, R; Pettett, A; Rajandream, M A; Lyne, M; Benes, V; Rechmann, S; Borkova, D; Blöcker, H; Scharfe, M; Grimm, M; Löhnert, T H; Dose, S; de Haan, M; Maarse, A; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Fartmann, B; Granderath, K; Dauner, D; Herzl, A; Neumann, S; Argiriou, A; Vitale, D; Liguori, R; Piravandi, E; Massenet, O; Quigley, F; Clabauld, G; Mündlein, A; Felber, R; Schnabl, S; Hiller, R; Schmidt, W; Lecharny, A; Aubourg, S; Chefdor, F; Cooke, R; Berger, C; Montfort, A; Casacuberta, E; Gibbons, T; Weber, N; Vandenbol, M; Bargues, M; Terol, J; Torres, A; Perez-Perez, A; Purnelle, B; Bent, E; Johnson, S; Tacon, D; Jesse, T; Heijnen, L; Schwarz, S; Scholler, P; Heber, S; Francs, P; Bielke, C; Frishman, D; Haase, D; Lemcke, K; Mewes, H W; Stocker, S; Zaccaria, P; Bevan, M; Wilson, R K; de la Bastide, M; Habermann, K; Parnell, L; Dedhia, N; Gnoj, L; Schutz, K; Huang, E; Spiegel, L; Sehkon, M; 
Murray, J; Sheet, P; Cordes, M; Abu-Threideh, J; Stoneking, T; Kalicki, J; Graves, T; Harmon, G; Edwards, J; Latreille, P; Courtney, L; Cloud, J; Abbott, A; Scott, K; Johnson, D; Minx, P; Bentley, D; Fulton, B; Miller, N; Greco, T; Kemp, K; Kramer, J; Fulton, L; Mardis, E; Dante, M; Pepin, K; Hillier, L; Nelson, J; Spieth, J; Ryan, E; Andrews, S; Geisel, C; Layman, D; Du, H; Ali, J; Berghoff, A; Jones, K; Drone, K; Cotton, M; Joshu, C; Antonoiu, B; Zidanic, M; Strong, C; Sun, H; Lamar, B; Yordan, C; Ma, P; Zhong, J; Preston, R; Vil, D; Shekher, M; Matero, A; Shah, R; Swaby, I K; O'Shaughnessy, A; Rodriguez, M; Hoffmann, J; Till, S; Granat, S; Shohdy, N; Hasegawa, A; Hameed, A; Lodhi, M; Johnson, A; Chen, E; Marra, M; Martienssen, R; McCombie, W R
1999-12-16
The higher plant Arabidopsis thaliana (Arabidopsis) is an important model for identifying plant genes and determining their function. To assist biological investigations and to define chromosome structure, a coordinated effort to sequence the Arabidopsis genome was initiated in late 1996. Here we report one of the first milestones of this project, the sequence of chromosome 4. Analysis of 17.38 megabases of unique sequence, representing about 17% of the genome, reveals 3,744 protein coding genes, 81 transfer RNAs and numerous repeat elements. Heterochromatic regions surrounding the putative centromere, which has not yet been completely sequenced, are characterized by an increased frequency of a variety of repeats, new repeats, reduced recombination, lowered gene density and lowered gene expression. Roughly 60% of the predicted protein-coding genes have been functionally characterized on the basis of their homology to known genes. Many genes encode predicted proteins that are homologous to human and Caenorhabditis elegans proteins.
Complete mitochondrial DNA sequence of the Eastern keelback mullet Liza affinis.
Gong, Xiaoling; Zhu, Wenjia; Bao, Baolong
2016-05-01
The Eastern keelback mullet (Liza affinis) inhabits inlet waters and river estuaries. In this paper, we determined the complete mitochondrial genome of Liza affinis. The entire mtDNA sequence is 16,831 bp in length, including 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes and 1 putative control region. Its gene order and gene numbers are similar to those of most bony fishes.
Spielmann, A; Stutz, E
1983-10-25
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. In its structural part, the psb A gene shows 92% sequence homology with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical with that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced including a small segment of the adjacent IR1 and IR2.
Burgos, Mariana; Arenas, Alvaro; Cabrera, Rodrigo
2016-08-01
Inherited long QT syndrome (LQTS) is a cardiac channelopathy characterized by a prolongation of QT interval and the risk of syncope, cardiac arrest, and sudden cardiac death. Genetic diagnosis of LQTS is critical in medical practice as results can guide adequate management of patients and distinguish phenocopies such as catecholaminergic polymorphic ventricular tachycardia (CPVT). However, extensive screening of large genomic regions is required in order to reliably identify genetic causes. Semiconductor whole exome sequencing (WES) is a promising approach for the identification of variants in the coding regions of most human genes. DNA samples from 21 Colombian patients clinically diagnosed with LQTS were enriched for coding regions using multiplex polymerase chain reaction (PCR) and subjected to WES using a semiconductor sequencer. Semiconductor WES showed mean coverage of 93.6 % for all coding regions relevant to LQTS at >10× depth with high intra- and inter-assay depth heterogeneity. Fifteen variants were detected in 12 patients in genes associated with LQTS. Three variants were identified in three patients in genes associated with CPVT. Co-segregation analysis was performed when possible. All variants were analyzed with two pathogenicity prediction algorithms. The overall prevalence of LQTS and CPVT variants in our cohort was 71.4 %. All LQTS variants previously identified through commercial genetic testing were identified. Standardized WES assays can be easily implemented, often at a lower cost than sequencing panels. Our results show that WES can identify LQTS-causing mutations and permits differential diagnosis of related conditions in a real-world clinical setting. However, high heterogeneity in sequencing depth and low coverage in the most relevant genes is expected to be associated with reduced analytical sensitivity.
2013-01-01
Background: Vitis vinifera L. is one of society’s most important agricultural crops with a broad genetic variability. The difficulty in recognizing grapevine genotypes based on ampelographic traits and secondary metabolites prompted the development of molecular markers suitable for achieving variety genetic identification. Findings: Here, we propose a comparison between a multi-locus barcoding approach based on six chloroplast markers and a single-copy nuclear gene sequencing method using five coding regions combined with a character-based system with the aim of reconstructing cultivar-specific haplotypes and genotypes to be exploited for the molecular characterization of 157 V. vinifera accessions. The analysis of the chloroplast target regions proved the inadequacy of the DNA barcoding approach at the subspecies level, and hence further DNA genotyping analyses were targeted on the sequences of five nuclear single-copy genes amplified across all of the accessions. The sequencing of the coding region of the UFGT nuclear gene (UDP-glucose: flavonoid 3-O-glucosyltransferase, the key enzyme for the accumulation of anthocyanins in berry skins) enabled the discovery of discriminant SNPs (1/34 bp) and the reconstruction of 130 distinct V. vinifera genotypes. Most of the genotypes proved to be cultivar-specific, and only a few genotypes were shared by several, albeit closely related, cultivars. Conclusion: On the whole, this technique was successful for inferring SNP-based genotypes of grapevine accessions suitable for assessing the genetic identity and ancestry of international cultivars and also useful for corroborating some hypotheses regarding the origin of local varieties, suggesting several issues of misidentification (synonymy/homonymy). PMID:24298902
The complete mitochondrial genome of Bagarius yarrelli from the Honghe River
NASA Astrophysics Data System (ADS)
Du, M.; Zhou, C. J.; Niu, B. Z.; Liu, Y. H.; Li, N.; Ai, J. L.; Xu, G. L.
2016-08-01
The complete mitochondrial DNA sequence of Bagarius yarrelli from the Honghe River of China is determined in this paper. The circular molecule is 16,524 bp in length and has a gene order similar to that of other bony fishes, comprising a non-coding control region, an origin of replication, two ribosomal RNA (rRNA) genes, 22 transfer RNA (tRNA) genes, and 13 protein-coding genes. Its overall base composition is 31.4% A, 26.9% C, 15.7% G, and 26.0% T, with an A+T content of 57.4%. These mitochondrial data should contribute to further studies of the molecular evolution and population genetics of this species.
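The reported base composition can be sanity-checked with simple arithmetic. The following is a minimal Python sketch using only the four percentages given in the abstract; the function name `at_content` and the 0.5-point rounding tolerance are illustrative assumptions, not part of the paper.

```python
def at_content(composition):
    """Return the A+T percentage from a mapping of base -> percentage."""
    total = sum(composition.values())
    # Percentages rounded to one decimal place should still sum to roughly 100.
    # The 0.5-point tolerance is an assumption chosen for this sketch.
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"composition sums to {total:.1f}%, expected about 100%")
    return round(composition["A"] + composition["T"], 1)


# Percentages as reported in the abstract for the Bagarius yarrelli mitogenome.
bagarius = {"A": 31.4, "C": 26.9, "G": 15.7, "T": 26.0}
print(at_content(bagarius))
```

The same helper can be applied to any of the other mitogenome records in this listing that report per-base percentages.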
RNA-Seq Based Transcriptional Map of Bovine Respiratory Disease Pathogen “Histophilus somni 2336”
Kumar, Ranjit; Lawrence, Mark L.; Watt, James; Cooksey, Amanda M.; Burgess, Shane C.; Nanduri, Bindu
2012-01-01
Genome structural annotation, i.e., identification and demarcation of the boundaries for all the functional elements in a genome (e.g., genes, non-coding RNAs, proteins and regulatory elements), is a prerequisite for systems level analysis. Current genome annotation programs do not identify all of the functional elements of the genome, especially small non-coding RNAs (sRNAs). Whole genome transcriptome analysis is a complementary method to identify “novel” genes, small RNAs, regulatory regions, and operon structures, thus improving the structural annotation in bacteria. In particular, the identification of non-coding RNAs has revealed their widespread occurrence and functional importance in gene regulation, stress and virulence. However, very little is known about non-coding transcripts in Histophilus somni, one of the causative agents of Bovine Respiratory Disease (BRD) as well as bovine infertility, abortion, septicemia, arthritis, myocarditis, and thrombotic meningoencephalitis. In this study, we report a single nucleotide resolution transcriptome map of H. somni strain 2336 using the RNA-Seq method. The RNA-Seq based transcriptome map identified 94 sRNAs in the H. somni genome, of which 82 sRNAs were never predicted or reported in earlier studies. We also identified 38 novel potential protein coding open reading frames that were absent in the current genome annotation. The transcriptome map allowed the identification of 278 operon structures (730 genes in total) in the genome. When compared with the genome sequence of a non-virulent strain 129Pt, a disproportionate number of sRNAs (∼30%) were located in genomic regions unique to strain 2336 (∼18% of the total genome). This observation suggests that a number of the newly identified sRNAs in strain 2336 may be involved in strain-specific adaptations. PMID:22276113
The complete chloroplast genome of salt cress (Eutrema salsugineum).
Guo, Xinyi; Hao, Guoqian; Ma, Tao
2016-07-01
The complete chloroplast (cp) sequence of the salt cress (Eutrema salsugineum), a plant well-adapted to salt stress, is presented in this study. The circular molecule is 153,407 bp in length and exhibits a typical quadripartite structure containing an 83,894 bp large single copy (LSC) region, a 17,607 bp small single copy (SSC) region, and two 25,953 bp inverted repeats (IRs). The salt cress cp genome contains 135 known genes, including 87 protein-coding genes, 8 ribosomal RNA genes, and 40 tRNA genes; 21 of these are located in the inverted repeat region. As expected, phylogenetic analysis supports the idea that E. salsugineum is sister to Brassiceae species within the Brassicaceae family.
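In a quadripartite chloroplast genome, the total length should equal LSC + SSC + 2 × IR, and the gene categories should add up to the reported gene count. The Python sketch below checks both relations using only the figures quoted in the abstract; the function name `check_quadripartite` is a hypothetical helper introduced for illustration.

```python
def check_quadripartite(total, lsc, ssc, ir):
    """Return True if LSC + SSC + two IR copies equals the reported total length."""
    return lsc + ssc + 2 * ir == total


# Region lengths (bp) as reported for the Eutrema salsugineum cp genome.
length_ok = check_quadripartite(total=153_407, lsc=83_894, ssc=17_607, ir=25_953)

# Gene categories as reported: protein-coding + rRNA + tRNA versus 135 known genes.
genes_ok = (87 + 8 + 40) == 135

print(length_ok, genes_ok)
```

A mismatch in either check would usually point to a transcription error in one of the reported figures rather than to a biological anomaly.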
Bäumlein, H; Wobus, U; Pustell, J; Kafatos, F C
1986-01-01
The field bean, Vicia faba L. var. minor, possesses two sub-families of 11 S legumin genes named A and B. We isolated from a genomic library a B-type gene (LeB4) and determined its primary DNA sequence. Gene LeB4 codes for a 484 amino acid residue prepropolypeptide, encompassing a signal peptide of 22 amino acid residues, an acidic, very hydrophilic alpha-chain of 281 residues and a basic, somewhat hydrophobic beta-chain of 181 residues. The latter two coding regions are immediately contiguous, but each is interrupted by a short intron. Type A legumin genes from soybean and pea are known to have introns in the same two positions, in addition to an extra intron (within the alpha-coding sequence). Sequence comparisons of legumin genes from these three plants revealed a highly conserved sequence element of at least 28 bp, centered at approximately 100 bp upstream of each cap site. The element is absent from the equivalent position of all non-legumin and other plant and fungal genes examined. We tentatively name this element "legumin box" and suggest that it may have a function in the regulation of legumin gene expression. PMID:3960730
Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M
2014-06-01
It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. 
The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to contribute to the alcohol-related phenotypic differences associated with these strains.
Enhancer elements upstream of the SHOX gene are active in the developing limb
Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun
2010-01-01
Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD. PMID:19997128
Novotny, Peter; Tang, Xiaojia; Kalari, Krishna R.; Gorodkin, Jan
2014-01-01
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. 
These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways. PMID:24416147
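The enrichment comparison reported in this abstract (188 of 803 genes versus 1347 of 6462 genes) can be illustrated with a one-sided Fisher's exact test. The following is a minimal sketch, not the authors' pipeline: the 2x2 table layout (genes carrying disruptive SNVs versus the remaining genes in the data set) and the function name `fisher_one_sided` are assumptions made for illustration, and the p-value is computed from the hypergeometric upper tail using only the Python standard library.

```python
from math import comb


def fisher_one_sided(hits_in_sample, sample_size, hits_total, population):
    """Upper-tail hypergeometric probability P(X >= hits_in_sample).

    This corresponds to a one-sided Fisher's exact test on the 2x2 table
    (sample vs. rest of population) x (hit vs. non-hit).
    """
    non_hits = population - hits_total
    upper = min(sample_size, hits_total)
    # Exact integer arithmetic; only the final ratio becomes a float.
    tail = sum(
        comb(hits_total, k) * comb(non_hits, sample_size - k)
        for k in range(hits_in_sample, upper + 1)
    )
    return tail / comb(population, sample_size)


# Counts taken from the abstract; the table construction is an assumption.
disruptive_genes, disruptive_cancer = 803, 188
all_genes, all_cancer = 6462, 1347

pct_disruptive = 100 * disruptive_cancer / disruptive_genes
pct_background = 100 * all_cancer / all_genes
p_value = fisher_one_sided(
    disruptive_cancer, disruptive_genes, all_cancer, all_genes
)

print(f"{pct_disruptive:.1f}% vs {pct_background:.1f}%, one-sided p = {p_value:.3f}")
```

A different choice of background set (for example, comparing against all 6462 genes rather than the remainder) would change the table and therefore the p-value, so the result should be read only as an illustration of the test.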
Sabarinathan, Radhakrishnan; Wenzel, Anne; Novotny, Peter; Tang, Xiaojia; Kalari, Krishna R; Gorodkin, Jan
2014-01-01
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. 
These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.
Schwizer, Sarah; Tasara, Taurai; Zurfluh, Katrin; Stephan, Roger; Lehner, Angelika
2013-02-15
Cronobacter spp. are opportunistic pathogens that can cause septicemia and infections of the central nervous system, primarily in premature, low-birth-weight and/or immune-compromised neonates. Serum resistance is a crucial virulence factor for the development of systemic infections, including bacteremia. The aim of the current study was to identify genes involved in serum tolerance in a selected Cronobacter sakazakii strain of clinical origin. Screening of 2749 random transposon knock out mutants of a C. sakazakii ES5 library for modified serum tolerance (compared to wild type) revealed 10 mutants showing significantly increased or reduced resistance to serum killing. Identification of the affected sites in mutants displaying reduced serum resistance revealed genes encoding surface and membrane proteins as well as regulatory elements or chaperones. By this approach, the involvement of a previously undescribed coding region containing a Wzy_C superfamily domain in serum tolerance was observed and experimentally confirmed. Additionally, knock out mutants with enhanced serum tolerance were observed. Examination of the respective transposon insertion loci revealed regulatory (repressor) elements, coding regions for chaperones and efflux systems as well as the coding region for the protein YbaJ. Real time expression analysis experiments revealed that knock out of the gene for this protein negatively affects the expression of the fimA gene, whose product is a key structural component in the formation of fimbriae. Fimbriae are structures of high immunogenic potential and it is likely that absence/truncation of the ybaJ gene resulted in a non-fimbriated phenotype accounting for the enhanced survival of this mutant in human serum. By using a transposon knock out approach we were able to identify genes involved in both increased and reduced serum tolerance in Cronobacter sakazakii ES5. This study provides first insights into the complex nature of serum tolerance of Cronobacter spp.
Chen, Zhi-Teng; Wu, Hai-Yan; Du, Yu-Zhou
2016-07-01
We report the nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae), which is a circular molecule of 15,416 bp in length and consists of 13 protein-coding genes, 2 ribosomal RNAs, 20 transfer RNAs and a partial control region (645 bp). Using the 13 protein-coding genes of 8 stoneflies and 3 other related species, we constructed a phylogenetic tree to verify the accuracy of the newly determined mitogenome sequences. Our results provide basic data for further study of phylogeny in Plecoptera.
Bustamante, Carlos; Ovenden, Jennifer R
2016-01-01
The silver gemfish Rexea solandri is an important economic resource but is vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next-generation sequencing. The total length of the mitogenome is 16,350 bp, comprising 2 rRNA genes, 13 protein-coding genes, 22 tRNA genes and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of GenBank. Gene order was equivalent to that found in marine fishes.
Lineage-Specific Biology Revealed by a Finished Genome Assembly of the Mouse
Hillier, LaDeana W.; Zody, Michael C.; Goldstein, Steve; She, Xinwe; Bult, Carol J.; Agarwala, Richa; Cherry, Joshua L.; DiCuccio, Michael; Hlavina, Wratko; Kapustin, Yuri; Meric, Peter; Maglott, Donna; Birtle, Zoë; Marques, Ana C.; Graves, Tina; Zhou, Shiguo; Teague, Brian; Potamousis, Konstantinos; Churas, Christopher; Place, Michael; Herschleb, Jill; Runnheim, Ron; Forrest, Daniel; Amos-Landgraf, James; Schwartz, David C.; Cheng, Ze; Lindblad-Toh, Kerstin; Eichler, Evan E.; Ponting, Chris P.
2009-01-01
The mouse (Mus musculus) is the premier animal model for understanding human disease and development. Here we show that a comprehensive understanding of mouse biology is only possible with the availability of a finished, high-quality genome assembly. The finished clone-based assembly of the mouse strain C57BL/6J reported here has over 175,000 fewer gaps and over 139 Mb more of novel sequence, compared with the earlier MGSCv3 draft genome assembly. In a comprehensive analysis of this revised genome sequence, we are now able to define 20,210 protein-coding genes, over a thousand more than predicted in the human genome (19,042 genes). In addition, we identified 439 long, non–protein-coding RNAs with evidence for transcribed orthologs in human. We analyzed the complex and repetitive landscape of 267 Mb of sequence that was missing or misassembled in the previously published assembly, and we provide insights into the reasons for its resistance to sequencing and assembly by whole-genome shotgun approaches. Duplicated regions within newly assembled sequence tend to be of more recent ancestry than duplicates in the published draft, correcting our initial understanding of recent evolution on the mouse lineage. These duplicates appear to be largely composed of sequence regions containing transposable elements and duplicated protein-coding genes; of these, some may be fixed in the mouse population, but at least 40% of segmentally duplicated sequences are copy number variable even among laboratory mouse strains. Mouse lineage-specific regions contain 3,767 genes drawn mainly from rapidly-changing gene families associated with reproductive functions. The finished mouse genome assembly, therefore, greatly improves our understanding of rodent-specific biology and allows the delineation of ancestral biological functions that are shared with human from derived functions that are not. PMID:19468303
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
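The quadripartite structure described above implies a simple length identity: the large single-copy region, the small single-copy region and two copies of the inverted repeat should sum to the genome length. The sketch below checks this with the figures from the abstract; the function name `quadripartite_length` is hypothetical and used only for illustration.

```python
def quadripartite_length(lsc_bp, ssc_bp, ir_bp):
    """Total length of a plastome with one LSC, one SSC and two IR copies."""
    return lsc_bp + ssc_bp + 2 * ir_bp


# Region sizes as reported for the Salvia miltiorrhiza cp genome.
total = quadripartite_length(lsc_bp=82_695, ssc_bp=17_555, ir_bp=25_539)
print(total)  # compare against the reported genome length of 151,328 bp
```

The same check can be applied to any reported plastome as a quick consistency test of the region sizes.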
Jeukens, Julie; Bernatchez, Louis
2012-01-01
While gene expression divergence is known to be involved in adaptive phenotypic divergence and speciation, the relative importance of regulatory and structural evolution of genes is poorly understood. A recent next-generation sequencing experiment identified candidate genes potentially involved in the ongoing speciation of sympatric dwarf and normal lake whitefish (Coregonus clupeaformis), such as cytosolic malate dehydrogenase (MDH1), which showed both significant expression and sequence divergence. The main goal of this study was to investigate in more detail the signatures of natural selection in the regulatory and coding sequences of MDH1 in lake whitefish and to test for parallelism of these signatures with other coregonine species. Sequencing of the two regions in 118 fish from four sympatric pairs of whitefish and two cisco species revealed a total of 35 single nucleotide polymorphisms (SNPs), with more genetic diversity in European compared to North American coregonine species. While the coding region was found to be under purifying selection, an SNP in the proximal promoter exhibited significant allele frequency divergence in a parallel manner among independent sympatric pairs of North American lake whitefish and European whitefish (C. lavaretus). According to transcription factor binding simulation for 22 regulatory haplotypes of MDH1, putative binding profiles were fairly conserved among species, except for the region around this SNP. Moreover, we found evidence for the role of this SNP in the regulation of MDH1 expression level. Overall, these results provide further evidence for the role of natural selection in gene regulation evolution among whitefish species pairs and suggest its possible link with patterns of phenotypic diversity observed in coregonine species. PMID:22408741
López-Wilchis, Ricardo; Del Río-Portilla, Miguel Ángel; Guevara-Chumacero, Luis Manuel
2017-02-01
We describe the complete mitochondrial genome (mitogenome) of Wagner's mustached bat, Pteronotus personatus, a species belonging to the family Mormoopidae, and compare it with other published mitogenomes of bats (Chiroptera). The mitogenome of P. personatus was 16,570 bp long and contained a typically conserved structure including 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and one control region (D-loop). Most of the genes were encoded on the H-strand, except for eight tRNA genes and the ND6 gene. The order of protein-coding and rRNA genes was highly conserved in all mitogenomes. All protein-coding genes started with an ATG codon, except for ND2, ND3, and ND5, which initiated with ATA, and terminated with the typical stop codon TAA/TAG or the codon AGA. Phylogenetic trees constructed using Maximum Parsimony, Maximum Likelihood, and Bayesian inference methods showed an identical topology and indicated the monophyly of different families of bats (Mormoopidae, Phyllostomidae, Vespertilionidae, Rhinolophidae, and Pteropodidae) and the existence of two major clades corresponding to the suborders Yangochiroptera and Yinpterochiroptera. The mitogenome sequence provided here will be useful for further phylogenetic analyses and population genetic studies in mormoopid bats.
Genomic structure of two ras family genes in the slime mold Physarum polycephalum.
Trzcińska-Danielewicz, Joanna; Kozlowski, Piotr; Gierdal, Katarzyna; Wiejak, Jolanta; Jagielski, Adam; Toczko, Kazimierz; Fronk, Jan
2002-08-01
Genomic structure of two Physarum polycephalum ras family genes, Ppras2 and Pprap1, has been determined, including the upstream region of the latter. The genes are interrupted by three and four introns, respectively. The first intron of Ppras2 has the same location within the coding sequence as the first intron in another ras homolog from this organism, Ppras1 [Trzcińska-Danielewicz, J., Kozlowski, P., and Toczko, K. (1996). "Cloning and genomic sequence of the Physarum polycephalum Ppras1 gene, a homologue of the ras protooncogene", Gene 169, pp. 143-144]. All introns, ranging from 53 to ca. 460 base pairs, have the canonical 5' and 3' ends, are greatly enriched in pyrimidines in the coding strand and have frequent pyrimidines-only tracts. These latter features seem to be responsible for the difficulties in cloning and sequencing of parts of these genes. Short sequences shared with P. polycephalum transposon-like repeats are common in the introns, indicating a possible role of transposition in intron evolution. In all three ras family genes phase zero introns are located mostly between sequences coding for regular protein secondary structure elements.
The complete chloroplast genome sequence of Chikusichloa aquatica (Poaceae: Oryzeae).
Zhang, Jie; Zhang, Dan; Shi, Chao; Gao, Ju; Gao, Li-Zhi
2016-07-01
The complete chloroplast genome sequence of Chikusichloa aquatica was determined in this study. The genome consists of 136 563 bp containing a pair of inverted repeats (IRs) of 20 837 bp, which are separated by a large single-copy region and a small single-copy region of 82 315 bp and 33 411 bp, respectively. The C. aquatica cp genome encodes 111 functional genes (71 protein-coding genes, four rRNA genes, and 36 tRNA genes): 92 are unique, while 19 are duplicated in the IR regions. The genic regions account for 58.9% of the whole cp genome, and the GC content of the plastome is 39.0%. A phylogenomic analysis showed that C. aquatica is closely related to Rhynchoryza subulata, which belongs to the tribe Oryzeae.
René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y
2000-01-04
In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10 kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, with the most downstream site used most frequently. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However, comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA in fact corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.
Alpert, Carl-Alfred; Crutz-Le Coq, Anne-Marie; Malleret, Christine; Zagorec, Monique
2003-01-01
The complete nucleotide sequence of the 13-kb plasmid pRV500, isolated from Lactobacillus sakei RV332, was determined. Sequence analysis enabled the identification of genes coding for a putative type I restriction-modification system, two genes coding for putative recombinases of the integrase family, and a region likely involved in replication. The structural features of this region, comprising a putative ori segment containing 11- and 22-bp repeats and a repA gene coding for a putative initiator protein, indicated that pRV500 belongs to the pUCL287 subfamily of theta-type replicons. A 3.7-kb fragment encompassing this region was fused to an Escherichia coli replicon to produce the shuttle vector pRV566 and was observed to be functional in L. sakei for plasmid replication. The L. sakei replicon alone could not support replication in E. coli. Plasmid pRV500 and its derivative pRV566 were determined to be at very low copy numbers in L. sakei. pRV566 was maintained at a reasonable rate over 20 generations in several lactobacilli, such as Lactobacillus curvatus, Lactobacillus casei, and Lactobacillus plantarum, in addition to L. sakei, making it an interesting basis for developing vectors. Sequence relationships with other plasmids are described and discussed. PMID:12957947
Nakamura, Mikiko; Suzuki, Ayako; Akada, Junko; Tomiyoshi, Keisuke; Hoshida, Hisashi; Akada, Rinji
2015-12-01
Mammalian gene expression constructs are generally prepared in a plasmid vector, in which a promoter and terminator are located upstream and downstream of a protein-coding sequence, respectively. In this study, we found that front terminator constructs (DNA constructs containing a terminator upstream of a promoter rather than downstream of a coding region) could sufficiently express proteins as a result of end joining of the introduced DNA fragment. By taking advantage of front terminator constructs, FLAG substitutions and deletions were generated using mutagenesis primers to identify amino acids specifically recognized by commercial FLAG antibodies. A minimal epitope sequence for polyclonal FLAG antibody recognition was also identified. In addition, we analyzed the sequence of a C-terminal Ser-Lys-Leu peroxisome localization signal, and identified the key residues necessary for peroxisome targeting. Moreover, front terminator constructs of hepatitis B surface antigen were used for deletion analysis, leading to the identification of regions required for particle formation. Collectively, these results indicate that front terminator constructs allow for easy manipulation of C-terminal protein-coding sequences, and suggest that direct gene expression with PCR-amplified DNA is useful for high-throughput protein analysis in mammalian cells.
2011-01-01
Background The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional unscaffolded contigs. A single scaffold contains 84% of the mitochondrial genome. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively.
Conclusions Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non-conserved structure both in gene number and organisation, as well as in the features of the noncoding DNA. The transfer of nuclear DNA to the melon mitochondrial genome and the high proportion of repetitive DNA appear to explain the size of the largest mitochondrial genome reported so far. PMID:21854637
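The proportions quoted in this abstract follow from a few lines of arithmetic. The sketch below recomputes them from the reported figures only; the variable names are illustrative, and treating 2.74 Mb as exactly 2,740,000 bp is a rounding assumption.

```python
# Figures as reported in the abstract (2.74 Mb rounded to 2,740,000 bp).
assembled_bp = 2_740_000
coding_bp = 45_926
assembled_fraction = 0.95  # assembled sequence as a share of the estimated genome

# Share of the assembled sequence occupied by the gene-coding region.
coding_pct = 100 * coding_bp / assembled_bp

# Estimated full mitochondrial genome size implied by the 95% figure.
estimated_total_mb = assembled_bp / assembled_fraction / 1e6

# Size of the melon mitogenome relative to the other two cucurbits (in kb).
sizes_kb = {"Citrullus lanatus": 379, "Cucurbita pepo": 983}
fold_larger = {name: 2_740 / kb for name, kb in sizes_kb.items()}

print(f"coding: {coding_pct:.1f}%  estimated total: {estimated_total_mb:.2f} Mb")
for name, fold in fold_larger.items():
    print(f"melon vs {name}: {fold:.1f}x")
```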
New progress in snake mitochondrial gene rearrangement.
Chen, Nian; Zhao, Shujin
2009-08-01
To further understand the evolution of snake mitochondrial genomes, the complete mitochondrial DNA (mtDNA) sequences were determined for representative species from two snake families: the Many-banded krait, the Banded krait, the Chinese cobra, the King cobra, the Hundred-pace viper, the Short-tailed mamushi, and the Chain viper. Thirteen protein-coding genes, 22-23 tRNA genes, 2 rRNA genes, and 2 control regions were identified in these mtDNAs. Duplication of the control region and translocation of the tRNAPro gene were two notable features of the snake mtDNAs. These results from the gene rearrangement comparisons confirm the correctness of traditional classification schemes and validate the utility of comparing complete mtDNA sequences for snake phylogeny reconstruction.
Gottlieb, Assaf; Daneshjou, Roxana; DeGorter, Marianne; Bourgeois, Stephane; Svensson, Peter J; Wadelius, Mia; Deloukas, Panos; Montgomery, Stephen B; Altman, Russ B
2017-11-24
Genome-wide association studies are useful for discovering genotype-phenotype associations but are limited because they require large cohorts to identify a signal, which can be population-specific. Mapping genetic variation to genes improves power and allows the effects of both protein-coding variation and variation in expression to be combined into "gene level" effects. Previous work has shown that warfarin dose can be predicted using information from genetic variation that affects protein-coding regions. Here, we introduce a method that improves dose prediction by integrating tissue-specific gene expression. In particular, we use drug pathways and expression quantitative trait loci knowledge to impute gene expression, on the assumption that differential expression of key pathway genes may impact dose requirement. We focus on 116 genes from the pharmacokinetic and pharmacodynamic pathways of warfarin within training and validation sets comprising both European and African-descent individuals. We build gene-tissue signatures associated with warfarin dose in a cohort-specific manner and identify a signature of 11 gene-tissue pairs that significantly augments the International Warfarin Pharmacogenetics Consortium dosage-prediction algorithm in both populations. Our results demonstrate that imputed expression can improve dose prediction and bridge population-specific compositions. MATLAB code is available at https://github.com/assafgo/warfarin-cohort.
JCoDA: a tool for detecting evolutionary selection.
Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir
2010-05-27
The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
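The dN/dS ratio and the sliding-window view mentioned in this abstract can be illustrated in a few lines. The sketch below is not JCoDA or PAML code and does not perform maximum likelihood estimation: the function names, the per-window dN and dS values and the window size are all hypothetical, chosen only to show how the ratio is conventionally read (below 1 suggesting purifying selection, above 1 suggesting positive selection).

```python
def omega(dn, ds):
    """Return the dN/dS ratio, or None when dS is zero (ratio undefined)."""
    return None if ds == 0 else dn / ds


def classify(w, tol=1e-9):
    """Conventional reading of a dN/dS value."""
    if w is None:
        return "undefined"
    if w > 1 + tol:
        return "positive"
    if w < 1 - tol:
        return "purifying"
    return "neutral"


def sliding_mean(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical (dN, dS) estimates along a gene, for illustration only.
estimates = [(0.02, 0.40), (0.05, 0.35), (0.60, 0.30), (0.75, 0.25), (0.03, 0.50)]
ratios = [omega(dn, ds) for dn, ds in estimates]

# Smooth the ratios with a window of 2 and label each window.
for start, w in enumerate(sliding_mean(ratios, window=2)):
    print(f"windows {start}-{start + 1}: omega = {w:.2f} ({classify(w)})")
```

Averaging ratios across windows is a simplification for display; a real analysis would estimate dN and dS per window from the alignment itself.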
JCoDA: a tool for detecting evolutionary selection
2010-01-01
Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
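The sliding-window view of dN/dS values described above can be illustrated with a minimal sketch. It assumes that per-codon dN/dS values are already available as a plain list (for example, read from a CSV export); the function name `sliding_window_mean`, the window size, and the sample values are hypothetical, and this is not JCoDA's own code.

```python
def sliding_window_mean(values, window=3, step=1):
    """Return (start_index, mean) pairs for every full window over values."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    means = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        means.append((start, sum(chunk) / window))
    return means


# Hypothetical per-codon dN/dS values, for illustration only.
omega = [0.1, 0.2, 0.3, 1.5, 1.8, 0.4]
for start, mean in sliding_window_mean(omega, window=3):
    # A window mean above 1 is commonly read as a hint of positive selection.
    note = "dN/dS > 1" if mean > 1 else ""
    print(start, round(mean, 3), note)
```

A wider window smooths out single-site noise at the cost of blurring short regions under selection, which is presumably why the window size is left configurable.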
Analysis of protein-coding genetic variation in 60,706 humans.
Lek, Monkol; Karczewski, Konrad J; Minikel, Eric V; Samocha, Kaitlin E; Banks, Eric; Fennell, Timothy; O'Donnell-Luria, Anne H; Ware, James S; Hill, Andrew J; Cummings, Beryl B; Tukiainen, Taru; Birnbaum, Daniel P; Kosmicki, Jack A; Duncan, Laramie E; Estrada, Karol; Zhao, Fengmei; Zou, James; Pierce-Hoffman, Emma; Berghout, Joanne; Cooper, David N; Deflaux, Nicole; DePristo, Mark; Do, Ron; Flannick, Jason; Fromer, Menachem; Gauthier, Laura; Goldstein, Jackie; Gupta, Namrata; Howrigan, Daniel; Kiezun, Adam; Kurki, Mitja I; Moonshine, Ami Levy; Natarajan, Pradeep; Orozco, Lorena; Peloso, Gina M; Poplin, Ryan; Rivas, Manuel A; Ruano-Rubio, Valentin; Rose, Samuel A; Ruderfer, Douglas M; Shakir, Khalid; Stenson, Peter D; Stevens, Christine; Thomas, Brett P; Tiao, Grace; Tusie-Luna, Maria T; Weisburd, Ben; Won, Hong-Hee; Yu, Dongmei; Altshuler, David M; Ardissino, Diego; Boehnke, Michael; Danesh, John; Donnelly, Stacey; Elosua, Roberto; Florez, Jose C; Gabriel, Stacey B; Getz, Gad; Glatt, Stephen J; Hultman, Christina M; Kathiresan, Sekar; Laakso, Markku; McCarroll, Steven; McCarthy, Mark I; McGovern, Dermot; McPherson, Ruth; Neale, Benjamin M; Palotie, Aarno; Purcell, Shaun M; Saleheen, Danish; Scharf, Jeremiah M; Sklar, Pamela; Sullivan, Patrick F; Tuomilehto, Jaakko; Tsuang, Ming T; Watkins, Hugh C; Wilson, James G; Daly, Mark J; MacArthur, Daniel G
2016-08-18
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
Identification of common, unique and polymorphic microsatellites among 73 cyanobacterial genomes.
Kabra, Ritika; Kapil, Aditi; Attarwala, Kherunnisa; Rai, Piyush Kant; Shanker, Asheesh
2016-04-01
Microsatellites, also known as Simple Sequence Repeats, are short tandem repeats of 1-6 nucleotides. These repeats are found in coding as well as non-coding regions of both prokaryotic and eukaryotic genomes and play a significant role in the study of gene regulation, genetic mapping, DNA fingerprinting and evolutionary studies. The availability of 73 complete genome sequences of cyanobacteria enabled us to mine and statistically analyze microsatellites in these genomes. The cyanobacterial microsatellites identified through bioinformatics analysis were stored in a user-friendly database named CyanoSat, which is an efficient data representation and query system designed using ASP.net. The information in CyanoSat comprises perfect, imperfect and compound microsatellites found in coding, non-coding and coding-non-coding regions. Moreover, it contains PCR primers with 200-nucleotide-long flanking regions. The mined cyanobacterial microsatellites can be freely accessed at www.compubio.in/CyanoSat/home.aspx. In addition, 82 polymorphic, 13,866 unique and 2390 common microsatellites were detected. These microsatellites will be useful in strain identification and genetic diversity studies of cyanobacteria.
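As a rough illustration of how perfect microsatellites (motifs of 1-6 nucleotides repeated in tandem) can be mined from a sequence, the sketch below uses a regular expression with a back-reference. The function name `find_perfect_ssrs`, the minimum repeat count, and the example sequence are assumptions for illustration; this is not the pipeline used to build CyanoSat, and it does not handle imperfect or compound repeats.

```python
import re


def find_perfect_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first,
    so 'AAAA' is reported as motif 'A' rather than motif 'AA'.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence containing one dinucleotide repeat, (AT)4.
print(find_perfect_ssrs("GGCATATATATCC"))
```

Matches are non-overlapping and scanned left to right, so a repeat that begins inside an earlier match is not reported separately.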
Liu, Chen; Shen, He Ding; Zhou, Na
2016-01-01
The complete mitochondrial genome sequence of Platevindex sp. is described for the first time in this article. The mitogenome (13,908 bp) contains 22 tRNA genes, 2 ribosomal RNA genes, 13 protein-coding genes, and 1 putative control region (CR). The CR is not well characterized because it lacks discrete conserved sequence blocks. This characteristic is similar to the CRs of other invertebrate mitochondrial genomes. The gene composition is typical of Bivalvia mitochondrial genomes.
Wang, Kai; Ding, Shuangmei; Yang, Ding
2016-09-01
This study determined the complete mitochondrial (mt) genome of the stonefly, Kamimuria chungnanshana Wu, 1948. The mt genome is 15,943 bp in size and contains 37 canonical genes, which include 22 transfer RNA genes, 13 protein-coding genes, and two ribosomal RNA genes; the control region is 1062 bp in length. The phylogenetic tree shows that Kamimuria chungnanshana is the sister group of Kamimuria wangi.
Complete mitochondrial genome of a wild Siberian tiger.
Sun, Yujiao; Lu, Taofeng; Sun, Zhaohui; Guan, Weijun; Liu, Zhensheng; Teng, Liwei; Wang, Shuo; Ma, Yuehui
2015-01-01
In this study, the complete mitochondrial genome of the Siberian tiger (Panthera tigris altaica) was sequenced, using muscle tissue obtained from a male wild tiger. The total length of the mitochondrial genome is 16,996 bp. The genome structure of this tiger is in accordance with that of other Siberian tigers, and it contains a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes, and 1 control region.
Organization and transient expression of the gene for human U11 snRNA
Clemens, Suter-Crazzolara; Walter, Keller
1991-01-01
The nucleotide sequence of U11 small nuclear RNA, a minor U RNA from HeLa cells, was determined. Computer analysis of the sequence (135 residues) predicts two strong hairpin loops which are separated by seventeen nucleotides containing an Sm binding site (AAUUUUUUGG). A synthetic gene was constructed in which the coding region of U11 RNA is under the control of a T7 promoter. This vector can be used to produce U11 RNA in vitro. Southern hybridization and PCR analysis of HeLa genomic DNA suggest that U11 RNA is encoded by a single copy gene, and that at least three genomic regions could be U11 RNA pseudogenes. A HeLa genomic copy of a U11 gene was isolated by inverted PCR. This gene contains the U11 RNA coding sequence and several sequence elements unique for the U RNA genes. These include a Distal Sequence Element (DSE, ATTTGCATA) present between positions −215 and −223 relative to the start of transcription; a Proximal Sequence Element (PSE, TTCACCTTTACCAAAAATG) located between positions −43 and −63; and a 3′ box (GTTAGGCGAAATATTA) between positions +150 and +166. Transfection of HeLa cells with this gene revealed that it is functioning in vivo and can produce U11 RNA. PMID:1820214
Detecting long tandem duplications in genomic sequences.
Audemard, Eric; Schiex, Thomas; Faraut, Thomas
2012-05-08
Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm to detect tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non-coding elements such as pseudogenes or RNA genes. ReD Tandem identifies large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein-gene-based approach, which ignores duplications involving non-coding regions. It is, however, inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also help to improve existing annotations.
Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J
2018-05-01
Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
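The multiple-testing burden mentioned above can be made concrete with a small worked example. The abstract reports a correction for 4,123 effective tests; the family-wise error rate of 0.05 and the Bonferroni-style division used here are assumptions for illustration, not necessarily the exact procedure of the study, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_correction(p_value, alpha, n_tests):
    """True if p_value is significant after Bonferroni-style correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# 4,123 effective tests is taken from the abstract; alpha = 0.05 is assumed.
threshold = bonferroni_threshold(0.05, 4123)
print(f"{threshold:.2e}")                     # prints 1.21e-05
print(passes_correction(1e-4, 0.05, 4123))    # prints False
```

Under these assumptions, a nominal p-value of 1e-4 for a single category would not survive correction.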
Hu, Min; Chilton, Neil B; Gasser, Robin B
2002-02-01
The complete mitochondrial genome sequences were determined for two species of human hookworms, Ancylostoma duodenale (13,721 bp) and Necator americanus (13,604 bp). The circular hookworm genomes are amongst the smallest reported to date for any metazoan organism. Their relatively small size relates mainly to a reduced length in the AT-rich region. Both hookworm genomes encode 12 protein, two ribosomal RNA and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with three other species of Secernentea studied to date. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. For both hookworm species, genes were arranged in the same order as for Caenorhabditis elegans, except for the presence of a non-coding region between genes nad3 and nad5. In A. duodenale, this non-coding region is predicted to form a stem-and-loop structure which is not present in N. americanus. The mitochondrial genome structure for both hookworms differs from Ascaris suum only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, including four gene or gene-block translocations and the positions of some transfer RNA genes and the AT-rich region. Based on genome organisation and amino acid sequence identity, A. duodenale and N. americanus were more closely related to C. elegans than to A. suum or O. volvulus (all secernentean nematodes), consistent with a previous phylogenetic study using ribosomal DNA sequence data. Determination of the complete mitochondrial genome sequences for two human hookworms (the first members of the order Strongylida ever sequenced) provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
The full mitochondrial genome sequence of Raillietina tetragona from chicken (Cestoda: Davaineidae).
Liang, Jian-Ying; Lin, Rui-Qing
2016-11-01
In the present study, the complete mitochondrial DNA (mtDNA) sequence of Raillietina tetragona was determined, and its gene content and genome organization were compared with those of other tapeworms. The complete mt genome sequence of R. tetragona is 14,444 bp in length. It contains 12 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and two non-coding regions. All genes are transcribed in the same direction and have a nucleotide composition high in A and T. The A + T content of the complete mt genome of R. tetragona is 71.4%. The R. tetragona mt genome sequence provides a novel mtDNA marker for studying the molecular epidemiology and population genetics of Raillietina and has implications for the molecular diagnosis of chicken cestodosis caused by Raillietina.
Image-guided genomic analysis of tissue response to laser-induced thermal stress
NASA Astrophysics Data System (ADS)
Mackanos, Mark A.; Helms, Mike; Kalish, Flora; Contag, Christopher H.
2011-05-01
The cytoprotective response to thermal injury is characterized by transcriptional activation of "heat shock proteins" (hsp) and proinflammatory proteins. Expression of these proteins may predict cellular survival. Microarray analyses were performed to identify spatially distinct gene expression patterns responding to thermal injury. Laser injury zones were identified by expression of a transgene reporter composed of the 70 kD hsp gene and the firefly luciferase coding sequence. Zones included the laser spot, the surrounding region where hsp70-luc expression was increased, and a region adjacent to the surrounding region. A total of 145 genes were up-regulated in the laser irradiated region, while 69 were up-regulated in the adjacent region. At 7 hours the chemokine Cxcl3 was the most highly expressed gene in the laser spot (24 fold) and adjacent region (32 fold). Chemokines were the most common up-regulated genes identified. Microarray gene expression was successfully validated using qRT-polymerase chain reaction for selected genes of interest. The early response genes are likely involved in cytoprotection and initiation of the healing response. Their regulatory elements will benefit the creation of next-generation reporter mice and the control of therapeutic protein expression. The identified genes serve as drug development targets that may prevent acute tissue damage and accelerate healing.
Ni, ZhouXian; Ye, YouJu; Bai, Tiandao; Xu, Meng; Xu, Li-An
2017-09-11
The chloroplast genome (CPG) of Pinus massoniana, a member of the genus Pinus (Pinaceae) and a primary source of turpentine, was sequenced and analyzed in terms of gene rearrangements, ndh gene loss, and the contraction and expansion of short inverted repeats (IRs). P. massoniana CPG has a typical quadripartite structure that includes large single copy (LSC) (65,563 bp), small single copy (SSC) (53,230 bp) and two IRs (IRa and IRb, 485 bp). A total of 108 unique genes were identified, including 73 protein-coding genes, 31 tRNAs, and 4 rRNAs. Most of the 81 simple sequence repeats (SSRs) identified in the CPG were mononucleotide motifs of A/T types and located in non-coding regions. Comparisons with related species revealed an inversion (21,556 bp) in the LSC region; P. massoniana CPG lacks all 11 intact ndh genes (four ndh genes were lost completely; five remain truncated as pseudogenes; and the other two ndh genes remain as pseudogenes because of short insertions or deletions). A pair of short IRs was found instead of large IRs, and size variations among pine species were observed, which resulted from short insertions or deletions and non-synchronized variations between "IRa" and "IRb". The results of phylogenetic analyses based on whole CPG sequences of 16 conifers indicated that whole CPG sequences could be used as a powerful tool in phylogenetic analyses.
The complete mitochondrial genome of the endangered spotback skate, Atlantoraja castelnaui.
Duckett, Drew J L; Naylor, Gavin J P
2016-05-01
Chondrichthyes are a highly threatened class of organisms, largely due to overfishing and other human activities. The present study describes the complete mitochondrial genome (16,750 bp) of the endangered spotback skate, Atlantoraja castelnaui. The mitogenome is arranged in a typical vertebrate fashion, containing 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes and 1 control region.
Adaptive Evolution Is Substantially Impeded by Hill–Robertson Interference in Drosophila
Castellano, David; Coronado-Zamora, Marta; Campos, Jose L.; Barbadilla, Antonio; Eyre-Walker, Adam
2016-01-01
Hill–Robertson interference (HRi) is expected to reduce the efficiency of natural selection when two or more linked selected sites do not segregate freely, but no attempt has been made so far to quantify the overall impact of HRi on the rate of adaptive evolution for any given genome. In this work, we estimate how much HRi impedes the rate of adaptive evolution in the coding genome of Drosophila melanogaster. We compiled a data set of 6,141 autosomal protein-coding genes from Drosophila, from which polymorphism levels in D. melanogaster and divergence out to D. yakuba were estimated. The rate of adaptive evolution was calculated using a derivative of the McDonald–Kreitman test that controls for slightly deleterious mutations. We find that the rate of adaptive amino acid substitution at a given position of the genome is positively correlated to both the rate of recombination and the mutation rate, and negatively correlated to the gene density of the region. These correlations are robust to controlling for each other, for synonymous codon bias and for gene functions related to immune response and testes. We show that HRi diminishes the rate of adaptive evolution by approximately 27%. Interestingly, genes with low mutation rates embedded in gene poor regions lose approximately 17% of their adaptive substitutions whereas genes with high mutation rates embedded in gene rich regions lose approximately 60%. We conclude that HRi hampers the rate of adaptive evolution in Drosophila and that the variation in recombination, mutation, and gene density along the genome affects the HRi effect. PMID:26494843
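For orientation, the classic McDonald–Kreitman-based estimate of the proportion of adaptive substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps), can be sketched as follows. This is the basic estimator only, not the derivative used in the study (which additionally controls for slightly deleterious mutations); the function name `mk_alpha` and the counts below are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Basic McDonald-Kreitman estimate of the adaptive substitution fraction.

    dn, ds: nonsynonymous and synonymous divergence counts
    pn, ps: nonsynonymous and synonymous polymorphism counts
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(alpha)  # prints 0.5
```

With these made-up counts, half of the nonsynonymous substitutions would be attributed to adaptive evolution; segregating slightly deleterious mutations inflate Pn and bias this basic estimate downward, which is the motivation for corrected variants of the test.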
Timofeeva, Maria N.; Kinnersley, Ben; Farrington, Susan M.; Whiffin, Nicola; Palles, Claire; Svinti, Victoria; Lloyd, Amy; Gorman, Maggie; Ooi, Li-Yin; Hosking, Fay; Barclay, Ella; Zgaga, Lina; Dobbins, Sara; Martin, Lynn; Theodoratou, Evropi; Broderick, Peter; Tenesa, Albert; Smillie, Claire; Grimes, Graeme; Hayward, Caroline; Campbell, Archie; Porteous, David; Deary, Ian J.; Harris, Sarah E.; Northwood, Emma L.; Barrett, Jennifer H.; Smith, Gillian; Wolf, Roland; Forman, David; Morreau, Hans; Ruano, Dina; Tops, Carli; Wijnen, Juul; Schrumpf, Melanie; Boot, Arnoud; Vasen, Hans F A; Hes, Frederik J.; van Wezel, Tom; Franke, Andre; Lieb, Wolgang; Schafmayer, Clemens; Hampe, Jochen; Buch, Stephan; Propping, Peter; Hemminki, Kari; Försti, Asta; Westers, Helga; Hofstra, Robert; Pinheiro, Manuela; Pinto, Carla; Teixeira, Manuel; Ruiz-Ponte, Clara; Fernández-Rozadilla, Ceres; Carracedo, Angel; Castells, Antoni; Castellví-Bel, Sergi; Campbell, Harry; Bishop, D. Timothy; Tomlinson, Ian P M; Dunlop, Malcolm G.; Houlston, Richard S.
2015-01-01
Whilst common genetic variation in many non-coding genomic regulatory regions is known to impart risk of colorectal cancer (CRC), much of the heritability of CRC remains unexplained. To examine the role of recurrent coding sequence variation in CRC aetiology, we genotyped 12,638 CRC cases and 29,045 controls from six European populations. Single-variant analysis identified a coding variant (rs3184504) in SH2B3 (12q24) associated with CRC risk (OR = 1.08, P = 3.9 × 10^−7), and novel damaging coding variants in 3 genes previously tagged by GWAS efforts; rs16888728 (8q24) in UTP23 (OR = 1.15, P = 1.4 × 10^−7); rs6580742 and rs12303082 (12q13) in FAM186A (OR = 1.11, P = 1.2 × 10^−7 and OR = 1.09, P = 7.4 × 10^−8); rs1129406 (12q13) in ATF1 (OR = 1.11, P = 8.3 × 10^−9), all reaching exome-wide significance levels. Gene based tests identified associations between CRC and PCDHGA genes (P < 2.90 × 10^−6). We found an excess of rare, damaging variants in base-excision (P = 2.4 × 10^−4) and DNA mismatch repair genes (P = 6.1 × 10^−4) consistent with a recessive mode of inheritance. This study comprehensively explores the contribution of coding sequence variation to CRC risk, identifying associations with coding variation in 4 genes and the PCDHG gene cluster and several candidate recessive alleles. However, these findings suggest that recurrent, low-frequency coding variants account for a minority of the unexplained heritability of CRC. PMID:26553438
Liu, X; Gorovsky, M A
1996-01-01
A truncated cDNA clone encoding Tetrahymena thermophila histone H2A2 was isolated using synthetic degenerate oligonucleotide probes derived from H2A protein sequences of Tetrahymena pyriformis. The cDNA clone was used as a homologous probe to isolate a truncated genomic clone encoding H2A1. The remaining regions of the genes for H2A1 (HTA1) and H2A2 (HTA2) were then isolated using inverse PCR on circularized genomic DNA fragments. These partial clones were assembled into intact HTA1 and HTA2 clones. Nucleotide sequences of the two genes were highly homologous within the coding region but not in the noncoding regions. Comparison of the deduced amino acid sequences with protein sequences of T. pyriformis H2As showed only two and three differences respectively, in a total of 137 amino acids for H2A1, and 132 amino acids for H2A2, indicating the two genes arose before the divergence of these two species. The HTA2 gene contains a TAA triplet within the coding region, encoding a glutamine residue. In contrast with the T. thermophila HHO and HTA3 genes, no introns were identified within the two genes. The 5'- and 3'-ends of the histone H2A mRNAs were determined by RNase protection and by PCR mapping using RACE and RLM-RACE methods. Both genes encode polyadenylated mRNAs and are highly expressed in vegetatively growing cells but only weakly expressed in starved cultures. With the inclusion of these two genes, T. thermophila is the first organism whose entire complement of known core and linker histones, including replication-dependent and basal variants, has been cloned and sequenced. PMID:8760889
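The observation that a TAA triplet inside the HTA2 coding region encodes glutamine reflects the ciliate nuclear genetic code. The sketch below contrasts the standard code with the ciliate code when translating a short sequence. The amino acid strings follow the layout of NCBI translation tables 1 and 6 (codons ordered with bases T, C, A, G); the helper names and the test sequence are hypothetical and not taken from the paper.

```python
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (NCBI tables 1 and 6).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CILIATE = "FFLLSSSSYYQQCC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def make_table(amino_acids):
    """Map each codon to its amino acid for a 64-character code string."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    return dict(zip(codons, amino_acids))


def translate(dna, table):
    """Translate in frame 0, stopping at the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


seq = "ATGTAAGGATGA"  # made-up sequence with an internal TAA codon
print(translate(seq, make_table(STANDARD)))  # prints M
print(translate(seq, make_table(CILIATE)))   # prints MQG
```

Translating Tetrahymena genes with the standard code would therefore truncate the predicted protein at the first in-frame TAA or TAG.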
Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C
1999-08-05
The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, containing 16,613 bp, is presented in this report. The genome does not have a fixed length because the displacement loop (D-loop) contains a variable number of tandem repeats of the motif 5'-CGTGCGTACA. Genes for the 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions are found. The genome carries very few intergenic nucleotides, with several instances of overlap between protein-coding or tRNA genes, except in the D-loop region. To evaluate the possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide substitutions and amino acid sequences of 13 protein-coding genes were aligned by pairwise comparisons of the pig, cow, and fin whale. By comparing these sequences, we suggest that there is a closer relationship between the pig and cow than between either of these species and the fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, as compared to fin whale and others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
Yi, Dong-Keun; Lee, Hae-Lim; Sun, Byung-Yun; Chung, Mi Yoon; Kim, Ki-Joong
2012-05-01
This study reports the complete chloroplast (cp) DNA sequence of Eleutherococcus senticosus (GenBank: JN 637765), an endangered endemic species. The genome is 156,768 bp in length, and contains a pair of inverted repeat (IR) regions of 25,930 bp each, a large single copy (LSC) region of 86,755 bp and a small single copy (SSC) region of 18,153 bp. The structural organization, gene and intron contents, gene order, AT content, codon usage, and transcription units of the E. senticosus chloroplast genome are similar to those of typical land plant cp DNA. We aligned and analyzed the sequences of 86 coding genes, 19 introns and 113 intergenic spacers (IGS) in three different taxonomic hierarchies; Eleutherococcus vs. Panax, Eleutherococcus vs. Daucus, and Eleutherococcus vs. Nicotiana. The distribution of indels, the number of polymorphic sites and nucleotide diversity indicate that positional constraint is more important than functional constraint for the evolution of cp genome sequences in Asterids. For example, the intron sequences in the LSC region exhibited base substitution rates 5-11 times higher than those of the IR regions, while the intron sequences in the SSC region evolved 7-14 times faster than those in the IR region. Furthermore, the Ka/Ks ratio of the gene coding sequences supports a stronger evolutionary constraint in the IR region than in the LSC or SSC regions. Therefore, our data suggest that selective sweeps by base collection mechanisms more frequently eliminate polymorphisms in the IR region than in other regions. Chloroplast genome regions that have high levels of base substitutions also show higher incidences of indels. Thirty-five simple sequence repeat (SSR) loci were identified in the Eleutherococcus chloroplast genome. Of these, 27 are homopolymers, while six are di-polymers and two are tri-polymers. In addition to the SSR loci, we also identified 18 medium size repeat units ranging from 22 to 79 bp, 11 of which are distributed in the IGS or intron regions. These medium size repeats may contribute to developing a cp genome-specific gene introduction vector because the regions may be used as specific recombination sites.
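The region sizes reported for the E. senticosus chloroplast genome can be cross-checked against its total length, since a quadripartite genome consists of one LSC, one SSC and two IR copies. The sketch below performs that arithmetic; the function name `quadripartite_length` is hypothetical, and the numbers are those given in the abstract.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the abstract above.
total = quadripartite_length(lsc=86_755, ssc=18_153, ir=25_930)
print(total, total == 156_768)  # prints 156768 True
```

The same check is a quick way to catch transcription errors when region lengths are copied between records.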
DOE Office of Scientific and Technical Information (OSTI.GOV)
Klebig, M.L.; Woychik, R.P.; Wilkinson, J.E.
1994-09-01
The lethal yellow (A^y/-) and viable yellow (A^vy/-) mouse agouti mutants have a predominantly yellow pelage and display a complex syndrome that includes obesity, hyperinsulinemia, and insulin resistance, hallmark features of obesity-associated noninsulin-dependent diabetes mellitus (NIDDM) in humans. A new dominant agouti allele, A^iapy, has recently been identified; like the A^vy allele, it is homozygous viable and confers obesity and yellow fur in heterozygotes. The agouti gene was cloned and characterized at the molecular level. The gene is expressed in the skin during hair growth and is predicted to encode a 131 amino acid protein that is likely to be a secreted factor. In both A^y/- and A^iapy/- mice, the obesity and other dominant pleiotropic effects are associated with an ectopic expression of agouti in many tissues where the gene product is normally not produced. In A^y, a 170-kb deletion has occurred that causes an upstream promoter to drive the ectopic expression of the wild-type agouti coding exons. In A^iapy, the coding region of the gene is expressed from a cryptic promoter within the LTR of an intracisternal A-particle (IAP), which has integrated within the region just upstream of the first agouti coding exon. Transgenic mice ubiquitously expressing the cloned agouti gene under the influence of the beta-actin and phosphoglycerate kinase promoters display obesity, hyperinsulinemia, and yellow coat color. This demonstrates unequivocally that ectopic expression of agouti is responsible for the yellow obese syndrome.
Smith, David Roy; Hua, Jimeng; Archibald, John M.; Lee, Robert W.
2013-01-01
Organelle DNA is no stranger to palindromic repeats. But never has a mitochondrial or plastid genome been described in which every coding region is part of a distinct palindromic unit. While sequencing the mitochondrial DNA of the nonphotosynthetic green alga Polytomella magna, we uncovered precisely this type of genic arrangement. The P. magna mitochondrial genome is linear and made up entirely of palindromes, each containing 1–7 unique coding regions. Consequently, every gene in the genome is duplicated and in an inverted orientation relative to its partner. And when these palindromic genes are folded into putative stem-loops, their predicted translational start sites are often positioned in the apex of the loop. Gel electrophoresis results support the linear, 28-kb monomeric conformation of the P. magna mitochondrial genome. Analyses of other Polytomella taxa suggest that palindromic mitochondrial genes were present in the ancestor of the Polytomella lineage and lost or retained to various degrees in extant species. The possible origins and consequences of this bizarre genomic architecture are discussed. PMID:23940100
Yin, Yan-hui; Li, Bi-chun; Wei, Guang-hui; Zhu, Cai-ye; Li, Wei; Zhang, Ya-ni; Du, Li-xin; Cao, Wen-guang
2012-05-01
The aim of this study was to clone the heart-type fatty acid binding protein (H-FABP) gene of Xuhuai goat, to explore it bioinformatically, and to analyze its subcellular localization using enhanced green fluorescent protein (EGFP). The results showed that the coding sequence (CDS) length of the Xuhuai goat H-FABP gene was 402 bp, encoding 133 amino acids (GenBank accession number AY466498.1). The H-FABP cDNA coding sequence was compared with the corresponding region of human, chicken, brown rat, cow, wild boar, donkey, and zebrafish. The similarities were 89%, 76%, 85%, 84%, 93%, 91%, and 70%, respectively. For the corresponding amino acid sequences, the similarities were 90%, 79%, 88%, 97%, 95%, 94%, and 72%, respectively. No signal peptide region was found in the H-FABP protein, suggesting that H-FABP might be a nonsecreted protein. H-FABP expression was detected in vitro by reverse transcription-polymerase chain reaction (RT-PCR), and the EGFP-H-FABP fusion protein was localized to the cytoplasm. The gene could also be transiently and permanently expressed in mice.
Hu, Guang Fu; Liu, Xiang Jiang; Zou, Gui Wei; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na
2016-01-01
We sequenced the complete mitogenomes of (Cyprinus carpio haematopterus) and Russian scattered scale mirror carp (Cyprinus carpio carpio). Comparison revealed that the mitogenomes of these two common carp strains were remarkably similar in genome length, gene order and content, and AT content. There were only 55 variable sites in 16,581 nucleotides: 1 was located in rRNAs, 2 in tRNAs, 9 in the control region and 43 in protein-coding genes. Furthermore, the forty-three variable nucleotides in the protein-coding genes of the two strains led to four variable amino acids, which were located in the ND2, ATPase 6, ND5 and ND6 genes, respectively.
The complete chloroplast genome of the Dendrobium strongylanthum (Orchidaceae: Epidendroideae).
Li, Jing; Chen, Chen; Wang, Zhe-Zhi
2016-07-01
Complete chloroplast genome sequences are very useful for studying the phylogeny and evolution of species. In this study, the complete chloroplast genome of Dendrobium strongylanthum was constructed from whole-genome Illumina sequencing data. The chloroplast genome is 153 058 bp in length with 37.6% GC content and contains two inverted repeats (IRs) of 26 316 bp each. The IR regions are separated by a large single-copy region (LSC, 85 836 bp) and a small single-copy region (SSC, 14 590 bp). A total of 130 chloroplast genes were successfully annotated, including 84 protein-coding genes, 38 tRNA genes, and eight rRNA genes. Phylogenetic analyses showed that the chloroplast genome of Dendrobium strongylanthum is related to that of Dendrobium officinale.
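The region lengths reported in this abstract can be checked against the stated total with a short calculation. The sketch below assumes the usual quadripartite layout (one LSC, one SSC, two IR copies); the function name is illustrative and does not come from the source.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a circular plastome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract above.
total = quadripartite_length(lsc=85_836, ssc=14_590, ir=26_316)
print(total)  # 153058, matching the reported genome length
```

The same helper can be applied to other plastome records, although rounding or annotation boundaries may leave a difference of a few base pairs.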
Xie, Qing; Shen, Kang-Ning; Hao, Xiuying; Nam, Phan Nhut; Ngoc Hieu, Bui Thi; Chen, Ching-Hung; Zhu, Changqing; Lin, Yen-Chang; Hsiao, Chung-Der
2017-03-01
We decoded the complete chloroplast DNA (cpDNA) sequence of the Tianshan Snow Lotus (Saussurea involucrata), a famous traditional Chinese medicinal plant of the family Asteraceae, by using next-generation sequencing technology. The genome consists of 152 490 bp containing a pair of inverted repeats (IRs) of 25 202 bp, which are separated by a large single-copy region and a small single-copy region of 83 446 bp and 18 639 bp, respectively. The genic regions account for 57.7% of the whole cpDNA, and the GC content of the cpDNA is 37.7%. The S. involucrata cpDNA encodes 114 unigenes (82 protein-coding genes, 4 rRNA genes, and 28 tRNA genes). Eight protein-coding genes (atpF, ndhA, ndhB, rpl2, rpoC1, rps16, clpP, and ycf3) and five tRNA genes (trnA-UGC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contain introns. A phylogenetic analysis of 11 complete cpDNAs from Asteraceae showed that S. involucrata is closely related to Centaurea diffusa (Diffuse Knapweed). The complete cpDNA of S. involucrata provides essential DNA molecular data for further phylogenetic and evolutionary analysis of Asteraceae.
Yang, Huirong; Zhang, Jia-En; Luo, Hao; Luo, Mingzhu; Guo, Jing; Deng, Zhixin; Zhao, Benliang
2016-05-01
We present the complete mitochondrial genome of Cipangopaludina cathayensis in this study. The mitochondrial genome is 17,157 bp in length and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes. All of them are encoded on the heavy strand except 7 tRNA genes on the light strand. The overall nucleotide composition of the light strand is 44.51% A, 26.74% T, 20.48% C and 8.28% G. All the protein-coding genes start with an ATG initiation codon except ATP6 (ATA) and ND4 (TTG), and the 2 types of termination codon are TAA (ATP6, ND2, COX1, COX2, ATP8, ND1, ND6, Cytb, COX3, ND4) and TAG (ND4L, ND5, ND3). There are 29 intergenic spacers and 5 gene overlaps. Tandem repeat sequences are observed in the COX2, tRNA(Asp), ATP6, tRNA(Cys), S-rRNA, ND1, Cytb, ND4 and COX3 genes. Gene arrangement and distribution differ from those of typical vertebrates. The absence of a D-loop is consistent with the Gastropoda, but at least one lengthy non-coding region is an essential regulatory element for the initiation of transcription and replication.
Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P
2017-03-01
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Zhong, Hua-Ming; Zhang, Hong-Hai; Sha, Wei-Lai; Zhang, Cheng-De; Chen, Yu-Cai
2010-04-01
The whole mitochondrial genome sequence of the red fox (Vulpes vulpes) was determined. It had a total length of 16 723 bp. As in most mammalian mitochondrial genomes, it contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and one control region. The base composition was 31.3% A, 26.1% C, 14.8% G and 27.8% T. The codon usage of red fox, arctic fox, gray wolf, domestic dog and coyote followed the same pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 3 gene in the red fox. A long tandem repeat rich in AC was found between conserved sequence blocks 1 and 2 in the control region. In order to confirm the phylogenetic relationships of the red fox to other canids, phylogenetic trees were reconstructed by neighbor-joining and maximum parsimony methods using 12 concatenated heavy-strand protein-coding genes. The result indicated that the arctic fox was the sister group of the red fox and that they both belong to the red fox-like clade in the family Canidae, while gray wolf, domestic dog and coyote belong to the wolf-like clade. The result was in accordance with existing phylogenetic results.
The mitochondrial genome of Cethosia biblis (Drury) (Lepidoptera: Nymphalidae).
Xin, Tianrong; Li, Lei; Yao, Chengyi; Wang, Yayu; Zou, Zhiwen; Wang, Jing; Xia, Bin
2016-07-01
We present the complete mitogenome of Cethosia biblis (Drury) (Lepidoptera: Nymphalidae) in this article. The mitogenome is a circular molecule consisting of 15,286 nucleotides, 37 genes, and an A + T-rich region. The order of the 37 genes is typical of insect mitochondrial DNA sequences described to date. The overall base composition of the genome is A (37.41%), T (42.80%), C (11.87%), and G (7.91%), with the A + T-rich hallmark of other invertebrate mitochondrial genomes. The start codon is ATA in most of the mitochondrial protein-coding genes, such as ND2, COI, ATP8, ND3, ND5, ND4, ND6, and ND1, whereas the COII, ATP6, COIII, ND4L, and Cob genes employ ATG. The stop codon is TAA in all the protein-coding genes. The A + T region is located between 12S rRNA and tRNA(Met). The phylogenetic relationships of Lepidoptera species were constructed based on the nucleotide sequences of the 13 PCGs of mitogenomes using the neighbor-joining method. The molecular phylogeny supported the traditional morphological classification of relationships within Lepidoptera.
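The "A + T-rich hallmark" mentioned in this abstract follows directly from the reported base composition. The short sketch below derives the A + T fraction from the four percentages given above; the variable and function names are illustrative assumptions, not part of the source.

```python
# Base composition (%) as reported in the abstract above.
composition = {"A": 37.41, "T": 42.80, "C": 11.87, "G": 7.91}


def at_content(comp: dict) -> float:
    """Return the combined A + T percentage, rounded to two decimals."""
    return round(comp["A"] + comp["T"], 2)


def gc_content(comp: dict) -> float:
    """Return the combined G + C percentage, rounded to two decimals."""
    return round(comp["G"] + comp["C"], 2)


print(at_content(composition))  # 80.21
print(gc_content(composition))  # 19.78
```

The four reported values sum to 99.99%, so the two derived figures differ from 100% by the same rounding margin.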
Internal control regions for transcription of eukaryotic tRNA genes.
Sharp, S; DeFranco, D; Dingermann, T; Farrell, P; Söll, D
1981-01-01
We have identified the region within a eukaryotic tRNA gene required for initiation of transcription. These results were obtained by systematically constructing deletions extending from the 5' or the 3' flanking regions into a cloned Drosophila tRNAArg gene by using nuclease BAL 31. The ability of the newly generated deletion clones to direct the in vitro synthesis of tRNA precursors was measured in transcription systems from Xenopus laevis oocytes, Drosophila Kc cells, and HeLa cells. Two control regions within the coding sequence were identified. The first was essential for transcription and was contained between nucleotides 8 and 25 of the mature tRNA sequence. Genes devoid of the second control region, which was contained between nucleotides 50 and 58 of the mature tRNA sequence, could be transcribed but with reduced efficiency. Thus, the promoter regions within a tRNA gene encode the tRNA sequences of the D stem and D loop, the invariant uridine at position 8, and the semi-invariant G-T-psi-C sequence. PMID:6947245
Le Scouarnec, Solena; Karakachoff, Matilde; Gourraud, Jean-Baptiste; Lindenbaum, Pierre; Bonnaud, Stéphanie; Portero, Vincent; Duboscq-Bidot, Laëtitia; Daumy, Xavier; Simonet, Floriane; Teusan, Raluca; Baron, Estelle; Violleau, Jade; Persyn, Elodie; Bellanger, Lise; Barc, Julien; Chatel, Stéphanie; Martins, Raphaël; Mabo, Philippe; Sacher, Frédéric; Haïssaguerre, Michel; Kyndt, Florence; Schmitt, Sébastien; Bézieau, Stéphane; Le Marec, Hervé; Dina, Christian; Schott, Jean-Jacques; Probst, Vincent; Redon, Richard
2015-05-15
The Brugada syndrome (BrS) is a rare heritable cardiac arrhythmia disorder associated with ventricular fibrillation and sudden cardiac death. Mutations in the SCN5A gene have been causally related to BrS in 20-30% of cases. Twenty other genes have been described as involved in BrS, but their overall contribution to disease prevalence is still unclear. This study aims to estimate the burden of rare coding variation in arrhythmia-susceptibility genes among a large group of patients with BrS. We have developed a custom kit to capture and sequence the coding regions of 45 previously reported arrhythmia-susceptibility genes and applied this kit to 167 index cases presenting with a Brugada pattern on the electrocardiogram as well as 167 individuals aged over 65 years and showing no history of cardiac arrhythmia. By applying burden tests, a significant enrichment in rare coding variation (with a minor allele frequency below 0.1%) was observed only for SCN5A, with rare coding variants carried by 20.4% of cases with BrS versus 2.4% of control individuals (P = 1.4 × 10^-7). No significant enrichment was observed for any other arrhythmia-susceptibility gene, including SCN10A and CACNA1C. These results indicate that, except for SCN5A, rare coding variation in previously reported arrhythmia-susceptibility genes does not contribute significantly to the occurrence of BrS in a population with European ancestry. Extreme caution should thus be taken when interpreting genetic variation in a molecular diagnostic setting, since rare coding variants were observed to a similar extent among cases and controls for most previously reported BrS-susceptibility genes. © The Author 2015. Published by Oxford University Press.
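To give a feel for the size of the SCN5A enrichment, the sketch below runs a one-sided Fisher's exact test on a 2×2 carrier table. This is not the burden test used by the authors, and the carrier counts (34 of 167 cases, 4 of 167 controls) are assumptions obtained by rounding the reported 20.4% and 2.4%; the resulting value should therefore not be expected to reproduce the published P value.

```python
from math import comb


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric: the number of carriers
    falling in the first row when row and column totals are held fixed.
    """
    row1 = a + b          # size of the first group (cases)
    col1 = a + c          # total number of carriers
    n = a + b + c + d     # all individuals
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Assumed counts derived from the reported percentages (hypothetical rounding).
cases_carriers, cases_total = 34, 167
controls_carriers, controls_total = 4, 167

p = fisher_one_sided(cases_carriers, cases_total - cases_carriers,
                     controls_carriers, controls_total - controls_carriers)
print(f"one-sided Fisher P = {p:.2e}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the tail sum; only the final ratio is converted to a float.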
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome.
Borodovsky, M; Rudd, K E; Koonin, E V
1994-01-01
The unannotated regions of the Escherichia coli genome DNA sequence from the EcoSeq6 database, totaling 1,278 'intergenic' sequences of the combined length of 359,279 basepairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by GeneMark and BLAST, comprising 51.4% of the GeneMark 'hits' and 87.5% of the BLAST 'hits'. 73 putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark, but have no significant BLAST hits, nevertheless are likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins. PMID:7984428
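The overlap percentages quoted in this abstract follow from the reported ORF counts. The sketch below recomputes them; the variable names are illustrative and only the counts come from the abstract.

```python
# Counts as reported in the abstract above.
genemark_orfs = 354      # ORFs predicted by GeneMark
blast_orfs = 208         # ORFs with significant BLAST similarity
supported_by_both = 182  # ORFs supported by GeneMark and BLAST
conserved_families = 73  # ORFs in ancient conserved protein families


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(supported_by_both, genemark_orfs))   # 51.4
print(pct(supported_by_both, blast_orfs))      # 87.5
print(pct(conserved_families, genemark_orfs))  # 20.6
```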
Spielmann, A; Stutz, E
1983-01-01
The soybean chloroplast psb A gene (photosystem II thylakoid membrane protein of Mr 32 000, lysine-free) and the trn H gene (tRNAHisGUG), which both map in the large single copy region adjacent to one of the inverted repeat structures (IR1), have been sequenced including flanking regions. The structural part of the psb A gene shows 92% sequence homology with the corresponding genes of spinach and N. debneyi and also contains an open reading frame for 353 amino acids. The amino acid sequence of a potential primary translation product (calculated Mr, 38 904, no lysine) diverges from that of spinach and N. debneyi in only two positions in the C-terminal part. The trn H gene has the same polarity as the psb A gene, and the coding region is located at the very end of the large single copy region. The deduced sequence of the soybean chloroplast tRNAHisGUG is identical to that of Zea mays chloroplasts. Both ends of the large single copy region were sequenced, including a small segment of the adjacent IR1 and IR2. PMID:6314279
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Shiyu; Kaeppler, Shawn M.; Vogel, Kenneth P.; ...
2016-11-28
Switchgrass is undergoing development as a dedicated cellulosic bioenergy crop. Fermentation of lignocellulosic biomass to ethanol in a bioenergy system or to volatile fatty acids in a livestock production system is strongly and negatively influenced by lignification of cell walls. This study detects specific loci that exhibit selection signatures across switchgrass breeding populations that differ in in vitro dry matter digestibility (IVDMD), ethanol yield, and lignin concentration. Allele frequency changes in candidate genes were used to detect loci under selection. Out of the 183 polymorphisms identified in the four candidate genes, twenty-five loci in the intron regions and four loci in coding regions were found to display a selection signature. All loci in the coding regions are synonymous substitutions. Selection in both directions was observed on polymorphisms that appeared to be under selection. Genetic diversity and linkage disequilibrium within the candidate genes were low. The recurrent divergent selection caused excessive moderate allele frequencies in the cycle 3 reduced lignin population as compared to the base population. As a result, this study provides valuable insight on genetic changes occurring in short-term selection in the polyploid populations, and discovered potential markers for breeding switchgrass with improved biomass quality.
FunGene: the functional gene pipeline and repository.
Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R
2013-01-01
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
Huang, Kristen M; Geunes-Boyer, Scarlett; Wu, Sufen; Dutra, Amalia; Favor, Jack; Stambolian, Dwight
2004-05-01
Xcat mice display X-linked congenital cataracts and are a mouse model for the human X-linked cataract disease Nance Horan syndrome (NHS). The genetic defect in Xcat mice and NHS patients is not known. We isolated and sequenced a BAC contig representing a portion of the Xcat critical region. We combined our sequencing data with the most recent mouse sequence assemblies from both Celera and public databases. The sequence of the 2.2-Mb Xcat critical region was then analyzed for potential Xcat candidate genes. The coding regions of the seven known genes within this area (Rai2, Rbbp7, Ctps2, Calb3, Grpr, Reps2, and Syap1) were sequenced in Xcat mice and no mutations were detected. The expression of Rai2 was quantitatively identical in wild-type and Xcat mutant eyes. These results indicate that the Xcat mutation is within a novel, undiscovered gene.
Comprehensive Analysis of Genome Rearrangements in Eight Human Malignant Tumor Tissues
Wang, Chong
2016-01-01
Carcinogenesis is a complex multifactorial, multistage process, but the precise mechanisms are not well understood. In this study, we performed a genome-wide analysis of the copy number variation (CNV), breakpoint region (BPR) and fragile sites in 2,737 tumor samples from eight tumor entities and in 432 normal samples. CNV detection and BPR identification revealed that BPRs tended to accumulate in specific genomic regions in tumor samples whereas they were dispersed genome-wide in the normal samples. Hotspots were observed at which segments with similar alterations in copy number overlapped and BPRs clustered adjacently. Evaluation of BPR occurrence frequency showed that at least one BPR was detected in about 15% or more of samples for each tumor entity, while BPRs reached a maximum of 12% of the normal samples. 127 of 2,716 tumor-relevant BPRs (termed ‘common BPRs’) also exhibited a noticeable occurrence frequency in the normal samples. Colocalization assessment identified 20,077 CNV-affecting genes, 169 of which are known tumor-related genes. The most noteworthy genes are KIAA0513, important for immunologic, synaptic and apoptotic signal pathways; the intergenic non-coding RNA RP11-115C21.2, possibly acting as an oncogene or tumor suppressor by changing the structure of chromatin; ADAM32, likely important in cancer cell proliferation and progression through ectodomain shedding of diverse growth factors; and the well-known tumor suppressor gene p53. The BPR distributions indicate that CNV mutations are likely non-random in tumor genomes. The marked recurrence of BPRs at specific regions supports common progression mechanisms in tumors. The presence of hotspots together with common BPRs, despite their small group size, implies a relation between fragile sites and cancer-gene alteration. Our data further suggest that both protein-coding and non-coding genes possessing a range of biological functions might play a causative or functional role in tumor biology. This research enhances our understanding of the mechanisms for tumorigenesis and progression. PMID:27391163
Origin of sphinx, a young chimeric RNA gene in Drosophila melanogaster
Wang, Wen; Brunet, Frédéric G.; Nevo, Eviatar; Long, Manyuan
2002-01-01
Non-protein-coding RNA genes play an important role in various biological processes. How new RNA genes originated and whether this process is controlled by similar evolutionary mechanisms for the origin of protein-coding genes remains unclear. A young chimeric RNA gene that we term sphinx (spx) provides the first insight into the early stage of evolution of RNA genes. spx originated as an insertion of a retroposed sequence of the ATP synthase chain F gene at the cytological region 60DB since the divergence of Drosophila melanogaster from its sibling species 2–3 million years ago. This retrosequence, which is located at 102F on the fourth chromosome, recruited a nearby exon and intron, thereby evolving a chimeric gene structure. This molecular process suggests that the mechanism of exon shuffling, which can generate protein-coding genes, also plays a role in the origin of RNA genes. The subsequent evolutionary process of spx has been associated with a high nucleotide substitution rate, possibly driven by a continuous positive Darwinian selection for a novel function, as is shown in its sex- and development-specific alternative splicing. To test whether spx has adapted to different environments, we investigated its population genetic structure in the unique “Evolution Canyon” in Israel, revealing a similar haplotype structure in spx, and thus similar evolutionary forces operating on spx between environments. PMID:11904380
42 CFR 73.3 - HHS select agents and toxins.
Code of Federal Regulations, 2013 CFR
2013-10-01
... replication competent forms of the 1918 pandemic influenza virus containing any portion of the coding regions of all eight gene segments (Reconstructed 1918 Influenza virus) Ricin Rickettsia prowazekii SARS...
42 CFR 73.3 - HHS select agents and toxins.
Code of Federal Regulations, 2012 CFR
2012-10-01
... virus Monkeypox virus Reconstructed replication competent forms of the 1918 pandemic influenza virus containing any portion of the coding regions of all eight gene segments (Reconstructed 1918 Influenza virus...
42 CFR 73.3 - HHS select agents and toxins.
Code of Federal Regulations, 2014 CFR
2014-10-01
... replication competent forms of the 1918 pandemic influenza virus containing any portion of the coding regions of all eight gene segments (Reconstructed 1918 Influenza virus) Ricin Rickettsia prowazekii SARS...
Machado-Joseph Disease Fact Sheet What is Machado-Joseph disease? What are the ... the repeat is in a protein-producing or coding region of the gene. Modifications of the mutant ...
Junk DNA and the long non-coding RNA twist in cancer genetics
Ling, Hui; Vincent, Kimberly; Pichler, Martin; Fodde, Riccardo; Berindan-Neagoe, Ioana; Slack, Frank J.; Calin, George A
2015-01-01
The central dogma of molecular biology states that the flow of genetic information moves from DNA to RNA to protein. However, in the last decade this dogma has been challenged by new findings on non-coding RNAs (ncRNAs) such as microRNAs (miRNAs). More recently, long non-coding RNAs (lncRNAs) have attracted much attention due to their large number and biological significance. Many lncRNAs have been identified as mapping to regulatory elements including gene promoters and enhancers, ultraconserved regions, and intergenic regions of protein-coding genes. Yet, the biological function and molecular mechanisms of lncRNA in human diseases in general and cancer in particular remain largely unknown. Data from the literature suggest that lncRNA, often via interaction with proteins, functions in specific genomic loci or use their own transcription loci for regulatory activity. In this review, we summarize recent findings supporting the importance of DNA loci in lncRNA function, and the underlying molecular mechanisms via cis or trans regulation, and discuss their implications in cancer. In addition, we use the 8q24 genomic locus, a region containing interactive SNPs, DNA regulatory elements and lncRNAs, as an example to illustrate how single nucleotide polymorphism (SNP) located within lncRNAs may be functionally associated with the individual’s susceptibility to cancer. PMID:25619839
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhengqiu, C.; Penaflor, C.; Kuehl, J.V.
2006-06-01
The magnoliids represent the largest basal angiosperm clade with four orders, 19 families and 8,500 species. Although several recent angiosperm molecular phylogenies have supported the monophyly of magnoliids and suggested relationships among the orders, the limited number of genes examined resulted in only weak support, and these issues remain controversial. Furthermore, considerable incongruence has resulted in phylogenies supporting three different sets of relationships among magnoliids and the two large angiosperm clades, monocots and eudicots. This is one of the most important remaining issues concerning relationships among basal angiosperms. We sequenced the chloroplast genomes of three magnoliids, Drimys (Canellales), Liriodendron (Magnoliales), and Piper (Piperales), and used these data in combination with 32 other completed angiosperm chloroplast genomes to assess phylogenetic relationships among magnoliids. The Drimys and Piper chloroplast genomes are nearly identical in size at 160,606 and 160,624 bp, respectively. The genomes include a pair of inverted repeats of 26,649 bp (Drimys) and 27,039 bp (Piper), separated by a small single copy region of 18,621 bp (Drimys) and 18,878 bp (Piper) and a large single copy region of 88,685 bp (Drimys) and 87,666 bp (Piper). The gene order of both taxa is nearly identical to many other unrearranged angiosperm chloroplast genomes, including Calycanthus, the other published magnoliid genome. Comparisons of angiosperm chloroplast genomes indicate that GC content is not uniformly distributed across the genome. Overall GC content ranges from 34-39%, and coding regions have a substantially higher GC content than non-coding regions (both intergenic spacers and introns). Among protein-coding genes, GC content varies by codon position with 1st codon > 2nd codon > 3rd codon, and it varies by functional group with photosynthetic genes having the highest percentage and NADH genes the lowest. Across the genome, GC content is highest in the inverted repeat due to the presence of rRNA genes and lowest in the small single copy region where most NADH genes are located. Phylogenetic analyses using maximum parsimony and maximum likelihood methods were performed on DNA sequences of 61 protein-coding genes. Trees from both analyses provided strong support for the monophyly of magnoliids and two strongly supported groups were identified, the Canellales/Piperales and the Laurales/Magnoliales. The phylogenies also provided moderate to strong support for the basal position of Amborella, and a sister relationship of magnoliids to a clade that includes monocots and eudicots. The complete sequences of three magnoliid chloroplast genomes provide new data from the largest basal angiosperm clade. Evolutionary comparisons of these new genome sequences, combined with other published angiosperm genomes, confirm that GC content is unevenly distributed across the genome by location, codon position, and functional group. Furthermore, phylogenetic analyses provide the strongest support so far for the hypothesis that the magnoliids are sister to a large clade that includes both monocots and eudicots.
DOE Office of Scientific and Technical Information (OSTI.GOV)
von Nickisch-Rosenegk, Markus; Brown, Wesley M.; Boore, Jeffrey L.
2001-01-01
Using "long-PCR" we have amplified in overlapping fragments the complete mitochondrial genome of the tapeworm Hymenolepis diminuta (Platyhelminthes: Cestoda) and determined its 13,900 nucleotide sequence. The gene content is the same as that typically found for animal mitochondrial DNA (mtDNA) except that atp8 appears to be lacking, a condition found previously for several other animals. Despite the small size of this mtDNA, there are two large non-coding regions, one of which contains 13 repeats of a 31 nucleotide sequence and a potential stem-loop structure of 25 base pairs with an 11-member loop. Large potential secondary structures are identified also for the non-coding regions of two other cestode mtDNAs. Comparison of the mitochondrial gene arrangement of H. diminuta with those previously published supports a phylogenetic position of flatworms as members of the Eutrochozoa, rather than being basal to either a clade of protostomes or a clade of coelomates.
Gardner, Elliot M.; Johnson, Matthew G.; Ragone, Diane; Wickett, Norman J.; Zerega, Nyree J. C.
2016-01-01
Premise of the study: We used moderately low-coverage (17×) whole-genome sequencing of Artocarpus camansi (Moraceae) to develop genomic resources for Artocarpus and Moraceae. Methods and Results: A de novo assembly of Illumina short reads (251,378,536 pairs, 2 × 100 bp) accounted for 93% of the predicted genome size. Predicted coding regions were used in a three-way orthology search with published genomes of Morus notabilis and Cannabis sativa. Phylogenetic markers for Moraceae were developed from 333 inferred single-copy exons. Ninety-eight putative MADS-box genes were identified. Analysis of all predicted coding regions resulted in preliminary annotation of 49,089 genes. An analysis of synonymous substitutions for pairs of orthologs (Ks analysis) in M. notabilis and A. camansi strongly suggested a lineage-specific whole-genome duplication in Artocarpus. Conclusions: This study substantially increases the genomic resources available for Artocarpus and Moraceae and demonstrates the value of low-coverage de novo assemblies for nonmodel organisms with moderately large genomes. PMID:27437173
Dalla Valle, Luisa; Nardi, Alessia; Belvedere, Paola; Toni, Mattia; Alibardi, Lorenzo
2007-07-01
Beta-keratins of reptilian scales have been recently cloned and characterized in some lizards. Here we report for the first time the sequence of some beta-keratins from the snake Elaphe guttata. Five different cDNAs were obtained using 5'- and 3'-RACE analyses. Four sequences differ by only a few nucleotides in the coding region, whereas the last cDNA shows, in this region, only 84% identity. The gene corresponding to one of the cDNA sequences has a single intron present in the 5'-untranslated region. This genomic organization is similar to that of birds' beta-keratins. Cloning and Southern blotting analysis suggest that snake beta-keratins belong to a family of highly related genes, as in geckos. PCR analysis suggests a head-to-tail orientation of genes in the same chromosome. In situ hybridization detected beta-keratin transcripts almost exclusively in differentiating oberhautchen and beta-cells of the snake epidermis in renewal phase. This is confirmed by Northern blotting that showed, in this phase, a high expression of two different transcripts whereas only the longer transcript is expressed at a much lower level in resting skin. The cDNA coding sequences encoded putative glycine-proline-serine rich proteins containing 137-139 amino acids, with apparent isoelectric point at 7.5 and 8.2. A central region, rich in proline, shows over 50% homology with avian scale, claw, and feather keratins. The prediction of secondary structure shows mainly a random coil conformation and few beta-strand regions in the central region, likely involved in the formation of a fibrous framework of beta-keratins. This region was possibly present in basic reptiles that originated reptiles and birds. Copyright 2007 Wiley-Liss, Inc.
Vanlalruati, Catherine; Mandal, Surajit De; Gurusubramanian, Guruswami; Senthil Kumar, Nachimuthu
2016-07-01
The complete mitochondrial genome of Junonia iphita was determined to be 15,433 bp in length, including 37 typical mitochondrial genes and an AT-rich region. All the protein-coding genes (PCGs) are initiated by typical ATN codons, except the cox1 gene, which is initiated by a CGA codon. Eight genes use the complete termination codon (TAA), whereas the cox1, cox2 and nad5 genes end with a single T; nad4 and nad1 end with the incomplete stop codon TA. All the tRNAs show cloverleaf secondary structures except trnS1 (AGN). The A + T rich region is 546 bp in length and contains an ATAGA motif followed by an 18 bp poly-T stretch, two microsatellite-like (TA)9 elements and an 8 bp poly-A stretch immediately upstream of the trnM gene.
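The sequence elements named in this abstract (ATAGA motif, poly-T stretch, (TA)n microsatellite-like repeat) are the kind of feature that can be located with simple pattern matching. The following is a minimal sketch, not the authors' annotation procedure: the function name, the pattern thresholds and the toy sequence are all assumptions made for illustration.

```python
import re

# Hypothetical patterns for control-region elements; the length thresholds
# (18 T's, 9 TA units) mirror the abstract but are illustrative choices.
PATTERNS = {
    "ATAGA": r"ATAGA",
    "poly_T": r"T{18,}",
    "TA_repeat": r"(?:TA){9,}",
}

def find_elements(seq):
    """Return {element name: [(start, end), ...]} using 0-based, half-open coordinates."""
    seq = seq.upper()
    hits = {}
    for name, pattern in PATTERNS.items():
        hits[name] = [(m.start(), m.end()) for m in re.finditer(pattern, seq)]
    return hits

# Toy sequence built for the example; it is not real J. iphita data.
toy = "GG" + "ATAGA" + "T" * 18 + "CC" + "TA" * 9 + "GG"
print(find_elements(toy))
```

Regular expressions report non-overlapping matches from left to right, which is adequate for a first pass; a real annotation would normally be checked against the published feature table.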
Genetic Mapping of a Mutant Defective in d, l-Alanine Racemase in Bacillus subtilis 168
Dul, Michael J.; Young, Frank E.
1973-01-01
Genetic analysis of a d-alanine requiring mutant (dal) of Bacillus subtilis reveals that the gene that codes for d,l-alanine racemase is linked to purB. The order of genes in this region of the chromosome is purB, pig, tsi, dal. Thus there are at least two clusters of genes that regulate cell wall biosynthesis in B. subtilis. PMID:4199510
Quantifying the mechanisms of domain gain in animal proteins.
Buljan, Marija; Frankish, Adam; Bateman, Alex
2010-01-01
Protein domains are protein regions that are shared among different proteins and are frequently functionally and structurally independent from the rest of the protein. Novel domain combinations have a major role in evolutionary innovation. However, the relative contributions of the different molecular mechanisms that underlie domain gains in animals are still unknown. By using animal gene phylogenies we were able to identify a set of high confidence domain gain events and by looking at their coding DNA investigate the causative mechanisms. Here we show that the major mechanism for gains of new domains in metazoan proteins is likely to be gene fusion through joining of exons from adjacent genes, possibly mediated by non-allelic homologous recombination. Retroposition and insertion of exons into ancestral introns through intronic recombination are, in contrast to previous expectations, only minor contributors to domain gains and have accounted for less than 1% and 10% of high confidence domain gain events, respectively. Additionally, exonization of previously non-coding regions appears to be an important mechanism for addition of disordered segments to proteins. We observe that gene duplication has preceded domain gain in at least 80% of the gain events. The interplay of gene duplication and domain gain demonstrates an important mechanism for fast neofunctionalization of genes.
Utilization of genetic tests: analysis of gene-specific billing in Medicare claims data.
Lynch, Julie A; Berse, Brygida; Dotson, W David; Khoury, Muin J; Coomer, Nicole; Kautter, John
2017-08-01
We examined the utilization of precision medicine tests among Medicare beneficiaries through analysis of gene-specific tier 1 and 2 billing codes developed by the American Medical Association in 2012. We conducted a retrospective cross-sectional study. The primary source of data was 2013 Medicare 100% fee-for-service claims. We identified claims billed for each laboratory test, the number of patients tested, expenditures, and the diagnostic codes indicated for testing. We analyzed variations in testing by patient demographics and region of the country. Pharmacogenetic tests were billed most frequently, accounting for 48% of the expenditures for new codes. The most common indications for testing were breast cancer, long-term use of medications, and disorders of lipid metabolism. There was underutilization of guideline-recommended tumor mutation tests (e.g., epidermal growth factor receptor) and substantial overutilization of a test discouraged by guidelines (methylenetetrahydrofolate reductase). Methodology-based tier 2 codes represented 15% of all claims billed with the new codes. The highest rate of testing per beneficiary was in Mississippi and the lowest rate was in Alaska. Gene-specific billing codes significantly improved our ability to conduct population-level research of precision medicine. Analysis of these data in conjunction with clinical records should be conducted to validate findings. Genet Med advance online publication 26 January 2017.
Yong, Hoi-Sen; Lim, Phaik-Eem; Eamsobhana, Praphathip
2017-01-01
The tephritid fruit fly Zeugodacus tau (Walker) is a polyphagous fruit pest of economic importance in Asia. Studies based on genetic markers indicate that it forms a species complex. We report here (1) the complete mitogenome of Z. tau from Malaysia and comparison with that of China as well as the mitogenome of other congeners, and (2) the relationship of Z. tau taxa from different geographical regions based on sequences of cytochrome c oxidase subunit I gene. The complete mitogenome of Z. tau had a total length of 15631 bp for the Malaysian specimen (ZT3) and 15835 bp for the China specimen (ZT1), with similar gene order comprising 37 genes (13 protein-coding genes—PCGs, 2 rRNA genes, and 22 tRNA genes) and a non-coding A + T-rich control region (D-loop). Based on 13 PCGs and 15 mt-genes, Z. tau NC_027290 (China) and Z. tau ZT1 (China) formed a sister group in the lineage containing also Z. tau ZT3 (Malaysia). Phylogenetic analysis based on partial sequences of cox1 gene indicates that the taxa from China, Japan, Laos, Malaysia, Bangladesh, India, Sri Lanka, and Z. tau sp. A from Thailand belong to Z. tau sensu stricto. A complete cox1 gene (or 13 PCGs or 15 mt-genes) instead of partial sequence is more appropriate for determining phylogenetic relationship. PMID:29216281
Yu, Jeong-Nam; Kim, Byung-Jik; Kim, Changmu; Yeo, Joo-Hong; Kim, Soonok
2017-01-01
The Black star fat minnow (Rhynchocypris semotilus) is an endemic and critically endangered freshwater fish in Korea. Its genome was 16 605 bp long and consisted of 13 protein-coding genes (PCG), two rRNA genes, 22 tRNA genes, and a control region. The gene order and the composition of R. semotilus were similar to that of most other vertebrates. Four overlapping regions in ATP8/ATP6, ATP6/COX3, ND4L/ND4, and ND5/ND6, among the 13 PCGs were found. The control region was located between the tRNA-Pro and tRNA-Phe genes and was determined to be 935 bp in length with the 3' end containing a 12 TA-repeat sequence. Phylogenetic analysis suggested that R. semotilus is most closely related to R. oxycephalus.
Regional localization of the human integrin β6 gene (ITGB6) to chromosome 2q24-q31
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fernandez-Ruiz, E.; Sanchez-Madrid, F.
The heterodimer αvβ6 acts as a fibronectin receptor for human carcinoma cells. The authors report here the regional localization of the β6 gene to 2q24-q31 by fluorescence in situ hybridization coupled with GTG-banding. This gene is located close to the region to which genes coding for the α subunits of the integrins VLA-4 and vitronectin receptor (ITGA4 and ITGAV, respectively) have been previously mapped (2q31-q32). These data suggest a proximal position of the integrin β6 locus (ITGB6) on this integrin gene cluster. Furthermore, double-labeling in situ hybridization experiments performed with α4 and αv probes indicated a telomeric position of ITGAV with respect to ITGA4. 22 refs., 2 figs.
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.
Sasaki, H; Yokoyama, E; Kuroiwa, A
1990-01-01
The cDNA clones encoding two chicken Deformed (Dfd) family homeobox-containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. PMID:1970866
González, Leonardo Galindo; Deyholos, Michael K
2012-11-21
Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. 
The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated.
2012-01-01
Background Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Results Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. 
Conclusions The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated. PMID:23171245
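The Monte Carlo comparison of TE positions against gene coding sequences mentioned in this abstract can be pictured with a small randomization test. This is only a sketch under stated assumptions: the abstract does not describe the simulation design, so the uniform re-placement of elements, the function names, and the toy coordinates below are all hypothetical.

```python
import random

def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlapping(elements, genes):
    """Number of elements that overlap at least one gene interval."""
    return sum(any(overlaps(e, g) for g in genes) for e in elements)

def monte_carlo_p(elements, genes, scaffold_len, n_iter=1000, seed=0):
    """Estimate how often random placement gives at least the observed overlap count.

    Assumption: each element keeps its length and is re-placed uniformly
    at random on the scaffold, independently of the other elements.
    """
    rng = random.Random(seed)
    observed = count_overlapping(elements, genes)
    at_least = 0
    for _ in range(n_iter):
        shuffled = []
        for start, end in elements:
            length = end - start
            new_start = rng.randrange(0, scaffold_len - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlapping(shuffled, genes) >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least + 1) / (n_iter + 1)

# Toy scaffold of 10,000 bp with made-up TE and gene coordinates.
tes = [(100, 400), (2500, 2900), (7000, 7600)]
genes = [(300, 1200), (5000, 6500)]
print(monte_carlo_p(tes, genes, scaffold_len=10_000))
```

A fixed seed makes the run repeatable; a real analysis would also need to decide whether shuffled elements may overlap one another, which this sketch allows.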
Araya, Carlos L.; Cenik, Can; Reuter, Jason A.; Kiss, Gert; Pande, Vijay S.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
Cancer sequencing studies have primarily identified cancer-driver genes by the accumulation of protein-altering mutations. An improved method would be annotation-independent, sensitive to unknown distributions of functions within proteins, and inclusive of non-coding drivers. We employed density-based clustering methods in 21 tumor types to detect variably-sized significantly mutated regions (SMRs). SMRs reveal recurrent alterations across a spectrum of coding and non-coding elements, including transcription factor binding sites and untranslated regions mutated in up to ∼15% of specific tumor types. SMRs reveal spatial clustering of mutations at molecular domains and interfaces, often with associated changes in signaling. Mutation frequencies in SMRs demonstrate that distinct protein regions are differentially mutated among tumor types, as exemplified by a linker region of PIK3CA in which biophysical simulations suggest mutations affect regulatory interactions. The functional diversity of SMRs underscores both the varied mechanisms of oncogenic misregulation and the advantage of functionally-agnostic driver identification. PMID:26691984
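The idea of finding variably sized regions where mutations pile up can be illustrated in one dimension. The sketch below is a simplified gap-based grouping, not the density-based clustering method used in the study: positions closer than a chosen distance are chained together and only groups with enough members are kept. The parameter names `eps` and `min_count` and the example coordinates are assumptions for illustration.

```python
def cluster_positions(positions, eps, min_count):
    """Group 1D positions whose neighbouring gaps are <= eps.

    Returns a list of (first_position, last_position, n_members) for every
    group with at least min_count members. Smaller groups are discarded.
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # A gap larger than eps closes the current group.
        if current and pos - current[-1] > eps:
            if len(current) >= min_count:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    if len(current) >= min_count:
        clusters.append((current[0], current[-1], len(current)))
    return clusters

# Made-up mutation coordinates along a single chromosome.
mutations = [100, 102, 105, 107, 400, 900, 903, 904]
print(cluster_positions(mutations, eps=5, min_count=3))
# → [(100, 107, 4), (900, 904, 3)]
```

Because group width is not fixed in advance, the regions found can be of different sizes, which is the property the abstract highlights; assessing their statistical significance is a separate step not shown here.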
Core histone genes of Giardia intestinalis: genomic organization, promoter structure, and expression
Yee, Janet; Tang, Anita; Lau, Wei-Ling; Ritter, Heather; Delport, Dewald; Page, Melissa; Adam, Rodney D; Müller, Miklós; Wu, Gang
2007-01-01
Background Giardia intestinalis is a protist found in freshwaters worldwide, and is the most common cause of parasitic diarrhea in humans. The phylogenetic position of this parasite is still much debated. Histones are small, highly conserved proteins that associate tightly with DNA to form chromatin within the nucleus. There are two classes of core histone genes in higher eukaryotes: DNA replication-independent histones and DNA replication-dependent ones. Results We identified two copies each of the core histone H2a, H2b and H3 genes, and three copies of the H4 gene, at separate locations on chromosomes 3, 4 and 5 within the genome of Giardia intestinalis, but no gene encoding a H1 linker histone could be recognized. The copies of each gene share extensive DNA sequence identities throughout their coding and 5' noncoding regions, which suggests these copies have arisen from relatively recent gene duplications or gene conversions. The transcription start sites are at triplet A sequences 1–27 nucleotides upstream of the translation start codon for each gene. We determined that a 50 bp region upstream from the start of the histone H4 coding region is the minimal promoter, and a highly conserved 15 bp sequence called the histone motif (him) is essential for its activity. The Giardia core histone genes are constitutively expressed at approximately equivalent levels and their mRNAs are polyadenylated. Competition gel-shift experiments suggest that a factor within the protein complex that binds him may also be a part of the protein complexes that bind other promoter elements described previously in Giardia. Conclusion In contrast to other eukaryotes, the Giardia genome has only a single class of core histone genes that encode replication-independent histones. Our inability to locate a gene encoding the linker histone H1 leads us to speculate that the H1 protein may not be required for the compaction of Giardia's small and gene-rich genome. PMID:17425802
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ploos van Amstel, H.; Reitsma, P.H.; van der Logt, C.P.
The human protein S locus on chromosome 3 consists of two protein S genes, PSα and PSβ. Here the authors report the cloning and characterization of both genes. Fifteen exons of the PSα gene were identified that together code for protein S mRNA as derived from the reported protein S cDNAs. Analysis by primer extension of liver protein S mRNA, however, reveals the presence of two mRNA forms that differ in the length of their 5'-noncoding region. Both transcripts contain a 5'-noncoding region longer than found in the protein S cDNAs. The two products may arise from alternative splicing of an additional intron in this region or from the usage of two start sites for transcription. The intron-exon organization of the PSα gene fully supports the hypothesis that the protein S gene is the product of an evolutional assembling process in which gene modules coding for structural/functional protein units also found in other coagulation proteins have been put upstream of the ancestral gene of a steroid hormone binding protein. The PSβ gene is identified as a pseudogene. It contains a large variety of detrimental aberrations, viz., the absence of exon I, a splice site mutation, three stop codons, and a frame shift mutation. Overall the two genes PSα and PSβ show between their exonic sequences 96.5% homology. Southern analysis of primate DNA showed that the duplication of the ancestral protein S gene has occurred after the branching of the orangutan from the African apes. A nonsense mutation that is present in the pseudogene of man also could be identified in one of the two protein S genes of both chimpanzee and gorilla. This implies that silencing of one of the two protein S genes must have taken place before the divergence of the three African apes.
Evidence for an ergot alkaloid gene cluster in Claviceps purpurea.
Tudzynski, P; Hölter, K; Correia, T; Arntz, C; Grammel, N; Keller, U
1999-02-01
A gene (cpd1) coding for the dimethylallyltryptophan synthase (DMATS) that catalyzes the first specific step in the biosynthesis of ergot alkaloids, was cloned from a strain of Claviceps purpurea that produces alkaloids in axenic culture. The derived gene product (CPD1) shows only 70% similarity to the corresponding gene previously isolated from Claviceps strain ATCC 26245, which is likely to be an isolate of C. fusiformis. Therefore, the related cpd1 most probably represents the first C. purpurea gene coding for an enzymatic step of the alkaloid biosynthetic pathway to be cloned. Analysis of the 3'-flanking region of cpd1 revealed a second, closely linked ergot alkaloid biosynthetic gene named cpps1, which codes for a 356-kDa polypeptide showing significant similarity to fungal modular peptide synthetases. The protein contains three amino acid-activating modules, and in the second module a sequence is found which matches that of an internal peptide (17 amino acids in length) obtained from a tryptic digest of lysergyl peptide synthetase 1 (LPS1) of C. purpurea, thus confirming that cpps1 encodes LPS1. LPS1 activates the three amino acids of the peptide portion of ergot peptide alkaloids during D-lysergyl peptide assembly. Chromosome walking revealed the presence of additional genes upstream of cpd1 which are probably also involved in ergot alkaloid biosynthesis: cpox1 probably codes for an FAD-dependent oxidoreductase (which could represent the chanoclavine cyclase), and a second putative oxidoreductase gene, cpox2, is closely linked to it in inverse orientation. RT-PCR experiments confirm that all four genes are expressed under conditions of peptide alkaloid biosynthesis. These results strongly suggest that at least some genes of ergot alkaloid biosynthesis in C. purpurea are clustered, opening the way for a detailed molecular genetic analysis of the pathway.
Hinney, Anke; Hoch, Anne; Geller, Frank; Schäfer, Helmut; Siegfried, Wolfgang; Goldschmidt, Hanspeter; Remschmidt, Helmut; Hebebrand, Johannes
2002-06-01
Ghrelin induces obesity via central and peripheral mechanisms. Administration of ghrelin leads to increased food intake and decreased fat utilisation in rodents. Ghrelin levels are decreased in obese individuals. Recently, a polymorphism (Arg-51-Gln) within the ghrelin gene (GHRL) was described to be associated with obesity. We screened the GHRL coding region in 215 extremely obese German children and adolescents (study group 1) and 93 normal weight students (study group 2) by single strand conformation polymorphism analysis (SSCP). We found the two previously described single nucleotide polymorphisms (SNP: Arg-51-Gln and Leu-72-Met) in similar frequencies in study groups 1 and 2 (allele frequencies were: 0.019 and 0.016 for the 51-Gln allele and 0.091 and 0.086 for the 72-Met allele, respectively). Hence, we could not confirm the previous finding. Additionally, two novel variants were identified within the coding region: (1) We detected one healthy normal weight individual with a frameshift mutation (2 bp deletion at codon 34). This frameshift mutation affects the coding region of the mature ghrelin. Hence, it is highly likely that the normal weight student is haplo-insufficient for ghrelin. (2) An A to T transversion leads to an amino acid exchange from Gln to Leu at amino acid position 90. The frequency of the 90-Leu allele was significantly higher in the extremely obese children and adolescents (0.063) than in the normal weight students (0.016; nominal p = 0.011). Additionally, we genotyped 134 underweight students and 44 normal weight adults for this SNP. Genotype frequencies were similar in extremely obese children and adolescents, underweight students and normal weight adults (p > 0.8). In conclusion, we identified four sequence variants in the coding region of the ghrelin gene in individuals belonging to different weight extremes. A frameshift mutation was detected in a normal weight individual. None of the variants seems to influence weight regulation.
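Allele frequencies such as those quoted in this abstract are derived from genotype counts in a diploid sample: each homozygous carrier contributes two copies of the allele and each heterozygous carrier one, out of 2N chromosomes. The sketch below shows only that arithmetic; the genotype counts are invented for illustration and are not taken from the study, which reports frequencies rather than counts.

```python
def allele_frequency(n_heterozygous, n_homozygous, n_individuals):
    """Frequency of an allele in a diploid sample.

    Each individual carries two chromosomes, so the denominator is 2 * N.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    if n_heterozygous + n_homozygous > n_individuals:
        raise ValueError("carrier counts exceed sample size")
    allele_copies = n_heterozygous + 2 * n_homozygous
    return allele_copies / (2 * n_individuals)

# Hypothetical sample: 100 individuals, 10 heterozygous and 1 homozygous carrier.
print(allele_frequency(10, 1, 100))  # 12 allele copies out of 200 chromosomes
```

Comparing two such frequencies between groups, as the abstract does, additionally requires a statistical test on the underlying counts, which is outside this sketch.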
Decoding the complex genetic causes of heart diseases using systems biology.
Djordjevic, Djordje; Deshpande, Vinita; Szczesnik, Tomasz; Yang, Andrian; Humphreys, David T; Giannoulatou, Eleni; Ho, Joshua W K
2015-03-01
The pace of disease gene discovery is still much slower than expected, even with the use of cost-effective DNA sequencing and genotyping technologies. It is increasingly clear that many inherited heart diseases have a more complex polygenic aetiology than previously thought. Understanding the role of gene-gene interactions, epigenetics, and non-coding regulatory regions is becoming increasingly critical in predicting the functional consequences of genetic mutations identified by genome-wide association studies and whole-genome or exome sequencing. A systems biology approach is now being widely employed to systematically discover genes that are involved in heart diseases in humans or relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the complex genetic causes of congenital and complex heart diseases. This review summarises state-of-the-art genomic and bioinformatics techniques that are used in accelerating the pace of disease gene discovery in heart diseases. Accompanying this review, we provide an interactive web-resource for systems biology analysis of mammalian heart development and diseases, CardiacCode ( http://CardiacCode.victorchang.edu.au/ ). CardiacCode features a dataset of over 700 pieces of manually curated genetic or molecular perturbation data, which enables the inference of a cardiac-specific GRN of 280 regulatory relationships between 33 regulator genes and 129 target genes. We believe this growing resource will fill an urgent unmet need to fully realise the true potential of predictive and personalised genomic medicine in tackling human heart disease.
Human heavy chain disease protein WIS: implications for the organization of immunoglobulin genes.
Franklin, E C; Prelli, F; Frangione, B
1979-01-01
Protein WIS is a human gamma3 heavy (H) chain disease immunoglobulin variant whose amino acid sequence is most readily interpreted by postulating that three residues of the amino terminus are followed by a deletion of most of the variable (VH) domain, which ends at the variable-constant (VC) joining region. Then there is a stretch of eight residues, three of which are unusual, while the other five have striking homology to the VC junction sequence. This is followed by a second deletion, which ends at the beginning of the quadruplicated hinge region. These findings are consistent with mutations resulting in deletions of most of the gene coding for the V region and CH1 domain followed by splicing at the VC joining region and at the hinge. These structural features fit well the notion of genetic discontinuity between V and C genes and also suggest similar mechanisms of excision and splicing in the interdomain regions of the C gene of the heavy chain. PMID:106391
Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny
Wang, Yuan; Chen, Jing; Jiang, Li-Yun; Qiao, Ge-Xia
2015-01-01
The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenome research and summarizes their main features, including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements are an additional data type for a few families, and most mitogenomes are arranged in the same order as the proposed ancestral insect. We also discuss and provide insights on the phylogenetic analyses of a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource. PMID:26039239
Delimitation of essential genes of cassava latent virus DNA 2.
Etessami, P; Callis, R; Ellwood, S; Stanley, J
1988-01-01
Insertion and deletion mutagenesis of both extended open reading frames (ORFs) of cassava latent virus DNA 2 destroys infectivity. Infectivity is restored by coinoculating constructs that contain single mutations within different ORFs. Although frequent intermolecular recombination produces dominant parental-type virus, mutants can be retained within the virus population indicating that they are competent for replication and suggesting that rescue can occur by complementation of trans acting gene products. By cloning specific fragments into DNA 1 coat protein deletion vectors we have delimited the DNA 2 coding regions and provide substantive evidence that both are essential for virus infection. Although a DNA 2 component is unique to whitefly-transmitted geminiviruses, the results demonstrate that neither coding region is involved solely in insect transmission. The requirement for a bipartite genome for whitefly-transmitted geminiviruses is discussed. PMID:3387209
Hu, Guang-Fu; Liu, Xiang-Jiang; Li, Zhong; Liang, Hong-Wei; Hu, Shao-Na; Zou, Gui-Wei
2016-01-01
The complete mitochondrial genomes of Xingguo red carp (Cyprinus carpio var. singuonensis) and purse red carp (Cyprinus carpio var. wuyuanensis) were sequenced. Comparison of these two mitochondrial genomes revealed that the mtDNAs of these two common carp varieties were remarkably similar in genome length, gene order and content, and AT content. However, the two mitochondrial genomes presented here differed at 39 sites overall: 2 site differences were located in rRNAs, 3 in tRNAs, 3 in the control region, and 31 in protein-coding genes. The 31 variable bases in the protein-coding regions of the two varieties' mitochondrial sequences led to three variable amino acids, which were mainly located in the proteins ND5 and ND4.
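A per-region tally of site differences like the one reported here (rRNAs, tRNAs, control region, protein-coding genes) can be produced by walking two aligned sequences and assigning each mismatch to an annotated interval. This is a minimal sketch that assumes the two sequences are already aligned to equal length; the region names, coordinates and toy sequences are hypothetical.

```python
def count_site_differences(seq_a, seq_b, regions):
    """Count mismatching positions between two aligned sequences, per region.

    regions is a list of (name, start, end) with 0-based, half-open coordinates.
    Mismatches outside every region are counted under "unannotated".
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = {name: 0 for name, _, _ in regions}
    counts["unannotated"] = 0
    for i, (a, b) in enumerate(zip(seq_a, seq_b)):
        if a == b:
            continue
        for name, start, end in regions:
            if start <= i < end:
                counts[name] += 1
                break
        else:
            # No region contained this position.
            counts["unannotated"] += 1
    return counts

# Toy aligned sequences with mismatches at positions 3 and 8.
a = "ACGTACGTAC"
b = "ACGAACGTTC"
print(count_site_differences(a, b, [("rRNA", 0, 4), ("PCG", 4, 10)]))
# → {'rRNA': 1, 'PCG': 1, 'unannotated': 0}
```

Summing the per-region counts gives the overall number of differing sites, which is how a breakdown such as 2 + 3 + 3 + 31 = 39 can be cross-checked.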
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data
Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick
2011-01-01
With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
Extension of the XGC code for global gyrokinetic simulations in stellarator geometry
NASA Astrophysics Data System (ADS)
Cole, Michael; Moritaka, Toseo; White, Roscoe; Hager, Robert; Ku, Seung-Hoe; Chang, Choong-Seock
2017-10-01
In this work, the total-f, gyrokinetic particle-in-cell code XGC is extended to treat stellarator geometries. Improvements to meshing tools and the code itself have enabled the first physics studies, including single particle tracing and flux surface mapping in the magnetic geometry of the heliotron LHD and quasi-isodynamic stellarator Wendelstein 7-X. These have provided the first successful test cases for our approach. XGC is uniquely placed to model the complex edge physics of stellarators. A roadmap to such a global confinement modeling capability will be presented. Single particle studies will include the physics of energetic particles' global stochastic motions and their effect on confinement. Good confinement of energetic particles is vital for a successful stellarator reactor design. These results can be compared in the core region with those of other codes, such as ORBIT3d. In subsequent work, neoclassical transport and turbulence can then be considered and compared to results from codes such as EUTERPE and GENE. After sufficient verification in the core region, XGC will move into the stellarator edge region including the material wall and neutral particle recycling.
Ito, M; Mori, Y; Oiso, Y; Saito, H
1991-01-01
To elucidate the molecular mechanism of familial central diabetes insipidus (FDI), we sequenced the arginine vasopressin-neurophysin II (AVP-NPII) gene in 2 patients belonging to a pedigree that is consistent with an autosomal dominant mode of inheritance. 10 patients with idiopathic central diabetes insipidus (IDI) and 5 normals were also studied. The AVP-NPII gene, located on chromosome 20, consists of three exons that encode putative signal peptide, AVP, NPII, and glycoprotein. Using polymerase chain reaction, fragments including the promoter region and all coding regions were amplified from genomic DNA and subjected to direct sequencing. Sequences of 10 patients with IDI were identical with those of normals, while in 2 patients with FDI, a single base substitution was detected in one of two alleles of the AVP-NPII gene, indicating they were heterozygotes for this mutation. It was a G→A transition at nucleotide position 1859 in the second exon, resulting in a substitution of Gly for Ser at amino acid position 57 in the NPII moiety. It was speculated that the mutated AVP-NPII precursor or the mutated NPII molecule, through their conformational changes, might be responsible for AVP deficiency. PMID:1840604
Adaptive Evolution Is Substantially Impeded by Hill-Robertson Interference in Drosophila.
Castellano, David; Coronado-Zamora, Marta; Campos, Jose L; Barbadilla, Antonio; Eyre-Walker, Adam
2016-02-01
Hill-Robertson interference (HRi) is expected to reduce the efficiency of natural selection when two or more linked selected sites do not segregate freely, but no attempt has been made so far to quantify the overall impact of HRi on the rate of adaptive evolution for any given genome. In this work, we estimate how much HRi impedes the rate of adaptive evolution in the coding genome of Drosophila melanogaster. We compiled a data set of 6,141 autosomal protein-coding genes from Drosophila, from which polymorphism levels in D. melanogaster and divergence out to D. yakuba were estimated. The rate of adaptive evolution was calculated using a derivative of the McDonald-Kreitman test that controls for slightly deleterious mutations. We find that the rate of adaptive amino acid substitution at a given position of the genome is positively correlated to both the rate of recombination and the mutation rate, and negatively correlated to the gene density of the region. These correlations are robust to controlling for each other, for synonymous codon bias and for gene functions related to immune response and testes. We show that HRi diminishes the rate of adaptive evolution by approximately 27%. Interestingly, genes with low mutation rates embedded in gene poor regions lose approximately 17% of their adaptive substitutions whereas genes with high mutation rates embedded in gene rich regions lose approximately 60%. We conclude that HRi hampers the rate of adaptive evolution in Drosophila and that the variation in recombination, mutation, and gene density along the genome affects the HRi effect. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
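As a rough illustration of the kind of quantity a McDonald-Kreitman-type analysis estimates, the sketch below computes the classic proportion of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps). This is only the basic estimator, not the derivative controlling for slightly deleterious mutations that the abstract refers to. The function name `mk_alpha` and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic McDonald-Kreitman estimate of the proportion of adaptive
    nonsynonymous substitutions: alpha = 1 - (ds * pn) / (dn * ps).

    dn, ds: nonsynonymous / synonymous divergence counts
    pn, ps: nonsynonymous / synonymous polymorphism counts
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only (not data from the study)
alpha = mk_alpha(dn=40, ds=100, pn=20, ps=100)
print(f"alpha = {alpha:.3f}")
```

With these made-up counts, half of the nonsynonymous substitutions would be attributed to adaptive evolution; real analyses typically pool many genes and filter low-frequency polymorphisms before applying such an estimator.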
Luo, Jing; Hou, Bei-Wei; Niu, Zhi-Tao; Liu, Wei; Xue, Qing-Yun; Ding, Xiao-Yu
2014-01-01
The orchid family Orchidaceae is one of the largest angiosperm families, including many species of important economic value. While chloroplast genomes are very informative for systematics and species identification, there is very limited information available on chloroplast genomes in the Orchidaceae. Here, we report the complete chloroplast genomes of the medicinal plant Dendrobium officinale and the ornamental orchid Cypripedium macranthos, demonstrating their gene content and order and potential RNA editing sites. The chloroplast genomes of the above two species and five known photosynthetic orchids showed similarities in structure as well as gene order and content, but differences in the organization of the inverted repeat/small single-copy junction and ndh genes. The organization of the inverted repeat/small single-copy junctions in the chloroplast genomes of these orchids was classified into four types; we propose that inverted repeats flanking the small single-copy region underwent expansion or contraction among Orchidaceae. The AT-rich regions of the ycf1 gene in orchids could be linked to the recombination of inverted repeat/small single-copy junctions. Closely related orchid species displayed similar patterns of variation in ndh gene content. Furthermore, fifteen highly divergent protein-coding genes were identified, which are useful for phylogenetic analyses in orchids. To test the efficiency of these genes serving as markers in phylogenetic analyses, coding regions of four genes (accD, ccsA, matK, and ycf1) were used as a case study to construct phylogenetic trees in the subfamily Epidendroideae. High support was obtained for placement of previously unlocated subtribes Collabiinae and Dendrobiinae in the subfamily Epidendroideae. Our findings expand understanding of the diversity of orchid chloroplast genomes and provide a reference for study of the molecular systematics of this family. PMID:24911363
Methylation of the chicken vitellogenin gene: influence of estradiol administration.
Meijlink, F C; Philipsen, J N; Gruber, M; Ab, G
1983-01-01
The degree of methylation of the chicken vitellogenin gene has been investigated. Upon induction by administration of estradiol to a rooster, methyl groups at specific sites near the 5'-end of the gene are eliminated. The process of demethylation is slower than the activation of the gene. Demethylation is therefore probably not a prerequisite to gene transcription. At least two other sites in the coding region of the gene are methylated in the liver of estrogenized roosters, but not in the liver of a laying hen, where the gene is naturally active. PMID:6298743
From Genomes to Protein Models and Back
NASA Astrophysics Data System (ADS)
Tramontano, Anna; Giorgetti, Alejandro; Orsini, Massimiliano; Raimondo, Domenico
2007-12-01
The alternative splicing mechanism allows genes to generate more than one product. When the splicing events occur within protein coding regions they can modify the biological function of the protein. Alternative splicing has been suggested as one way for explaining the discrepancy between the number of human genes and functional complexity. We analysed the putative structure of the alternatively spliced gene products annotated in the ENCODE pilot project and discovered that many of the potential alternative gene products will be unlikely to produce stable functional proteins.
The complete mitochondrial genome sequence of the maned wolf (Chrysocyon brachyurus).
Zhao, Chao; Yang, Xiufeng; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2016-01-01
In this study, the complete mitochondrial genome of the maned wolf (Chrysocyon brachyurus), the only species in the genus Chrysocyon, was sequenced and reported for the first time using blood samples obtained from a female individual in Shanghai Zoo, China. Sequence analysis showed that the genome structure was in accordance with other Canidae species and that it contained a 12S rRNA gene, a 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region.
Vector systems for prenatal gene therapy: principles of retrovirus vector design and production.
Howe, Steven J; Chandrashekran, Anil
2012-01-01
Vectors derived from the Retroviridae family have several attributes required for successful gene delivery. Retroviral vectors have an adequate payload size for the coding regions of most genes; they are safe to handle and simple to produce. These vectors can be manipulated to target different cell types with low immunogenicity and can permanently insert genetic information into the host cells' genome. Retroviral vectors have been used in gene therapy clinical trials and successfully applied experimentally in vitro, in vivo, and in utero.
The mitochondrial genome of the Arizona Snowfly Mesocapnia arizonensis (Plecoptera, Capniidae).
Elbrecht, Vasco; Leese, Florian
2016-09-01
We assembled the mitochondrial genome of the capniid stonefly Mesocapnia arizonensis (Baumann & Gaufin, 1969) using Illumina HiSeq sequence data. The recovered mitogenome is 14,921 bp in length and includes 13 protein-coding genes, 2 ribosomal RNA genes and 22 transfer RNA genes. The control region could only be assembled partially. Gene order resembles that of basal arthropods. This is the first partial mitogenome sequence for the stonefly superfamily group Euholognatha and will be useful in future phylogenetic analyses.
The complete mitochondrial genome of the Jacobin pigeon (Columba livia breed Jacobin).
He, Wen-Xiao; Jia, Jin-Feng
2015-06-01
The Jacobin is a breed of fancy pigeon developed over many years of selective breeding that originated in Asia. In the present work, we report the complete mitochondrial genome sequence of the Jacobin pigeon for the first time. The total length of the mitogenome was 17,245 bp with a base composition of 30.18% A, 23.98% T, 31.88% C, and 13.96% G, and an A-T-rich (54.17%) feature was detected. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region. The arrangement of all genes was identical to that of the typical pigeon mitochondrial genome. The complete mitochondrial genome sequence of the Jacobin pigeon will serve as an important data set of germplasm resources for further study.
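Base-composition figures like those reported above can be derived from a sequence by simple counting. The following is a minimal sketch, assuming the sequence is available as a plain string; the helper names `base_composition` and `at_content` are illustrative and do not come from the cited work, and ambiguous symbols such as N are simply ignored.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide string.

    Characters other than A, C, G, T (e.g. N) are ignored (an assumption
    made for this sketch).
    """
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def at_content(seq):
    """Return the combined A + T percentage of a nucleotide string."""
    comp = base_composition(seq)
    return comp["A"] + comp["T"]


# Toy sequence, for illustration only
print(base_composition("AATTCGNN"))
print(round(at_content("AATTCGNN"), 2))
```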
Zhang, Yulong; Shao, Dandan; Cai, Miao; Yin, Hong; Zhang, Daochuan
2016-01-01
The complete mitochondrial genome of Gryllotalpa unispina was 15,513 bp in length and contained 70.9% AT. All G. unispina protein-coding sequences except nad2 started with a typical ATN codon. The usual termination codon (TAA) and incomplete stop codons (T) were found in the 13 protein-coding genes. All tRNA genes folded into the typical cloverleaf secondary structure, except trnS(AGN), which lacked the dihydrouridine arm. The sizes of the large and small ribosomal RNA genes were 1245 and 725 bp, respectively. The A + T-rich region was 917 bp in length with an A + T content of 76.8%. The orientation and gene order of the G. unispina mitogenome were identical to those of G. orientalis and G. pluvialis; there was no "DK rearrangement", which has been widely reported in Caelifera.
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. 
Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
PMID:26716693
Zayed, Amro; Whitfield, Charles W.
2008-01-01
Apis mellifera originated in Africa and extended its range into Eurasia in two or more ancient expansions. In 1956, honey bees of African origin were introduced into South America, their descendents admixing with previously introduced European bees, giving rise to the highly invasive and economically devastating “Africanized” honey bee. Here we ask whether the honey bee's out-of-Africa expansions, both ancient and recent (invasive), were associated with a genome-wide signature of positive selection, detected by contrasting genetic differentiation estimates (FST) between coding and noncoding SNPs. In native populations, SNPs in protein-coding regions had significantly higher FST estimates than those in noncoding regions, indicating adaptive evolution in the genome driven by positive selection. This signal of selection was associated with the expansion of honey bees from Africa into Western and Northern Europe, perhaps reflecting adaptation to temperate environments. We estimate that positive selection acted on a minimum of 852–1,371 genes or ≈10% of the bee's coding genome. We also detected positive selection associated with the invasion of African-derived honey bees in the New World. We found that introgression of European-derived alleles into Africanized bees was significantly greater for coding than noncoding regions. Our findings demonstrate that Africanized bees exploited the genetic diversity present from preexisting introductions in an adaptive way. Finally, we found a significant negative correlation between FST estimates and the local GC content surrounding coding SNPs, suggesting that AT-rich genes play an important role in adaptive evolution in the honey bee. PMID:18299560
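To make the FST contrast concrete, here is a minimal sketch of Wright's fixation index for a single biallelic SNP, FST = (HT - HS) / HT, computed from per-population allele frequencies. It assumes equally weighted populations and ignores sampling corrections; the abstract does not state which estimator the authors used, so this is an illustration of the general quantity rather than their method. The function name and frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic SNP.

    freqs: allele frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    if len(freqs) < 2:
        raise ValueError("need allele frequencies for at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0.0:
        return 0.0  # SNP is monomorphic overall: no differentiation to measure
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)  # mean within-population
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations
print(round(fst_biallelic([0.2, 0.8]), 2))
```

Comparing the distribution of such per-SNP values between coding and noncoding SNPs is the kind of contrast the abstract describes.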
Zhao, J.; Chen, Y. H.; Kwan, H. S.
2000-01-01
The complete nucleotide sequence of putative glucoamylase gene gla1 from the basidiomycetous fungus Lentinula edodes strain L54 is reported. The coding region of the genomic glucoamylase sequence, which is preceded by eukaryotic promoter elements CAAT and TATA, spans 2,076 bp. The gla1 gene sequence codes for a putative polypeptide of 571 amino acids and is interrupted by seven introns. The open reading frame sequence of the gla1 gene shows strong homology with those of other fungal glucoamylase genes and encodes a protein with an N-terminal catalytic domain and a C-terminal starch-binding domain. The similarity between the Gla1 protein and other fungal glucoamylases is from 45 to 61%, with the region of highest conservation found in catalytic domains and starch-binding domains. We compared the kinetics of glucoamylase activity and levels of gene expression in L. edodes strain L54 grown on different carbon sources (glucose, starch, cellulose, and potato extract) and in various developmental stages (mycelium growth, primordium appearance, and fruiting body formation). Quantitative reverse transcription PCR utilizing pairs of primers specific for gla1 gene expression shows that expression of gla1 was induced by starch and increased during the process of fruiting body formation, which indicates that glucoamylases may play an important role in the morphogenesis of the basidiomycetous fungus. PMID:10831434
Nicolas, Laura; Cols, Montserrat; Choi, Jee Eun; Chaudhuri, Jayanta; Vuong, Bao
2018-01-01
Adaptive immune responses require the generation of a diverse repertoire of immunoglobulins (Igs) that can recognize and neutralize a seemingly infinite number of antigens. V(D)J recombination creates the primary Ig repertoire, which subsequently is modified by somatic hypermutation (SHM) and class switch recombination (CSR). SHM promotes Ig affinity maturation whereas CSR alters the effector function of the Ig. Both SHM and CSR require activation-induced cytidine deaminase (AID) to produce dU:dG mismatches in the Ig locus that are transformed into untemplated mutations in variable coding segments during SHM or DNA double-strand breaks (DSBs) in switch regions during CSR. Within the Ig locus, DNA repair pathways are diverted from their canonical role in maintaining genomic integrity to permit AID-directed mutation and deletion of gene coding segments. Recently identified proteins, genes, and regulatory networks have provided new insights into the temporally and spatially coordinated molecular interactions that control the formation and repair of DSBs within the Ig locus. Unravelling the genetic program that allows B cells to selectively alter the Ig coding regions while protecting non-Ig genes from DNA damage advances our understanding of the molecular processes that maintain genomic integrity as well as humoral immunity. PMID:29744038
Analysis of CHRNA7 rare variants in autism spectrum disorder susceptibility.
Bacchelli, Elena; Battaglia, Agatino; Cameli, Cinzia; Lomartire, Silvia; Tancredi, Raffaella; Thomson, Susanne; Sutcliffe, James S; Maestrini, Elena
2015-04-01
Chromosome 15q13.3 recurrent microdeletions are causally associated with a wide range of phenotypes, including autism spectrum disorder (ASD), seizures, intellectual disability, and other psychiatric conditions. Whether the reciprocal microduplication is pathogenic is less certain. CHRNA7, encoding for the alpha7 subunit of the neuronal nicotinic acetylcholine receptor, is considered the likely culprit gene in mediating neurological phenotypes in 15q13.3 deletion cases. To assess if CHRNA7 rare variants confer risk to ASD, we performed copy number variant analysis and Sanger sequencing of the CHRNA7 coding sequence in a sample of 135 ASD cases. Sequence variation in this gene remains largely unexplored, given the existence of a fusion gene, CHRFAM7A, which includes a nearly identical partial duplication of CHRNA7. Hence, attempts to sequence coding exons must distinguish between CHRNA7 and CHRFAM7A, making next-generation sequencing approaches unreliable for this purpose. A CHRNA7 microduplication was detected in a patient with autism and moderate cognitive impairment; while no rare damaging variants were identified in the coding region, we detected rare variants in the promoter region, previously described to functionally reduce transcription. This study represents the first sequence variant analysis of CHRNA7 in a sample of idiopathic autism. © 2015 Wiley Periodicals, Inc.
Küpper, Clemens; Burke, Terry; Lank, David B.
2015-01-01
Sequence variation in the melanocortin-1 receptor (MC1R) gene explains color morph variation in several species of birds and mammals. Ruffs (Philomachus pugnax) exhibit major dark/light color differences in melanin-based male breeding plumage which is closely associated with alternative reproductive behavior. A previous study identified a microsatellite marker (Ppu020) near the MC1R locus associated with the presence/absence of ornamental plumage. We investigated whether coding sequence variation in the MC1R gene explains major dark/light plumage color variation and/or the presence/absence of ornamental plumage in ruffs. Among 821 bp of the MC1R coding region from 44 male ruffs we found 3 single nucleotide polymorphisms, representing 1 nonsynonymous and 2 synonymous substitutions. None were associated with major dark/light color differences or the presence/absence of ornamental plumage. At all amino acid sites known to be functionally important in other avian species with dark/light plumage color variation, ruffs were either monomorphic or the shared polymorphism did not coincide with color morph. Neither ornamental plumage color differences nor the presence/absence of ornamental plumage in ruffs are likely to be caused entirely by amino acid variation within the coding regions of the MC1R locus. Regulatory elements and structural variation at other loci may be involved in melanin expression and contribute to the extreme plumage polymorphism observed in this species. PMID:25534935
Origins of De Novo Genes in Human and Chimpanzee.
Ruiz-Orera, Jorge; Hernandez-Rodriguez, Jessica; Chiva, Cristina; Sabidó, Eduard; Kondova, Ivanela; Bontrop, Ronald; Marqués-Bonet, Tomàs; Albà, M Mar
2015-12-01
The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence and underlying mechanisms of de novo gene emergence are still unclear. In order to obtain a comprehensive view of this process, we have performed in-depth sequencing of the transcriptomes of four mammalian species--human, chimpanzee, macaque, and mouse--and subsequently compared the assembled transcripts and the corresponding syntenic genomic regions. This has resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of de novo genes which have evidence of protein translation. Taken together, the data support a model in which frequently-occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.
PMID:26720152
van den Berg, L; Kwant, L; Hestand, M S; van Oost, B A; Leegwater, P A J
2005-01-01
Aggressive behavior is the most frequently encountered behavioral problem in dogs. Abnormalities in brain serotonin metabolism have been described in aggressive dogs. We studied canine serotonergic genes to investigate genetic factors underlying canine aggression. Here, we describe the characterization of three genes of the canine serotonergic system: the serotonin receptor 1A and 2A genes (htr1A and htr2A) and the serotonin transporter gene (slc6A4). We isolated canine bacterial artificial chromosome clones containing these genes and designed oligonucleotides for genomic sequencing of coding regions and intron-exon boundaries. Golden retrievers were analyzed for DNA sequence variations. We found two nonsynonymous single nucleotide polymorphisms (SNPs) in the coding sequence of htr1A; one SNP close to a splice site in htr2A; and two SNPs in slc6A4, one in the coding sequence and one close to a splice site. In addition, we identified a polymorphic microsatellite marker for each gene. Htr1A is a strong candidate for involvement in the domestication of the dog. We genotyped the htr1A SNPs in 41 dogs of seven breeds with diverse behavioral characteristics. At least three SNP haplotypes were found. Our results do not support involvement of the gene in domestication.
Maia, Rafaela M; Valente, Valeria; Cunha, Marco A V; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew J G; Monesi, Nadia; Ramos, Ricardo G P; Espreafico, Enilza M; Paçó-Larson, Maria L
2007-07-24
The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~14,000), indicating that mechanisms generating transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of these are estimated to be missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. In agreement with the proposal that this locus is co-regulated in response to microorganism infection, we show here that SP212 is also up-regulated upon injury. Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data.
Maia, Rafaela M; Valente, Valeria; Cunha, Marco AV; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew JG; Monesi, Nadia; Ramos, Ricardo GP; Espreafico, Enilza M; Paçó-Larson, Maria L
2007-01-01
Background: The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~14,000), indicating that mechanisms generating transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of these are estimated to be missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Results: Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. In agreement with the proposal that this locus is co-regulated in response to microorganism infection, we show here that SP212 is also up-regulated upon injury. Conclusion: Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data. PMID:17650329
The analysis of APOL1 genetic variation and haplotype diversity provided by 1000 Genomes project.
Peng, Ting; Wang, Li; Li, Guisen
2017-08-11
APOL1 gene variants have been shown to be associated with an increased risk of multiple diseases, particularly in African Americans, but not in Caucasians and Asians. In this study, we explored the single nucleotide polymorphism (SNP) and haplotype diversity of the APOL1 gene in different populations using data from the 1000 Genomes Project. Variants of the APOL1 gene in the 1000 Genomes Project were obtained, and SNPs located in the regulatory or coding regions were selected for genetic variation analysis. A total of 2504 individuals from 26 populations were classified into four groups: African, European, Asian and admixed populations. Tag SNPs were selected to evaluate haplotype diversity in the four groups with the HaploStats software. The APOL1 gene is surrounded by some of the most polymorphic genes in the human genome, and variation in APOL1 was common, with up to 613 SNPs reported by the 1000 Genomes Project, 99 of them (16.2%) with a minor allele frequency (MAF) ≥ 1%. There were 79 SNPs in the URR and 92 SNPs in the 3'UTR; 12 SNPs in the URR and 24 SNPs in the 3'UTR were common variants with MAF ≥ 1%. Notably, URR-1 was present at lower frequencies in European populations, while the other three haplotypes showed the opposite pattern. The 3'UTR contains several high-frequency variable sites in a short segment, and its haplotype frequencies differed significantly among populations (P < 0.01): UTR-1 and UTR-5 were much more frequent in the African population, while UTR-2, UTR-3 and UTR-4 were much less frequent. In the APOL1 coding region, the two higher-frequency SNPs of G1 lower the frequency of haplotype H-1 when all populations are pooled, and the two G1 mutations widen the diversity among the four populations (P1 = 3.33E-4 vs P2 = 3.61E-30). The distributions of APOL1 gene variants and haplotypes differed significantly among populations in both regulatory and coding regions. These findings could provide clues for future genetic studies of APOL1-related diseases.
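The common-variant counts in the abstract above rest on a simple frequency threshold. The following is a minimal sketch of that filter, assuming per-SNP minor allele frequencies are already available as plain numbers; the function name and the toy frequencies are hypothetical and are not taken from the study.

```python
def common_variants(mafs, threshold=0.01):
    """Return (count, fraction) of variants whose MAF is >= threshold.

    `mafs` is a list of minor allele frequencies in the range 0..0.5.
    """
    common = [m for m in mafs if m >= threshold]
    return len(common), len(common) / len(mafs)


# Toy input (hypothetical values, not data from the study).
count, fraction = common_variants([0.20, 0.005, 0.01, 0.0001])
print(count, fraction)

# Percentage implied by the counts reported in the abstract (99 of 613 SNPs).
reported_percent = round(99 / 613 * 100, 1)
print(reported_percent)
```

The same threshold, applied separately to the URR and 3'UTR subsets, would give the per-region counts of common variants.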
Unraveling transcriptional control and cis-regulatory codes using the software suite GeneACT
Cheung, Tom Hiu; Kwan, Yin Lam; Hamady, Micah; Liu, Xuedong
2006-01-01
Deciphering gene regulatory networks requires the systematic identification of functional cis-acting regulatory elements. We present a suite of web-based bioinformatics tools, called GeneACT, that can rapidly detect evolutionarily conserved transcription factor binding sites or microRNA target sites that are either unique or over-represented in differentially expressed genes from DNA microarray data. GeneACT provides graphic visualization and extraction of common regulatory sequence elements in the promoters and 3'-untranslated regions that are conserved across multiple mammalian species. PMID:17064417
Alterations of CHEK2 forkhead-associated domain increase the risk of Hodgkin lymphoma.
Havranek, O; Spacek, M; Hubacek, P; Mocikova, H; Markova, J; Trneny, M; Kleibl, Z
2011-01-01
The checkpoint kinase 2 gene (CHEK2) codes for an important mediator of the DNA damage response pathway. Mutations in the CHEK2 gene increase the risk of several cancer types; however, their role in Hodgkin lymphoma (HL) has not been studied so far. The most frequent CHEK2 alterations (including c.470T>C; p.I157T) cluster in the forkhead-associated (FHA) domain-coding region of the CHEK2 gene. We performed mutation analysis of the CHEK2 gene segment coding for the FHA domain using denaturing high-performance liquid chromatography in 298 HL patients and analyzed the impact of characterized CHEK2 gene variants on the risk of HL development and progression-free survival (PFS). The overall frequency of CHEK2 alterations was significantly higher in HL patients (17/298; 5.7%) compared to the previously analyzed non-cancer controls (19/683; 2.8%; p = 0.04). Presence of any alteration within the analyzed region of the CHEK2 gene was associated with increased risk of HL development (OR = 2.11; 95% CI = 1.08 - 4.13; p = 0.04). The most frequent I157T mutation was found in 4.0% of HL patients and 2.5% of controls (p = 0.22); however, the frequency of 5 other alterations (excluding I157T) was significantly higher in HL cases and associated with increased risk of HL development (OR = 5.81; 95% CI = 1.12 - 30.12; p = 0.03). PFS in HL patients did not differ between CHEK2 mutation carriers and non-carriers. The predominant I157T mutation together with other alterations in its proximity represents a moderate genetic predisposition factor increasing the risk of HL development.
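The overall odds ratio in the abstract above can be recomputed from the reported carrier counts (17 of 298 patients, 19 of 683 controls). The sketch below assumes a Wald interval on the log odds ratio (Woolf method); the abstract does not state which method the authors used, so this is an illustration rather than their procedure, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(case_pos, case_total, ctrl_pos, ctrl_total, z=1.96):
    """Odds ratio with a Wald confidence interval on the log scale."""
    a, b = case_pos, case_total - case_pos   # carriers / non-carriers, cases
    c, d = ctrl_pos, ctrl_total - ctrl_pos   # carriers / non-carriers, controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts taken from the abstract: 17/298 HL patients, 19/683 controls.
odds_ratio, lower, upper = odds_ratio_ci(17, 298, 19, 683)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2))
```

Under this assumption the result rounds to the values given in the abstract (OR = 2.11; 95% CI = 1.08 - 4.13).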
Wang, Aide; Yamakake, Junko; Kudo, Hisayuki; Wakasa, Yuhya; Hatsuyama, Yoshimichi; Igarashi, Megumi; Kasai, Atsushi; Li, Tianzhong; Harada, Takeo
2009-09-01
Expression of MdACS1, coding for 1-aminocyclopropane-1-carboxylate synthase (ACS), parallels the level of ethylene production in ripening apple (Malus domestica) fruit. Here we show that expression of another ripening-specific ACS gene (MdACS3) precedes the initiation of MdACS1 expression by approximately 3 weeks; MdACS3 expression then gradually decreases as MdACS1 expression increases. Because MdACS3 expression continues in ripening fruit treated with 1-methylcyclopropene, its transcription appears to be regulated by a negative feedback mechanism. Three genes in the MdACS3 family (a, b, and c) were isolated from a genomic library, but two of them (MdACS3b and MdACS3c) possess a 333-bp transposon-like insertion in their 5' flanking region that may prevent transcription of these genes during ripening. A single nucleotide polymorphism in the coding region of MdACS3a results in an amino acid substitution (glycine-289 → valine) in the active site that inactivates the enzyme. Furthermore, another null allele of MdACS3a, Mdacs3a, showing no ability to be transcribed, was found by DNA sequencing. Apple cultivars homozygous or heterozygous for both null allelotypes showed no or very low expression of ripening-related genes and maintained fruit firmness. These results suggest that MdACS3a plays a crucial role in regulation of fruit ripening in apple, and is a possible determinant of ethylene production and shelf life in apple fruit.
HLA-E regulatory and coding region variability and haplotypes in a Brazilian population sample.
Ramalho, Jaqueline; Veiga-Castelli, Luciana C; Donadi, Eduardo A; Mendes-Junior, Celso T; Castelli, Erick C
2017-11-01
The HLA-E gene is characterized by low but wide expression in different tissues. HLA-E is considered a conserved gene, being one of the least polymorphic class I HLA genes. The HLA-E molecule interacts with Natural Killer cell receptors and T lymphocyte receptors, and might activate or inhibit immune responses depending on the peptide associated with HLA-E and the receptors with which HLA-E interacts. Variable sites within the HLA-E regulatory and coding segments may influence the gene function by modifying its expression pattern or encoded molecule, thus influencing its interaction with receptors and the peptide. Here we propose an approach to evaluate the gene structure, haplotype pattern and the complete HLA-E variability, including regulatory (promoter and 3'UTR) and coding segments (with introns), by using massively parallel sequencing. We investigated the variability of 420 samples from a highly admixed population, Brazilians, by using this approach. Considering a segment of about 7 kb, 63 variable sites were detected, arranged into 75 extended haplotypes. We detected 37 different promoter sequences (but few frequent ones), 27 different coding sequences (15 representing new HLA-E alleles) and 12 haplotypes at the 3'UTR segment, two of them presenting a summed frequency of 90%. Despite the number of coding alleles, they encode mainly two different full-length molecules, known as E*01:01 and E*01:03, which correspond to about 90% of all. In addition, unlike what has been previously observed for other non-classical HLA genes, the relationship among the HLA-E promoter, coding and 3'UTR haplotypes is not straightforward because the same promoter and 3'UTR haplotypes were often associated with different HLA-E coding haplotypes. These data reinforce the presence of only two main full-length HLA-E molecules encoded by the many HLA-E alleles detected in our population sample. In addition, these data indicate that the distal HLA-E promoter is by far the most variable segment. Further analyses involving the binding of transcription factors and non-coding RNAs, as well as the HLA-E expression in different tissues, are necessary to evaluate whether these variable sites at regulatory segments (or even at the coding sequence) may influence the gene expression profile. Copyright © 2017 Elsevier Ltd. All rights reserved.
Promoter variants of Xa23 alleles affect bacterial blight resistance and evolutionary pattern
Xu, Feifei; Tang, Yongchao; Gao, Ying
2017-01-01
Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is the most important bacterial disease in rice (Oryza sativa L.). Our previous studies have revealed that the bacterial blight resistance gene Xa23 from wild rice O. rufipogon Griff. confers the broadest-spectrum resistance against all the naturally occurring Xoo races. As a novel executor R gene, Xa23 is transcriptionally activated by the bacterial avirulence (Avr) protein AvrXa23 via binding to a 28-bp DNA element (EBEAvrXa23) in the promoter region. So far, the evolutionary mechanism of Xa23 remains to be elucidated. Here, a rice germplasm collection of 97 accessions, including 29 rice cultivars (indica and japonica) and 68 wild relatives, was used to analyze the evolution, phylogeographic relationship and association of Xa23 alleles with bacterial blight resistance. All the ~473 bp DNA fragments consisting of promoter and coding regions of Xa23 alleles in the germplasm accessions were PCR-amplified and sequenced, and nine single nucleotide polymorphisms (SNPs) were detected in the promoter regions (~131 bp sequence upstream from the start codon ATG) of Xa23/xa23 alleles while only two SNPs were found in the coding regions. The SNPs in the promoter regions formed 5 haplotypes (Pro-A, B, C, D, E) which showed no significant difference in geographic distribution among these 97 rice accessions. However, haplotype association analysis indicated that Pro-A is the most favored haplotype for bacterial blight resistance. Moreover, SNP changes among the 5 haplotypes were mostly located in the EBE/ebe regions (EBEAvrXa23 and corresponding ebes located in promoters of xa23 alleles), confirming that the EBE region is the key factor conferring bacterial blight resistance by altering gene expression. Polymorphism analysis and neutrality tests implied that Xa23 had undergone a bottleneck effect, and no selection on Xa23 was detected in cultivated rice. In addition, the Xa23 coding region was found to be highly conserved in the Oryza genus but absent in other plant species by searching the plant database, suggesting that Xa23 originated along with the diversification of the Oryza genus from the grass family during evolution. This research offers a potential for flexible use of novel Xa23 alleles in rice breeding programs and provides a model for evolution analysis of other executor R genes. PMID:28982185
Promoter variants of Xa23 alleles affect bacterial blight resistance and evolutionary pattern.
Cui, Hua; Wang, Chunlian; Qin, Tengfei; Xu, Feifei; Tang, Yongchao; Gao, Ying; Zhao, Kaijun
2017-01-01
Bacterial blight, caused by Xanthomonas oryzae pv. oryzae (Xoo), is the most important bacterial disease in rice (Oryza sativa L.). Our previous studies have revealed that the bacterial blight resistance gene Xa23 from wild rice O. rufipogon Griff. confers the broadest-spectrum resistance against all the naturally occurring Xoo races. As a novel executor R gene, Xa23 is transcriptionally activated by the bacterial avirulence (Avr) protein AvrXa23 via binding to a 28-bp DNA element (EBEAvrXa23) in the promoter region. So far, the evolutionary mechanism of Xa23 remains to be elucidated. Here, a rice germplasm collection of 97 accessions, including 29 rice cultivars (indica and japonica) and 68 wild relatives, was used to analyze the evolution, phylogeographic relationship and association of Xa23 alleles with bacterial blight resistance. All the ~473 bp DNA fragments consisting of promoter and coding regions of Xa23 alleles in the germplasm accessions were PCR-amplified and sequenced, and nine single nucleotide polymorphisms (SNPs) were detected in the promoter regions (~131 bp sequence upstream from the start codon ATG) of Xa23/xa23 alleles while only two SNPs were found in the coding regions. The SNPs in the promoter regions formed 5 haplotypes (Pro-A, B, C, D, E) which showed no significant difference in geographic distribution among these 97 rice accessions. However, haplotype association analysis indicated that Pro-A is the most favored haplotype for bacterial blight resistance. Moreover, SNP changes among the 5 haplotypes were mostly located in the EBE/ebe regions (EBEAvrXa23 and corresponding ebes located in promoters of xa23 alleles), confirming that the EBE region is the key factor conferring bacterial blight resistance by altering gene expression. Polymorphism analysis and neutrality tests implied that Xa23 had undergone a bottleneck effect, and no selection on Xa23 was detected in cultivated rice. In addition, the Xa23 coding region was found to be highly conserved in the Oryza genus but absent in other plant species by searching the plant database, suggesting that Xa23 originated along with the diversification of the Oryza genus from the grass family during evolution. This research offers a potential for flexible use of novel Xa23 alleles in rice breeding programs and provides a model for evolution analysis of other executor R genes.
Pecker, I; Avraham, K B; Gilbert, D J; Savitsky, K; Rotman, G; Harnik, R; Fukao, T; Schröck, E; Hirotsune, S; Tagle, D A; Collins, F S; Wynshaw-Boris, A; Ried, T; Copeland, N G; Jenkins, N A; Shiloh, Y; Ziv, Y
1996-07-01
Atm, the mouse homolog of the human ATM gene defective in ataxia-telangiectasia (A-T), has been identified. The entire coding sequence of the Atm transcript was cloned and found to contain an open reading frame encoding a protein of 3066 amino acids with 84% overall identity and 91% similarity to the human ATM protein. Variable levels of expression of Atm were observed in different tissues. Fluorescence in situ hybridization and linkage analysis located the Atm gene on mouse chromosome 9, band 9C, in a region homologous to the ATM region on human chromosome 11q22-q23.
Complete mitochondrial genome sequence of northeastern sika deer (Cervus nippon hortulorum).
Shao, Yuanchen; Zha, Daiming; Xing, Xiumei; Su, Weilin; Liu, Huamiao; Zhang, Ranran
2016-01-01
The complete mitochondrial genome of the northeastern sika deer, Cervus nippon hortulorum, was determined by accurate polymerase chain reaction. The entire genome is 16,434 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and 1 control region, all of which are arranged in a typical vertebrate manner. The overall base composition of the northeastern sika deer's mitochondrial genome is 33.3% of A, 24.5% of C, 28.7% of T and 13.5% of G. A termination associated sequence and several conserved central sequence block domains were discovered within the control region.
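Base composition figures like those in the abstract above are obtained by counting each nucleotide and dividing by the sequence length. A minimal sketch follows, assuming the genome is available as a plain string of A, C, G and T; the function name and the toy sequence are hypothetical, and only the four percentages in the final check come from the abstract.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T in a DNA sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Toy sequence (hypothetical), not the sika deer mitogenome.
composition = base_composition("AATTCG")
print(composition)

# The percentages reported in the abstract sum to 100%, and the A+T
# content they imply is 33.3 + 28.7 = 62.0%.
reported = {"A": 33.3, "C": 24.5, "T": 28.7, "G": 13.5}
print(round(sum(reported.values()), 1), round(reported["A"] + reported["T"], 1))
```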
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Qian, Chaoju; Wang, Yuanxiu; Guo, Zhichun; Yang, Jianke; Kan, Xianzhao
2013-06-01
The circular mitochondrial genome of Alauda arvensis is 17,018 bp in length, containing 13 protein-coding genes (PCGs), 2 ribosomal RNA genes, 22 transfer RNA (tRNA) genes, and 2 extensive heteroplasmic control regions. All of the genes are encoded on the H-strand, with the exceptions of one PCG (nad6) and eight tRNA genes (tRNA(Gln), tRNA(Ala), tRNA(Asn), tRNA(Cys), tRNA(Tyr), tRNA(Ser(UCN)), tRNA(Pro), and tRNA(Glu)), as found in other birds' mitochondrial genomes. All of these PCGs are initiated with ATG and terminated by six types of stop codons. All tRNA genes have the potential to fold into the typical clover-leaf structure. Two extensive heteroplasmic control regions were found, and more interestingly, a minisatellite of 37 nucleotides (5'-TCAATCCCATTGATTTCATTATATTAGTATAAAGAAA-3') with 6 tandem repeats was detected at the end of CR2.
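A tandem repeat such as the minisatellite described above can be located by searching for consecutive exact copies of the motif. The sketch below assumes exact, uninterrupted copies and uses a made-up flanking sequence; only the 37-nucleotide motif and the copy number of 6 come from the abstract, and the function name is hypothetical.

```python
import re

# Motif quoted in the abstract (37 nucleotides).
MOTIF = "TCAATCCCATTGATTTCATTATATTAGTATAAAGAAA"


def max_tandem_copies(seq, motif):
    """Return the largest number of back-to-back exact copies of motif in seq."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(seq)]
    return max(run_lengths, default=0) // len(motif)


# Synthetic example: six copies between arbitrary (hypothetical) flanks.
synthetic = "GGCC" + MOTIF * 6 + "AATT"
print(len(MOTIF), max_tandem_copies(synthetic, MOTIF))
```

Real minisatellites often contain imperfect copies, so an exact-match search like this one gives a lower bound on the repeat count.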
Li, Wei; Zhang, Xin-Cheng; Zhao, Jian; Shi, Yan; Zhu, Xin-Ping
2015-01-25
Cuora trifasciata has become one of the most critically endangered species in the world. The complete mitochondrial genome of C. trifasciata (Chinese three-striped box turtle) was determined in this study. Its mitochondrial genome is a 16,575-bp-long circular molecule that consists of the 37 genes typically found in other vertebrates. The basic characteristics of the C. trifasciata mitochondrial genome were also determined. Moreover, a comparison of C. trifasciata with Cuora cyclornata, Cuora pani and Cuora aurocapitata indicated that the four mitogenomes differed in length, codons, overlaps, 13 protein-coding genes (PCGs), ND3, rRNA genes, control region, and other aspects. Phylogenetic analysis with Bayesian inference and maximum likelihood based on 12 protein-coding genes of the genus Cuora indicated the phylogenetic position of C. trifasciata within Cuora. The phylogenetic analysis also showed that C. trifasciata from Vietnam and China formed separate monophyletic clades with different Cuora species. The results of nucleotide base compositions, protein-coding genes and phylogenetic analysis showed that C. trifasciata from these two countries may represent different Cuora species. Copyright © 2014 Elsevier B.V. All rights reserved.
Umchs5, a gene coding for a class IV chitin synthase in Ustilago maydis.
Xoconostle-Cázares, B; Specht, C A; Robbins, P W; Liu, Y; León, C; Ruiz-Herrera, J
1997-12-01
A fragment corresponding to a conserved region of a fifth gene coding for chitin synthase in the plant pathogenic fungus Ustilago maydis was amplified by means of the polymerase chain reaction (PCR). The amplified fragment was utilized as a probe for the identification of the whole gene in a genomic library of the fungus. The predicted gene product of Umchs5 has highest similarity with class IV chitin synthases encoded by the CHS3 genes from Saccharomyces cerevisiae and Candida albicans, chs-4 from Neurospora crassa, and chsE from Aspergillus nidulans. Umchs5 null mutants were constructed by substitution of most of the coding sequence with the hygromycin B resistance cassette. Mutants displayed significant reduction in growth rate, chitin content, and chitin synthase activity, especially in the mycelial form. Virulence to corn seedlings was also reduced in the mutants. PCR was also used to obtain a fragment of a sixth chitin synthase, Umchs6. It is suggested that multigenic control of chitin synthesis in U. maydis operates as a protection mechanism for fungal viability in which the loss of one activity is partially compensated by the remaining enzymes. Copyright 1997 Academic Press.
Genomic and Epigenomic Insights into Nutrition and Brain Disorders
Dauncey, Margaret Joy
2013-01-01
Considerable evidence links many neuropsychiatric, neurodevelopmental and neurodegenerative disorders with multiple complex interactions between genetics and environmental factors such as nutrition. Mental health problems, autism, eating disorders, Alzheimer’s disease, schizophrenia, Parkinson’s disease and brain tumours are related to individual variability in numerous protein-coding and non-coding regions of the genome. However, genotype does not necessarily determine neurological phenotype because the epigenome modulates gene expression in response to endogenous and exogenous regulators, throughout the life-cycle. Studies using both genome-wide analysis of multiple genes and comprehensive analysis of specific genes are providing new insights into genetic and epigenetic mechanisms underlying nutrition and neuroscience. This review provides a critical evaluation of the following related areas: (1) recent advances in genomic and epigenomic technologies, and their relevance to brain disorders; (2) the emerging role of non-coding RNAs as key regulators of transcription, epigenetic processes and gene silencing; (3) novel approaches to nutrition, epigenetics and neuroscience; (4) gene-environment interactions, especially in the serotonergic system, as a paradigm of the multiple signalling pathways affected in neuropsychiatric and neurological disorders. Current and future advances in these four areas should contribute significantly to the prevention, amelioration and treatment of multiple devastating brain disorders. PMID:23503168
Zhuo, Chuanjun; Hou, Weihong; Hu, Lirong; Lin, Chongguang; Chen, Ce; Lin, Xiaodong
2017-01-01
Schizophrenia is a genetically related mental illness, in which the majority of genetic alterations occur in the non-coding regions of the human genome. In the past decade, a growing number of regulatory non-coding RNAs (ncRNAs) including microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) have been identified to be strongly associated with schizophrenia. However, the studies of these ncRNAs in the pathophysiology of schizophrenia and the reverting of their genetic defects in restoration of the normal phenotype have been hampered by insufficient technology to manipulate these ncRNA genes effectively as well as a lack of appropriate animal models. Most recently, a revolutionary gene editing technology known as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated nuclease 9 (Cas9; CRISPR/Cas9) has been developed that enables researchers to overcome these challenges. In this review article, we mainly focus on the schizophrenia-related ncRNAs and the use of CRISPR/Cas9-mediated editing on the non-coding regions of the genomic DNA in proving a causal relationship between the genetic defects and the pathophysiology of schizophrenia. We subsequently discuss the potential of translating this advanced technology into a clinical therapy for schizophrenia, although the CRISPR/Cas9 technology is currently still in its infancy and too immature to be used in the treatment of diseases. Furthermore, we suggest strategies to accelerate the pace from the bench to the bedside. This review describes the application of the powerful and feasible CRISPR/Cas9 technology to manipulate schizophrenia-associated ncRNA genes. This technology could help researchers tackle this complex health problem and perhaps other genetically related mental disorders due to the overlapping genetic alterations of schizophrenia with other mental illnesses. PMID:28217082
Kawano, Tomonori
2013-03-01
There have been a wide variety of approaches for handling pieces of DNA as "unplugged" tools for digital information storage and processing, including a series of studies applied to the security-related area, such as DNA-based digital barcodes, watermarks and cryptography. In the present article, novel designs of artificial genes as media for storing digitally compressed image data are proposed for bio-computing purposes, whereas natural genes principally encode proteins. Furthermore, the proposed system allows cryptographical application of DNA through biochemically editable designs with capacity for steganographical numeric data embedment. As a model case of image-coding DNA technique application, numerically and biochemically combined protocols are employed for ciphering the given "passwords" and/or secret numbers using DNA sequences. The "passwords" of interest were decomposed into single letters and translated into font images coded on separate DNA chains with both the coding regions, in which the images are encoded based on the novel run-length encoding rule, and the non-coding regions, designed for biochemical editing and the remodeling processes revealing the hidden orientation of letters composing the original "passwords." The latter processes require molecular biological tools for digestion and ligation of the fragmented DNA molecules targeting the polymerase chain reaction-engineered termini of the chains. Lastly, additional protocols for steganographical overwriting of the numeric data of interest over the image-coding DNA are also discussed.
Peng, Fred Y; Yang, Rong-Cai
2017-06-20
The resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades with over 70 Lr genes being mapped on different chromosomes and numerous QTLs (quantitative trait loci) being detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. The race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may cover across the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study is to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding sites and leucine-rich repeats or NLR), START (Steroidogenic Acute Regulatory protein [STaR] related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The focus of the analysis is on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or their neighborhood (5 Kb upstream from transcription start site [TSS] or downstream from transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR gene. 
Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for adult resistance and six SNPs for seedling resistance in the NLR genes. Most of these coding SNPs were predicted to alter encoded amino acids and such information may serve as a starting point towards more thorough molecular and functional characterization of the designated Lr genes. Using the primer sequences of 99 known non-SNP markers from leaf rust resistance QTLs, we found candidate genes closely linked to these markers, including Lr34 with distances to its two gene-specific markers being 1212 bases (to cssfr1) and 2189 bases (to cssfr2). This study represents a comprehensive analysis of ABC, NLR and START genes in the hexaploid wheat genome and their physical relationships with QTLs for leaf rust resistance at seedling and adult stages. Our analysis suggests that the ABC (and START) genes are more likely to be co-located with QTLs for race-nonspecific, adult resistance whereas the NLR genes are more likely to be co-located with QTLs for race-specific resistance that would be often expressed at the seedling stage. Though our analysis was hampered by inaccurate or unknown physical positions of numerous QTLs due to the incomplete assembly of the complex hexaploid wheat genome that is currently available, the observed associations between (i) QTLs for race-specific resistance and NLR genes and (ii) QTLs for nonspecific resistance and ABC genes will help discover SNP variants for leaf rust resistance at seedling and adult stages. The genes containing nonsynonymous SNPs are promising candidates that can be investigated in future studies as potential new sources of leaf rust resistance in wheat breeding.
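The neighbourhood test described in this abstract (a SNP counts as gene-associated if it lies within the gene or within 5 kb upstream of the TSS or downstream of the TTS) amounts to an interval check. The sketch below is a minimal illustration, not the authors' pipeline; the `Gene` record, the example gene names, and the coordinates are hypothetical. Because the flank is the same on both sides, strand does not affect the result.

```python
from dataclasses import dataclass

FLANK = 5_000  # 5 kb neighbourhood on either side, as stated in the abstract

@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # leftmost transcribed base on the reference (1-based, inclusive)
    end: int    # rightmost transcribed base on the reference (1-based, inclusive)

def genes_near_snp(chrom, pos, genes, flank=FLANK):
    """Names of genes whose transcribed region, extended by `flank` bases, contains the SNP."""
    return [
        g.name
        for g in genes
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank
    ]

# Hypothetical example records, for illustration only.
genes = [
    Gene("ABC_example", "chr1A", 10_000, 14_000),
    Gene("NLR_example", "chr1A", 40_000, 43_000),
]
```

For thousands of SNPs against a whole-genome annotation, a sorted list with binary search or an interval tree would be a more suitable structure than this linear scan.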
Intriguing Balancing Selection on the Intron 5 Region of LMBR1 in Human Population
He, Fang; Wu, Dong-Dong; Kong, Qing-Peng; Zhang, Ya-Ping
2008-01-01
Background The intron 5 of gene LMBR1 is the cis-acting regulatory module for the sonic hedgehog (SHH) gene. Mutation in this non-coding region is associated with preaxial polydactyly, and may play crucial roles in the evolution of limb and skeletal system. Methodology/Principal Findings We sequenced a region of the LMBR1 gene intron 5 in East Asian human population, and found a significant deviation of Tajima's D statistics from neutrality taking human population growth into account. Data from HapMap also demonstrated extended linkage disequilibrium in the region in East Asian and European population, and significantly low degree of genetic differentiation among human populations. Conclusion/Significance We proposed that the intron 5 of LMBR1 was presumably subject to balancing selection during the evolution of modern human. PMID:18698406
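For readers unfamiliar with the statistic used in this abstract, the following is a minimal sketch of the standard Tajima's D calculation from a set of aligned sequences. It does not include the correction for population growth that the authors mention, and the function name and toy sequences are assumptions for illustration. D compares the mean number of pairwise differences with the number of segregating sites scaled by a1; an excess of intermediate-frequency variants, as expected under balancing selection, gives positive values.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences (at least four are required)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)

# Toy data: variants at intermediate frequency versus singletons only.
balanced = ["AA", "AA", "TT", "TT"]
singletons = ["AAAA", "AAAT", "AATA", "ATAA"]
```

On the toy data, the intermediate-frequency sample gives a positive D and the singleton-only sample a negative one.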
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pham-Dinh, D.; Gaspera, D.B.; Dautigny, A.
1995-09-20
Myelin/oligodendrocyte glycoprotein (MOG), a special component of the central nervous system localized on the outermost lamellae of mature myelin, is a member of the immunoglobulin superfamily. We report here the organization of the human MOG gene, which spans approximately 17 kb, and the characterization of six MOG mRNA splicing variants. The intron/exon structure of the human MOG gene confirmed the splicing pattern, supporting the hypothesis that mRNA isoforms could arise by alternative splicing of a single gene. In addition to the eight exons coding for the major MOG isoform, the human MOG gene also contains, in the 3' region, a previously unknown alternatively spliced coding exon, VIA. Alternative utilization of two acceptor splicing sites for exon VIII could produce two different C-termini. The nucleotide sequences presented here may be a useful tool to study further possible involvement of the MOG gene in hereditary neurological disorders. 23 refs., 5 figs.
Characterisation of ATM mutations in Slavic Ataxia telangiectasia patients.
Soukupova, Jana; Pohlreich, Petr; Seemanova, Eva
2011-09-01
Ataxia telangiectasia (AT) is a genomic instability syndrome characterised, among others, by progressive cerebellar degeneration, oculocutaneous telangiectases, immunodeficiency, elevated serum alpha-fetoprotein level, chromosomal breakage, hypersensitivity to ionising radiation and increased cancer risk. This autosomal recessive disorder is caused by mutations in the ataxia telangiectasia mutated (ATM) gene coding for serine/threonine protein kinase with a crucial role in response to DNA double-strand breaks. We characterised genotype and phenotype of 12 Slavic AT patients from 11 families. Mutation analysis included sequencing of the entire coding sequence, adjacent intron regions, 3'UTR and 5'UTR of the ATM gene and multiplex ligation-dependent probe amplification (MLPA) for the detection of large deletions/duplications at the ATM locus. The high incidence of new and individual mutations demonstrates a marked mutational heterogeneity of AT in the Czech Republic. Our data indicate that sequence analysis of the entire coding region of ATM is sufficient for a high detection rate of mutations in ATM and that MLPA analysis for the detection of deletions/duplications seems to be redundant in the Slavic population.
Beaudet, Denis; Nadimi, Maryam; Iffis, Bachir; Hijri, Mohamed
2013-01-01
Arbuscular mycorrhizal fungi (AMF) are common and important plant symbionts. They have coenocytic hyphae and form multinucleated spores. The nuclear genome of AMF is polymorphic and its organization is not well understood, which makes the development of reliable molecular markers challenging. In stark contrast, their mitochondrial genome (mtDNA) is homogeneous. To assess the intra- and inter-specific mitochondrial variability in closely related Glomus species, we performed 454 sequencing on total genomic DNA of Glomus sp. isolate DAOM-229456 and we compared its mtDNA with two G. irregulare isolates. We found that the mtDNA of Glomus sp. is homogeneous, identical in gene order and, with respect to the sequences of coding regions, almost identical to G. irregulare. However, certain genomic regions vary substantially, due to insertions/deletions of elements such as introns, mitochondrial plasmid-like DNA polymerase genes and mobile open reading frames. We found no evidence of mitochondrial or cytoplasmic plasmids in Glomus species, and mobile ORFs in Glomus are responsible for the formation of four gene hybrids in atp6, atp9, cox2, and nad3, which are most probably the result of horizontal gene transfer and are expressed at the mRNA level. We found evidence for substantial sequence variation in defined regions of mtDNA, even among closely related isolates with otherwise identical coding gene sequences. This variation makes it possible to design reliable intra- and inter-specific markers. PMID:23637766
Genetic analysis of SIGMAR1 as a cause of familial ALS with dementia
Belzil, Véronique V; Daoud, Hussein; Camu, William; Strong, Michael J; Dion, Patrick A; Rouleau, Guy A
2013-01-01
Amyotrophic lateral sclerosis (ALS) is the most common motor neuron disease (MND), while frontotemporal lobar degeneration (FTLD) is the second most common cause of early-onset dementia. Many ALS families segregating FTLD have been reported, particularly over the last decade. Recently, mutations in TARDBP, FUS/TLS, and C9ORF72 have been identified in both ALS and FTLD patients, while mutations in VCP, an FTLD-associated gene, have been found in ALS families. Distinct variants located in the 3′-untranslated region (UTR) of the SIGMAR1 gene were previously reported in three unrelated FTLD or FTLD–MND families. We directly sequenced the coding and UTR regions of the SIGMAR1 gene in a targeted cohort of 25 individual familial ALS cases of Caucasian origin with a history of cognitive impairments. This screening identified one variant in the 3′-UTR of the SIGMAR1 gene in one ALS patient, but the same variant was also observed in 1 out of 380 control chromosomes. Subsequently, we screened the same samples for a C9ORF72 repeat expansion: 52% of this cohort was found expanded, including the sample with the SIGMAR1 3′-UTR variant. Consequently, coding and noncoding variants located in the 3′-UTR region of the SIGMAR1 gene are not the cause of FTLD–MND in our cohort, and more than half of this targeted cohort is genetically explained by C9ORF72 repeat expansions. PMID:22739338
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. 
Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Winckler, T; Szafranski, K; Glöckner, G
2005-01-01
Almost every organism carries along a multitude of molecular parasites known as transposable elements (TEs). TEs influence their host genomes in many ways by expanding genome size and complexity, rearranging genomic DNA, mutagenizing host genes, and altering transcription levels of nearby genes. The eukaryotic microorganism Dictyostelium discoideum is attractive for the study of fundamental biological phenomena such as intercellular communication, formation of multicellularity, cell differentiation, and morphogenesis. D. discoideum has a highly compacted, haploid genome with less than 1 kb of genomic DNA separating coding regions. Nevertheless, the D. discoideum genome is loaded with 10% of TEs that managed to settle and survive in this inhospitable environment. In depth analysis of D. discoideum genome project data has provided intriguing insights into the evolutionary challenges that mobile elements face when they invade compact genomes. Two different mechanisms are used by D. discoideum TEs to avoid disruption of host genes upon retrotransposition. Several TEs have invented the specific targeting of tRNA gene-flanking regions as a means to avoid integration into coding regions. These elements have been dispersed on all chromosomes, closely following the distribution of tRNA genes. By contrast, TEs that lack bona fide integration specificities show a strong bias to nested integration, thus forming large TE clusters at certain chromosomal loci that are hardly resolved by bioinformatics approaches. We summarize our current view of D. discoideum TEs and present new data from the analysis of the complete sequences of D. discoideum chromosomes 1 and 2, which comprise more than one third of the total genome.
Strand, Janne M; Scheffler, Katja; Bjørås, Magnar; Eide, Lars
2014-06-01
The cellular genomes are continuously damaged by reactive oxygen species (ROS) from aerobic processes. The impact of DNA damage depends on the specific site as well as the cellular state. The steady-state level of DNA damage is the net result of continuous formation and subsequent repair, but it is unknown to what extent heterogeneous damage distribution is caused by variations in formation or repair of DNA damage. Here, we used a restriction enzyme/qPCR based method to analyze DNA damage in promoter and coding regions of four nuclear genes: the two house-keeping genes Gapdh and Tbp, and the Ndufa9 and Ndufs2 genes encoding mitochondrial complex I subunits, as well as mt-Rnr1 encoded by mitochondrial DNA (mtDNA). The distribution of steady-state levels of damage varied in a site-specific manner. Oxidative stress induced damage in nDNA to a similar extent in promoter and coding regions, and more so in mtDNA. The subsequent removal of damage from nDNA was efficient and comparable with recovery times depending on the initial damage load, while repair of mtDNA was delayed with subsequently slower repair rate. The repair was furthermore found to be independent of transcription or the transcription-coupled repair factor CSB, but dependent on cellular ATP. Our results demonstrate that the capacity to repair DNA is sufficient to remove exogenously induced damage. Thus, we conclude that the heterogeneous steady-state level of DNA damage in promoters and coding regions is caused by site-specific DNA damage/modifications that take place under normal metabolism. Copyright © 2014 Elsevier B.V. All rights reserved.
Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong
2009-03-31
The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of Megaleranthis chloroplast genomes are substantially different from that of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of the Megaleranthis to the Trollius. 
Therefore, our molecular trees support Ohwi's original treatment of Megaleranthis saniculifolia to Trollius chosenensis Ohwi.
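The abstract reports 54 simple sequence repeat (SSR) loci but does not state the search criteria. As a rough sketch of how such loci are commonly located, the code below scans a sequence with regular expressions for tandem repeats of short motifs. The minimum repeat counts in `MIN_REPEATS` (10 for mono-, 5 for di-, 4 for trinucleotide motifs), the function name, and the toy sequence are assumptions, not the thresholds used in the study.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 5, 3: 4}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (0-based start, motif, copy number) for each tandem repeat found."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        if unit_len == 1:
            pattern = r"([ACGT])\1{%d,}" % (min_rep - 1)
        else:
            # The negative lookahead rejects motifs made of a single repeated
            # base (e.g. "AA"), which are already covered by the shorter motif.
            pattern = r"((?!(.)\2{%d})[ACGT]{%d})\1{%d,}" % (
                unit_len - 1, unit_len, min_rep - 1)
        for m in re.finditer(pattern, seq):
            hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
    return sorted(hits)

# Toy sequence: an A(10) run and an (AT)5 repeat separated by spacers.
toy = "GG" + "A" * 10 + "CC" + "AT" * 5 + "GG"
ssrs = find_ssrs(toy)
```

Dedicated SSR finders also handle compound and imperfect repeats, which this sketch ignores.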
Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues
2017-02-01
The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rRNA genes, 22 tRNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.
Kouvelis, Vassili N; Ghikas, Dimitri V; Typas, Milton A
2004-10-01
The mitochondrial genome (mtDNA) of the entomopathogenic fungus Lecanicillium muscarium (synonym Verticillium lecanii) with a total size of 24,499 bp has been analyzed. So far, it is the smallest known mitochondrial genome among Pezizomycotina, with an extremely compact gene organization and only one group-I intron in its large ribosomal RNA (rnl) gene. It contains the 14 typical genes coding for proteins related to oxidative phosphorylation, the two rRNA genes, one intronic ORF coding for a possible ribosomal protein (rps), and a set of 25 tRNA genes which recognize codons for all amino acids, except alanine and cysteine. All genes are transcribed from the same DNA strand. Gene order comparison with all available complete fungal mtDNAs (representatives of all four phyla are included) revealed some characteristic common features like uninterrupted gene pairs, overlapping genes, and extremely variable intergenic regions, that can all be exploited for the study of fungal mitochondrial genomes. Moreover, a minimum common mtDNA gene order could be detected, in two units, for all known Sordariomycetes, namely nad1-nad4-atp8-atp6 and rns-cox3-rnl, which can be extended in Hypocreales to nad4L-nad5-cob-cox1-nad1-nad4-atp8-atp6 and rns-cox3-rnl-nad2-nad3, respectively. Phylogenetic analysis of all fungal mtDNA essential protein-coding genes as one unit clearly demonstrated the superiority of small genome (mtDNA) over single gene comparisons.
Shao, Jun-Li; Long, Yue-Sheng; Chen, Gu; Xie, Jun; Xu, Zeng-Fu
2010-06-01
Agrobacterium tumefaciens transfers DNA from its Ti plasmid to plant host cells. The genes located within the transferred DNA of Ti plasmid including the octopine synthase gene (OCS) are expressed in plant host cells. The 3'-flanking region of OCS gene, known as OCS terminator, is widely used as a transcriptional terminator of the transgenes in plant expression vectors. In this study, we found the reversed OCS terminator (3'-OCS-r) could drive expression of hygromycin phosphotransferase II gene (hpt II) and beta-glucuronidase gene in Escherichia coli, and expression of hpt II in A. tumefaciens. Furthermore, reverse transcription-polymerase chain reaction analysis revealed that an open reading frame (ORF12) that is located downstream to the 3'-OCS-r was transcribed in A. tumefaciens, which overlaps in reverse with the coding region of the OCS gene in octopine Ti plasmid.
van Hoek, Angela H A M; Mayrhofer, Sigrid; Domig, Konrad J; Flórez, Ana B; Ammor, Mohammed S; Mayo, Baltasar; Aarts, Henk J M
2008-01-01
For the first time, mosaic tetracycline resistance genes were identified in Lactobacillus johnsonii and in Bifidobacterium thermophilum strains. The L. johnsonii strain investigated contains a complex hybrid gene, tet(O/W/32/O/W/O), whereas the five bifidobacterial strains possess two different mosaic tet genes: i.e., tet(W/32/O) and tet(O/W). As reported by others, the crossover points of the mosaic tet gene segments were found at similar positions within the genes, suggesting a hot spot for recombination. Analysis of the sequences flanking these genes revealed that the upstream part corresponds to the 5' end of the mosaic open reading frame. In contrast, the downstream region was shown to be more variable. Surprisingly, in one of the B. thermophilum strains a third tet determinant was identified, coding for the efflux pump Tet(L).
2010-01-01
Background Molecular characterization of collagen-VI related myopathies currently relies on standard sequencing, which yields a detection rate approximating 75-79% in Ullrich congenital muscular dystrophy (UCMD) and 60-65% in Bethlem myopathy (BM) patients as PCR-based techniques tend to miss gross genomic rearrangements as well as copy number variations (CNVs) in both the coding sequence and intronic regions. Methods We have designed a custom oligonucleotide CGH array in order to investigate the presence of CNVs in the coding and non-coding regions of COL6A1, A2, A3, A5 and A6 genes and a group of genes functionally related to collagen VI. A cohort of 12 patients with UCMD/BM negative at sequencing analysis and 2 subjects carrying a single COL6 mutation whose clinical phenotype was not explicable by inheritance were selected and the occurrence of allelic and genetic heterogeneity explored. Results A deletion within intron 1A of the COL6A2 gene, occurring in compound heterozygosity with a small deletion in exon 28, previously detected by routine sequencing, was identified in a BM patient. RNA studies showed monoallelic transcription of the COL6A2 gene, thus elucidating the functional effect of the intronic deletion. No pathogenic mutations were identified in the remaining analyzed patients, either within COL6A genes, or in genes functionally related to collagen VI. Conclusions Our custom CGH array may represent a useful complementary diagnostic tool, especially in recessive forms of the disease, when only one mutant allele is detected by standard sequencing. The intronic deletion we identified represents the first example of a pure intronic mutation in COL6A genes. PMID:20302629
AP1 Keeps Chromatin Poised for Action | Center for Cancer Research
The human genome harbors gene-encoding DNA, the blueprint for building proteins that regulate cellular function. Embedded across the genome, in non-coding regions, are DNA elements to which regulatory factors bind. The interaction of regulatory factors with DNA at these sites modifies gene expression to modulate cell activity. In cells, DNA exists in a complex with proteins
Population genetic implications from sequence variation in four Y chromosome genes.
Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J
2000-06-20
Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 x 10(-5), 5.07 x 10(-5), and 8.54 x 10(-5) for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9.16 x 10(-5) to 14.2 x 10(-5) for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
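To make the scale of the nucleotide diversity estimates quoted in this abstract concrete, here is a minimal sketch of one common way to compute per-site diversity from variant counts. It assumes biallelic sites and is not necessarily the estimator the authors used; the function name and the example numbers are hypothetical. For each variable site with k copies of the variant among n sampled chromosomes, the expected heterozygosity 2p(1-p) with p = k/n is summed, corrected by n/(n-1), and divided by the number of bases screened.

```python
def nucleotide_diversity(variant_counts, n, length):
    """Per-site nucleotide diversity from variant allele counts at biallelic sites.

    variant_counts: copies of the variant allele at each polymorphic site
    n: number of chromosomes sampled
    length: number of bases screened (polymorphic or not)
    """
    if n < 2 or length <= 0:
        raise ValueError("need at least two chromosomes and a positive length")
    heterozygosity = sum(2 * (k / n) * (1 - k / n) for k in variant_counts)
    return heterozygosity * n / (n - 1) / length

# Hypothetical example: one singleton among 4 chromosomes over 100 bases.
example = nucleotide_diversity([1], n=4, length=100)
```

In this toy case the result equals the mean number of pairwise differences (3 of the 6 pairs differ at one site) divided by the 100 bases screened.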
Lunina, Natalia A; Agafonova, Elena V; Chekanovskaya, Lyudmila A; Dvortsov, Igor A; Berezina, Oksana V; Shedova, Ekaterina N; Kostrov, Sergey V; Velikodvorskaya, Galina A
2007-07-01
A cluster of Thermotoga neapolitana genes participating in starch degradation includes the malG gene of sugar transport protein and the aglB gene of cyclomaltodextrinase. The start and stop codons of these genes share a common overlapping sequence, aTGAtg. Here, we compared properties of expression products of three different constructs with aglB from T. neapolitana. The first expression vector contained the aglB gene linked to an upstream 90-bp 3'-terminal region of the malG gene with the stop codon overlapping with the start codon of aglB. The second construct included the isolated coding sequence of aglB with two tandem potential start codons. The expression product of this construct in Escherichia coli had two tandem Met residues at its N terminus and was characterized by low thermostability and high tendency to aggregate. In contrast, co-expression of aglB and the 3'-terminal region of malG (the first construct) resulted in AglB with only one N-terminal Met residue and a much higher specific activity of cyclomaltodextrinase. Moreover, the enzyme expressed by such a construct was more thermostable and less prone to aggregation. The third construct was the same as the second one except that it contained only one ATG start codon. The product of its expression had kinetic and other properties similar to those of the enzyme with only one N-terminal Met residue.
The complete mitochondrial genome of the butterfly Apatura metis (Lepidoptera: Nymphalidae).
Zhang, Min; Nie, Xinping; Cao, Tianwen; Wang, Juping; Li, Tao; Zhang, Xiaonan; Guo, Yaping; Ma, Enbo; Zhong, Yang
2012-06-01
Apatura metis, known as Freyer's purple emperor, is an important pest of the Slender Leaved Willow (Salix alba); its mitochondrial genome is 15,236 bp long. The 22 tRNA genes, two ribosomal RNA genes (rrnL and rrnS), 13 protein-coding genes (PCGs), and control region of the A. metis mitogenome are highly homologous to those of other lepidopteran species. The mitochondrial genome of A. metis is biased toward a high A + T content (A + T = 80.5%). All protein-coding genes start with a typical ATN initiation codon, except for COI, which begins with the CGA codon as observed in other lepidopterans. All tRNAs show the classic clover-leaf structure, except that the dihydrouridine (DHU) arm of tRNA(Ser(AGN)) forms a simple loop. The A. metis A + T-rich region contains some conserved structures including a structure combining the motif 'ATAGA' and a 19 bp poly (T) stretch, which is similar to those found in other lepidopteran mitogenomes. The phylogenetic analyses of lepidopterans based on mitogenome sequences demonstrate that each of the six superfamilies is monophyletic, and the relationship among them is (((Noctuoidea + (Geometroidea + Bombycoidea)) + Pyraloidea) + Papilionoidea) + Tortricoidea. Within Papilionoidea, our analysis supports the relationship ((Lycaenidae + Pieridae) + Nymphalidae) + Papilionidae.
Van, K; Onoda, S; Kim, M Y; Kim, K D; Lee, S-H
2008-03-01
The Waxy (Wx) gene product controls the formation of a straight chain polymer of amylose in the starch pathway. Dominance/recessiveness of the Wx allele is associated with amylose content, leading to non-waxy/waxy phenotypes. For a total of 113 foxtail millet accessions, agronomic traits and the molecular differences of the Wx gene were surveyed to evaluate genetic diversities. Molecular types were associated with phenotypes determined by four specific primer sets (non-waxy, Type I; low amylose, Type VI; waxy, Type IV or V). Additionally, the insertion of a transposable element in waxy was confirmed by ex1/TSI2R, TSI2F/ex2, ex2int2/TSI7R and TSI7F/ex4r. Seventeen single nucleotide polymorphisms (SNPs) were observed from non-coding regions, while three SNPs from coding regions were non-synonymous. Interestingly, the phenotype of No. 88 was still non-waxy, although a seven-nucleotide (AATTGGT) insertion at 2,993 bp led to a product 78 amino acids shorter. The rapid decline of r² in the sequenced region (exon 1-intron 1-exon 2) suggested a low level of linkage disequilibrium and limited haplotype structure. Ks values and estimation of evolutionary events indicate early divergence of S. italica among cereal crops. This study suggested the Wx gene was one of the targets in the selection process during domestication.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Yutao; Das, Suchita; Olszewski, Robert Edward
Near naked hairless (HrN) is a semi-dominant mutation that arose spontaneously and was suggested by allelism testing to be an allele of mouse Hairless (Hr). HrN mice differ from other Hr mutants in that hair loss appears as the postnatal coat begins to emerge, as opposed to failure to initiate the first postnatal hair cycle, and that the mutation displays semi-dominant inheritance. We sequenced the Hr cDNA in HrN/HrN mice and characterized the pathological and molecular phenotypes to identify the basis for hair loss in this model. HrN/HrN mice exhibit dystrophic hairs that are unable to consistently emerge from the hair follicle, while HrN/+ mice display a sparse coat of hair and a milder degree of follicular dystrophy than their homozygous littermates. DNA microarray analysis of cutaneous gene expression demonstrates that numerous genes are downregulated in HrN/HrN mice, primarily genes important for hair structure. By contrast, Hr expression is significantly increased. Sequencing the Hr coding region, intron-exon boundaries, 5'- and 3'-UTR and immediate upstream region did not reveal the underlying mutation. Therefore HrN does not appear to be an allele of Hr but may result from a mutation in a closely linked gene or from a regulatory mutation in Hr.
Chandramohan, Bathrachalam; Renieri, Carlo; La Manna, Vincenzo; La Terza, Antonietta
2015-01-01
The objectives of the present study were to characterize the MC1R gene, its transcripts and the single nucleotide polymorphisms (SNPs) associated with coat color in alpaca. Full length cDNA amplification revealed the presence of two transcripts, named F1 and F2, differing only in the length of their 5'-terminal untranslated region (UTR) sequences and presenting a color-specific expression. Whereas the F1 transcript was common to white and colored (black and brown) alpaca phenotypes, the shorter F2 transcript was specific to white alpaca. Further sequencing of the MC1R gene in white and colored alpaca identified a total of twelve SNPs; among those, nine (four silent mutations (c.126C>A, c.354T>C, c.618G>A, and c.933G>A); five missense mutations (c.82A>G, c.92C>T, c.259A>G, c.376A>G, and c.901C>T)) were observed in the coding region and three in the 3'UTR. A 4 bp deletion (c.224_227del) was also identified in the coding region. Molecular segregation analysis uncovered that the combinatory mutations in the MC1R locus could cause eumelanin and pheomelanin synthesis in alpaca. Overall, our data refine what is known about the MC1R gene and provide additional information on its role in alpaca pigmentation.
Westholm, Jakub O.; Miura, Pedro; Olson, Sara; Shenker, Sol; Joseph, Brian; Sanfilippo, Piero; Celniker, Susan E.; Graveley, Brenton R.; Lai, Eric C.
2014-01-01
Circularization was recently recognized to broadly expand transcriptome complexity. Here, we exploit massive Drosophila total RNA-sequencing data, >5 billion paired-end reads from >100 libraries covering diverse developmental stages, tissues and cultured cells, to rigorously annotate >2500 fruitfly circular RNAs. These mostly derive from back-splicing of protein-coding genes and lack poly(A) tails, and circularization of hundreds of genes is conserved across multiple Drosophila species. We elucidate structural and sequence properties of Drosophila circular RNAs, which exhibit commonalities and distinctions from mammalian circles. Notably, Drosophila circular RNAs harbor >1000 well-conserved canonical miRNA seed matches, especially within coding regions, and coding conserved miRNA sites reside preferentially within circularized exons. Finally, we analyze the developmental and tissue specificity of circular RNAs, and note their preferred derivation from neural genes and enhanced accumulation in neural tissues. Interestingly, circular isoforms increase dramatically relative to linear isoforms during CNS aging, and constitute a novel aging biomarker. PMID:25544350
Sequence variations of the bovine prion protein gene (PRNP) in native Korean Hanwoo cattle
Choi, Sangho
2012-01-01
Bovine spongiform encephalopathy (BSE) is one of the fatal neurodegenerative diseases known as transmissible spongiform encephalopathies (TSEs) caused by infectious prion proteins. Genetic variations correlated with susceptibility or resistance to TSE in humans and sheep have not been reported for bovine strains including those from Holstein, Jersey, and Japanese Black cattle. Here, we investigated bovine prion protein gene (PRNP) variations in Hanwoo cattle [Bos (B.) taurus coreanae], a native breed in Korea. We identified mutations and polymorphisms in the coding region of PRNP, determined their frequency, and evaluated their significance. We identified four synonymous polymorphisms and two non-synonymous mutations in PRNP, but found no novel polymorphisms. The sequence and number of octapeptide repeats were completely conserved, and the haplotype frequency of the coding region was similar to that of other B. taurus strains. When we examined the 23-bp and 12-bp insertion/deletion (indel) polymorphisms in the non-coding region of PRNP, Hanwoo cattle had a lower deletion allele and 23-bp del/12-bp del haplotype frequency than healthy and BSE-affected animals of other strains. Thus, Hanwoo are seemingly less susceptible to BSE than other strains due to the 23-bp and 12-bp indel polymorphisms. PMID:22705734
Identification of two allelic IgG1 C(H) coding regions (Cgamma1) of cat.
Kanai, T H; Ueda, S; Nakamura, T
2000-01-31
Two types of cDNA encoding IgG1 heavy chain (gamma1) were isolated from a single domestic short-hair cat. Sequence analysis indicated a higher level of similarity of these Cgamma1 sequences to human Cgamma1 sequence (76.9 and 77.0%) than to mouse sequence (70.0 and 69.7%) at the nucleotide level. Predicted primary structures of both the feline Cgamma1 genes, designated as Cgamma1a and Cgamma1b, were similar to that of human Cgamma1 gene, for instance, as to the size of constant domains, the presence of six conserved cysteine residues involved in formation of the domain structure, and the location of a conserved N-linked glycosylation site. Sequence comparison between the two alleles showed that 7 out of 10 nucleotide differences were within the C(H)3 domain coding region, all leading to nonsynonymous changes in amino acid residues. Partial sequence analysis of genomic clones showed three nucleotide substitutions between the two Cgamma1 alleles in the intron between the C(H)2 and C(H)3 domain coding regions. In 12 domestic short-hair cats used in this study, the frequency of Cgamma1a allele (62.5%) was higher than that of the Cgamma1b allele (37.5%).
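The reported allele frequencies follow from simple counting: 12 diploid cats carry 24 allele copies, so 62.5% corresponds to 15 copies of Cgamma1a and 37.5% to 9 copies of Cgamma1b. A minimal Python sketch of that arithmetic is given below. The function name `allele_frequencies` and the genotype split are hypothetical, chosen only to reproduce the totals, since the abstract reports frequencies and not genotypes.

```python
def allele_frequencies(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Frequencies of alleles a and b at a biallelic locus in diploid individuals."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("no individuals supplied")
    # Each a/a homozygote contributes two copies of a, each heterozygote one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return freq_a, 1.0 - freq_a

# Hypothetical genotype split for 12 cats (not given in the abstract):
# 5 a/a, 5 a/b, 2 b/b -> 15 copies of a and 9 copies of b.
freq_a, freq_b = allele_frequencies(n_aa=5, n_ab=5, n_bb=2)
```

Any genotype split that yields 15 and 9 allele copies gives the same frequencies, so the example says nothing about how the alleles were actually distributed among the animals.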
Tenebrio molitor antifreeze protein gene identification and regulation.
Qin, Wensheng; Walker, Virginia K
2006-02-15
The yellow mealworm, Tenebrio molitor, is a freeze susceptible, stored product pest. Its winter survival is facilitated by the accumulation of antifreeze proteins (AFPs), encoded by a small gene family. We have now isolated 11 different AFP genomic clones from 3 genomic libraries. All the clones had a single coding sequence, with no evidence of intervening sequences. Three genomic clones were further characterized. All have putative TATA box sequences upstream of the coding regions and multiple potential poly(A) signal sequences downstream of the coding regions. A TmAFP regulatory region, B1037, conferred transcriptional activity when ligated to a luciferase reporter sequence and after transfection into an insect cell line. A 143 bp core promoter including a TATA box sequence was identified. Its promoter activity was increased 4.4 times by inserting an exotic 245 bp intron into the construct, similar to the enhancement of transgenic expression seen in several other systems. The addition of a duplication of the first 120 bp sequence from the 143 bp core promoter decreased promoter activity by half. Although putative hormonal response sequences were identified, none of the five hormones tested enhanced reporter activity. These studies on the mechanisms of AFP transcriptional control are important for the consideration of any transfer of freeze-resistance phenotypes to beneficial hosts.
Hart, Elizabeth A; Caccamo, Mario; Harrow, Jennifer L; Humphray, Sean J; Gilbert, James GR; Trevanion, Steve; Hubbard, Tim; Rogers, Jane; Rothschild, Max F
2007-01-01
Background We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage. Results Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs. Conclusion We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS. PMID:17705864
2013-01-01
Background Polycomb Repressive Complex 2 (PRC2) is an essential regulator of gene expression that maintains genes in a repressed state by marking chromatin with trimethylated Histone H3 lysine 27 (H3K27me3). In Arabidopsis, loss of PRC2 function leads to pleiotropic effects on growth and development thought to be due to ectopic expression of seed and embryo-specific genes. While there is some understanding of the mechanisms by which specific genes are targeted by PRC2 in animal systems, it is still not clear how PRC2 is recruited to specific regions of plant genomes. Results We used ChIP-seq to determine the genome-wide distribution of hemagglutinin (HA)-tagged FERTILIZATION INDEPENDENT ENDOSPERM (FIE-HA), the Extra Sex Combs homolog protein present in all Arabidopsis PRC2 complexes. We found that the FIE-HA binding sites co-locate with a subset of the H3K27me3 sites in the genome and that the associated genes were more likely to be de-repressed in mutants of PRC2 components. The FIE-HA binding sites are enriched for three sequence motifs including a putative GAGA factor binding site that is also found in Drosophila Polycomb Response Elements (PREs). Conclusions Our results suggest that PRC2 binding sites in plant genomes share some sequence features with Drosophila PREs. However, unlike Drosophila PREs which are located in promoters and devoid of H3K27me3, Arabidopsis FIE binding sites tend to be in gene coding regions and co-localize with H3K27me3. PMID:24001316
Raman, Gurusamy; Park, SeonJoo
2015-01-01
Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, D. superbus was compared to its closely related family of Caryophyllaceae chloroplast (cp) genomes such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variations in the inverted repeat region A (IRA)/LSC regions. The IRs underwent both expansion and constriction during evolution of the Caryophyllaceae family; however, intense variations were not identified. The pseudogene ribosomal protein subunit S19 (rps19) was identified at the IRA/LSC junction, but was not present in the cp genome of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of rosids, solanales of asterids and Lychnis of Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in Caryophyllales. Nevertheless, the cp genome of Dianthus and Spinacia has two introns in the proteolytic subunit of ATP-dependent protease (clpP) gene, but Lychnis has lost introns from the clpP gene. Furthermore, phylogenetic analysis of individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis also demonstrated a sister relationship between Dianthus and Lychnis based on 78 protein-coding sequences. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant, D. superbus var. longicalycinus. PMID:26513163
Bayesian variable selection for post-analytic interrogation of susceptibility loci.
Chen, Siying; Nunez, Sara; Reilly, Muredach P; Foulkes, Andrea S
2017-06-01
Understanding the complex interplay among protein coding genes and regulatory elements requires rigorous interrogation with analytic tools designed for discerning the relative contributions of overlapping genomic regions. To this aim, we offer a novel application of Bayesian variable selection (BVS) for classifying genomic class level associations using existing large meta-analysis summary level resources. This approach is applied using the expectation maximization variable selection (EMVS) algorithm to typed and imputed SNPs across 502 protein coding genes (PCGs) and 220 long intergenic non-coding RNAs (lncRNAs) that overlap 45 known loci for coronary artery disease (CAD) using publicly available Global Lipids Genetics Consortium (GLGC) (Teslovich et al., 2010; Willer et al., 2013) meta-analysis summary statistics for low-density lipoprotein cholesterol (LDL-C). The analysis reveals 33 PCGs and three lncRNAs across 11 loci with >50% posterior probabilities for inclusion in an additive model of association. The findings are consistent with previous reports, while providing some new insight into the architecture of LDL-cholesterol to be investigated further. As genomic taxonomies continue to evolve, additional classes such as enhancer elements and splicing regions, can easily be layered into the proposed analysis framework. Moreover, application of this approach to alternative publicly available meta-analysis resources, or more generally as a post-analytic strategy to further interrogate regions that are identified through single point analysis, is straightforward. All coding examples are implemented in R version 3.2.1 and provided as supplemental material. © 2016, The International Biometric Society.
Boldogköi, Zsolt
2004-09-01
Population genetics, the mathematical theory of modern evolutionary biology, defines evolution as the alteration of the frequency of distinct gene variants (alleles) differing in fitness over time. The major problem with this view is that in gene and protein sequences we can find little evidence concerning the molecular basis of phenotypic variance, especially those that would confer adaptive benefit to the bearers. Some novel data, however, suggest that a large amount of genetic variation exists in the regulatory region of genes within populations. In addition, comparison of homologous DNA sequences of various species shows that evolution appears to depend more strongly on gene expression than on the genes themselves. Furthermore, it has been demonstrated in several systems that genes form functional networks, whose products exhibit interrelated expression profiles. Finally, it has been found that regulatory circuits of development behave as evolutionary units. These data demonstrate that our view of evolution calls for a new synthesis. In this article I propose a novel concept, termed the selfish gene network hypothesis, which is based on an overall consideration of the above findings. The major statements of this hypothesis are as follows. (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.
Eisenberger, Tobias; Neuhaus, Christine; Khan, Arif O.; Decker, Christian; Preising, Markus N.; Friedburg, Christoph; Bieg, Anika; Gliem, Martin; Issa, Peter Charbel; Holz, Frank G.; Baig, Shahid M.; Hellenbroich, Yorck; Galvez, Alberto; Platzer, Konrad; Wollnik, Bernd; Laddach, Nadja; Ghaffari, Saeed Reza; Rafati, Maryam; Botzenhart, Elke; Tinschert, Sigrid; Börger, Doris; Bohring, Axel; Schreml, Julia; Körtge-Jung, Stefani; Schell-Apacik, Chayim; Bakur, Khadijah; Al-Aama, Jumana Y.; Neuhann, Teresa; Herkenrath, Peter; Nürnberg, Gudrun; Nürnberg, Peter; Davis, John S.; Gal, Andreas; Bergmann, Carsten; Lorenz, Birgit; Bolz, Hanno J.
2013-01-01
Retinitis pigmentosa (RP) and Leber congenital amaurosis (LCA) are major causes of blindness. They result from mutations in many genes which has long hampered comprehensive genetic analysis. Recently, targeted next-generation sequencing (NGS) has proven useful to overcome this limitation. To uncover “hidden mutations” such as copy number variations (CNVs) and mutations in non-coding regions, we extended the use of NGS data by quantitative readout for the exons of 55 RP and LCA genes in 126 patients, and by including non-coding 5′ exons. We detected several causative CNVs which were key to the diagnosis in hitherto unsolved constellations, e.g. hemizygous point mutations in consanguineous families, and CNVs complemented apparently monoallelic recessive alleles. Mutations of non-coding exon 1 of EYS revealed its contribution to disease. In view of the high carrier frequency for retinal disease gene mutations in the general population, we considered the overall variant load in each patient to assess if a mutation was causative or reflected accidental carriership in patients with mutations in several genes or with single recessive alleles. For example, truncating mutations in RP1, a gene implicated in both recessive and dominant RP, were causative in biallelic constellations, unrelated to disease when heterozygous on a biallelic mutation background of another gene, or even non-pathogenic if close to the C-terminus. Patients with mutations in several loci were common, but without evidence for di- or oligogenic inheritance. Although the number of targeted genes was low compared to previous studies, the mutation detection rate was highest (70%) which likely results from completeness and depth of coverage, and quantitative data analysis. CNV analysis should routinely be applied in targeted NGS, and mutations in non-coding exons give reason to systematically include 5′-UTRs in disease gene or exome panels. 
Consideration of all variants is indispensable because even truncating mutations may be misleading. PMID:24265693
Miller-Delaney, Suzanne F.C.; Bryan, Kenneth; Das, Sudipto; McKiernan, Ross C.; Bray, Isabella M.; Reynolds, James P.; Gwinn, Ryder; Stallings, Raymond L.
2015-01-01
Temporal lobe epilepsy is associated with large-scale, wide-ranging changes in gene expression in the hippocampus. Epigenetic changes to DNA are attractive mechanisms to explain the sustained hyperexcitability of chronic epilepsy. Here, through methylation analysis of all annotated C-phosphate-G islands and promoter regions in the human genome, we report a pilot study of the methylation profiles of temporal lobe epilepsy with or without hippocampal sclerosis. Furthermore, by comparative analysis of expression and promoter methylation, we identify methylation-sensitive non-coding RNA in human temporal lobe epilepsy. A total of 146 protein-coding genes exhibited altered DNA methylation in temporal lobe epilepsy hippocampus (n = 9) when compared to control (n = 5), with 81.5% of the promoters of these genes displaying hypermethylation. Unique methylation profiles were evident in temporal lobe epilepsy with or without hippocampal sclerosis, in addition to a common methylation profile regardless of pathology grade. Gene ontology terms associated with development, neuron remodelling and neuron maturation were over-represented in the methylation profile of Watson Grade 1 samples (mild hippocampal sclerosis). In addition to genes associated with neuronal, neurotransmitter/synaptic transmission and cell death functions, differential hypermethylation of genes associated with transcriptional regulation was evident in temporal lobe epilepsy, but overall few genes previously associated with epilepsy were among the differentially methylated. Finally, a panel of 13 methylation-sensitive microRNAs was identified in temporal lobe epilepsy, including MIR27A, miR-193a-5p (MIR193A) and miR-876-3p (MIR876), and the differential methylation of long non-coding RNA was documented for the first time. The present study therefore reports select, genome-wide DNA methylation changes in human temporal lobe epilepsy that may contribute to the molecular architecture of the epileptic brain.
PMID:25552301
Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael
2013-01-01
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
The expanding regulatory universe of p53 in gastrointestinal cancer.
Fesler, Andrew; Zhang, Ning; Ju, Jingfang
2016-01-01
Tumor suppressor gene TP53 is one of the most frequently deleted or mutated genes in gastrointestinal cancers. As a transcription factor, p53 regulates a number of important protein coding genes to control cell cycle, cell death, DNA damage/repair, stemness, differentiation and other key cellular functions. In addition, p53 is also able to activate the expression of a number of small non-coding microRNAs (miRNAs) through direct binding to the promoter region of these miRNAs. Many miRNAs have been identified as potential tumor suppressors that regulate key effector target mRNAs. Our understanding of the regulatory network of p53 has recently expanded to include long non-coding RNAs (lncRNAs). Like miRNAs, lncRNAs have been found to play important roles in cancer biology. With our increased understanding of the important functions of these non-coding RNAs and their relationship with p53, we are gaining exciting new insights into the biology and function of cells in response to various growth environment changes. In this review we summarize the current understanding of the ever expanding involvement of non-coding RNAs in the p53 regulatory network and its implications for our understanding of gastrointestinal cancer.
The complete coding region sequence of river buffalo (Bubalus bubalis) SRY gene.
Parma, Pietro; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe
2004-02-01
The Y-linked SRY gene is responsible for testis determination in mammals. Mutations in this gene can lead to XY Gonadal Dysgenesis, an abnormal sexual phenotype described in humans, cattle, horses and river buffalo. We report here the complete river buffalo SRY sequence in order to enable the genetic diagnosis of this disease. The SRY sequence was also used to confirm the evolutionary divergence time between cattle and river buffalo 10 million years ago.
Genes involved in androgen biosynthesis and the male phenotype.
Waterman, M R; Keeney, D S
1992-01-01
A series of enzymatic steps in the testis lead to the conversion of cholesterol to the male sex steroid hormones, testosterone and 5 alpha-dihydrotestosterone. Mutations in any one of these steps are presumed to alter or block the development of the male phenotype. Most of the genes encoding the enzymes involved in this pathway have now been cloned, and mutations within the coding regions of these genes do, in fact, block development of the male phenotype.
Shen, Kang-Ning; Chen, Ching-Hung; Hsiao, Chung-Der
2016-05-01
In this study, the complete mitogenome sequence of the hornlip mullet Plicomugil labiosus (Teleostei: Mugilidae) has been sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,829 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, 2 ribosomal RNA genes and a non-coding control region (D-loop). The D-loop, 1057 bp in length, is located between tRNA-Pro and tRNA-Phe. The overall base composition of P. labiosus is 28.0% for A, 29.3% for C, 15.5% for G and 27.2% for T. The complete mitogenome may provide essential DNA molecular data for further population, phylogenetic and evolutionary analysis of Mugilidae.
Shen, Kang-Ning; Tsai, Shiou-Yi; Chen, Ching-Hung; Hsiao, Chung-Der; Durand, Jean-Dominique
2016-11-01
In this study, the complete mitogenome sequence of the largescale mullet (Teleostei: Mugilidae) has been sequenced by a next-generation sequencing method. The assembled mitogenome, consisting of 16,832 bp, had the typical vertebrate mitochondrial gene arrangement, including 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNA genes, and a non-coding control region (D-loop). The D-loop, which has a length of 1094 bp, is located between tRNA-Pro and tRNA-Phe. The overall base composition of the largescale mullet is 27.8% for A, 30.1% for C, 16.2% for G, and 25.9% for T. The complete mitogenome may provide essential DNA molecular data for further phylogenetic and evolutionary analysis of Mugilidae.
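As a quick arithmetic check on the base compositions quoted in the two mullet abstracts above, the sketch below sums each composition and derives GC and AT content. The percentages are copied from the abstracts; the dictionary names and the helpers `gc_content` and `at_content` are illustrative assumptions, not part of either paper.

```python
# Base compositions (percent) as quoted in the two mullet abstracts.
# All names here are hypothetical and chosen for illustration only.
hornlip = {"A": 28.0, "C": 29.3, "G": 15.5, "T": 27.2}
largescale = {"A": 27.8, "C": 30.1, "G": 16.2, "T": 25.9}


def gc_content(comp):
    """Return GC content in percent, rounded to one decimal place."""
    return round(comp["G"] + comp["C"], 1)


def at_content(comp):
    """Return AT content in percent, rounded to one decimal place."""
    return round(comp["A"] + comp["T"], 1)


for name, comp in (("hornlip", hornlip), ("largescale", largescale)):
    total = round(sum(comp.values()), 1)  # should come to 100.0
    print(name, "total:", total, "GC:", gc_content(comp), "AT:", at_content(comp))
```

Both compositions sum to 100%, and both mitogenomes come out slightly AT-biased on these figures.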
Su, Zhipeng; Zhu, Jiawen; Xu, Zhuofei; Xiao, Ran; Zhou, Rui; Li, Lu; Chen, Huanchun
2016-01-01
Actinobacillus pleuropneumoniae is the causative agent of porcine contagious pleuropneumonia, a highly contagious respiratory disease of swine. Although the genome of A. pleuropneumoniae was sequenced several years ago, limited information is available on the genome-wide transcriptional analysis to accurately annotate the gene structures and regulatory elements. High-throughput RNA sequencing (RNA-seq) has been applied to study the transcriptional landscape of bacteria, which can efficiently and accurately identify gene expression regions and unknown transcriptional units, especially small non-coding RNAs (sRNAs), UTRs and regulatory regions. The aim of this study is to comprehensively analyze the transcriptome of A. pleuropneumoniae by RNA-seq in order to improve the existing genome annotation and promote our understanding of A. pleuropneumoniae gene structures and RNA-based regulation. In this study, we utilized RNA-seq to construct a single nucleotide resolution transcriptome map of A. pleuropneumoniae. More than 3.8 million high-quality reads (average length ~90 bp) from a cDNA library were generated and aligned to the reference genome. We identified 32 open reading frames encoding novel proteins that were mis-annotated in the previous genome annotations. The start sites for 35 genes based on the current genome annotation were corrected. Furthermore, 51 sRNAs in the A. pleuropneumoniae genome were discovered, of which 40 sRNAs were never reported in previous studies. The transcriptome map also enabled visualization of 5'- and 3'-UTR regions, which contained 11 sRNAs. In addition, 351 operons covering 1230 genes throughout the whole genome were identified. The RNA-Seq based transcriptome map validated annotated genes and corrected annotations of open reading frames in the genome, and led to the identification of many functional elements (e.g. regions encoding novel proteins, non-coding sRNAs and operon structures). 
The transcriptional units described in this study provide a foundation for future studies concerning the gene functions and the transcriptional regulatory architectures of this pathogen. PMID:27018591
Diversity and evolution of the emerging Pandoraviridae family.
Legendre, Matthieu; Fabre, Elisabeth; Poirot, Olivier; Jeudy, Sandra; Lartigue, Audrey; Alempic, Jean-Marie; Beucher, Laure; Philippe, Nadège; Bertaux, Lionel; Christo-Foroux, Eugène; Labadie, Karine; Couté, Yohann; Abergel, Chantal; Claverie, Jean-Michel
2018-06-11
With DNA genomes reaching 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting pandoraviruses remained up to now the most complex viruses since their discovery in 2013. Our isolation of three new strains from distant locations and environments is now used to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses reveals many non-coding transcripts and significantly reduces the former set of predicted protein-coding genes. Here we show that the pandoraviruses exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes.
Non-coding functions of alternative pre-mRNA splicing in development
Mockenhaupt, Stefan; Makeyev, Eugene V.
2015-01-01
A majority of messenger RNA precursors (pre-mRNAs) in the higher eukaryotes undergo alternative splicing to generate more than one mature product. By targeting the open reading frame region this process increases diversity of protein isoforms beyond the nominal coding capacity of the genome. However, alternative splicing also frequently controls output levels and spatiotemporal features of cellular and organismal gene expression programs. Here we discuss how these non-coding functions of alternative splicing contribute to development through regulation of mRNA stability, translational efficiency and cellular localization. PMID:26493705
2010-01-01
Background: Natural accessions of Arabidopsis thaliana are characterized by a high level of phenotypic variation that can be used to investigate the extent and mode of selection on the primary metabolic traits. A collection of 54 A. thaliana natural accession-derived lines was subjected to deep genotyping through Single Feature Polymorphism (SFP) detection via genomic DNA hybridization to Arabidopsis Tiling 1.0 Arrays for the detection of selective sweeps, and identification of associations between sweep regions and growth-related metabolic traits. Results: A total of 1,072,557 high-quality SFPs were detected and indications for 3,943 deletions and 1,007 duplications were obtained. A significantly lower than expected SFP frequency was observed in protein-, rRNA-, and tRNA-coding regions and in non-repetitive intergenic regions, while pseudogenes, transposons, and non-coding RNA genes are enriched with SFPs. Gene families involved in plant defence or in signalling were identified as highly polymorphic, while several other families including transcription factors are depleted of SFPs. 198 significant associations between metabolic genes and 9 metabolic and growth-related phenotypic traits were detected with annotation hinting at the nature of the relationship. Five significant selective sweep regions were also detected, of which one associated significantly with a metabolic trait. Conclusions: We generated a high density polymorphism map for 54 A. thaliana accessions that highlights the variability of resistance genes across geographic ranges and used it to identify selective sweeps and associations between metabolic genes and metabolic phenotypes. Several associations show a clear biological relationship, while many still require further investigation. PMID:20302660
Ujvari, Beata; Madsen, Thomas
2008-10-01
Using PCR, the complete mitochondrial genome was sequenced in three frillneck lizards (Chlamydosaurus kingii). The mitochondrial genome spanned 16,761 bp. As in other vertebrates, two rRNA genes, 22 tRNA genes and 13 protein coding genes were identified. However, similar to some other squamate reptiles, two control regions (CRI and CRII) were identified, spanning 801 and 812 bp, respectively. Our results were compared with another Australian member of the family Agamidae, the bearded dragon (Pogona vitticeps). The overall base composition of the light-strand sequence largely mirrored that observed in P. vitticeps. Furthermore, similar to P. vitticeps, we observed an insertion 801 bp long between the ND5 and ND6 genes. However, in contrast to P. vitticeps we did not observe a conserved sequence block III region. Based on a comparison among the three frillneck lizards, we also present data on the proportion of variable sites within the major mitochondrial regions.
Bhattacharya, D; Steinkötter, J; Melkonian, M
1993-12-01
Centrin (= caltractin) is a ubiquitous, cytoskeletal protein which is a member of the EF-hand superfamily of calcium-binding proteins. A centrin-coding cDNA was isolated and characterized from the prasinophyte green alga Scherffelia dubia. Centrin PCR amplification primers were used to isolate partial, homologous cDNA sequences from the green algae Tetraselmis striata and Spermatozopsis similis. Annealing analyses suggested that centrin is a single-copy-coding region in T. striata and S. similis and other green algae studied. Centrin-coding regions from S. dubia, S. similis and T. striata encode four colinear EF-hand domains which putatively bind calcium. Phylogenetic analyses, including homologous sequences from Chlamydomonas reinhardtii and the land plant Atriplex nummularia, demonstrate that the domains of centrins are congruent and arose from the two-fold duplication of an ancestral EF hand with Domains 1+3 and Domains 2+4 clustering. The domains of centrins are also congruent with those of calmodulins demonstrating that, like calmodulin, centrin is an ancient protein which arose within the ancestor of all eukaryotes via gene duplication. Phylogenetic relationships inferred from centrin-coding region comparisons mirror results of small subunit ribosomal RNA sequence analyses suggesting that centrin-coding regions are useful evolutionary markers within the green algae.
Klug, G; Cohen, S N
1990-01-01
Differential expression of the genes within the puf operon of Rhodobacter capsulatus is accomplished in part by differences in the rate of degradation of different segments of the puf transcript. We report here that decay of puf mRNA sequences specifying the light-harvesting I (LHI) and reaction center (RC) photosynthetic membrane peptides is initiated endoribonucleolytically within a discrete 1.4-kilobase segment of the RC-coding region. Deletion of this segment increased the half-life of the RC-coding region from 8 to 20 min while not affecting decay of LHI-coding sequences upstream from an intercistronic hairpin loop structure shown previously to impede 3'-to-5' degradation. Prolongation of RC segment half-life was dependent on the presence of other hairpin structures 3' to the RC region. Inserting the endonuclease-sensitive sites into the LHI-coding segment markedly accelerated its degradation. Our results suggest that differential degradation of the RC- and LHI-coding segments of puf mRNA is accomplished at least in part by the combined actions of RC region-specific endonuclease(s), one or more exonucleases, and several strategically located exonuclease-impeding hairpins. PMID:2394682
Aging Shapes the Population-Mean and -Dispersion of Gene Expression in Human Brains
Brinkmeyer-Langford, Candice L.; Guan, Jinting; Ji, Guoli; Cai, James J.
2016-01-01
Human aging is associated with cognitive decline and an increased risk of neurodegenerative disease. Our objective for this study was to evaluate potential relationships between age and variation in gene expression across different regions of the brain. We analyzed the Genotype-Tissue Expression (GTEx) data from 54 to 101 tissue samples across 13 brain regions in post-mortem donors of European descent aged between 20 and 70 years at death. After accounting for the effects of covariates and hidden confounding factors, we identified 1446 protein-coding genes whose expression in one or more brain regions is correlated with chronological age at a false discovery rate of 5%. These genes are involved in various biological processes including apoptosis, mRNA splicing, amino acid biosynthesis, and neurotransmitter transport. The distribution of these genes among brain regions is uneven, suggesting variable regional responses to aging. We also found that the aging response of many genes, e.g., TP37 and C1QA, depends on individuals' genotypic backgrounds. Finally, using dispersion-specific analysis, we identified genes such as IL7R, MS4A4E, and TERF1/TERF2 whose expressions are differentially dispersed by aging, i.e., variances differ between age groups. Our results demonstrate that age-related gene expression is brain region-specific, genotype-dependent, and associated with both mean and dispersion changes. Our findings provide a foundation for more sophisticated gene expression modeling in the studies of age-related neurodegenerative diseases. PMID:27536236
Formighieri, Eduardo F; Tiburcio, Ricardo A; Armas, Eduardo D; Medrano, Francisco J; Shimo, Hugo; Carels, Nicolas; Góes-Neto, Aristóteles; Cotomacci, Carolina; Carazzolle, Marcelo F; Sardinha-Pinto, Naiara; Thomazella, Daniela P T; Rincones, Johana; Digiampietri, Luciano; Carraro, Dirce M; Azeredo-Espin, Ana M; Reis, Sérgio F; Deckmann, Ana C; Gramacho, Karina; Gonçalves, Marilda S; Moura Neto, José P; Barbosa, Luciana V; Meinhardt, Lyndel W; Cascardo, Júlio C M; Pereira, Gonçalo A G
2008-10-01
We present here the sequence of the mitochondrial genome of the basidiomycete phytopathogenic hemibiotrophic fungus Moniliophthora perniciosa, causal agent of the Witches' Broom Disease in Theobroma cacao. The DNA is a circular molecule of 109,103 base pairs, with 31.9% GC, and is the largest sequenced so far. This size is due essentially to the presence of numerous non-conserved hypothetical ORFs. It contains the 14 genes coding for proteins involved in the oxidative phosphorylation, the two rRNA genes, one ORF coding for a ribosomal protein (rps3), and a set of 26 tRNA genes that recognize codons for all amino acids. Seven homing endonucleases are located inside introns. Except atp8, all conserved known genes are in the same orientation. Phylogenetic analysis based on the cox genes agrees with the commonly accepted fungal taxonomy. An uncommon feature of this mitochondrial genome is the presence of a region that contains a set of four, relatively small, nested, inverted repeats enclosing two genes coding for polymerases with an invertron-type structure and three conserved hypothetical genes interpreted as the stable integration of a mitochondrial linear plasmid. The integration of this plasmid seems to be a recent evolutionary event that could have implications in fungal biology. This sequence is available under GenBank accession number AY376688.
A global view of the nonprotein-coding transcriptome in Plasmodium falciparum
Raabe, Carsten A.; Sanchez, Cecilia P.; Randau, Gerrit; Robeck, Thomas; Skryabin, Boris V.; Chinni, Suresh V.; Kube, Michael; Reinhardt, Richard; Ng, Guey Hooi; Manickam, Ravichandran; Kuryshev, Vladimir Y.; Lanzer, Michael; Brosius, Juergen; Tang, Thean Hock; Rozhdestvensky, Timofey S.
2010-01-01
Nonprotein-coding RNAs (npcRNAs) represent an important class of regulatory molecules that act in many cellular pathways. Here, we describe the experimental identification and validation of the small npcRNA transcriptome of the human malaria parasite Plasmodium falciparum. We identified 630 novel npcRNA candidates. Based on sequence and structural motifs, 43 of them belong to the C/D and H/ACA-box subclasses of small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs). We further observed the exonization of a functional H/ACA snoRNA gene, which might contribute to the regulation of ribosomal protein L7a gene expression. Some of the small npcRNA candidates are from telomeric and subtelomeric repetitive regions, suggesting their potential involvement in maintaining telomeric integrity and subtelomeric gene silencing. We also detected 328 cis-encoded antisense npcRNAs (asRNAs) complementary to P. falciparum protein-coding genes of a wide range of biochemical pathways, including determinants of virulence and pathology. All cis-encoded asRNA genes tested exhibit lifecycle-specific expression profiles. For all but one of the respective sense–antisense pairs, we deduced concordant patterns of expression. Our findings have important implications for a better understanding of gene regulatory mechanisms in P. falciparum, revealing an extended and sophisticated npcRNA network that may control the expression of housekeeping genes and virulence factors. PMID:19864253
Silar, Philippe; Barreau, Christian; Debuchy, Robert; Kicka, Sébastien; Turcq, Béatrice; Sainsard-Chanet, Annie; Sellem, Carole H; Billault, Alain; Cattolico, Laurence; Duprat, Simone; Weissenbach, Jean
2003-08-01
A Podospora anserina BAC library of 4800 clones has been constructed in the vector pBHYG allowing direct selection in fungi. Screening of the BAC collection for centromeric sequences of chromosome V allowed the recovery of clones localized on either side of the centromere, but no BAC clone was found to contain the centromere. Seven BAC clones containing 322,195 and 156,244 bp from either side of the centromeric region were sequenced and annotated. One 5S rRNA gene, 5 tRNA genes, and 163 putative coding sequences (CDS) were identified. Among these, only six CDS seem specific to P. anserina. The gene density in the centromeric region is approximately one gene every 2.8 kb. Extrapolation of this gene density to the whole genome of P. anserina suggests that the genome contains about 11,000 genes. Synteny analyses between P. anserina and Neurospora crassa show that co-linearity extends at the most to a few genes, suggesting rapid genome rearrangements between these two species.
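The gene-density figure in the abstract above can be reproduced from the numbers it quotes. The sketch below is only that arithmetic: it assumes the density was computed over all annotated genes (rRNA, tRNA and CDS together), and the implied genome size is a back-calculation that the abstract itself does not state. All variable names are illustrative.

```python
# Figures quoted in the abstract; names are hypothetical.
sequenced_bp = 322_195 + 156_244   # both sides of the centromeric region
genes = 1 + 5 + 163                # 5S rRNA gene + tRNA genes + putative CDS

# Average spacing between genes, in kilobases.
density_kb_per_gene = sequenced_bp / genes / 1000

# Assumption: the same density holds genome-wide, so about 11,000 genes
# would correspond to a genome of roughly this many megabases.
estimated_genes = 11_000
implied_genome_mb = estimated_genes * sequenced_bp / genes / 1e6

print(f"{density_kb_per_gene:.2f} kb per gene")
print(f"{implied_genome_mb:.1f} Mb implied genome size")
```

Under that assumption the sequenced region gives about 2.8 kb per gene, matching the abstract, and the extrapolation to about 11,000 genes corresponds to a genome of roughly 31 Mb.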
Fernandez-Valverde, Selene L; Calcino, Andrew D; Degnan, Bernard M
2015-05-15
The demosponge Amphimedon queenslandica is amongst the few early-branching metazoans with an assembled and annotated draft genome, making it an important species in the study of the origin and early evolution of animals. Current gene models in this species are largely based on in silico predictions and low coverage expressed sequence tag (EST) evidence. Amphimedon queenslandica protein-coding gene models are improved using deep RNA-Seq data from four developmental stages and CEL-Seq data from 82 developmental samples. Over 86% of previously predicted genes are retained in the new gene models, although 24% have additional exons; there is also a marked increase in the total number of annotated 3' and 5' untranslated regions (UTRs). Importantly, these new developmental transcriptome data reveal numerous previously unannotated protein-coding genes in the Amphimedon genome, increasing the total gene number by 25%, from 30,060 to 40,122. In general, Amphimedon genes have introns that are markedly smaller than those in other animals and most of the alternatively spliced genes in Amphimedon undergo intron-retention; exon-skipping is the least common mode of alternative splicing. Finally, in addition to canonical polyadenylation signal sequences, Amphimedon genes are enriched in a number of unique AT-rich motifs in their 3' UTRs. The inclusion of developmental transcriptome data has substantially improved the structure and composition of protein-coding gene models in Amphimedon queenslandica, providing a more accurate and comprehensive set of genes for functional and comparative studies. These improvements reveal the Amphimedon genome is comprised of a remarkably high number of tightly packed genes. These genes have small introns and there is pervasive intron retention amongst alternatively spliced transcripts. These aspects of the sponge genome are more similar to unicellular opisthokont genomes than to other animal genomes.
Auer, Paul L; Nalls, Mike; Meschia, James F; Worrall, Bradford B; Longstreth, W T; Seshadri, Sudha; Kooperberg, Charles; Burger, Kathleen M; Carlson, Christopher S; Carty, Cara L; Chen, Wei-Min; Cupples, L Adrienne; DeStefano, Anita L; Fornage, Myriam; Hardy, John; Hsu, Li; Jackson, Rebecca D; Jarvik, Gail P; Kim, Daniel S; Lakshminarayan, Kamakshi; Lange, Leslie A; Manichaikul, Ani; Quinlan, Aaron R; Singleton, Andrew B; Thornton, Timothy A; Nickerson, Deborah A; Peters, Ulrike; Rich, Stephen S
2015-07-01
Stroke is the second leading cause of death and the third leading cause of years of life lost. Genetic factors contribute to stroke prevalence, and candidate gene and genome-wide association studies (GWAS) have identified variants associated with ischemic stroke risk. These variants often have small effects without obvious biological significance. Exome sequencing may discover predicted protein-altering variants with a potentially large effect on ischemic stroke risk. To investigate the contribution of rare and common genetic variants to ischemic stroke risk by targeting the protein-coding regions of the human genome. The National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) analyzed approximately 6000 participants from numerous cohorts of European and African ancestry. For discovery, 365 cases of ischemic stroke (small-vessel and large-vessel subtypes) and 809 European ancestry controls were sequenced; for replication, 47 affected sibpairs concordant for stroke subtype and an African American case-control series were sequenced, with 1672 cases and 4509 European ancestry controls genotyped. The ESP's exome sequencing and genotyping started on January 1, 2010, and continued through June 30, 2012. Analyses were conducted on the full data set between July 12, 2012, and July 13, 2013. Discovery of new variants or genes contributing to ischemic stroke risk and subtype (primary analysis) and determination of support for protein-coding variants contributing to risk in previously published candidate genes (secondary analysis). 
We identified 2 novel genes associated with an increased risk of ischemic stroke: a protein-coding variant in PDE4DIP (rs1778155; odds ratio, 2.15; P = 2.63 × 10^-8) with an intracellular signal transduction mechanism and in ACOT4 (rs35724886; odds ratio, 2.04; P = 1.24 × 10^-7) with a fatty acid metabolism mechanism; confirmation of PDE4DIP was observed in affected sibpair families with large-vessel stroke subtype and in African Americans. Replication of protein-coding variants in candidate genes was observed for 2 previously reported GWAS associations: ZFHX3 (cardioembolic stroke) and ABCA1 (large-vessel stroke). Exome sequencing discovered 2 novel genes and mechanisms, PDE4DIP and ACOT4, associated with increased risk for ischemic stroke. In addition, ZFHX3 and ABCA1 were discovered to have protein-coding variants associated with ischemic stroke. These results suggest that genetic variation in novel pathways contributes to ischemic stroke risk and serves as a target for prediction, prevention, and therapy.
Kapanadze, B; Kashuba, V; Baranova, A; Rasool, O; van Everdink, W; Liu, Y; Syomov, A; Corcoran, M; Poltaraus, A; Brodyansky, V; Syomova, N; Kazakov, A; Ibbotson, R; van den Berg, A; Gizatullin, R; Fedorova, L; Sulimova, G; Zelenin, A; Deaven, L; Lehrach, H; Grander, D; Buys, C; Oscier, D; Zabarovsky, E R; Einhorn, S; Yankovsky, N
1998-04-17
B-cell chronic lymphocytic leukemia (B-CLL) is a human hematological neoplastic disease often associated with the loss of a chromosome 13 region between RB1 gene and locus D13S25. A new tumor suppressor gene (TSG) may be located in the region. A cosmid contig has been constructed between the loci D13S1168 (WI9598) and D13S25 (H2-42), which corresponds to the minimal region shared by B-CLL associated deletions. The contig includes more than 200 LANL and ICRF cosmid clones covering 620 kb. Three cDNAs likely corresponding to three different genes have been found in the minimally deleted region, sequenced and mapped against the contigged cosmids. cDNA clone 10k4 as well as a chimeric clone 13g3, codes for a zinc-finger domain of the RING type and shares homology to some known genes involved in tumorigenesis (RET finger protein, BRCA1) and embryogenesis (MID1). We have termed the gene corresponding to 10k4/13g3 clones LEU5. This is the first gene with homology to known TSGs which has been found in the region of B-CLL rearrangements.
Localization of TFIIB binding regions using serial analysis of chromatin occupancy
Yochum, Gregory S; Rajaraman, Veena; Cleland, Ryan; McWeeney, Shannon
2007-01-01
Background: RNA Polymerase II (RNAP II) is recruited to core promoters by the pre-initiation complex (PIC) of general transcription factors. Within the PIC, transcription factor for RNA polymerase IIB (TFIIB) determines the start site of transcription. TFIIB binding has not been localized, genome-wide, in metazoans. Serial analysis of chromatin occupancy (SACO) is an unbiased methodology used to empirically identify transcription factor binding regions. In this report, we use TFIIB and SACO to localize TFIIB binding regions across the rat genome. Results: A sample of the TFIIB SACO library was sequenced and 12,968 TFIIB genomic signature tags (GSTs) were assigned to the rat genome. GSTs are 20–22 base pair fragments that are derived from TFIIB bound chromatin. TFIIB localized to both non-protein coding and protein-coding loci. For 21% of the 1783 protein-coding genes in this sample of the SACO library, TFIIB binding mapped near the characterized 5' promoter that is upstream of the transcription start site (TSS). However, internal TFIIB binding positions were identified in 57% of the 1783 protein-coding genes. Internal positions are defined as those within an inclusive region greater than 2.5 kb downstream from the 5' TSS and 2.5 kb upstream from the transcription stop. We demonstrate that both TFIIB and TFIID (an additional component of PICs) bound to internal regions using chromatin immunoprecipitation (ChIP). The 5' cap of transcripts associated with internal TFIIB binding positions were identified using a cap-trapping assay. The 5' TSSs for internal transcripts were confirmed by primer extension. Additionally, an analysis of the functional annotation of mouse 3 (FANTOM3) databases indicates that internally initiated transcripts identified by TFIIB SACO in rat are conserved in mouse. 
Conclusion: Our finding that TFIIB binding is not restricted to the 5' upstream region indicates that the propensity for PIC to contribute to transcript diversity is far greater than previously appreciated. PMID:17997859
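The definition of an "internal" binding position given in the abstract above (more than 2.5 kb downstream of the 5' TSS and 2.5 kb upstream of the transcription stop) can be written as a small predicate. This is a minimal sketch, not the authors' pipeline: the function name `is_internal`, the single-coordinate representation, and the choice to treat the boundaries as inclusive are all assumptions made for illustration.

```python
def is_internal(position, tss, stop, margin=2500):
    """Return True if `position` lies at least `margin` bp inside both gene ends.

    `tss` and `stop` are genomic coordinates of the 5' transcription start
    site and the transcription stop. Sorting them makes the test
    strand-agnostic (on the minus strand, tss > stop).
    Boundary handling (inclusive) is an assumption.
    """
    low, high = sorted((tss, stop))
    return low + margin <= position <= high - margin


# Hypothetical 10 kb gene on the plus strand, TSS at 1,000 and stop at 11,000.
print(is_internal(5000, tss=1000, stop=11000))   # well inside the gene body
print(is_internal(2000, tss=1000, stop=11000))   # within 2.5 kb of the TSS
```

One consequence of this definition is that genes shorter than 5 kb cannot contain any internal position, since the two margins overlap.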
Lim, Hyoun-Sub; Vaira, Anna Maria; Domier, Leslie L; Lee, Sung Chul; Kim, Hong Gi; Hammond, John
2010-06-20
We have developed plant virus-based vectors for virus-induced gene silencing (VIGS) and protein expression, based on Alternanthera mosaic virus (AltMV), for infection of a wide range of host plants including Nicotiana benthamiana and Arabidopsis thaliana by either mechanical inoculation of in vitro transcripts or via agroinfiltration. In vivo transcripts produced by co-agroinfiltration of bacteriophage T7 RNA polymerase resulted in T7-driven AltMV infection from a binary vector in the absence of the Cauliflower mosaic virus 35S promoter. An artificial bipartite viral vector delivery system was created by separating the AltMV RNA-dependent RNA polymerase and Triple Gene Block (TGB)123-Coat protein (CP) coding regions into two constructs each bearing the AltMV 5' and 3' non-coding regions, which recombined in planta to generate a full-length AltMV genome. Substitution of TGB1 L(88)P, and equivalent changes in other potexvirus TGB1 proteins, affected RNA silencing suppression efficacy and suitability of the vectors from protein expression to VIGS.
Kocher, Arthur; Gantier, Jean-Charles; Holota, Hélène; Jeziorski, Céline; Coissac, Eric; Bañuls, Anne-Laure; Girod, Romain; Gaborit, Pascal; Murienne, Jérôme
2016-11-01
The nearly complete mitochondrial genome of Lutzomyia umbratilis Ward & Fraiha, 1977 (Psychodidae: Phlebotominae), considered the main vector of Leishmania guyanensis, is presented. The sequencing was performed on an Illumina HiSeq 2500 platform, with a genome skimming strategy. The full nuclear ribosomal RNA segment was also assembled. The mitogenome of L. umbratilis was determined to be at least 15,717 bp long and presents an architecture found in many insect mitogenomes (13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs, and one non-coding region also referred to as the control region). The control region contains a large repeated element of c. 370 bp and a poly-AT region of unknown length. This is the first mitogenome of Psychodidae to be described.
Schmale, H; Ivell, R; Breindl, M; Darmer, D; Richter, D
1984-01-01
The vasopressin gene from normal and diabetes insipidus (Brattleboro) rats has been isolated and sequenced. Except for a single deletion of a G residue in the region coding for the neurophysin carrier protein, the approximately 2300 nucleotides of both genes are identical. Blot analysis of hypothalamic RNA as well as transfection and microinjection experiments indicate that the mutant gene is correctly transcribed and spliced; however, the resulting mRNA is not efficiently translated. PMID:6526016
Schulte, W; Töpfer, R; Stracke, R; Schell, J; Martini, N
1997-04-01
Three genes coding for different multifunctional acetyl-CoA carboxylase (ACCase; EC 6.4.1.2) isoenzymes from Brassica napus were isolated and divided into two major classes according to structural features in their 5' regions: class I comprises two genes with an additional coding exon of approximately 300 bp at the 5' end, and class II is represented by one gene carrying an intron of 586 bp in its 5' untranslated region. Fusion of the peptide sequence encoded by the additional first exon of a class I ACCase gene to the jellyfish Aequorea victoria green fluorescent protein (GFP) and transient expression in tobacco protoplasts targeted GFP to the chloroplasts. In contrast to the deduced primary structure of the biotin carboxylase domain encoded by the class I gene, the corresponding amino acid sequence of the class II ACCase shows higher identity with that of the Arabidopsis ACCase, both lacking a transit peptide. The Arabidopsis ACCase has been proposed to be a cytosolic isoenzyme. These observations indicate that the two classes of ACCase genes encode plastidic and cytosolic isoforms of multi-functional, eukaryotic type, respectively, and that B. napus contains at least one multi-functional ACCase besides the multi-subunit, prokaryotic type located in plastids. Southern blot analysis of genomic DNA from B. napus, Brassica rapa, and Brassica oleracea, the ancestors of amphidiploid rapeseed, using a fragment of a multi-functional ACCase gene as a probe revealed that ACCase is encoded by a multi-gene family of at least five members.
Implication of common and disease specific variants in CLU, CR1, and PICALM.
Ferrari, Raffaele; Moreno, Jorge H; Minhajuddin, Abu T; O'Bryant, Sid E; Reisch, Joan S; Barber, Robert C; Momeni, Parastoo
2012-08-01
Two recent genome-wide association studies (GWAS) for late onset Alzheimer's disease (LOAD) revealed 3 new genes: clusterin (CLU), phosphatidylinositol binding clathrin assembly protein (PICALM), and complement receptor 1 (CR1). In order to evaluate association with these genome-wide association study-identified genes and to isolate the variants contributing to the pathogenesis of LOAD, we genotyped the top single nucleotide polymorphisms (SNPs), rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), and sequenced the entire coding regions of these genes in our cohort of 342 LOAD patients and 277 control subjects. We confirmed the association of rs3851179 (PICALM) (p = 7.4 × 10⁻³) with the disease status. Through sequencing we identified 18 variants in CLU, 3 of which were found exclusively in patients; 8 variants (out of 65) in CR1 gene were only found in patients and the 16 variants identified in PICALM gene were present in both patients and controls. In silico analysis of the variants in PICALM did not predict any damaging effect on the protein. The haplotype analysis of the variants in each gene predicted a common haplotype when the 3 single nucleotide polymorphisms rs11136000 (CLU), rs3818361 (CR1), and rs3851179 (PICALM), respectively, were included. For each gene the haplotype structure and size differed between patients and controls. In conclusion, we confirmed association of CLU, CR1, and PICALM genes with the disease status in our cohort through identification of a number of disease-specific variants among patients through the sequencing of the coding region of these genes. Published by Elsevier Inc.
Bohlin, Jon; Eldholm, Vegard; Pettersson, John H O; Brynildsrud, Ola; Snipen, Lars
2017-02-10
The core genome consists of genes shared by the vast majority of a species and is therefore assumed to have been subjected to substantially stronger purifying selection than the more mobile elements of the genome, also known as the accessory genome. Here we examine intragenic base composition differences in core genomes and corresponding accessory genomes in 36 species, represented by the genomes of 731 bacterial strains, to assess the impact of selective forces on base composition in microbes. We also explore, in turn, how these results compare with findings for whole genome intragenic regions. We found that GC content in coding regions is significantly higher in core genomes than accessory genomes and whole genomes. Likewise, GC content variation within coding regions was significantly lower in core genomes than in accessory genomes and whole genomes. Relative entropy in coding regions, measured as the difference between observed and expected trinucleotide frequencies estimated from mononucleotide frequencies, was significantly higher in the core genomes than in accessory and whole genomes. Relative entropy was positively associated with coding region GC content within the accessory genomes, but not within the corresponding coding regions of core or whole genomes. The higher intragenic GC content and relative entropy, as well as the lower GC content variation, observed in the core genomes is most likely associated with selective constraints. It is unclear whether the positive association between GC content and relative entropy in the more mobile accessory genomes constitutes signatures of selection or selective neutral processes.
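The relative entropy measure described above can be illustrated with a minimal sketch. It assumes a Kullback-Leibler form, with expected trinucleotide frequencies taken as the product of the three mononucleotide frequencies, and it counts overlapping trinucleotides; the abstract does not state the exact estimator or windowing the authors used, so the function name `relative_entropy` and these choices are illustrative assumptions only.

```python
from collections import Counter
from math import log2


def relative_entropy(seq: str) -> float:
    """KL divergence (bits) between observed trinucleotide frequencies and
    those expected from mononucleotide frequencies alone.

    Assumption: overlapping trinucleotides, unambiguous bases only.
    """
    seq = seq.upper()
    valid = set("ACGT")

    # Mononucleotide frequencies over unambiguous bases.
    mono = Counter(b for b in seq if b in valid)
    total_mono = sum(mono.values())

    # Observed trinucleotide counts, skipping windows with ambiguous bases.
    tri = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= valid
    )
    total_tri = sum(tri.values())
    if total_tri == 0:
        return 0.0

    divergence = 0.0
    for word, count in tri.items():
        p_obs = count / total_tri
        # Expected frequency if the three positions were independent.
        p_exp = 1.0
        for base in word:
            p_exp *= mono[base] / total_mono
        divergence += p_obs * log2(p_obs / p_exp)
    return divergence


if __name__ == "__main__":
    print(relative_entropy("ACGT" * 10))
```

A homopolymer gives zero because the observed and expected distributions coincide, whereas a strongly periodic sequence gives a large value because only a few of the 64 possible trinucleotides occur.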
Zhang, J R; Norris, S J
1998-08-01
The Lyme disease spirochete Borrelia burgdorferi possesses 15 silent vls cassettes and a vls expression site (vlsE) encoding a surface-exposed lipoprotein. Segments of the silent vls cassettes have been shown to recombine with the vlsE cassette region in the mammalian host, resulting in combinatorial antigenic variation. Despite promiscuous recombination within the vlsE cassette region, the 5' and 3' coding sequences of vlsE that flank the cassette region are not subject to sequence variation during these recombination events. The segments of the silent vls cassettes recombine in the vlsE cassette region through a unidirectional process such that the sequence and organization of the silent vls loci are not affected. As a result of recombination, the previously expressed segments are replaced by incoming segments and apparently degraded. These results provide evidence for a gene conversion mechanism in VlsE antigenic variation.
The complete plastid genome sequence of Eustrephus latifolius (Asparagaceae: Lomandroideae).
Kim, Hyoung Tae; Kim, Jung Sung; Kim, Joo-Hwan
2016-01-01
The complete chloroplast (cp) genome sequence of Eustrephus latifolius was determined, the first in subfamily Lomandroideae of family Asparagaceae. It was 159,736 bp long and contained a large single copy region (82,403 bp) and a small single copy region (13,607 bp), which were separated by two inverted repeat regions (31,863 bp each). In total, 132 genes were identified, consisting of 83 coding genes, 8 rRNA genes, 38 tRNA genes, and 3 pseudogenes. rpl23 and clpP were pseudogenes due to sequence deletions. Among 23 genes containing introns, rps12 and ycf3 contained two introns and the rest had just one intron. The intact ycf68 was identified within an intron of trnI-GAU. The amino acid sequence was almost identical to that of Phoenix dactylifera in Arecales. Ycf1 of E. latifolius was located completely in the IR. This was similar to the cp genome structure of Lemna minor, Spirodela polyrhiza, Wolffiella lingulata, and Wolffia australiana in Alismatales.
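As a quick consistency check on the figures reported above, the short sketch below adds up the region lengths and gene counts. It assumes the 31,863 bp figure is the length of each of the two inverted repeats, which is the reading under which the parts sum to the reported total; the variable names are illustrative.

```python
# Lengths in bp as reported in the abstract.
lsc = 82_403            # large single copy region
ssc = 13_607            # small single copy region
ir = 31_863             # one inverted repeat (assumed to be per-copy length)
total_reported = 159_736

# Quadripartite plastid genome: LSC + SSC + two IR copies.
total_computed = lsc + ssc + 2 * ir

# Gene categories as reported in the abstract.
gene_counts = {"coding": 83, "rRNA": 8, "tRNA": 38, "pseudogene": 3}

print(total_computed == total_reported, sum(gene_counts.values()))
```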
MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity
Wang, Yupeng; Tang, Haibao; DeBarry, Jeremy D.; Tan, Xu; Li, Jingping; Wang, Xiyin; Lee, Tae-ho; Jin, Huizhe; Marler, Barry; Guo, Hui; Kissinger, Jessica C.; Paterson, Andrew H.
2012-01-01
MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/. PMID:22217600
Evolution of the chalcone synthase gene family in the genus Ipomoea.
Durbin, M L; Learn, G H; Huttley, G A; Clegg, M T
1995-01-01
The evolution of the chalcone synthase [CHS; malonyl-CoA:4-coumaroyl-CoA malonyltransferase (cyclizing), EC 2.3.1.74] multigene family in the genus Ipomoea is explored. Thirteen CHS genes from seven Ipomoea species (family Convolvulaceae) were sequenced--three from genomic clones and the remainder from PCR amplification with primers designed from the 5' flanking region and the end of the 3' coding region of Ipomoea purpurea Roth. Analysis of the data indicates a duplication of CHS that predates the divergence of the Ipomoea species in this study. The Ipomoea CHS genes are among the most rapidly evolving of the CHS genes sequenced to date. The CHS genes in this study are most closely related to the Petunia CHS-B gene, which is also rapidly evolving and highly divergent from the rest of the Petunia CHS sequences. PMID:7724563
Transcription Factors Bind Thousands of Active and Inactive Regions in the Drosophila Blastoderm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Xiao-Yong; MacArthur, Stewart; Bourgon, Richard
2008-01-10
Identifying the genomic regions bound by sequence-specific regulatory factors is central both to deciphering the complex DNA cis-regulatory code that controls transcription in metazoans and to determining the range of genes that shape animal morphogenesis. Here, we use whole-genome tiling arrays to map sequences bound in Drosophila melanogaster embryos by the six maternal and gap transcription factors that initiate anterior-posterior patterning. We find that these sequence-specific DNA binding proteins bind with quantitatively different specificities to highly overlapping sets of several thousand genomic regions in blastoderm embryos. Specific high- and moderate-affinity in vitro recognition sequences for each factor are enriched in bound regions. This enrichment, however, is not sufficient to explain the pattern of binding in vivo and varies in a context-dependent manner, demonstrating that higher-order rules must govern targeting of transcription factors. The more highly bound regions include all of the over forty well-characterized enhancers known to respond to these factors as well as several hundred putative new cis-regulatory modules clustered near developmental regulators and other genes with patterned expression at this stage of embryogenesis. The new targets include most of the microRNAs (miRNAs) transcribed in the blastoderm, as well as all major zygotically transcribed dorsal-ventral patterning genes, whose expression we show to be quantitatively modulated by anterior-posterior factors. In addition to these highly bound regions, there are several thousand regions that are reproducibly bound at lower levels. However, these poorly bound regions are, collectively, far more distant from genes transcribed in the blastoderm than highly bound regions; are preferentially found in protein-coding sequences; and are less conserved than highly bound regions. Together these observations suggest that many of these poorly-bound regions are not involved in early-embryonic transcriptional regulation, and a significant proportion may be nonfunctional. Surprisingly, for five of the six factors, their recognition sites are not unambiguously more constrained evolutionarily than the immediate flanking DNA, even in more highly bound and presumably functional regions, indicating that comparative DNA sequence analysis is limited in its ability to identify functional transcription factor targets.
Chang, Chia-Hao; Shao, Kwang-Tsao; Lin, Yeong-Shin; Liao, Yun-Chih
2013-12-01
The complete mitochondrial genome of the three-spot seahorse was sequenced using a polymerase chain reaction-based method. The total length of mitochondrial DNA is 16,535 bp and includes 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and a control region. The mitochondrial gene order of the three-spot seahorse also conforms to the distinctive vertebrate mitochondrial gene order. The base composition of the genome is A (32.7%), T (29.3%), C (23.4%), and G (14.6%) with an A + T-rich hallmark as that of other vertebrate mitochondrial genomes.
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered, but its cost currently limits its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Cryptic tRNAs in chaetognath mitochondrial genomes.
Barthélémy, Roxane-Marie; Seligmann, Hervé
2016-06-01
The chaetognaths constitute a small and enigmatic phylum of small marine invertebrates. Both nuclear and mitochondrial genomes have numerous originalities, some phylum-specific. Until recently, their mitogenomes seemed to contain only one tRNA gene (trnMet), but a recent study found two and four tRNA genes in two chaetognath mitogenomes. Moreover, two conspecific mitogenomes apparently have different tRNA gene numbers (one and two). Reanalyses of the five available complete chaetognath mitogenomes with the tRNAscan-SE and ARWEN software suggest numerous additional tRNA genes of different types. Their total number never reaches the 22 found in most other invertebrates using that genetic code. Predicted error compensation between codon-anticodon mismatch and tRNA misacylation suggests translational activity by tRNAs predicted solely according to secondary structure for tRNAs predicted by tRNAscan-SE, not ARWEN. Numbers of predicted stop-suppressor (antitermination) tRNAs coevolve with predicted overlapping, frameshifted protein coding genes including stop codons. Sequence alignments in secondary structure prediction with non-chaetognath tRNAs suggest that the most likely functional tRNAs are in intergenic regions, as regular mt-tRNAs. Because intergenic regions are usually short, tRNA sequences generally overlap partially with flanking genes. Some tRNA pairs seem templated by sense-antisense strands. Moreover, 16S rRNA genes, but not 12S rRNAs, appear as tRNA nurseries, as previously suggested for multifunctional ribosomal-like protogenomes. Copyright © 2016 Elsevier Ltd. All rights reserved.
Pandey, Manmohan; Kumar, Ravindra; Srivastava, Prachi; Agarwal, Suyash; Srivastava, Shreya; Nagpure, Naresh S; Jena, Joy K; Kushwaha, Basdeo
2018-03-16
Mining and characterization of Simple Sequence Repeat (SSR) markers from whole genomes provide valuable information about the biological significance of SSR distribution and also facilitate development of markers for genetic analysis. Whole genome sequencing (WGS)-SSR Annotation Tool (WGSSAT) is a graphical user interface pipeline developed using Java Netbeans and Perl scripts that simplifies the process of SSR mining and characterization. WGSSAT takes input in FASTA format and automates the prediction of genes, noncoding RNA (ncRNA), core genes, repeats and SSRs from whole genomes, followed by mapping of the predicted SSRs onto a genome (classified according to genes, ncRNA, repeats, exonic, intronic, and core gene region) along with primer identification and mining of cross-species markers. The program also generates a detailed statistical report along with visualization of mapped SSRs, genes, core genes, and RNAs. The features of WGSSAT were demonstrated using Takifugu rubripes data. This yielded a total of 139 057 SSRs, out of which 113 703 SSR primer pairs were uniquely amplified in silico onto a T. rubripes (fugu) genome. Out of 113 703 mined SSRs, 81 463 were from coding regions (including 4286 exonic and 77 177 intronic), 7 from RNA, and 267 from core genes of fugu, whereas 105 641 SSRs and 601 SSR primer pairs were uniquely mapped onto the medaka genome. WGSSAT is tested under Ubuntu Linux. The source code, documentation, user manual, example dataset and scripts are available online at https://sourceforge.net/projects/wgssat-nbfgr.
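To make the idea of SSR mining concrete, here is a minimal regex-based sketch of finding perfect microsatellites in a DNA string. It is not WGSSAT's implementation (that pipeline is built from Java and Perl components); the function name `find_ssrs` and the minimum repeat thresholds are illustrative assumptions and do not come from the abstract.

```python
import re

# Hypothetical default thresholds: motif length -> minimum number of repeats.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=None):
    """Return (start, end, motif, repeat_count) for perfect SSRs in seq.

    Coordinates are 0-based, end-exclusive.
    """
    thresholds = min_repeats or DEFAULT_MIN_REPEATS
    seq = seq.upper()
    hits = []
    for k, n in thresholds.items():
        # A k-mer followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "ATAT".
            if any(
                motif == motif[:d] * (k // d)
                for d in range(1, k)
                if k % d == 0
            ):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_ssrs("GGC" + "AT" * 8 + "CCG"))
```

A real pipeline would additionally handle compound and imperfect repeats and would intersect the hit coordinates with gene annotations to classify each SSR as exonic, intronic, or intergenic.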
Cloning and molecular evolution of the aldehyde dehydrogenase 2 gene (Aldh2) in bats (Chiroptera).
Chen, Yao; Shen, Bin; Zhang, Junpeng; Jones, Gareth; He, Guimei
2013-02-01
Old World fruit bats (Pteropodidae) and New World fruit bats (Phyllostomidae) ingest significant quantities of ethanol while foraging. Mitochondrial aldehyde dehydrogenase (ALDH2, encoded by the Aldh2 gene) plays an important role in ethanol metabolism. To test whether the Aldh2 gene has undergone adaptive evolution in frugivorous and nectarivorous bats in relation to ethanol elimination, we sequenced part of the coding region of the gene (1,143 bp, ~73% coverage) in 14 bat species, including three Old World fruit bats and two New World fruit bats. Our results showed that the Aldh2 coding sequences are highly conserved across all bat species we examined, and no evidence of positive selection was detected in the ancestral branches leading to Old World fruit bats and New World fruit bats. Further research is needed to determine whether other genes involved in ethanol metabolism have been the targets of positive selection in frugivorous and nectarivorous bats.
Rider, Stanley Dean
2016-07-01
The complete mitochondrial genome of the desert darkling beetle Asbolus verrucosus (LeConte, 1851) was sequenced using paired-end technology to an average depth of 42,111× and assembled using De Bruijn graph-based methods. The genome is 15,828 bp in length and conforms to the basal arthropod mitochondrial gene composition with the same gene orders and orientations as other darkling beetle mitochondria. This arrangement includes a control region, 22 tRNA genes, 2 rRNA genes and 13 protein-coding genes. The main coding strand is probably replicated as the lagging strand (GC skew of -0.36 and AT skew of +0.19). Phylogenomics analyses are consistent with taxonomic classifications and indicate that Tenebrio molitor is the closest relative that has a completely sequenced mitochondrial genome available for analysis. This is the first fully assembled mitogenome sequence for a darkling beetle in the subfamily Pimeliinae and will be useful for population studies on members of this ecologically important group of beetles.
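The skew values quoted above are conventionally computed as GC skew = (G − C)/(G + C) and AT skew = (A − T)/(A + T) on one strand. The sketch below shows that calculation; the function name `skews` is illustrative, and the abstract does not say which tool the author used.

```python
from collections import Counter


def skews(seq):
    """Return (at_skew, gc_skew) for a single-stranded DNA sequence.

    AT skew = (A - T) / (A + T); GC skew = (G - C) / (G + C).
    Ambiguous bases are ignored; a skew is 0.0 if its denominator is zero.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[b] for b in "ATGC")
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return at_skew, gc_skew


if __name__ == "__main__":
    print(skews("AAATGGGC"))
```

A negative GC skew together with a positive AT skew, as reported for A. verrucosus, means the strand carries more C than G and more A than T.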
Liu, Feng; Pang, Shaojun; Luo, Minbo
2016-01-01
Sargassum fusiforme (Harvey) Setchell (=Hizikia fusiformis (Harvey) Okamura) is one of the most important economic seaweeds for mariculture in China. In this study, we present the complete mitochondrial genome of S. fusiforme. The genome is 34,696 bp in length with circular organization, encoding the standard set of three ribosomal RNA genes (rRNA), 25 transfer RNA genes (tRNA), 35 protein-coding genes, and two conserved open reading frames (ORFs). Its total AT content is 62.47%, lower than that of other brown algae except Pylaiella littoralis. The mitogenome carries 1571 bp of intergenic region constituting 4.53% of the genome, and 13 pairs of overlapping genes with overlap sizes from 1 to 90 bp. The phylogenetic analyses based on 35 protein-coding genes reveal that S. fusiforme has a closer evolutionary relationship with Sargassum muticum than with Sargassum horneri, indicating that Hizikia is not a distinct evolutionary entity and should be reduced to synonymy with Sargassum.
Su, Huei-Jiun; Hu, Jer-Ming
2012-01-01
Background and Aims The holoparasitic flowering plant Balanophora displays extreme floral reduction and was previously found to have enormous rate acceleration in the nuclear 18S rDNA region. So far, it remains unclear whether non-ribosomal, protein-coding genes of Balanophora also evolve in an accelerated fashion and whether the genes with high substitution rates retain their functionality. To tackle these issues, six different genes were sequenced from two Balanophora species and their rate variation and expression patterns were examined. Methods Sequences including nuclear PI, euAP3, TM6, LFY and RPB2 and mitochondrial matR were determined from two Balanophora spp. and compared with selected hemiparasitic species of Santalales and autotrophic core eudicots. Gene expression was detected for the six protein-coding genes and the expression patterns of the three B-class genes (PI, AP3 and TM6) were further examined across different organs of B. laxiflora using RT-PCR. Key Results Balanophora mitochondrial matR is highly accelerated in both nonsynonymous (dN) and synonymous (dS) substitution rates, whereas the rate variation of nuclear genes LFY, PI, euAP3, TM6 and RPB2 is less dramatic. Significant dS increases were detected in Balanophora PI, TM6 and RPB2, and dN accelerations in euAP3. All of the protein-coding genes are expressed in inflorescences, indicative of their functionality. PI expression is restricted to tepals, synandria and floral bracts, whereas AP3 and TM6 are widely expressed in both male and female inflorescences. Conclusions Despite the observation that rates of sequence evolution are generally higher in Balanophora than in hemiparasitic species of Santalales and autotrophic core eudicots, the five nuclear protein-coding genes are functional and are evolving at a much slower rate than 18S rDNA. 
The mechanism or mechanisms responsible for rapid sequence evolution and concomitant rate acceleration for 18S rDNA and matR are currently not well understood and require further study in Balanophora and other holoparasites. PMID:23041381
Dzialo, Magdalena; Szopa, Jan; Czuj, Tadeusz; Zuk, Magdalena
2017-01-01
Chalcone synthase (CHS) has been recognized as an essential enzyme in the phenylpropanoid biosynthesis pathway. Apart from the leading role in the production of phenolic compounds with many valuable biological activities beneficial to biomedicine, CHS is well appreciated in science. Genetic engineering greatly facilitates expanding knowledge on the function and genetics of CHS in plants. The CHS gene is one of the most intensively studied genes in flax. In our study, we investigated engineering of the CHS gene through genetic and epigenetic approaches. Considering the numerous restrictions concerning the application of genetically modified (GM) crops, the main purpose of this research was optimization of the plant's modulation via epigenetics. In our study, plants modified through two methods were compared: a widely popular agrotransformation and a relatively recent oligodeoxynucleotide (ODN) strategy. It was recently highlighted that the ODN technique can be a rapid and time-saving antecedent for quick analysis of gene function before undertaking vector-mediated transformation. In order to understand the molecular background of epigenetic variation in more detail and evaluate the use of ODNs as a tool for predictable and stable gene engineering, we concentrated on the integration of gene expression and gene-body methylation. The treatment of flax with a series of short oligonucleotides homologous to different parts of CHS gene isoforms revealed that those directed to regulatory gene regions (5′- and 3′-UTR) activated gene expression, those directed to non-coding regions (introns) reduced gene activity, while those homologous to a coding region may have a variable influence on its activity. Gene expression changes were accompanied by changes in its methylation status. However, only certain (CCGG) motifs along the gene sequence were affected. The analyzed DNA motifs of the CHS flax gene are more accessible for methylation when located within a CpG island. 
The methylation motifs also led to rearrangement of the nucleosome location. The obtained results suggest high specificity of ODN action and establish a potential valuable alternative for improvement of crops. PMID:28555142
The complete sequence of the mitochondrial genome of Arctic fox (Alopex lagopus).
Yan, Shou-Qing; Guo, Peng-Cheng; Yue, Yuan; Li, Wan-Hong; Bai, Chun-Yan; Li, Yu-Mei; Sun, Jin-Hai; Zhao, Zhi-Hui
2016-11-01
In the present study, the complete mitochondrial genome sequence of Arctic fox (Alopex lagopus) was determined for the first time. It has a total length of 16,656 bp and contains 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes and 1 control region. The nucleotide composition is 31.3% A, 26.2% C, 14.8% G and 27.7% T. The D-loop region, located between tRNA-Pro and tRNA-Phe, contains an (ACACGTACACGCAT)18 tandem repeat array. The data will be useful for the investigation of the genetic structure and diversity in the natural and farmed populations of Arctic foxes.
Genetic Variation Linked to Lung Cancer Survival in White Smokers | Center for Cancer Research
CCR investigators have discovered evidence that links lung cancer survival with genetic variations (called single nucleotide polymorphisms) in the MBL2 gene, a key player in innate immunity. The variations in the gene, which codes for a protein called the mannose-binding lectin, occur in its promoter region, where the RNA polymerase molecule binds to start transcription, and
2010-01-01
Background Snake mitochondrial genomes are of great interest in understanding mitogenomic evolution because of gene duplications and rearrangements and the fast evolutionary rate of their genes compared to other vertebrates. Mitochondrial gene sequences have also played an important role in attempts to resolve the contentious phylogenetic relationships of especially the early divergences among alethinophidian snakes. Two recent innovative studies found dramatic gene- and branch-specific relative acceleration in snake protein-coding gene evolution, particularly along internal branches leading to Serpentes and Alethinophidia. It has been hypothesized that some of these rate shifts are temporally (and possibly causally) associated with control region duplication and/or major changes in ecology and anatomy. Results The near-complete mitochondrial (mt) genomes of three henophidian snakes were sequenced: Anilius scytale, Rhinophis philippinus, and Charina trivirgata. All three genomes share a duplicated control region and translocated tRNALEU, derived features found in all alethinophidian snakes studied to date. The new sequence data were aligned with mt genome data for 21 other species of snakes and used in phylogenetic analyses. Phylogenetic results agreed with many other studies in recovering several robust clades, including Colubroidea, Caenophidia, and Cylindrophiidae+Uropeltidae. Nodes within Henophidia that have been difficult to resolve robustly in previous analyses remained uncompellingly resolved here. Comparisons of relative rates of evolution of rRNA vs. protein-coding genes were conducted by estimating branch lengths across the tree. 
Our expanded sampling revealed dramatic acceleration along the branch leading to Typhlopidae, particularly long rRNA terminal branches within Scolecophidia, and that most of the dramatic acceleration in protein-coding gene rate along Serpentes and Alethinophidia branches occurred before Anilius diverged from other alethinophidians. Conclusions Mitochondrial gene sequence data alone may not be able to robustly resolve basal divergences among alethinophidian snakes. Taxon sampling plays an important role in identifying mitogenomic evolutionary events within snakes, and in testing hypotheses explaining their origin. Dramatic rate shifts in mitogenomic evolution occur within Scolecophidia as well as Alethinophidia, thus falsifying the hypothesis that these shifts in snakes are associated exclusively with evolution of a non-burrowing lifestyle, macrostomatan feeding ecology and/or duplication of the control region, both restricted to alethinophidians among living snakes. PMID:20055998
Wang, Aide; Yamakake, Junko; Kudo, Hisayuki; Wakasa, Yuhya; Hatsuyama, Yoshimichi; Igarashi, Megumi; Kasai, Atsushi; Li, Tianzhong; Harada, Takeo
2009-01-01
Expression of MdACS1, coding for 1-aminocyclopropane-1-carboxylate synthase (ACS), parallels the level of ethylene production in ripening apple (Malus domestica) fruit. Here we show that expression of another ripening-specific ACS gene (MdACS3) precedes the initiation of MdACS1 expression by approximately 3 weeks; MdACS3 expression then gradually decreases as MdACS1 expression increases. Because MdACS3 expression continues in ripening fruit treated with 1-methylcyclopropene, its transcription appears to be regulated by a negative feedback mechanism. Three genes in the MdACS3 family (a, b, and c) were isolated from a genomic library, but two of them (MdACS3b and MdACS3c) possess a 333-bp transposon-like insertion in their 5′ flanking region that may prevent transcription of these genes during ripening. A single nucleotide polymorphism in the coding region of MdACS3a results in an amino acid substitution (glycine-289 → valine) in the active site that inactivates the enzyme. Furthermore, another null allele of MdACS3a, Mdacs3a, showing no ability to be transcribed, was found by DNA sequencing. Apple cultivars homozygous or heterozygous for both null allelotypes showed no or very low expression of ripening-related genes and maintained fruit firmness. These results suggest that MdACS3a plays a crucial role in regulation of fruit ripening in apple, and is a possible determinant of ethylene production and shelf life in apple fruit. PMID:19587104
Natural variation in non-coding regions underlying phenotypic diversity in budding yeast
Salinas, Francisco; de Boer, Carl G.; Abarca, Valentina; García, Verónica; Cuevas, Mara; Araos, Sebastian; Larrondo, Luis F.; Martínez, Claudio; Cubillos, Francisco A.
2016-01-01
Linkage mapping studies in model organisms have typically focused on polymorphisms within coding regions, ignoring those within regulatory regions that may contribute to gene expression variation. In this context, differences in transcript abundance are frequently proposed as a source of phenotypic diversity between individuals; until now, however, little molecular evidence has been provided. Here, we examined Allele Specific Expression (ASE) in six F1 hybrids from Saccharomyces cerevisiae derived from crosses between representative strains of the four main lineages described in yeast. ASE varied between crosses with levels ranging between 28% and 60%. Part of the variation in expression levels could be explained by differences in transcription factors binding to polymorphic cis-regulatory regions and by differences in trans-activation depending on the allelic form of the TF. Analysis of highly expressed alleles in each background suggested ASN1 as a candidate transcript underlying nitrogen consumption differences between two strains. Further promoter allele swap analysis under fermentation conditions confirmed that coding and non-coding regions explained aspartic and glutamic acid consumption differences, likely due to a polymorphism affecting Uga3 binding. Together, we provide a new catalogue of variants to bridge the gap between genotype and phenotype. PMID:26898953
Contribution of transposable elements in the plant's genome.
Sahebi, Mahbod; Hanafi, Mohamed M; van Wijnen, Andre J; Rice, David; Rafii, M Y; Azizi, Parisa; Osman, Mohamad; Taheri, Sima; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat; Noor, Yusuf Muhammad
2018-07-30
Plants maintain extensive growth flexibility under different environmental conditions, allowing them to continuously and rapidly adapt to alterations in their environment. A large portion of many plant genomes consists of transposable elements (TEs) that create new genetic variations within plant species. Different types of mutations may be created by TEs in plants. Many TEs can avoid the host's defense mechanisms and survive alterations in transposition activity, internal sequence and target site. Thus, plant genomes are expected to utilize a variety of mechanisms to tolerate TEs that are near or within genes. TEs affect the expression of not only nearby genes but also unlinked inserted genes. TEs can create new promoters, leading to novel expression patterns or alternative coding regions to generate alternate transcripts in plant species. TEs can also provide novel cis-acting regulatory elements that act as enhancers or inserts within original enhancers that are required for transcription. Thus, the regulation of plant gene expression is strongly managed by the insertion of TEs into nearby genes. TEs can also lead to chromatin modifications and thereby affect gene expression in plants. TEs are able to generate new genes and modify existing gene structures by duplicating, mobilizing and recombining gene fragments. They can also facilitate cellular functions by sharing their transposase-coding regions. Hence, TE insertions can not only act as simple mutagens but can also alter the elementary functions of the plant genome. Here, we review recent discoveries concerning the contribution of TEs to gene expression in plant genomes and discuss the different mechanisms by which TEs can affect plant gene expression and reduce host defense mechanisms. Copyright © 2018 Elsevier B.V. All rights reserved.
Satellite DNA Modulates Gene Expression in the Beetle Tribolium castaneum after Heat Stress
Feliciello, Isidoro; Akrap, Ivana; Ugarković, Đurđica
2015-01-01
Non-coding repetitive DNAs have been proposed to perform a gene regulatory role; however, for tandemly repeated satellite DNA no such role has been defined until now. Here we provide the first evidence for a role of satellite DNA in the modulation of gene expression under specific environmental conditions. The major satellite DNA TCAST1 in the beetle Tribolium castaneum is preferentially located within pericentromeric heterochromatin but is also dispersed as single repeats or short arrays in the vicinity of protein-coding genes within euchromatin. Our results show enhanced suppression of activity of TCAST1-associated genes and slower recovery of their activity after long-term heat stress relative to the same genes without associated TCAST1 satellite DNA elements. The level of gene suppression is not influenced by the distance of TCAST1 elements from the associated genes up to 40 kb from the genes’ transcription start sites, but it does depend on the copy number of TCAST1 repeats within an element, being stronger for higher copy numbers. The enhanced gene suppression correlates with the enrichment of the repressive histone marks H3K9me2/3 at dispersed TCAST1 elements and their flanking regions as well as with increased expression of TCAST1 satellite DNA. The results reveal transient, RNAi-based heterochromatin formation at dispersed TCAST1 repeats and their proximal regions as a mechanism responsible for enhanced silencing of TCAST1-associated genes. Differences in the pattern of distribution of TCAST1 elements contribute to gene expression diversity among T. castaneum strains after long-term heat stress and might have an impact on adaptation to different environmental conditions. PMID:26275223
Full-f version of GENE for turbulence in open-field-line systems
NASA Astrophysics Data System (ADS)
Pan, Q.; Told, D.; Shi, E. L.; Hammett, G. W.; Jenko, F.
2018-06-01
Unique properties of plasmas in the tokamak edge, such as large amplitude fluctuations and plasma-wall interactions in the open-field-line regions, require major modifications of existing gyrokinetic codes originally designed for simulating core turbulence. To this end, the global version of the 3D2V gyrokinetic code GENE, so far employing a δf-splitting technique, is extended to simulate electrostatic turbulence in straight open-field-line systems. The major extensions are the inclusion of the velocity-space nonlinearity, the development of a conducting-sheath boundary, and the implementation of the Lenard-Bernstein collision operator. With these developments, the code can be run as a full-f code and can handle particle loss to and reflection from the wall. The extended code is applied to modeling turbulence in the Large Plasma Device (LAPD), with a reduced mass ratio and a much lower collisionality. Similar to turbulence in a tokamak scrape-off layer, LAPD turbulence involves collisions, parallel streaming, cross-field turbulent transport with steep profiles, and particle loss at the parallel boundary.
Arthur-Farraj, Peter J; Morgan, Claire C; Adamowicz, Martyna; Gomez-Sanchez, Jose A; Fazal, Shaline V; Beucher, Anthony; Razzaghi, Bonnie; Mirsky, Rhona; Jessen, Kristjan R; Aitman, Timothy J
2017-09-12
Repair Schwann cells play a critical role in orchestrating nerve repair after injury, but the cellular and molecular processes that generate them are poorly understood. Here, we perform a combined whole-genome, coding and non-coding RNA and CpG methylation study following nerve injury. We show that genes involved in the epithelial-mesenchymal transition are enriched in repair cells, and we identify several long non-coding RNAs in Schwann cells. We demonstrate that the AP-1 transcription factor C-JUN regulates the expression of certain micro RNAs in repair Schwann cells, in particular miR-21 and miR-34. Surprisingly, unlike during development, changes in CpG methylation are limited in injury, restricted to specific locations, such as enhancer regions of Schwann cell-specific genes (e.g., Nedd4l), and close to local enrichment of AP-1 motifs. These genetic and epigenomic changes broaden our mechanistic understanding of the formation of repair Schwann cells during peripheral nervous system tissue repair. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
Webb, Kristen M; Rosenthal, Benjamin M
2011-01-01
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next-generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli, with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene was compared to the average amount of variation between species across the coding region. Next-generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. 
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
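As an illustration of how an overall inter-species divergence figure such as the 9.5% above can be computed, the following is a minimal sketch of an uncorrected pairwise distance (p-distance) over two aligned sequences. The abstract does not state which distance measure the authors used, so the choice of p-distance is an assumption; the function name `p_distance` and the toy sequences are hypothetical and are not Trichinella data.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is an illustrative measure, not necessarily the one used in the study.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in valid or b not in valid:
            continue  # skip gaps ('-') and ambiguity codes such as 'N'
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites in the alignment")
    return mismatches / compared


# Toy aligned fragments, invented for illustration only.
toy_a = "ATGCCGTTAGACCTGA"
toy_b = "ATGCTGTTAGAC-TGA"
print(f"p-distance: {p_distance(toy_a, toy_b):.3f}")
```

Applied gene by gene, the same function would give the per-gene values that the abstract compares against the average across the coding region.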
Meredith, Brian K.; Berry, Donagh P.; Kearney, Francis; Finlay, Emma K.; Fahey, Alan G.; Bradley, Daniel G.; Lynn, David J.
2013-01-01
Mastitis is an inflammation-driven disease of the bovine mammary gland that occurs in response to physical damage or infection and is one of the most costly production-related diseases in the dairy industry worldwide. We performed a genome-wide association study (GWAS) to identify genetic loci associated with somatic cell score (SCS), an indicator trait of mammary gland inflammation. A total of 702 Holstein-Friesian bulls were genotyped for 777,962 single nucleotide polymorphisms (SNPs) and associated with SCS phenotypes. The SCS phenotypes were expressed as daughter yield deviations (DYD) based on a large number of progeny performance records. A total of 138 SNPs on 15 different chromosomes reached genome-wide significance (corrected p-value ≤ 0.05) for association with SCS (after correction for multiple testing). We defined 28 distinct QTL regions and a number of candidate genes located in these QTL regions were identified. The most significant association (p-value = 1.70 × 10^-7) was observed on chromosome 6. This QTL had no known genes annotated within it; however, the Ensembl Genome Browser predicted the presence of a small non-coding RNA (a Y RNA gene) in this genomic region. This Y RNA gene was 99% identical to human RNY4. Y RNAs are a rare type of non-coding RNA that were originally discovered due to their association with the autoimmune disease, systemic lupus erythematosus. Examining small-RNA sequencing (RNAseq) data being generated by us in multiple different mastitis-pathogen challenged cell-types has revealed that this Y RNA is expressed (but not differentially expressed) in these cells. Other QTL regions identified in this study also encoded strong candidate genes for mastitis susceptibility. A QTL region on chromosome 13, for example, was found to contain a cluster of β-defensin genes, a gene family with known roles in innate immunity. Due to the increased SNP density, this study also refined the boundaries for several known QTL for SCS and mastitis. 
PMID:24223582
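The abstract above reports SNPs passing a corrected p-value threshold of 0.05 but does not say which multiple-testing correction was applied. Purely as an illustration of what "corrected p-value" can mean, the sketch below implements the Benjamini-Hochberg adjustment; this choice is an assumption, and the function name `benjamini_hochberg` and the toy p-values are hypothetical rather than taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative only: the study's actual correction method is not stated
    in the abstract.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw p-values, invented for illustration only.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(adjusted) if q <= 0.05]
print(adjusted, significant)
```

With several hundred thousand markers, the same routine would be applied to the full vector of per-SNP p-values before thresholding at 0.05.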
Hong, S B; Hwang, I; Dessaux, Y; Guyon, P; Kim, K S; Farrand, S K
1997-01-01
The mechanisms that ensure that Ti plasmid T-DNA genes encoding proteins involved in the biosynthesis of opines in crown gall tumors are always matched by Ti plasmid genes conferring the ability to catabolize that set of opines on the inducing Agrobacterium strains are unknown. The pathway for the biosynthesis of the opine agropine is thought to require an enzyme, mannopine cyclase, coded for by the ags gene located in the T(R) region of octopine-type Ti plasmids. Extracts prepared from agropine-type tumors contained an activity that cyclized mannopine to agropine. Tumor cells containing a T region in which ags was mutated lacked this activity and did not contain agropine. Expression of ags from the lac promoter conferred mannopine-lactonizing activity on Escherichia coli. Agrobacterium tumefaciens strains harboring an octopine-type Ti plasmid exhibit a similar activity which is not coded for by ags. Analysis of the DNA sequence of the gene encoding this activity, called agcA, showed it to be about 60% identical to T-DNA ags genes. Relatedness decreased abruptly in the 5' and 3' untranslated regions of the genes. ags is preceded by a promoter that functions only in the plant. Expression analysis showed that agcA also is preceded by its own promoter, which is active in the bacterium. Translation of agcA yielded a protein of about 45 kDa, consistent with the size predicted from the DNA sequence. Antibodies raised against the agcA product cross-reacted with the anabolic enzyme. These results indicate that the agropine system arose by a duplication of a progenitor gene, one copy of which became associated with the T-DNA and the other copy of which remained associated with the bacterium. PMID:9244272
Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.
Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F
1984-01-01
The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019