Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling
2014-01-01
Background Sweet potato chlorotic stunt virus (SPCSV; family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, and only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China encode the same proteins as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China. These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926
Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko
2008-06-23
The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis, have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species in gymnosperm and the evolution of the cp genome.
The three-dimensional genome organization of Drosophila melanogaster through data integration.
Li, Qingjiao; Tjong, Harianto; Li, Xiao; Gong, Ke; Zhou, Xianghong Jasmine; Chiolo, Irene; Alber, Frank
2017-07-31
Genome structures are dynamic and non-randomly organized in the nucleus of higher eukaryotes. To maximize the accuracy and coverage of three-dimensional genome structural models, it is important to integrate all available sources of experimental information about a genome's organization. It remains a major challenge to integrate such data from various complementary experimental methods. Here, we present an approach for data integration to determine a population of complete three-dimensional genome structures that are statistically consistent with data from both genome-wide chromosome conformation capture (Hi-C) and lamina-DamID experiments. Our structures resolve the genome at the resolution of topological domains, and reproduce simultaneously both sets of experimental data. Importantly, this data deconvolution framework allows for structural heterogeneity between cells, and hence accounts for the expected plasticity of genome structures. As a case study we choose Drosophila melanogaster embryonic cells, for which both data types are available. Our three-dimensional genome structures have strong predictive power for structural features not directly visible in the initial data sets, and reproduce experimental hallmarks of the D. melanogaster genome organization from independent and our own imaging experiments. Also they reveal a number of new insights about genome organization and its functional relevance, including the preferred locations of heterochromatic satellites of different chromosomes, and observations about homologous pairing that cannot be directly observed in the original Hi-C or lamina-DamID data. Our approach allows systematic integration of Hi-C and lamina-DamID data for complete three-dimensional genome structure calculation, while also explicitly considering genome structural variability.
Complete genome sequence of Enterobacter aerogenes KCTC 2190.
Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok
2012-05-01
This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.
Structured Matrix Completion with Applications to Genomic Data Integration.
Cai, Tianxi; Cai, T Tony; Zhang, Anru
2016-01-01
Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.
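As a minimal illustration of why observing a subset of rows and columns can be enough, consider the noiseless, exactly low-rank case. If the matrix is split into blocks A11, A12, A21 and A22, and the observed block A11 is square and invertible with the same rank as the whole matrix, then the unobserved block satisfies A22 = A21 A11^{-1} A12. The Python sketch below checks this identity on a toy rank-2 matrix. It is not the SMC estimator described in the abstract, which targets approximately low-rank matrices; the helper names (matmul, solve, recover_block) and the toy data are assumptions made for the example.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(A, B):
    """Solve A @ Z = B for Z by Gauss-Jordan elimination (A square and invertible)."""
    n = len(A)
    # Augmented matrix [A | B] in exact rational arithmetic.
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so that the pivot equals 1.
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def recover_block(A11, A12, A21):
    """Return A21 @ inv(A11) @ A12, the missing block in the exact low-rank case."""
    return matmul(A21, solve(A11, A12))

# Toy data: build a rank-2 4x5 matrix as U @ V, hide its lower-right block,
# and recover that block from the observed rows and columns.
U = [[1, 0], [0, 1], [2, 3], [1, -1]]
V = [[1, 2, 0, 1, 3], [0, 1, 1, 2, 1]]
A = matmul(U, V)
A11 = [row[:2] for row in A[:2]]
A12 = [row[2:] for row in A[:2]]
A21 = [row[:2] for row in A[2:]]
A22 = [row[2:] for row in A[2:]]
recovered = recover_block(A11, A12, A21)
print(recovered == A22)  # True
```

With noisy or only approximately low-rank data the inverse above becomes unstable, which is one reason a dedicated estimator with theoretical guarantees is needed.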
The complete genomic sequence of egg drop syndrome virus strain AAV-2.
Jin, Q; Zeng, L; Yang, F; Li, M; Hou, Y
1999-12-01
To characterize the genome of the egg drop syndrome virus (EDSV-76) Chinese strain AAV-2, part of the restriction endonuclease physical map was analyzed and a complete genomic library was constructed. On this basis, the complete genome nucleotide sequence (32,838 bp in length, including terminal structures) was determined. Data analysis shows that, compared with other adenoviruses, strain AAV-2 differs markedly in genomic structure and in the distribution of open reading frames (ORFs). There are no clear E1, E3 and E4 regions in the AAV-2 genome. Two segments located at the two ends of the genome (1.1 kb and 8.3 kb in length, respectively) have no homology with other adenovirus genomes. In addition, the AAV-2 genome lacks ORFs encoding E1A, pV and pIX, which are common ORFs encoding early and late proteins in adenoviruses. This reveals, for the first time, differences between EDSV-76, the sole standard strain of group III avian adenoviruses, and the other avian adenoviruses. These findings should aid research on avian adenoviruses and on adenoviruses in general.
Sorimachi, Kenji; Okayasu, Teiji; Ohhira, Shuji
2015-04-01
Normalized nucleotide and amino acid contents of complete genome sequences can be visualized as radar charts. The shapes of these charts depict the characteristics of an organism's genome. The normalized values calculated from the genome sequence theoretically exclude experimental errors. Further, because normalization is independent of both target size and kind, this procedure is applicable not only to single genes but also to whole genomes, which consist of a huge number of different genes. In this review, we discuss the applications of the normalization of the nucleotide and predicted amino acid contents of complete genomes to the investigation of genome structure and to evolutionary research from primitive organisms to Homo sapiens. Some of the results could never have been obtained from the analysis of individual nucleotide or amino acid sequences but were revealed only after the normalization of nucleotide and amino acid contents was applied to genome research. The discovery that genome structure was homogeneous was obtained only after normalization methods were applied to the nucleotide or predicted amino acid contents of genome sequences. Normalization procedures are also applicable to evolutionary research. Thus, normalization of the contents of whole genomes is a useful procedure that can help to characterize organisms.
Ciotlos, Serban; Mao, Qing; Zhang, Rebecca Yu; Li, Zhenyu; Chin, Robert; Gulbahce, Natali; Liu, Sophie Jia; Drmanac, Radoje; Peters, Brock A
2016-01-01
The cell line BT-474 is a popular cell line for studying the biology of cancer and developing novel drugs. However, there is no complete, published genome sequence for this highly utilized scientific resource. In this study we sought to provide a comprehensive and useful data set for the scientific community by generating a whole genome sequence for BT-474. Five μg of genomic DNA, isolated from an early passage of the BT-474 cell line, was used to generate a whole genome sequence (114X coverage) using Complete Genomics' standard sequencing process. To provide additional variant phasing and structural variation data we also processed and analyzed two separate libraries of 5 and 6 individual cells to depths of 99X and 87X, respectively, using Complete Genomics' Long Fragment Read (LFR) technology. BT-474 is a highly aneuploid cell line with an extremely complex genome sequence. This ~300X total coverage genome sequence provides a more complete understanding of this highly utilized cell line at the genomic level.
Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan
2015-12-11
High-quality and complete gene models are the basis of whole genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced on the basis of solely short reads, but the genome annotation had lacked the support of transcriptomic evidence. In this study, we applied RNA-seq to globally improve the genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, by using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small-size assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.
Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu
2013-05-10
Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
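GC skew is commonly defined for one strand as (G - C) / (G + C), so a reversed skew shows up as a sign change relative to the usual pattern. The following Python sketch computes that quantity. The abstract does not state the exact formula or window used by the authors, so the definition here is the common convention, and the input strings are made-up toy sequences rather than C. hyperboreus data.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for a DNA sequence, or 0.0 if it has no G or C."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0

# Toy sequences (hypothetical, for illustration only).
print(gc_skew("GGGCAT"))  # 0.5  (G-rich strand, positive skew)
print(gc_skew("CCCGAT"))  # -0.5 (C-rich strand, negative skew)
```

Applied gene by gene, the sign of this value is what would be compared across the 13 protein-coding genes.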
Complete mitochondrial genome of a wild Siberian tiger.
Sun, Yujiao; Lu, Taofeng; Sun, Zhaohui; Guan, Weijun; Liu, Zhensheng; Teng, Liwei; Wang, Shuo; Ma, Yuehui
2015-01-01
In this study, the complete mitochondrial genome of Siberian tiger (Panthera tigris altaica) was sequenced, using muscle tissue obtained from a male wild tiger. The total length of the mitochondrial genome is 16,996 bp. The genome structure of this tiger is in accordance with other Siberian tigers and it contains 12S rRNA gene, 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes, and 1 control region.
Wang, Shuo
2016-01-01
We announce here the first complete chloroplast genome sequence of the tropical japonica rice, along with its genome structure and functional annotation. The plant was collected from Indonesia and deposited as a germplasm accession of the International Rice GenBank Collection (IRGC 66630) at the International Rice Research Institute (IRRI). This genome provides valuable data for the future utilization of the germplasm of rice. PMID:26893422
The Divided Bacterial Genome: Structure, Function, and Evolution.
diCenzo, George C; Finan, Turlough M
2017-09-01
Approximately 10% of bacterial genomes are split between two or more large DNA fragments, a genome architecture referred to as a multipartite genome. This multipartite organization is found in many important organisms, including plant symbionts, such as the nitrogen-fixing rhizobia, and plant, animal, and human pathogens, including the genera Brucella , Vibrio , and Burkholderia . The availability of many complete bacterial genome sequences means that we can now examine on a broad scale the characteristics of the different types of DNA molecules in a genome. Recent work has begun to shed light on the unique properties of each class of replicon, the unique functional role of chromosomal and nonchromosomal DNA molecules, and how the exploitation of novel niches may have driven the evolution of the multipartite genome. The aims of this review are to (i) outline the literature regarding bacterial genomes that are divided into multiple fragments, (ii) provide a meta-analysis of completed bacterial genomes from 1,708 species as a way of reviewing the abundant information present in these genome sequences, and (iii) provide an encompassing model to explain the evolution and function of the multipartite genome structure. This review covers, among other topics, salient genome terminology; mechanisms of multipartite genome formation; the phylogenetic distribution of multipartite genomes; how each part of a genome differs with respect to genomic signatures, genetic variability, and gene functional annotation; how each DNA molecule may interact; as well as the costs and benefits of this genome structure. Copyright © 2017 American Society for Microbiology.
Wang, Shuo; Gao, Li-Zhi
2016-02-18
We announce here the first complete chloroplast genome sequence of the tropical japonica rice, along with its genome structure and functional annotation. The plant was collected from Indonesia and deposited as a germplasm accession of the International Rice GenBank Collection (IRGC 66630) at the International Rice Research Institute (IRRI). This genome provides valuable data for the future utilization of the germplasm of rice. Copyright © 2016 Wang and Gao.
The complete chloroplast genome sequence of Dodonaea viscosa: comparative and phylogenetic analyses.
Saina, Josphat K; Gichira, Andrew W; Li, Zhi-Zhong; Hu, Guang-Wan; Wang, Qing-Feng; Liao, Kuo
2018-02-01
The plant chloroplast (cp) genome is a highly conserved structure which is beneficial for evolution and systematic research. Currently, numerous complete cp genome sequences have been reported due to high throughput sequencing technology. However, there is no complete chloroplast genome of genus Dodonaea that has been reported before. To better understand the molecular basis of Dodonaea viscosa chloroplast, we used Illumina sequencing technology to sequence its complete genome. The whole length of the cp genome is 159,375 base pairs (bp), with a pair of inverted repeats (IRs) of 27,099 bp separated by a large single copy (LSC) 87,204 bp, and small single copy (SSC) 17,972 bp. The annotation analysis revealed a total of 115 unique genes of which 81 were protein coding, 30 tRNA, and four ribosomal RNA genes. Comparative genome analysis with other closely related Sapindaceae members showed conserved gene order in the inverted and single copy regions. Phylogenetic analysis clustered D. viscosa with other species of Sapindaceae with strong bootstrap support. Finally, a total of 249 SSRs were detected. Moreover, a comparison of the synonymous (Ks) and nonsynonymous (Ka) substitution rates in D. viscosa showed very low values. The availability of cp genome reported here provides a valuable genetic resource for comprehensive further studies in genetic variation, taxonomy and phylogenetic evolution of Sapindaceae family. In addition, SSR markers detected will be used in further phylogeographic and population structure studies of the species in this genus.
Genomic Diversity and Evolution of the Lyssaviruses
Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé
2008-01-01
Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239
Complete genome sequencing and evolutionary analysis of Indian isolates of Dengue virus type 2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dash, Paban Kumar, E-mail: pabandash@rediffmail.com; Sharma, Shashi; Soni, Manisha
Highlights: •Complete genome of Indian DENV-2 was deciphered for the first time in this study. •The recent Indian DENV-2 revealed presence of many unique amino acid residues. •Genotype shift (American to Cosmopolitan) characterizes evolution of DENV-2 in India. •Circulation of a unique clade of DENV-2 in South Asia was identified. -- Abstract: Dengue is the most important arboviral infection of global public health significance. It is now endemic in most parts of South East Asia, including India. Though Dengue virus type 2 (DENV-2) is predominantly associated with major outbreaks in India, complete genome information of Indian DENV-2 is not available. In this study, the full-length genome of five DENV-2 isolates (four from 2001 to 2011 and one from 1960) from different parts of India was determined. The complete genome of the Indian DENV-2 was found to be 10,670 bases long with an open reading frame coding for 3391 amino acids. The recent Indian DENV-2 (2001–2011) revealed a nucleotide sequence identity of around 90% and 97% with an older Indian DENV-2 (1960) and closely related Sri Lankan and Chinese DENV-2, respectively. Unique amino acid residues and non-conservative substitutions in critical amino acid residues of major structural and non-structural proteins were observed in recent Indian DENV-2. Selection pressure analysis revealed positive selection at a few amino acid sites of the genes encoding structural and non-structural proteins. The molecular phylogenetic analysis, based on comparison of both the complete coding region and the envelope protein gene with globally diverse DENV-2 viruses, classified the recent Indian isolates into a unique South Asian clade within the Cosmopolitan genotype. A shift of genotype from American to Cosmopolitan in the 1970s characterized the evolution of DENV-2 in India. The present study is the first report on complete genome characterization of emerging DENV-2 isolates from India and highlights the circulation of a unique clade in South Asia.
The complete chloroplast genomes of two Wisteria species, W. floribunda and W. sinensis (Fabaceae).
Kim, Na-Rae; Kim, Kyunghee; Lee, Sang-Choon; Lee, Jung-Hoon; Cho, Seong-Hyun; Yu, Yeisoo; Kim, Young-Dong; Yang, Tae-Jin
2016-11-01
Wisteria floribunda and Wisteria sinensis are ornamental woody vines in the Fabaceae. The complete chloroplast genome sequences of the two species were generated by de novo assembly using whole genome next generation sequences. The chloroplast genomes of W. floribunda and W. sinensis were 130,960 bp and 130,561 bp long, respectively, and showed inverted repeat (IR)-lacking structures like those reported for the IR-lacking clade (IRLC) in the Fabaceae. The chloroplast genomes of both species contained the same number of protein-coding sequences (77), tRNA genes (30), and rRNA genes (4). The phylogenetic analysis with the reported chloroplast genomes confirmed the close taxonomic relationship of W. floribunda and W. sinensis.
The FlyBase database of the Drosophila genome projects and community literature
2003-01-01
FlyBase (http://flybase.bio.indiana.edu/) provides an integrated view of the fundamental genomic and genetic data on the major genetic model Drosophila melanogaster and related species. FlyBase has primary responsibility for the continual reannotation of the D. melanogaster genome. The ultimate goal of the reannotation effort is to decorate the euchromatic sequence of the genome with as much biological information as is available from the community and from the major genome project centers. A complete revision of the annotations of the now-finished euchromatic genomic sequence has been completed. There are many points of entry to the genome within FlyBase, most notably through maps, gene products and ontologies, structured phenotypic and gene expression data, and anatomy. PMID:12519974
Liu, Shikai; Zhang, Jiaren; Yao, Jun; Liu, Zhanjiang
2016-05-01
The complete mitochondrial genome of the armored catfish, Hypostomus plecostomus, was determined by next generation sequencing of genomic DNA without prior sample processing or primer design. Bioinformatics analysis yielded the entire mitochondrial genome sequence, with a length of 16,523 bp. The H. plecostomus mitochondrial genome consists of 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes, and 1 control region, showing the typical circular molecule structure of the mitochondrial genome seen in other vertebrates. The whole genome base composition was estimated to be 31.8% A, 27.0% T, 14.6% G, and 26.6% C, with an A/T bias of 58.8%. This work provides the H. plecostomus mitochondrial genome sequence, which should be valuable for species identification, phylogenetic analysis and conservation genetics studies in catfishes.
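The base composition figures can be reproduced from any sequence by counting bases, and the A/T figure is simply the sum of the A and T percentages (31.8 + 27.0 = 58.8). The Python sketch below shows the calculation; the function name base_composition and the toy input string are assumptions for illustration, not the authors' pipeline or the real H. plecostomus sequence.

```python
from collections import Counter

def base_composition(seq):
    """Return the percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}

# Toy sequence (hypothetical): 4 A, 2 T, 1 G, 3 C.
comp = base_composition("AAATTGCCCA")
print(comp)

# Consistency check on the percentages reported in the abstract.
reported = {"A": 31.8, "T": 27.0, "G": 14.6, "C": 26.6}
at_content = round(reported["A"] + reported["T"], 1)
print(at_content)  # 58.8
```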
Complete Chloroplast Genome Sequences of Important Oilseed Crop Sesamum indicum L
Yi, Dong-Keun; Kim, Ki-Joong
2012-01-01
Sesamum indicum is an important crop plant species for yielding oil. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to the typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding region and noncoding region. In summary, the regional constraints strongly affect the sequence evolution of the cp genomes, while the functional constraints weakly affect the sequence evolution of cp genomes. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are a prerequisite for modifying this important oilseed crop by cp genetic engineering techniques. PMID:22606240
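The region lengths in this abstract can be cross-checked with simple arithmetic: in a quadripartite chloroplast genome the total length is LSC + SSC + 2 x IR, because the inverted repeat occurs twice. A short Python sketch of that check, using only the figures quoted above (the function name quadripartite_length is hypothetical):

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite genome: LSC + SSC + two copies of the IR."""
    return lsc + ssc + 2 * ir

# Region lengths reported for the S. indicum chloroplast genome (bp).
total = quadripartite_length(lsc=85_170, ssc=17_872, ir=25_141)
print(total)  # 153324
```

The result matches the reported genome length of 153,324 bp.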
The complete mitochondrial genome sequence of the maned wolf (Chrysocyon brachyurus).
Zhao, Chao; Yang, Xiufeng; Zhang, Honghai; Zhang, Jin; Chen, Lei; Sha, Weilai; Liu, Guangshuai
2016-01-01
In this study, the complete mitochondrial genome of the maned wolf (Chrysocyon brachyurus), the only species in the genus Chrysocyon, was sequenced and reported for the first time, using blood samples obtained from a female individual in Shanghai Zoo, China. Sequence analysis showed that the genome structure was in accordance with that of other Canidae species and that it contained the 12S rRNA gene, the 16S rRNA gene, 22 tRNA genes, 13 protein-coding genes and 1 control region.
Assembly of cucumber (Cucumis sativus L.) somaclones
NASA Astrophysics Data System (ADS)
Skarzyńska, Agnieszka; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.
2017-08-01
The development of next generation sequencing opens the possibility of using sequencing in various plant studies, such as finding structural changes and small polymorphisms between species and within them. Most analyses rely on genomic sequences, so it is crucial to use well-assembled genomes of high quality and completeness. Herein we compare commonly available programs for genome assembly with newly developed software, dnaasm. Assemblies were tested on cucumber (Cucumis sativus L.) lines obtained by in vitro regeneration (somaclones) that show different phenotypes. The results show that the dnaasm assembler is a good tool for short read assembly, yielding genomes of high quality and completeness.
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.
Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin
2013-01-01
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
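To make the SSR analysis concrete, the Python sketch below finds mononucleotide repeats with a regular expression. It is only an illustration: the abstract does not say which tool or thresholds were used, so the minimum run length of 10 and the function name find_mono_ssrs are assumptions, and the input is a made-up toy sequence.

```python
import re

def find_mono_ssrs(seq, min_len=10):
    """Return (start, base, length) for each single-base run of at least min_len."""
    # A base followed by at least (min_len - 1) further copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]

# Toy sequence (hypothetical): a 12-base A run and a 9-base T run.
toy = "GC" + "A" * 12 + "CG" + "T" * 9 + "G"
print(find_mono_ssrs(toy))  # [(2, 'A', 12)]
```

Only the A run is reported because the T run falls below the assumed threshold; counting how many hits are A or T runs is one simple way to see the AT-richness the abstract describes.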
Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A
2008-01-01
Background: Chloroplast genome sequences are extremely informative about species interrelationships owing to their non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. Against that background, access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. Results: We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described PCR-based approach that employed universal primers designed on the scaffold of a multiple sequence alignment of chloroplast genomes. The gene content and order in the buckwheat chloroplast genome are similar to those of Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationship of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the Bayesian phylogeny inference based on the first two codon positions, Amborella united with Nymphaeales. Conclusion: Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales, supports the sister relationship between Caryophyllales and asterids. PMID:18492277
Reductive evolution of chloroplasts in non-photosynthetic plants, algae and protists.
Hadariová, Lucia; Vesteg, Matej; Hampl, Vladimír; Krajčovič, Juraj
2018-04-01
Chloroplasts are generally known as eukaryotic organelles whose main function is photosynthesis. They perform other functions, however, such as synthesizing isoprenoids, fatty acids, heme, iron sulphur clusters and other essential compounds. In non-photosynthetic lineages that possess plastids, the chloroplast genomes have been reduced and most (or all) photosynthetic genes have been lost. Consequently, non-photosynthetic plastids have also been reduced structurally. Some of these non-photosynthetic or "cryptic" plastids were overlooked or unrecognized for decades. The number of complete plastid genome sequences and/or transcriptomes from non-photosynthetic taxa possessing plastids is rapidly increasing, thus allowing prediction of the functions of non-photosynthetic plastids in various eukaryotic lineages. In some non-photosynthetic eukaryotes with photosynthetic ancestors, no traces of plastid genomes or of plastids have been found, suggesting that they have lost the genomes or plastids completely. This review summarizes current knowledge of non-photosynthetic plastids, their genomes, structures and potential functions in free-living and parasitic plants, algae and protists. We introduce a model for the order of plastid gene losses which combines models proposed earlier for land plants with the patterns of gene retention and loss observed in protists. The rare cases of plastid genome loss and complete plastid loss are also discussed.
Challenges in NMR-based structural genomics
NASA Astrophysics Data System (ADS)
Sue, Shih-Che; Chang, Chi-Fon; Huang, Yao-Te; Chou, Ching-Yu; Huang, Tai-huang
2005-05-01
Understanding the functions of the vast number of proteins encoded in many genomes that have been completely sequenced recently is the main challenge for biologists in the post-genomics era. Since the function of a protein is determined by its exact three-dimensional structure it is paramount to determine the 3D structures of all proteins. This need has driven structural biologists to undertake the structural genomics project aimed at determining the structures of all known proteins. Several centers for structural genomics studies have been established throughout the world. Nuclear magnetic resonance (NMR) spectroscopy has played a major role in determining protein structures in atomic details and in a physiologically relevant solution state. Since the number of new genes being discovered daily far exceeds the number of structures determined by both NMR and X-ray crystallography, a high-throughput method for speeding up the process of protein structure determination is essential for the success of the structural genomics effort. In this article we will describe NMR methods currently being employed for protein structure determination. We will also describe methods under development which may drastically increase the throughput, as well as point out areas where opportunities exist for biophysicists to make significant contribution in this important field.
Genomic Diversity in the Endosymbiotic Bacterium Rhizobium leguminosarum.
Sánchez-Cañizares, Carmen; Jorrín, Beatriz; Durán, David; Nadendla, Suvarna; Albareda, Marta; Rubio-Sanz, Laura; Lanza, Mónica; González-Guerrero, Manuel; Prieto, Rosa Isabel; Brito, Belén; Giglio, Michelle G; Rey, Luis; Ruiz-Argüeso, Tomás; Palacios, José M; Imperial, Juan
2018-01-24
Rhizobium leguminosarum bv. viciae is a soil α-proteobacterium that establishes a diazotrophic symbiosis with different legumes of the Fabeae tribe. The number of genome sequences from rhizobial strains available in public databases is constantly increasing, although complete, fully annotated genome structures from rhizobial genomes are scarce. In this work, we report and analyse the complete genome of R. leguminosarum bv. viciae UPM791. Whole genome sequencing can provide new insights into the genetic features contributing to symbiotically relevant processes such as bacterial adaptation to the rhizosphere, mechanisms for efficient competition with other bacteria, and the ability to establish a complex signalling dialogue with legumes, to enter the root without triggering plant defenses, and, ultimately, to fix nitrogen within the host. Comparison of the complete genome sequences of two strains of R. leguminosarum bv. viciae , 3841 and UPM791, highlights the existence of different symbiotic plasmids and a common core chromosome. Specific genomic traits, such as plasmid content or a distinctive regulation, define differential physiological capabilities of these endosymbionts. Among them, strain UPM791 presents unique adaptations for recycling the hydrogen generated in the nitrogen fixation process.
Genome Editing of Structural Variations: Modeling and Gene Correction.
Park, Chul-Yong; Sung, Jin Jea; Kim, Dong-Wook
2016-07-01
The analysis of chromosomal structural variations (SVs), such as inversions and translocations, was made possible by the completion of the human genome project and the development of genome-wide sequencing technologies. SVs contribute to genetic diversity and evolution, although some SVs can cause diseases such as hemophilia A in humans. Genome engineering technology using programmable nucleases (e.g., ZFNs, TALENs, and CRISPR/Cas9) has been rapidly developed, enabling precise and efficient genome editing for SV research. Here, we review advances in modeling and gene correction of SVs, focusing on inversion, translocation, and nucleotide repeat expansion. Copyright © 2016 Elsevier Ltd. All rights reserved.
The Complete Chloroplast Genome of Wild Rice (Oryza minuta) and Its Comparison to Related Species.
Asaf, Sajjad; Waqas, Muhammad; Khan, Abdul L; Khan, Muhammad A; Kang, Sang-Mo; Imran, Qari M; Shahzad, Raheem; Bilal, Saqib; Yun, Byung-Wook; Lee, In-Jung
2017-01-01
Oryza minuta, a tetraploid wild relative of cultivated rice (family Poaceae), possesses a BBCC genome and contains genes that confer resistance to bacterial blight (BB) and to white-backed (WBPH) and brown (BPH) plant hoppers. Based on the importance of this wild species, this study aimed to understand the phylogenetic relationships of O. minuta with other Oryza species through an in-depth analysis of the composition and diversity of the chloroplast (cp) genome. The analysis revealed a cp genome size of 135,094 bp with a typical quadripartite structure consisting of a pair of inverted repeats separated by small and large single-copy regions, 139 representative genes, and 419 randomly distributed microsatellites. The genomic organization, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Approximately 30 forward, 28 tandem and 20 palindromic repeats were detected in the O. minuta cp genome. Comparison of the complete O. minuta cp genome with those of eleven other Oryza species showed a high degree of sequence similarity and relatively high divergence of intergenic spacers. Phylogenetic analyses based on the complete genome sequence, 65 shared genes and the matK gene showed the same topologies, with O. minuta forming a single clade with parental O. punctata. Thus, the complete O. minuta cp genome provides interesting insights and valuable information that can be used to identify related species and reconstruct its phylogeny.
Farré, Marta; Robinson, Terence J; Ruiz-Herrera, Aurora
2015-05-01
Our understanding of genomic reorganization, the mechanics of genomic transmission to offspring during germ line formation, and how these structural changes contribute to the speciation process and genetic disease is far from complete. Earlier attempts to understand the mechanism(s) and constraints that govern genome remodeling suffered from being too narrowly focused, and failed to provide a unified and encompassing view of how genomes are organized and regulated inside cells. Here, we propose a new multidisciplinary Integrative Breakage Model for the study of genome evolution. The analysis of the high-level structural organization of genomes (nucleome), together with the functional constraints that accompany genome reshuffling, provides insights into the origin and plasticity of genome organization that may assist with the detection and isolation of therapeutic targets for the treatment of complex human disorders. © 2015 WILEY Periodicals, Inc.
Oka, Tomoichiro; Doan, Yen Hai; Shimoike, Takashi; Haga, Kei; Takizawa, Takenori
2017-12-01
Sapoviruses (SaVs) are enteric viruses and have been detected in various mammals. They are divided into multiple genogroups and genotypes based on the entire major capsid protein (VP1) encoding region sequences. In this study, we determined the first complete genome sequences of two genogroup V, genotype 3 (GV.3) SaV strains detected from swine fecal samples, in combination with Illumina MiSeq sequencing of the libraries prepared from viral RNA and PCR products. The lengths of the viral genome (7494 nucleotides [nt] excluding polyA tail) and short 5'-untranslated region (14 nt) as well as two predicted open reading frames are similar to those of other SaVs. The amino acid differences between the two porcine SaVs are most frequent in the central region of the VP1-encoding region. A stem-loop structure which was predicted in the first 41 nt of the 5'-terminal region of GV.3 SaVs and the other available complete genome sequences of SaVs may have a critical role in viral genome replication. Our study provides complete genome sequences of rarely reported GV.3 SaV strains and highlights the common 5'-terminal genomic feature of SaVs detected from different mammalian species.
Two complete chloroplast genome sequences of Cannabis sativa varieties.
Oh, Hyehyun; Seo, Boyoung; Lee, Seunghwan; Ahn, Dong-Ha; Jo, Euna; Park, Jin-Kyoung; Min, Gi-Sik
2016-07-01
In this study, we determined the complete chloroplast (cp) genomes from two varieties of Cannabis sativa. The genome sizes were 153,848 bp (the Korean non-drug variety, Cheungsam) and 153,854 bp (the African variety, Yoruba Nigeria). The genome structures were identical with 131 individual genes [86 protein-coding genes (PCGs), eight rRNA, and 37 tRNA genes]. Further, except for the presence of an intron in the rps3 genes of two C. sativa varieties, the cp genomes of C. sativa had conservative features similar to that of all known species in the order Rosales. To verify the position of C. sativa within the order Rosales, we conducted phylogenetic analysis by using concatenated sequences of all PCGs from 17 complete cp genomes. The resulting tree strongly supported monophyly of Rosales. Further, the family Cannabaceae, represented by C. sativa, showed close relationship with the family Moraceae. The phylogenetic relationship outlined in our study is well congruent with those previously shown for the order Rosales.
The complete mitochondrial genome of the North Chinese Leopard (Panthera pardus japonensis).
Dou, Hailong; Feng, Limin; Xiao, Wenhong; Wang, Tianming
2016-01-01
The North Chinese Leopard (Panthera pardus japonensis) is a subspecies of Panthera pardus endemic to China, living in small and isolated populations with a severely fragmented distribution. Here, we sequenced and annotated its complete mitochondrial genome for the first time. The mitochondrial genome is 16,966 base pairs in length and consists of 2 rRNA genes, 22 tRNA genes, 13 protein-coding genes, 1 OLR and 1 control region (CR). The structure of the genome is highly similar to that of other Felidae.
Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph; Aury, Jean-Marc
2017-02-01
Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. © The Author 2017. Published by Oxford University Press.
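The abstract compares assembly contiguity without stating which metric was used. One widely used contiguity measure is N50, and the sketch below shows how it is computed. Treating N50 as the relevant metric is an assumption here, and the contig lengths are invented toy values, not data from the study.

```python
def n50(contig_lengths):
    """Return the N50: the length of the contig at which the running total,
    taken from longest to shortest, first reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


# Toy assemblies of the same total size (arbitrary units, hypothetical).
long_read_contigs = [900, 800, 300]
short_read_contigs = [100] * 20

print(n50(long_read_contigs))   # 800
print(n50(short_read_contigs))  # 100
```

A fold difference in contiguity between two assemblies can then be expressed as the ratio of their N50 values.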
Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph
2017-01-01
Abstract Background: Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Results: Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Conclusion: Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. PMID:28369459
The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis.
Duan, Naibin; Sun, Honghe; Wang, Nan; Fei, Zhangjun; Chen, Xuesen
2016-07-01
The complete mitochondrial genome sequence of Malus hupehensis var. pinyiensis, a widely used apple rootstock, was determined using the Illumina high-throughput sequencing approach. The genome is 422,555 bp in length and has a GC content of 45.21%. It is separated by a pair of inverted repeats of 32,504 bp, to form a large single copy region of 213,055 bp and a small single copy region of 144,492 bp. The genome contains 38 protein-coding genes, four pseudogenes, 25 tRNA genes, and three rRNA genes. The genome is 25,608 bp longer than that of M. domestica, and several structural variations between these two mitogenomes were detected.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies.
Card, Daren C; Schield, Drew R; Reyes-Velasco, Jacobo; Fujita, Matthew K; Andrew, Audra L; Oyler-McCance, Sara J; Fike, Jennifer A; Tomback, Diana F; Ruggiero, Robert P; Castoe, Todd A
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5-5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Two low coverage bird genomes and a comparison of reference-guided versus de novo genome assemblies
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (~3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies.
Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S
2016-01-01
The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853
LINE-1 Elements in Structural Variation and Disease
Beck, Christine R.; Garcia-Perez, José Luis; Badge, Richard M.; Moran, John V.
2014-01-01
The completion of the human genome reference sequence ushered in a new era for the study and discovery of human transposable elements. It now is undeniable that transposable elements, historically dismissed as junk DNA, have had an instrumental role in sculpting the structure and function of our genomes. In particular, long interspersed element-1 (LINE-1 or L1) and short interspersed elements (SINEs) continue to affect our genome, and their movement can lead to sporadic cases of disease. Here, we briefly review the types of transposable elements present in the human genome and their mechanisms of mobility. We next highlight how advances in DNA sequencing and genomic technologies have enabled the discovery of novel retrotransposons in individual genomes. Finally, we discuss how L1-mediated retrotransposition events impact human genomes. PMID:21801021
Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark
2017-04-01
The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Justice, Joshua L; Weese, David A; Santos, Scott Ross
2016-07-01
The Atyidae are caridean shrimp possessing hair-like setae on their claws and are important contributors to ecological services in tropical and temperate fresh and brackish water ecosystems. Complete mitochondrial genomes have only been reported from five of the 449 species in the family, thus limiting understanding of mitochondrial genome evolution and the phylogenetic utility of complete mitochondrial sequences in the Atyidae. Here, comparative analyses of complete mitochondrial genomes from eight genetic lineages of Halocaridina rubra, an atyid endemic to the anchialine ecosystem of the Hawaiian Archipelago, are presented. Although gene number, order, and orientation were syntenic among genomes, three regions were identified and further quantified where conservation was substantially lower: (1) high length and sequence variability in the tRNA-Lys and tRNA-Asp intergenic region; (2) a 317-bp insertion between the NAD6 and CytB genes confined to a single lineage and representing a partial duplication of CytB; and (3) the putative control region. Phylogenetic analyses utilizing complete mitochondrial sequences provided new insights into relationships among the H. rubra genetic lineages, with the topology of one clade correlating to the geologic sequence of the islands. However, deeper nodes in the phylogeny lacked bootstrap support. Overall, our results from H. rubra suggest intra-specific mitochondrial genomic diversity could be underestimated across the Metazoa since the vast majority of complete genomes are from just a single individual of a species.
Complete mitochondrial genome of an Asian lion (Panthera leo goojratensis).
Li, Yu-Fei; Wang, Qiang; Zhao, Jian-ning
2016-01-01
The entire mitochondrial genome of this Asian lion (Panthera leo goojratensis) was 17,183 bp in length. Its gene composition and arrangement conformed to those of other lions, with the typical structure of 22 tRNAs, 2 rRNAs, 13 protein-coding genes and a non-coding region. The characteristics of the mitochondrial genome were analyzed in detail.
Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying
2016-01-01
Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. Here, we sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly. Together with the previously reported cp genome sequence of E. koreanum, this constitutes the first comprehensive cp genome analysis of Epimedium. The five Epimedium cp genomes exhibited a typical quadripartite, circular structure that was rather conserved in genomic structure and in the synteny of gene order. However, these cp genomes presented obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes but was not found in the other basal eudicotyledons. Rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSRs) and repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were, on the whole, in accordance with the updated system of the genus, but indicated that the evolutionary relationships and the divisions of the genus need further investigation with more evidence. The availability of these cp genomes provides valuable genetic information for accurate species identification, taxonomy, phylogenetic resolution and the study of the evolution of Epimedium, and will assist in the exploration and utilization of Epimedium plants. PMID:27014326
Teng, Y; Liu, H; Lv, J Q; Fan, W H; Zhang, Q Y; Qin, Q W
2007-01-01
The complete genome of spring viraemia of carp virus (SVCV) strain A-1 isolated from cultured common carp (Cyprinus carpio) in China was sequenced and characterized. Reverse transcription-polymerase chain reaction (RT-PCR) derived clones were constructed and the DNA was sequenced. It showed that the entire genome of SVCV A-1 consists of 11,100 nucleotide base pairs, the predicted size of the viral RNA of rhabdoviruses. However, the additional insertions in bp 4633-4676 and bp 4684-4724 of SVCV A-1 were different from the other two published SVCV complete genomes. Five open reading frames (ORFs) of SVCV A-1 were identified and further confirmed by RT-PCR and DNA sequencing of their respective RT-PCR products. The 5 structural proteins encoded by the viral RNA were ordered 3'-N-P-M-G-L-5'. This is the first report of a complete genome sequence of SVCV isolated from cultured carp in China. Phylogenetic analysis indicates that SVCV A-1 is closely related to the members of the genus Vesiculovirus, family Rhabdoviridae.
Wang, Aishuai; Sun, Yuena; Wu, Changwen
2016-11-01
The complete mitochondrial genome of Cheilodactylus quadricornis was determined for the first time in the present study. The mitochondrial genome of C. quadricornis is 16 521 nucleotides in length, comprising 13 protein-coding genes, 2 ribosomal RNA genes, 22 tRNA genes and 2 main non-coding regions (the control region and the origin of light-strand replication). The overall base composition was T, 26.3%; C, 29.6%; A, 27.8% and G, 16.3%. The gene arrangement, base composition, and tRNA structures of the complete mitochondrial genome of C. quadricornis are similar to those of other teleosts. Only two central conserved sequence blocks (CSB-2 and CSB-3) were identified in the control region. In addition, the conserved motif 5'-GCCGG-3' was identified in the origin of light-strand replication of C. quadricornis. The complete mitochondrial genome of C. quadricornis was used to construct a phylogenetic tree, which shows that C. quadricornis and C. variegatus clustered in one clade and formed a sister relationship. These mitogenome sequence data should play an important role in population genetics and phylogenetic analysis of the Cheilodactylidae.
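The base composition quoted in this abstract is enough to derive the GC content and the strand skews commonly reported for mitogenomes. The sketch below does this in Python using only the four percentages given above. The skew formulas, (G − C)/(G + C) and (A − T)/(A + T), are the conventional definitions and are an addition here; the abstract itself does not report skew values.

```python
# Base composition (%) as quoted in the abstract above.
composition = {"T": 26.3, "C": 29.6, "A": 27.8, "G": 16.3}

total = sum(composition.values())
gc_content = composition["G"] + composition["C"]
at_content = composition["A"] + composition["T"]

# Conventional skew definitions (not reported in the abstract).
gc_skew = (composition["G"] - composition["C"]) / gc_content
at_skew = (composition["A"] - composition["T"]) / at_content

print(round(total, 1))       # 100.0
print(round(gc_content, 1))  # 45.9
print(round(at_content, 1))  # 54.1
print(gc_skew < 0, at_skew > 0)  # True True: more C than G, more A than T
```

Rounding is applied only when printing, because summing decimal percentages in floating point can leave small representation errors.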
Cheng, Hui; Li, Jinfeng; Zhang, Hong; Cai, Binhua; Gao, Zhihong
2017-01-01
Compared with other members of the family Rosaceae, the chloroplast genomes of Fragaria species exhibit low variation, and this situation has limited phylogenetic analyses; thus, complete chloroplast genome sequencing of Fragaria species is needed. In this study, we sequenced the complete chloroplast genome of F. × ananassa ‘Benihoppe’ using the Illumina HiSeq 2500-PE150 platform and then performed a combination of de novo assembly and reference-guided mapping of contigs to generate complete chloroplast genome sequences. The chloroplast genome exhibits a typical quadripartite structure with a pair of inverted repeats (IRs, 25,936 bp) separated by large (LSC, 85,531 bp) and small (SSC, 18,146 bp) single-copy (SC) regions. The length of the F. × ananassa ‘Benihoppe’ chloroplast genome is 155,549 bp, representing the smallest Fragaria chloroplast genome observed to date. The genome encodes 112 unique genes, comprising 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Comparative analysis of the overall nucleotide sequence identity among ten complete chloroplast genomes confirmed that for both coding and non-coding regions in Rosaceae, SC regions exhibit higher sequence variation than IRs. The Ka/Ks ratio of most genes was less than 1, suggesting that most genes are under purifying selection. Moreover, the mVISTA results also showed a high degree of conservation in genome structure, gene order and gene content in Fragaria, particularly among three octoploid strawberries which were F. × ananassa ‘Benihoppe’, F. chiloensis (GP33) and F. virginiana (O477). However, when the sequences of the coding and non-coding regions of F. × ananassa ‘Benihoppe’ were compared in detail with those of F. chiloensis (GP33) and F. virginiana (O477), a number of SNPs and InDels were revealed by MEGA 7. 
Six non-coding regions (trnK-matK, trnS-trnG, atpF-atpH, trnC-petN, trnT-psbD and trnP-psaJ) with a percentage of variable sites greater than 1% and no less than five parsimony-informative sites were identified and may be useful for phylogenetic analysis of the genus Fragaria. PMID:29038765
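The quadripartite structure described above implies a simple consistency check: the total plastome length should equal LSC + SSC + 2 × IR. The sketch below applies that check to the region sizes reported in this abstract; the helper name `plastome_length` is an assumption introduced for illustration.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported in the abstract above.
total = plastome_length(lsc=85_531, ssc=18_146, ir=25_936)
print(total)  # 155549, matching the reported genome length
assert total == 155_549
```

The same arithmetic can be used to cross-check the region sizes quoted for other chloroplast genomes, provided the IR size given is for a single copy.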
Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D’Hont, Angélique
2013-01-01
Background Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. Methodology/Principal Findings The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. Conclusion The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas. PMID:23840670
Self-similarity analysis of eubacteria genome based on weighted graph.
Qi, Zhao-Hui; Li, Ling; Zhang, Zhi-Meng; Qi, Xiao-Qin
2011-07-07
We introduce a weighted graph model to investigate the self-similarity characteristics of eubacteria genomes. Similarity comparisons between genomes are usually made to estimate the evolutionary distance among different genomes. Few studies have focused on the overall statistical characteristics of each gene compared with the other genes in the same genome. In our model, each genome is represented as a weighted graph whose topology describes the similarity relationships among genes in the same genome. Based on weighted graph theory, we extract some quantified statistical variables from the topology, and give the distribution of some variables derived from the largest social structure in the topology. The 23 eubacteria recently studied by Sorimachi and Okayasu are clearly classified into two different groups by their double logarithmic point-plots describing the similarity relationship among genes of the largest social structure in the genome. The results show that the proposed model may provide some new insights into the structures and evolution patterns determined from the complete genomes. Copyright © 2011 Elsevier Ltd. All rights reserved.
Puli'uvea, Christopher; Khan, Subuhi; Chang, Wee-Leong; Valmonte, Gardette; Pearson, Michael N; Higgins, Colleen M
2017-02-01
We present the first complete genome of vanilla mosaic virus (VanMV). The VanMV genomic structure is consistent with that of a potyvirus, containing a single open reading frame (ORF) encoding a polyprotein of 3139 amino acids. Motif analyses indicate the polyprotein can be cleaved into the expected ten individual proteins; other recognised potyvirus motifs are also present. As expected, the VanMV genome shows high sequence similarity to the published Dasheen mosaic virus (DsMV) genome sequences; comparisons with DsMV continue to support VanMV as a vanilla-infecting strain of DsMV. Phylogenetic analyses indicate that VanMV and DsMV share a common ancestor, with VanMV having the closest relationship with DsMV strains from the South Pacific.
The complete plastid genome of the middle Asian endemic of Stipa lipskyi (Poaceae).
Myszczyński, Kamil; Nobis, Marcin; Szczecinska, Monika; Sawicki, Jakub; Nowak, Arkadiusz
2016-11-01
The structure of the Stipa lipskyi (GenBank accession no. KT692644) plastid genome is similar to that of closely related Poaceae species. It has a total length of 137 755 bp, and the base composition of the plastome is A (30.7%), C (19.3%), G (19.4%) and T (30.5%). The S. lipskyi plastid genome contains 71 genes, excluding the second IR region. A complete plastome sequence of S. lipskyi will help the development of primers for examining phylogeny and hybridization events in this taxonomically difficult genus.
The complete chloroplast genome sequence of Hibiscus syriacus.
Kwon, Hae-Yun; Kim, Joon-Hyeok; Kim, Sea-Hyun; Park, Ji-Min; Lee, Hyoshin
2016-09-01
The complete chloroplast genome sequence of Hibiscus syriacus L. is presented in this study. The genome is 161 019 bp in length, with a typical circular structure containing a pair of inverted repeats of 25 745 bp each, separated by a large single-copy region of 89 698 bp and a small single-copy region of 19 831 bp. The overall GC content is 36.8%. One hundred and fourteen genes were annotated, including 81 protein-coding genes, 4 ribosomal RNA genes and 29 transfer RNA genes.
The Mouse Genomes Project: a repository of inbred laboratory mouse strain genomes.
Adams, David J; Doran, Anthony G; Lilue, Jingtao; Keane, Thomas M
2015-10-01
The Mouse Genomes Project was initiated in 2009 with the goal of using next-generation sequencing technologies to catalogue molecular variation in the common laboratory mouse strains and a selected set of wild-derived inbred strains. The initial sequencing and survey of sequence variation in 17 inbred strains was completed in 2011 and included a comprehensive catalogue of single nucleotide polymorphisms, short insertions/deletions, larger structural variants (including their fine-scale architecture and the landscape of transposable element variation), and genomic sites subject to post-transcriptional alteration of RNA. From this beginning, the resource has expanded significantly to include 36 fully sequenced inbred laboratory mouse strains, a refined and updated data processing pipeline, and new variation querying and data visualisation tools, which are available on the project's website ( http://www.sanger.ac.uk/resources/mouse/genomes/ ). The focus of the project is now the completion of de novo assembled chromosome sequences and strain-specific gene structures for the core strains. We discuss how the assembled chromosomes will power comparative analysis, data access tools and future directions of mouse genetics.
Complete genome sequence of Methanospirillum hungatei type strain JF1
Gunsalus, Robert; Cook, Lauren E.; Crable, Bryan R.; ...
2016-01-06
Methanospirillum hungatei strain JF1 (DSM 864) is a methane-producing archaeon and is the type species of the genus Methanospirillum, which belongs to the family Methanospirillaceae within the order Methanomicrobiales. Its genome was selected for sequencing due to its ability to utilize hydrogen and carbon dioxide and/or formate as a sole source of energy. Ecologically, M. hungatei functions as the hydrogen- and/or formate-using partner with many species of syntrophic bacteria. Its morphology is distinct from other methanogens with the ability to form long chains of cells (up to 100 µm in length), which are enclosed within a sheath-like structure, and terminal cells with polar flagella. The genome of M. hungatei strain JF1 is the first completely sequenced genome of the family Methanospirillaceae, and it has a circular genome of 3,544,738 bp containing 3,239 protein coding and 68 RNA genes. Furthermore, the large genome of M. hungatei JF1 suggests the presence of unrecognized biochemical/physiological properties that likely extend to the other Methanospirillaceae and include the ability to form the unusual sheath-like structure and to successfully interact with syntrophic bacteria.
Machado, Lilian de Oliveira; Vieira, Leila do Nascimento; Stefenon, Valdir Marcos; Oliveira Pedrosa, Fábio de; Souza, Emanuel Maltempi de; Guerra, Miguel Pedro; Nodari, Rubens Onofre
2017-04-01
Given their distribution, importance, and richness, Myrtaceae species comprise a model system for studying the evolution of tropical plant diversity. In addition, chloroplast (cp) genome sequencing is an efficient tool for phylogenetic relationship studies. Feijoa [Acca sellowiana (O. Berg) Burret; CN: pineapple-guava] is a Myrtaceae species that occurs naturally in southern Brazil and northern Uruguay. Feijoa is known for its exquisite perfume and flavorful fruits, pharmacological properties, ornamental value and increasing economic relevance. In the present work, we reported the complete cp genome of feijoa. The feijoa cp genome is a circular molecule of 159,370 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC 88,028 bp) and a Small Single Copy region (SSC 18,598 bp) separated by Inverted Repeat regions (IRs 26,372 bp). The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. When compared to other cp genome sequences of Myrtaceae, feijoa showed closest relationship with pitanga (Eugenia uniflora L.). Furthermore, a comparison of pitanga synonymous (Ks) and nonsynonymous (Ka) substitution rates revealed extremely low values. Maximum Likelihood and Bayesian Inference analyses produced phylogenomic trees identical in topology. These trees supported monophyly of three Myrtoideae clades.
Two Low Coverage Bird Genomes and a Comparison of Reference-Guided versus De Novo Genome Assemblies
Card, Daren C.; Schield, Drew R.; Reyes-Velasco, Jacobo; Fujita, Matthew K.; Andrew, Audra L.; Oyler-McCance, Sara J.; Fike, Jennifer A.; Tomback, Diana F.; Ruggiero, Robert P.; Castoe, Todd A.
2014-01-01
As a greater number and diversity of high-quality vertebrate reference genomes become available, it is increasingly feasible to use these references to guide new draft assemblies for related species. Reference-guided assembly approaches may substantially increase the contiguity and completeness of a new genome using only low levels of genome coverage that might otherwise be insufficient for de novo genome assembly. We used low-coverage (∼3.5–5.5x) Illumina paired-end sequencing to assemble draft genomes of two bird species (the Gunnison Sage-Grouse, Centrocercus minimus, and the Clark's Nutcracker, Nucifraga columbiana). We used these data to estimate de novo genome assemblies and reference-guided assemblies, and compared the information content and completeness of these assemblies by comparing CEGMA gene set representation, repeat element content, simple sequence repeat content, and GC isochore structure among assemblies. Our results demonstrate that even lower-coverage genome sequencing projects are capable of producing informative and useful genomic resources, particularly through the use of reference-guided assemblies. PMID:25192061
Zhang, Ying; Li, Lei; Yan, Ting Liang; Liu, Qiang
2014-10-01
Praxelis (Eupatorium catarium Veldkamp) is a new hazardous invasive plant species that has caused serious economic losses and environmental damage in the Northern hemisphere tropical and subtropical regions. Although previous studies focused on detecting the biological characteristics of this plant to prevent its expansion, little effort has been made to understand the impact of Praxelis on the ecosystem in an evolutionary process. The genetic information of Praxelis is required for further phylogenetic identification and evolutionary studies. Here, we report the complete Praxelis chloroplast (cp) genome sequence. The Praxelis chloroplast genome is 151,410 bp in length including a small single-copy region (18,547 bp) and a large single-copy region (85,311 bp) separated by a pair of inverted repeats (IRs; 23,776 bp). The genome contains 85 unique and 18 duplicated genes in the IR region. The gene content and organization are similar to other Asteraceae tribe cp genomes. We also analyzed the whole cp genome sequence, repeat structure, codon usage, contraction of the IR and gene structure/organization features between native and invasive Asteraceae plants, in order to understand the evolution of organelle genomes between native and invasive Asteraceae. Comparative analysis identified the 14 markers containing greater than 2% parsimony-informative characters, indicating that they are potential informative markers for barcoding and phylogenetic analysis. Moreover, a sister relationship between Praxelis and seven other species in Asteraceae was found based on phylogenetic analysis of 28 protein-coding sequences. Complete cp genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family. Copyright © 2014 Elsevier B.V. All rights reserved.
Automatic Tool for Local Assembly Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whole community shotgun sequencing of total DNA (i.e. metagenomics) and total RNA (i.e. metatranscriptomics) has provided a wealth of information on microbial community structure, predicted functions, and metabolic networks, and can even be used to reconstruct complete genomes directly. Here we present ATLAS (Automatic Tool for Local Assembly Structures), a comprehensive pipeline for assembly, annotation, and genomic binning of metagenomic and metatranscriptomic data with an integrated framework for Multi-Omics. This will provide an open source tool for the Multi-Omic community at large.
Huang, Mingchao; Wang, Yuyu; Liu, Xingyue; Li, Weihai; Kang, Zehui; Wang, Kai; Li, Xuankun; Yang, Ding
2015-02-15
The Plecoptera (stoneflies) is a hemimetabolous order of insects whose larvae are commonly used as indicators in freshwater biomonitoring. Herein, we describe the complete mitochondrial (mt) genome of a stonefly species, namely Acroneuria hainana Wu, belonging to the family Perlidae. This mt genome contains 13 PCGs, 22 tRNA-coding genes and 2 rRNA-coding genes that are conserved in most insect mt genomes, and its gene order is identical to the ancestral insect gene order. However, three PCGs, ND1, ND5 and COI, have unusual initiation codons: TTG, GTG and CGA, coding for L, V and R, respectively. Additionally, the 899-bp control region, with 73.30% A+T content, has two long repeated sequences which are found at the 3'-end close to the tRNA(Ile) gene. Both of them can be folded into a stem-loop structure, and their adjacent upstream and downstream sequences can also be folded into stem-loop structures. It is presumed that the four special structures in series could be associated with D-loop replication and might adjust the replication speed in the two replication directions. Copyright © 2014 Elsevier B.V. All rights reserved.
Phenetic Comparison of Prokaryotic Genomes Using k-mers
Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques
2017-01-01
Abstract Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
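To make the idea of comparing genomes by k-mer content more concrete, here is a minimal sketch that computes a Jaccard distance between the k-mer sets of two sequences and builds a small pairwise distance table. This is a generic illustration and not Ray Surveyor's actual algorithm, which the abstract does not describe; the function names, the choice of k and the toy sequences are all assumptions.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """1 - |shared k-mers| / |all k-mers|; 0 means identical k-mer content."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


# Hypothetical toy "genomes", for illustration only.
genomes = {
    "g1": "ATGCGTACGTTAGCATGCGT",
    "g2": "ATGCGTACGTTAGCATGCGA",
    "g3": "TTTTAAAACCCCGGGGTTTT",
}
for x, y in combinations(genomes, 2):
    print(x, y, round(jaccard_distance(genomes[x], genomes[y]), 3))
```

A distance table of this kind could then be passed to a standard clustering or neighbor-joining routine to obtain a phenetic tree; real analyses would use longer k-mers and whole-genome sequences.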
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.
Koren, Sergey; Phillippy, Adam M
2015-02-01
Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Comparative Genomics of the Balsaminaceae Sister Genera Hydrocera triflora and Impatiens pinfanensis
Li, Zhi-Zhong; Saina, Josphat K.; Gichira, Andrew W.; Kyalo, Cornelius M.; Wang, Qing-Feng
2018-01-01
The family Balsaminaceae, which consists of the economically important genus Impatiens and the monotypic genus Hydrocera, lacks a reported or published complete chloroplast genome sequence. Therefore, chloroplast genome sequences of the two sister genera are significant to give insight into the phylogenetic position and understanding the evolution of the Balsaminaceae family among the Ericales. In this study, complete chloroplast (cp) genomes of Impatiens pinfanensis and Hydrocera triflora were characterized and assembled using a high-throughput sequencing method. The complete cp genomes were found to possess the typical quadripartite structure of land plants chloroplast genomes with double-stranded molecules of 154,189 bp (Impatiens pinfanensis) and 152,238 bp (Hydrocera triflora) in length. A total of 115 unique genes were identified in both genomes, of which 80 are protein-coding genes, 31 are distinct transfer RNA (tRNA) and four distinct ribosomal RNA (rRNA). Thirty codons, of which 29 had A/T ending codons, revealed relative synonymous codon usage values of >1, whereas those with G/C ending codons displayed values of <1. The simple sequence repeats comprise mostly the mononucleotide repeats A/T in all examined cp genomes. Phylogenetic analysis based on 51 common protein-coding genes indicated that the Balsaminaceae family formed a lineage with Ebenaceae together with all the other Ericales. PMID:29360746
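The relative synonymous codon usage (RSCU) values mentioned above are conventionally defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid, so values above 1 indicate preferred codons. The sketch below is a minimal illustration of that definition, assuming the standard amino acid assignments (which plastid translation table 11 shares with table 1); the function name `rscu` and the toy coding sequence are hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; '*' marks stop codons.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO))


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon: observed count / mean count over its synonymous family."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for fam in families.values():
        total = sum(counts[c] for c in fam)
        if total == 0:
            continue  # amino acid absent from this sequence
        mean = total / len(fam)
        for c in fam:
            values[c] = counts[c] / mean
    return values


# Hypothetical toy CDS: alanine encoded by GCT three times and GCA once.
print(rscu("GCTGCTGCTGCA"))
```

In this toy case the four alanine codons have a mean count of 1, so GCT scores 3.0, GCA 1.0, and the unused GCC and GCG 0.0.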
Genome Structure of the Legume, Lotus japonicus
Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi
2008-01-01
The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
Inverse Symmetry in Complete Genomes and Whole-Genome Inverse Duplication
Kong, Sing-Guan; Fan, Wen-Lang; Chen, Hong-Da; Hsu, Zi-Ting; Zhou, Nengji; Zheng, Bo; Lee, Hoong-Chien
2009-01-01
The cause of symmetry is usually subtle, and its study often leads to a deeper understanding of the bearer of the symmetry. To gain insight into the dynamics driving the growth and evolution of genomes, we conducted a comprehensive study of textual symmetries in 786 complete chromosomes. We focused on symmetry based on our belief that, in spite of their extreme diversity, genomes must share common dynamical principles and mechanisms that drive their growth and evolution, and that the most robust footprints of such dynamics are symmetry related. We found that while complement and reverse symmetries are essentially absent in genomic sequences, inverse–complement plus reverse–symmetry is prevalent in complex patterns in most chromosomes, a vast majority of which have near maximum global inverse symmetry. We also discovered relations that can quantitatively account for the long observed but unexplained phenomenon of k-mer skews in genomes. Our results suggest segmental and whole-genome inverse duplications are important mechanisms in genome growth and evolution, probably because they are efficient means by which the genome can exploit its double-stranded structure to enrich its code-inventory. PMID:19898631
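One simple way to probe the kind of inverse symmetry discussed above is to compare the count of every k-mer with the count of its reverse complement on the same strand. The sketch below reads "inverse" as reverse-complement, which the abstract's phrasing suggests, and defines an ad hoc asymmetry score between 0 (perfect symmetry) and 1 (none). The score and the function names are assumptions for illustration, not the measures used in the study.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse the string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inverse_asymmetry(seq: str, k: int) -> float:
    """Sum of |count(w) - count(rc(w))| over all k-mers, scaled to [0, 1]."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    words = set(counts) | {reverse_complement(w) for w in counts}
    diff = sum(abs(counts[w] - counts[reverse_complement(w)]) for w in words)
    total = sum(counts[w] + counts[reverse_complement(w)] for w in words)
    return diff / total if total else 0.0


# Hypothetical toy sequences, for illustration only.
s = "ACGGTCA"
duplicated = s + reverse_complement(s)   # a segment followed by its inverse copy
print(inverse_asymmetry(duplicated, 3))  # 0.0: every 3-mer matches its reverse complement
print(inverse_asymmetry("AAAAAA", 2))    # 1.0: AA occurs five times, TT never
```

The first case mimics an inverse duplication: appending the reverse complement of a segment makes the k-mer counts exactly symmetric, in line with the mechanism the abstract proposes.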
Chloroplast Genome Evolution in Early Diverged Leptosporangiate Ferns
Kim, Hyoung Tae; Chung, Myong Gi; Kim, Ki-Joong
2014-01-01
In this study, the chloroplast (cp) genome sequences from three early diverged leptosporangiate ferns were completed and analyzed in order to understand the evolution of the genome of the fern lineages. The complete cp genome sequence of Osmunda cinnamomea (Osmundales) was 142,812 base pairs (bp). The cp genome structure was similar to that of eusporangiate ferns. The gene/intron losses that frequently occurred in the cp genome of leptosporangiate ferns were not found in the cp genome of O. cinnamomea. In addition, putative RNA editing sites in the cp genome were rare in O. cinnamomea, even though the sites were frequently predicted to be present in leptosporangiate ferns. The complete cp genome sequence of Diplopterygium glaucum (Gleicheniales) was 151,007 bp and has a 9.7 kb inversion between the trnL-CAA and trnV-GCA genes when compared to O. cinnamomea. Several repeated sequences were detected around the inversion break points. The complete cp genome sequence of Lygodium japonicum (Schizaeales) was 157,142 bp and a deletion of the rpoC1 intron was detected. This intron loss was shared by all of the studied species of the genus Lygodium. The GC contents and the effective numbers of codons (ENCs) in ferns varied significantly when compared to seed plants. The ENC values of the early diverged leptosporangiate ferns showed intermediate levels between eusporangiate and core leptosporangiate ferns. However, our phylogenetic tree based on all of the cp gene sequences clearly indicated that the cp genome similarity between O. cinnamomea (Osmundales) and eusporangiate ferns are symplesiomorphies, rather than synapomorphies. Therefore, our data is in agreement with the view that Osmundales is a distinct early diverged lineage in the leptosporangiate ferns. PMID:24823358
Zhu, Lingxiang; Yan, Zhongqiang; Zhang, Zhaojun; Zhou, Qiming; Zhou, Jinchun; Wakeland, Edward K; Fang, Xiangdong; Xuan, Zhenyu; Shen, Dingxia; Li, Quan-Zhen
2013-01-01
The emergence and rapid spreading of multidrug-resistant Acinetobacter baumannii strains has become a major health threat worldwide. To better understand the genetic recombination related to the acquisition of drug-resistant elements during bacterial infection, we performed complete genome analysis on three newly isolated multidrug-resistant A. baumannii strains from Beijing using next-generation sequencing technology. Whole genome comparison revealed that all 3 strains share some common drug resistant elements including carbapenem-resistant bla OXA-23 and tetracycline (tet) resistance islands, but the genome structures are diversified among strains. Various genomic islands are interspersed on the genome with transposons and insertions, reflecting the recombination flexibility during the acquisition of the resistant elements. The blood-isolated BJAB07104 and ascites-isolated BJAB0868 exhibit high similarity in their genome structure with most of the global clone II strains, suggesting these two strains belong to the dominant outbreak strains prevalent worldwide. A large resistance island (RI) of about 121 kb, carrying a cluster of resistance-related genes, was inserted into the ATPase gene on the BJAB07104 and BJAB0868 genomes. A 78-kb insertion element carrying the tra-locus and bla OXA-23 island can be either inserted into one of the tniB genes in the 121-kb RI on the chromosome, or transformed into a conjugative plasmid in the two BJAB strains. The third strain of this study, BJAB0715, which was isolated from spinal fluid, exhibits much more divergence compared with the above two strains. It harbors multiple drug-resistance elements including a truncated AbaR-22-like RI on its genome. One of the unique features of this strain is that it carries both bla OXA-23 and bla OXA-58 genes on its genome. Besides, an Acinetobacter lwoffii adeABC efflux element was found inserted into the ATPase position in BJAB0715. 
Our comparative analysis on currently completed Acinetobacter baumannii genomes revealed extensive and dynamic genome organizations, which may facilitate the bacteria to acquire drug-resistance elements into their genomes.
The History of Bordetella pertussis Genome Evolution Includes Structural Rearrangement
Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Bowden, Katherine E.; Burroughs, Mark; Cassiday, Pamela K.; Davis, Jamie K.; Johnson, Taccara; Juieng, Phalasy; Knipe, Kristen; Mathis, Marsenia H.; Pruitt, Andrea M.; Rowe, Lori; Sheth, Mili; Tondella, M. Lucia; Williams, Margaret M.
2017-01-01
ABSTRACT Despite high pertussis vaccine coverage, reported cases of whooping cough (pertussis) have increased over the last decade in the United States and other developed countries. Although Bordetella pertussis is well known for its limited gene sequence variation, recent advances in long-read sequencing technology have begun to reveal genomic structural heterogeneity among otherwise indistinguishable isolates, even within geographically or temporally defined epidemics. We have compared rearrangements among complete genome assemblies from 257 B. pertussis isolates to examine the potential evolution of the chromosomal structure in a pathogen with minimal gene nucleotide sequence diversity. Discrete changes in gene order were identified that differentiated genomes from vaccine reference strains and clinical isolates of various genotypes, frequently along phylogenetic boundaries defined by single nucleotide polymorphisms. The observed rearrangements were primarily large inversions centered on the replication origin or terminus and flanked by IS481, a mobile genetic element with >240 copies per genome and previously suspected to mediate rearrangements and deletions by homologous recombination. These data illustrate that structural genome evolution in B. pertussis is not limited to reduction but also includes rearrangement. Therefore, although genomes of clinical isolates are structurally diverse, specific changes in gene order are conserved, perhaps due to positive selection, providing novel information for investigating disease resurgence and molecular epidemiology. IMPORTANCE Whooping cough, primarily caused by Bordetella pertussis, has resurged in the United States even though the coverage with pertussis-containing vaccines remains high. The rise in reported cases has included increased disease rates among all vaccinated age groups, provoking questions about the pathogen's evolution. The chromosome of B. pertussis includes a large number of repetitive mobile genetic elements that obstruct genome analysis. However, these mobile elements facilitate large rearrangements that alter the order and orientation of essential protein-encoding genes, which otherwise exhibit little nucleotide sequence diversity. By comparing the complete genome assemblies from 257 isolates, we show that specific rearrangements have been conserved throughout recent evolutionary history, perhaps by eliciting changes in gene expression, which may also provide useful information for molecular epidemiology. PMID:28167525
Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X
1993-05-01
The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.
Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun
2012-01-01
The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of the resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined, with lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979
The SUPERFAMILY database in 2004: additions and improvements.
Madera, Martin; Vogel, Christine; Kummerfeld, Sarah K; Chothia, Cyrus; Gough, Julian
2004-01-01
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F
1997-07-01
The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.
Wang, Ying; Cao, Jinjun; Li, Weihai
2017-03-13
We present the complete mitochondrial (mt) genome sequence of the stonefly, Styloperla spinicercia Wu, 1935 (Plecoptera: Styloperlidae), the type species of the genus Styloperla and the first complete mt genome for the family Styloperlidae. The genome is circular, 16,129 base pairs long, has an A+T content of 70.7%, and contains 37 genes including the large and small ribosomal RNA (rRNA) subunits, 13 protein coding genes (PCGs), 22 tRNA genes and a large non-coding region (CR). All of the PCGs use the standard initiation codon ATN except ND1 and ND5, which start with TTG and GTG. Twelve of the PCGs stop with conventional terminal codons TAA and TAG, except ND5 which shows an incomplete terminator signal T. All tRNAs have the classic clover-leaf structures with the dihydrouridine (DHU) arm of tRNASer(AGN) forming a simple loop. Secondary structures of the two ribosomal RNAs are presented with reference to previous models. The structural elements and the variable numbers of tandem repeats are described within the control region. Phylogenetic analyses using both Bayesian (BI) and Maximum Likelihood (ML) methods support the previous hypotheses regarding family level relationships within the Pteronarcyoidea. The genetic distance calculated based on 13 PCGs and two rRNAs between Styloperla sp. and S. spinicercia is provided and interspecific divergence is discussed.
Chen, Peng; Han, Yuqing; Zhu, Chaoying; Gao, Bin; Ruan, Luzhang
2017-12-01
The complete mitochondrial genome sequences of Porzana fusca and Porzana pusilla were determined. The two avian species share a high degree of homology in terms of mitochondrial genome organization and gene arrangement. Their corresponding mitochondrial genomes are 16,935 and 16,978 bp and consist of 37 genes and a control region. Their PCGs are both 11,365 bp long and have a similar structure. Their tRNA gene sequences could be folded into the canonical cloverleaf secondary structure, except for tRNA Ser (AGY), which lost its "DHU" arm. Based on the concatenated nucleotide sequences of the complete mitochondrial DNA genes of 16 Rallidae species, reconstruction of phylogenetic trees and analysis of the molecular clock of P. fusca and P. pusilla indicated that these species form a sister group, which in turn is sister to Rallina eurizonoides. The genus Gallirallus is a sister group to the genus Lewinia, and these groups in turn are sister to the genus Porphyrio. Moreover, molecular clock analyses suggested that the basal divergence of Rallidae could be traced back to 40.47 (41.46‒39.45) million years ago (Mya), and the divergence of Porzana occurred approximately 5.80 (15.16‒0.79) Mya.
The COG database: new developments in phylogenetic classification of proteins from complete genomes
Tatusov, Roman L.; Natale, Darren A.; Garkavtsev, Igor V.; Tatusova, Tatiana A.; Shankavaram, Uma T.; Rao, Bachoti S.; Kiryutin, Boris; Galperin, Michael Y.; Fedorova, Natalie D.; Koonin, Eugene V.
2001-01-01
The database of Clusters of Orthologous Groups of proteins (COGs), which represents an attempt at a phylogenetic classification of the proteins encoded in complete genomes, currently consists of 2791 COGs including 45 350 proteins from 30 genomes of bacteria, archaea and the yeast Saccharomyces cerevisiae (http://www.ncbi.nlm.nih.gov/COG). In addition, a supplement to the COGs is available, which includes proteins that are encoded in the genomes of two multicellular eukaryotes, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster, and shared with bacteria and/or archaea. The new features added to the COG database include information pages with structural and functional details on each COG and literature references, improvements of the COGNITOR program that is used to fit new proteins into the COGs, and classification of genomes and COGs constructed by using principal component analysis. PMID:11125040
Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl
2014-07-04
Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. 
Although a large portion of sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.
Cruz-Flores, Roberto; Cáceres-Martínez, Jorge; Del Río-Portilla, Miguel Ángel; Licea-Navarro, Alexei F; Gonzales-Sánchez, Ricardo; Guerrero, Abraham
2018-04-01
Bacteriophages are recognized as major mortality agents of microbes, among them intracellular marine rickettsiales-like bacteria. Recently, a phage hyperparasite of Candidatus Xenohaliotis californiensis (CXc) has been described. This bacterium is considered the causal agent of Withering Syndrome (WS) which is a chronic and potentially lethal disease of abalone species from California, USA and the peninsula of Baja California, Mexico. This hyperparasite which infects CXc could be used as a biocontrol agent for WS. Therefore, it is necessary to obtain genomic information to characterize this phage. In this study, the first complete genome sequence of a novel phage, Xenohaliotis phage (pCXc) was determined. The complete genome of pCXc from red abalone (Haliotis rufescens) is 35,728 bp, while the complete genome of pCXc from yellow abalone (Haliotis corrugata) is 35,736 bp. Both phage genomes consist of double-stranded DNA with a G + C content of 38.9%. In both genomes 33 open reading frames (ORFs) were predicted. Only 10 ORFs encode proteins that have identifiable functional homologues. These 10 ORFs were classified by function, including structural, DNA replication, DNA packaging, nucleotide transport and metabolism, life cycle regulation, recombination and repair, and additional functions. A PCR method for the specific detection of pCXc was developed. This information will help to understand a new group of phages that infect intracellular marine rickettsiales-like bacteria in mollusks.
The complete mitochondrial genome of the central chimpanzee, Pan troglodytes troglodytes.
Liu, Bang; Hu, Xiao-di; Gao, Li-Zhi
2016-07-01
This study is the first to report the complete mitochondrial genome sequence of the central chimpanzee, Pan troglodytes troglodytes. The genome was 16 556 bp in length and had a base composition of A (31.05%), G (12.95%), C (30.84%), and T (25.16%), indicating that the percentage of A + T (56.21%) is higher than that of G + C (43.79%). Similar to other primates, it possessed a typically conserved structure, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region (D-loop). Most of these genes were located on the H-strand, except for the ND6 gene and 8 tRNA genes. The phylogenetic analysis showed that the P. t. troglodytes mitochondrial genome formed a cluster with the other three Pan troglodytes genomes and that the genus Pan is closely related to the genus Homo. This mitochondrial genome sequence would supply useful genetic resources to help the conservation management of primate germplasm and uncover hominoid evolution.
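The A + T and G + C totals quoted above follow directly from the four per-base percentages. The sketch below is illustrative only: `base_composition` is a hypothetical helper (not the authors' pipeline) that computes percentages from a sequence string, and the `reported` values are simply the figures from the abstract.

```python
def base_composition(seq):
    """Return the percentage of A, C, G and T in a nucleotide string.

    Hypothetical helper: ambiguous symbols such as N are ignored, so the
    four percentages always sum to 100.
    """
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())
    return {base: 100.0 * n / total for base, n in counts.items()}


# Percentages as reported in the abstract (not recomputed from the sequence).
reported = {"A": 31.05, "G": 12.95, "C": 30.84, "T": 25.16}

at_content = round(reported["A"] + reported["T"], 2)  # A + T
gc_content = round(reported["G"] + reported["C"], 2)  # G + C

print(f"A+T = {at_content}%, G+C = {gc_content}%")
```

Running the same helper on the deposited sequence would be one way to confirm the reported composition, assuming the record contains only the one strand described.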
Complete mitochondrial genome of the Tyto longimembris (Strigiformes: Tytonidae).
Xu, Peng; Li, Yankuo; Miao, Lujun; Xie, Guangyong; Huang, Yan
2016-07-01
The complete mitochondrial genome of Tyto longimembris has been determined in this study. It is 18,466 bp in length and consists of 13 protein-coding genes, 22 transfer RNA (tRNA) genes, 2 ribosomal RNA (rRNA) genes and a non-coding control region (D-loop). The overall base composition of the heavy strand of the T. longimembris mitochondrial genome is A: 30.1%, T: 23.5%, C: 31.8% and G: 14.6%. The control region is characterized by tandem repeats, which were found in two clearly separated clusters. This study provides an important data set for phylogenetic and taxonomic analyses of Tyto species.
Szczecińska, Monika; Sawicki, Jakub
2015-09-15
The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. Pulsatilla genomes of 120 unique genes had a total length of approximately 161-162 kb, and 21 were duplicated in the inverted repeats (IR) region. Comparative plastid genomes of newly-sequenced Pulsatilla and the previously-identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of studied Pulsatilla species is 5795 bp long. Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla patens and Pulsatilla vernalis. 
The determination of complete plastid genome and nuclear rRNA cluster sequences in three species of the genus Pulsatilla is an important contribution to our knowledge of the evolution and phylogeography of those endangered taxa. The resulting data can be used to identify regions that are particularly useful for barcoding, phylogenetic and phylogeographic studies. The investigated taxa can be identified at each stage of development based on their species-specific SNPs. The nuclear and plastid genomic resources enable advanced studies on hybridization, including identification of parent species, including their roles in that process. The identified nonsynonymous mutations could play an important role in adaptations to changing environments. The results of the study will also provide valuable information about the evolution of the plastome structure in the family Ranunculaceae.
Szczecińska, Monika; Sawicki, Jakub
2015-01-01
Background: The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Methodology/principal findings: Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. Pulsatilla genomes of 120 unique genes had a total length of approximately 161–162 kb, and 21 were duplicated in the inverted repeats (IR) region. Comparative plastid genomes of newly-sequenced Pulsatilla and the previously-identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of studied Pulsatilla species is 5795 bp long. 
Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla patens and Pulsatilla vernalis. Conclusions/significance: The determination of complete plastid genome and nuclear rRNA cluster sequences in three species of the genus Pulsatilla is an important contribution to our knowledge of the evolution and phylogeography of those endangered taxa. The resulting data can be used to identify regions that are particularly useful for barcoding, phylogenetic and phylogeographic studies. The investigated taxa can be identified at each stage of development based on their species-specific SNPs. The nuclear and plastid genomic resources enable advanced studies on hybridization, including identification of parent species, including their roles in that process. The identified nonsynonymous mutations could play an important role in adaptations to changing environments. The results of the study will also provide valuable information about the evolution of the plastome structure in the family Ranunculaceae. PMID:26389887
Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard
Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver
2011-01-01
In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864
Xiang, Yu; Bernardy, Mike; Bhagwat, Basdeo; Wiersma, Paul A; DeYoung, Robyn; Bouthillier, Michel
2015-02-01
Strawberry decline disease, probably caused by synergistic reactions of mixed virus infections, threatens the North American strawberry industry. Deep sequencing of strawberry plant samples from eastern Canada resulted in the identification of a new virus genome resembling poleroviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Polerovirus, family Luteoviridae. The virus is tentatively named "strawberry polerovirus 1" (SPV1).
Liu, Qiu-Ning; Chai, Xin-Yue; Bian, Dan-Dan; Zhou, Chun-Lin; Tang, Bo-Ping
2016-01-01
The mitochondrial (mt) genome can provide important information for the understanding of phylogenetic relationships. The complete mt genome of Plodia interpunctella (Lepidoptera: Pyralidae) has been sequenced. The circular genome is 15 287 bp in size, encoding 13 protein-coding genes (PCGs), 2 rRNA genes, 22 tRNA genes, and a control region. The AT skew of this mt genome is slightly negative, and the nucleotide composition is biased toward A+T nucleotides (80.15%). All PCGs start with the typical ATN (ATA, ATC, ATG, and ATT) codons, except for the cox1 gene, which may start with the CGA codon. Four of the 13 PCGs harbor the incomplete termination codon T or TA. All the tRNA genes fold into the typical clover-leaf structure of mitochondrial tRNA, except for trnS1 (AGN), in which the DHU arm fails to form a stable stem-loop structure. The overlapping sequences are 35 bp in total and are found in seven different locations. A total of 240 bp of intergenic spacers are scattered in 16 regions. The control region of the mt genome is 327 bp in length and consists of several features common to the sequenced lepidopteran insects. Phylogenetic analysis based on 13 PCGs using the Maximum Likelihood method shows that P. interpunctella is placed within the Pyralidae.
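The abstract does not state how AT skew was calculated. A widely used convention is (A − T) / (A + T) on a single strand, which is negative when T outnumbers A. The sketch below assumes that convention; `at_skew` and `at_content` are hypothetical helpers, and the toy sequence is invented for illustration rather than taken from the P. interpunctella genome.

```python
def at_skew(seq):
    """AT skew under the assumed convention (A - T) / (A + T)."""
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    return (a - t) / (a + t)


def at_content(seq):
    """Percentage of A and T among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    return 100.0 * (seq.count("A") + seq.count("T")) / total


# Toy sequence (illustrative only): 4 A, 6 T, 1 C, 1 G.
toy = "AAAATTTTTTCG"
print(at_skew(toy))     # negative, because T > A
print(at_content(toy))  # share of A+T in the toy sequence
```

A "slightly negative" skew, as reported, would correspond to a value just below zero from `at_skew` on the sequenced strand.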
Complete mitochondrial genome sequence of Urechis caupo, a representative of the phylum Echiura
Boore, Jeffrey L
2004-01-01
Background Mitochondria contain small genomes that are physically separate from those of nuclei. Their comparison serves as a model system for understanding the processes of genome evolution. Although hundreds of these genome sequences have been reported, the taxonomic sampling is highly biased toward vertebrates and arthropods, with many whole phyla remaining unstudied. This is the first description of a complete mitochondrial genome sequence of a representative of the phylum Echiura, that of the fat innkeeper worm, Urechis caupo. Results This mtDNA is 15,113 nts in length and 62% A+T. It contains the 37 genes that are typical for animal mtDNAs in an arrangement somewhat similar to that of annelid worms. All genes are encoded by the same DNA strand which is rich in A and C relative to the opposite strand. Codons ending with the dinucleotide GG are more frequent than would be expected from apparent mutational biases. The largest non-coding region is only 282 nts long, is 71% A+T, and has potential for secondary structures. Conclusions Urechis caupo mtDNA shares many features with those of the few studied annelids, including the common usage of ATG start codons, unusual among animal mtDNAs, as well as gene arrangements, tRNA structures, and codon usage biases. PMID:15369601
Complete Chloroplast Genome Sequence of Coptis chinensis Franch. and Its Evolutionary History
He, Yang; Deng, Cao; Fan, Gang; Qin, Shishang
2017-01-01
The Coptis chinensis Franch. is an important medicinal plant from the Ranunculales. We used next generation sequencing technology to determine the complete chloroplast genome of C. chinensis. This genome is 155,484 bp long with 38.17% GC content. Two 26,758 bp long inverted repeats separated the genome into a typical quadripartite structure. The C. chinensis chloroplast genome consists of 128 gene loci, including eight rRNA gene loci, 28 tRNA gene loci, and 92 protein-coding gene loci. Most of the SSRs in C. chinensis are poly-A/T. The numbers of mononucleotide SSRs in C. chinensis and other Ranunculaceae species are fewer than those in Berberidaceae species, while the number of dinucleotide SSRs is greater than that in the Berberidaceae. C. chinensis diverged from other Ranunculaceae species an estimated 81 million years ago (Mya). The divergence between Ranunculaceae and Berberidaceae was ~111 Mya, while the Ranunculales and Magnoliaceae shared a common ancestor during the Jurassic, ~153 Mya. Position 104 of the C. chinensis ndhG protein was identified as a positively selected site, indicating possible selection for the photosystem-chlororespiration system in C. chinensis. In summary, the complete sequencing and annotation of the C. chinensis chloroplast genome will facilitate future studies on this important medicinal species. PMID:28698879
Sabir, Jamal; Schwarz, Erika; Ellison, Nicholas; Zhang, Jin; Baeshen, Nabih A; Mutwakil, Muhammed; Jansen, Robert; Ruhlman, Tracey
2014-08-01
Land plant plastid genomes (plastomes) provide a tractable model for evolutionary study in that they are relatively compact and gene dense. Among the groups that display an appropriate level of variation for structural features, the inverted-repeat-lacking clade (IRLC) of papilionoid legumes presents the potential to advance general understanding of the mechanisms of genomic evolution. Here, six complete plastome sequences are presented from economically important species of the IRLC, a lineage previously represented by only five completed plastomes. A number of characters are compared across the IRLC, including gene retention and divergence, synteny, repeat structure and functional gene transfer to the nucleus. The loss of clpP intron 2 was identified in one newly sequenced member of the IRLC, Glycyrrhiza glabra. Deeply sequenced nuclear transcriptomes from two species helped clarify the nature of the functional transfer of accD to the nucleus in Trifolium, which likely occurred in the lineage leading to subgenus Trifolium. Legumes are second only to cereal crops in agricultural importance based on area harvested and total production. Genetic improvement via plastid transformation of IRLC crop species is an appealing proposition. Comparative analyses of intergenic spacer regions emphasize the need for complete genome sequences for developing transformation vectors for plastid genetic engineering of legume crops. © 2014 Society for Experimental Biology, Association of Applied Biologists and John Wiley & Sons Ltd.
Chen, Caihui; Zheng, Yongjie; Liu, Sian; Zhong, Yongda; Wu, Yanfang; Li, Jiang; Xu, Li-An; Xu, Meng
2017-01-01
Cinnamomum camphora, a member of the Lauraceae family, is a valuable aromatic and timber tree that is indigenous to the south of China and Japan. All parts of Cinnamomum camphora have secretory cells containing different volatile chemical compounds that are utilized as herbal medicines and essential oils. Here, we report the complete sequencing of the chloroplast genome of Cinnamomum camphora using Illumina technology. The chloroplast genome of Cinnamomum camphora is 152,570 bp in length and characterized by a relatively conserved quadripartite structure containing a large single copy region of 93,705 bp, a small single copy region of 19,093 bp and two inverted repeat (IR) regions of 19,886 bp each. Overall, the genome contained 123 coding regions, of which 15 were repeated in the IR regions. An analysis of chloroplast sequence divergence revealed that the small single copy region was highly variable among the different genera in the Lauraceae family. A total of 40 repeat structures and 83 simple sequence repeats were detected in both the coding and non-coding regions. A phylogenetic analysis indicated that Calycanthus is most closely related to Lauraceae, both being members of Laurales, which forms a sister group to Magnoliids. The complete sequence of the chloroplast of Cinnamomum camphora will aid in in-depth taxonomical studies of the Lauraceae family in the future. The genetic sequence information will also have valuable applications for chloroplast genetic engineering.
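The reported region lengths can be checked against the reported total, since a quadripartite plastome consists of one large single copy region, one small single copy region and two inverted repeat copies. The following is a minimal sketch of that arithmetic; the function name is hypothetical and only the four numbers come from the abstract.

```python
# Region lengths (bp) as reported in the abstract above.
LSC = 93_705  # large single copy region
SSC = 19_093  # small single copy region
IR = 19_886   # one inverted repeat copy; the genome carries two


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
print(total)  # 152570, matching the reported genome length
```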
Huang, Ya-Yi; Matzke, Antonius J. M.; Matzke, Marjori
2013-01-01
Coconut, a member of the palm family (Arecaceae), is one of the most economically important trees used by mankind. Despite its diverse morphology, coconut is recognized taxonomically as only a single species (Cocos nucifera L.). There are two major coconut varieties, tall and dwarf, the latter of which displays traits resulting from selection by humans. We report here the complete chloroplast (cp) genome of a dwarf coconut plant, and describe the gene content and organization, inverted repeat fluctuations, repeated sequence structure, and occurrence of RNA editing. Phylogenetic relationships of monocots were inferred based on 47 chloroplast protein-coding genes. Potential nodes for events of gene duplication and pseudogenization related to inverted repeat fluctuation were mapped onto the tree using parsimony criteria. We compare our findings with those from other palm species for which complete cp genome sequences are available. PMID:24023703
NASA Astrophysics Data System (ADS)
Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé
2006-08-01
The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction
Dubey, Bhawna; Meganathan, P R; Haque, Ikramul
2012-07-01
This paper reports the complete mitochondrial genome sequence of an endangered Indian snake, Python molurus molurus (Indian Rock Python). A typical snake mitochondrial (mt) genome of 17,258 bp, comprising 37 genes (13 protein-coding genes, 22 tRNA genes, and 2 ribosomal RNA genes) along with duplicate control regions, is described herein. The P. molurus molurus mt genome is relatively similar to other snake mt genomes with respect to gene arrangement, composition, tRNA structures and skews of AT/GC bases. The nucleotide composition of the genome shows that A exceeds T and C exceeds G on the positive strand, as revealed by positive AT and CG skews. Comparison of individual protein-coding genes with those of other snake genomes suggests that the ATP8 and NADH3 genes have high divergence rates. Codon usage analysis reveals a preference for NNC codons over NNG codons in the mt genome of P. molurus. Also, the ratios of non-synonymous to synonymous substitution rates (Ka/Ks) suggest that most of the protein-coding genes are under purifying selection pressure. The phylogenetic analyses involving the concatenated 13 protein-coding genes of P. molurus molurus conformed to the previously established snake phylogeny.
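Strand skews of the kind mentioned above are usually computed as a normalized difference between two base counts. The sketch below assumes the common definitions AT skew = (A − T)/(A + T) and CG skew = (C − G)/(C + G); note that many papers instead report GC skew as (G − C)/(G + C), which has the opposite sign. The function name and the toy sequence are illustrative only and are not taken from the study.

```python
from collections import Counter


def skew(seq: str, x: str, y: str) -> float:
    """Return (count(x) - count(y)) / (count(x) + count(y)) for one strand."""
    counts = Counter(seq.upper())
    total = counts[x] + counts[y]
    if total == 0:
        raise ValueError(f"sequence contains neither {x} nor {y}")
    return (counts[x] - counts[y]) / total


# Toy sequence (hypothetical): 3 A, 1 T, 3 C, 1 G.
toy = "AAACCCTG"
at_skew = skew(toy, "A", "T")  # positive when A is more frequent than T
cg_skew = skew(toy, "C", "G")  # positive when C is more frequent than G
print(at_skew, cg_skew)  # 0.5 0.5
```

A positive value for both, as in the toy case, corresponds to the pattern the abstract describes for the positive strand.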
Complete Mitochondrial Genome of the Medicinal Mushroom Ganoderma lucidum
Chen, Haimei; Chen, Xiangdong; Lan, Jin; Liu, Chang
2013-01-01
Ganoderma lucidum is one of the well-known medicinal basidiomycetes worldwide. The mitochondrion, referred to as the second genome, is an organelle found in most eukaryotic cells and participates in critical cellular functions. Elucidating the structure and function of this genome is important for fully understanding the genetic content of G. lucidum. In this study, we assembled the mitochondrial genome of G. lucidum and analyzed the differential expression of its encoded genes across three developmental stages. The mitochondrial genome is a typical circular DNA molecule of 60,630 bp with a GC content of 26.67%. Genome annotation identified genes that encode 15 conserved proteins, 27 tRNAs, small and large rRNAs, four homing endonucleases, and two hypothetical proteins. Except for genes encoding trnW and two hypothetical proteins, all genes were located on the positive strand. For the repeat structure analysis, eight forward, two inverted, and three tandem repeats were detected. A pair of fragments with a total length around 5.5 kb was found in both the nuclear and mitochondrial genomes, which suggests the possible transfer of DNA sequences between the two genomes. RNA-Seq data for samples derived from three stages, namely, mycelia, primordia, and fruiting bodies, were mapped to the mitochondrial genome and qualified. The protein-coding genes were more highly expressed in the mycelial or primordial stages than in the fruiting bodies. The rRNA abundances were significantly higher in all three stages. Two regions were transcribed but did not contain any identified protein or tRNA genes. Furthermore, three RNA-editing sites were detected. Genome synteny analysis showed that significant genome rearrangements occurred in the mitochondrial genomes. This study provides valuable information on the gene contents of the mitochondrial genome and their differential expression at various developmental stages of G. lucidum. The results contribute to the understanding of the functions and evolution of fungal mitochondrial DNA. PMID:23991034
The Complete Plastome Sequence of an Antarctic Bryophyte Sanionia uncinata (Hedw.) Loeske
Park, Mira; Park, Hyun; Lee, Hyoungseok; Lee, Byeong-ha
2018-01-01
Organellar genomes of bryophytes are poorly represented, with chloroplast genomes of only four mosses, four liverworts and two hornworts having been sequenced and annotated. Moreover, while Antarctic vegetation is dominated by bryophytes, there are few reports on the plastid genomes of Antarctic bryophytes. Sanionia uncinata (Hedw.) Loeske is one of the most dominant moss species in the maritime Antarctic. It has been studied as an important marker for ecological studies and as an extremophile plant for studies on stress tolerance. Here, we report the complete plastome sequence of S. uncinata, which can be exploited in comparative studies to identify lineage-specific divergence across different species. The complete plastome of S. uncinata is 124,374 bp in length, has a typical quadripartite structure and contains 114 unique genes, including 82 unique protein-coding genes, 37 tRNA genes and four rRNA genes. However, the genes encoding the α subunit of RNA polymerase (rpoA) and the cytochrome b6/f complex subunit VIII (petN) were absent. We identified nuclear genes homologous to both, which suggests that rpoA and petN might have been relocated from the chloroplast genome to the nuclear genome. PMID:29494552
gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances
Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav
2016-01-01
Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos). PMID:27846272
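The reconstruction step described above, choosing the best local alignment for each query region, can be illustrated with a small sketch. This is not the gmos implementation: the `Hit` record, the per-position scoring rule and the toy hits are all assumptions made for illustration, and a real tool would use a far more efficient interval-based approach.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """A local alignment of part of the query to one subject (hypothetical record)."""
    q_start: int   # 0-based, inclusive
    q_end: int     # exclusive
    subject: str
    score: float


def mosaic(query_len: int, hits: list[Hit]) -> list[tuple[int, int, Optional[str]]]:
    """Label each query position with the subject of its best-scoring hit,
    then merge neighbouring positions that share a label into segments."""
    best: list[Optional[tuple[float, str]]] = [None] * query_len
    for h in hits:
        for pos in range(max(h.q_start, 0), min(h.q_end, query_len)):
            if best[pos] is None or h.score > best[pos][0]:
                best[pos] = (h.score, h.subject)

    segments: list[tuple[int, int, Optional[str]]] = []
    for pos, entry in enumerate(best):
        label = entry[1] if entry else None  # None marks an unaligned stretch
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], pos + 1, label)
        else:
            segments.append((pos, pos + 1, label))
    return segments


# Toy input: two overlapping hits on a 110-bp query.
toy_hits = [Hit(0, 60, "S1", 100.0), Hit(40, 100, "S2", 150.0)]
segments = mosaic(110, toy_hits)
print(segments)  # [(0, 40, 'S1'), (40, 100, 'S2'), (100, 110, None)]
```

In the toy case the overlap (positions 40 to 59) is assigned to the higher-scoring subject, and the unaligned tail is reported as its own segment.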
Véliz, David; Vega-Retter, Caren; Quezada-Romegialli, Claudio
2016-01-01
The complete sequence of the mitochondrial genome of the Chilean silverside Basilichthys microlepidotus is reported for the first time. The entire mitochondrial genome was 16,544 bp in length (GenBank accession no. KM245937); gene composition and arrangement conformed to those reported for most fishes, with the typical structure of 2 rRNAs, 13 protein-coding genes, 22 tRNAs and a non-coding region. The assembled mitogenome was validated against COI and control region sequences previously obtained in our lab, functional genes from RNA-Seq data for the same species, and the mitogenomes of two other atherinopsid species available in GenBank.
O'Neill, F J; Gao, Y; Xu, X
1993-11-01
The DNAs of polyomaviruses ordinarily exist as a single circular molecule of approximately 5000 base pairs. Variants of SV40, BKV and JCV have been described which contain two complementing defective DNA molecules. These defectives, which form a bipartite genome structure, contain either the viral early region or the late region. The defectives have the unique property of being able to tolerate variable sized reiterations of regulatory and terminus region sequences, and portions of the coding region. They can also exchange coding region sequences with other polyomaviruses. It has been suggested that the bipartite genome structure might be a stage in the evolution of polyomaviruses which can uniquely sustain genome and sequence diversity. However, it is not known if the regulatory and terminus region sequences are highly mutable. Also, it is not known if the bipartite genome structure is reversible and what the conditions might be which would favor restoration of the monomolecular genome structure. We addressed the first question by sequencing the reiterated regulatory and terminus regions of E- and L-SV40 DNAs. This revealed a large number of mutations in the regulatory regions of the defective genomes, including deletions, insertions, rearrangements and base substitutions. We also detected insertions and base substitutions in the T-antigen gene. We addressed the second question by introducing into permissive simian cells, E- and L-SV40 genomes which had been engineered to contain only a single regulatory region. Analysis of viral DNA from transfected cells demonstrated recombined genomes containing a wild type monomolecular DNA structure. However, the complete defectives, containing reiterated regulatory regions, could often compete away the wild type genomes. The recombinant monomolecular genomes were isolated, cloned and found to be infectious. 
All of the DNA alterations identified in one of the regulatory regions of E-SV40 DNA were present in the recombinant monomolecular genomes. These and other findings indicate that the bipartite genome state can sustain many mutations which wtSV40 cannot directly sustain. However, the mutations can later be introduced into the wild type genomes when the E- and L-SV40 DNAs recombine to generate a new monomolecular genome structure.
Salem, Nida’ M.; Miller, W. Allen; Rowhani, Adib; Golino, Deborah A.; Moyne, Anne-Laure; Falk, Bryce W.
2015-01-01
We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus clones generated by 5′- and 3′-RACE, showed the RSDaV genomic RNA to be 5,808 nucleotides long. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3′-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs (sgRNAs) of approximately 3 kb and 1 kb. Putative 5′ ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5′ end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb, were identified. These show similarities to, but also informative differences from, the BYDV structures, including a strikingly different structure predicted for the 3′ cap-independent translation element. These analyses of the RSDaV genomic RNA reveal greater complexity in the RNA structural elements of members of the Luteoviridae. PMID:18329064
Salem, Nida' M; Miller, W Allen; Rowhani, Adib; Golino, Deborah A; Moyne, Anne-Laure; Falk, Bryce W
2008-06-05
We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus clones generated by 5'- and 3'-RACE, showed the RSDaV genomic RNA to be 5,808 nucleotides long. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3'-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs (sgRNAs) of approximately 3 kb and 1 kb. Putative 5' ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5' end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb, were identified. These show similarities to, but also informative differences from, the BYDV structures, including a strikingly different structure predicted for the 3' cap-independent translation element. These analyses of the RSDaV genomic RNA reveal greater complexity in the RNA structural elements of members of the Luteoviridae.
Robinson, Nick A; Hall, Nathan E; Ross, Elizabeth M; Cooke, Ira R; Shiel, Brett P; Robinson, Andrew J; Strugnell, Jan M
2016-01-01
The mitochondrial genome of the greenlip abalone, Haliotis laevigata, is reported. MiSeq and HiSeq reads from one individual were assembled to yield a single 16,545 bp contig. The sequence shares 92% identity with the mitochondrial genome of H. rubra, a closely related species that hybridizes with H. laevigata in the wild. The sequence will be useful for determining the maternal contribution to hybrid populations and for investigating population structure and stock-enhancement effectiveness.
Flot, Jean-François; Tillier, Simon
2007-10-15
The complete mitochondrial genomes of two individuals attributed to different morphospecies of the scleractinian coral genus Pocillopora have been sequenced. Both genomes, respectively 17,415 and 17,422 nt long, share the presence of a previously undescribed ORF encoding a putative protein made up of 302 amino acids and of unknown function. Surprisingly, this ORF turns out to be the second most variable region of the mitochondrial genome (1% nucleotide sequence difference between the two individuals) after the putative control region (1.5% sequence difference). Except for the presence of this ORF and for the location of the putative control region, the mitochondrial genome of Pocillopora is organized in a fashion similar to the other scleractinian coral genomes published to date. For the first time in a cnidarian, a putative second origin of replication is described based on its secondary structure similar to the stem-loop structure of O(L), the origin of L-strand replication in vertebrates.
Mavian, Carla; López-Bueno, Alberto; Balseiro, Ana; Casais, Rosa; Alcamí, Antonio; Alejo, Alí
2012-04-01
Worldwide amphibian population declines have been ascribed to global warming, increasing pollution levels, and other factors directly related to human activities. These factors may additionally be favoring the emergence of novel pathogens. In this report, we have determined the complete genome sequence of the emerging common midwife toad ranavirus (CMTV), which has caused fatal disease in several amphibian species across Europe. Phylogenetic and gene content analyses of the first complete genomic sequence from a ranavirus isolated in Europe show that CMTV is an amphibian-like ranavirus (ALRV). However, the CMTV genome structure is novel and represents an intermediate evolutionary stage between the two previously described ALRV groups. We find that CMTV clusters with several other ranaviruses isolated from different hosts and locations which might also be included in this novel ranavirus group. This work sheds light on the phylogenetic relationships within this complex group of emerging, disease-causing viruses.
The complete genome sequence of freesia mosaic virus and its relationship to other potyviruses.
Choi, H I; Lim, H R; Song, Y S; Kim, M J; Choi, S H; Song, Y S; Bae, S C; Ryu, K H
2010-07-01
We have completed the genomic sequence of a potyvirus, freesia mosaic virus (FreMV), and compared it to those of other known potyviruses. The full-length genome sequence of FreMV consists of 9,489 nucleotides. It contains a single open reading frame, with an AUG start codon and a UAA stop codon, that encodes a large polyprotein of 3,077 amino acids typical of potyviruses. The polyprotein of FreMV-Kr gives rise to eleven proteins (P1, HC-pro, P3, PIPO, 6K1, CI, 6K2, VPg, NIa, NIb and CP), and putative cleavage sites of each protein were identified by sequence comparison to those of other known potyviruses. Phylogenetic analysis of the polyprotein revealed that FreMV-Kr was most closely related to PeMoV and was related to BtMV, BaRMV and PeLMV, which belong to the BCMV subgroup. This is the first information on the complete genome structure of FreMV, and the sequence information clearly supports the status of FreMV as a member of a distinct species in the genus Potyvirus.
Novel mechanism of conjoined gene formation in the human genome.
Kim, Ryong Nam; Kim, Aeri; Choi, Sang-Haeng; Kim, Dae-Soo; Nam, Seong-Hyeuk; Kim, Dae-Won; Kim, Dong-Wook; Kang, Aram; Kim, Min-Young; Park, Kun-Hyang; Yoon, Byoung-Ha; Lee, Kang Seon; Park, Hong-Seog
2012-03-01
Recently, conjoined genes (CGs) have emerged as important genetic factors necessary for understanding the human genome. However, their formation mechanism and precise structures have remained mysterious. Based on a detailed structural analysis of 57 human CG transcript variants (CGTVs, discovered in this study) and all (833) known CGs in the human genome, we discovered that the poly(A) signal site from the upstream parent gene region is completely removed via the skipping or truncation of the final exon; consequently, CG transcription is terminated at the poly(A) signal site of the downstream parent gene. This result led us to propose a novel mechanism of CG formation: the complete removal of the poly(A) signal site from the upstream parent gene is a prerequisite for the CG transcriptional machinery to continue transcribing uninterrupted into the intergenic region and downstream parent gene. The removal of the poly(A) signal sequence from the upstream gene region appears to be caused by a deletion or truncation mutation in the human genome rather than post-transcriptional trans-splicing events. With respect to the characteristics of CG sequence structures, we found that intergenic regions are hot spots for novel exon creation during CGTV formation and that exons farther from the intergenic regions are more highly conserved in the CGTVs. Interestingly, many novel exons newly created within the intergenic and intragenic regions originated from transposable element sequences. Additionally, the CGTVs showed tumor tissue-biased expression. In conclusion, our study provides novel insights into the CG formation mechanism and expands the present concepts of the genetic structural landscape, gene regulation, and gene formation mechanisms in the human genome.
Complexity: an internet resource for analysis of DNA sequence complexity
Orlov, Y. L.; Potapov, V. N.
2004-01-01
The search for DNA regions with low complexity is one of the pivotal tasks of modern structural analysis of complete genomes. The low complexity may be preconditioned by strong inequality in nucleotide content (biased composition), by tandem or dispersed repeats or by palindrome-hairpin structures, as well as by a combination of all these factors. Several numerical measures of textual complexity, including combinatorial and linguistic ones, together with complexity estimation using a modified Lempel–Ziv algorithm, have been implemented in a software tool called ‘Complexity’ (http://wwwmgs.bionet.nsc.ru/mgs/programs/low_complexity/). The software enables a user to search for low-complexity regions in long sequences, e.g. complete bacterial genomes or eukaryotic chromosomes. In addition, it estimates the complexity of groups of aligned sequences. PMID:15215465
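A sliding-window search for low-complexity regions can be sketched with a simple phrase-counting measure. The example below uses a plain LZ78-style parse (the number of distinct phrases needed to spell a window), which is only a stand-in for the modified Lempel–Ziv estimator that the abstract mentions; the function names, window size and threshold are assumptions chosen for illustration.

```python
def lz78_phrase_count(seq: str) -> int:
    """Count phrases in an LZ78-style parse: each phrase is the shortest
    prefix of the remaining text that has not been seen as a phrase before."""
    phrases: set[str] = set()
    current = ""
    for ch in seq:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    # A leftover, already-seen fragment at the end counts as one more phrase.
    return len(phrases) + (1 if current else 0)


def low_complexity_windows(seq: str, window: int, step: int, max_phrases: int):
    """Return (start, phrase_count) for windows whose count is <= max_phrases."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        count = lz78_phrase_count(seq[start:start + window])
        if count <= max_phrases:
            hits.append((start, count))
    return hits


# Toy sequence (hypothetical): a homopolymer run flanked by mixed sequence.
toy = "ACGTTGCA" + "A" * 8 + "CATGGTAC"
hits = low_complexity_windows(toy, window=8, step=8, max_phrases=4)
print(hits)  # [(8, 4)]
```

Fewer phrases mean more repetition, so the homopolymer window is the only one flagged in the toy input.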
Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting
2013-01-01
We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.
Chen, Jinhui; Hao, Zhaodong; Xu, Haibin; Yang, Liming; Liu, Guangxin; Sheng, Yu; Zheng, Chen; Zheng, Weiwei; Cheng, Tielong; Shi, Jisen
2015-01-01
Metasequoia glyptostroboides Hu et Cheng is the only species in the genus Metasequoia Miki ex Hu et Cheng, which belongs to the Cupressaceae family. There were around 10 species in the Metasequoia genus, which were widely spread across the Northern Hemisphere during the Cretaceous of the Mesozoic and in the Cenozoic. M. glyptostroboides is the only remaining representative of this genus. Here, we report the complete chloroplast (cp) genome sequence and the cp genomic features of M. glyptostroboides. The M. glyptostroboides cp genome is 131,887 bp in length, with a total of 117 genes comprising 82 protein-coding genes, 31 tRNA genes and four rRNA genes. In this genome, 11 forward repeats, nine palindromic repeats, and 15 tandem repeats were detected. A total of 188 perfect microsatellites were detected through simple sequence repeat (SSR) analysis and these were distributed unevenly within the cp genome. Comparison of the cp genome structure and gene order to those of several other land plants indicated that a copy of the inverted repeat (IR) region, which was found to be IR region A (IRA), was lost in the M. glyptostroboides cp genome. The five most divergent and five most conserved genes were determined and further phylogenetic analysis was performed among plant species, especially for related species in conifers. Finally, phylogenetic analysis demonstrated that M. glyptostroboides is a sister species to Cryptomeria japonica (L. F.) D. Don and to Taiwania cryptomerioides Hayata. The complete cp genome sequence information of M. glyptostroboides will be of great help for further investigations of this endemic relict woody plant and for in-depth understanding of the evolutionary history of the coniferous cp genomes, especially for the position of M. glyptostroboides in plant systematics and evolution. PMID:26136762
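Perfect microsatellites of the kind counted above are uninterrupted tandem repeats of a short motif, and a regular expression with a backreference is one simple way to find them. The sketch below is an illustration only: the minimum repeat thresholds are assumptions (the abstract does not state the ones used), and the function names and toy sequence are hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # Group 1 captures one motif; the backreference requires further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                copies = (m.end() - m.start()) // unit_len
                found.append((m.start(), motif, copies))
    return sorted(found)


# Toy sequence (hypothetical): a poly-A run and an (AT)7 repeat.
toy = "GC" + "A" * 12 + "CG" + "AT" * 7 + "GGC"
ssrs = find_ssrs(toy)
print(ssrs)  # [(2, 'A', 12), (16, 'AT', 7)]
```

The primitive-motif check prevents a mononucleotide run from being reported a second time as a dinucleotide repeat of "AA".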
Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M
2008-05-12
Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes: a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of an approximately 20 kbp long intergenic region composed of numerous tandem and dispersed repeat units of 22-336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein-coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages.
Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol.
Dictionary-driven protein annotation.
Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel
2002-09-01
Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/ bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. 
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/.
Bioinformatics prediction of siRNAs as potential antiviral agents against dengue viruses
Villegas-Rosales, Paula M; Méndez-Tenorio, Alfonso; Ortega-Soto, Elizabeth; Barrón, Blanca L
2012-01-01
Dengue virus (DENV 1-4) represents the major emerging arthropod-borne viral infection in the world. Currently, there is neither an available vaccine nor a specific treatment. Hence, there is a need for antiviral drugs against these viral infections; we describe the prediction of short interfering RNAs (siRNAs) as potential therapeutic agents against the four DENV serotypes. Our strategy was to carry out a series of multiple alignments using the ClustalX program to find sequences conserved among the four DENV serotype genomes and to obtain a consensus sequence for siRNA design. A highly conserved sequence among the four DENV serotypes, located in the coding sequence for the NS4B and NS5 proteins, was found. A total of 2,893 complete DENV genomes were downloaded from the NCBI, and after a filtering procedure to identify and remove identical sequences, 220 complete DENV genomes were left. They were edited to select the NS4B and NS5 sequences, which were aligned to obtain a consensus sequence. Three different servers were used for siRNA design, and the resulting siRNAs were aligned to identify the most prevalent sequences. Three siRNAs were chosen: one targeted the genome region that codes for the NS4B protein, and the other two targeted the region for the NS5 protein. The predicted secondary structure of DENV genomes was used to demonstrate that the siRNAs were able to target the viral genome, forming the double-stranded structures necessary to activate the RNA silencing machinery. PMID:22829722
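The workflow above derives a consensus sequence from aligned NS4B and NS5 regions. The authors used ClustalX for the alignment; the step from an existing alignment to a consensus can be sketched as a simple per-column majority vote. The function name `majority_consensus`, the 50% threshold and the toy sequences are assumptions for illustration and are not taken from the paper.

```python
from collections import Counter

def majority_consensus(aligned, min_fraction=0.5):
    """Build a consensus from equal-length aligned sequences.

    A column contributes its most common base only if that base is not a gap
    and occurs in more than min_fraction of the sequences; otherwise "N".
    """
    if not aligned:
        raise ValueError("no sequences given")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    consensus = []
    for column in zip(*aligned):
        base, count = Counter(column).most_common(1)[0]
        if base != "-" and count / len(aligned) > min_fraction:
            consensus.append(base)
        else:
            consensus.append("N")
    return "".join(consensus)

# Toy alignment (hypothetical sequences, not DENV data).
toy = ["ATGGC", "ATGGA", "ATCGT"]
result = majority_consensus(toy)
```

Positions that resolve to "N" are the poorly conserved columns; conserved stretches without "N" would be the natural candidates for siRNA target sites.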
Genome alignment with graph data structures: a comparison
2014-01-01
Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884
Lu, Z.; Altermann, E.; Breidt, F.; Kozyavkin, S.
2010-01-01
Vegetable fermentations rely on the proper succession of a variety of lactic acid bacteria (LAB). Leuconostoc mesenteroides initiates fermentation. As fermentation proceeds, L. mesenteroides dies off and other LAB complete the fermentation. Phages infecting L. mesenteroides may significantly influence the die-off of L. mesenteroides. However, no L. mesenteroides phages have been previously genetically characterized. Knowledge of more phage genome sequences may provide new insights into phage genomics, phage evolution, and phage-host interactions. We have determined the complete genome sequence of L. mesenteroides phage Φ1-A4, isolated from an industrial sauerkraut fermentation. The phage possesses a linear, double-stranded DNA genome consisting of 29,508 bp with a G+C content of 36%. Fifty open reading frames (ORFs) were predicted. Putative functions were assigned to 26 ORFs (52%), including 5 ORFs of structural proteins. The phage genome was modularly organized, containing DNA replication, DNA-packaging, head and tail morphogenesis, cell lysis, and DNA regulation/modification modules. In silico analyses showed that Φ1-A4 is a unique lytic phage with a large-scale genome inversion (∼30% of the genome). The genome inversion encompassed the lysis module, part of the structural protein module, and a cos site. The endolysin gene was flanked by two holin genes. The tail morphogenesis module was interspersed with cell lysis genes and other genes with unknown functions. The predicted amino acid sequences of the phage proteins showed little similarity to other phages, but functional analyses showed that Φ1-A4 clusters with several Lactococcus phages. To our knowledge, Φ1-A4 is the first genetically characterized L. mesenteroides phage. PMID:20118355
The complete chloroplast genomes of two Brassica species, Brassica nigra and B. oleracea.
Seol, Young-Joo; Kim, Kyunghee; Kang, Sang-Ho; Perumal, Sampath; Lee, Jonghoon; Kim, Chang-Kug
2017-03-01
The two Brassica species, Brassica nigra and Brassica oleracea, are important agronomic crops. The chloroplast genome sequences were generated by de novo assembly using whole-genome next-generation sequences. The chloroplast genomes of B. nigra and B. oleracea were 153,633 bp and 153,366 bp in size, respectively, and showed the conserved typical chloroplast structure. Both chloroplast genomes contained a total of 114 genes, including 80 protein-coding genes, 30 tRNA genes, and 4 rRNA genes. Phylogenetic analysis revealed that B. oleracea is closely related to B. rapa and B. napus but B. nigra is more diverse than the neighbor species Raphanus sativus.
Henderson, James B.; Sellas, Anna B.; Fuchs, Jérôme; Bowie, Rauri C.K.; Dumbacher, John P.
2017-01-01
We report here the successful assembly of the complete mitochondrial genomes of the northern spotted owl (Strix occidentalis caurina) and the barred owl (S. varia). We utilized sequence data from two sequencing methodologies, Illumina paired-end sequence data with insert lengths ranging from approximately 250 nucleotides (nt) to 9,600 nt and read lengths from 100–375 nt and Sanger-derived sequences. We employed multiple assemblers and alignment methods to generate the final assemblies. The circular genomes of S. o. caurina and S. varia are comprised of 19,948 nt and 18,975 nt, respectively. Both code for two rRNAs, twenty-two tRNAs, and thirteen polypeptides. They both have duplicated control region sequences with complex repeat structures. We were not able to assemble the control regions solely using Illumina paired-end sequence data. By fully spanning the control regions, Sanger-derived sequences enabled accurate and complete assembly of these mitochondrial genomes. These are the first complete mitochondrial genome sequences of owls (Aves: Strigiformes) possessing duplicated control regions. We searched the nuclear genome of S. o. caurina for copies of mitochondrial genes and found at least nine separate stretches of nuclear copies of gene sequences originating in the mitochondrial genome (Numts). The Numts ranged from 226–19,522 nt in length and included copies of all mitochondrial genes except tRNAPro, ND6, and tRNAGlu. Strix occidentalis caurina and S. varia exhibited an average of 10.74% (8.68% uncorrected p-distance) divergence across the non-tRNA mitochondrial genes. PMID:29038757
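The abstract above reports an uncorrected p-distance between the two owl mitogenomes. For readers unfamiliar with the measure, it is the fraction of compared aligned sites at which two sequences differ. A minimal sketch follows; skipping gaps and ambiguous bases is a common convention assumed here, not a detail stated in the abstract, and the example sequences are invented.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Only columns where both sequences carry an unambiguous base (A, C, G, T)
    are compared; gaps and ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

# Toy alignment: 9 comparable sites (one gap column), 1 mismatch.
d = p_distance("ACGTACGT-A", "ACGTTCGTAA")
```

Model-corrected distances, such as the larger 10.74% figure quoted in the abstract, additionally account for multiple substitutions at the same site and are not covered by this sketch.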
Noutoshi, Y; Arai, R; Fujie, M; Yamada, T
1997-01-01
As a model for plant-type chromosomes, we have been characterizing the molecular organization of Chlorella vulgaris C-169 chromosome I. To identify chromosome structural elements, including the centromeric region and replication origins, we constructed a chromosome I-specific cosmid library and aligned the cosmid clones to generate contigs. So far, more than 80% of the entire chromosome I has been covered. A complete clonal physical reconstitution of chromosome I provides information on the structure and genomic organization of plant genomes. We propose our strategy to construct an artificial chromosome by assembling the functional chromosome structural elements identified on Chlorella chromosome I.
Cunha, Mariana Sequetin; Esposito, Danillo Lucas Alves; Rocco, Iray Maria; Maeda, Adriana Yurika; Vasami, Fernanda Gisele Silva; Nogueira, Juliana Silva; de Souza, Renato Pereira; Suzuki, Akemi; Addas-Carvalho, Marcelo; Barjas-Castro, Maria de Lourdes; Resende, Mariângela Ribeiro; Stucchi, Raquel Silveira Bello; Boin, Ilka de Fátima Santana Ferreira; Katz, Gizelda; Angerami, Rodrigo Nogueira
2016-01-01
We report here the genome sequence of Zika virus, strain ZikaSPH2015, containing all structural and nonstructural proteins flanked by the 5′ and 3′ untranslated region. It was isolated in São Paulo state, Brazil, in 2015, from a patient who received a blood transfusion from an asymptomatic donor at the time of donation. PMID:26941134
Kilpert, Fabian; Podsiadlowski, Lars
2006-01-01
Background Sequence data and other characters from mitochondrial genomes (gene translocations, secondary structure of RNA molecules) are useful in phylogenetic studies among metazoan animals from population to phylum level. Moreover, the comparison of complete mitochondrial sequences gives valuable information about the evolution of small genomes, e.g. about different mechanisms of gene translocation, gene duplication and gene loss, or concerning nucleotide frequency biases. The Peracarida (gammarids, isopods, etc.) comprise about 21,000 species of crustaceans, living in many environments from the deep sea floor to arid terrestrial habitats. Ligia oceanica is a terrestrial isopod living at rocky seashores of the European North Sea and Atlantic coastlines. Results The study reveals the first complete mitochondrial DNA sequence from a peracarid crustacean. The mitochondrial genome of Ligia oceanica is a circular double-stranded DNA molecule, with a size of 15,289 bp. It shows several changes in mitochondrial gene order compared to other crustacean species. An overview of the mitochondrial gene order of all crustacean taxa yet sequenced is also presented. The largest non-coding part (the putative mitochondrial control region) of the mitochondrial genome of Ligia oceanica is unexpectedly not AT-rich compared to the remainder of the genome. It bears two repeat regions (4× 10 bp and 3× 64 bp), and a GC-rich hairpin-like secondary structure. Some of the transfer RNAs show secondary structures which deviate from the usual cloverleaf pattern. While some tRNA genes are putative targets for RNA editing, trnR could not be localized at all. Conclusion Gene order is not conserved among Peracarida, not even among isopods. 
The two isopod species Ligia oceanica and Idotea baltica show a similarly derived gene order, compared to the arthropod ground pattern and to the amphipod Parhyale hawaiiensis, suggesting that most of the translocation events were already present in the last common ancestor of these isopods. Beyond that, the positions of three tRNA genes differ in the two isopod species. Strand bias in nucleotide frequency is reversed in both isopod species compared to other Malacostraca. This is probably due to a reversal of the replication origin, which is further supported by the fact that the hairpin structure typically found in the control region shows a reversed orientation in the isopod species, compared to other crustaceans. PMID:16987408
Company profile: Complete Genomics Inc.
Reid, Clifford
2011-02-01
Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Núñez-Acuña, Gustavo; Aguilar-Espinoza, Andrea; Gallardo-Escárate, Cristian
2013-03-01
Despite the great relevance of mitochondrial genome analysis in evolutionary studies, there is little information on how the transcripts associated with the mitogenome are expressed and their role in the genetic structuring of populations. This work reports the complete mitochondrial genome of the marine gastropod Concholepas concholepas, obtained by 454 pyrosequencing, and an analysis of mitochondrial transcripts of two populations 1000 km apart along the Chilean coast. The mitochondrion of C. concholepas is 15,495 base pairs (bp) in size and contains the 37 subunits characteristic of metazoans, as well as a non-coding region of 330 bp. In silico analysis of mitochondrial gene variability showed significant differences among populations. In terms of levels of relative abundance of transcripts associated with the mitochondrion in the two populations (assessed by qPCR), the genes associated with complexes III and IV of the mitochondrial genome had the highest levels of expression in the northern population, while transcripts associated with the ATP synthase complex had the highest levels of expression in the southern population. Moreover, fifteen polymorphic SNPs were identified in silico between the mitogenomes of the two populations. Four of these markers implied different amino acid substitutions (non-synonymous SNPs). This work contributes novel information regarding the mitochondrial genome structure and mRNA expression levels of C. concholepas. Copyright © 2012 Elsevier Inc. All rights reserved.
McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick
2007-01-01
The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes that are shared by, or that differentiate, two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Kyalo, Cornelius M; Gichira, Andrew W; Li, Zhi-Zhong; Saina, Josphat K; Malombe, Itambo; Hu, Guang-Wan; Wang, Qing-Feng
2018-01-01
Streptocarpus teitensis (Gesneriaceae) is an endemic species listed as critically endangered in the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. However, the sequence and genome information available for this species remains limited. In this article, we present the complete chloroplast genome structure of Streptocarpus teitensis and its evolution inferred through comparative studies with other related species. S. teitensis has a chloroplast genome of 153,207 bp, containing a pair of inverted repeats (IRs) of 25,402 bp each, separated by small and large single-copy (SSC and LSC) regions of 18,300 and 84,103 bp, respectively. The chloroplast genome was observed to contain 116 unique genes, of which 80 are protein-coding, 32 are transfer RNAs, and four are ribosomal RNAs. In addition, a total of 196 SSR markers were detected in the chloroplast genome of Streptocarpus teitensis, with mononucleotides (57.1%) being the majority, followed by trinucleotides (33.2%), dinucleotides and tetranucleotides (both 4.1%), and pentanucleotides being the least frequent (1.5%). Genome alignment indicated that this genome was comparable to those of other sequenced members of the order Lamiales. The phylogenetic analysis suggested that Streptocarpus teitensis is closely related to Lysionotus pauciflorus and Dorcoceras hygrometricum.
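The region sizes quoted in this abstract can be checked against the reported total, since a quadripartite plastome consists of one LSC, one SSC and two IR copies. The short sketch below performs that arithmetic with the values from the abstract; the helper name `quadripartite_length` is an assumption made for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir

# Region sizes (bp) as reported for Streptocarpus teitensis in the abstract.
total = quadripartite_length(lsc=84_103, ssc=18_300, ir=25_402)
```

Here the sum is 84,103 + 18,300 + 2 × 25,402 = 153,207 bp, which agrees with the genome size stated in the abstract.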
Xia, Chongjing; Wang, Meinan; Yin, Chuntao; Cornejo, Omar E; Hulbert, Scot; Chen, Xianming
2018-05-24
Puccinia striiformis f. sp. tritici (Pst) causes devastating stripe (yellow) rust on wheat and P. striiformis f. sp. hordei (Psh) causes stripe rust on barley. Several Pst genomes are available, but no Psh genome is available. More genomes of Pst and Psh are needed to understand the genome evolution and molecular mechanisms of their pathogenicity. We sequenced Pst isolate 93-210 and Psh isolate 93TX-2 using PacBio and Illumina technologies, and RNA sequencing. Their genomic sequences were assembled into contigs with high continuity and showed significant structural differences. The circular mitochondrial genomes of both were complete. These genomes provide high-quality resources for deciphering the genomic basis of rapid evolution and host adaptation, identifying genes for avirulence and other important traits, and studying host-pathogen interaction.
Nie, Xiaojun; Lv, Shuzuo; Zhang, Yingxin; Du, Xianghong; Wang, Le; Biradar, Siddanagouda S; Tan, Xiufang; Wan, Fanghao; Weining, Song
2012-01-01
Crofton weed (Ageratina adenophora) is one of the most hazardous invasive plant species, which causes serious economic losses and environmental damage worldwide. However, the sequence resource and genome information of A. adenophora are rather limited, making phylogenetic identification and evolutionary studies very difficult. Here, we report the complete sequence of the A. adenophora chloroplast (cp) genome based on Illumina sequencing. The A. adenophora cp genome is 150,689 bp in length including a small single-copy (SSC) region of 18,358 bp and a large single-copy (LSC) region of 84,815 bp separated by a pair of inverted repeats (IRs) of 23,755 bp. The genome contains 130 unique genes and 18 duplicated in the IR regions, with the gene content and organization similar to other Asteraceae cp genomes. Comparative analysis identified five DNA regions (ndhD-ccsA, psbI-trnS, ndhF-ycf1, ndhI-ndhG and atpA-trnR) containing parsimony-informative characters higher than 2%, which may be potential informative markers for barcoding and phylogenetic analysis. Repeat structure, codon usage and contraction of the IR were also investigated to reveal the pattern of evolution. Phylogenetic analysis demonstrated a sister relationship between A. adenophora and Guizotia abyssinica and supported a monophyly of the Asterales. We have assembled and analyzed the chloroplast genome of A. adenophora in this study, which was the first sequenced plastome in the Eupatorieae tribe. The complete chloroplast genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family.
Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen
2015-01-01
Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var. deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
Zheng, Renhua; Xu, Haibin; Zhou, Yanwei; Li, Meiping; Lu, Fengjuan; Dong, Yini; Liu, Xin; Chen, Jinhui; Shi, Jisen
2016-01-01
Glyptostrobus pensilis, belonging to the monotypic genus Glyptostrobus (Family: Cupressaceae), is an ancient conifer that is naturally distributed in low-lying wet areas. Here, we report the complete chloroplast (cp) genome sequence (132,239 bp) of G. pensilis. The G. pensilis cp genome is similar in gene content, organization and genome structure to the sequenced cp genomes from other cupressophytes, especially with respect to the loss of the inverted repeat region A (IRA). Through phylogenetic analysis, we demonstrated that the genus Glyptostrobus is closely related to the genus Cryptomeria, supporting previous findings based on physiological characteristics. Since IRs play an important role in stabilizing the cp genome and conifer cp genomes lost different IR regions after splitting into two clades (cupressophytes and Pinaceae), we performed cp genome rearrangement analysis and found more extensive cp genome rearrangements among the species of cupressophytes relative to Pinaceae. Additional repeat analysis indicated that cupressophyte cp genomes, especially those of Cupressaceae, contained fewer potentially functional repeats than those of Pinaceae. These results suggested that the dynamics of cp genome rearrangement in conifers differed because the two clades, Pinaceae and cupressophytes, lost IR copies independently and developed different repeats to complement the residual IRs. In addition, we identified 170 perfect simple sequence repeats that will be useful in future research focusing on the evolution of genetic diversity and conservation of genetic variation for this endangered species in the wild. PMID:27560965
Song, Chao; Hu, Gengdong; Qiu, Liping; Fan, Limin; Meng, Shunlong; Chen, Jiazhang
2016-11-01
The complete mitochondrial genome of Hyporhamphus intermedius was determined to be 16,720 bp in length with an (A + T) content of 56.3%, and it consists of 13 protein-coding genes, 22 tRNAs, two ribosomal RNAs, and a control region. The gene composition and the structural arrangement of the H. intermedius complete mtDNA were identical to those of most other vertebrates. Interestingly, two tandem repeat units were identified spanning tRNA-Pro and the control region (2 × 41 bp), while in most fishes the tandem repeat units are located within the control region. The molecular data presented here could be useful for studying the evolutionary relationships and population genetics of Hemirhamphidae fish.
Peredo, Elena L.; King, Ursula M.; Les, Donald H.
2013-01-01
The re-colonization of aquatic habitats by angiosperms has presented a difficult challenge to plants whose long evolutionary history primarily reflects adaptations to terrestrial conditions. Many aquatics must complete vital stages of their life cycle on the water surface by means of floating or emergent leaves and flowers. Only a few species, mainly within the order Alismatales, are able to complete all aspects of their life cycle including pollination, entirely underwater. Water-pollinated Alismatales include seagrasses and water nymphs (Najas), the latter being the only freshwater genus in the family Hydrocharitaceae with subsurface water-pollination. We have determined the complete nucleotide sequence of the plastid genome of Najas flexilis. The plastid genome of N. flexilis is a circular AT-rich DNA molecule of 156 kb, which displays a quadripartite structure with two inverted repeats (IR) separating the large single copy (LSC) from the small single copy (SSC) regions. In N. flexilis, as in other Alismatales, the rps19 and trnH genes are localized in the LSC region instead of within the IR regions as in other monocots. However, the N. flexilis plastid genome presents some anomalous modifications. The size of the SSC region is only one third of that reported for closely related species. The number of genes in the plastid is considerably less. Both features are due to loss of the eleven ndh genes in the Najas flexilis plastid. In angiosperms, the absence of ndh genes has been related mainly to the loss of photosynthetic function in parasitic plants. The ndh genes encode the NAD(P)H dehydrogenase complex, believed essential in terrestrial environments, where it increases photosynthetic efficiency in variable light intensities. The modified structure of the N. flexilis plastid genome suggests that adaptation to submersed environments, where light is scarce, has involved the loss of the NDH complex in at least some photosynthetic angiosperms. PMID:23861923
Genome-wide characterization of centromeric satellites from multiple mammalian genomes.
Alkan, Can; Cardone, Maria Francesca; Catacchio, Claudia Rita; Antonacci, Francesca; O'Brien, Stephen J; Ryder, Oliver A; Purgato, Stefania; Zoli, Monica; Della Valle, Giuliano; Eichler, Evan E; Ventura, Mario
2011-01-01
Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog, in contrast to elephant and armadillo, which showed high centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.
Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou
2014-11-01
In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.
2014-01-01
Background Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Methods Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Results Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common).
No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen via host switching from another avian host. Phylogenetic analyses suggest E. necatrix and E. tenella are related distantly to other Eimeria of chickens. Conclusions Mitochondrial genomes of Eimeria species sequenced to date are highly conserved with regard to gene content and structure. Nonetheless, complete mitochondrial genome sequences and, particularly the three CDS, possess sufficient sequence variability for differentiating Eimeria species of poultry. The mitochondrial genome sequences are highly suited for molecular diagnostics and phylogenetics of coccidia and, potentially, genetic markers for molecular epidemiology. PMID:25034633
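The pairwise identities quoted above (88.1% to 98.2%) are computed over aligned sequences. The abstract does not say how gap columns were treated, so the following is only a minimal sketch under one common convention: columns where both sequences carry a gap are skipped, and a gap against a base counts as a mismatch. The function name and the toy sequences are illustrative assumptions, not taken from the study.

```python
def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention (not stated in the abstract): columns where both
    sequences have a gap are ignored; a gap opposite a base is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, for illustration only).
print(pairwise_identity("ACGT", "ACGA"))
```

Other conventions (for example, excluding every gap column) give slightly different percentages, which is worth keeping in mind when comparing identity values across studies.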
Ogedengbe, Mosun E; El-Sherry, Shiem; Whale, Julia; Barta, John R
2014-07-17
Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common).
No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen via host switching from another avian host. Phylogenetic analyses suggest E. necatrix and E. tenella are related distantly to other Eimeria of chickens. Mitochondrial genomes of Eimeria species sequenced to date are highly conserved with regard to gene content and structure. Nonetheless, complete mitochondrial genome sequences and, particularly the three CDS, possess sufficient sequence variability for differentiating Eimeria species of poultry. The mitochondrial genome sequences are highly suited for molecular diagnostics and phylogenetics of coccidia and, potentially, genetic markers for molecular epidemiology.
Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick
2014-02-01
Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.
Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y. F.; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie
2014-01-01
Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here. PMID:24284329
Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M
2008-01-01
Background Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes–a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22–336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. 
Conclusion Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol. PMID:18474103
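The palindromic sequences described above are predicted to fold into DNA stem-loops because one arm is the reverse complement of the other. As a rough sketch only (real structure prediction uses thermodynamic folding, which this does not attempt), the check below tests whether the first and last `stem_len` bases of a sequence can pair by Watson-Crick rules around a loop. The function names, the `min_loop` default and the example sequences are assumptions made for illustration.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def can_form_hairpin(seq: str, stem_len: int, min_loop: int = 3) -> bool:
    """True if the 5' and 3' ends of `seq` can pair into a perfect stem.

    The first `stem_len` bases must equal the reverse complement of the
    last `stem_len` bases, leaving at least `min_loop` unpaired bases.
    """
    seq = seq.upper()
    if stem_len <= 0 or len(seq) < 2 * stem_len + min_loop:
        return False
    return seq[:stem_len] == reverse_complement(seq[-stem_len:])


# Hypothetical 12-nt example: 4-bp stem (GGGC / GCCC) around an AAAA loop.
print(can_form_hairpin("GGGCAAAAGCCC", stem_len=4))
```

A perfect-match test like this is deliberately strict; it ignores mismatches, bulges and G-T wobble pairs that folding programs would tolerate.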
Mutations that Cause Human Disease: A Computational/Experimental Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beernink, P; Barsky, D; Pesavento, B
International genome sequencing projects have produced billions of nucleotides (letters) of DNA sequence data, including the complete genome sequences of 74 organisms. These genome sequences have created many new scientific opportunities, including the ability to identify sequence variations among individuals within a species. These genetic differences, which are known as single nucleotide polymorphisms (SNPs), are particularly important in understanding the genetic basis for disease susceptibility. Since the report of the complete human genome sequence, over two million human SNPs have been identified, including a large-scale comparison of an entire chromosome from twenty individuals. Of the protein coding SNPs (cSNPs), approximately half lead to a single amino acid change in the encoded protein (non-synonymous coding SNPs). Most of these changes are functionally silent, while the remainder negatively impact the protein and sometimes cause human disease. To date, over 550 SNPs have been found to cause single locus (monogenic) diseases and many others have been associated with polygenic diseases. SNPs have been linked to specific human diseases, including late-onset Parkinson disease, autism, rheumatoid arthritis and cancer. The ability to predict accurately the effects of these SNPs on protein function would represent a major advance toward understanding these diseases. To date several attempts have been made toward predicting the effects of such mutations. The most successful of these is a computational approach called "Sorting Intolerant From Tolerant" (SIFT). This method uses sequence conservation among many similar proteins to predict which residues in a protein are functionally important. However, this method suffers from several limitations. First, a query sequence must have a sufficient number of relatives to infer sequence conservation.
Second, this method does not make use of or provide any information on protein structure, which can be used to understand how an amino acid change affects the protein. The experimental methods that provide the most detailed structural information on proteins are X-ray crystallography and NMR spectroscopy. However, these methods are labor intensive and currently cannot be carried out on a genomic scale. Nonetheless, Structural Genomics projects are being pursued by more than a dozen groups and consortia worldwide and as a result the number of experimentally determined structures is rising exponentially. Based on the expectation that protein structures will continue to be determined at an ever-increasing rate, reliable structure prediction schemes will become increasingly valuable, leading to information on protein function and disease for many different proteins. Given known genetic variability and experimentally determined protein structures, can we accurately predict the effects of single amino acid substitutions? An objective assessment of this question would involve comparing predicted and experimentally determined structures, which thus far has not been rigorously performed. The completed research leveraged existing expertise at LLNL in computational and structural biology, as well as significant computing resources, to address this question.
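The idea of using sequence conservation among related proteins to flag functionally important residues can be illustrated with a per-column Shannon entropy over a protein alignment. This is not the SIFT algorithm itself, only a simplified stand-in: low entropy marks a conserved column, where a substitution is more likely to matter. The function names and the four-sequence toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (bits) of the residues in one alignment column.

    Gap characters ('-') are ignored. 0.0 means fully conserved.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    return max(0.0, entropy)  # avoid returning -0.0 for conserved columns


def conservation_profile(alignment) -> list:
    """Entropy for every column of a list of equal-length aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    return [column_entropy(col) for col in zip(*alignment)]


# Toy alignment (hypothetical): column 0 is invariant, column 1 is split 2/2.
profile = conservation_profile(["MKV", "MRV", "MKV", "MRI"])
print(profile)
```

The first limitation noted in the abstract shows up directly here: with only a handful of related sequences, the entropy estimates are too coarse to separate tolerated from untolerated positions.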
The complete plastid genome sequence of Eustrephus latifolius (Asparagaceae: Lomandroideae).
Kim, Hyoung Tae; Kim, Jung Sung; Kim, Joo-Hwan
2016-01-01
The complete chloroplast (cp) genome sequence of Eustrephus latifolius is the first to be determined in subfamily Lomandroideae of family Asparagaceae. It was 159,736 bp long and contained a large single copy region (82,403 bp) and a small single copy region (13,607 bp), which were separated by two inverted repeat regions (31,863 bp each). In total, 132 genes were identified, consisting of 83 coding genes, 8 rRNA genes, 38 tRNA genes and 3 pseudogenes. rpl23 and clpP were pseudogenes due to sequence deletions. Among 23 genes containing introns, rps12 and ycf3 contained two introns and the rest had just one intron. The intact ycf68 was identified within an intron of trnI-GAU. The amino acid sequence was almost identical to that of Phoenix dactylifera in Arecales. Ycf1 of E. latifolius was located entirely within the IR, similar to the cp genome structure of Lemna minor, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana in Alismatales.
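For a quadripartite plastome, the reported region lengths should add up to the total genome length, with the inverted repeat counted twice. The short check below applies this to the E. latifolius figures quoted in the abstract; the function name is an illustrative choice.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastid genome.

    The genome consists of one large single copy (LSC) region, one small
    single copy (SSC) region and two copies of the inverted repeat (IR).
    """
    return lsc + ssc + 2 * ir


# Region lengths reported for Eustrephus latifolius (bp).
total = plastome_length(lsc=82_403, ssc=13_607, ir=31_863)
print(total)  # 159736, matching the reported genome size
```

The agreement with the stated 159,736 bp confirms that 31,863 bp is the length of each inverted repeat copy rather than of the two combined.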
Complete nucleotide sequence and annotation of the temperate corynephage ϕ16 genome.
Lobanova, Juliya S; Gak, Evgueni R; Andreeva, Irina G; Rybak, Konstantin V; Krylov, Alexander A; Mashko, Sergey V
2017-08-01
The complete genome of ϕ16, a temperate corynephage from Corynebacterium glutamicum ATCC 21792, was sequenced and annotated (GenBank: KY250482). The electron microscopy study of ϕ16 virion confirmed that it belongs to the family Siphoviridae. The ϕ16 genome consists of a linear double-stranded DNA molecule of 58,200 bp (G+C = 52.2%) with protruding cohesive 3'-ends of 14 nt. Four major structural proteins were separated by SDS-PAGE and identified by peptide mass fingerprinting technique. Using bioinformatics analysis, 101 putative ORFs and 5 tRNA genes were predicted. Only 27 putative gene products could be assigned to known biological functions. The ϕ16 genome was divided into functional modules. Seven putative promoters and eight putative unidirectional intrinsic terminators were predicted. One site of putative «-1» programmed ribosomal frameshifting was proposed in the phage tail assembly genome region. C. glutamicum genetic tools could be broadened by exploiting the known integrase gene (gp33) and the newly identified excisionase gene (gp47), participating in site-specific recombination between ϕ16-attP/attB.
The coffee genome hub: a resource for coffee genomes
Dereeper, Alexis; Bocs, Stéphanie; Rouard, Mathieu; Guignon, Valentin; Ravel, Sébastien; Tranchant-Dubreuil, Christine; Poncet, Valérie; Garsmeur, Olivier; Lashermes, Philippe; Droc, Gaëtan
2015-01-01
The whole genome sequence of Coffea canephora, the perennial diploid species known as Robusta, has been recently released. In the context of the C. canephora genome sequencing project and to support post-genomics efforts, we developed the Coffee Genome Hub (http://coffee-genome.org/), an integrative genome information system that allows centralized access to genomics and genetics data and analysis tools to facilitate translational and applied research in coffee. We provide the complete genome sequence of C. canephora along with gene structure, gene product information, metabolism, gene families, transcriptomics, syntenic blocks, genetic markers and genetic maps. The hub relies on generic software (e.g. GMOD tools) for easy querying, visualizing and downloading research data. It includes a Genome Browser enhanced by a Community Annotation System, enabling the improvement of automatic gene annotation through an annotation editor. In addition, the hub aims at developing interoperability among other existing South Green tools managing coffee data (phylogenomics resources, SNPs) and/or supporting data analyses with the Galaxy workflow manager. PMID:25392413
Cunha, Mariana Sequetin; Esposito, Danillo Lucas Alves; Rocco, Iray Maria; Maeda, Adriana Yurika; Vasami, Fernanda Gisele Silva; Nogueira, Juliana Silva; de Souza, Renato Pereira; Suzuki, Akemi; Addas-Carvalho, Marcelo; Barjas-Castro, Maria de Lourdes; Resende, Mariângela Ribeiro; Stucchi, Raquel Silveira Bello; Boin, Ilka de Fátima Santana Ferreira; Katz, Gizelda; Angerami, Rodrigo Nogueira; da Fonseca, Benedito Antonio Lopes
2016-03-03
We report here the genome sequence of Zika virus, strain ZikaSPH2015, containing all structural and nonstructural proteins flanked by the 5' and 3' untranslated region. It was isolated in São Paulo state, Brazil, in 2015, from a patient who received a blood transfusion from an asymptomatic donor at the time of donation. Copyright © 2016 Cunha et al.
The complete chloroplast genome sequence of Actinidia arguta using the PacBio RS II platform
Lin, Miaomiao; Qi, Xiujuan; Chen, Jinyong; Sun, Leiming; Zhong, Yunpeng; Fang, Jinbao; Hu, Chungen
2018-01-01
Actinidia arguta is the most basal species in a phylogenetically and economically important genus in the family Actinidiaceae. To better understand the molecular basis of the Actinidia arguta chloroplast (cp), we sequenced the complete cp genome from A. arguta using Illumina and PacBio RS II sequencing technologies. The cp genome from A. arguta was 157,611 bp in length and composed of a pair of 24,232 bp inverted repeats (IRs) separated by a 20,463 bp small single copy region (SSC) and an 88,684 bp large single copy region (LSC). Overall, the cp genome contained 113 unique genes. The cp genomes from A. arguta and three other Actinidia species from GenBank were subjected to a comparative analysis. Indel mutation events and high frequencies of base substitution were identified, and the accD and ycf2 genes showed a high degree of variation within Actinidia. Forty-seven simple sequence repeats (SSRs) and 155 repetitive structures were identified, further demonstrating the rapid evolution in Actinidia. The cp genome analysis and the identification of variable loci provide vital information for understanding the evolution and function of the chloroplast and for characterizing Actinidia population genetics. PMID:29795601
Chemotaxis and flagellar genes of Chromobacterium violaceum.
Pereira, Maristela; Parente, Juliana Alves; Bataus, Luiz Artur Mendes; Cardoso, Divina das Dores de Paula; Soares, Renata Bastos Ascenço; Soares, Célia Maria de Almeida
2004-03-31
The availability of the complete genome of the Gram-negative beta-proteobacterium Chromobacterium violaceum has increasingly impacted our understanding of this microorganism. This review focuses on the genomic organization and structural analysis of the deduced proteins of the chemosensory adaptation system of C. violaceum. C. violaceum has multiple homologues of most chemotaxis genes, organized mostly in clusters in the bacterial genome. We found at least 67 genes, distributed in 10 gene clusters, involved in the chemotaxis of C. violaceum. A close examination of the chemoreceptors methyl-accepting chemotaxis proteins (MCPs), and the deduced sequences of the members of the two-component signaling system revealed canonical motifs, described as essential for the function of the deduced proteins. The chemoreceptors found in C. violaceum include the complete repertoire of such genes described in bacteria, designated as tsr, tar, trg, and tap; 41 MCP loci were found in the C. violaceum genome. Also, the C. violaceum genome includes a large repertoire of the proteins of the chemosensory transducer system. Multiple homologues of bacterial chemotaxis genes, including CheA, CheB, CheD, CheR, CheV, CheY, CheZ, and CheW, were found in the C. violaceum genome.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C
2009-11-12
In FY09 they will (1) complete the implementation, verification, calibration, and sensitivity and scalability analysis of the in-cell virus replication model; (2) complete the design of the cell culture (cell-to-cell infection) model; (3) continue the research, design, and development of their bioinformatics tools: the Web-based structure-alignment-based sequence variability tool and the functional annotation of the genome database; (4) collaborate with the University of California at San Francisco on areas of common interest; and (5) submit journal articles that describe the in-cell model with simulations and the bioinformatics approaches to evaluation of genome variability and fitness.
The complete mitochondrial genome structure of snow leopard Panthera uncia.
Wei, Lei; Wu, Xiaobing; Jiang, Zhigang
2009-05-01
The complete mitochondrial genome (mtDNA) of snow leopard Panthera uncia was obtained by using the polymerase chain reaction (PCR) technique based on the PCR fragments of 30 primers we designed. The entire mtDNA sequence was 16,773 base pairs (bp) in length, and the base composition was: A, 5,357 bp (31.9%); C, 4,444 bp (26.5%); G, 2,428 bp (14.5%); T, 4,544 bp (27.1%). The structural characteristics of the P. uncia mitochondrial genome were highly similar to those of Felis catus, Acinonyx jubatus, Neofelis nebulosa and other mammals. However, we found several distinctive features of the mitochondrial genome of Panthera uncia. First, the termination codon of COIII was TAA, which differed from those of F. catus, A. jubatus and N. nebulosa. Second, tRNA(Ser)(AGY), which lacked the "DHU" arm, could not be folded into the typical cloverleaf-shaped structure. Third, in the control region, a long repetitive sequence (32 bp) was found with 2 repeats in the RS-2 region, while one short repetitive segment (9 bp) was found with 15 repeats in the RS-3 region. We performed phylogenetic analysis based on a 3,816 bp concatenated sequence of 12S rRNA, 16S rRNA, ND2, ND4, ND5, Cyt b and ATP8 for P. uncia and other related species; the result indicated that P. uncia and P. leo were sister species, which differed from previous findings.
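The base composition reported above can be cross-checked from the raw counts: the four counts should sum to the genome length, and each percentage is the count divided by that total. A minimal sketch, using the counts from the abstract and an illustrative helper name:

```python
def composition_percent(counts: dict, ndigits: int = 1) -> dict:
    """Convert per-base counts to percentages of the total, rounded."""
    total = sum(counts.values())
    return {base: round(100 * n / total, ndigits) for base, n in counts.items()}


# Base counts reported for the Panthera uncia mitochondrial genome.
counts = {"A": 5357, "C": 4444, "G": 2428, "T": 4544}

print(sum(counts.values()))         # 16773
print(composition_percent(counts))  # {'A': 31.9, 'C': 26.5, 'G': 14.5, 'T': 27.1}
```

Both the total and the rounded percentages agree with the values given in the abstract.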
Liu, Nian; Huang, Yuan
2010-01-01
The complete 15,599-bp mitogenome of Acrida cinerea was determined and compared with that of the other 20 orthopterans. It displays characteristic gene content, genome organization, nucleotide composition, and codon usage found in other Caelifera mitogenomes. Comparison of 21 orthopteran sequences revealed that the tRNAs encoded by the H-strand appear more conserved than those by the L-strand. All tRNAs form the typical clover-leaf structure except trnS (agn), and most of the size variation among tRNAs stemmed from the length variation in the arm and loop of TΨC and the loop of DHU. The derived secondary structure models of the rrnS and rrnL from 21 orthopteran species closely resemble those from other insects on CRW except a considerably enlarged loop of helix 1399 of rrnS in Caelifera, which is a potential autapomorphy of Caelifera. In the A+T-rich region, tandem repeats are not only conserved in the closely related mitogenome but also share some conserved motifs in the same subfamily. A stem-loop structure, 16 bp or longer, is likely to be involved in replication initiation in Caelifera and Grylloidea. A long T-stretch (>17 bp) with conserved stem-loop structure next to rrnS on the H-strand, bounded by a purine at either end, exists in the three species from Tettigoniidae. PMID:21197069
Mitochondrial genome sequence of Egyptian swift Rock Pigeon (Columba livia breed Egyptian swift).
Li, Chun-Hong; Shi, Wei; Shi, Wan-Yu
2015-06-01
The Egyptian swift Rock Pigeon is a breed of fancy pigeon developed over many years of selective breeding. In this work, we report the complete mitochondrial genome sequence of Egyptian swift Rock Pigeon. The total length of the mitogenome was 17,239 bp and its overall base composition was estimated to be 30.2% for A, 24.0% for T, 31.9% for C and 13.9% for G, indicating an A-T (54.2%)-rich feature in the mitogenome. It contained the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and a non-coding control region (D-loop region). The complete mitochondrial genome sequence of Egyptian swift Rock Pigeon would serve as an important data set of the germplasm resources for further study.
Liu, Ye; Li, Nan; Zhang, Shoufeng; Zhang, Fei; Lian, Hai; Wang, Ying; Zhang, Jinxia; Hu, Rongliang
2013-12-01
The genome of Irkut virus, isolate IRKV-THChina12, the first non-rabies lyssavirus from China (of bat origin), has been completely sequenced. In general, coding and non-coding regions of this viral genome are similar to those of other lyssaviruses. However, alignment of the deduced amino acid sequences of the structural proteins of IRKV-THChina12 with those of other lyssavirus representatives revealed significant variability between viral species. The nucleoprotein and matrix protein were found to be the most conserved, followed by the large protein, glycoprotein and phosphoprotein. Differences in the antigenic sites in glycoprotein may result in only partial protection of the available rabies biologics against Irkut virus, which is of particular concern for pre- and post-exposure rabies prophylaxis. Copyright © 2013 Elsevier Inc. All rights reserved.
Han, Yeong-Deok; Baek, Ye-Seul; Kim, Jeong-Hoon; Choi, Han-Gu; Kim, Sanghee
2016-05-01
The South Polar Skua, a gull-like seabird, is one of the most fascinating Antarctic seabirds; it lays two eggs at sites free of snow and ice and predominantly hunts pelagic fish and penguins. Blood samples of the South Polar Skua Stercorarius maccormicki were collected during the summer activity near King Sejong station in Antarctica. The complete mitochondrial DNA sequence of S. maccormicki was 16,669 bp, showing the conserved genome structure and orientation found in other avian species. The control region of S. maccormicki was 93 bp and 80 bp shorter than those of Chroicocephalus saundersi and Synthliboramphus antiquus, respectively. Interestingly, there is a (CAACAAACAA)6 repeat sequence in the control region. Our results for the S. maccormicki mt genome, including the repeat sequence, may provide useful genetic information for reconstructing the phylogenetic and phylogeographic histories of the southern skua complex.
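A tandem repeat such as the (CAACAAACAA)6 motif mentioned above can be located with a simple regular-expression scan. The following Python sketch assumes the repeat unit is already known; `find_tandem_repeats` is a hypothetical helper and the `toy` sequence is made up for illustration, not taken from the S. maccormicki control region.

```python
import re


def find_tandem_repeats(seq: str, unit: str, min_copies: int = 2):
    """Return (start, copies) for every run of `unit` repeated >= min_copies times."""
    # Builds a pattern such as (?:CAACAAACAA){2,}; matching is greedy, so each
    # hit covers the longest uninterrupted run of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


# Toy sequence: six copies of the 10-bp unit flanked by arbitrary bases.
toy = "GGTT" + "CAACAAACAA" * 6 + "TTGG"
hits = find_tandem_repeats(toy, "CAACAAACAA")
```

Each hit reports the 0-based start position and the number of consecutive copies, so a (unit)6 repeat shows up as a single hit with a copy count of 6.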
Mader, Malte; Le Paslier, Marie-Christine; Bounon, Rémi; Berard, Aurélie; Vettori, Cristina; Schroeder, Hilke; Leplé, Jean-Charles; Fladung, Matthias
2016-01-01
Complete Populus genome sequences are available for the nucleus (P. trichocarpa; section Tacamahaca) and for chloroplasts (seven species), but not for mitochondria. Here, we provide the complete genome sequences of the chloroplast and the mitochondrion for the clones P. tremula W52 and P. tremula x P. alba 717-1B4 (section Populus). The organization of the chloroplast genomes of both Populus clones is described. A phylogenetic tree constructed from all available complete chloroplast DNA sequences of Populus was not congruent with the assignment of the related species to different Populus sections. In total, 3,024 variable nucleotide positions were identified among all compared Populus chloroplast DNA sequences. The 5-prime part of the LSC from trnH to atpA showed the highest frequency of variations. The variable positions included 163 positions with SNPs allowing for differentiating the two clones with P. tremula chloroplast genomes (W52, 717-1B4) from the other seven Populus individuals. These potential P. tremula-specific SNPs were displayed as a whole-plastome barcode on the P. tremula W52 chloroplast DNA sequence. Three of these SNPs and one InDel in the trnH-psbA linker were successfully validated by Sanger sequencing in an extended set of Populus individuals. The complete mitochondrial genome sequence of P. tremula is the first in the family of Salicaceae. The mitochondrial genomes of the two clones are 783,442 bp (W52) and 783,513 bp (717-1B4) in size, structurally very similar and organized as single circles. DNA sequence regions with high similarity to the W52 chloroplast sequence account for about 2% of the W52 mitochondrial genome. The mean SNP frequency was found to be nearly six fold higher in the chloroplast than in the mitochondrial genome when comparing 717-1B4 with W52. The availability of the genomic information of all three DNA-containing cell organelles will allow a holistic approach in poplar molecular breeding in the future. PMID:26800039
Complete Khoisan and Bantu genomes from southern Africa
Schuster, Stephan C.; Miller, Webb; Ratan, Aakrosh; Tomsho, Lynn P.; Giardine, Belinda; Kasson, Lindsay R.; Harris, Robert S.; Petersen, Desiree C.; Zhao, Fangqing; Qi, Ji; Alkan, Can; Kidd, Jeffrey M.; Sun, Yazhou; Drautz, Daniela I.; Bouffard, Pascal; Muzny, Donna M.; Reid, Jeffrey G.; Nazareth, Lynne V.; Wang, Qingyu; Burhans, Richard; Riemer, Cathy; Wittekindt, Nicola E.; Moorjani, Priya; Tindall, Elizabeth A.; Danko, Charles G.; Teo, Wee Siang; Buboltz, Anne M.; Zhang, Zhenhai; Ma, Qianyi; Oosthuysen, Arno; Steenkamp, Abraham W.; Oostuisen, Hermann; Venter, Philippus; Gajewski, John; Zhang, Yu; Pugh, B. Franklin; Makova, Kateryna D.; Nekrutenko, Anton; Mardis, Elaine R.; Patterson, Nick; Pringle, Tom H.; Chiaromonte, Francesca; Mullikin, James C.; Eichler, Evan E.; Hardison, Ross C.; Gibbs, Richard A.; Harkins, Timothy T.; Hayes, Vanessa M.
2013-01-01
The genetic structure of the indigenous hunter-gatherer peoples of southern Africa, the oldest known lineage of modern human, is important for understanding human diversity. Studies based on mitochondrial1 and small sets of nuclear markers2 have shown that these hunter-gatherers, known as Khoisan, San, or Bushmen, are genetically divergent from other humans1,3. However, until now, fully sequenced human genomes have been limited to recently diverged populations4–8. Here we present the complete genome sequences of an indigenous hunter-gatherer from the Kalahari Desert and a Bantu from southern Africa, as well as protein-coding regions from an additional three hunter-gatherers from disparate regions of the Kalahari. We characterize the extent of whole-genome and exome diversity among the five men, reporting 1.3 million novel DNA differences genome-wide, including 13,146 novel amino acid variants. In terms of nucleotide substitutions, the Bushmen seem to be, on average, more different from each other than, for example, a European and an Asian. Observed genomic differences between the hunter-gatherers and others may help to pinpoint genetic adaptations to an agricultural lifestyle. Adding the described variants to current databases will facilitate inclusion of southern Africans in medical research efforts, particularly when family and medical histories can be correlated with genome-wide data. PMID:20164927
2011-01-01
Background Reconstructing the higher relationships of pulmonate gastropods has been difficult. The use of morphology is problematic due to high homoplasy. Molecular studies have suffered from low taxon sampling. Forty-eight complete mitochondrial genomes are available for gastropods, ten of which are pulmonates. Here we present new complete mitochondrial genomes for the following ten species of pulmonates: Salinator rhamphidia (Amphiboloidea); Auriculinella bidentata, Myosotella myosotis, Ovatella vulcani, and Pedipes pedipes (Ellobiidae); Peronia peronii (Onchidiidae); Siphonaria gigas (Siphonariidae); Succinea putris (Stylommatophora); Trimusculus reticulatus (Trimusculidae); and Rhopalocaulis grandidieri (Veronicellidae). Also, 94 new pulmonate-specific primers across the entire mitochondrial genome are provided, which were designed for amplifying entire mitochondrial genomes through short reactions and closing gaps after shotgun sequencing. Results The structural features of the 10 new mitochondrial genomes are provided. All genomes share similar gene orders. Phylogenetic analyses were performed including the 10 new genomes and 17 genomes from GenBank (outgroups, opisthobranchs, and other pulmonates). Bayesian Inference and Maximum Likelihood analyses, based on the concatenated amino-acid sequences of the 13 protein-coding genes, produced the same topology. The pulmonates are paraphyletic and basal to the opisthobranchs, which are monophyletic at the tip of the tree. Siphonaria, traditionally regarded as a basal pulmonate, is nested within opisthobranchs. Pyramidella, traditionally regarded as a basal (non-euthyneuran) heterobranch, is nested within pulmonates. Several hypotheses are rejected, such as the Systellommatophora, Geophila, and Eupulmonata. The Ellobiidae is polyphyletic, but the false limpet Trimusculus reticulatus is closely related to some ellobiids.
Conclusions Despite recent efforts to increase the taxon sampling in euthyneuran (opisthobranch and pulmonate) molecular phylogenies, several of the deeper nodes are still uncertain, because of low support values as well as some incongruence between analyses based on complete mitochondrial genomes and those based on individual genes (18S, 28S, 16S, CO1). Additional complete genomes are needed for pulmonates (especially for Williamia, Otina, and Smeagol), as well as for basal heterobranchs closely related to euthyneurans. Increasing the number of markers for gastropod (and more broadly mollusk) phylogenetics is also necessary in order to resolve some of the deeper nodes, although this is clearly not an easy task. Step by step, however, new relationships are being unveiled, such as the close relationships between the false limpet Trimusculus and ellobiids, the nesting of pyramidelloids within pulmonates, and the close relationships of Siphonaria to sacoglossan opisthobranchs. The additional genomes presented here show that some species share an identical mitochondrial gene order due to convergence. PMID:21985526
Graf, Louis; Kim, Yae Jin; Cho, Ga Youn; Miller, Kathy Ann
2017-01-01
Coccophora langsdorfii (Turner) Greville (Fucales) is an intertidal brown alga that is endemic to Northeast Asia and increasingly endangered by habitat loss and climate change. We sequenced the complete circular plastid and mitochondrial genomes of C. langsdorfii. The circular plastid genome is 124,450 bp and contains 139 protein-coding, 28 tRNA and 6 rRNA genes. The circular mitochondrial genome is 35,660 bp and contains 38 protein-coding, 25 tRNA and 3 rRNA genes. The structure and gene content of the C. langsdorfii plastid genome is similar to those of other species in the Fucales. The plastid genomes of brown algae in other orders share similar gene content but exhibit large structural recombination. The large in-frame insert in the cox2 gene in the mitochondrial genome of C. langsdorfii is typical of other brown algae. We explored the effect of this insertion on the structure and function of the cox2 protein. We estimated the usefulness of 135 plastid genes and 35 mitochondrial genes for developing molecular markers. This study shows that 29 organellar genes will prove efficient for resolving brown algal phylogeny. In addition, we propose a new molecular marker suitable for the study of intraspecific genetic diversity that should be tested in a large survey of populations of C. langsdorfii. PMID:29095864
Poczai, Péter; Hyvönen, Jaakko
2017-01-01
Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC–rps14 region and 6-kb in the trnG-UCC–psbD, followed by a third <1kb inversion in the trnT sequence. PMID:29095905
Salmonella Typhi genomics: envisaging the future of typhoid eradication.
Yap, Kien-Pong; Thong, Kwai Lin
2017-08-01
Next-generation whole-genome sequencing has revolutionised the study of infectious diseases in recent years. The availability of genome sequences and its understanding have transformed the field of molecular microbiology, epidemiology, infection treatments and vaccine developments. We review the key findings of the publicly accessible genomes of Salmonella enterica serovar Typhi since the first complete genome to the most recent release of thousands of Salmonella Typhi genomes, which remarkably shape the genomic research of S. Typhi and other pathogens. Important new insights acquired from the genome sequencing of S. Typhi, pertaining to genomic variations, evolution, population structure, antibiotic resistance, virulence, pathogenesis, disease surveillance/investigation and disease control are discussed. As the numbers of sequenced genomes are increasing at an unprecedented rate, fine variations in the gene pool of S. Typhi are captured in high resolution, allowing deeper understanding of the pathogen's evolutionary trends and its pathogenesis, paving the way to bringing us closer to eradication of typhoid through effective vaccine/treatment development. © 2017 John Wiley & Sons Ltd.
Vanlalruati, Catherine; Mandal, Surajit De; Gurusubramanian, Guruswami; Senthil Kumar, Nachimuthu
2016-07-01
The complete mitochondrial genome of Junonia iphita was determined to be 15,433 bp in length, including 37 typical mitochondrial genes and an AT-rich region. All the protein-coding genes (PCGs) are initiated by typical ATN codons, except the cox1 gene, which is initiated by a CGA codon. Eight genes use the complete termination codon TAA, whereas the cox1, cox2 and nad5 genes end with a single T, and nad4 and nad1 end with the incomplete stop codon TA. All the tRNAs show cloverleaf secondary structures except trnS1 (AGN). The A + T-rich region is 546 bp in length and contains an ATAGA motif followed by an 18 bp poly-T stretch, two microsatellite-like (TA)9 elements and an 8 bp poly-A stretch immediately upstream of the trnM gene.
Improvement of the Threespine Stickleback Genome Using a Hi-C-Based Proximity-Guided Assembly.
Peichel, Catherine L; Sullivan, Shawn T; Liachko, Ivan; White, Michael A
2017-09-01
Scaffolding genomes into complete chromosome assemblies remains challenging even with the rapidly increasing sequence coverage generated by current next-generation sequence technologies. Even with scaffolding information, many genome assemblies remain incomplete. The genome of the threespine stickleback (Gasterosteus aculeatus), a fish model system in evolutionary genetics and genomics, is not completely assembled despite scaffolding with high-density linkage maps. Here, we first test the ability of a Hi-C based proximity-guided assembly (PGA) to perform a de novo genome assembly from relatively short contigs. Using Hi-C based PGA, we generated complete chromosome assemblies from a distribution of short contigs (20-100 kb). We found that 96.40% of contigs were correctly assigned to linkage groups (LGs), with ordering nearly identical to the previous genome assembly. Using available bacterial artificial chromosome (BAC) end sequences, we provide evidence that some of the few discrepancies between the Hi-C assembly and the existing assembly are due to structural variation between the populations used for the 2 assemblies or errors in the existing assembly. This Hi-C assembly also allowed us to improve the existing assembly, assigning over 60% (13.35 Mb) of the previously unassigned (~21.7 Mb) contigs to LGs. Together, our results highlight the potential of the Hi-C based PGA method to be used in combination with short read data to perform relatively inexpensive de novo genome assemblies. This approach will be particularly useful in organisms in which it is difficult to perform linkage mapping or to obtain high molecular weight DNA required for other scaffolding methods. © The American Genetic Association 2017. All rights reserved.
Redwan, R M; Saidin, A; Kumar, S V
2015-08-12
Pineapple (Ananas comosus var. comosus) is known as the king of fruits for its crown and is the third most important tropical fruit after banana and citrus. The plant, which is indigenous to South America, is the most important species in the Bromeliaceae family and is largely traded for fresh fruit consumption. Here, we report the complete chloroplast sequence of the MD-2 pineapple, which was sequenced using the PacBio sequencing technology. In this study, the high error rate of the PacBio long reads of A. comosus total genomic DNA was reduced by using highly accurate but short Illumina reads for error correction via the latest error-correction module from Novocraft. Error-corrected long PacBio reads were assembled using a single tool to produce a contig representing the pineapple chloroplast genome. The genome is 159,636 bp in length and features the conserved quadripartite chloroplast structure, comprising a large single copy region (LSC) of 87,482 bp, a small single copy region (SSC) of 18,622 bp and two inverted repeat regions (IRA and IRB) of 26,766 bp each. Overall, the genome contained 117 unique coding regions, and 30 were repeated in the IR region, with gene content, structure and arrangement similar to those of its sister taxon, Typha latifolia. A total of 35 repeat structures were detected in both coding and non-coding regions, the majority being tandem repeats. In addition, 205 SSRs were detected in the genome, with six protein-coding genes containing more than two SSRs. Comparison of chloroplast genomes from the subclass Commelinidae revealed a conserved protein-coding gene, albeit located in a highly divergent region. Analysis of selection pressure on protein-coding genes using the Ka/Ks ratio showed significant positive selection on the rps7 gene of the pineapple chloroplast (P < 0.05).
Phylogenetic analysis confirmed the recent taxonomic relationships among the members of the commelinids, supporting the monophyletic relationship between Arecales and Dasypogonaceae and between Zingiberales and Poales, the latter of which includes A. comosus. The complete sequence of the pineapple chloroplast provides insights into the divergence of genic chloroplast sequences among members of the subclass Commelinidae. The complete pineapple chloroplast genome will serve as a reference for in-depth taxonomic studies in the Bromeliaceae family when more species in the family are sequenced in the future. The genetic sequence information will also make other molecular applications of the pineapple chloroplast feasible for plant genetic improvement.
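The region sizes reported for the pineapple chloroplast genome can be checked against the total length, because a quadripartite plastome consists of one LSC, one SSC and two copies of the inverted repeat. The short Python sketch below performs that arithmetic; `plastome_length` is a hypothetical helper name, and the numbers are the ones quoted in the abstract.

```python
def plastome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as reported for the MD-2 pineapple chloroplast genome.
pineapple_total = plastome_length(lsc=87_482, ssc=18_622, ir=26_766)
```

The computed total agrees with the reported genome length of 159,636 bp, so the four region sizes are internally consistent.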
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution.
Yap, Jia-Yee S; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y H; Wilton, Alan; Wilkins, Marc R; Rossetto, Maurizio; Delaney, Sven K
2015-01-01
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.
Virus-like attachment sites as structural landmarks of plants retrotransposons.
Ochoa Cruz, Edgar Andres; Cruz, Guilherme Marcello Queiroga; Vieira, Andréia Prata; Van Sluys, Marie-Anne
2016-01-01
The genomic data available nowadays has enabled the study of repetitive sequences and their relationship to viruses. Among them, long terminal repeat retrotransposons (LTR-RTs) are the largest component of most plant genomes, the Gypsy and Copia superfamilies being the most common. Recently it has been found that Del lineage, an LTR-RT of Gypsy superfamily, has putative virus-like attachment (vl-att) sites. This signature, originally described for retroviruses, is recognized by retroviral integrase conferring specificity to the integration process. Here we retrieved 26,092 putative complete LTR-RTs from 10 lineages found in 10 fully sequenced angiosperm genomes and found putative vl-att sites that are a conserved structural landmark across these genomes. Furthermore, we reveal that each plant genome has a distinguishable LTR-RT lineage amplification pattern that could be related to the vl-att sites diversity. We used these patterns to generate a specific quick-response (QR) code for each genome that could be used as a barcode of identification of plants in the future. The universal distribution of vl-att sites represents a new structural feature common to plant LTR-RTs and retroviruses. This is an important finding that expands the information about the structural similarity between LTR-RT and retroviruses. We speculate that the sequence diversity of vl-att sites could be important for the life cycle of retrotransposons, as it was shown for retroviruses. All the structural vl-att site signatures are strong candidates for further functional studies. Moreover, this is the first identification of specific LTR-RT content and their amplification patterns in a large dataset of LTR-RT lineages and angiosperm genomes. These distribution patterns could be used in the future with biotechnological identification purposes.
Bejerman, Nicolás; Giolitti, Fabián; Trucco, Verónica; de Breuil, Soledad; Dietzgen, Ralf G; Lenardon, Sergio
2016-07-01
Alfalfa dwarf disease, probably caused by synergistic interactions of mixed virus infections, is a major and emergent disease that threatens alfalfa production in Argentina. Deep sequencing of diseased alfalfa plant samples from the central region of Argentina resulted in the identification of a new virus genome resembling enamoviruses in sequence and genome structure. Phylogenetic analysis suggests that it is a new member of the genus Enamovirus, family Luteoviridae. The virus is tentatively named "alfalfa enamovirus 1" (AEV-1). The availability of the AEV-1 genome sequence will make it possible to assess the genetic variability of this virus and to construct an infectious clone to investigate its role in alfalfa dwarfism disease.
Development and application of Human Genome Epidemiology
NASA Astrophysics Data System (ADS)
Xu, Jingwen
2017-12-01
Epidemiology is the science that studies the distribution of diseases and health in populations and the factors that influence them; it also studies strategies and measures to prevent and cure disease and to promote health. Epidemiology has developed rapidly in recent years and has intersected with various other disciplines to form a series of branch disciplines, such as genetic epidemiology, molecular epidemiology, drug epidemiology and tumor epidemiology. With the implementation and completion of the Human Genome Project (HGP), Human Genome Epidemiology (HuGE) has emerged. In this review, the development of Human Genome Epidemiology, its research content, the construction and structure of the relevant network, research standards, and the existing results and problems are briefly outlined.
Diekmann, Kerstin; Hodkinson, Trevor R; Wolfe, Kenneth H; van den Bekerom, Rob; Dix, Philip J; Barth, Susanne
2009-06-01
Lolium perenne L. (perennial ryegrass) is globally one of the most important forage and grassland crops. We sequenced the chloroplast (cp) genome of Lolium perenne cultivar Cashel. The L. perenne cp genome is 135 282 bp with a typical quadripartite structure. It contains genes for 76 unique proteins, 30 tRNAs and four rRNAs. As in other grasses, the genes accD, ycf1 and ycf2 are absent. The genome is of average size within its subfamily Pooideae and of medium size within the Poaceae. Genome size differences are mainly due to length variations in non-coding regions. However, considerable length differences of 1-27 codons in comparison of L. perenne to other Poaceae and 1-68 codons among all Poaceae were also detected. Within the cp genome of this outcrossing cultivar, 10 insertion/deletion polymorphisms and 40 single nucleotide polymorphisms were detected. Two of the polymorphisms involve tiny inversions within hairpin structures. By comparing the genome sequence with RT-PCR products of transcripts for 33 genes, 31 mRNA editing sites were identified, five of them unique to Lolium. The cp genome sequence of L. perenne is available under Accession number AM777385 at the European Molecular Biology Laboratory, National Center for Biotechnology Information and DNA DataBank of Japan.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.
Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P
2016-04-01
Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved. PMID:27020957
Szałaj, Przemysław; Tang, Zhonghui; Michalski, Paul; Pietal, Michal J; Luo, Oscar J; Sadowski, Michał; Li, Xingwang; Radew, Kamen; Ruan, Yijun; Plewczynski, Dariusz
2016-12-01
ChIA-PET is a high-throughput mapping technology that reveals long-range chromatin interactions and provides insights into the basic principles of spatial genome organization and gene regulation mediated by specific protein factors. Recently, we showed that a single ChIA-PET experiment provides information at all genomic scales of interest, from the high-resolution locations of binding sites and enriched chromatin interactions mediated by specific protein factors, to the low resolution of nonenriched interactions that reflect topological neighborhoods of higher-order chromosome folding. This multilevel nature of ChIA-PET data offers an opportunity to use multiscale 3D models to study structural-functional relationships at multiple length scales, but doing so requires a structural modeling platform. Here, we report the development of 3D-GNOME (3-Dimensional Genome Modeling Engine), a complete computational pipeline for 3D simulation using ChIA-PET data. 3D-GNOME consists of three integrated components: a graph-distance-based heat map normalization tool, a 3D modeling platform, and an interactive 3D visualization tool. Using ChIA-PET and Hi-C data derived from human B-lymphocytes, we demonstrate the effectiveness of 3D-GNOME in building 3D genome models at multiple levels, including the entire genome, individual chromosomes, and specific segments at megabase (Mb) and kilobase (kb) resolutions of single average and ensemble structures. Further incorporation of CTCF-motif orientation and high-resolution looping patterns in 3D simulation provided additional reliability of potential biologically plausible topological structures. © 2016 Szałaj et al.; Published by Cold Spring Harbor Laboratory Press.
T4-Like Genome Organization of the Escherichia coli O157:H7 Lytic Phage AR1
Liao, Wei-Chao; Ng, Wailap Victor; Lin, I-Hsuan; Syu, Wan-Jr; Liu, Tze-Tze; Chang, Chuan-Hsiung
2011-01-01
We report the genome organization and analysis of the first completely sequenced T4-like phage, AR1, of Escherichia coli O157:H7. Unlike most of the other sequenced phages of O157:H7, which belong to the temperate Podoviridae and Siphoviridae families, AR1 is a T4-like phage known to efficiently infect this pathogenic bacterial strain. The 167,435-bp AR1 genome is currently the largest among all the sequenced E. coli O157:H7 phages. It carries a total of 281 potential open reading frames (ORFs) and 10 putative tRNA genes. Of these, 126 predicted proteins could be classified into six viral orthologous group categories, with at least 18 proteins of the structural protein category having been detected by tandem mass spectrometry. Comparative genomic analysis of AR1 and four other completely sequenced T4-like genomes (RB32, RB69, T4, and JS98) indicated that they share a well-organized and highly conserved core genome, particularly in the regions encoding DNA replication and virion structural proteins. The major diverse features between these phages include the modules of distal tail fibers and the types and numbers of internal proteins, tRNA genes, and mobile elements. Codon usage analysis suggested that the presence of AR1-encoded tRNAs may be relevant to the codon usage of structural proteins. Furthermore, protein sequence analysis of AR1 gp37, a potential receptor binding protein, indicated that eight residues in the C terminus are unique to O157:H7 T4-like phages AR1 and PP01. These residues are known to be located in the T4 receptor recognition domain, and they may contribute to specificity for adsorption to the O157:H7 strain. PMID:21507986
Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang
2018-01-01
Qat (Catha edulis, Celastraceae) is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies regarding interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy areas. The sizes of the small single-copy and the large single-copy regions were 18,491 bp and 86,315 bp, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein coding genes. Of those genes, 112 are single copy genes and 17 genes are duplicated in two inverted regions with seven tRNAs, four rRNAs, and six protein coding genes. The phylogenetic relationships resolved from the cp genome of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates an intron loss took place among an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
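The region sizes quoted in the chloroplast genome abstracts of this listing can be cross-checked with simple arithmetic, since a quadripartite cp genome is one large single-copy (LSC) region, one small single-copy (SSC) region and two copies of the inverted repeat (IR). The following is a minimal sketch in Python, not part of any of the cited studies; the function name `quadripartite_length` and the `REPORTED` table are illustrative assumptions, and the numbers are copied from the abstracts.

```python
# Quadripartite chloroplast genome: LSC + SSC + two copies of the inverted repeat (IR).
# Hypothetical helper for illustration; not taken from any of the cited papers.
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Return the total length in bp of a genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) as stated in the abstracts of this listing.
REPORTED = {
    "Catha edulis": {"lsc": 86_315, "ssc": 18_491, "ir": 26_577, "total": 157_960},
    "Aconitum barbatum var. puberulum": {"lsc": 87_630, "ssc": 16_941, "ir": 26_089, "total": 156_749},
    "Megaleranthis saniculifolia": {"lsc": 88_326, "ssc": 18_382, "ir": 26_608, "total": 159_924},
}

for species, r in REPORTED.items():
    computed = quadripartite_length(r["lsc"], r["ssc"], r["ir"])
    # Compare the sum of the regions with the total length given in the abstract.
    print(f"{species}: computed {computed:,} bp, reported {r['total']:,} bp, "
          f"match={computed == r['total']}")
```

For C. edulis the check is 86,315 + 18,491 + 2 × 26,577 = 157,960 bp, which agrees with the reported genome length.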
Chen, Xiaochen; Li, Qiushi; Li, Ying; Qian, Jun; Han, Jianping
2015-01-01
The chloroplast genome (cp genome) of Aconitum barbatum var. puberulum was sequenced using the third-generation sequencing platform based on the single-molecule real-time (SMRT) sequencing approach. To our knowledge, this is the first reported complete cp genome of Aconitum, and we anticipate that it will have great value for phylogenetic studies of the Ranunculaceae family. In total, 23,498 CCS reads and 20,685,462 base pairs were generated, the mean read length was 880 bp, and the longest read was 2,261 bp. Genome coverage of 100% was achieved with a mean coverage of 132× and no gaps. The accuracy of the assembled genome is 99.973%; the assembly was validated using Sanger sequencing of six selected genes from the cp genome. The complete cp genome of A. barbatum var. puberulum is 156,749 bp in length, including a large single-copy region of 87,630 bp and a small single-copy region of 16,941 bp separated by two inverted repeats of 26,089 bp. The cp genome contains 130 genes, including 84 protein-coding genes, 34 tRNA genes and eight rRNA genes. Four forward, five inverted and eight tandem repeats were identified. According to the SSR analysis, the longest poly structure is a 20-T repeat. Our results presented in this paper will facilitate the phylogenetic studies and molecular authentication on Aconitum. PMID:25705213
Complete genome sequence of Paenibacillus sp. strain JDR-2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chow, Virginia; Nong, Guang; St. John, Franz J.
2012-01-01
Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by the identification of genes and characterization of encoded enzymes and has been further defined by the sequencing and annotation of the complete genome, which we describe. In addition to genes implicated in the utilization of β-1,4-xylan, genes have also been identified for the utilization of other hemicellulosic polysaccharides. The genome of Paenibacillus sp. JDR-2 contains 7,184,930 bp in a single replicon with 6,288 protein-coding and 122 RNA genes. Uniquely prominent are 874 genes encoding proteins involved in carbohydrate transport and metabolism. The prevalence and organization of these genes support a metabolic potential for bioprocessing of hemicellulose fractions derived from lignocellulosic resources.
Complete genome sequence of Conexibacter woesei type strain (ID131577T)
Pukall, Rüdiger; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ivanova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Meincke, Linda; Sims, David; Brettin, Thomas; Detter, John C.; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Kyrpides, Nikos C.; Klenk, Hans-Peter; Hugenholtz, Philip
2010-01-01
The genus Conexibacter (Monciardini et al. 2003) represents the type genus of the family Conexibacteraceae (Stackebrandt 2005, emend. Zhi et al. 2009) with Conexibacter woesei as the type species of the genus. C. woesei is a representative of a deep evolutionary line of descent within the class Actinobacteria. Strain ID131577T was originally isolated from temperate forest soil in Gerenzano (Italy). Cells are small, short rods that are motile by peritrichous flagella. They may form aggregates after a longer period of growth and, then as a typical characteristic, an undulate structure is formed by self-aggregation of flagella with entangled bacterial cells. Here we describe the features of the organism, together with the complete sequence and annotation. The 6,359,369 bp long genome of C. woesei contains 5,950 protein-coding and 48 RNA genes and is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304704
Dictionary-driven protein annotation
Rigoutsos, Isidore; Huynh, Tien; Floratos, Aris; Parida, Laxmi; Platt, Daniel
2002-01-01
Computational methods seeking to automatically determine the properties (functional, structural, physicochemical, etc.) of a protein directly from the sequence have long been the focus of numerous research groups. With the advent of advanced sequencing methods and systems, the number of amino acid sequences that are being deposited in the public databases has been increasing steadily. This has in turn generated a renewed demand for automated approaches that can annotate individual sequences and complete genomes quickly, exhaustively and objectively. In this paper, we present one such approach that is centered around and exploits the Bio-Dictionary, a collection of amino acid patterns that completely covers the natural sequence space and can capture functional and structural signals that have been reused during evolution, within and across protein families. Our annotation approach also makes use of a weighted, position-specific scoring scheme that is unaffected by the over-representation of well-conserved proteins and protein fragments in the databases used. For a given query sequence, the method permits one to determine, in a single pass, the following: local and global similarities between the query and any protein already present in a public database; the likeness of the query to all available archaeal/bacterial/eukaryotic/viral sequences in the database as a function of amino acid position within the query; the character of secondary structure of the query as a function of amino acid position within the query; the cytoplasmic, transmembrane or extracellular behavior of the query; the nature and position of binding domains, active sites, post-translationally modified sites, signal peptides, etc. In terms of performance, the proposed method is exhaustive, objective and allows for the rapid annotation of individual sequences and full genomes. 
Annotation examples are presented and discussed in Results, including individual queries and complete genomes that were released publicly after we built the Bio-Dictionary that is used in our experiments. Finally, we have computed the annotations of more than 70 complete genomes and made them available on the World Wide Web at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12202776
The complete sequence of the mitochondrial genome of Arctic fox (Alopex lagopus).
Yan, Shou-Qing; Guo, Peng-Cheng; Yue, Yuan; Li, Wan-Hong; Bai, Chun-Yan; Li, Yu-Mei; Sun, Jin-Hai; Zhao, Zhi-Hui
2016-11-01
In the present study, the complete mitochondrial genome sequence of Arctic fox (Alopex lagopus) was determined for the first time. It has a total length of 16,656 bp, and contains 13 protein-coding genes, 22 tRNA genes, 2 ribosomal RNA genes and 1 control region. The nucleotide composition is 31.3% for A, 26.2% for C, 14.8% for G and 27.7% for T. The D-loop region located between tRNA-Pro and tRNA-Phe contains an (ACACGTACACGCAT)18 tandem repeat array. The data will be useful for the investigation of the genetic structure and diversity in the natural and farmed populations of Arctic foxes.
The complete mitochondrial genome of Pholis nebulosus (Perciformes: Pholidae).
Wang, Zhongquan; Qin, Kaili; Liu, Jingxi; Song, Na; Han, Zhiqiang; Gao, Tianxiang
2016-11-01
In this study, the complete mitochondrial genome (mitogenome) sequence of Pholis nebulosus has been determined by long polymerase chain reaction and primer-walking methods. The mitogenome is a circular molecule of 16 524 bp in length, including the typical structure of 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 2 non-coding regions (L-strand replication origin and control region), the gene contents of which are identical to those observed in most bony fishes. Within the control region, we identified the termination-associated sequence domain (TAS), and the conserved sequence block domain (CSB-F, CSB-E, CSB-D, CSB-C, CSB-B, CSB-A, CSB-1, CSB-2, CSB-3).
Kim, Young-Kyu; Park, Chong-wook; Kim, Ki-Joong
2009-03-31
The chloroplast DNA sequences of Megaleranthis saniculifolia, an endemic and monotypic endangered plant species, were completed in this study (GenBank FJ597983). The genome is 159,924 bp in length. It harbors a pair of IR regions consisting of 26,608 bp each. The lengths of the LSC and SSC regions are 88,326 bp and 18,382 bp, respectively. The structural organizations, gene and intron contents, gene orders, AT contents, codon usages, and transcription units of the Megaleranthis chloroplast genome are similar to those of typical land plant cp DNAs. However, the detailed features of Megaleranthis chloroplast genomes are substantially different from that of Ranunculus, which belongs to the same family, the Ranunculaceae. First, the Megaleranthis cp DNA was 4,797 bp longer than that of Ranunculus due to an expanded IR region into the SSC region and duplicated sequence elements in several spacer regions of the Megaleranthis cp genome. Second, the chloroplast genomes of Megaleranthis and Ranunculus evidence 5.6% sequence divergence in the coding regions, 8.9% sequence divergence in the intron regions, and 18.7% sequence divergence in the intergenic spacer regions, respectively. In both the coding and noncoding regions, average nucleotide substitution rates differed markedly, depending on the genome position. Our data strongly implicate the positional effects of the evolutionary modes of chloroplast genes. The genes evidencing higher levels of base substitutions also have higher incidences of indel mutations and low Ka/Ks ratios. A total of 54 simple sequence repeat loci were identified from the Megaleranthis cp genome. The existence of rich cp SSR loci in the Megaleranthis cp genome provides a rare opportunity to study the population genetic structures of this endangered species. Our phylogenetic trees based on the two independent markers, the nuclear ITS and chloroplast matK sequences, strongly support the inclusion of the Megaleranthis to the Trollius. 
Therefore, our molecular trees support Ohwi's original treatment of Megaleranthis saniculifolia as Trollius chosenensis Ohwi.
Li, Ming-Wei; Lin, Rui-Qing; Song, Hui-Qun; Wu, Xiang-Yun; Zhu, Xing-Quan
2008-05-16
Studying mitochondrial (mt) genomics has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating population genetic structures, systematics and phylogenetics of organisms. Toxocara canis, Toxocara cati and Toxocara malaysiensis cause significant health problems in animals and humans. Although they are of importance in human and animal health, no information on the mt genomes for any of the Toxocara species is available. The sizes of the entire mt genome are 14,322 bp for T. canis, 14,029 bp for T. cati and 14,266 bp for T. malaysiensis. These circular genomes are amongst the largest reported to date for all secernentean nematodes. Their relatively large sizes relate mainly to an increased length in the AT-rich region. The mt genomes of the three Toxocara species all encode 12 proteins, two ribosomal RNAs and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with all other nematode species studied to date, with the exception of Trichinella spiralis. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The A+T contents of the complete genomes are 68.57% for T. canis, 69.95% for T. cati and 68.86% for T. malaysiensis, among which the A+T content for T. canis is the lowest among all nematodes studied to date. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. The mt genome structures for the three Toxocara species, including genes and non-coding regions, are in the same order as for Ascaris suum and Anisakis simplex, but differ from Ancylostoma duodenale, Necator americanus and Caenorhabditis elegans only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, Dirofilaria immitis and Strongyloides stercoralis. 
Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes revealed that the newly described species T. malaysiensis was more closely related to T. cati than to T. canis, consistent with results of a previous study using sequences of nuclear internal transcribed spacers as genetic markers. The present study determined the complete mt genome sequences for three roundworms of human and animal health significance, which provides mtDNA evidence for the validity of T. malaysiensis and also provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
Stability of local secondary structure determines selectivity of viral RNA chaperones.
Bravo, Jack P K; Borodavka, Alexander; Barth, Anders; Calabrese, Antonio N; Mojzes, Peter; Cockburn, Joseph J B; Lamb, Don C; Tuma, Roman
2018-05-18
To maintain genome integrity, segmented double-stranded RNA viruses of the Reoviridae family must accurately select and package a complete set of up to a dozen distinct genomic RNAs. It is thought that the high fidelity segmented genome assembly involves multiple sequence-specific RNA-RNA interactions between single-stranded RNA segment precursors. These are mediated by virus-encoded non-structural proteins with RNA chaperone-like activities, such as rotavirus (RV) NSP2 and avian reovirus σNS. Here, we compared the abilities of NSP2 and σNS to mediate sequence-specific interactions between RV genomic segment precursors. Despite their similar activities, NSP2 successfully promotes inter-segment association, while σNS fails to do so. To understand the mechanisms underlying such selectivity in promoting inter-molecular duplex formation, we compared RNA-binding and helix-unwinding activities of both proteins. We demonstrate that octameric NSP2 binds structured RNAs with high affinity, resulting in efficient intramolecular RNA helix disruption. Hexameric σNS oligomerizes into an octamer that binds two RNAs, yet it exhibits only limited RNA-unwinding activity compared to NSP2. Thus, the formation of intersegment RNA-RNA interactions is governed by both helix-unwinding capacity of the chaperones and stability of RNA structure. We propose that this protein-mediated RNA selection mechanism may underpin the high fidelity assembly of multi-segmented RNA genomes in Reoviridae.
Non-B DB: a database of predicted non-B DNA-forming motifs in mammalian genomes.
Cer, Regina Z; Bruce, Kevin H; Mudunuri, Uma S; Yi, Ming; Volfovsky, Natalia; Luke, Brian T; Bacolla, Albino; Collins, Jack R; Stephens, Robert M
2011-01-01
Although the capability of DNA to form a variety of non-canonical (non-B) structures has long been recognized, the overall significance of these alternate conformations in biology has only recently become accepted en masse. In order to provide access to genome-wide locations of these classes of predicted structures, we have developed non-B DB, a database integrating annotations and analysis of non-B DNA-forming sequence motifs. The database provides the most complete list of alternative DNA structure predictions available, including Z-DNA motifs, quadruplex-forming motifs, inverted repeats, mirror repeats and direct repeats and their associated subsets of cruciforms, triplex and slipped structures, respectively. The database also contains motifs predicted to form static DNA bends, short tandem repeats and homo(purine•pyrimidine) tracts that have been associated with disease. The database has been built using the latest releases of the human, chimp, dog, macaque and mouse genomes, so that the results can be compared directly with other data sources. In order to make the data interpretable in a genomic context, features such as genes, single-nucleotide polymorphisms and repetitive elements (SINE, LINE, etc.) have also been incorporated. The database is accessed through query pages that produce results with links to the UCSC browser and a GBrowse-based genomic viewer. It is freely accessible at http://nonb.abcc.ncifcrf.gov.
Extensive sequencing of seven human genomes to characterize benchmark reference materials
Zook, Justin M.; Catoe, David; McDaniel, Jennifer; Vang, Lindsay; Spies, Noah; Sidow, Arend; Weng, Ziming; Liu, Yuling; Mason, Christopher E.; Alexander, Noah; Henaff, Elizabeth; McIntyre, Alexa B.R.; Chandramohan, Dhruva; Chen, Feng; Jaeger, Erich; Moshrefi, Ali; Pham, Khoa; Stedman, William; Liang, Tiffany; Saghbini, Michael; Dzakula, Zeljko; Hastie, Alex; Cao, Han; Deikus, Gintaras; Schadt, Eric; Sebra, Robert; Bashir, Ali; Truty, Rebecca M.; Chang, Christopher C.; Gulbahce, Natali; Zhao, Keyan; Ghosh, Srinka; Hyland, Fiona; Fu, Yutao; Chaisson, Mark; Xiao, Chunlin; Trow, Jonathan; Sherry, Stephen T.; Zaranek, Alexander W.; Ball, Madeleine; Bobe, Jason; Estep, Preston; Church, George M.; Marks, Patrick; Kyriazopoulou-Panagiotopoulou, Sofia; Zheng, Grace X.Y.; Schnall-Levin, Michael; Ordonez, Heather S.; Mudivarti, Patrice A.; Giorda, Kristina; Sheng, Ying; Rypdal, Karoline Bjarnesdatter; Salit, Marc
2016-01-01
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST) is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly. PMID:27271295
Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki
2010-01-01
A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057
Mitochondrial genome of the African lion Panthera leo leo.
Ma, Yue-ping; Wang, Shuo
2015-01-01
In this study, the complete mitochondrial genome sequence of the African lion P. leo leo was reported. The total length of the mitogenome was 17,054 bp. It contained the typical mitochondrial structure, including 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and 1 control region; 21 of the tRNA genes folded into typical cloverleaf secondary structure except for tRNASe. The overall composition of the mitogenome was A (32.0%), G (14.5%), C (26.5%) and T (27.0%). The new sequence will provide molecular genetic information for conservation genetics study of this important large carnivore.
Genome structure of Rosa multiflora, a wild ancestor of cultivated roses
Nakamura, Noriko; Hirakawa, Hideki; Sato, Shusei; Otagaki, Shungo; Matsumoto, Shogo; Tabata, Satoshi; Tanaka, Yoshikazu
2018-01-01
The draft genome sequence of a wild rose (Rosa multiflora Thunb.) was determined using Illumina MiSeq and HiSeq platforms. The total length of the scaffolds was 739,637,845 bp, consisting of 83,189 scaffolds, which was close to the 711 Mbp length estimated by k-mer analysis. N50 length of the scaffolds was 90,830 bp, and extent of the longest was 1,133,259 bp. The average GC content of the scaffolds was 38.9%. After gene prediction, 67,380 candidates exhibiting sequence homology to known genes and domains were extracted, which included complete and partial gene structures. This large number of genes for a diploid plant may reflect heterogeneity of the genome originating from self-incompatibility in R. multiflora. According to CEGMA analysis, 91.9% and 98.0% of the core eukaryotic genes were completely and partially conserved in the scaffolds, respectively. Genes presumably involved in flower color, scent and flowering are assigned. The results of this study will serve as a valuable resource for fundamental and applied research in the rose, including breeding and phylogenetic study of cultivated roses. PMID:29045613
First Complete Genome Sequence of Pepper vein yellows virus from Australia
Maina, Solomon; Edwards, Owain R.
2016-01-01
We present here the first complete genomic RNA sequence of the polerovirus Pepper vein yellows virus (PeVYV) obtained from a pepper plant in Australia. We compare it with complete PeVYV genomes from Japan and China. The Australian genome was more closely related to the Japanese than the Chinese genome. PMID:27231375
Sloan, Daniel B; Müller, Karel; McCauley, David E; Taylor, Douglas R; Storchová, Helena
2012-12-01
In angiosperms, mitochondrial-encoded genes can cause cytoplasmic male sterility (CMS), resulting in the coexistence of female and hermaphroditic individuals (gynodioecy). We compared four complete mitochondrial genomes from the gynodioecious species Silene vulgaris and found unprecedented amounts of intraspecific diversity for plant mitochondrial DNA (mtDNA). Remarkably, only about half of overall sequence content is shared between any pair of genomes. The four mtDNAs range in size from 361 to 429 kb and differ in gene complement, with rpl5 and rps13 being intact in some genomes but absent or pseudogenized in others. The genomes exhibit essentially no conservation of synteny and are highly repetitive, with evidence of reciprocal recombination occurring even across short repeats (< 250 bp). Some mitochondrial genes exhibit atypically high degrees of nucleotide polymorphism, while others are invariant. The genomes also contain a variable number of small autonomously mapping chromosomes, which have only recently been identified in angiosperm mtDNA. Southern blot analysis of one of these chromosomes indicated a complex in vivo structure consisting of both monomeric circles and multimeric forms. We conclude that S. vulgaris harbors an unusually large degree of variation in mtDNA sequence and structure and discuss the extent to which this variation might be related to CMS. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.
Chai, Huan-Na; Du, Yu-Zhou
2012-01-01
The complete 15,413-bp mitochondrial genome (mitogenome) of Sesamia inferens (Walker) (Lepidoptera: Noctuidae) was sequenced and compared with those of four other noctuid moths. All of the mitogenomes analyzed displayed similar characteristics with respect to gene content, genome organization, nucleotide comparison, and codon usages. Twelve protein-coding genes (PCGs) utilized the standard ATN, but the cox1 gene used CGA as the initiation codon; cox1, cox2, and nad4 genes had the truncated termination codon T in the S. inferens mitogenome. All of the tRNA genes had typical cloverleaf secondary structures except for trnS1(AGN), in which the dihydrouridine (DHU) arm did not form a stable stem-loop structure. Both the secondary structures of rrnL and rrnS genes inferred from the S. inferens mitogenome closely resembled those of other noctuid moths. In the A+T-rich region, the conserved motif “ATAGA” followed by a long T-stretch was observed in all noctuid moths, but other specific tandem-repeat elements were more variable. Additionally, the S. inferens mitogenome contained a potential stem-loop structure, a duplicated 17-bp repeat element, a decuplicated segment, and a microsatellite “(AT)7”, without a poly-A element upstream of the trnM in the A+T-rich region. Finally, the phylogenetic relationships were reconstructed based on amino acid sequences of the 13 mitochondrial PCGs, which support the traditional morphologically based view of relationships within the Noctuidae. PMID:22949858
Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya
2017-01-17
Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of a novel kobu-like virus in calf feces by metagenomic analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing of feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly(A) tail and its genome consists of a 5'UTR, an open reading frame encoding a single polyprotein, and a 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and a small stem loop structure, similar to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome regions.
We identified a novel BEV in Japan possessing highly divergent aa sequences in the VP1 coding region. According to the species definition, we propose naming this strain "Enterovirus K", a novel species within genus Enterovirus. Further genomic studies are needed to understand the pathogenicity of BEVs.
Complete mitochondrial genome sequence of the polychaete annelid Platynereis dumerilii
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boore, Jeffrey L.
2004-08-15
Complete mitochondrial genome sequences are now available for 126 metazoans (see Boore 1999; Mitochondrial Genomics link at http://www.jgi.doe.gov), but the taxonomic representation is highly biased. For example, 80 are from a single phylum, Chordata, and show little variation for many molecular features. Arthropoda is represented by 16 taxa, Mollusca by eight, and Echinodermata by five, with only 17 others from the remaining ~30 metazoan phyla. With few exceptions (see Wolstenholme 1992 and Boore 1999) these are circular DNA molecules, about 16 kb in size, and encode the same set of 37 genes. A variety of non-standard names are sometimes used for animal mitochondrial genes; see Boore (1999) for gene nomenclature and a table of synonyms. Mitochondrial genome comparisons serve as a model of genome evolution. In this system, much smaller and simpler than that of the nucleus, are all of the same factors of genome evolution, where one may find tractable the changes in tRNA structure, base composition, genetic code, gene arrangement, etc. Further, patterns of mitochondrial gene rearrangements are an exceptionally reliable indicator of phylogenetic relationships (Smith et al. 1993; Boore et al. 1995; Boore, Lavrov, and Brown 1998; Boore and Brown 1998, 2000; Dowton 1999; Stechmann and Schlegel 1999; Kurabayashi and Ueshima 2000). To these ends, we are sampling further the variation among major animal groups in features of their mitochondrial genomes.
Microbial genome analysis: the COG approach.
Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2017-09-14
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Jaag, Hannah Miriam; Kawchuk, Lawrence; Rohde, Wolfgang; Fischer, Rainer; Emans, Neil; Prüfer, Dirk
2003-01-01
Potato leafroll polerovirus (PLRV) genomic RNA acts as a polycistronic mRNA for the production of proteins P0, P1, and P2 translated from the 5′-proximal half of the genome. Within the P1 coding region we identified a 5-kDa replication-associated protein 1 (Rap1) essential for viral multiplication. An internal ribosome entry site (IRES) with unusual structure and location was identified that regulates Rap1 translation. Core structural elements for internal ribosome entry include a conserved AUG codon and a downstream GGAGAGAGAGG motif with inverted symmetry. Reporter gene expression in potato protoplasts confirmed the internal ribosome entry function. Unlike known IRES motifs, the PLRV IRES is located completely within the coding region of Rap1 at the center of the PLRV genome. PMID:12835413
A Hybrid Approach for the Automated Finishing of Bacterial Genomes
Robins, William P.; Chin, Chen-Shan; Webster, Dale; Paxinos, Ellen; Hsu, David; Ashby, Meredith; Wang, Susana; Peluso, Paul; Sebra, Robert; Sorenson, Jon; Bullard, James; Yen, Jackie; Valdovino, Marie; Mollova, Emilia; Luong, Khai; Lin, Steven; LaMay, Brianna; Joshi, Amruta; Rowe, Lori; Frace, Michael; Tarr, Cheryl L.; Turnsek, Maryann; Davis, Brigid M; Kasarskis, Andrew; Mekalanos, John J.; Waldor, Matthew K.; Schadt, Eric E.
2013-01-01
Dramatic improvements in DNA sequencing technology have revolutionized our ability to characterize most genomic diversity. However, accurate resolution of large structural events has remained challenging due to the comparatively shorter read lengths of second-generation technologies. Emerging third-generation sequencing technologies, which yield markedly increased read length on rapid time scales and for low cost, have the potential to address assembly limitations. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at > 99.9% accuracy. Complex regions with clinically significant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 reference we obtain 14 and 8 scaffolds greater than 1kb, respectively, correcting several errors in the underlying source data. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly. PMID:22750883
Comparative analysis of chloroplast genomes of the genus Citrus and its close relatives.
Liu, Xiaogang; Wu, Hongkun; Luo, Yan; Xi, Wanpeng; Zhou, Zhiqin
2017-01-01
The genus Citrus and its close relatives are economically and nutritionally important fruit trees. However, the huge controversy over the phylogeny of key wild species, as well as the genetic relationship between the cultivated species and their putative wild progenitors, remains unresolved. Comparative analyses of chloroplast (cp) genomes have been useful in resolving various phylogenetic issues. Thus far, the cp genomes of only two Citrus species have been sequenced. In this study, we sequenced six complete cp genomes, four belonging to the genus Citrus, and two belonging to the genera Fortunella and Poncirus, respectively. These newly sequenced genomes together with the two publicly available were used for comparative analyses of the genus Citrus and its close relatives. All eight cp genomes share similar basic structure, gene order and gene content. Phylogenetic analyses supported the monophyly of the three genera in the order Sapindales within the major clade Malvidae.
Vassy, Jason L; Christensen, Kurt D; Slashinski, Melody J; Lautenbach, Denise M; Raghavan, Sridharan; Robinson, Jill Oliver; Blumenthal-Barby, Jennifer; Feuerman, Lindsay Zausmer; Lehmann, Lisa Soleymani; Murray, Michael F; Green, Robert C; McGuire, Amy L
2015-01-01
Aim To describe practicing physicians’ perceived clinical utility of genome sequencing. Materials & methods We conducted a mixed-methods analysis of data from 18 primary care physicians and cardiologists in a study of the clinical integration of whole-genome sequencing. Physicians underwent brief genomics continuing medical education before completing surveys and semi-structured interviews. Results Physicians described sequencing as currently lacking clinical utility because of its uncertain interpretation and limited impact on clinical decision-making, but they expressed the idea that its clinical integration was inevitable. Potential clinical uses for sequencing included complementing other clinical information, risk stratification, motivating patient behavior change and pharmacogenetics. Conclusion Physicians given genomics continuing medical education use the language of both evidence-based and personalized medicine in describing the utility of genome-wide testing in patient care. PMID:25642274
Oh, Dong-Ha; Hong, Hyewon; Lee, Sang Yeol; Yun, Dae-Jin; Bohnert, Hans J.; Dassanayake, Maheshi
2014-01-01
Schrenkiella parvula (formerly Thellungiella parvula), a close relative of Arabidopsis (Arabidopsis thaliana) and Brassica crop species, thrives on the shores of Lake Tuz, Turkey, where soils accumulate high concentrations of multiple-ion salts. Despite the stark differences in adaptations to extreme salt stresses, the genomes of S. parvula and Arabidopsis show extensive synteny. S. parvula completes its life cycle in the presence of Na+, K+, Mg2+, Li+, and borate at soil concentrations lethal to Arabidopsis. Genome structural variations, including tandem duplications and translocations of genes, interrupt the colinearity observed throughout the S. parvula and Arabidopsis genomes. Structural variations distinguish homologous gene pairs characterized by divergent promoter sequences and basal-level expression strengths. Comparative RNA sequencing reveals the enrichment of ion-transport functions among genes with higher expression in S. parvula, while pathogen defense-related genes show higher expression in Arabidopsis. Key stress-related ion transporter genes in S. parvula showed increased copy number, higher transcript dosage, and evidence for subfunctionalization. This extremophyte offers a framework to identify the requisite adjustments of genomic architecture and expression control for a set of genes found in most plants in a way to support distinct niche adaptation and lifestyles. PMID:24563282
Structural analysis of the α subunit of Na(+)/K(+) ATPase genes in invertebrates.
Thabet, Rahma; Rouault, J-D; Ayadi, Habib; Leignel, Vincent
2016-01-01
The Na(+)/K(+) ATPase is a ubiquitous pump that coordinates the transport of Na(+) and K(+) across the cell membrane, and its role is fundamental to cellular functions. In eukaryotes it is a heteromer of two or three subunits (α, β and γ, the last being specific to vertebrates). The catalytic functions of the enzyme have been attributed to the α subunit. Several complete α protein sequences are available, but only a few gene structures have been characterized. We identified the genomic sequences coding for the α subunit of the Na(+)/K(+) ATPase from the whole-genome shotgun contigs (WGS), NCBI Genomes (chromosome), Genomic Survey Sequences (GSS) and High Throughput Genomic Sequences (HTGS) databases across distinct phyla. One copy of the α subunit gene was found in Annelida, Arthropoda, Cnidaria, Echinodermata, Hemichordata, Mollusca, Placozoa, Porifera, Platyhelminthes and Urochordata, but the nematodes seem to possess 2 to 4 copies. The number of introns varied from 0 (Platyhelminthes) to 26 (Porifera), and their localization and length are also highly variable. Molecular phylogenies (Maximum Likelihood and Maximum Parsimony methods) showed some clusters constituted by (Chordata/(Echinodermata/Hemichordata)) or (Platyhelminthes/(Annelida/Mollusca)) and a basal position for Porifera. These structural analyses increase our knowledge about the evolutionary events of the α subunit genes in the invertebrates. Copyright © 2016 Elsevier Inc. All rights reserved.
Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung
2017-08-08
We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes for the two subspecies of 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and a pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata ssp. petraea is A. arenicola.
Assembly: a resource for assembled genomes at NCBI
Kitts, Paul A.; Church, Deanna M.; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G.; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D.; Pruitt, Kim D.; Kimchi, Avi
2016-01-01
The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site. PMID:26578580
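The contig N50 mentioned among the contiguity metrics can be illustrated with a minimal sketch. It uses the common definition (the length of the contig at which the running total of lengths, sorted longest first, reaches half of the assembly size); the function name and the example lengths are hypothetical and are not taken from the Assembly database itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Common definition: sort lengths from longest to shortest and report
    the length at which the cumulative sum first reaches at least half
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in bp, for illustration only.
example_contigs = [100, 80, 60, 40, 20]
print(n50(example_contigs))  # 80: 100 + 80 = 180 reaches half of 300
```

A larger N50 indicates that more of the assembly sits in long contigs, which is why it is reported alongside total sequence length and total gap length.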
Year 2 Report: Protein Function Prediction Platform
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, C E
2012-04-27
Upon completion of our second year of development in a 3-year development cycle, we have completed a prototype protein structure-function annotation and function prediction system: Protein Function Prediction (PFP) platform (v.0.5). We have met our milestones for Years 1 and 2 and are positioned to continue development in completion of our original statement of work, or a reasonable modification thereof, in service to DTRA Programs involved in diagnostics and medical countermeasures research and development. The PFP platform is a multi-scale computational modeling system for protein structure-function annotation and function prediction. As of this writing, PFP is the only existing fully automated, high-throughput, multi-scale modeling, whole-proteome annotation platform, and represents a significant advance in the field of genome annotation (Fig. 1). PFP modules perform protein functional annotations at the sequence, systems biology, protein structure, and atomistic levels of biological complexity (Fig. 2). Because these approaches provide orthogonal means of characterizing proteins and suggesting protein function, PFP processing maximizes the protein functional information that can currently be gained by computational means. Comprehensive annotation of pathogen genomes is essential for bio-defense applications in pathogen characterization, threat assessment, and medical countermeasure design and development in that it can short-cut the time and effort required to select and characterize protein biomarkers.
2016-10-27
Institute of Infectious Diseases, Fort Detrick, Frederick, Maryland, USA. Running head: Complete Genome Sequence of Y. pestis strain Cadman... Complete Genome Sequence of Pigmentation Negative Yersinia pestis strain Cadman. Sean Lovett, Kitty Chase, Galina Koroleva, Gustavo... we report the genome sequence of Yersinia pestis strain Cadman, an attenuated strain lacking the pgm locus. Y. pestis is the causative agent of
Laghari, Muhammad Younis; Lashari, Punhal; Xu, Peng; Zhao, Zixia; Jiang, Li; Narejo, Naeem Tariq; Xin, Baoping; Sun, Xiaowen; Zhang, Yan
2016-01-01
The complete mitochondrial genome of the freshwater giant catfish, Wallago attu, was amplified by LA PCR (TaKaRa LA Taq, Dalian, China) and sequenced by the Sanger method. The complete mitogenome was 15,639 bp in length and contains 13 typical vertebrate protein-coding genes, 2 rRNA genes and 22 tRNA genes. The whole genome base composition was estimated to be 31.17% A, 28.15% C, 15.55% G and 25.12% T. The complete mitochondrial genome of the catfish W. attu provides a fundamental tool for genetic breeding.
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution
Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.
2015-01-01
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691
The complete mitochondrial genome of the Korean skate: Hongeo koreana (Rajiformes, Rajidae).
Jeong, Dageum; Kim, Sung; Kim, Choong-Gon; Lee, Youn-Ho
2014-12-01
The complete mitochondrial genome of the Korean skate, Hongeo koreana, the sole member of its genus, is investigated for the first time. The genome is 16,906 bp in length and includes 2 rRNA, 22 tRNA and 13 protein-coding genes, with the same gene order and genome structure as those of other Rajidae species. The overall nucleotide composition of the L-strand is A = 29.8%, C = 27.9%, T = 27.9% and G = 14.3%, showing a high A + T bias. The anti-G bias (6.0%) is more significant in the third codon position. Twelve of the 13 protein-coding genes use ATG as their start codon while the COX1 gene starts with GTG. The ND3 and ND4 genes have the incomplete stop codon T. The mitogenome sequence of H. koreana will provide important information on the evolution and the phylogenetic relation of the genus Hongeo in relation to the other genera of the family Rajidae.
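As a small worked check of the A + T bias stated above, the following sketch sums the reported L-strand percentages. The function and variable names are illustrative only; the four percentages are the values given in the abstract, and they total 99.9% rather than 100%, presumably because of rounding.

```python
def at_content(composition):
    """Return the combined A + T percentage, rounded to one decimal place."""
    return round(composition["A"] + composition["T"], 1)


# L-strand base composition (percent) as reported for H. koreana.
h_koreana = {"A": 29.8, "C": 27.9, "T": 27.9, "G": 14.3}

at = at_content(h_koreana)
gc = round(h_koreana["G"] + h_koreana["C"], 1)
print(f"A+T = {at}%, G+C = {gc}%")
```

With these figures A + T comes to 57.7% against 42.2% for G + C, consistent with the bias described in the abstract.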
A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.
Moraes, Fernanda; Góes, Andréa
2016-05-06
The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development of the HGP and the 10 years following its completion. One of the most striking HGP results is that only 20,000 genes are protein- and RNA-coding. A new concept of the organization of the genome arose. The ENCODE project was initiated in 2003 and aimed to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. In addition, the gene-centric conception of the organism has to be revisited. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, the HGP and ENCODE projects produced a great map of the human genome, but the data generated still require further in-depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016.
Gao, Lei; Yi, Xuan; Yang, Yong-Xia; Su, Ying-Juan; Wang, Ting
2009-06-11
Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae). The Alsophila cp genome is 156,661 base pairs (bp) in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp-region that is absent from the Adiantum cp genome. An additional minor inversion of the trnD-GUC, which is possibly shared by all ferns, was identified by comparison between the fern and other land plant cp genomes. By comparing four fern cp genome sequences it was confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages. 
The Alsophila cp genome is very similar to that of the polypod fern Adiantum in terms of gene content, gene order and GC content. However, there exist some striking differences between them: the trnR-UCG gene represents a putative molecular apomorphy of tree ferns; and the repeats observed at one inversion endpoint may be a vestige of some unknown rearrangement(s). This work provided fresh insights into the fern cp genome evolution as well as useful data for future phylogenetic studies.
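The quadripartite structure reported for the Alsophila cp genome can be checked arithmetically: the total length should equal the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The sketch below uses only the figures given in the abstract; the function name is illustrative.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a quadripartite cp genome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region sizes (bp) reported for Alsophila spinulosa.
total = quadripartite_length(lsc=86_308, ssc=21_623, ir=24_365)
print(total)  # 156661, matching the reported genome size of 156,661 bp

# Gene tally reported in the abstract: proteins + rRNAs + tRNAs.
gene_count = 85 + 4 + 28
print(gene_count)  # 117 different genes
```

Both totals agree with the values stated in the abstract, so the reported region sizes and gene counts are internally consistent.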
Geographic Population Structure in Epstein-Barr Virus Revealed by Comparative Genomics
Chiara, Matteo; Manzari, Caterina; Lionetti, Claudia; Mechelli, Rosella; Anastasiadou, Eleni; Chiara Buscarinu, Maria; Ristori, Giovanni; Salvetti, Marco; Picardi, Ernesto; D’Erchia, Anna Maria; Pesole, Graziano; Horner, David S.
2016-01-01
Epstein-Barr virus (EBV) latently infects the majority of the human population and is implicated as a causal or contributory factor in numerous diseases. We sequenced 27 complete EBV genomes from a cohort of Multiple Sclerosis (MS) patients and healthy controls from Italy, although no variants showed a statistically significant association with MS. Taking advantage of the availability of ∼130 EBV genomes with known geographical origins, we reveal a striking geographic distribution of EBV sub-populations with distinct allele frequency distributions. We discuss mechanisms that potentially explain these observations, and their implications for understanding the association of EBV with human disease. PMID:27635051
Beyond The Human Genome: What's Next? (LBNL Summer Lecture Series)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rokhsar, Daniel
2003-06-18
UC Berkeley's Daniel Rokhsar and his colleagues were instrumental in contributing the sequences for three of the human body's chromosomes in the effort to decipher the blueprint of life: the completion of the DNA sequencing of the human genome. Now he is turning to the structure and function of genes in other organisms, some of them no less important to the planet's future than the human map. Hear the latest in this lecture from Lawrence Berkeley National Laboratory.
Beyond The Human Genome: What's Next? (LBNL Summer Lecture Series)
Rokhsar, Daniel
2018-04-27
UC Berkeley's Daniel Rokhsar and his colleagues were instrumental in contributing the sequences for three of the human body's chromosomes in the effort to decipher the blueprint of life: the completion of the DNA sequencing of the human genome. Now he is turning to the structure and function of genes in other organisms, some of them no less important to the planet's future than the human map. Hear the latest in this lecture from Lawrence Berkeley National Laboratory.
Complete genome sequence of Paenibacillus sp. strain JDR-2
Virginia Chow; Guang Nong; Franz J. St. John; John D. Rice; Ellen Dickstein; Olga Chertkov; David Bruce; Chris Detter; Thomas Brettin; James Han; Tanja Woyke; Sam Pitluck; Matt Nolan; Amrita Pati; Joel Martin; Alex Copeland; Miriam L. Land; Lynne Goodwin; Jeffrey B. Jones; Lonnie O. Ingram; Keelnathan T. Shanmugam; James F. Preston
2012-01-01
Paenibacillus sp. strain JDR-2, an aggressively xylanolytic bacterium isolated from sweetgum (Liquidambar styraciflua) wood, is able to efficiently depolymerize, assimilate and metabolize 4-O-methylglucuronoxylan, the predominant structural component of hardwood hemicelluloses. A basis for this capability was first supported by...
Analysis of genotype diversity and evolution of Dengue virus serotype 2 using complete genomes
Waman, Vaishali P.; Kolekar, Pandurang; Ramtirthkar, Mukund R.; Kale, Mohan M.
2016-01-01
Background Dengue is one of the most common arboviral diseases prevalent worldwide and is caused by Dengue viruses (genus Flavivirus, family Flaviviridae). There are four serotypes of Dengue Virus (DENV-1 to DENV-4), each of which is further subdivided into distinct genotypes. DENV-2 is frequently associated with severe dengue infections and epidemics. DENV-2 consists of six genotypes: Asian/American, Asian I, Asian II, Cosmopolitan, American and sylvatic. A comparative genomic study was carried out to infer the population structure of DENV-2 and to analyze the role of evolutionary and spatiotemporal factors in the emergence of diversifying lineages. Methods Complete genome sequences of 990 strains of DENV-2 were analyzed using Bayesian-based population genetics and phylogenetic approaches to infer genetically distinct lineages. The role of spatiotemporal factors, genetic recombination and selection pressure in the evolution of DENV-2 was examined using sequence-based bioinformatics approaches. Results The DENV-2 genetic structure is complex and consists of fifteen subpopulations/lineages. The Asian/American genotype is observed to be diversified into seven lineages. The Asian I, Cosmopolitan and sylvatic genotypes were each found to be subdivided into two lineages. The populations of the American and Asian II genotypes were observed to be homogeneous. Significant evidence of episodic positive selection was observed in all the genes, except NS4A. Positive selection operating on a few codons in the envelope gene confers antigenic and lineage diversity in the American strains of the Asian/American genotype. Selection on codons of non-structural genes was observed to impact diversification of lineages in the Asian I, Cosmopolitan and sylvatic genotypes. Evidence of intra/inter-genotype recombination was obtained and the uncertainty in classification of recombinant strains was resolved using the population genetics approach.
Discussion Complete genome-based analysis revealed that the worldwide population of DENV-2 strains is subdivided into fifteen lineages. The population structure of DENV-2 is spatiotemporal and is shaped by episodic positive selection and recombination. Intra-genotype diversity was observed in four genotypes (Asian/American, Asian I, cosmopolitan and sylvatic). Episodic positive selection on envelope and non-structural genes translates into antigenic diversity and appears to be responsible for emergence of strains/lineages in DENV-2 genotypes. Understanding of the genotype diversity and emerging lineages will be useful to design strategies for epidemiological surveillance and vaccine design. PMID:27635316
USDA-ARS's Scientific Manuscript database
We report the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1 isolated in Minnesota, USA. The R1-1 genome, generated by de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies....
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera
Kluge, M.; Franco, A. C.; Giongo, A.; Valdez, F. P.; Saddi, T. M.; Brito, W. M. E. D.; Roehe, P. M.
2016-01-01
A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. PMID:26823583
Yiheng Hu; Xi Chen; Xiaojia Feng; Keith E. Woeste; Peng Zhao
2016-01-01
Carya sinensis (Chinese Hickory, beaked walnut, or beaked hickory) is an endangered species that needs urgent conservation action. Here, we reported the complete chloroplast (cp) genome sequence and the genomic features of the C. sinensis cp, which is the first complete cp genome of any member of Carya. The...
Complete genome of the cotton bacterial blight pathogen Xanthomonas citri pv. malvacearum strain MSCT
USDA-ARS's Scientific Manuscript database
Xanthomonas citri pv. malvacearum (Xcm) is a major pathogen of Gossypium hirsutum. In this study we report the complete genome of the Xcm strain MSCT assembled from long read DNA sequencing technology. The MSCT genome is the first Xcm genome that has complete coding regions for Xcm transcriptional a...
Complete mitochondrial genome of the Freshwater Catfish Rita rita (Siluriformes, Bagridae).
Lashari, Punhal; Laghari, Muhammad Younis; Xu, Peng; Zhao, Zixia; Jiang, Li; Narejo, Naeem Tariq; Deng, Yulin; Sun, Xiaowen; Zhang, Yan
2015-01-01
The complete mitochondrial genome of the catfish Rita rita, a species listed as Critically Endangered and Red Listed, was amplified by LA PCR (TaKaRa LA Taq, Dalian, China) and sequenced by Sanger's method. The complete mitogenome was 16,449 bp in length and contains 13 typical vertebrate protein-coding genes, 2 rRNA and 22 tRNA genes. The whole genome base composition was estimated to be 33.40% A, 27.43% C, 14.26% G and 24.89% T. The complete mitochondrial genome of the catfish Rita rita provides a basis for genetic breeding and conservation studies.
The complete chloroplast genome of salt cress (Eutrema salsugineum).
Guo, Xinyi; Hao, Guoqian; Ma, Tao
2016-07-01
The complete chloroplast (cp) sequence of the salt cress (Eutrema salsugineum), a plant well-adapted to salt stress, was presented in this study. The circular molecule is 153,407 bp in length and exhibits a typical quadripartite structure containing an 83,894 bp large single copy (LSC) region, a 17,607 bp small single copy (SSC) region, and two 25,953 bp inverted repeats (IRs). The salt cress cp genome contains 135 known genes, including 87 protein-coding genes, 8 ribosomal RNA genes, and 40 tRNA genes; 21 of these are located in the inverted repeat region. As expected, phylogenetic analysis supports the idea that E. salsugineum is sister to Brassiceae species within the Brassicaceae family.
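The quadripartite layout described above implies a simple consistency check: one LSC, one SSC and two IR copies should add up to the reported genome length, and the gene categories should add up to the reported gene total. A minimal sketch of that arithmetic, using only the figures quoted in the abstract (the function and variable names are illustrative, not from the paper):

```python
# Region lengths (bp) and gene counts as quoted in the abstract.
LSC = 83_894          # large single copy region
SSC = 17_607          # small single copy region
IR = 25_953           # one inverted repeat copy (the genome carries two)
REPORTED_TOTAL = 153_407


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular cp genome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
gene_total = 87 + 8 + 40  # protein-coding + rRNA + tRNA genes

print(total, total == REPORTED_TOTAL)  # 153407 True
print(gene_total)                      # 135
```

The same check is a cheap way to catch transcription errors when region lengths are copied from an annotation into a manuscript.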
Hu, Min; Chilton, Neil B; Gasser, Robin B
2002-02-01
The complete mitochondrial genome sequences were determined for two species of human hookworms, Ancylostoma duodenale (13,721 bp) and Necator americanus (13,604 bp). The circular hookworm genomes are amongst the smallest reported to date for any metazoan organism. Their relatively small size relates mainly to a reduced length in the AT-rich region. Both hookworm genomes encode 12 protein, two ribosomal RNA and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with three other species of Secernentea studied to date. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. For both hookworm species, genes were arranged in the same order as for Caenorhabditis elegans, except for the presence of a non-coding region between genes nad3 and nad5. In A. duodenale, this non-coding region is predicted to form a stem-and-loop structure which is not present in N. americanus. The mitochondrial genome structure for both hookworms differs from Ascaris suum only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, including four gene or gene-block translocations and the positions of some transfer RNA genes and the AT-rich region. Based on genome organisation and amino acid sequence identity, A. duodenale and N. americanus were more closely related to C. elegans than to A. suum or O. volvulus (all secernentean nematodes), consistent with a previous phylogenetic study using ribosomal DNA sequence data. Determination of the complete mitochondrial genome sequences for two human hookworms (the first members of the order Strongylida ever sequenced) provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance.
Li, Jian-Long; Liu, Min; Hu, Xue-Yi
2016-01-01
The complete mitochondrial (mt) genome of the saddleback clownfish Amphiprion polymnus was obtained in this study. The circular mtDNA molecule was 16,804 bp in size and the overall nucleotide composition of the H-strand was 29.59% A, 25.93% T, 15.44% G and 29.04% C, with an A + T bias. The complete mitogenome encoded 13 protein-coding genes, 2 rRNAs, 22 tRNAs and 1 control region (D-loop), with the gene arrangement and translation direction basically identical to other typical vertebrate mitogenomes. We found that A. polymnus (KJ101554) and A. bicinctus (JQ030887) had the same length of the protein-coding gene ND5, at 1869 bp, while ND5 in A. ocellaris (AP006017) was 3 bp shorter than that of A. polymnus and A. bicinctus. Both forms of ND5, however, could be translated into amino acid sequences successfully.
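The "A + T bias" stated above follows directly from the reported base composition. A small sketch of the calculation, assuming only the four percentages quoted in the abstract (the helper name `at_content` is hypothetical):

```python
def at_content(a: float, t: float, g: float, c: float) -> float:
    """Return A+T content in percent, normalised by the sum of all four bases.

    Normalising guards against reported percentages that do not add up
    to exactly 100 because of rounding.
    """
    total = a + t + g + c
    return 100.0 * (a + t) / total


# H-strand composition quoted for Amphiprion polymnus.
at = at_content(a=29.59, t=25.93, g=15.44, c=29.04)
gc = 100.0 - at
print(round(at, 2), round(gc, 2))  # 55.52 44.48
```

An A+T value above 50% is what the abstract refers to as the A + T bias.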
Abayli, Hasan; Tonbak, Sukru; Azkur, Ahmet Kursat; Bulut, Hakan
2017-10-01
Relatively high prevalence and mortality rates of bovine ephemeral fever (BEF) have been reported in recent epidemics in some countries, including Turkey, when compared with previous outbreaks. A limited number of complete genome sequences of BEF virus (BEFV) are available in the GenBank Database. In this study, the complete genome of highly pathogenic BEFV isolated during an outbreak in Turkey in 2012 was analyzed for genetic characterization. The complete genome of the Turkish BEFV isolate was amplified by reverse transcription-polymerase chain reaction (RT-PCR) and sequenced. It was found that the complete genome of the Turkish BEFV isolate was 14,901 nt in length. The complete genome sequence obtained from the study showed 91-92% identity at nucleotide level to Australian (BB7721) and Chinese (Bovine/China/Henan1/2012) BEFV isolates. Phylogenetic analysis of the glycoprotein gene of the Turkish BEFV isolate also showed that Turkish isolates were closely related to Israeli isolates. Because of the limited number of complete BEFV genome sequences, the results from this study will be useful for understanding the global molecular epidemiology and geodynamics of BEF.
Sun, Chia-Tsen; Chiang, Austin W T; Hwang, Ming-Jing
2017-10-27
Proteome-scale bioinformatics research is increasingly conducted as the number of completely sequenced genomes increases, but analysis of protein domains (PDs) usually relies on similarity in their amino acid sequences and/or three-dimensional structures. Here, we present results from a bi-clustering analysis on presence/absence data for 6,580 unique PDs in 2,134 species with a sequenced genome, thus covering a complete set of proteins, for the three superkingdoms of life, Bacteria, Archaea, and Eukarya. Our analysis revealed eight distinctive PD clusters, which, following an analysis of enrichment of Gene Ontology functions and CATH classification of protein structures, were shown to exhibit structural and functional properties that are taxa-characteristic. For example, the largest cluster is ubiquitous in all three superkingdoms, constituting a set of 1,472 persistent domains created early in evolution and retained in living organisms and characterized by basic cellular functions and ancient structural architectures, while an Archaea and Eukarya bi-superkingdom cluster suggests its PDs may have existed in the ancestor of the two superkingdoms, and others are single superkingdom- or taxa (e.g. Fungi)-specific. These results increase our appreciation of PD diversity and our knowledge of how PDs are used in species, with implications for species evolution.
Villela, Luciana Cristine Vasques; Alves, Anderson Luis; Varela, Eduardo Sousa; Yamagishi, Michel Eduardo Beleza; Giachetto, Poliana Fernanda; da Silva, Naiara Milagres Augusto; Ponzetto, Josi Margarete; Paiva, Samuel Rezende; Caetano, Alexandre Rodrigues
2017-02-01
The cachara (Pseudoplatystoma reticulatum) is a Neotropical freshwater catfish from family Pimelodidae (Siluriformes) native to Brazil. The species is of relative economic importance for local aquaculture production and basic biological information is under development to help boost efforts to domesticate and raise the species in commercial systems. The complete cachara mitochondrial genome was obtained by assembling Illumina RNA-seq data from pooled samples. The full mitogenome was found to be 16,576 bp in length, showing the same basic structure, order, and genetic organization observed in other Pimelodidae, with 13 protein-coding genes, 2 rRNA genes, 22 tRNAs, and a control region. Observed base composition was 24.63% T, 28.47% C, 31.45% A, and 15.44% G. With the exception of NAD6 and eight tRNAs, all of the observed mitochondrial genes were found to be coded on the H strand. A total of 107 SNPs were identified in P. reticulatum mtDNA, 67 of which were located in coding regions. Of these SNPs, 10 result in amino acid changes. Analysis of the obtained sequence with 94 publicly available full Siluriformes mitogenomes resulted in a phylogenetic tree that generally agreed with available phylogenetic proposals for the order. The first report of the complete Pseudoplatystoma reticulatum mitochondrial genome sequence revealed general gene organization, structure, content, and order similar to most vertebrates. Specific sequence and content features were observed and may have functional attributes which are now available for further investigation.
Production of pseudoinfectious yellow fever virus with a two-component genome.
Shustov, Alexandr V; Mason, Peter W; Frolov, Ilya
2007-11-01
Application of genetically modified, deficient-in-replication flaviviruses that are incapable of developing productive, spreading infection is a promising means of designing safe and effective vaccines. Here we describe a two-component genome yellow fever virus (YFV) replication system in which each of the genomes encodes complete sets of nonstructural proteins that form the replication complex but expresses either only capsid or prM/E instead of the entire structural polyprotein. Upon delivery to the same cell, these genomes produce together all of the viral structural proteins, and cells release a combination of virions with both types of genomes packaged into separate particles. In tissue culture, this modified YFV can be further passaged at an escalating scale by using a high multiplicity of infection (MOI). However, at a low MOI, only one of the genomes is delivered into the cells, and infection cannot spread. The replicating prM/E-encoding genome produces extracellular E protein in the form of secreted subviral particles that are known to be an effective immunogen. The presented strategy of developing viruses defective in replication might be applied to other flaviviruses, and these two-component genome viruses can be useful for diagnostic or vaccine applications, including the delivery and expression of heterologous genes. In addition, the achieved separation of the capsid-coding sequence and the cyclization signal in the YFV genome provides a new means for studying the mechanism of the flavivirus packaging process.
Gubala, Aneta; Davis, Steven; Weir, Richard; Melville, Lorna; Cowled, Chris; Boyle, David
2011-09-01
Tibrogargan virus (TIBV) and Coastal Plains virus (CPV) were isolated from cattle in Australia and TIBV has also been isolated from the biting midge Culicoides brevitarsis. Complete genomic sequencing revealed that the viruses share a novel genome structure within the family Rhabdoviridae, each virus containing two additional putative genes between the matrix protein (M) and glycoprotein (G) genes and one between the G and viral RNA polymerase (L) genes. The predicted novel protein products are highly diverged at the sequence level but demonstrate clear conservation of secondary structure elements, suggesting conservation of biological functions. Phylogenetic analyses showed that TIBV and CPV form an independent group within the 'dimarhabdovirus supergroup'. Although no disease has been observed in association with these viruses, antibodies were detected at high prevalence in cattle and buffalo in northern Australia, indicating the need for disease monitoring and further study of this distinctive group of viruses.
Yatawara, Lalani; Wickramasinghe, Susiji; Rajapakse, R P V J; Agatsuma, Takeshi
2010-09-01
In the present study, we determined the complete mitochondrial (mt) genome sequence (13,839 bp) of the parasitic nematode Setaria digitata and compared its structure and organization with those of Onchocerca volvulus, Dirofilaria immitis and Brugia malayi. The mt genome of S. digitata is slightly larger than the mt genomes of other filarial nematodes. The S. digitata mt genome contains 36 genes (12 protein-coding genes, 22 transfer RNAs and 2 ribosomal RNAs) that are typically found in metazoans. This genome has a high A+T content (75.1%) and a low G+C content (24.9%). The mt gene order for S. digitata is the same as those for O. volvulus, D. immitis and B. malayi but it is distinctly different from the other nematodes compared. The start codons inferred in the mt genome of S. digitata are TTT, ATT, TTG, ATG, GTT and ATA. Interestingly, the initiation codon TTT is unique to the S. digitata mt genome, and four protein-coding genes use this codon as a translation initiation codon. Five protein-coding genes use TAG as a stop codon whereas three genes use TAA and four genes use T as a termination codon. Out of 64 possible codons, only 57 are used for mitochondrial protein-coding genes of S. digitata. T-rich codons such as TTT (18.9%), GTT (7.9%), TTG (7.8%), TAT (7%), ATT (5.7%), TCT (4.8%) and TTA (4.1%) are used more frequently. This pattern of codon usage reflects the strong bias for T in the mt genome of S. digitata. In conclusion, the present investigation provides new molecular data for future studies of the comparative mitochondrial genomics and systematics of parasitic nematodes of socio-economic importance. Copyright © 2010 Elsevier B.V. All rights reserved.
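Codon usage percentages like those quoted above are obtained by splitting each coding sequence into in-frame triplets and tallying them. The following is a minimal sketch of that tally, not the authors' pipeline; the input string is a made-up toy sequence rather than S. digitata data, and the function names are hypothetical:

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def codon_frequencies(cds: str) -> dict:
    """Return codon usage as percentages of all complete codons."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: 100.0 * n / total for codon, n in counts.items()}


# Toy sequence (hypothetical): TTT GTT TTT ATT TAG
toy = "TTTGTTTTTATTTAG"
print(codon_counts(toy).most_common(1))   # [('TTT', 2)]
print(codon_frequencies(toy)["TTT"])      # 40.0
print(len(codon_counts(toy)))             # 4 distinct codons used
```

Applied to the concatenated protein-coding genes of a real mitogenome, `len(codon_counts(...))` would give the number of distinct codons in use (57 of 64 in the abstract), and the frequency table would expose a T-rich bias of the kind reported.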
Genomic characterization of EmsB microsatellite loci in Echinococcus multilocularis.
Valot, Benoît; Knapp, Jenny; Umhang, Gérald; Grenouillet, Frédéric; Millon, Laurence
2015-06-01
EmsB is a molecular marker applied to Echinococcus multilocularis genotyping studies. This marker has largely been used to investigate the epidemiology of the parasite in different endemic foci. The present study has lifted the veil on the genetic structure of this microsatellite. By in silico analysis of the E. multilocularis genome, the microsatellite was found in about 40 copies on chromosome 5 of the parasite. A similar structure was found in the related parasite Echinococcus granulosus, in which the microsatellite was first described. The present study completes the first investigations made on the origins of the EmsB microsatellite and confirms the reliability of this highly discriminant molecular marker. Copyright © 2015 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
von Nickisch-Rosenegk, Markus; Brown, Wesley M.; Boore, Jeffrey L.
2001-01-01
Using "long-PCR" we have amplified in overlapping fragments the complete mitochondrial genome of the tapeworm Hymenolepis diminuta (Platyhelminthes: Cestoda) and determined its 13,900 nucleotide sequence. The gene content is the same as that typically found for animal mitochondrial DNA (mtDNA) except that atp8 appears to be lacking, a condition found previously for several other animals. Despite the small size of this mtDNA, there are two large non-coding regions, one of which contains 13 repeats of a 31 nucleotide sequence and a potential stem-loop structure of 25 base pairs with an 11-member loop. Large potential secondary structures are identified also for the non-coding regions of two other cestode mtDNAs. Comparison of the mitochondrial gene arrangement of H. diminuta with those previously published supports a phylogenetic position of flatworms as members of the Eutrochozoa, rather than being basal to either a clade of protostomes or a clade of coelomates.
USDA-ARS's Scientific Manuscript database
The complete genome of a sparfloxacin-resistant Streptococcus agalactiae vaccine strain 138spar is 1,838,126 bp in size. The genome has 1892 coding sequences and 82 RNAs. The annotation of the genome is added by the NCBI Prokaryotic Genome Annotation Pipeline. The publishing of this genome will allo...
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera.
Campos, F S; Kluge, M; Franco, A C; Giongo, A; Valdez, F P; Saddi, T M; Brito, W M E D; Roehe, P M
2016-01-28
A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. Copyright © 2016 Campos et al.
Lu, You; Samac, Deborah A.; Glazebrook, Jane
2015-01-01
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. PMID:25953184
Deep Sequencing Reveals the Complete Genome Sequence of Sweet potato virus G from East Timor
Maina, Solomon; Edwards, Owain R.; Barbetti, Martin J.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present the first complete Sweet potato virus G (SPVG) genome from sweet potato in East Timor and compare it with seven complete SPVG genomes from South Korea (three), Taiwan (two), Argentina (one), and the United States (one). It most resembles the genomes from the United States and South Korea. PMID:27609925
Mind the gap; seven reasons to close fragmented genome assemblies.
Thomma, Bart P H J; Seidl, Michael F; Shi-Kunne, Xiaoqian; Cook, David E; Bolton, Melvin D; van Kan, Jan A L; Faino, Luigi
2016-05-01
Like other domains of life, research into the biology of filamentous microbes has greatly benefited from the advent of whole-genome sequencing. Next-generation sequencing (NGS) technologies have revolutionized sequencing, making genomic sciences accessible to many academic laboratories including those that study non-model organisms. Thus, hundreds of fungal genomes have been sequenced and are publicly available today, although these initiatives have typically yielded considerably fragmented genome assemblies that often lack large contiguous genomic regions. Many important genomic features are contained in intergenic DNA that is often missing in current genome assemblies, and recent studies underscore the significance of non-coding regions and repetitive elements for the life style, adaptability and evolution of many organisms. The study of particular types of genetic elements, such as telomeres, centromeres, repetitive elements, effectors, and clusters of co-regulated genes, but also of phenomena such as structural rearrangements, genome compartmentalization and epigenetics, greatly benefits from having a contiguous and high-quality, preferably even complete and gapless, genome assembly. Here we discuss a number of important reasons to produce gapless, finished, genome assemblies to help answer important biological questions. Copyright © 2015 Elsevier Inc. All rights reserved.
Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.
Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo
2014-09-13
Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral element in the genome. Galileo shows a significant insertion preference for a 15-bp palindromic TSM.
Functional interactions of archaea, bacteria and viruses in a hypersaline endolithic community.
Crits-Christoph, Alexander; Gelsinger, Diego R; Ma, Bing; Wierzchos, Jacek; Ravel, Jacques; Davila, Alfonso; Casero, M Cristina; DiRuggiero, Jocelyne
2016-06-01
Halite endoliths in the Atacama Desert represent one of the most extreme ecosystems on Earth. Cultivation-independent methods were used to examine the functional adaptations of the microbial consortia inhabiting halite nodules. The community was dominated by haloarchaea and functional analysis attributed most of the autotrophic CO2 fixation to one unique cyanobacterium. The assembled 1.1 Mbp genome of a novel nanohaloarchaeon, Candidatus Nanopetramus SG9, revealed a photoheterotrophic life style and a low median isoelectric point (pI) for all predicted proteins, suggesting a 'salt-in' strategy for osmotic balance. Predicted proteins of the algae identified in the community also had pI distributions similar to 'salt-in' strategists. The Nanopetramus genome contained a unique CRISPR/Cas system with a spacer that matched a partial viral genome from the metagenome. A combination of reference-independent methods identified over 30 complete or near complete viral or proviral genomes with diverse genome structure, genome size, gene content and hosts. Putative hosts included Halobacteriaceae, Nanohaloarchaea and Cyanobacteria. Despite the dependence of the halite community on deliquescence for liquid water availability, this study exposed an ecosystem spanning three phylogenetic domains, containing a large diversity of viruses and predominance of a 'salt-in' strategy to balance the high osmotic pressure of the environment. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.
Implementing genomic medicine in pathology.
Williams, Eli S; Hegde, Madhuri
2013-07-01
The finished sequence of the Human Genome Project, published 50 years after Watson and Crick's seminal paper on the structure of DNA, pushed human genetics into the public eye and ushered in the genomic era. A significant, if overlooked, aspect of the race to complete the genome was the technology that propelled scientists to the finish line. DNA sequencing technologies have become more standardized, automated, and capable of higher throughput. This technology has continued to grow at an astounding rate in the decade since the Human Genome Project was completed. Today, massively parallel sequencing, or next-generation sequencing (NGS), allows the detection of genetic variants across the entire genome. This ability has led to the identification of new causes of disease and is changing the way we categorize, treat, and manage disease. NGS approaches such as whole-exome sequencing and whole-genome sequencing are rapidly becoming an affordable genetic testing strategy for the clinical laboratory. One test can now provide vast amounts of health information pertaining not only to the disease of interest, but information that may also predict adult-onset disease, reveal carrier status for a rare disease and predict drug responsiveness. The issue of what to do with these incidental findings, along with questions pertaining to NGS testing strategies, data interpretation and storage, and applying genetic testing results into patient care, remains without a clear answer. This review will explore these issues and others relevant to the implementation of NGS in the clinical laboratory.
Complete Genomic Structure of the Bloom-forming Toxic Cyanobacterium Microcystis aeruginosa NIES-843
Kaneko, Takakazu; Nakajima, Nobuyoshi; Okamoto, Shinobu; Suzuki, Iwane; Tanabe, Yuuhiko; Tamaoki, Masanori; Nakamura, Yasukazu; Kasai, Fumie; Watanabe, Akiko; Kawashima, Kumiko; Kishida, Yoshie; Ono, Akiko; Shimizu, Yoshimi; Takahashi, Chika; Minami, Chiharu; Fujishiro, Tsunakazu; Kohara, Mitsuyo; Katoh, Midori; Nakazaki, Naomi; Nakayama, Shinobu; Yamada, Manabu; Tabata, Satoshi; Watanabe, Makoto M.
2007-01-01
Abstract The nucleotide sequence of the complete genome of a cyanobacterium, Microcystis aeruginosa NIES-843, was determined. The genome of M. aeruginosa is a single, circular chromosome of 5 842 795 base pairs (bp) in length, with an average GC content of 42.3%. The chromosome comprises 6312 putative protein-encoding genes, two sets of rRNA genes, 42 tRNA genes representing 41 tRNA species, and genes for tmRNA, the B subunit of RNase P, SRP RNA, and 6Sa RNA. Forty-five percent of the putative protein-encoding sequences showed sequence similarity to genes of known function, 32% were similar to hypothetical genes, and the remaining 23% had no apparent similarity to reported genes. A total of 688 kb of the genome, equivalent to 11.8% of the entire genome, were composed of both insertion sequences and miniature inverted-repeat transposable elements. This is indicative of a plasticity of the M. aeruginosa genome, through a mechanism that involves homologous recombination mediated by repetitive DNA elements. In addition to known gene clusters related to the synthesis of microcystin and cyanopeptolin, novel gene clusters that may be involved in the synthesis and modification of toxic small polypeptides were identified. Compared with other cyanobacteria, a relatively small number of genes for two component systems and a large number of genes for restriction-modification systems were notable characteristics of the M. aeruginosa genome. PMID:18192279
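The repeat fraction and the annotation breakdown quoted above can be cross-checked with a few lines of arithmetic. This is a sketch using only the numbers given in the abstract; it assumes "688 kb" means 688,000 bp, and the variable names are illustrative:

```python
GENOME_BP = 5_842_795        # chromosome length quoted in the abstract
REPEAT_BP = 688_000          # "688 kb" of insertion sequences and MITEs (assumed 688,000 bp)

repeat_fraction = 100.0 * REPEAT_BP / GENOME_BP
print(round(repeat_fraction, 1))   # 11.8, matching the reported 11.8%

# Annotation classes of the putative protein-encoding sequences, in percent.
annotation = {"known function": 45, "hypothetical": 32, "no similarity": 23}
print(sum(annotation.values()))    # 100
```

Both figures agree with the abstract, which suggests the percentages were computed against the full chromosome length.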
An Exploration into Fern Genome Space.
Wolf, Paul G; Sessa, Emily B; Marchant, Daniel Blaine; Li, Fay-Wei; Rothfels, Carl J; Sigel, Erin M; Gitzendanner, Matthew A; Visger, Clayton J; Banks, Jo Ann; Soltis, Douglas E; Soltis, Pamela S; Pryer, Kathleen M; Der, Joshua P
2015-08-26
Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing. We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Identification and Classification of Conserved RNA Secondary Structures in the Human Genome
Pedersen, Jakob Skou; Bejerano, Gill; Siepel, Adam; Rosenbloom, Kate; Lindblad-Toh, Kerstin; Lander, Eric S; Kent, Jim; Miller, Webb; Haussler, David
2006-01-01
The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization. PMID:16628248
Li, Ming-Wei; Lin, Rui-Qing; Song, Hui-Qun; Wu, Xiang-Yun; Zhu, Xing-Quan
2008-01-01
Background Studying mitochondrial (mt) genomics has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating population genetic structures, systematics and phylogenetics of organisms. Toxocara canis, Toxocara cati and Toxocara malaysiensis cause significant health problems in animals and humans. Although they are of importance in human and animal health, no information on the mt genomes for any Toxocara species is available. Results The sizes of the entire mt genomes are 14,322 bp for T. canis, 14,029 bp for T. cati and 14,266 bp for T. malaysiensis. These circular genomes are amongst the largest reported to date for all secernentean nematodes. Their relatively large sizes relate mainly to an increased length in the AT-rich region. The mt genomes of the three Toxocara species all encode 12 proteins, two ribosomal RNAs and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with all other nematode species studied to date, with the exception of Trichinella spiralis. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The contents of A+T of the complete genomes are 68.57% for T. canis, 69.95% for T. cati and 68.86% for T. malaysiensis, among which the A+T for T. canis is the lowest among all nematodes studied to date. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. 
The mt genome structures for three Toxocara species, including genes and non-coding regions, are in the same order as for Ascaris suum and Anisakis simplex, but differ from Ancylostoma duodenale, Necator americanus and Caenorhabditis elegans only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus, Dirofilaria immitis and Strongyloides stercoralis. Phylogenetic analyses based on concatenated amino acid sequences of 12 protein-coding genes revealed that the newly described species T. malaysiensis was more closely related to T. cati than to T. canis, consistent with results of a previous study using sequences of nuclear internal transcribed spacers as genetic markers. Conclusion The present study determined the complete mt genome sequences for three roundworms of human and animal health significance, which provides mtDNA evidence for the validity of T. malaysiensis and also provides a foundation for studying the systematics, population genetics and ecology of these and other nematodes of socio-economic importance. PMID:18482460
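The A+T percentages quoted in this abstract are simple base-composition ratios. A minimal sketch of that calculation follows; the function name `at_content` and the toy fragment are illustrative assumptions, not taken from the study, and ambiguous bases such as N are simply ignored here.

```python
from collections import Counter


def at_content(seq: str) -> float:
    """Return the A+T percentage over unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (counts["A"] + counts["T"]) / total


# Hypothetical toy fragment, not Toxocara data.
toy = "ATTATAGCATTA"
print(f"A+T = {at_content(toy):.2f}%")
```

Applied to a complete mt genome read from a FASTA file, the same ratio would give figures comparable to those reported above, assuming the same treatment of ambiguous bases.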
Galinier, Richard; van Beurden, Steven; Amilhat, Elsa; Castric, Jeannette; Schoehn, Guy; Verneau, Olivier; Fazio, Géraldine; Allienne, Jean-François; Engelsma, Marc; Sasal, Pierre; Faliex, Elisabeth
2012-06-01
Eel virus European X (EVEX) was first isolated from diseased European eel Anguilla anguilla in Japan at the end of the 1970s. The virus was tentatively classified into the Rhabdoviridae family on the basis of morphology and serological cross-reactivity. This family of viruses is organized into six genera and currently comprises approximately 200 members, many of which are still unassigned because of the lack of molecular data. This work presents the morphological, biochemical and genetic characterizations of EVEX, and proposes a taxonomic classification for this virus. We provide its complete genome sequence, plus a comprehensive sequence comparison between isolates from different geographical origins. The genome encodes the five classical structural proteins plus an overlapping open reading frame in the phosphoprotein gene, coding for a putative C protein. Phylogenetic relationships with other rhabdoviruses indicate that EVEX is most closely related to the Vesiculovirus genus and shares the highest identity with trout rhabdovirus 903/87. Copyright © 2012 Elsevier B.V. All rights reserved.
Complete Mitochondrial Genome of Eruca sativa Mill. (Garden Rocket)
Yang, Qing; Chang, Shengxin; Chen, Jianmei; Hu, Maolong; Guan, Rongzhan
2014-01-01
Eruca sativa (Cruciferae family) is an ancient crop of great economic and agronomic importance. Here, the complete mitochondrial genome of Eruca sativa was sequenced and annotated. The circular molecule is 247 696 bp long, with a G+C content of 45.07%, containing 33 protein-coding genes, three rRNA genes, and 18 tRNA genes. The Eruca sativa mitochondrial genome may be divided into six master circles and four subgenomic molecules via three pairwise large repeats, resulting in a more dynamic structure of the Eruca sativa mtDNA compared with other cruciferous mitotypes. Comparison with the Brassica napus mtDNA revealed that most of the genes with known function are conserved between these two mitotypes except for the ccmFN2 and rrn18 genes, and 27 point mutations were scattered in the 14 protein-coding genes. Evolutionary relationship analysis suggested that Eruca sativa is more closely related to the Brassica species and to Raphanus sativus than to Arabidopsis thaliana. PMID:25157569
Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C.; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird
2016-01-01
The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (Ex: var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. Repeat-related structures are among the more important features of a DNA sequence. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
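One simple reading of "recurrence time" is the gap between consecutive occurrences of the same k-mer along a sequence. The sketch below builds a histogram of those gaps; it is an illustration under that assumption, not the authors' published algorithm, and the function name `recurrence_times` and the word length `k` are hypothetical choices.

```python
from collections import Counter


def recurrence_times(seq: str, k: int = 3) -> Counter:
    """Histogram of gaps between consecutive occurrences of each k-mer."""
    last_seen = {}    # k-mer -> position of its most recent occurrence
    hist = Counter()  # gap length -> number of times that gap was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist


# Toy sequence with an obvious period-3 repeat (hypothetical data).
hist = recurrence_times("ACGACGACGTTT", k=3)
for gap, count in sorted(hist.items()):
    print(gap, count)
```

In a histogram of this kind, tandem repeats and other periodic features would be expected to show up as peaks at the corresponding gap lengths.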
A 454 sequencing approach to dipteran mitochondrial genome research
USDA-ARS's Scientific Manuscript database
The availability of complete mitochondrial genome data for Diptera, one of the largest Metazoan orders, in public databases is limited. Herein, we generated the complete or nearly complete mitochondrial genomes for Cochliomyia hominivorax, Haematobia irritans, Phormia regina and Sarcophaga crassipa...
Complete Chloroplast Genome Sequences of Four Meliaceae Species and Comparative Analyses
Mader, Malte; Pakull, Birte; Blanc-Jolivet, Céline; Paulini-Drewes, Maike; Bouda, Zoéwindé Henri-Noël; Degen, Bernd; Small, Ian
2018-01-01
The Meliaceae family mainly consists of trees and shrubs with a pantropical distribution. In this study, the complete chloroplast genomes of four Meliaceae species were sequenced and compared with each other and with the previously published Azadirachta indica plastome. The five plastomes are circular and exhibit a quadripartite structure with high conservation of gene content and order. They include 130 genes encoding 85 proteins, 37 tRNAs and 8 rRNAs. Inverted repeat expansion resulted in a duplication of rps19 in the five Meliaceae species, which is consistent with that in many other Sapindales, but different from many other rosids. Compared to Azadirachta indica, the four newly sequenced Meliaceae individuals share several large deletions, which mainly contribute to the decreased genome sizes. A whole-plastome phylogeny supports previous findings that the four species form a monophyletic sister clade to Azadirachta indica within the Meliaceae. SNPs and indels identified in all complete Meliaceae plastomes might be suitable targets for the future development of genetic markers at different taxonomic levels. The extended analysis of SNPs in the matK gene led to the identification of four potential Meliaceae-specific SNPs as a basis for future validation and marker development. PMID:29494509
Hierarchical Scaffolding With Bambus
Pop, Mihai; Kosack, Daniel S.; Salzberg, Steven L.
2004-01-01
The output of a genome assembler generally comprises a collection of contiguous DNA sequences (contigs) whose relative placement along the genome is not defined. A procedure called scaffolding is commonly used to order and orient these contigs using paired read information. This ordering of contigs is an essential step when finishing and analyzing the data from a whole-genome shotgun project. Most recent assemblers include a scaffolding module; however, users have little control over the scaffolding algorithm or the information produced. We thus developed a general-purpose scaffolder, called Bambus, which affords users significant flexibility in controlling the scaffolding parameters. Bambus was used recently to scaffold the low-coverage draft dog genome data. Most significantly, Bambus enables the use of linking data other than that inferred from mate-pair information. For example, the sequence of a completed genome can be used to guide the scaffolding of a related organism. We present several applications of Bambus: support for finishing, comparative genomics, analysis of the haplotype structure of genomes, and scaffolding of a mammalian genome at low coverage. Bambus is available as an open-source package from our Web site. PMID:14707177
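The idea of ordering contigs from paired-read links can be sketched as follows. This is a deliberately simplified greedy illustration, not Bambus's algorithm: it ignores contig orientation and insert-size distances, and the function name `greedy_scaffold` and the `min_links` threshold are assumptions introduced for the example.

```python
from collections import Counter


def greedy_scaffold(links, min_links=2):
    """Chain contigs using bundled mate-pair links (orientation ignored).

    links: iterable of (contig_a, contig_b) pairs, one per mate pair whose
    two reads landed in different contigs.
    Returns accepted adjacencies as (contig_a, contig_b, supporting_links).
    """
    # Bundle: count how many mate pairs support each unordered contig pair.
    bundles = Counter(tuple(sorted(p)) for p in links if p[0] != p[1])
    degree = Counter()  # accepted neighbours per contig (at most 2 in a chain)
    parent = {}         # union-find forest, used to reject cycles

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    accepted = []
    # Strongest bundles first; ties broken by contig names for determinism.
    for (a, b), n in sorted(bundles.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_links:
            break  # all remaining bundles are weaker still
        if degree[a] >= 2 or degree[b] >= 2 or find(a) == find(b):
            continue  # would create a branch or a cycle
        parent[find(a)] = find(b)
        degree[a] += 1
        degree[b] += 1
        accepted.append((a, b, n))
    return accepted


# Hypothetical links between three contigs.
links = [("c1", "c2"), ("c2", "c1"), ("c2", "c3"),
         ("c3", "c2"), ("c2", "c3"), ("c1", "c3")]
print(greedy_scaffold(links))
```

Requiring at least two supporting links before joining contigs is a common guard against chimeric mate pairs; a real scaffolder would also check that the implied distances and orientations are mutually consistent.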
Lu, You; Samac, Deborah A; Glazebrook, Jane; Ishimaru, Carol A
2015-05-07
We report here the complete genome sequence of Clavibacter michiganensis subsp. insidiosus R1-1, isolated in Minnesota, USA. The R1-1 genome, generated by a de novo assembly of PacBio sequencing data, is the first complete genome sequence available for this subspecies. Copyright © 2015 Lu et al.
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present here the first complete genome sequences of Sweet potato chlorotic fleck virus (SPCFV) from sweet potato in Australia and East Timor, and we compare these with four complete SPCFV genomes from South Korea and one from Uganda. The Australian, East Timorese, South Korean, and Ugandan genomes differed considerably from each other. PMID:27231359
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.
Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping
2016-01-01
The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium.
Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species
Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F.; Wang, Kunbo
2016-01-01
The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527
Whole mitochondrial genome sequence for an osteoarthritis model of Guinea pig (Caviidae; Cavia).
Cui, Xin-Gang; Liu, Cheng-Yao; Wei, Bo; Zhao, Wen-Jian; Zhang, Wen-Feng
2016-11-01
Animal models play an important role in osteoarthritis studies. Here, the complete mitochondrial genome sequence of the guinea pig is reported for the first time. The total length of the mitogenome was 16,797 bp. It contained the typical structure, including two ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and one non-coding control region (D-loop region). The overall composition of the mitogenome was estimated to be 34.9% for A, 26.1% for T, 26.0% for C and 13.0% for G, showing an A-T (61.0%)-rich feature. This mitochondrial genome sequence provides a new genetic resource for osteoarthritis research.
Genomic and Phylogenetic Characterization of Brazilian Yellow Fever Virus Strains
Palacios, Gustavo; Cardoso, Jedson F.; Martins, Livia C.; Sousa, Edivaldo C.; de Lima, Clayton P. S.; Medeiros, Daniele B. A.; Savji, Nazir; Desai, Aaloki; Rodrigues, Sueli G.; Carvalho, Valeria L.; Lipkin, W. Ian
2012-01-01
Globally, yellow fever virus infects nearly 200,000 people, leading to 30,000 deaths annually. Although the virus is endemic to Latin America, only a single genome from this region has been sequenced. Here, we report 12 Brazilian yellow fever virus complete genomes, their genetic traits, phylogenetic characterization, and phylogeographic dynamics. Variable 3′ noncoding region (3′NCR) patterns and specific mutations throughout the open reading frame altered predicted secondary structures. Our findings suggest that whereas the introduction of yellow fever virus in Brazil led to genotype I-predominant dispersal throughout South and Central Americas, genotype II remained confined to Bolivia, Peru, and the western Brazilian Amazon. PMID:23015713
Hulse-Kemp, Amanda M; Maheshwari, Shamoni; Stoffel, Kevin; Hill, Theresa A; Jaffe, David; Williams, Stephen R; Weisenfeld, Neil; Ramakrishnan, Srividya; Kumar, Vijay; Shah, Preyas; Schatz, Michael C; Church, Deanna M; Van Deynze, Allen
2018-01-01
Linked-Read sequencing technology has recently been employed successfully for de novo assembly of human genomes; however, the utility of this technology for complex plant genomes is unproven. We evaluated the technology for this purpose by sequencing the 3.5-gigabase (Gb) diploid pepper (Capsicum annuum) genome with a single Linked-Read library. Plant genomes, including pepper, are characterized by long, highly similar repetitive sequences. Accordingly, significant effort is used to ensure that the sequenced plant is highly homozygous and the resulting assembly is a haploid consensus. With a phased assembly approach, we targeted a heterozygous F1 derived from a wide cross to assess the ability to derive both haplotypes and characterize a pungency gene with a large insertion/deletion. The Supernova software generated a highly ordered, more contiguous sequence assembly than all currently available C. annuum reference genomes. Over 83% of the final assembly was anchored and oriented using four publicly available de novo linkage maps. A comparison of the annotation of conserved eukaryotic genes indicated the completeness of assembly. The validity of the phased assembly is further demonstrated with the complete recovery of both 2.5-Kb insertion/deletion haplotypes of the PUN1 locus in the F1 sample that represents pungent and nonpungent peppers, as well as nearly full recovery of the BUSCO2 gene set within each of the two haplotypes. The most contiguous pepper genome assembly to date has been generated, which demonstrates that Linked-Read library technology provides a tool to de novo assemble complex, highly repetitive, heterozygous plant genomes. This technology can provide an opportunity to cost-effectively develop high-quality genome assemblies for other complex plants and compare structural and gene differences through accurate haplotype reconstruction.
Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng
2016-01-01
Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.
Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng
2016-01-01
Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros ‘Jinzaoshi’ were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. ‘Jinzaoshi’, support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales. PMID:27442423
Park, Inkyu; Kim, Wook-jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin
2017-01-01
Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species within the Ranunculaceae was determined together with cp genomes of other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC–trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species. PMID:28863163
Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol
2017-01-01
Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species within the Ranunculaceae was determined together with cp genomes of other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.
Functional Information Stored in the Conserved Structural RNA Domains of Flavivirus Genomes
Fernández-Sanlés, Alba; Ríos-Marco, Pablo; Romero-López, Cristina; Berzal-Herranz, Alfredo
2017-01-01
The genus Flavivirus comprises a large number of small, positive-sense single-stranded, RNA viruses able to replicate in the cytoplasm of certain arthropod and/or vertebrate host cells. The genus, which has some 70 member species, includes a number of emerging and re-emerging pathogens responsible for outbreaks of human disease around the world, such as the West Nile, dengue, Zika, yellow fever, Japanese encephalitis, St. Louis encephalitis, and tick-borne encephalitis viruses. Like other RNA viruses, flaviviruses have a compact RNA genome that efficiently stores all the information required for the completion of the infectious cycle. The efficiency of this storage system is attributable to supracoding elements, i.e., discrete, structural units with essential functions. This information storage system overlaps and complements the protein coding sequence and is highly conserved across the genus. It therefore offers interesting potential targets for novel therapeutic strategies. This review summarizes our knowledge of the features of flavivirus genome functional RNA domains. It also provides a brief overview of the main achievements reported in the design of antiviral nucleic acid-based drugs targeting functional genomic RNA elements. PMID:28421048
Chi, Sylvia Ighem; Urbarova, Ilona; Johansen, Steinar D
2018-04-30
The mitochondrial genomes of sea anemones are dynamic in structure. Invasion by genetic elements, such as self-catalytic group I introns or insertion-like sequences, contribute to sea anemone mitochondrial genome expansion and complexity. By using next generation sequencing we investigated the complete mtDNAs and corresponding transcriptomes of the temperate sea anemone Anemonia viridis and its closer tropical relative Anemonia majano. Two versions of fused homing endonuclease gene (HEG) organization were observed among the Actiniidae sea anemones; in-frame gene fusion and pseudo-gene fusion. We provided support for the pseudo-gene fusion organization in Anemonia species, resulting in a repressed HEG from the COI-884 group I intron. orfA, a putative protein-coding gene with insertion-like features, was present in both Anemonia species. Interestingly, orfA and COI expression were significantly up-regulated upon long-term environmental stress corresponding to low seawater pH conditions. This study provides new insights to the dynamics of sea anemone mitochondrial genome structure and function. Copyright © 2018 Elsevier B.V. All rights reserved.
Bennett, Matthew S.; Triemer, Richard E.; Preisfeld, Angelika
2017-01-01
Background Over the last few years multiple studies have been published showing a great diversity in size of chloroplast genomes (cpGenomes), and in the arrangement of gene clusters, in the Euglenales. However, while these genomes provided important insights into the evolution of cpGenomes across the Euglenales and within their genera, only two genomes were analyzed in regard to genomic variability between and within Euglenales and Eutreptiales. To better understand the dynamics of chloroplast genome evolution in early evolving Eutreptiales, this study focused on the cpGenome of Eutreptiella pomquetensis, and the spread and peculiarities of introns. Methods The Etl. pomquetensis cpGenome was sequenced, annotated and afterwards examined in structure, size, gene order and intron content. These features were compared with other euglenoid cpGenomes as well as those of prasinophyte green algae, including Pyramimonas parkeae. Results and Discussion With about 130,561 bp the chloroplast genome of Etl. pomquetensis, a basal taxon in the phototrophic euglenoids, was considerably larger than the two other Eutreptiales cpGenomes sequenced so far. Although the detected quadripartite structure resembled most green algae and plant chloroplast genomes, the gene content of the single copy regions in Etl. pomquetensis was completely different from those observed in green algae and plants. The gene composition of Etl. pomquetensis was extensively changed and turned out to be almost identical to other Eutreptiales and Euglenales, and not to P. parkeae. Furthermore, the cpGenome of Etl. pomquetensis was unexpectedly permeated by a high number of introns, which led to a substantially larger genome. The 51 identified introns of Etl. pomquetensis showed two major unique features: (i) more than half of the introns displayed a high level of pairwise identities; (ii) no group III introns could be identified in the protein coding genes. 
These findings support the hypothesis that group III introns are degenerated group II introns and evolved later. PMID:28852596
Belkorchia, Abdel; Biderre, Corinne; Militon, Cécile; Polonais, Valérie; Wincker, Patrick; Jubin, Claire; Delbac, Frédéric; Peyretaillade, Eric; Peyret, Pierre
2008-03-01
Brachiola algerae has a broad host spectrum from human to mosquitoes. The successful infection of two mosquito cell lines (Mos55: embryonic cells and Sua 4.0: hemocyte-like cells) and a human cell line (HFF) highlights the efficient adaptive capacity of this microsporidian pathogen. The molecular karyotype of this microsporidian species was determined in the context of the B. algerae genome sequencing project, showing that its haploid genome consists of 30 chromosomal-sized DNAs ranging from 160 to 2240 kbp giving an estimated genome size of 23 Mbp. A contig of 12,269 bp including the DNA sequence of the B. algerae ribosomal transcription unit has been built from initial genomic sequences and the secondary structure of the large subunit rRNA constructed. The data obtained indicate that B. algerae should be an excellent parasitic model to understand genome evolution in relation to infectious capacity.
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil
2017-05-04
and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not understood. Therefore, only the two genome segments with detectable sequence homologies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
USDA-ARS's Scientific Manuscript database
This report includes the complete genome of the Campylobacter concisus type strain ATCC 33237T and the draft genomes of eight additional well characterized C. concisus genomes. C. concisus has been shown to be a genetically heterogeneous species and these nine genomes provide valuable information re...
COGNATE: comparative gene annotation characterizer.
Wilbrandt, Jeanne; Misof, Bernhard; Niehuis, Oliver
2017-07-17
The comparison of gene and genome structures across species has the potential to reveal major trends of genome evolution. However, such a comparative approach is currently hampered by a lack of standardization (e.g., Elliott TA, Gregory TR, Philos Trans Royal Soc B: Biol Sci 370:20140331, 2015). For example, testing the hypothesis that the total amount of coding sequences is a reliable measure of potential proteome diversity (Wang M, Kurland CG, Caetano-Anollés G, PNAS 108:11954, 2011) requires the application of standardized definitions of coding sequence and genes to create both comparable and comprehensive data sets and corresponding summary statistics. However, such standard definitions either do not exist or are not consistently applied. These circumstances call for a standard at the descriptive level using a minimum of parameters as well as an undeviating use of standardized terms, and for software that infers the required data under these strict definitions. The acquisition of a comprehensive, descriptive, and standardized set of parameters and summary statistics for genome publications and further analyses can thus greatly benefit from the availability of an easy to use standard tool. We developed a new open-source command-line tool, COGNATE (Comparative Gene Annotation Characterizer), which uses a given genome assembly and its annotation of protein-coding genes for a detailed description of the respective gene and genome structure parameters. Additionally, we revised the standard definitions of gene and genome structures and provide the definitions used by COGNATE as a working draft suggestion for further reference. Complete parameter lists and summary statistics are inferred using this set of definitions to allow down-stream analyses and to provide an overview of the genome and gene repertoire characteristics. 
COGNATE is written in Perl and freely available at the ZFMK homepage ( https://www.zfmk.de/en/COGNATE ) and on github ( https://github.com/ZFMK/COGNATE ). The tool COGNATE allows comparing genome assemblies and structural elements on multiples levels (e.g., scaffold or contig sequence, gene). It clearly enhances comparability between analyses. Thus, COGNATE can provide the important standardization of both genome and gene structure parameter disclosure as well as data acquisition for future comparative analyses. With the establishment of comprehensive descriptive standards and the extensive availability of genomes, an encompassing database will become possible.
First Complete Genome Sequence of Bean common mosaic necrosis virus from East Timor
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present here the first complete Bean common mosaic necrosis virus (BCMNV) genomic sequence isolated from virus-infected common bean (Phaseolus vulgaris) in East Timor, and compare it with six complete BCMNV genomes from the Netherlands, and one each from the United States, Tanzania, and an unspecified country. It most resembled the Netherlands strain NL-8 genome. PMID:27688343
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-03-01
Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
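The Smith-Waterman algorithm named in the abstract above is a dynamic-programming method for local alignment. The following is only a minimal sketch of the technique, not the ProteinWorldDB implementation: the function name `smith_waterman`, the simple match/mismatch scores, and the linear gap penalty are illustrative assumptions (protein comparisons normally use a substitution matrix and affine gap penalties).

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not those of any real project.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTYY", "ZZACGTWW"))
```

Because the table has one cell per pair of positions, the cost grows with the product of the two sequence lengths, which is consistent with the abstract's description of the all-against-all comparison as extremely computationally intensive.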
Carbonell-Caballero, Jose; Alonso, Roberto; Ibañez, Victoria; Terol, Javier; Talon, Manuel; Dopazo, Joaquin
2015-01-01
Citrus genus includes some of the most important cultivated fruit trees worldwide. Despite being extensively studied because of its commercial relevance, the origin of cultivated citrus species and the history of its domestication still remain an open question. Here, we present a phylogenetic analysis of the chloroplast genomes of 34 citrus genotypes which constitutes the most comprehensive and detailed study to date on the evolution and variability of the genus Citrus. A statistical model was used to estimate divergence times between the major citrus groups. Additionally, a complete map of the variability across the genome of different citrus species was produced, including single nucleotide variants, heteroplasmic positions, indels (insertions and deletions), and large structural variants. The distribution of all these variants provided further independent support to the phylogeny obtained. An unexpected finding was the high level of heteroplasmy found in several of the analyzed genomes. The use of the complete chloroplast DNA not only paves the way for a better understanding of the phylogenetic relationships within the Citrus genus but also provides original insights into other elusive evolutionary processes, such as chloroplast inheritance, heteroplasmy, and gene selection. PMID:25873589
Genome-Scale Phylogeny of the Alphavirus Genus Suggests a Marine Origin
Palacios, G.; Tesh, R. B.; Savji, N.; Guzman, H.; Sherman, M.; Weaver, S. C.; Lipkin, W. I.
2012-01-01
The genus Alphavirus comprises a diverse group of viruses, including some that cause severe disease. Using full-length sequences of all known alphaviruses, we produced a robust and comprehensive phylogeny of the Alphavirus genus, presenting a more complete evolutionary history of these viruses compared to previous studies based on partial sequences. Our phylogeny suggests the origin of the alphaviruses occurred in the southern oceans and spread equally through the Old and New World. Since lice appear to be involved in aquatic alphavirus transmission, it is possible that we are missing a louse-borne branch of the alphaviruses. Complete genome sequencing of all members of the genus also revealed conserved residues forming the structural basis of the E1 and E2 protein dimers. PMID:22190718
Jühling, Frank; Pütz, Joern; Bernt, Matthias; Donath, Alexander; Middendorf, Martin; Florentz, Catherine; Stadler, Peter F.
2012-01-01
Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit ‘bizarre’ secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, allows to distinguish between remaining functional genes and degrading ‘pseudogenes’, even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and wide-spread conserved overlaps of tRNAs in opposite reading direction. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders. PMID:22139921
2010-01-01
Background Genome reduction is a common evolutionary process in symbiotic and pathogenic bacteria. This process has been extensively characterized in bacterial endosymbionts of insects, where primary mutualistic bacteria represent the most extreme cases of genome reduction, the consequence of a massive process of gene inactivation and loss during their evolution from free-living ancestors. Sodalis glossinidius, the secondary endosymbiont of tsetse flies, has one of the few complete genomes of bacteria at the very beginning of the symbiotic association, making it possible to evaluate the relative impact of mobile genetic element proliferation and gene inactivation on the structure and functional capabilities of this bacterial endosymbiont during the transition to a host-dependent lifestyle. Results A detailed characterization of mobile genetic elements and pseudogenes reveals a massive presence of different types of prophage elements together with five different families of IS elements that have proliferated across the genome of Sodalis glossinidius at different levels. In addition, a detailed survey of intergenic regions allowed the characterization of 1501 pseudogenes, a much higher number than the 972 pseudogenes described in the original annotation. Pseudogene structure reveals a minor impact of mobile genetic element proliferation on the process of gene inactivation, with most pseudogenes originating from multiple frameshift mutations and premature stop codons. The comparison of metabolic profiles of Sodalis glossinidius and the tsetse fly primary endosymbiont Wigglesworthia glossinidia, based on their whole gene and pseudogene repertoires, revealed a novel case of pathway inactivation, arginine biosynthesis, in Sodalis glossinidius, together with a possible case of metabolic complementation with Wigglesworthia glossinidia for thiamine biosynthesis. 
Conclusions The complete re-analysis of the genome sequence of Sodalis glossinidius reveals novel insights into the evolutionary transition from a free-living ancestor to a host-dependent lifestyle, with a massive proliferation of mobile genetic elements, mainly of phage origin, although with minor impact on the process of gene inactivation that is taking place in this bacterial genome. The metabolic analysis of the whole endosymbiotic consortium of tsetse flies has revealed a possible phenomenon of metabolic complementation between primary and secondary endosymbionts that can help explain the co-existence of both bacterial endosymbionts in the context of the tsetse host. PMID:20649993
Understanding protein evolution: from protein physics to Darwinian selection.
Zeldovich, Konstantin B; Shakhnovich, Eugene I
2008-01-01
Efforts in whole-genome sequencing and structural proteomics start to provide a global view of the protein universe, the set of existing protein structures and sequences. However, approaches based on the selection of individual sequences have not been entirely successful at the quantitative description of the distribution of structures and sequences in the protein universe because evolutionary pressure acts on the entire organism, rather than on a particular molecule. In parallel to this line of study, studies in population genetics and phenomenological molecular evolution established a mathematical framework to describe the changes in genome sequences in populations of organisms over time. Here, we review both microscopic (physics-based) and macroscopic (organism-level) models of protein-sequence evolution and demonstrate that bridging the two scales provides the most complete description of the protein universe starting from clearly defined, testable, and physiologically relevant assumptions.
Patterns and processes of Mycobacterium bovis evolution revealed by phylogenomic analyses
USDA-ARS's Scientific Manuscript database
Mycobacterium bovis is an important animal pathogen worldwide that parasitizes wild and domesticated vertebrate livestock as well as humans. A comparison of the five M. bovis complete genomes from UK, South Korea, Brazil and USA revealed four novel large-scale structural variations of at least 2,000...
Analysis of co-evolving genes in campylobacter jejuni and C. coli
USDA-ARS's Scientific Manuscript database
Background: The population structure of Campylobacter has been frequently studied by MLST, for which fragments of housekeeping genes are compared. We wished to determine if the used MLST genes are representative of the complete genome. Methods: A set of 1029 core gene families (CGF) was identifie...
Rapid and accurate pyrosequencing of angiosperm plastid genomes
Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E
2006-01-01
Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. 
More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
Beffagna, Giorgia; Centelleghe, Cinzia; Franzo, Giovanni; Di Guardo, Giovanni; Mazzariol, Sandro
2017-01-01
Dolphin morbillivirus (DMV) has been deemed one of the most relevant threats to fin whales (Balaenoptera physalus), being responsible for a mortality outbreak in the Mediterranean Sea in recent years. Knowledge of the complete viral genome is essential to understand any structural changes that could modify virus pathogenesis and viral tissue tropism. We report the complete DMV sequence of the N, P/V/C, M, F and H genes identified from a fin whale and the comparison of primary to quaternary structure of proteins between this fin whale strain and some of those isolated during the 1990–’92 and the 2006–’08 epidemics. Some relevant substitutions were detected, particularly Asn52Ser located on the F protein and Ile21Thr on the N protein. When mutations found in the fin whale DMV were compared with those occurring in viral strains of other cetacean species, some were shown to be the result of diversifying selection, thus allowing speculation on their role in host adaptation and on the way they could affect viral attachment and fusion with the target host cells. PMID:28134317
de Castro Nunes, Renata; Orozco-Arias, Simon; Crouzillat, Dominique; Mueller, Lukas A.; Strickler, Suzy R.; Descombes, Patrick; Fournier, Coralie; Moine, Deborah; de Kochko, Alexandre; Yuyama, Priscila M.; Vanzela, André L. L.; Guyot, Romain
2018-01-01
Centromeric regions of plants are generally composed of large array of satellites from a specific lineage of Gypsy LTR-retrotransposons, called Centromeric Retrotransposons. Repeated sequences interact with a specific H3 histone, playing a crucial function on kinetochore formation. To study the structure and composition of centromeric regions in the genus Coffea, we annotated and classified Centromeric Retrotransposons sequences from the allotetraploid C. arabica genome and its two diploid ancestors: Coffea canephora and C. eugenioides. Ten distinct CRC (Centromeric Retrotransposons in Coffea) families were found. The sequence mapping and FISH experiments of CRC Reverse Transcriptase domains in C. canephora, C. eugenioides, and C. arabica clearly indicate a strong and specific targeting mainly onto proximal chromosome regions, which can be associated also with heterochromatin. PacBio genome sequence analyses of putative centromeric regions on C. arabica and C. canephora chromosomes showed an exceptional density of one family of CRC elements, and the complete absence of satellite arrays, contrasting with usual structure of plant centromeres. Altogether, our data suggest a specific centromere organization in Coffea, contrasting with other plant genomes. PMID:29497436
Long terminal repeat retrotransposons of Oryza sativa
McCarthy, Eugene M; Liu, Jingdong; Lizhi, Gao; McDonald, John F
2002-01-01
Background Long terminal repeat (LTR) retrotransposons constitute a major fraction of the genomes of higher plants. For example, retrotransposons comprise more than 50% of the maize genome and more than 90% of the wheat genome. LTR retrotransposons are believed to have contributed significantly to the evolution of genome structure and function. The genome sequencing of selected experimental and agriculturally important species is providing an unprecedented opportunity to view the patterns of variation existing among the entire complement of retrotransposons in complete genomes. Results Using a new data-mining program, LTR_STRUC, (LTR retrotransposon structure program), we have mined the GenBank rice (Oryza sativa) database as well as the more extensive (259 Mb) Monsanto rice dataset for LTR retrotransposons. Almost two-thirds (37) of the 59 families identified consist of copia-like elements, but gypsy-like elements outnumber copia-like elements by a ratio of approximately 2:1. At least 17% of the rice genome consists of LTR retrotransposons. In addition to the ubiquitous gypsy- and copia-like classes of LTR retrotransposons, the rice genome contains at least two novel families of unusually small, non-coding (non-autonomous) LTR retrotransposons. Conclusions Each of the major clades of rice LTR retrotransposons is more closely related to elements present in other species than to the other clades of rice elements, suggesting that horizontal transfer may have occurred over the evolutionary history of rice LTR retrotransposons. Like LTR retrotransposons in other species with relatively small genomes, many rice LTR retrotransposons are relatively young, indicating a high rate of turnover. PMID:12372141
Watanabe, Satoru; Shiwa, Yuh; Itaya, Mitsuhiro; Yoshikawa, Hirofumi
2012-12-01
Genome synthesis of existing or designed genomes is made feasible by the first successful cloning of a cyanobacterium, Synechocystis PCC6803, in Gram-positive, endospore-forming Bacillus subtilis. Whole-genome sequence analysis of the isolate and parental B. subtilis strains provides clues for identifying single nucleotide polymorphisms (SNPs) in the 2 complete bacterial genomes in one cell.
Zhang, Kai-Jun; Zhu, Wen-Chao; Rong, Xia; Zhang, Yan-Kai; Ding, Xiu-Lei; Liu, Jing; Chen, Da-Song; Du, Yu; Hong, Xiao-Yue
2013-06-22
Nilaparvata lugens (the brown planthopper, BPH) and Laodelphax striatellus (the small brown planthopper, SBPH) are two of the most important pests of rice. Until now, only one mitochondrial genome of a rice planthopper had been sequenced, and very little dependable mitochondrial information was available for research on population genetics, phylogeography and phylogenetic evolution of these pests. To obtain more information from the mitochondria, we sequenced the complete mitochondrial genomes of BPH and SBPH. These two planthoppers were infected with two different functional Wolbachia (intracellular endosymbiont) strains (wLug and wStri). Because both mitochondria and Wolbachia are transmitted by cytoplasmic inheritance and were difficult to separate when the Wolbachia particles were purified, sequencing the Wolbachia genomes with a next-generation sequencing method also yielded nearly complete mitochondrial genome sequences of these two rice planthoppers. After gap closing, we present high-quality and reliable complete mitochondrial genomes of these two planthoppers. The mitogenomes of N. lugens (BPH) and L. striatellus (SBPH) are 17,619 bp and 16,431 bp long, with A+T contents of 76.95% and 77.17%, respectively. Both species have typical circular mitochondrial genomes that encode the complete set of 37 genes usually found in metazoans. However, the BPH mitogenome also possesses two additional copies of the trnC gene. In both mitochondrial genomes, the atp8 gene was conspicuously shorter than in all other known insect mitochondrial genomes (99 bp for BPH, 102 bp for SBPH). The finding that two rearrangement regions (trnC-trnW and nad6-trnP-trnT) differing from those of other known insects occur in both of these distantly related planthoppers indicates that mitochondrial gene order might be conserved in Delphacidae. 
The large non-coding fragment (the A+T-rich region), putatively responsible for the control of mitochondrial replication and transcription, contained a block of variable number tandem repeats (VNTRs) in different natural individuals of these two planthoppers. Comparison with a previously sequenced individual of SBPH revealed that mitochondrial genetic variation within a species exists not only in the sequence and secondary structure of genes but also in gene order (a different location of the trnH gene). The mitochondrial genome arrangement pattern found in planthoppers involved rearrangements of both tRNA genes and protein-coding genes (PCGs). The fact that different species from different genera of Delphacidae possess the same mitochondrial gene rearrangement suggests that these rearrangements probably occurred before the differentiation of this family. After comparing the gene order of different species of Hemiptera, we propose that, except for some specific taxonomic groups (e.g. the whiteflies), gene order might have diversified at the family level in this order. The VNTRs detected in the control region might provide additional genetic markers for studying population genetics, individual differences and phylogeography of planthoppers.
2013-01-01
Background Nilaparvata lugens (the brown planthopper, BPH) and Laodelphax striatellus (the small brown planthopper, SBPH) are two of the most important pests of rice. Until now, only one mitochondrial genome of a rice planthopper had been sequenced, and very little dependable mitochondrial information was available for research on population genetics, phylogeography and phylogenetic evolution of these pests. To obtain more information from the mitochondria, we sequenced the complete mitochondrial genomes of BPH and SBPH. These two planthoppers were infected with two different functional Wolbachia (intracellular endosymbiont) strains (wLug and wStri). Because both mitochondria and Wolbachia are transmitted by cytoplasmic inheritance and were difficult to separate when the Wolbachia particles were purified, sequencing the Wolbachia genomes with a next-generation sequencing method also yielded nearly complete mitochondrial genome sequences of these two rice planthoppers. After gap closing, we present high-quality and reliable complete mitochondrial genomes of these two planthoppers. Results The mitogenomes of N. lugens (BPH) and L. striatellus (SBPH) are 17,619 bp and 16,431 bp long, with A+T contents of 76.95% and 77.17%, respectively. Both species have typical circular mitochondrial genomes that encode the complete set of 37 genes usually found in metazoans. However, the BPH mitogenome also possesses two additional copies of the trnC gene. In both mitochondrial genomes, the atp8 gene was conspicuously shorter than in all other known insect mitochondrial genomes (99 bp for BPH, 102 bp for SBPH). The finding that two rearrangement regions (trnC-trnW and nad6-trnP-trnT) differing from those of other known insects occur in both of these distantly related planthoppers indicates that mitochondrial gene order might be conserved in Delphacidae. 
The large non-coding fragment (the A+T-rich region), putatively responsible for the control of mitochondrial replication and transcription, contained a block of variable number tandem repeats (VNTRs) in different natural individuals of these two planthoppers. Comparison with a previously sequenced individual of SBPH revealed that mitochondrial genetic variation within a species exists not only in the sequence and secondary structure of genes but also in gene order (a different location of the trnH gene). Conclusion The mitochondrial genome arrangement pattern found in planthoppers involved rearrangements of both tRNA genes and protein-coding genes (PCGs). The fact that different species from different genera of Delphacidae possess the same mitochondrial gene rearrangement suggests that these rearrangements probably occurred before the differentiation of this family. After comparing the gene order of different species of Hemiptera, we propose that, except for some specific taxonomic groups (e.g. the whiteflies), gene order might have diversified at the family level in this order. The VNTRs detected in the control region might provide additional genetic markers for studying population genetics, individual differences and phylogeography of planthoppers. PMID:23799924
2013-01-01
Background Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. Results In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Conclusions Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome. PMID:23374229
Expanding the proteome: disordered and alternatively-folded proteins
Dyson, H. Jane
2011-01-01
Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well-structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question “why would a particular domain need to be unstructured?” are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but of partially structured and highly dynamic members of the disorder-order continuum. PMID:21729349
Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo
2013-02-04
Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is the long internally-repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo has a wide distribution in the genus Drosophila, since it has been found in 6 of the 12 Drosophila sequenced genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the D. mojavensis sequenced genome. In our set of 170 Galileo copies we have detected 5 Galileo subfamilies (C, D, E, F, and X) with different structures ranging from nearly complete, to only 2 TIR or solo TIR copies. Finally, we have explored the structural and length variation of the Galileo copies that point out the relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate a recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.
Complete Genome Sequences of the Potyvirus Sweet potato virus 2 from East Timor and Australia
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present here the first complete genome sequences of Sweet potato virus 2 (SPV2) from sweet potato in Australia and East Timor, and compare these with five complete SPV2 genome sequences from South Korea and one each from Spain and the United States. Both were closely related to SPV2 genomes from South Korea, Spain, and the United States. PMID:27257208
The genomes and comparative genomics of Lactobacillus delbrueckii phages.
Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani
2011-07-01
Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.
Osypov, Alexander A; Krutinin, Gleb G; Krutinina, Eugenia A; Kamzolova, Svetlana G
2012-04-01
Electrostatic properties of genome DNA are important to its interactions with different proteins, in particular those related to transcription. DEPPDB - DNA Electrostatic Potential (and other Physical) Properties Database - provides information on the electrostatic and other physical properties of genome DNA combined with its sequence and annotation of biological and structural properties of genomes and their elements. Genomes are organized on a taxonomical basis, supporting comparative and evolutionary studies. Currently, DEPPDB contains all completely sequenced bacterial, viral, mitochondrial, and plastid genomes according to the NCBI RefSeq, and some model eukaryotic genomes. Data for promoters, regulation sites, binding proteins, etc., are incorporated from established DBs and literature. The database is complemented by analytical tools. Calculations on user-supplied sequences are available. Case studies found that electrostatics complements DNA bending in the functioning of the E. coli plasmid BNT2 promoter, possibly affecting the host-environment metabolic switch. Transcription factor binding sites gravitate to high potential regions, confirming the universal importance of electrostatics in protein-DNA interactions beyond the classical promoter-RNA polymerase recognition and regulation. Other genome elements, such as terminators, also show electrostatic peculiarities. Most intriguing are gene starts, which exhibit taxonomic correlations. The necessity of studying genome electrostatic properties is discussed.
2013-01-01
Background Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. Description Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca2+-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics. 
Conclusions We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives. PMID:23889801
Mapping the Space of Genomic Signatures
Kari, Lila; Hill, Kathleen A.; Sayem, Abu S.; Karamichalis, Rallis; Bryans, Nathaniel; Davis, Katelyn; Dattani, Nikesh S.
2015-01-01
We propose a computational method to measure and visualize interrelationships among any number of DNA sequences allowing, for example, the examination of hundreds or thousands of complete mitochondrial genomes. An "image distance" is computed for each pair of graphical representations of DNA sequences, and the distances are visualized as a Molecular Distance Map: Each point on the map represents a DNA sequence, and the spatial proximity between any two points reflects the degree of structural similarity between the corresponding sequences. The graphical representation of DNA sequences utilized, Chaos Game Representation (CGR), is genome- and species-specific and can thus act as a genomic signature. Consequently, Molecular Distance Maps could inform species identification, taxonomic classifications and, to a certain extent, evolutionary history. The image distance employed, Structural Dissimilarity Index (DSSIM), implicitly compares the occurrences of oligomers of length up to k (herein k = 9) in DNA sequences. We computed DSSIM distances for more than 5 million pairs of complete mitochondrial genomes, and used Multi-Dimensional Scaling (MDS) to obtain Molecular Distance Maps that visually display the sequence relatedness in various subsets, at different taxonomic levels. This general-purpose method does not require DNA sequence alignment and can thus be used to compare similar or vastly different DNA sequences, genomic or computer-generated, of the same or different lengths. We illustrate potential uses of this approach by applying it to several taxonomic subsets: phylum Vertebrata, (super)kingdom Protista, classes Amphibia-Insecta-Mammalia, class Amphibia, and order Primates. This analysis of an extensive dataset confirms that the oligomer composition of full mtDNA sequences can be a source of taxonomic information. 
This method also correctly finds the mtDNA sequences most closely related to that of the anatomically modern human (the Neanderthal, the Denisovan, and the chimp), and that the sequence most different from it in this dataset belongs to a cucumber. PMID:26000734
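The Chaos Game Representation mentioned in the abstract above can be illustrated with a minimal sketch. The corner assignment, the function names, and the k-mer helper are assumptions made here for illustration; the abstract does not specify them, and this is not the authors' DSSIM or MDS pipeline. Each base moves the current point halfway toward that base's corner of the unit square, so that at a resolution of 2^k by 2^k cells the point density reflects k-mer occurrences.

```python
from collections import Counter

# Corner assignment is a common convention, assumed here (not given in the abstract).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR points: each point is the midpoint between the previous
    point and the corner assigned to the current base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def kmer_counts(seq, k):
    """Count overlapping k-mers made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(b in CORNERS for b in kmer):
            counts[kmer] += 1
    return counts
```

Comparing two sequences would then amount to comparing their CGR images or k-mer count vectors with some distance; the abstract uses DSSIM for that step, which is not reproduced here.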
Bowden, Katherine E; Weigand, Michael R; Peng, Yanhui; Cassiday, Pamela K; Sammons, Scott; Knipe, Kristen; Rowe, Lori A; Loparev, Vladimir; Sheth, Mili; Weening, Keeley; Tondella, M Lucia; Williams, Margaret M
2016-01-01
During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations. IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.
Design and implementation of a database for Brucella melitensis genome annotation.
De Hertogh, Benoît; Lahlimi, Leïla; Lambert, Christophe; Letesson, Jean-Jacques; Depiereux, Eric
2008-03-18
The genome sequences of three Brucella biovars and of some species close to Brucella sp. have become available, leading to new relationship analysis. Moreover, the automatic genome annotation of the pathogenic bacterium Brucella melitensis has been manually corrected by a consortium of experts, leading to 899 modifications of start site predictions among the 3198 open reading frames (ORFs) examined. This new annotation, coupled with the results of automatic annotation tools of the complete genome sequences of the B. melitensis genome (including BLASTs to 9 genomes close to Brucella), provides numerous data sets related to predicted functions, biochemical properties and phylogenic comparisons. To make these results available, alphaPAGe, a functional auto-updatable database of the corrected sequence genome of B. melitensis, has been built, using the entity-relationship (ER) approach and a multi-purpose database structure. A friendly graphical user interface has been designed, and users can retrieve different kinds of information through three levels of queries: (1) the basic search uses classical keywords or sequence identifiers; (2) the original advanced search engine allows numerous criteria to be combined (using logical operators): (a) keywords (textual comparison) related to the pCDS's function, family domains and cellular localization; (b) physico-chemical characteristics (numerical comparison) such as isoelectric point or molecular weight and structural criteria such as the nucleic length or the number of transmembrane helices (TMH); (c) similarity scores with Escherichia coli and 10 species phylogenetically close to B. melitensis; (3) complex queries can be performed by using a SQL field, which allows all queries respecting the database's structure. The database is publicly available through a Web server at the following URL: http://www.fundp.ac.be/urbm/bioinfo/aPAGe.
A global reference for human genetic variation
2016-01-01
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. PMID:26432245
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stackebrandt, Erko; Zeytun, Ahmet; Lapidus, Alla L.
2013-01-01
Coriobacterium glomerans Haas and König 1988, is the only species of the genus Coriobacterium, family Coriobacteriaceae, order Coriobacteriales, phylum Actinobacteria. The bacterium thrives as an endosymbiont of pyrrhocorid bugs, i.e. the red fire bug Pyrrhocoris apterus L. The rationale for sequencing the genome of strain PW2T is its endosymbiotic life style which is rare among members of Actinobacteria. Here we describe the features of this symbiont, together with the complete genome sequence and its annotation. This is the first complete genome sequence of a member of the genus Coriobacterium and the sixth member of the order Coriobacteriales for which complete genome sequences are now available. The 2,115,681 bp long single replicon genome with its 1,804 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Ou, Jing; Liu, Jin-Bo; Yao, Fu-Jiao; Wang, Xin-Guo; Wei, Zhao-Ming
2016-01-01
Flour beetles of the genus Tribolium are all pests of stored products and cause severe economic losses every year. The American black flour beetle Tribolium audax is one of the important pest species of flour beetle, and it is also an important quarantine insect. Here we sequenced and characterized the complete mitochondrial genome of T. audax, which was intercepted by Huangpu Customs in maize from America. The complete circular mitochondrial genome (mitogenome) of T. audax was 15,924 bp in length, containing 37 typical coding genes and one non-coding AT-rich region. The mitogenome of T. audax exhibits a gene arrangement and content identical to the most common type in insects. All protein coding genes (PCGs) start with a typical ATN initiation codon, except for cox1, which uses AAC as its start codon instead of ATN. Eleven genes use standard complete termination codons (nine TAA, two TAG), whereas the nad4 and nad5 genes end with a single T. Except for trnS1 (AGN), all tRNA genes display typical secondary cloverleaf structures like those of other insects. The sizes of the large and small ribosomal RNA genes are 1288 and 780 bp, respectively. The AT content of the AT-rich region is 81.36%. The 5 bp conserved motif TACTA was found in the intergenic region between trnS2 (UCN) and nad1.
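Two of the quantities reported in the abstract above, the AT content of a region and whether a coding sequence begins with an ATN codon, are simple to compute. The following is a minimal sketch with hypothetical function names and toy input sequences; it does not use the T. audax data and is not the authors' annotation procedure.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases of seq."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def has_atn_start(cds):
    """True if the coding sequence starts with ATA, ATC, ATG or ATT."""
    codon = cds[:3].upper()
    return len(codon) == 3 and codon[:2] == "AT" and codon[2] in "ACGT"
```

Under this check a gene starting with AAC, as reported for cox1, would be flagged as an exception to the ATN rule.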
Filichkin, S A; Bransom, K L; Goodwin, J B; Dreher, T W
2000-09-01
Five highly infectious turnip yellow mosaic virus (TYMV) genomes with sequence changes in their 3'-terminal regions that result in altered aminoacylation and eEF1A binding have been studied. These genomes were derived from cloned parental RNAs of low infectivity by sequential passaging in plants. Three of these genomes that are incapable of aminoacylation have been reported previously (J. B. Goodwin, J. M. Skuzeski, and T. W. Dreher, Virology 230:113-124, 1997). We now demonstrate by subcloning the 3' untranslated regions into wild-type TYMV RNA that the high infectivities and replication rates of these genomes compared to their progenitors are mostly due to a small number of mutations acquired in the 3' tRNA-like structure during passaging. Mutations in other parts of the genome, including the replication protein coding region, are not required for high infectivity but probably do play a role in optimizing viral amplification and spread in plants. Two other TYMV RNA variants of suboptimal infectivities, one that accepts methionine instead of the usual valine and one that interacts less tightly with eEF1A, were sequentially passaged to produce highly infectious genomes. The improved infectivities of these RNAs were not associated with increased replication in protoplasts, and no mutations were acquired in their 3' tRNA-like structures. Complete sequencing of one genome identified two mutations that result in amino acid changes in the movement protein gene, suggesting that improved infectivity may be a function of improved viral dissemination in plants. Our results show that the wild-type TYMV replication proteins are able to amplify genomes with 3' termini of variable sequence and tRNA mimicry. These and previous results have led to a model in which the binding of eEF1A to the 3' end to antagonize minus-strand initiation is a major role of the tRNA-like structure.
Genome structure of a Saccharomyces cerevisiae strain widely used in bioethanol production
Argueso, Juan Lucas; Carazzolle, Marcelo F.; Mieczkowski, Piotr A.; Duarte, Fabiana M.; Netto, Osmar V.C.; Missawa, Silvia K.; Galzerani, Felipe; Costa, Gustavo G.L.; Vidal, Ramon O.; Noronha, Melline F.; Dominska, Margaret; Andrietta, Maria G.S.; Andrietta, Sílvio R.; Cunha, Anderson F.; Gomes, Luiz H.; Tavares, Flavio C.A.; Alcarde, André R.; Dietrich, Fred S.; McCusker, John H.; Petes, Thomas D.; Pereira, Gonçalo A.G.
2009-01-01
Bioethanol is a biofuel produced mainly from the fermentation of carbohydrates derived from agricultural feedstocks by the yeast Saccharomyces cerevisiae. One of the most widely adopted strains is PE-2, a heterothallic diploid naturally adapted to the sugar cane fermentation process used in Brazil. Here we report the molecular genetic analysis of a PE-2 derived diploid (JAY270), and the complete genome sequence of a haploid derivative (JAY291). The JAY270 genome is highly heterozygous (∼2 SNPs/kb) and has several structural polymorphisms between homologous chromosomes. These chromosomal rearrangements are confined to the peripheral regions of the chromosomes, with breakpoints within repetitive DNA sequences. Despite its complex karyotype, this diploid, when sporulated, had a high frequency of viable spores. Hybrid diploids formed by outcrossing with the laboratory strain S288c also displayed good spore viability. Thus, the rearrangements that exist near the ends of chromosomes do not impair meiosis, as they do not span regions that contain essential genes. This observation is consistent with a model in which the peripheral regions of chromosomes represent plastic domains of the genome that are free to recombine ectopically and experiment with alternative structures. We also explored features of the JAY270 and JAY291 genomes that help explain their high adaptation to industrial environments, exhibiting desirable phenotypes such as high ethanol and cell mass production and high temperature and oxidative stress tolerance. The genomic manipulation of such strains could enable the creation of a new generation of industrial organisms, ideally suited for use as delivery vehicles for future bioenergy technologies. PMID:19812109
Single haplotype assembly of the human genome from a hydatidiform mole.
Steinberg, Karyn Meltz; Schneider, Valerie A; Graves-Lindsay, Tina A; Fulton, Robert S; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C; Church, Deanna M; Eichler, Evan E; Wilson, Richard K
2014-12-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press.
Single haplotype assembly of the human genome from a hydatidiform mole
Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
2014-01-01
A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. PMID:25373144
Complete genome sequence of Parvibaculum lavamentivorans type strain (DS-1(T)).
Schleheck, David; Weiss, Michael; Pitluck, Sam; Bruce, David; Land, Miriam L; Han, Shunsheng; Saunders, Elizabeth; Tapia, Roxanne; Detter, Chris; Brettin, Thomas; Han, James; Woyke, Tanja; Goodwin, Lynne; Pennacchio, Len; Nolan, Matt; Cook, Alasdair M; Kjelleberg, Staffan; Thomas, Torsten
2011-12-31
Parvibaculum lavamentivorans DS-1(T) is the type species of the novel genus Parvibaculum in the novel family Rhodobiaceae (formerly Phyllobacteriaceae) of the order Rhizobiales of Alphaproteobacteria. Strain DS-1(T) is a non-pigmented, aerobic, heterotrophic bacterium and represents the first tier member of environmentally important bacterial communities that catalyze the complete degradation of synthetic laundry surfactants. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,914,745 bp long genome with its predicted 3,654 protein coding genes is the first completed genome sequence of the genus Parvibaculum, and the first genome sequence of a representative of the family Rhodobiaceae.
Walker, Joseph F; Zanis, Michael J; Emery, Nancy C
2014-04-01
Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.
Using Partial Genomic Fosmid Libraries for Sequencing Complete Organellar Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.
2005-08-26
Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.
Zhao, Xing; Liang, Ai-Ping
2016-09-01
The first complete DNA sequence of the mitochondrial genome (mitogenome) of Leptobelus gazelle (Membracoidea: Hemiptera) is determined in this study. The circular molecule is 16,007 bp in its full length, which encodes a set of 37 genes, including 13 proteins, 2 ribosomal RNAs, 22 transfer RNAs, and contains an A + T-rich region (CR). The gene numbers, content, and organization of L. gazelle are similar to other typical metazoan mitogenomes. Twelve of the 13 PCGs are initiated with ATR methionine or ATT isoleucine codons, except the atp8 gene that uses the ATC isoleucine as start signal. Ten of the 13 PCGs have complete termination codons, either TAA (nine genes) or TAG (cytb). The remaining 3 PCGs (cox1, cox2 and nad5) have incomplete termination codons T (AA). All of the 22 tRNAs can be folded in the form of a typical clover-leaf structure. The complete mitogenome sequence data of L. gazelle is useful for the phylogenetic and biogeographic studies of the Membracoidea and Hemiptera.
Mapping the Structure and Dynamics of Genomics-Related MeSH Terms Complex Networks
Siqueiros-García, Jesús M.; Hernández-Lemus, Enrique; García-Herrera, Rodrigo; Robina-Galatas, Andrea
2014-01-01
It has been proposed that the history and evolution of scientific ideas may reflect certain aspects of the underlying socio-cognitive frameworks in which science itself is developing. Systematic analyses of the development of scientific knowledge may help us to construct models of the collective dynamics of science. Aiming at scientific rigor, these models should be built upon solid empirical evidence, analyzed with formal tools leading to ever-improving results that support the related conclusions. Along these lines we studied the dynamics and structure of the development of research in genomics as represented by the entire collection of genomics-related scientific papers contained in the PubMed database. The analyzed corpus consisted of more than 49,000 articles published in the years 1987 (first appearance of the term Genomics) to 2011, categorized by means of the Medical Subject Headings (MeSH) content-descriptors. Complex networks were built where two MeSH terms were connected if they are descriptors of the same article(s). The analysis of such networks revealed a complex structure and dynamics that to a certain extent resembled small-world networks. The evolution of such networks in time reflected interesting phenomena in the historical development of genomic research, including what seems to be a phase-transition in a period marked by the completion of the first draft of the Human Genome Project. We also found that different disciplinary areas have different dynamic evolution patterns in their MeSH connectivity networks. In the case of areas related to science, changes in topology were somewhat fast while retaining a certain core-structure, whereas in the humanities the evolution was quite slow and the structure was highly redundant, and in technology-related areas the evolution was very fast and the structure remained tree-like with almost no overlapping terms. PMID:24699262
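The network construction described in the abstract above (two MeSH terms are linked when they annotate the same article) can be sketched as follows. The function name and the three example articles are invented for illustration and are not taken from the study's corpus; weighting each edge by the number of shared articles is also an assumption, since the abstract only states the connection rule.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """Map each unordered pair of terms to the number of articles
    in which both terms appear as descriptors."""
    edges = Counter()
    for terms in articles:
        # sorted(set(...)) removes repeats and gives each pair one canonical order
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy corpus: each inner list holds the MeSH descriptors of one article.
articles = [
    ["Genomics", "Humans", "Proteomics"],
    ["Genomics", "Humans"],
    ["Genomics"],
]
edges = cooccurrence_edges(articles)
```

Building one such edge table per publication year would give the time series of networks whose topology the abstract compares.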
The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.).
Yang, Meng; Zhang, Xiaowei; Liu, Guiming; Yin, Yuxin; Chen, Kaifu; Yun, Quanzheng; Zhao, Duojun; Al-Mssallem, Ibrahim S; Yu, Jun
2010-09-15
Date palm (Phoenix dactylifera L.), a member of the Arecaceae family, is one of the three major economically important woody palms--the two other palms being oil palm and coconut tree--and its fruit is a staple food in Middle Eastern and North African nations, as well as in many other tropical and subtropical regions. Here we report a complete sequence of the date palm chloroplast (cp) genome based on pyrosequencing. After extracting 369,022 cp sequencing reads from our whole-genome-shotgun data, we put together an assembly and validated it with intensive PCR-based verification, coupled with PCR product sequencing. The date palm cp genome is 158,462 bp in length and has a typical quadripartite structure of the large (LSC, 86,198 bp) and small single-copy (SSC, 17,712 bp) regions separated by a pair of inverted repeats (IRs, 27,276 bp each). Similar to what has been found among most angiosperms, the date palm cp genome harbors 112 unique genes and 19 duplicated fragments in the IR regions. The junctions between LSC/IRs and SSC/IRs show different features of sequence expansion in evolution. We identified 78 SNPs as major intravarietal polymorphisms within the population of a specific cp genome, most of which were located in genes with vital functions. Based on RNA-sequencing data, we also found 18 polycistronic transcription units and three highly expression-biased genes--atpF, trnA-UGC, and rrn23. Unlike most monocots, date palm has a typical cp genome similar to that of tobacco--with little rearrangement and gene loss or gain. High-throughput sequencing technology facilitates the identification of intravarietal variations in cp genomes among different cultivars. Moreover, transcriptomic analysis of cp genes provides clues for uncovering regulatory mechanisms of transcription and translation in chloroplasts.
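As a quick consistency check on the quadripartite structure reported above, the region lengths can be summed, counting the inverted repeat twice. This is only an arithmetic sketch using the figures quoted in the abstract; the variable names are illustrative.

```python
# Region lengths in bp, as quoted in the abstract
lsc = 86_198   # large single-copy region
ssc = 17_712   # small single-copy region
ir = 27_276    # one inverted repeat; the genome carries two copies

# A quadripartite chloroplast genome is LSC + IR + SSC + IR
total = lsc + ssc + 2 * ir
print(total)
```

The sum agrees with the 158,462 bp genome length given in the abstract, which supports reading the 27,276 bp figure as the length of each IR copy.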
Rapid convergent evolution in wild crickets.
Pascoal, Sonia; Cezard, Timothee; Eik-Nes, Aasta; Gharbi, Karim; Majewska, Jagoda; Payne, Elizabeth; Ritchie, Michael G; Zuk, Marlene; Bailey, Nathan W
2014-06-16
The earliest stages of convergent evolution are difficult to observe in the wild, limiting our understanding of the incipient genomic architecture underlying convergent phenotypes. To address this, we capitalized on a novel trait, flatwing, that arose and proliferated at the start of the 21st century in a population of field crickets (Teleogryllus oceanicus) on the Hawaiian island of Kauai. Flatwing erases sound-producing structures on male forewings. Mutant males cannot sing to attract females, but they are protected from fatal attack by an acoustically orienting parasitoid fly (Ormia ochracea). Two years later, the silent morph appeared on the neighboring island of Oahu. We tested two hypotheses for the evolutionary origin of flatwings in Hawaii: (1) that the silent morph originated on Kauai and subsequently introgressed into Oahu and (2) that flatwing originated independently on each island. Morphometric analysis of male wings revealed that Kauai flatwings almost completely lack typical derived structures, whereas Oahu flatwings retain noticeably more wild-type wing venation. Using standard genetic crosses, we confirmed that the mutation segregates as a single-locus, sex-linked Mendelian trait on both islands. However, genome-wide scans using RAD-seq recovered almost completely distinct markers linked with flatwing on each island. The patterns of allelic association with flatwing on either island reveal different genomic architectures consistent with the timing of two mutational events on the X chromosome. Divergent wing morphologies linked to different loci thus cause identical behavioral outcomes--silence--illustrating the power of selection to rapidly shape convergent adaptations from distinct genomic starting points. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.
2006-01-20
Asteraceae is the second largest family of plants, with over 20,000 species. For the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast evolving chloroplast gene, ndhF, rbcL, as well as non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Euasterids II, that of Panax ginseng (Araliaceae, 156,318 bps, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at a fine level and of RNA editing by comparison to available EST sequences. 
In addition, since both of these genomes come from crop plants, their complete genome sequences will facilitate development of chloroplast genetic engineering technology, as in recent studies from Daniell's lab. Knowing the exact sequence of spacer regions is crucial for introducing transgenes into the chloroplast genome.
Complete genome sequence of ‘Candidatus Liberibacter africanus’
USDA-ARS's Scientific Manuscript database
The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...
Complete genome sequence of Salmonella enterica subsp. enterica serovar Thompson strain RM6836
USDA-ARS's Scientific Manuscript database
Salmonella enterica subsp. enterica serovar Thompson (S. Thompson) strain RM6836 was isolated from lettuce in 2002. We report the complete sequence and annotation of the genome of S. Thompson strain RM6836. This is the first reported complete genome sequence for S. Thompson and will provide a point ...
Complete genome sequence of the clinical Campylobacter coli isolate 15-537360
USDA-ARS's Scientific Manuscript database
Campylobacter coli strain 15-537360 was originally isolated from a 42 year-old patient with gastroenteritis. Here we report its complete genome sequence, which comprises a 1.7 Mbp chromosome and a 29 kbp conjugative cryptic plasmid. This is the first complete genome sequence of a clinical isolate of...
The complete mitochondrial genome of the fall webworm, Hyphantria cunea (Lepidoptera: Arctiidae)
Liao, Fang; Wang, Lin; Wu, Song; Li, Yu-Ping; Zhao, Lei; Huang, Guo-Ming; Niu, Chun-Jing; Liu, Yan-Qun; Li, Ming-Gang
2010-01-01
The complete mitochondrial genome (mitogenome) of the fall webworm, Hyphantria cunea (Lepidoptera: Arctiidae) was determined. The genome is a circular molecule 15 481 bp long. It presents a typical gene organization and order for completely sequenced lepidopteran mitogenomes, but differs from the insect ancestral type in the placement of tRNAMet. The nucleotide composition of the genome is highly A + T biased (80.38%), with a slightly positive AT skewness (0.010), indicating the occurrence of more As than Ts, as found in the Noctuoidea species. All protein-coding genes (PCGs) are initiated by ATN codons, except for COI, which is tentatively designated by the CGA codon as observed in other lepidopterans. Four of 13 PCGs harbor the incomplete termination codon, T or TA. All tRNAs have a typical clover-leaf structure of mitochondrial tRNAs, except for tRNASer(AGN), the DHU arm of which could not form a stable stem-loop structure. The intergenic spacer sequence between tRNASer(AGN) and ND1 also contains the ATACTAA motif, which is conserved across the Lepidoptera order. The H. cunea A+T-rich region of 357 bp is comprised of non-repetitive sequences, but harbors several features common to the Lepidoptera insects, including the motif ATAGA followed by an 18 bp poly-T stretch, a microsatellite-like (AT)8 element preceded by the ATTTA motif, and an 11 bp poly-A present immediately upstream of tRNAMet. The phylogenetic analyses support the view that H. cunea is more closely related to Lymantria dispar than to Ochrogaster lunifer, and support the hypothesis that Noctuoidea (H. cunea, L. dispar, and O. lunifer) and Geometroidea (Phthonandria atrilineata) are monophyletic. However, in the phylogenetic trees based on mitogenome sequences among the lepidopteran superfamilies, Papilionoidea (Artogeia melete, Acraea issoria, and Coreana raphaelis) joined basally within the monophyly of Lepidoptera, which differs from the traditional classification. PMID:20376208
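The A + T content and AT skew quoted in this abstract are simple base-count statistics. The sketch below assumes the conventional definition AT skew = (A − T) / (A + T), which the abstract does not spell out; the function name `at_stats` and the ten-base toy sequence are hypothetical.

```python
def at_stats(seq):
    """Return (A+T fraction, AT skew) for a DNA sequence.

    AT skew is taken as (A - T) / (A + T); a positive value means more As than Ts.
    Only A, C, G and T are counted, so ambiguity codes such as N are ignored.
    """
    seq = seq.upper()
    a, t = seq.count("A"), seq.count("T")
    g, c = seq.count("G"), seq.count("C")
    if a + t == 0:
        raise ValueError("sequence contains no A or T bases")
    at_content = (a + t) / (a + t + g + c)
    at_skew = (a - t) / (a + t)
    return at_content, at_skew


# Toy sequence for illustration only: 5 A, 3 T, 1 G, 1 C
content, skew = at_stats("AATTAGCAAT")
```

For the toy sequence the A + T fraction is 0.8 and the skew is 0.25; applied to a full mitogenome strand, the same calculation yields figures of the kind reported above.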
Wang, Xiaodan; Ma, Dehong; Huang, Xinwei; Li, Lihua; Li, Duo; Zhao, Yujiao; Qiu, Lijuan; Pan, Yue; Chen, Junying; Xi, Juemin; Shan, Xiyun; Sun, Qiangming
2017-06-15
In the past few decades, dengue has spread rapidly and is an emerging disease in China. An unexpected dengue outbreak occurred in Xishuangbanna, Yunnan, China, resulting in 1331 patients in 2013. To obtain complete genome information and to perform mutation and evolutionary analyses of the causative agent of this largest outbreak of dengue fever, the viruses were isolated by cell culture and evaluated by genome sequence analysis. Phylogenetic trees were then constructed by Neighbor-Joining methods (MEGA6.0), followed by analysis of nucleotide mutation and amino acid substitution. The diversity of secondary structure of the E and NS1 proteins was also analyzed. Selection pressures acting on the coding sequences were then estimated with PAML software. The complete genome sequences of two isolated strains (YNSW1, YNSW2) were 10,710 and 10,702 nucleotides in length, respectively. Phylogenetic analysis classified both strains as genotype II of DENV-3. The results indicated that both isolated strains from Xishuangbanna in 2013 and the Laos 2013 strains (KF816161.1, KF816158.1, LC147061.1, LC147059.1, KF816162.1) were most similar to Bangladesh (AY496873.2) in 2002. Compared with DENV-3SS (H87), 62 amino acid substitutions were identified in translated regions, and 38 amino acid substitutions were identified in translated regions compared with the DENV-3 genotype II strain Bangladesh (AY496873.2). 27 (YNSW1) or 28 (YNSW2) single nucleotide changes were observed in structural protein sequences, with 7 (YNSW1) or 8 (YNSW2) non-synonymous mutations compared with AY496873.2. Of them, 4 non-synonymous mutations were identified in E protein sequences (2 in the β-sheet, 2 in the coil). Meanwhile, 117 (YNSW1) or 115 (YNSW2) single nucleotide changes were observed in non-structural protein sequences, with 31 (YNSW1) or 30 (YNSW2) non-synonymous mutations. 
In particular, 14 single nucleotide changes were observed in NS1 sequences, of which 4 were non-synonymous substitutions (all 4 in the coil). Selection pressure analysis revealed no positive selection in the amino acid sites of the genes encoding structural and non-structural proteins. This study may help to understand the intrinsic geographical relatedness of dengue virus 3 and may contribute to further research on its infectivity, pathogenicity and vaccine development. Copyright © 2017 Elsevier B.V. All rights reserved.
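The distinction drawn above between single nucleotide changes and non-synonymous mutations comes down to whether the changed codon still encodes the same amino acid. The following sketch shows that classification with the standard genetic code; it is not the authors' pipeline and does not reproduce the PAML selection analysis. The names `CODON_TABLE` and `classify_change` and the example codons are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_change(ref_codon, alt_codon):
    """Label a codon substitution as 'synonymous' or 'non-synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# Illustrative codon pairs, not taken from the YNSW1/YNSW2 sequences
silent = classify_change("CTT", "CTC")     # Leu -> Leu
replacing = classify_change("GAT", "GAA")  # Asp -> Glu
```

RNA sequences would need U converted to T before lookup; the table here is keyed on DNA codons only.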
Complete genome sequence of the hippuricase-positive Campylobacter avium type strain LMG 24591
USDA-ARS's Scientific Manuscript database
Campylobacter avium is a hippurate-positive, thermotolerant campylobacter that has been isolated from poultry. Here we present the genome sequences of two C. avium strains isolated from broiler chickens: strains LMG 24591T (complete genome) and LMG 24592 (draft genome). The C. avium type strain geno...
The complete mitochondrial genome of a stonefly species, Togoperla sp. (Plecoptera: Perlidae).
Wang, Kai; Wang, Yuyu; Yang, Ding
2016-05-01
The complete mitochondrial (mt) genome of a stonefly species, Togoperla sp. (Plecoptera: Perlidae), was sequenced. The 15,723 bp long genome has the standard metazoan complement of 37 genes and an A+T-rich region, which is the same as the insect ancestral genome arrangement.
Comparing Mycobacterium tuberculosis genomes using genome topology networks.
Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan
2015-02-14
Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. 
With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.
The Complete Sequence of a Human Parainfluenzavirus 4 Genome
Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond
2009-01-01
Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2017-01-01
ABSTRACT Analysis of an RNA-seq library from cucumber leaf RNA extracted from a fast technology for analysis of nucleic acids (FTA) card revealed the first complete genome of Cucurbit aphid-borne yellows virus (CABYV) from East Timor. We compare it with 35 complete CABYV genomes from other world regions. It most resembled the genome of the South Korean isolate HD118. PMID:28495776
Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E
2017-03-06
Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.
Deppdb--DNA electrostatic potential properties database: electrostatic properties of genome DNA.
Osypov, Alexander A; Krutinin, Gleb G; Kamzolova, Svetlana G
2010-06-01
The electrostatic properties of genome DNA influence its interactions with different proteins, in particular, the regulation of transcription by RNA-polymerases. DEPPDB--DNA Electrostatic Potential Properties Database--was developed to hold and provide all available information on the electrostatic properties of genome DNA combined with its sequence and annotation of biological and structural properties of genome elements and whole genomes. Genomes in DEPPDB are organized on a taxonomical basis. Currently, the database contains all the completely sequenced bacterial and viral genomes according to NCBI RefSeq. General properties of the genome DNA electrostatic potential profile and principles of its formation are revealed. This potential correlates with the GC content but does not correspond to it exactly and strongly depends on both the sequence arrangement and its context (flanking regions). Analysis of the promoter regions for bacterial and viral RNA polymerases revealed a correspondence between the scale of these proteins' physical properties and electrostatic profile patterns. We also discovered a direct correlation between the potential value and the binding frequency of RNA polymerase to DNA, supporting the idea of the role of electrostatics in these interactions. This matches a pronounced tendency of the promoter regions to possess higher values of the electrostatic potential.
Nishimura, Yuki; Kamikawa, Ryoma; Hashimoto, Tetsuo; Inagaki, Yuji
2014-01-01
Mitochondrial (mt) genome sequences, which often bear introns, have been sampled from phylogenetically diverse eukaryotes. Thus, we can anticipate novel insights into intron evolution from previously unstudied mt genomes. We here investigated the origins and evolution of three introns in the mt genome of the haptophyte Chrysochromulina sp. NIES-1333, which was sequenced completely in this study. All the three introns were characterized as group II, on the basis of predicted secondary structure, and the conserved sequence motifs at the 5′ and 3′ termini. Our comparative studies on diverse mt genomes prompt us to propose that the Chrysochromulina mt genome laterally acquired the introns from mt genomes in distantly related eukaryotes. Many group II introns harbor intronic open reading frames for the proteins (intron-encoded proteins or IEPs), which likely facilitate the splicing of their host introns. However, we propose that a “free-standing,” IEP-like protein, which is not encoded within any introns in the Chrysochromulina mt genome, is involved in the splicing of the first cox1 intron that lacks any open reading frames. PMID:25054084
Copper radical oxidases and related extracellular oxidoreductases of wood-decay Agaricomycetes
Phil Kersten; Dan Cullen
2014-01-01
Extracellular peroxide generation, a key component of oxidative lignocellulose degradation, has been attributed to various enzymes including the copper radical oxidases. Encoded by a family of structurally related sequences, the genes are widely distributed among wood decay fungi including three recently completed polypore genomes. In all cases, core catalytic residues...
Turmel, Monique; Otis, Christian; Lemieux, Claude
1999-01-01
Green plants seem to form two sister lineages: Chlorophyta, comprising the green algal classes Prasinophyceae, Ulvophyceae, Trebouxiophyceae, and Chlorophyceae, and Streptophyta, comprising the Charophyceae and land plants. We have determined the complete chloroplast DNA (cpDNA) sequence (200,799 bp) of Nephroselmis olivacea, a member of the class (Prasinophyceae) thought to include descendants of the earliest-diverging green algae. The 127 genes identified in this genome represent the largest gene repertoire among the green algal and land plant cpDNAs completely sequenced to date. Of the Nephroselmis genes, 2 (ycf81 and ftsI, a gene involved in peptidoglycan synthesis) have not been identified in any previously investigated cpDNA; 5 genes [ftsW, rnE, ycf62, rnpB, and trnS(cga)] have been found only in cpDNAs of nongreen algae; and 10 others (ndh genes) have been described only in land plant cpDNAs. Nephroselmis and land plant cpDNAs share the same quadripartite structure—which is characterized by the presence of a large rRNA-encoding inverted repeat and two unequal single-copy regions—and very similar sets of genes in corresponding genomic regions. Given that our phylogenetic analyses place Nephroselmis within the Chlorophyta, these structural characteristics were most likely present in the cpDNA of the common ancestor of chlorophytes and streptophytes. Comparative analyses of chloroplast genomes indicate that the typical quadripartite architecture and gene-partitioning pattern of land plant cpDNAs are ancient features that may have been derived from the genome of the cyanobacterial progenitor of chloroplasts. Our phylogenetic data also offer insight into the chlorophyte ancestor of euglenophyte chloroplasts. PMID:10468594
Zhang, Yulong; Shao, Dandan; Cai, Miao; Yin, Hong; Zhang, Daochuan
2016-01-01
The complete mitochondrial genome of Gryllotalpa unispina was 15,513 bp in length and contained 70.9% AT. All G. unispina protein-coding sequences except for nad2 started with a typical ATN codon. The usual termination codon (TAA) and incomplete stop codons (T) were found in the 13 protein-coding genes. All tRNA genes were folded into the typical cloverleaf secondary structure, except trnS(AGN), which lacked the dihydrouridine arm. The sizes of the large and small ribosomal RNA genes were 1245 and 725 bp, respectively. The A + T-rich region was 917 bp in length, with an A + T content of 76.8%. The orientation and gene order of the G. unispina mitogenome were identical to those of G. orientalis and G. pluvialis; the "DK rearrangement", which has been widely reported in Caelifera, was not observed.
Enterovirus 74 Infection in Children
Peacey, Matthew; Hall, Richard J.; Wang, Jing; Todd, Angela K.; Yen, Seiha; Chan-Hyams, Jasmine; Rand, Christy J.; Stanton, Jo-Ann; Huang, Q. Sue
2013-01-01
Enterovirus 74 (EV74) is a rarely detected viral infection of children. In 2010, EV74 was identified in New Zealand in a 2-year-old child with acute flaccid paralysis (AFP) through routine polio AFP surveillance. A further three cases of EV74 were identified in children within six months. These cases are the first report of EV74 in New Zealand. In this study we describe the near-complete genome sequences of four EV74 isolates from New Zealand, which show only limited sequence identity in the non-structural proteins when compared to the other two known EV74 sequences. As is typical of enteroviruses, multiple recombination events were evident, particularly in the P2 and P3 regions. This is the first complete EV74 genome sequenced from a patient with acute flaccid paralysis. PMID:24098514
Cui, Peng; Ji, Rimutu; Ding, Feng; Qi, Dan; Gao, Hongwei; Meng, He; Yu, Jun; Hu, Songnian; Zhang, Heping
2007-01-01
Background The family Camelidae, which evolved in North America during the Eocene, survives as two distinct tribes, Camelini and Lamini. To investigate the evolutionary relationship between them and to further understand the evolutionary history of this family, we determined the complete mitochondrial genome sequence of the wild two-humped camel (Camelus bactrianus ferus), the only wild survivor of the Old World camel. Results The mitochondrial genome sequence (16,680 bp) from C. bactrianus ferus contains 13 protein-coding, two rRNA, and 22 tRNA genes as well as a typical control region; this basic structure is shared by all metazoan mitochondrial genomes. Its protein-coding region exhibits codon usage common to all mammals and possesses the three cryptic stop codons shared by all vertebrates. C. bactrianus ferus, like the rest of the mammalian species, does not share a triplet nucleotide insertion (GCC), encoding a proline residue, that is found only in the nd1 gene of the New World camelid Lama pacos. That this lineage-specific insertion in the L. pacos mtDNA occurred after the split between the Old and New World camelids suggests that it may have functional implications, since a proline insertion in a protein backbone usually alters protein conformation significantly, and the nd1 gene has not been seen to be as polymorphic as the rest of the ND family genes among camelids. Our phylogenetic study based on complete mitochondrial genomes excluding the control region suggested that the divergence of the two tribes may have occurred in the early Miocene, much earlier than what was deduced from the fossil record (11 million years). An evolutionary history reconstructed for the family Camelidae based on cytb sequences suggested that the split of bactrian camel and dromedary may have occurred in North America before the tribe Camelini migrated from North America to Asia. Conclusion Molecular clock analysis of complete mitochondrial genomes from C. bactrianus ferus and L. pacos suggested that the two tribes diverged from their common ancestor about 25 million years ago, much earlier than what was predicted based on fossil records. PMID:17640355
Zheng, Beiwen; Jiang, Xiawei; Cheng, Hong; Xu, Zemin; Li, Ang; Hu, Xinjun; Xiao, Yonghong
2015-12-20
Lactobacillus heilongjiangensis DSM 28069(T) is a potential probiotic isolated from traditional Chinese pickle. Here we report the complete genome sequence of this strain. The complete genome is 2,790,548 bp long, has a GC content of 37.5%, and is devoid of plasmids. Sets of genes involved in the biosynthesis of riboflavin and folate were identified in the genome, indicating its potential application in the biotechnology industry. The genome sequence of L. heilongjiangensis DSM 28069(T) now provides fundamental information for future studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Zhang, Wenping; Yue, Bisong; Wang, Xiaofang; Zhang, Xiuyue; Xie, Zhong; Liu, Nonglin; Fu, Wenyuan; Yuan, Yaohua; Chen, Daqing; Fu, Danghua; Zhao, Bo; Yin, Yuzhong; Yan, Xiahui; Wang, Xinjing; Zhang, Rongying; Liu, Jie; Li, Maoping; Tang, Yao; Hou, Rong; Zhang, Zhihe
2011-10-01
In order to investigate the mitochondrial genome of Panthera tigris amoyensis, two South China tigers (P25 and P27) were analyzed following 15 cymt-specific primer sets. The entire mtDNA sequence was found to be 16,957 bp and 17,001 bp long for P25 and P27, respectively, and this difference in length between P25 and P27 arose from the number of tandem repeats in the RS-3 segment of the control region. The structural characteristics of the complete P. t. amoyensis mitochondrial genomes were also highly similar to those of P. uncia. Additionally, the rate of point mutation was only 0.3%, and a total of 59 variable sites were found between P25 and P27. Of the 59 variable sites, 6 were located in 6 different tRNA genes, 6 in the 2 rRNA genes, 7 in non-coding regions (one located between tRNA-Asn and tRNA-Tyr and six in the D-loop), and 40 in 10 protein-coding genes. COI held the largest number of variable sites (9 sites) and Cytb had the highest variable rate (0.7%) in the complete sequences. Moreover, of the 40 variable sites located in 10 protein-coding genes, 12 sites were nonsynonymous.
Complete genome sequence of Streptosporangium roseum type strain (NI 9100T)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nolan, Matt; Sikorski, Johannes; Jando, Marlen
2010-01-01
Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The pinkish coiled Streptomyces-like organism with a spore case was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia
Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee
2016-01-01
Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B, however, the limited number of HRV-C complete genomes (complete 5′ and 3′ non-coding region and open reading frame sequences) has hindered the in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely, 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16 are the first newly sequenced complete HRV-C genomes. All seven Malaysian isolates genomes displayed nucleotide similarity of 63–81% among themselves and 63–96% with other HRV-Cs. Malaysian HRV-Cs had similar putative immunogenic sites, putative receptor utilization and potential antiviral sites as other HRV-Cs. The genomic features of Malaysian isolates were similar to those of other HRV-Cs. Negative selections were frequently detected in HRV-Cs complete coding sequences indicating that these sequences were under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could provide further aid in the understanding of HRV-C infection. PMID:27199901
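Nucleotide similarity figures such as the 63–81% range quoted here are typically derived from pairwise comparison of aligned sequences. The following is a minimal sketch of one common convention (percent identity over columns where neither sequence has a gap); it is not the authors' pipeline, and the function name and gap handling are assumptions made for illustration.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped; other
    conventions (counting gaps as mismatches) give different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 5 gap-free columns, 4 of which match.
print(pairwise_identity("ACGT-A", "ACGAAA"))
```

Applying such a function to every pair of genomes in an alignment yields the minimum and maximum values that are usually reported as a similarity range.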
Complete genome sequence of Chinese strain of ‘Candidatus Liberibacter asiaticus’
USDA-ARS's Scientific Manuscript database
The complete genome sequence of ‘Candidatus Liberibacter asiaticus’ (Las) strain Guangxi-1 (GX-1) was obtained using an Illumina HiSeq 2000. The GX-1 genome comprises 1,268,237 nucleotides, 36.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S ...
Iwanowicz, Luke R.; Iwanowicz, Deborah; Adams, Cynthia; Lewis, Teresa D.; Brandt, Thomas M.; Cornman, Robert S.; Sanders, Lakyn R.
2016-01-01
Here, we report the complete genome of a novel aquareovirus isolated from clinically normal fountain darters, Etheostoma fonticola, inhabiting the San Marcos River, Texas, USA. The complete genome consists of 23,958 bp consisting of 11 segments that range from 783 bp (S11) to 3,866 bp (S1). PMID:28007856
Samuels, Amy K; Weisrock, David W; Smith, Jeramiah J; France, Katherine J; Walker, John A; Putta, Srikrishna; Voss, S Randal
2005-04-11
We report on a study that extended mitochondrial transcript information from a recent EST project to obtain complete mitochondrial genome sequence for 5 tiger salamander complex species (Ambystoma mexicanum, A. t. tigrinum, A. andersoni, A. californiense, and A. dumerilii). We describe, for the first time, aspects of mitochondrial transcription in a representative amphibian, and then use complete mitochondrial sequence data to examine salamander phylogeny at both deep and shallow levels of evolutionary divergence. The available mitochondrial ESTs for A. mexicanum (N=2481) and A. t. tigrinum (N=1205) provided 92% and 87% coverage of the mitochondrial genome, respectively. Complete mitochondrial sequences for all species were rapidly obtained by using long distance PCR and DNA sequencing. A number of genome structural characteristics (base pair length, base composition, gene number, gene boundaries, codon usage) were highly similar among all species and to other distantly related salamanders. Overall, mitochondrial transcription in Ambystoma approximated the pattern observed in other vertebrates. We inferred from the mapping of ESTs onto mtDNA that transcription occurs from both heavy and light strand promoters and continues around the entire length of the mtDNA, followed by post-transcriptional processing. However, the observation of many short transcripts corresponding to rRNA genes indicates that transcription may often terminate prematurely to bias transcription of rRNA genes; indeed an rRNA transcription termination signal sequence was observed immediately following the 16S rRNA gene. Phylogenetic analyses of salamander family relationships consistently grouped Ambystomatidae in a clade containing Cryptobranchidae and Hynobiidae, to the exclusion of Salamandridae. This robust result suggests a novel alternative hypothesis because previous studies have consistently identified Ambystomatidae and Salamandridae as closely related taxa. 
Phylogenetic analyses of tiger salamander complex species also produced robustly supported trees. The D-loop, used in previous molecular phylogenetic studies of the complex, was found to contain a relatively low level of variation and we identified mitochondrial regions with higher rates of molecular evolution that are more useful in resolving relationships among species. Our results show the benefit of using complete genome mitochondrial information in studies of recently and rapidly diverged taxa.
Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A
1997-05-01
The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are present in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths essentially at the same position downstream of the stop codon for all tick-borne viruses. Two direct 27 nucleotide repeats at the 3'-end were found only for Vs and LI virus. Immediately following the deletions, a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation between the coding region and the 5' and 3'-termini of the genome, implying an equivalent strong selective control for translated and untranslated regions. Indeed, the predicted folding of the 5' and 3'-untranslated regions revealed patterns of stem and loop structures conserved for all tick-borne flaviviruses, suggesting a purifying selection for preservation of essential RNA secondary structures which could be involved in translational control and replication. The possible implications of these findings are discussed.
Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics
Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed
2016-01-01
In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of next-generation sequencing associated with effective computational approaches, a wide variety of plant species have been fully sequenced, giving a wealth of sequence data on the structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may provide a complete view at the systems biology level. Extensive investigations of reverse genetics methodologies were deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high-throughput strategies, highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003
Plastid genome sequence of an ornamental and editable fruit tree of Rosaceae, Prunus mume.
Wang, Shuo; Gao, Cheng-Wen; Gao, Li-Zhi
2016-11-01
Here we assembled and analyzed the complete chloroplast genome of Prunus mume, a popular ornamental and edible fruit tree of Rosaceae. The cp genome exhibited a circular DNA molecule of 157 712 bp with a typical quadripartite structure consisting of two inverted repeat regions (IRa and IRb) of 26 394 bp separated by large (LSC) and small (SSC) single-copy regions of 85 861 and 19 063 bp, respectively. It encoded 112 unique genes, 19 of which were duplicated in the IR regions, giving a total of 131 genes. Eighteen of these genes harbored one or two introns. GC content was 38.9%, and coding regions accounted for 51.3% of the genome. Phylogenetic analysis showed that P. mume clustered with P. persica and P. kansuensis in the genus Prunus. This newly determined chloroplast genome will enhance modern breeding programs for the purpose of genetic improvement of this valuable plant.
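The quadripartite layout described in this abstract implies a simple length identity: the genome is one LSC, one SSC and two copies of the inverted repeat. The short sketch below checks the reported numbers against that identity; the variable names are illustrative only.

```python
# Region lengths (bp) quoted in the abstract.
lsc_bp = 85_861   # large single-copy region
ssc_bp = 19_063   # small single-copy region
ir_bp = 26_394    # one inverted repeat (IRa and IRb are the same length)

# A quadripartite plastome is LSC + SSC + two IR copies.
genome_bp = lsc_bp + ssc_bp + 2 * ir_bp
print(genome_bp)  # 157712

# Gene counts: unique genes plus those present twice because of the IR.
unique_genes = 112
duplicated_in_ir = 19
print(unique_genes + duplicated_in_ir)  # 131
```

Both totals agree with the figures given in the abstract (157 712 bp and 131 genes).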
A Parvovirus B19 synthetic genome: sequence features and functional competence.
Manaresi, Elisabetta; Conti, Ilaria; Bua, Gloria; Bonvicini, Francesca; Gallinella, Giorgio
2017-08-01
Central to genetic studies for Parvovirus B19 (B19V) is the availability of genomic clones that may possess functional competence and ability to generate infectious virus. In our study, we established a new model genetic system for Parvovirus B19. A synthetic approach was followed, by design of a reference genome sequence, by generation of a corresponding artificial construct and its molecular cloning in a complete and functional form, and by setup of an efficient strategy to generate infectious virus, via transfection in UT7/EpoS1 cells and amplification in erythroid progenitor cells. The synthetic genome was able to generate virus with biological properties paralleling those of native virus, its infectious activity being dependent on the preservation of self-complementarity and sequence heterogeneity within the terminal regions. A virus of defined genome sequence, obtained from controlled cell culture conditions, can constitute a reference tool for investigation of the structural and functional characteristics of the virus. Copyright © 2017 Elsevier Inc. All rights reserved.
Molecular epidemiology of Epizootic haematopoietic necrosis virus (EHNV).
Hick, Paul M; Subramaniam, Kuttichantran; Thompson, Patrick M; Waltzek, Thomas B; Becker, Joy A; Whittington, Richard J
2017-11-01
Low genetic diversity of Epizootic haematopoietic necrosis virus (EHNV) was determined for the complete genome of 16 isolates spanning the natural range of hosts, geography and time since the first outbreaks of disease. Genomes ranged from 125,591 to 127,487 nucleotides with 97.47% pairwise identity and 106-109 genes. All isolates shared 101 core genes with 121 potential genes predicted within the pan-genome of this collection. There was high conservation within 90,181 nucleotides of the core genes with isolates separated by average genetic distance of 3.43 × 10⁻⁴ substitutions per site. Evolutionary analysis of the core genome strongly supported historical epidemiological evidence of iatrogenic spread of EHNV to naïve hosts and establishment of endemic status in discrete ecological niches. There was no evidence of structural genome reorganization; however, the complement of non-core genes and variation in repeat elements enabled fine scale molecular epidemiological investigation of this unpredictable pathogen of fish. Copyright © 2017 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rawat, Suman R.; Mannisto, Minna; Starovoytov, Valentin
2013-01-01
Granulicella tundricola strain MP5ACTX9T is a novel species of the genus Granulicella in subdivision 1 Acidobacteria. G. tundricola is a predominant member of soil bacterial communities, active at low temperatures and nutrient limiting conditions in Arctic alpine tundra. The organism is a cold-adapted acidophile and a versatile heterotroph that hydrolyzes a suite of sugars and complex polysaccharides. Genome analysis revealed metabolic versatility with genes involved in metabolism and transport of carbohydrates, including gene modules encoding for the carbohydrate-active enzyme (CAZy) families for the breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides such as plant based carbon polymers. The genome of G. tundricola strain MP5ACTX9T consists of 4,309,151 bp of a circular chromosome and five mega plasmids with a total genome content of 5,503,984 bp. The genome comprises 4,705 protein-coding genes and 52 RNA genes.
Molecular characterization of the complete genome of falconid herpesvirus strain S-18
USDA-ARS's Scientific Manuscript database
Falconid herpesvirus type 1 (FHV-1) is the causative agent of falcon inclusion body disease, an acute, highly contagious disease of raptors. The complete nucleotide sequence of the genome of FHV-1 has been determined. The genome is arranged as a D-type genome with large inverted repeats flanking a ...
Starkenburg, S. R.; Polle, J. E. W.; Hovde, B.; ...
2017-08-10
ABSTRACT The green alga Scenedesmus obliquus is an emerging platform species for the industrial production of biofuels. Here, we report the draft assembly and annotation for the nuclear, plastid, and mitochondrial genomes of S. obliquus strain DOE0152z.
USDA-ARS's Scientific Manuscript database
The complete genomic sequence of a novel putative member of the genus Potyvirus was detected from Callistephus chinensis (china aster) in South Korea. The genomic RNA consists of 9,859 nucleotides excluding the 3’ poly(A) tail. The Callistephus virus genome, which contains the typical open reading f...
2010-01-01
Background The family Tetranychidae (Chelicerata: Acari) includes ~1200 species, many of which are of agronomic importance. To date, mitochondrial genomes of only two Tetranychidae species have been sequenced, and it has been found that these two mitochondrial genomes are characterized by many unusual features in genome organization and structure such as gene order and nucleotide frequency. The scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). Information on Tetranychidae mitochondrial genomes is quite important for phylogenetic evaluation and population genetics, as well as the molecular evolution of functional genes such as acaricide-resistance genes. In this study, we sequenced the complete mitochondrial genome of Panonychus citri (Family Tetranychidae), a worldwide citrus pest, and provide a comparison to other Acari. Results The mitochondrial genome of P. citri is a typical circular molecule of 13,077 bp, and contains the complete set of 37 genes that are usually found in metazoans. This is the smallest mitochondrial genome within all sequenced Acari and other Chelicerata, primarily due to the significant size reduction of protein coding genes (PCGs), a large rRNA gene, and the A + T-rich region. The mitochondrial gene order for P. citri is the same as those for P. ulmi and Tetranychus urticae, but distinctly different from other Acari by a series of gene translocations and/or inversions. The majority of the P. citri mitochondrial genome has a high A + T content (85.28%), which is also reflected by AT-rich codons being used more frequently, but exhibits a positive GC-skew (0.03). The Acari mitochondrial nad1 exhibits a faster amino acid substitution rate than other genes, and the variation of nucleotide substitution patterns of PCGs is significantly correlated with the G + C content. Most tRNA genes of P. citri are extremely truncated and atypical (44-65, 54.1 ± 4.1 bp), lacking either the T- or D-arm, as found in P. ulmi, T. urticae, and other Acariform mites. Conclusions The P. citri mitochondrial gene order is markedly different from those of other chelicerates, but is conserved within the family Tetranychidae indicating that high rearrangements have occurred after Tetranychidae diverged from other Acari. Comparative analyses suggest that the genome size, gene order, gene content, codon usage, and base composition are strongly variable among Acari mitochondrial genomes. While extremely small and unusual tRNA genes seem to be common for Acariform mites, further experimental evidence is needed. PMID:20969792
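The A + T content and GC-skew statistics quoted in this abstract follow standard definitions: A + T content is (A + T) / (A + C + G + T), and GC-skew is (G − C) / (G + C) computed on one strand. The sketch below shows these definitions on a toy sequence; it is not the authors' code, the function name is hypothetical, and the strand to which the reported 0.03 refers is not stated in the abstract.

```python
def base_composition(seq: str) -> dict:
    """Return A+T content, AT-skew and GC-skew for one DNA strand.

    Only unambiguous bases (A, C, G, T) are counted; anything else
    (N, gaps) is ignored.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {
        "at_content": (a + t) / total,
        # Skews are undefined when the denominator is zero; report 0.0 then.
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Toy sequence: A=4, T=4, G=2, C=1.
stats = base_composition("AATTAATTGGC")
print(stats)
```

Because skew changes sign on the complementary strand, any comparison between genomes should use the same strand convention throughout.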
Yan, Yan; Wang, Yuyu; Liu, Xingyue; Winterton, Shaun L.; Yang, Ding
2014-01-01
In the holometabolous insect order Neuroptera (lacewings), the cosmopolitan Myrmeleontidae (antlions) are the most species-rich family, while the closely related Nymphidae (split-footed lacewings) are a small endemic family from the Australian-Malesian region. Both families belong to the suborder Myrmeleontiformia, within which controversial hypotheses on the interfamilial phylogenetic relationships exist. Herein, we describe the complete mitochondrial (mt) genomes of an antlion (Myrmeleon immanis Walker, 1853) and a split-footed lacewing (Nymphes myrmeleonoides Leach, 1814), representing the first mt genomes for both families. These mt genomes are relatively small (15,799 and 15,713 bp, respectively) compared to other lacewing mt genomes, and comprise 37 genes (13 protein coding genes, 22 tRNA genes and two rRNA genes). The arrangement of these two mt genomes is the same as in most derived Neuroptera mt genomes previously sequenced, specifically with a translocation of trnC. All PCGs start with ATN codons, with the exception of cox1, which starts with ACG in the M. immanis mt genome and TCG in N. myrmeleonoides. All tRNA genes have the typical clover-leaf structure of mitochondrial tRNA, with the exception of trnS1(AGN). The secondary structures of rrnL and rrnS are similar to those proposed for other insects, and domain I contains nine helices rather than eight, which is common within Neuroptera. A phylogenetic analysis based on the mt genomic data for all Neuropterida sequenced thus far supports the monophyly of Myrmeleontiformia and the sister relationship between Ascalaphidae and Myrmeleontidae. PMID:25170303
Yu, Xiang-Qin; Drew, Bryan T; Yang, Jun-Bo; Gao, Lian-Ming; Li, De-Zhu
2017-01-01
Schima is an ecologically and economically important woody genus in the tea family (Theaceae). Unresolved species delimitations and phylogenetic relationships within Schima limit our understanding of the genus and hinder utilization of the genus for economic purposes. In the present study, we conducted comparative analysis among the complete chloroplast (cp) genomes of 11 Schima species. Our results indicate that Schima cp genomes possess a typical quadripartite structure, with conserved genomic structure and gene order. The size of the Schima cp genome is about 157 kilobase pairs (kb). They consistently encode 114 unique genes, including 80 protein-coding genes, 30 tRNAs, and 4 rRNAs, with 17 duplicated in the inverted repeat (IR). These cp genomes are highly conserved and do not show obvious expansion or contraction of the IR region. The percent variability of the 68 coding and 93 noncoding (>150 bp) fragments is consistently less than 3%. The seven most widely touted DNA barcode regions as well as one promising barcode candidate showed low sequence divergence. Eight mutational hotspots were identified from the 11 cp genomes. These hotspots may potentially be useful as specific DNA barcodes for species identification of Schima. The 58 cpSSR loci reported here are complementary to the microsatellite markers identified from the nuclear genome, and will be leveraged for further population-level studies. Phylogenetic relationships among the 11 Schima species were resolved with strong support based on the cp genome data set, which corresponds well with the species distribution pattern. The data presented here will serve as a foundation to facilitate species identification, DNA barcoding and phylogenetic reconstructions for future exploration of Schima.
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.
Issac, Biju; Raghava, G P S
2002-09-01
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequence alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
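Two of the features mentioned in this abstract, submitting several sequences as one query and compositional analysis of proteins, can be illustrated with plain Python. The sketch below is not GWFASTA's code or interface; `parse_fasta` and `composition` are hypothetical helper names, and the input is assumed to be ordinary multi-record FASTA text.

```python
from collections import Counter


def parse_fasta(text: str) -> dict:
    """Split multi-record FASTA text into {identifier: sequence}."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token as the identifier.
            tokens = line[1:].split()
            current = tokens[0] if tokens else f"unnamed_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def composition(seq: str) -> dict:
    """Fraction of each residue in a sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {residue: n / total for residue, n in counts.items()}


query = ">seq1 first protein\nACDE\nFG\n>seq2\nMKV\n"
records = parse_fasta(query)
for name, seq in records.items():
    print(name, len(seq), composition(seq))
```

Each parsed record could then be passed separately to a similarity search, with results summarized per query.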
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rawat, Suman R.; Mannisto, Minna; Starovoytov, Valentin
2012-01-01
Terriglobus saanensis SP1PR4T is a novel species of the genus Terriglobus. T. saanensis is of ecological interest because it is a representative of the phylum Acidobacteria, which are dominant members of bacterial soil microbiota in Arctic ecosystems. T. saanensis is a cold-adapted acidophile and a versatile heterotroph utilizing a suite of simple sugars and complex polysaccharides. The genome contained an abundance of genes assigned to metabolism and transport of carbohydrates including gene modules encoding for carbohydrate-active enzyme (CAZyme) family involved in breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides. T. saanensis SP1PR4T represents the first member of genus Terriglobus with a completed genome sequence, consisting of a single replicon of 5,095,226 base pairs (bp), 54 RNA genes and 4,279 protein-coding genes. We infer that the physiology and metabolic potential of T. saanensis is adapted to allow for resilience to the nutrient-deficient conditions and fluctuating temperatures of Arctic tundra soils.
Laurimäe, Teivi; Kinkar, Liina; Romig, Thomas; Omer, Rihab A; Casulli, Adriano; Umhang, Gérald; Gasser, Robin B; Jabbar, Abdul; Sharbatkhori, Mitra; Mirhendi, Hossein; Ponce-Gordo, Francisco; Lazzarini, Lorena E; Soriano, Silvia V; Varcasia, Antonio; Nejad, Mohammad Rostami; Andresiuk, Vanessa; Maravilla, Pablo; González, Luis Miguel; Dybicz, Monika; Gawor, Jakub; Šarkūnas, Mindaugas; Šnábel, Viliam; Kuzmina, Tetiana; Saarma, Urmas
2018-06-12
Cystic echinococcosis (CE) is a zoonotic disease caused by the larval stage of the species complex Echinococcus granulosus sensu lato. Within this complex, genotypes G6 and G7 have been frequently associated with human CE worldwide. Previous studies exploring the genetic variability and phylogeography of genotypes G6 and G7 have been based on relatively short mtDNA sequences, and the resolution of these studies has often been low. Moreover, using short sequences, the distinction between G6 and G7 has in some cases remained challenging. The aim here was to sequence complete mitochondrial genomes (mitogenomes) to obtain deeper insight into the genetic diversity, phylogeny and population structure of genotypes G6 and G7. We sequenced complete mitogenomes of 94 samples collected from 15 different countries worldwide. The results demonstrated that (i) genotypes G6 and G7 can be clearly distinguished when mitogenome sequences are used; (ii) G7 is represented by two major haplogroups, G7a and G7b, the latter being specific to the islands of Corsica and Sardinia; (iii) intensive animal trade, but also geographical isolation, have likely had the largest impact on shaping the genetic structure and distribution of genotypes G6 and G7. In addition, we found a phylogenetically highly divergent haplotype from Mongolia (Gmon), which had a higher affinity to G6. Copyright © 2017. Published by Elsevier B.V.
Maina, Solomon; Edwards, Owain R; de Almeida, Luis; Ximenes, Abel; Jones, Roger A C
2017-05-11
Analysis of an RNA-seq library from cucumber leaf RNA extracted from a fast technology for analysis of nucleic acids (FTA) card revealed the first complete genome of Cucurbit aphid-borne yellows virus (CABYV) from East Timor. We compare it with 35 complete CABYV genomes from other world regions. It most resembled the genome of the South Korean isolate HD118. Copyright © 2017 Maina et al.
Genetic characterisation of the recent foot-and-mouth disease virus subtype A/IRN/2005
Klein, Joern; Hussain, Manzoor; Ahmad, Munir; Normann, Preben; Afzal, Muhammad; Alexandersen, Soren
2007-01-01
Background According to the World Reference Laboratory for FMD, a new subtype of FMDV serotype A was detected in Iran in 2005. This subtype was designated A/IRN/2005, and rapidly spread throughout Iran and moved westwards into Saudi Arabia and Turkey where it was initially detected from August 2005 and subsequently caused major disease problems in the spring of 2006. The same subtype reached Jordan in 2007. As part of an ongoing project we have also detected this subtype in Pakistan with the first positive samples detected in April 2006. To characterise this subtype in detail, we have sequenced and analysed the complete coding sequence of three subtype A/IRN/2005 isolates collected in Pakistan in 2006, the complete coding sequence of one subtype A/IRN/2005 isolate collected during the first outbreak in Turkey in 2005 and, in addition, the partial 1D coding sequence derived from 4 epithelium samples and 34 swab-samples from Asian buffaloes or cattle subsequently found to be infected with the A/IRN/2005 subtype. Results The phylogenies of the genome regions encoding for the structural proteins, displayed, with the exception of 1A, distinct, serotype-specific clustering and an evolutionary relationship of the A/IRN/2005 sublineage with the A22 sublineage. Potential recombination events have been detected in parts of the genome region coding for the non-structural proteins of FMDV. In addition, amino acid substitutions have been detected in the deduced VP1 protein sequence, potentially related to clinical or subclinical outcome of FMD. Indications of differential susceptibility for developing a subclinical course of disease between Asian buffaloes and cattle have been detected. Furthermore, hitherto unknown insertions of 2 amino acids before the second start codon, as well as sublineage specific amino acids have been detected in the genome region encoding for the leader proteinase of A/IRN/2005 sublineage. 
Conclusion Our findings indicate that the A/IRN/2005 sublineage has undergone two different paths of evolution for the structural and non-structural genome regions. The structural genome regions have had their evolutionary starting point in the A22 sublineage. It can be assumed that, due to the quasispecies structure of FMDV populations and the error-prone replication process, advantageous mutations in a changed environment have been fixed and have led to the occurrence of the new A/IRN/2005 sublineage. Together with this mechanism, recombination within the non-structural genome regions, potentially modifying the virulence of the virus, may be involved in the success of this new sublineage. The possible origin of this recombinant virus may be a co-infection with Asia1 and a serotype A precursor of the A/IRN/2005 sublineage, potentially within Asian buffaloes, as these appear to become infected relatively easily, but usually without developing clinical disease and consequently without showing a strong acute inflammatory immune response against a second FMDV infection. PMID:18001482
Pengelly, Reuben J; Tapper, William; Gibson, Jane; Knut, Marcin; Tearle, Rick; Collins, Andrew; Ennis, Sarah
2015-09-03
An understanding of linkage disequilibrium (LD) structures in the human genome underpins much of medical genetics and provides a basis for disease gene mapping and investigating biological mechanisms such as recombination and selection. Whole genome sequencing (WGS) provides the opportunity to determine LD structures at maximal resolution. We compare LD maps constructed from WGS data with LD maps produced from the array-based HapMap dataset, for representative European and African populations. WGS provides up to 5.7-fold greater SNP density than array-based data and achieves much greater resolution of LD structure, allowing for identification of up to 2.8-fold more regions of intense recombination. The absence of ascertainment bias in variant genotyping improves the population representativeness of the WGS maps, and highlights the extent of uncaptured variation using array genotyping methodologies. The complete capture of LD patterns using WGS allows for higher genome-wide association study (GWAS) power compared to array-based GWAS, with WGS also allowing for the analysis of rare variation. The impact of marker ascertainment issues in arrays has been greatest for Sub-Saharan African populations where larger sample sizes and substantially higher marker densities are required to fully resolve the LD structure. WGS provides the best possible resource for LD mapping due to the maximal marker density and lack of ascertainment bias. WGS LD maps provide a rich resource for medical and population genetics studies. The increasing availability of WGS data for large populations will allow for improved research utilising LD, such as GWAS and recombination biology studies.
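As background to the LD structures discussed in this abstract, the sketch below computes two standard pairwise LD statistics, D and r², from phased haplotypes at two biallelic sites. This is a generic textbook calculation, not the LD-map construction method used in the study; the function name and the 0/1 allele coding are assumptions for illustration.

```python
def pairwise_ld(haplotypes):
    """Return (D, r_squared) for two biallelic sites.

    `haplotypes` is a list of (allele_at_site_A, allele_at_site_B)
    tuples, with alleles coded 0 or 1, one tuple per phased haplotype.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n           # freq of allele 1 at A
    p_b = sum(b for _, b in haplotypes) / n           # freq of allele 1 at B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                              # disequilibrium coefficient
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # At least one site is monomorphic, so r^2 is undefined.
        return d, float("nan")
    return d, d * d / denom


# Alleles always co-occur: complete LD.
print(pairwise_ld([(1, 1), (1, 1), (0, 0), (0, 0)]))
# All four haplotypes equally frequent: no LD.
print(pairwise_ld([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

Denser marker sets, such as those from WGS, simply provide more site pairs over which statistics of this kind can be evaluated.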
Nelson, Leigh A; Cameron, Stephen L; Yeates, David K
2011-10-01
The monogeneric family Fergusoninidae consists of gall-forming flies that, together with Fergusobia (Tylenchida: Neotylenchidae) nematodes, form the only known mutualistic association between insects and nematodes. In this study, the entire 16,000 bp mitochondrial genome of Fergusonina taylori Nelson and Yeates was sequenced. The circular genome contains one encoding region including 27 genes and one non-coding A+T-rich region. The arrangement of the protein-coding, ribosomal RNA (rRNA) and transfer RNA (tRNA) genes was the same as that found in the ancestral insect. Nucleotide composition is highly A+T biased. All of the protein initiation codons are ATN, except for nad1 which begins with TTT. All 22 tRNA anticodons of F. taylori match those observed in Drosophila yakuba, and all form the typical cloverleaf structure except for tRNA-Ser(AGN), which lacks a dihydrouridine (DHU) arm. Secondary structural features of the rRNA genes of Fergusonina are similar to those proposed for other insects, with minor modifications. The mitochondrial genome of Fergusonina presented here may prove valuable for resolving the sister group to the Fergusoninidae, and expands the available mtDNA data sources for acalyptrates overall.
Adel, Susan; Kakularam, Kumar Reddy; Horn, Thomas; Reddanna, Pallu; Kuhn, Hartmut; Heydeck, Dagmar
2015-01-01
Mammalian lipoxygenases (LOXs) have been implicated in cell differentiation and in the biosynthesis of pro- and anti-inflammatory lipid mediators. The initial draft sequence of the Homo neanderthalensis genome (coverage of 1.3-fold) suggested defective leukotriene signaling in this archaic human subspecies since expression of essential proteins appeared to be corrupted. Meanwhile high quality genomic sequence data became available for two extinct human subspecies (H. neanderthalensis, Homo denisovan) and completion of the human 1000 genome project provided a comprehensive database characterizing the genetic variability of the human genome. For this study we extracted the nucleotide sequences of selected eicosanoid relevant genes (ALOX5, ALOX15, ALOX12, ALOX15B, ALOX12B, ALOXE3, COX1, COX2, LTA4H, LTC4S, ALOX5AP, CYSLTR1, CYSLTR2, BLTR1, BLTR2) from the corresponding databases. Comparison of the deduced amino acid sequences in connection with site-directed mutagenesis studies and structural modeling suggested that the major enzymes and receptors of leukotriene signaling as well as the two cyclooxygenase isoforms were fully functional in these two extinct human subspecies. Copyright © 2014 Elsevier Inc. All rights reserved.
The struggle for life of the genome's selfish architects
2011-01-01
Transposable elements (TEs) were first discovered more than 50 years ago, but were totally ignored for a long time. Over the last few decades they have gradually attracted increasing interest from research scientists. Initially they were viewed as totally marginal and anecdotal, but TEs have been revealed as potentially harmful parasitic entities, ubiquitous in genomes, and finally as unavoidable actors in the diversity, structure, and evolution of the genome. Since Darwin's theory of evolution, and the progress of molecular biology, transposable elements may be the discovery that has most influenced our vision of (genome) evolution. In this review, we provide a synopsis of what is known about the complex interactions that exist between transposable elements and the host genome. Numerous examples of these interactions are provided, first from the standpoint of the genome, and then from that of the transposable elements. We also explore the evolutionary aspects of TEs in the light of post-Darwinian theories of evolution. Reviewers This article was reviewed by Jerzy Jurka, Jürgen Brosius and I. King Jordan. For complete reports, see the Reviewers' reports section. PMID:21414203
Meganathan, P R; Pagan, Heidi J T; McCulloch, Eve S; Stevens, Richard D; Ray, David A
2012-01-15
Order Chiroptera is a unique group of mammals whose members have attained self-powered flight as their main mode of locomotion. Much speculation persists regarding bat evolution; however, lack of sufficient molecular data hampers evolutionary and conservation studies. Of ~1200 species, complete mitochondrial genome sequences are available for only eleven. Additional sequences should be generated if we are to resolve many questions concerning these fascinating mammals. Herein, we describe the complete mitochondrial genomes of three bats: Corynorhinus rafinesquii, Lasiurus borealis and Artibeus lituratus. We also compare the currently available mitochondrial genomes and analyze codon usage in Chiroptera. C. rafinesquii, L. borealis and A. lituratus mitochondrial genomes are 16438 bp, 17048 bp and 16709 bp, respectively. Genome organization and gene arrangements are similar to other bats. Phylogenetic analyses using complete mitochondrial genome sequences support previously established phylogenetic relationships and suggest utility in future studies focusing on the evolutionary aspects of these species. Comprehensive analyses of available bat mitochondrial genomes reveal distinct nucleotide patterns and synonymous codon preferences corresponding to different chiropteran families. These patterns suggest that mutational and selection forces are acting to different extents within Chiroptera and shape their mitochondrial genomes. Copyright © 2011 Elsevier B.V. All rights reserved.
Klorin, Geula; Rozenblum, Ester; Glebov, Oleg; Walker, Robert L; Park, Yoonsoo; Meltzer, Paul S; Kirsch, Ilan R; Kaye, Frederic J; Roschke, Anna V
2013-05-01
High-resolution oligonucleotide array comparative genomic hybridization (aCGH) and spectral karyotyping (SKY) were applied to a panel of malignant mesothelioma (MMt) cell lines. SKY has not been applied to MMt before, and complete karyotypes are reported based on the integration of SKY and aCGH results. A whole genome search for homozygous deletions (HDs) produced the largest set of recurrent and non-recurrent HDs for MMt (52 recurrent HDs in 10 genomic regions; 36 non-recurrent HDs). For the first time, LINGO2, RBFOX1/A2BP1, RPL29, DUSP7, and CCSER1/FAM190A were found to be homozygously deleted in MMt, and some of these genes could be new tumor suppressor genes for MMt. Integration of SKY and aCGH data allowed reconstruction of chromosomal rearrangements that led to the formation of HDs. Our data imply that only with acquisition of structural and/or numerical karyotypic instability can MMt cells attain a complete loss of tumor suppressor genes located in 9p21.3, which is the most frequently homozygously deleted region. Tetraploidization is a late event in the karyotypic progression of MMt cells, after HDs in the 9p21.3 region have already been acquired. Published by Elsevier Inc.
Coleman, Jonathan R I; Lester, Kathryn J; Keers, Robert; Roberts, Susanna; Curtis, Charles; Arendt, Kristian; Bögels, Susan; Cooper, Peter; Creswell, Cathy; Dalgleish, Tim; Hartman, Catharina A; Heiervang, Einar R; Hötzel, Katrin; Hudson, Jennifer L; In-Albon, Tina; Lavallee, Kristen; Lyneham, Heidi J; Marin, Carla E; Meiser-Stedman, Richard; Morris, Talia; Nauta, Maaike H; Rapee, Ronald M; Schneider, Silvia; Schneider, Sophie C; Silverman, Wendy K; Thastum, Mikael; Thirlwall, Kerstin; Waite, Polly; Wergeland, Gro Janne; Breen, Gerome; Eley, Thalia C
2016-09-01
Anxiety disorders are common, and cognitive-behavioural therapy (CBT) is a first-line treatment. Candidate gene studies have suggested a genetic basis to treatment response, but findings have been inconsistent. The aim was to perform the first genome-wide association study (GWAS) of psychological treatment response in children with anxiety disorders (n = 980). Presence and severity of anxiety was assessed using semi-structured interview at baseline, on completion of treatment (post-treatment), and 3 to 12 months after treatment completion (follow-up). DNA was genotyped using the Illumina Human Core Exome-12v1.0 array. Linear mixed models were used to test associations between genetic variants and response (change in symptom severity) immediately post-treatment and at 6-month follow-up. No variants passed a genome-wide significance threshold (P = 5 × 10^-8) in either analysis. Four variants met criteria for suggestive significance (P < 5 × 10^-6) in association with response post-treatment, and three variants in the 6-month follow-up analysis. This is the first genome-wide therapygenetic study. It suggests no common variants of very high effect underlie response to CBT. Future investigations should maximise power to detect single-variant and polygenic effects by using larger, more homogeneous cohorts. © The Royal College of Psychiatrists 2016.
Zhang, Yue; Feng, Shiqian; Zeng, Yiying; Ning, Hong; Liu, Lijun; Zhao, Zihua; Jiang, Fan; Li, Zhihong
2018-06-23
Bactrocera tsuneonis (Miyake), generally known as the Japanese orange fly, is considered to be a major pest of commercial citrus crops. It has a limited distribution in China, Japan and Vietnam, but it has the potential to invade areas outside of Asia. More genetic information of B. tsuneonis should be obtained in order to develop effective methodologies for rapid and accurate molecular identification due to the difficulty of distinguishing it from Bactrocera minax based on morphological features. We report here the whole mitochondrial genome of B. tsuneonis sequenced by next-generation sequencing. This mitogenome sequence had a total length of 15,865 bp, a typical circular molecule comprising 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The structure and organization of the molecule were typical and similar compared with the published homologous sequences of other fruit flies in Tephritidae. The phylogenetic analyses based on the mitochondrial genome data presented a close genetic relationship between B. tsuneonis and B. minax. This is the first report of the complete mitochondrial genome of B. tsuneonis, and it can be used in further studies of species diagnosis, evolutionary biology, prevention and control. Copyright © 2018. Published by Elsevier B.V.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-01-01
Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
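To make the dynamic programming approach mentioned above concrete, here is a minimal sketch of Smith–Waterman local alignment scoring. The scoring values (match, mismatch, linear gap) and the function name are assumptions chosen for brevity; the abstract does not state the parameters used, and protein comparisons in practice typically rely on a substitution matrix and affine gap penalties rather than this simplified scheme.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Simplified scoring (assumed for illustration): fixed match/mismatch
    values and a linear gap penalty. Only two rows of the dynamic
    programming table are kept, so memory use is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a new local alignment here
                prev[j - 1] + s,   # extend diagonally (match or mismatch)
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap, # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8: the shared ACGT core
```

Each comparison costs time proportional to the product of the two sequence lengths, which is why an all-against-all run over millions of sequences called for grid computing rather than a heuristic shortcut.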
Chen, Zhi-Teng; Du, Yu-Zhou
2015-03-01
The complete mitochondrial genome of the stonefly, Sweltsa longistyla Wu (Plecoptera: Chloroperlidae), was sequenced in this study. The mitogenome of S. longistyla is 16,151 bp and contains 37 genes including 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes, and a large non-coding region. S. longistyla, Pteronarcys princeps Banks, Kamimuria wangi Du and Cryptoperla stilifera Sivec belong to the Plecoptera, and the gene order and orientation of their mitogenomes were similar. The overall AT content for the four stoneflies was below 72%, and the AT content of tRNA genes was above 69%. The four genomes were compact and contained only 65-127 bp of non-coding intergenic DNAs. Overlapping nucleotides existed in all four genomes and ranged from 24 bp (P. princeps) to 178 bp (K. wangi). There was a 7-bp motif (ATGATAA) of overlapping DNA and an 8-bp motif (AAGCCTTA) conserved in three stonefly species (P. princeps, K. wangi and C. stilifera). The control regions of four stoneflies contained a stem-loop structure. Four conserved sequence blocks (CSBs) were present in the A+T-rich regions of all four stoneflies. Copyright © 2014 Elsevier B.V. All rights reserved.
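For readers unfamiliar with how an AT-content figure is obtained, the following sketch computes it as the percentage of A and T among unambiguous bases. The function name and the decision to ignore ambiguous symbols such as N are assumptions for this example, not details taken from the study; the two short motifs quoted in the abstract serve as toy input.

```python
def at_content(seq):
    """Return the percentage of A+T among the A/C/G/T bases in seq.

    Ambiguous symbols (e.g. N) are skipped; this is a choice made for
    the example rather than a documented convention of the study.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)


print(at_content("AAGCCTTA"))            # 62.5
print(round(at_content("ATGATAA"), 1))   # 85.7
```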
Coleman, Jonathan R. I.; Lester, Kathryn J.; Keers, Robert; Roberts, Susanna; Curtis, Charles; Arendt, Kristian; Bögels, Susan; Cooper, Peter; Creswell, Cathy; Dalgleish, Tim; Hartman, Catharina A.; Heiervang, Einar R.; Hötzel, Katrin; Hudson, Jennifer L.; In-Albon, Tina; Lavallee, Kristen; Lyneham, Heidi J.; Marin, Carla E.; Meiser-Stedman, Richard; Morris, Talia; Nauta, Maaike H.; Rapee, Ronald M.; Schneider, Silvia; Schneider, Sophie C.; Silverman, Wendy K.; Thastum, Mikael; Thirlwall, Kerstin; Waite, Polly; Wergeland, Gro Janne; Breen, Gerome; Eley, Thalia C.
2016-01-01
Background Anxiety disorders are common, and cognitive–behavioural therapy (CBT) is a first-line treatment. Candidate gene studies have suggested a genetic basis to treatment response, but findings have been inconsistent. Aims To perform the first genome-wide association study (GWAS) of psychological treatment response in children with anxiety disorders (n = 980). Method Presence and severity of anxiety was assessed using semi-structured interview at baseline, on completion of treatment (post-treatment), and 3 to 12 months after treatment completion (follow-up). DNA was genotyped using the Illumina Human Core Exome-12v1.0 array. Linear mixed models were used to test associations between genetic variants and response (change in symptom severity) immediately post-treatment and at 6-month follow-up. Results No variants passed a genome-wide significance threshold (P = 5 × 10^-8) in either analysis. Four variants met criteria for suggestive significance (P < 5 × 10^-6) in association with response post-treatment, and three variants in the 6-month follow-up analysis. Conclusions This is the first genome-wide therapygenetic study. It suggests no common variants of very high effect underlie response to CBT. Future investigations should maximise power to detect single-variant and polygenic effects by using larger, more homogeneous cohorts. PMID:26989097
Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
2014-01-01
Background Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. Results The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5′- or 3′-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. Conclusions We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome. PMID:24647560
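The quadripartite structure reported for D. antarctica implies a simple consistency check: the total length should equal the two single-copy regions plus two copies of the inverted repeat. The short sketch below performs that arithmetic with the figures quoted in the abstract; the function name is hypothetical.

```python
def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract.
total = plastome_length(lsc=79_881, ssc=12_519, ir=21_481)
print(total)  # 135362, matching the reported genome length
```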
Lee, Jungeun; Kang, Yoonjee; Shin, Seung Chul; Park, Hyun; Lee, Hyoungseok
2014-01-01
Antarctic hairgrass (Deschampsia antarctica Desv.) is the only natural grass species in the maritime Antarctic. It has been researched as an important ecological marker and as an extremophile plant for studies on stress tolerance. Despite its importance, little genomic information is available for D. antarctica. Here, we report the complete chloroplast genome, transcriptome profiles of the coding/noncoding genes, and the posttranscriptional processing by RNA editing in the chloroplast system. The complete chloroplast genome of D. antarctica is 135,362 bp in length with a typical quadripartite structure, including the large (LSC: 79,881 bp) and small (SSC: 12,519 bp) single-copy regions, separated by a pair of identical inverted repeats (IR: 21,481 bp). It contains 114 unique genes, including 81 unique protein-coding genes, 29 tRNA genes, and 4 rRNA genes. Sequence divergence analysis with other plastomes from the BEP clade of the grass family suggests a sister relationship between D. antarctica, Festuca arundinacea and Lolium perenne of the Poeae tribe, based on the whole plastome. In addition, we conducted high-resolution mapping of the chloroplast-derived transcripts. Thus, we created an expression profile for 81 protein-coding genes and identified ndhC, psbJ, rps19, psaJ, and psbA as the most highly expressed chloroplast genes. Small RNA-seq analysis identified 27 small noncoding RNAs of chloroplast origin that were preferentially located near the 5'- or 3'-ends of genes. We also found >30 RNA-editing sites in the D. antarctica chloroplast genome, with a dominance of C-to-U conversions. We assembled and characterized the complete chloroplast genome sequence of D. antarctica and investigated the features of the plastid transcriptome. These data may contribute to a better understanding of the evolution of D. antarctica within the Poaceae family for use in molecular phylogenetic studies and may also help researchers understand the characteristics of the chloroplast transcriptome.
Iwanowicz, Luke R; Iwanowicz, Deborah D; Adams, Cynthia R; Lewis, Teresa D; Brandt, Thomas M; Cornman, Robert S; Sanders, Lakyn
2016-12-22
Here, we report the complete genome of a novel aquareovirus isolated from clinically normal fountain darters, Etheostoma fonticola, inhabiting the San Marcos River, Texas, USA. The complete genome consists of 23,958 bp consisting of 11 segments that range from 783 bp (S11) to 3,866 bp (S1). Copyright © 2016 Iwanowicz et al.
First Complete Squash leaf curl China virus Genomic Segment DNA-A Sequence from East Timor
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2017-01-01
ABSTRACT We present here the first complete Squash leaf curl China virus (SLCCV) genomic segment DNA-A sequence from East Timor. It was isolated from a pumpkin plant. When compared with 15 complete SLCCV DNA-A genome sequences from other world regions, it most resembled the Malaysian isolate MC1 sequence. PMID:28619789
Ogura, Kohei; Watanabe, Shinya; Kirikae, Teruo; Miyoshi-Akiyama, Tohru
2017-01-01
Epidemiologic typing of Streptococcus pyogenes (GAS) is frequently based on the genotype of the emm gene, which encodes M/Emm protein. In this study, the complete genome sequence of GAS emm3 strain M3-b, isolated from a patient with streptococcal toxic shock syndrome (STSS), was determined. This strain exhibited 99% identity with other complete genome sequences of emm3 strains MGAS315, SSI-1, and STAB902. The complete genomes of five additional strains isolated from Japanese patients with and without STSS were also sequenced. Maximum-likelihood phylogenetic analysis showed that strains M3-b, M3-e, and SSI-1, all of which were isolated from STSS patients, were relatively close.
Transposable element evolution in Heliconius suggests genome diversity within Lepidoptera
2013-01-01
Background Transposable elements (TEs) have the potential to impact genome structure, function and evolution in profound ways. In order to understand the contribution of transposable elements (TEs) to Heliconius melpomene, we queried the H. melpomene draft sequence to identify repetitive sequences. Results We determined that TEs comprise ~25% of the genome. The predominant class of TEs (~12% of the genome) was the non-long terminal repeat (non-LTR) retrotransposons, including a novel SINE family. However, this was only slightly higher than content derived from DNA transposons, which are diverse, with several families having mobilized in the recent past. Compared to the only other well-studied lepidopteran genome, Bombyx mori, H. melpomene exhibits a higher DNA transposon content and a distinct repertoire of retrotransposons. We also found that H. melpomene exhibits a high rate of TE turnover with few older elements accumulating in the genome. Conclusions Our analysis represents the first complete, de novo characterization of TE content in a butterfly genome and suggests that, while TEs are able to invade and multiply, TEs have an overall deleterious effect and/or that maintaining a small genome is advantageous. Our results also hint that analysis of additional lepidopteran genomes will reveal substantial TE diversity within the group. PMID:24088337
A 'periodic table' for protein structures.
Taylor, William R
2002-04-11
Current structural genomics programs aim systematically to determine the structures of all proteins coded in both human and other genomes, providing a complete picture of the number and variety of protein structures that exist. In the past, estimates have been made on the basis of the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of limited sample size but also owing to the difficulties of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting to consider intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to or loss of structure can make large changes to these perceived topologies and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.
Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.
Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph
2013-01-25
Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.
Complete Genome Sequence of the Mesoplasma florum W37 Strain
Baby, Vincent; Matteau, Dominick; Knight, Thomas F.
2013-01-01
Mesoplasma florum is a small-genome fast-growing mollicute that is an attractive model for systems and synthetic genomics studies. We report the complete 825,824-bp genome sequence of a second representative of this species, M. florum strain W37, which contains 733 predicted open reading frames and 35 stable RNAs. PMID:24285658
Complete genome sequence of Sulfurimonas autotrophica type strain (OK10T)
Sikorski, Johannes; Munk, Christine; Lapidus, Alla; Ngatchou Djao, Olivier Duplex; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Han, Cliff; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Sims, David; Meincke, Linda; Brettin, Thomas; Detter, John C.; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter
2010-01-01
Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle as it oxidizes sulfur compounds to sulfate and by its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304749
Complete genome sequence of Aminobacterium colombiense type strain (ALA-1T)
Chertkov, Olga; Sikorski, Johannes; Brambilla, Evelyne; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C.; Bruce, David; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Spring, Stefan; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter
2010-01-01
Aminobacterium colombiense Baena et al. 1999 is the type species of the genus Aminobacterium. This genus is of large interest because of its isolated phylogenetic location in the family Synergistaceae, its strictly anaerobic lifestyle, and its ability to grow by fermentation of a limited range of amino acids but not carbohydrates. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second completed genome sequence of a member of the family Synergistaceae and the first genome sequence of a member of the genus Aminobacterium. The 1,980,592 bp long genome with its 1,914 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304712
Biswal, Devendra Kumar; Ghatani, Sudeep; Shylla, Jollin A.; Sahu, Ranjana; Mullapudi, Nandita
2013-01-01
Helminths include both parasitic nematodes (roundworms) and platyhelminths (trematode and cestode flatworms) that are abundant, and are of clinical importance. The genetic characterization of parasitic flatworms using advanced molecular tools is central to the diagnosis and control of infections. Although the nuclear genome houses suitable genetic markers (e.g., in ribosomal (r) DNA) for species identification and molecular characterization, the mitochondrial (mt) genome consistently provides a rich source of novel markers for informative systematics and epidemiological studies. In the last decade, there have been some important advances in mtDNA genomics of helminths, especially lung flukes, liver flukes and intestinal flukes. Fasciolopsis buski, often called the giant intestinal fluke, is one of the largest digenean trematodes infecting humans and found primarily in Asia, in particular the Indian subcontinent. Next-generation sequencing (NGS) technologies now provide opportunities for high throughput sequencing, assembly and annotation within a short span of time. Herein, we describe a high-throughput sequencing and bioinformatics pipeline for mt genomics for F. buski that emphasizes the utility of short read NGS platforms such as Ion Torrent and Illumina in successfully sequencing and assembling the mt genome using innovative approaches for PCR primer design as well as assembly. We took advantage of our NGS whole genome sequence data (unpublished so far) for F. buski and its comparison with available data for the Fasciola hepatica mtDNA as the reference genome for design of precise and specific primers for amplification of mt genome sequences from F. buski. A long-range PCR was carried out to create an NGS library enriched in mt DNA sequences. Two different NGS platforms were employed for complete sequencing, assembly and annotation of the F. buski mt genome. 
The complete mt genome sequence of the intestinal fluke comprises 14,118 bp and is thus the shortest trematode mitochondrial genome sequenced to date. The noncoding control regions are separated into two parts by the tRNA-Gly gene and do not contain either tandem repeats or secondary structures, which are typical for trematode control regions. The gene content and arrangement are identical to those of F. hepatica. The F. buski mtDNA genome closely resembles that of F. hepatica and has a similar gene order, tallying with that of other trematodes. The mtDNA of the intestinal fluke is reported herein for the first time and should help investigate Fasciolidae taxonomy and systematics with the aid of mtDNA NGS data. Moreover, it will serve as a resource for comparative mitochondrial genomics and systematic studies of trematode parasites. PMID:24255820
Dash, Paban Kumar; Sharma, Shashi; Soni, Manisha; Agarwal, Ankita; Sahni, Ajay Kumar; Parida, Manmohan
2015-01-02
Dengue is now hyper-endemic in most parts of south and southeast Asia, including India. Northern India, particularly the national capital New Delhi, has witnessed major dengue outbreaks over the last five years, with dengue virus type 1 (DENV-1) as the dominant serotype. This study was initiated to decipher the complete genome information of recently circulating DENV-1 (2009-2011) along with the prototype Indian DENV-1, isolated in 1956. Extensive ML phylogenetic and Bayesian phylogeography analyses were then carried out to investigate the evolution of this virus and understand its spatiotemporal diffusion across the globe. The complete genome analysis revealed deletion of a unique 21-nucleotide stretch in the 3' untranslated region of recent Indian DENV-1. The north Indian DENV-1 revealed up to 5.2% nucleotide sequence difference compared to recent isolates from southern India. Selection pressure analysis revealed positive selection at a few amino acid sites of both structural and non-structural proteins. The molecular phylogeny classified the Indian DENV-1 into genotype III, which is also known as the cosmopolitan genotype. The northern and southern Indian DENV-1 were grouped into distinct clades. The molecular clock analysis estimated a mean evolutionary rate of 7.08×10⁻⁴ substitutions/site/year for the cosmopolitan genotype. The phylogeography analysis revealed that the cosmopolitan genotype DENV-1 originated ∼1938 in India and subsequently spread globally. The diffusion of the virus from India to the Caribbean and South America was confirmed through SPREAD analysis. This study also confirmed the temporal displacement of different clades of DENV-1 in India over the last five decades. Copyright © 2014 Elsevier B.V. All rights reserved.
Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae
Huang, Yuan; Wang, Jun; Yang, Yongping; Fan, Chuanzhu; Chen, Jiahui
2017-01-01
Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide many more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in the family Salicaceae. The phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Phylogenetic incongruences, however, are observed in the genus Populus, which most likely result from homoplasy. By comparing the three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but have experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes have been lost or have become pseudogenes, from which we infer that these chloroplast genes were transferred to the nuclear genome. Based on the complete nuclear genome sequences from two Salicaceae species, we find, remarkably, that the entire chloroplast genome has indeed been transferred to and integrated into the nuclear genome at least once in the individual used for the reference genome of P. trichocarpa. This observation, along with the presence of large nuclear plastid DNA segments (NUPTs) and of NUPTs containing multiple chloroplast genes in their original chloroplast order, favors the DNA-mediated hypothesis of organelle-to-nucleus DNA transfer. Overall, the phylogenomic analysis using complete chloroplast genomes clearly elucidates the phylogeny of Salicaceae. 
The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in Salicaceae provides resources to better understand the successful adaptation of Salicaceae species. PMID:28676809
de Cambiaire, Jean-Charles; Otis, Christian; Lemieux, Claude; Turmel, Monique
2006-01-01
Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. While the basal position of the Prasinophyceae is well established, the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae (UTC) remains uncertain. The five complete chloroplast DNA (cpDNA) sequences currently available for representatives of these classes display considerable variability in overall structure, gene content, gene density, intron content and gene order. Among these genomes, that of the chlorophycean green alga Chlamydomonas reinhardtii has retained the fewest ancestral features. The two single-copy regions, which are separated from one another by the large inverted repeat (IR), have similar sizes, rather than unequal sizes, and differ radically in both gene content and gene organization relative to the single-copy regions of prasinophyte and ulvophyte cpDNAs. To gain insights into the various changes that the chloroplast genome underwent during the evolution of chlorophycean green algae, we have sequenced the cpDNA of Scenedesmus obliquus, a member of a distinct chlorophycean lineage. Results The 161,452 bp IR-containing genome of Scenedesmus features single-copy regions of similar sizes, encodes 96 genes, i.e. only two additional genes (infA and rpl12) relative to its Chlamydomonas homologue, and contains seven group I and two group II introns. It is clearly more compact than the four UTC algal cpDNAs that have been examined so far, displays the lowest proportion of short repeats among these algae and shows a stronger bias in clustering of genes on the same DNA strand compared to Chlamydomonas cpDNA. Like the latter genome, Scenedesmus cpDNA displays only a few ancestral gene clusters. 
The two chlorophycean genomes share 11 gene clusters that are not found in previously sequenced trebouxiophyte and ulvophyte cpDNAs as well as a few genes that have an unusual structure; however, their single-copy regions differ considerably in gene content. Conclusion Our results underscore the remarkable plasticity of the chlorophycean chloroplast genome. Owing to this plasticity, only a sketchy portrait could be drawn for the chloroplast genome of the last common ancestor of Scenedesmus and Chlamydomonas. PMID:16638149
Behere, G T; Firake, D M; Tay, W T; Azad Thakur, N S; Ngachan, S V
2016-01-01
Ladybird beetles are generally considered agriculturally beneficial insects, but those in the coleopteran subfamily Epilachninae are phytophagous and are major plant-feeding pests that cause severe economic losses to cucurbitaceous and solanaceous crops. Henosepilachna pusillanima (Mulsant) is one of the important pest species of ladybird beetle. In this report, we sequenced and characterized the complete mitochondrial genome of H. pusillanima using the Ion Torrent sequencing platform. The complete circular mitochondrial genome of H. pusillanima was determined to be 16,216 bp long. It contained a total of 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes and a control (A + T-rich) region estimated to be 1690 bp. The gene arrangement and orientations of the assembled mitogenome were identical to those reported for the predatory ladybird beetle Coccinella septempunctata L. This is the first completely sequenced coleopteran mitochondrial genome from the beetle subfamily Epilachninae from India. Data generated in this study will benefit future comparative genomics studies for understanding the evolutionary relationships between predatory and phytophagous coccinellid beetles.
Complete Genome Sequence of Bacteroides ovatus V975
Goesmann, Alexander; Carding, Simon R.
2016-01-01
The complete genome sequence of Bacteroides ovatus V975 was determined. The genome consists of a single circular chromosome of 6,475,296 bp containing five rRNA operons, 68 tRNA genes, and 4,959 coding genes. PMID:27908995
Qiao, Qin; Ren, Zhumei; Zhao, Jiayuan; Yonezawa, Takahiro; Hasegawa, Masami; Crabbe, M. James C; Li, Jianqiang; Zhong, Yang
2013-01-01
Background The central function of chloroplasts is to carry out photosynthesis, and its gene content and structure are highly conserved across land plants. Parasitic plants, which have reduced photosynthetic ability, suffer gene losses from the chloroplast (cp) genome accompanied by the relaxation of selective constraints. Compared with the rapid rise in the number of cp genome sequences of photosynthetic organisms, there are limited data sets from parasitic plants. Principal Findings/Significance Here we report the complete sequence of the cp genome of Cistanche deserticola, a holoparasitic desert species belonging to the family Orobanchaceae. The cp genome of C. deserticola is greatly reduced both in size (102,657 bp) and in gene content, indicating that all genes required for photosynthesis suffer from gene loss and pseudogenization, except for psbM. The striking difference from other holoparasitic plants is that it retains almost a full set of tRNA genes, and it has lower dN/dS for most genes than another close holoparasitic plant, E. virginiana, suggesting that Cistanche deserticola has undergone fewer losses, either due to a reduced level of holoparasitism, or to a recent switch to this life history. We also found that the rpoC2 gene was present in two copies within C. deserticola. Its own copy has much shortened and turned out to be a pseudogene. Another copy, which was not located in its cp genome, was a homolog of the host plant, Haloxylon ammodendron (Chenopodiaceae), suggesting that it was acquired from its host via a horizontal gene transfer. PMID:23554920
Altermann, Eric; Lu, Jingli; McCulloch, Alan
2017-01-01
Expert curated annotation remains one of the critical steps in achieving a reliable biological relevant annotation. Here we announce the release of GAMOLA2, a user friendly and comprehensive software package to process, annotate and curate draft and complete bacterial, archaeal, and viral genomes. GAMOLA2 represents a wrapping tool to combine gene model determination, functional Blast, COG, Pfam, and TIGRfam analyses with structural predictions including detection of tRNAs, rRNA genes, non-coding RNAs, signal protein cleavage sites, transmembrane helices, CRISPR repeats and vector sequence contaminations. GAMOLA2 has already been validated in a wide range of bacterial and archaeal genomes, and its modular concept allows easy addition of further functionality in future releases. A modified and adapted version of the Artemis Genome Viewer (Sanger Institute) has been developed to leverage the additional features and underlying information provided by the GAMOLA2 analysis, and is part of the software distribution. In addition to genome annotations, GAMOLA2 features, among others, supplemental modules that assist in the creation of custom Blast databases, annotation transfers between genome versions, and the preparation of Genbank files for submission via the NCBI Sequin tool. GAMOLA2 is intended to be run under a Linux environment, whereas the subsequent visualization and manual curation in Artemis is mobile and platform independent. The development of GAMOLA2 is ongoing and community driven. New functionality can easily be added upon user requests, ensuring that GAMOLA2 provides information relevant to microbiologists. The software is available free of charge for academic use. PMID:28386247
Li, Xi; Zhang, Ti-Cao; Qiao, Qin; Ren, Zhumei; Zhao, Jiayuan; Yonezawa, Takahiro; Hasegawa, Masami; Crabbe, M James C; Li, Jianqiang; Zhong, Yang
2013-01-01
The central function of chloroplasts is to carry out photosynthesis, and its gene content and structure are highly conserved across land plants. Parasitic plants, which have reduced photosynthetic ability, suffer gene losses from the chloroplast (cp) genome accompanied by the relaxation of selective constraints. Compared with the rapid rise in the number of cp genome sequences of photosynthetic organisms, there are limited data sets from parasitic plants. PRINCIPAL FINDINGS/SIGNIFICANCE: Here we report the complete sequence of the cp genome of Cistanche deserticola, a holoparasitic desert species belonging to the family Orobanchaceae. The cp genome of C. deserticola is greatly reduced both in size (102,657 bp) and in gene content, indicating that all genes required for photosynthesis suffer from gene loss and pseudogenization, except for psbM. The striking difference from other holoparasitic plants is that it retains almost a full set of tRNA genes, and it has lower dN/dS for most genes than another close holoparasitic plant, E. virginiana, suggesting that Cistanche deserticola has undergone fewer losses, either due to a reduced level of holoparasitism, or to a recent switch to this life history. We also found that the rpoC2 gene was present in two copies within C. deserticola. Its own copy has much shortened and turned out to be a pseudogene. Another copy, which was not located in its cp genome, was a homolog of the host plant, Haloxylon ammodendron (Chenopodiaceae), suggesting that it was acquired from its host via a horizontal gene transfer.
Complete genome sequence of 285P, a novel T7-like polyvalent E. coli bacteriophage.
Xu, Bin; Ma, Xiangyu; Xiong, Hongyan; Li, Yafei
2014-06-01
Bacteriophages are considered potential biological agents for the control of infectious diseases and environmental disinfection. Here, we describe a novel T7-like polyvalent Escherichia coli bacteriophage, designated "285P," which can lyse several strains of E. coli. The genome, which consists of 39,270 base pairs with a G+C content of 48.73 %, was sequenced and annotated. Forty-three potential open reading frames were identified using bioinformatics tools. Based on whole-genome sequence comparison, phage 285P was identified as a novel strain of subgroup T7. It showed strongest sequence similarity to Kluyvera phage Kvp1. The phylogenetic analyses of both non-structural proteins (endonuclease gp3, amidase gp3.5, DNA primase/helicase gp4, DNA polymerase gp5, and exonuclease gp6) and structural protein (tail fiber protein gp17) led to the identification of 285P as T7-like phage. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis and matrix-assisted laser desorption/ionization time-of-flight mass spectrometric analyses verified the annotation of the structural proteins (major capsid protein gp10a, tail protein gp12, and tail fiber protein gp17).
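The G+C content quoted above (48.73 % for the 39,270-bp genome) is a simple base-composition ratio. As a minimal sketch, assuming the genome is available as a plain nucleotide string, it could be computed along the following lines. The function name `gc_content` is illustrative and does not come from the paper, and the toy sequences in the comments are not phage 285P data.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted in the denominator,
    so ambiguity codes such as N do not distort the ratio.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


# Toy usage (hypothetical sequences, not from the 285P genome):
# gc_content("ATGC") evaluates to 50.0
# gc_content("GGCCNN") evaluates to 100.0, because N is ignored
```

Excluding ambiguous bases from the denominator is a design choice; some tools count them instead, which gives slightly lower percentages on draft assemblies.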
Hassan, Syed S.; Jamal, Syed B.; Radusky, Leandro G.; Tiwari, Sandeep; Ullah, Asad; Ali, Javed; Behramand; de Carvalho, Paulo V. S. D.; Shams, Rida; Khan, Sabir; Figueiredo, Henrique C. P.; Barh, Debmalya; Ghosh, Preetam; Silva, Artur; Baumbach, Jan; Röttger, Richard; Turjanski, Adrián G.; Azevedo, Vasco A. C.
2018-01-01
Diphtheria, an acute and highly infectious disease previously regarded as endemic in nature but vaccine-preventable, is caused by Corynebacterium diphtheriae (Cd). In this work, we used an in silico approach on the 13 complete genome sequences of C. diphtheriae, followed by a computational assessment of structural information of the binding sites, to characterize the “pocketome druggability.” To this end, we first computed the “modelome” (3D structures of a complete genome) of a randomly selected reference strain, Cd NCTC13129, which had 13,763 open reading frames (ORFs) and resulted in 1,253 (∼9%) structure models. The amino acid sequences of these modeled structures were compared with the remaining 12 genomes and, consequently, 438 conserved protein sequences were obtained. The RCSB-PDB database was consulted to check the template structures for these conserved proteins and, as a result, 401 adequate 3D models were obtained. We subsequently predicted the protein pockets for the obtained set of models and kept only the conserved pockets that had highly druggable (HD) values (137 across all strains). Later, an off-target host homology analysis was performed against the human proteome using the NCBI database. Furthermore, a gene essentiality analysis was carried out, which gave a final set of 10 conserved targets possessing highly druggable protein pockets. To check the robustness of target identification by the pipeline used in this work, we cross-checked the final target list against another in-house target identification approach for C. diphtheriae, thereby obtaining three common targets: hisE (phosphoribosyl-ATP pyrophosphatase), glpX (fructose 1,6-bisphosphatase II), and rpsH (30S ribosomal protein S8). Our predicted results suggest that the in silico approach used could potentially aid in experimental polypharmacological target determination in C. diphtheriae and other pathogens and thereby might complement existing and new drug-discovery pipelines. 
PMID:29487617
Dominova, I. N.; Kublanov, I. V.; Podosokorskaya, O. A.; Derbikova, K. S.; Patrushev, M. V.
2013-01-01
The complete genomic sequence of a novel hyperthermophilic crenarchaeon, strain 1910bT, was determined. The genome comprises a 1,750,259-bp circular chromosome containing single copies of 3 rRNA genes, 43 tRNA genes, and 1,896 protein-coding sequences. In silico genome-genome hybridization suggests the proposal of a novel species, “Thermofilum adornatus” strain 1910bT. PMID:24029764
The bipartite mitochondrial genome of Ruizia karukerae (Rhigonematomorpha, Nematoda).
Kim, Taeho; Kern, Elizabeth; Park, Chungoo; Nadler, Steven A; Bae, Yeon Jae; Park, Joong-Ki
2018-05-10
Mitochondrial genes and whole mitochondrial genome sequences are widely used as molecular markers in studying population genetics and resolving both deep and shallow nodes in phylogenetics. In animals the mitochondrial genome is generally composed of a single chromosome, but mystifying exceptions sometimes occur. We determined the complete mitochondrial genome of the millipede-parasitic nematode Ruizia karukerae and found its mitochondrial genome consists of two circular chromosomes, which is highly unusual in bilateral animals. Chromosome I is 7,659 bp and includes six protein-coding genes, two rRNA genes and nine tRNA genes. Chromosome II comprises 7,647 bp, with seven protein-coding genes and 16 tRNA genes. Interestingly, both chromosomes share a 1,010 bp sequence containing duplicate copies of cox2 and three tRNA genes (trnD, trnG and trnH), and the nucleotide sequences between the duplicated homologous gene copies are nearly identical, suggesting a possible recent genesis for this bipartite mitochondrial genome. Given that little is known about the formation, maintenance or evolution of abnormal mitochondrial genome structures, R. karukerae mtDNA may provide an important early glimpse into this process.
2012-01-01
Background Efficient, robust, and accurate genotype imputation algorithms make large-scale application of genomic selection cost effective. An algorithm that imputes alleles or allele probabilities for all animals in the pedigree and for all genotyped single nucleotide polymorphisms (SNP) provides a framework to combine all pedigree, genomic, and phenotypic information into a single-stage genomic evaluation. Methods An algorithm was developed for imputation of genotypes in pedigreed populations that allows imputation for completely ungenotyped animals and for low-density genotyped animals, accommodates a wide variety of pedigree structures for genotyped animals, imputes unmapped SNP, and works for large datasets. The method involves simple phasing rules, long-range phasing and haplotype library imputation and segregation analysis. Results Imputation accuracy was high and computational cost was feasible for datasets with pedigrees of up to 25 000 animals. The resulting single-stage genomic evaluation increased the accuracy of estimated genomic breeding values compared to a scenario in which phenotypes on relatives that were not genotyped were ignored. Conclusions The developed imputation algorithm and software and the resulting single-stage genomic evaluation method provide powerful new ways to exploit imputation and to obtain more accurate genetic evaluations. PMID:22462519
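The abstract does not spell out its "simple phasing rules". As an illustration only, and not necessarily the rule set used by the authors, one generic Mendelian rule is that a homozygous parent can transmit only one allele, so the parental origin of a heterozygous offspring's alleles follows directly. The sketch below assumes biallelic SNP genotypes given as 2-tuples of allele labels; the function name `phase_child` is hypothetical, and Mendelian inconsistencies are not checked.

```python
def phase_child(child, sire, dam):
    """Phase one SNP genotype of an offspring from its parents' genotypes.

    Each genotype is a 2-tuple of allele labels, e.g. ("A", "B").
    Returns (paternal_allele, maternal_allele) when this single rule
    determines the phase, otherwise None.
    """
    a, b = child
    # A homozygous child is trivially phased.
    if a == b:
        return (a, a)
    # A homozygous sire must have transmitted his only allele.
    if sire[0] == sire[1] and sire[0] in child:
        paternal = sire[0]
        maternal = b if a == paternal else a
        return (paternal, maternal)
    # The same argument applies to a homozygous dam.
    if dam[0] == dam[1] and dam[0] in child:
        maternal = dam[0]
        paternal = b if a == maternal else a
        return (paternal, maternal)
    # Both parents heterozygous: this rule alone cannot resolve the phase.
    return None
```

Cases that return `None` are the ones for which approaches such as the long-range phasing and haplotype library imputation mentioned in the abstract would be needed.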
Lei, Yong-Liang; Wang, Xiao-Guang; Tao, Xiao-Yan; Li, Hao; Meng, Sheng-Li; Chen, Xiu-Ying; Liu, Fu-Ming; Ye, Bi-Feng; Tang, Qing
2010-01-01
By sequencing the full-length genomes of four rabies virus isolates of Chinese ferret-badger and dog origin, we analyzed the genetic variation of rabies viruses at the molecular level, obtained information about rabies virus prevalence and variation in Zhejiang, and enriched the genome database of rabies virus street strains isolated in China. Rabies viruses were isolated in suckling mice, overlapping fragments were amplified by RT-PCR, and full-length genomes were assembled; nucleotide and deduced protein similarities and phylogenetic relationships were then determined against strains from Chinese ferret-badger, dog, sika deer and vole, and against the vaccine strain in use. The four full-length genomes were sequenced completely and shared the same genetic structure, with a length of 11,923 nt or 11,925 nt, comprising a 58-nt leader, 1,353-nt NP, 894-nt PP, 609-nt MP, 1,575-nt GP and 6,386-nt LP genes, intergenic regions (IGRs) of 2, 5 and 5 nt, a 423-nt pseudogene-like sequence (psi) and a 70-nt trailer. According to BLAST and multiple sequence alignment, the four full-length genomes were consistent with the properties of the genus Lyssavirus, family Rhabdoviridae. The nucleotide and amino acid sequences of Chinese strains showed the highest similarity to one another, especially among strains from animals of the same species. For the four full-length genomes, similarity at the amino acid level was markedly higher than at the nucleotide level, so most of the nucleotide mutations in these genomes were synonymous. Compared with the reference rabies viruses, the five protein-coding regions showed no change in length and no recombination, only a few point mutations. It was evident that the five proteins appeared to be stable. The variation sites and types of the four genomes were similar to those of the reference vaccine or street strains. According to the multiple sequence alignment and phylogenetic analyses, the four strains belonged to genotype 1 and showed the distinct regional characteristics of Chinese strains. 
Therefore, these four rabies viruses are likely to be street viruses already existing in the natural world.
Visualization of Genome Diversity in German Shepherd Dogs.
Mortlock, Sally-Anne; Booth, Rachel; Mazrier, Hamutal; Khatkar, Mehar S; Williamson, Peter
2015-01-01
A loss of genetic diversity may lead to increased disease risks in subpopulations of dogs. The canine breed structure has contributed to relatively small effective population size in many breeds and can limit the options for selective breeding strategies to maintain diversity. With the completion of the canine genome sequencing project, and the subsequent reduction in the cost of genotyping on a genomic scale, evaluating diversity in dogs has become much more accurate and accessible. This provides a potential tool for advising dog breeders and developing breeding programs within a breed. A challenge in doing this is to present complex relationship data in a form that can be readily utilized. Here, we demonstrate the use of a pipeline, known as NetView, to visualize the network of relationships in a subpopulation of German Shepherd Dogs.
Pettengill, Emily A.; Hoffmann, Maria; Roberts, Richard J.; Payne, Justin; Allard, Marc; Michelacci, Valeria; Minelli, Fabio; Morabito, Stefano
2015-01-01
We present here the complete genome sequence of a strain of enteroinvasive Escherichia coli O96:H19 from a severe foodborne outbreak in a canteen in Italy in 2014. The complete genome may provide important information about the acquired pathogenicity of this strain and the transition between commensal and pathogenic E. coli. PMID:26251502
Govin, Jerome; Gaucher, Jonathan; Ferro, Myriam; Debernardi, Alexandra; Garin, Jerome; Khochbin, Saadi; Rousseaux, Sophie
2012-01-01
After meiosis, during the final stages of spermatogenesis, the haploid male genome undergoes major structural changes, resulting in a shift from a nucleosome-based genome organization to the sperm-specific, highly compacted nucleoprotamine structure. Recent data support the idea that region-specific programming of the haploid male genome is of high importance for the post-fertilization events and for successful embryo development. Although these events constitute a unique and essential step in reproduction, the mechanisms by which they occur have remained completely obscure and the factors involved have mostly remained uncharacterized. Here, we sought a strategy to significantly increase our understanding of proteins controlling the haploid male genome reprogramming, based on the identification of proteins in two specific pools: those with the potential to bind nucleic acids (basic proteins) and proteins capable of binding basic proteins (acidic proteins). For the identification of acidic proteins, we developed an approach involving a transition-protein (TP)-based chromatography, which has the advantage of retaining not only acidic proteins due to the charge interactions, but also potential TP-interacting factors. A second strategy, based on an in-depth bioinformatic analysis of the identified proteins, was then applied to pinpoint within the lists obtained, male germ cells expressed factors relevant to the post-meiotic genome organization. This approach reveals a functional network of DNA-packaging proteins and their putative chaperones and sheds a new light on the way the critical transitions in genome organizations could take place. This work also points to a new area of research in male infertility and sperm quality assessments.
One Bacterial Cell, One Complete Genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos
2010-04-26
While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid, with genome copies ranging from approximately 200 to 900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.
2012-01-01
Background The complete sequences of chloroplast genomes provide a wealth of information regarding the evolutionary history of species. With the advance of next-generation sequencing technology, the number of completely sequenced chloroplast genomes is expected to increase exponentially, and powerful computational tools for annotating these genome sequences are urgently needed. Results We have developed a web server, CPGAVAS. The server accepts a complete chloroplast genome sequence as input. First, it predicts protein-coding and rRNA genes based on the identification and mapping of the most similar, full-length protein, cDNA and rRNA sequences by integrating results from the Blastx, Blastn, protein2genome and est2genome programs. Second, tRNA genes and inverted repeats (IR) are identified using tRNAscan and ARAGORN, and vmatch, respectively. Third, it calculates the summary statistics for the annotated genome. Fourth, it generates a circular map ready for publication. Fifth, it can create a Sequin file for GenBank submission. Last, it allows the extraction of protein and mRNA sequences for a given list of genes and species. The annotation results in GFF3 format can be edited using any compatible annotation editing tool. The edited annotations can then be uploaded to CPGAVAS repeatedly for update and re-analysis. Using known chloroplast genome sequences as a test set, we show that CPGAVAS performs comparably to another application, DOGMA, while having several superior functionalities. Conclusions CPGAVAS allows the semi-automatic and complete annotation of a chloroplast genome sequence, and the visualization, editing and analysis of the annotation results. It will become an indispensable tool for researchers studying chloroplast genomes. The software is freely accessible from http://www.herbalgenomics.org/cpgavas. PMID:23256920
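The "summary statistics" step can be pictured with a small GFF3 reader. The following is a minimal sketch, assuming standard nine-column, tab-separated GFF3 with 1-based inclusive coordinates; it is not CPGAVAS code, and the function name `summarize_gff3`, the sequence ID `cp1` and the coordinates in the example records are made up for illustration.

```python
from collections import Counter


def summarize_gff3(text: str):
    """Count features and total covered length per feature type in GFF3 text.

    Returns two Counters keyed by feature type: number of features and
    summed feature length in bp (end - start + 1, since GFF3 coordinates
    are 1-based and inclusive).
    """
    counts, lengths = Counter(), Counter()
    for line in text.splitlines():
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        counts[ftype] += 1
        lengths[ftype] += end - start + 1
    return counts, lengths


# Hypothetical example records (coordinates invented for illustration).
EXAMPLE = "\n".join([
    "##gff-version 3",
    "\t".join(["cp1", "demo", "gene", "1", "300", ".", "+", ".", "ID=gene1"]),
    "\t".join(["cp1", "demo", "gene", "500", "1099", ".", "-", ".", "ID=gene2"]),
    "\t".join(["cp1", "demo", "tRNA", "1200", "1272", ".", "+", ".", "ID=trn1"]),
])
```

Calling `summarize_gff3(EXAMPLE)` on these records counts two `gene` features covering 900 bp and one `tRNA` feature covering 73 bp.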
Yang, G; Liu, X G; Qiu, B S
2000-07-01
The complete nucleotide sequences of two Chinese tobacco mosaic virus (TMV) isolates, TMV-Cv (vulgare strain) and TMV-N14 (an attenuated virus originating from a tomato strain), were determined from their respective full-length infectious cDNA clones and compared with published TMV sequences. The genome of TMV-Cv contained 6395 nucleotides, in which four functional open reading frames (ORFs), coding for replicase (126 kD/183 kD), movement protein (MP, 30 kD) and coat protein (CP, 17.6 kD), respectively, could be recognized. TMV-N14 contained 6384 nucleotides in its genome. In contrast to TMV-Cv, five functional ORFs, encoding the replicase (98.5 kD/126 kD/183 kD), MP (27 kD) and CP (17.6 kD), respectively, were detected in the TMV-N14 genome. TMV-Cv is 99% homologous to a Korean TMV isolate belonging to the vulgare strain at the nucleotide level. TMV-N14 is 99% homologous to a highly virulent Japanese isolate, TMV-L (tomato strain), at the nucleotide level. In TMV-N14, one opal mutation (UGA) occurred in the replicase gene and one ochre mutation (UAA) in the MP gene. The former mutation created a potential additional ORF within the replicase gene; the latter reduced the size of the MP to 27 kD. In addition, there were also 13 amino acid substitutions in the replicase gene of TMV-N14 when compared to that of TMV-L. Collectively, these changes may have significant implications for the attenuation of the virulence of TMV-N14.
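The effect of an opal (UGA) or ochre (UAA) mutation on ORF length can be shown with a short in-frame scan. This is a minimal sketch, assuming an RNA or cDNA string read from a known start position; the function name `first_stop` is illustrative, and the toy sequences in the comments are hypothetical, not TMV-N14 or TMV-L sequences.

```python
# Conventional names of the three stop codons in the standard genetic code.
STOP_CODONS = {"UAA": "ochre", "UAG": "amber", "UGA": "opal"}


def first_stop(seq: str, start: int = 0):
    """Find the first in-frame stop codon at or after `start`.

    Accepts RNA or DNA letters (T is read as U). Returns a tuple
    (codon_index, stop_name), where codon_index counts codons from
    `start`, or None if the frame contains no stop codon.
    """
    rna = seq.upper().replace("T", "U")
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            return (i - start) // 3, STOP_CODONS[codon]
    return None


# Toy usage (hypothetical sequences):
# wild type  AUG UGG GCU UAG  -> first_stop(...) gives (3, "amber")
# mutant     AUG UGA GCU UAG  -> first_stop(...) gives (1, "opal")
```

In the toy mutant, a single G-to-A change turns the codon UGG into UGA and moves the first stop from codon 3 to codon 1, which is the kind of truncation the abstract describes for the MP gene and the extra replicase ORF.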
Norman, Paul J.; Norberg, Steven J.; Guethlein, Lisbeth A.; Nemat-Gorgani, Neda; Royce, Thomas; Wroblewski, Emily E.; Dunn, Tamsen; Mann, Tobias; Alicata, Claudia; Hollenbach, Jill A.; Chang, Weihua; Shults Won, Melissa; Gunderson, Kevin L.; Abi-Rached, Laurent; Ronaghi, Mostafa; Parham, Peter
2017-01-01
The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population scale studies of this complicated and clinically important region of the human genome. PMID:28360230
Ginkgo and Welwitschia Mitogenomes Reveal Extreme Contrasts in Gymnosperm Mitochondrial Evolution.
Guo, Wenhu; Grewe, Felix; Fan, Weishu; Young, Gregory J; Knoop, Volker; Palmer, Jeffrey D; Mower, Jeffrey P
2016-06-01
Mitochondrial genomes (mitogenomes) of flowering plants are well known for their extreme diversity in size, structure, gene content, and rates of sequence evolution and recombination. In contrast, little is known about mitogenomic diversity and evolution within gymnosperms. Only a single complete genome sequence is available, from the cycad Cycas taitungensis, while limited information is available for the one draft sequence, from Norway spruce (Picea abies). To examine mitogenomic evolution in gymnosperms, we generated complete genome sequences for the ginkgo tree (Ginkgo biloba) and a gnetophyte (Welwitschia mirabilis). There is great disparity in size, sequence conservation, levels of shared DNA, and functional content among gymnosperm mitogenomes. The Cycas and Ginkgo mitogenomes are relatively small, have low substitution rates, and possess numerous genes, introns, and edit sites; we infer that these properties were present in the ancestral seed plant. By contrast, the Welwitschia mitogenome has an expanded size coupled with accelerated substitution rates and extensive loss of these functional features. The Picea genome has expanded further, to more than 4 Mb. With regard to structural evolution, the Cycas and Ginkgo mitogenomes share a remarkable amount of intergenic DNA, which may be related to the limited recombinational activity detected at repeats in Ginkgo. Conversely, the Welwitschia mitogenome shares almost no intergenic DNA with any other seed plant. By conducting the first measurements of rates of DNA turnover in seed plant mitogenomes, we discovered that turnover rates vary by orders of magnitude among species.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleischmann, R.D.; Adams, M.D.; White, O.
1995-07-28
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Completed Genome Sequences of Strains from 36 Serotypes of Salmonella
Robertson, James; Yoshida, Catherine; Gurnik, Simone; Rankin, Marisa
2018-01-01
ABSTRACT We report here the completed closed genome sequences of strains representing 36 serotypes of Salmonella. These genome sequences will provide useful references for understanding the genetic variation between serotypes, particularly as references for mapping of raw reads or to create assemblies of higher quality, as well as to aid in studies of comparative genomics of Salmonella. PMID:29348347
Characterization of the complete chloroplast genome of Platycarya strobilacea (Juglandaceae)
Jing Yan; Kai Han; Shuyun Zeng; Peng Zhao; Keith Woeste; Jianfang Li; Zhan-Lin Liu
2017-01-01
The whole chloroplast genome (cp genome) sequence of Platycarya strobilacea was characterized from Illumina pair-end sequencing data. The complete cp genome was 160,994 bp in length and contained a large single copy region (LSC) of 90,225 bp and a small single copy region (SSC) of 18,371 bp, which were separated by a pair of inverted repeat regions...
Metagenomic Analysis of Cucumber RNA from East Timor Reveals an Aphid lethal paralysis virus Genome
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2017-01-01
ABSTRACT We present here the first complete genomic Aphid lethal paralysis virus (ALPV) sequence isolated from cucumber plant RNA from East Timor. We compare it with two complete ALPV genome sequences from China, and one each from Israel, South Africa, and the United States. It most closely resembled the Chinese isolate LGH genome. PMID:28082492
First Complete Genome Sequence of Suakwa aphid-borne yellows virus from East Timor
Maina, Solomon; Edwards, Owain R.; de Almeida, Luis; Ximenes, Abel
2016-01-01
We present here the first complete genomic RNA sequence of the polerovirus Suakwa aphid-borne yellows virus (SABYV), from East Timor. The isolate sequenced came from a virus-infected pumpkin plant. The East Timorese genome had a nucleotide identity of 86.5% with the only other SABYV genome available, which is from Taiwan. PMID:27469955
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-01-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus splendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus).
Li, Linmiao; Li, Min; Wu, Zhengjun; Chen, Jinping
2015-01-01
We have characterized the complete mitochondrial genome of Cynopterus sphinx (Pteropodidae: Cynopterus) and described its organization in this study. The total length of the C. sphinx complete mitochondrial genome was 16,895 bp, with a base composition of 32.54% A, 14.05% G, 25.82% T and 27.59% C. The complete mitochondrial genome included 13 protein-coding genes, 22 tRNA genes, 2 rRNA genes (12S rRNA and 16S rRNA) and 1 control region (D-loop). The control region was 1435 bp long and contained the sequence CATACG repeated 64 times. Three protein-coding genes (ND1, COI and ND4) ended with an incomplete stop codon, TA or T.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Norton, Jeanette M.; Klotz, Martin G; Stein, Lisa Y
2008-01-01
The complete genome of the ammonia-oxidizing bacterium, Nitrosospira multiformis (ATCC 25196T), consists of a circular chromosome and three small plasmids totaling 3,234,309 bp and encoding 2827 putative proteins. Of these, 2026 proteins have predicted functions and 801 are without conserved functional domains, yet 747 of these have similarity to other predicted proteins in databases. Gene homologs from Nitrosomonas europaea and N. eutropha were the best match for 42% of the predicted genes in N. multiformis. The genome contains three nearly identical copies of amo and hao gene clusters as large repeats. Distinguishing features compared to N. europaea include: the presence of gene clusters encoding urease and hydrogenase, a RuBisCO-encoding operon of distinctive structure and phylogeny, and a relatively small complement of genes related to Fe acquisition. Systems for synthesis of a pyoverdine-like siderophore and for acyl-homoserine lactone were unique to N. multiformis among the sequenced AOB genomes. Gene clusters encoding proteins associated with outer membrane and cell envelope functions including transporters, porins, exopolysaccharide synthesis, capsule formation and protein sorting/export were abundant. Numerous sensory transduction and response regulator gene systems directed towards sensing of the extracellular environment are described. Gene clusters for glycogen, polyphosphate and cyanophycin storage and utilization were identified providing mechanisms for meeting energy requirements under substrate-limited conditions. The genome of N. multiformis encodes the core pathways for chemolithoautotrophy along with adaptations for surface growth and survival in soil environments.
The complete chloroplast genome of the Dendrobium strongylanthum (Orchidaceae: Epidendroideae).
Li, Jing; Chen, Chen; Wang, Zhe-Zhi
2016-07-01
Complete chloroplast genome sequences are very useful for studying the phylogeny and evolution of species. In this study, the complete chloroplast genome of Dendrobium strongylanthum was constructed from whole-genome Illumina sequencing data. The chloroplast genome is 153 058 bp in length with 37.6% GC content and contains two inverted repeats (IRs) of 26 316 bp each. The IR regions are separated by a large single-copy region (LSC, 85 836 bp) and a small single-copy region (SSC, 14 590 bp). A total of 130 chloroplast genes were successfully annotated, including 84 protein-coding genes, 38 tRNA genes, and eight rRNA genes. Phylogenetic analyses showed that the chloroplast genome of Dendrobium strongylanthum is related to that of Dendrobium officinale.
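The region lengths quoted in this abstract can be cross-checked with simple arithmetic. The following is a minimal sketch, assuming the usual quadripartite layout (one LSC, one SSC, and two IR copies); the function name `quadripartite_total` is a hypothetical helper introduced here for illustration and is not part of the cited study.

```python
def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Total length (bp) of a plastome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) as reported in the abstract above.
LSC, SSC, IR = 85_836, 14_590, 26_316
REPORTED_TOTAL = 153_058

total = quadripartite_total(LSC, SSC, IR)
# The sum of the four regions should reproduce the reported genome length.
print(total, total == REPORTED_TOTAL)
```

The same check applies to any plastome abstract that reports LSC, SSC, and IR lengths; a mismatch usually points to a typo or to a non-standard genome structure.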
Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies
Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland
2013-01-01
The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. 
This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689
RNAiFOLD: a constraint programming algorithm for RNA inverse folding and molecular design.
Garcia-Martin, Juan Antonio; Clote, Peter; Dotu, Ivan
2013-04-01
Synthetic biology is a rapidly emerging discipline with long-term ramifications that range from single-molecule detection within cells to the creation of synthetic genomes and novel life forms. Truly phenomenal results have been obtained by pioneering groups--for instance, the combinatorial synthesis of genetic networks, genome synthesis using BioBricks, and hybridization chain reaction (HCR), in which stable DNA monomers assemble only upon exposure to a target DNA fragment, biomolecular self-assembly pathways, etc. Such work strongly suggests that nanotechnology and synthetic biology together seem poised to constitute the most transformative development of the 21st century. In this paper, we present a Constraint Programming (CP) approach to solve the RNA inverse folding problem. Given a target RNA secondary structure, we determine an RNA sequence which folds into the target structure; i.e. whose minimum free energy structure is the target structure. Our approach represents a step forward in RNA design--we produce the first complete RNA inverse folding approach which allows for the specification of a wide range of design constraints. We also introduce a Large Neighborhood Search approach which allows us to tackle larger instances at the cost of losing completeness, while retaining the advantages of meeting design constraints (motif, GC-content, etc.). Results demonstrate that our software, RNAiFold, performs as well or better than all state-of-the-art approaches; nevertheless, our approach is unique in terms of completeness, flexibility, and the support of various design constraints. The algorithms presented in this paper are publicly available via the interactive webserver http://bioinformatics.bc.edu/clotelab/RNAiFold; additionally, the source code can be downloaded from that site.
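To make the inverse folding problem statement concrete, the sketch below checks two of the design conditions the abstract mentions: that a candidate sequence can form every base pair of a target dot-bracket structure, and what its GC content is. This is an illustration only and is not the RNAiFold algorithm. Pair compatibility is a necessary condition, not a sufficient one; confirming that the target is the minimum free energy structure requires a folding program, which is not shown here. All function names are hypothetical.

```python
# Watson-Crick and G-U wobble pairs, the pairs conventionally allowed in RNA helices.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def is_compatible(seq: str, structure: str) -> bool:
    """True if every paired position in the structure holds a canonical pair."""
    if len(seq) != len(structure):
        return False
    return all((seq[i], seq[j]) in CANONICAL for i, j in base_pairs(structure))


def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


target = "((((...))))"  # a toy hairpin: 4-bp stem, 3-nt loop
print(is_compatible("GGGGAAACCCC", target))  # every stem position is a G-C pair
print(is_compatible("GGGGAAAAAAA", target))  # G-A is not a canonical pair
```

A design tool would treat checks like these as constraints to satisfy during search rather than as filters applied afterwards.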
Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu
2016-01-01
Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some of which have previously been shown to be laterally transferable among different species, such as genes coding for restriction-modification systems. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from the genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of the order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141
Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C
2003-01-01
Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626
Complete Genome Sequence of Bacteroides ovatus V975.
Wegmann, Udo; Goesmann, Alexander; Carding, Simon R
2016-12-01
The complete genome sequence of Bacteroides ovatus V975 was determined. The genome consists of a single circular chromosome of 6,475,296 bp containing five rRNA operons, 68 tRNA genes, and 4,959 coding genes. Copyright © 2016 Wegmann et al.
Complete genome sequence of Sanguibacter keddieii type strain (ST-74T)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ivanova, Natalia; Sikorski, Johannes; Sims, David
2009-05-20
Sanguibacter keddieii is the type species of the genus Sanguibacter, the only described genus within the family of Sanguibacteraceae. Phylogenetically, this family is located in the neighbourhood of the genus Oerskovia and the family Cellulomonadaceae within the actinobacterial suborder Micrococcineae. The strain described in this report was isolated from blood of apparently healthy cows. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the family Sanguibacteraceae, and the 4,253,413 bp long single replicon genome with its 3735 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Complete genome sequence of Nakamurella multipartita type strain (Y-104).
Tice, Hope; Mayilraj, Shanmugam; Sims, David; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Copeland, Alex; Cheng, Jan-Fang; Meincke, Linda; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Detter, John C; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng
2010-03-30
Nakamurella multipartita (Yoshimi et al. 1996) Tao et al. 2004 is the type species of the monospecific genus Nakamurella in the actinobacterial suborder Frankineae. The nonmotile, coccus-shaped strain was isolated from activated sludge acclimated with sugar-containing synthetic wastewater, and is capable of accumulating large amounts of polysaccharides in its cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the family Nakamurellaceae. The 6,060,298 bp long single replicon genome with its 5415 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Filichkin, Sergei A.; Bransom, Kay L.; Goodwin, Joel B.; Dreher, Theo W.
2000-01-01
Five highly infectious turnip yellow mosaic virus (TYMV) genomes with sequence changes in their 3′-terminal regions that result in altered aminoacylation and eEF1A binding have been studied. These genomes were derived from cloned parental RNAs of low infectivity by sequential passaging in plants. Three of these genomes that are incapable of aminoacylation have been reported previously (J. B. Goodwin, J. M. Skuzeski, and T. W. Dreher, Virology 230:113–124, 1997). We now demonstrate by subcloning the 3′ untranslated regions into wild-type TYMV RNA that the high infectivities and replication rates of these genomes compared to their progenitors are mostly due to a small number of mutations acquired in the 3′ tRNA-like structure during passaging. Mutations in other parts of the genome, including the replication protein coding region, are not required for high infectivity but probably do play a role in optimizing viral amplification and spread in plants. Two other TYMV RNA variants of suboptimal infectivities, one that accepts methionine instead of the usual valine and one that interacts less tightly with eEF1A, were sequentially passaged to produce highly infectious genomes. The improved infectivities of these RNAs were not associated with increased replication in protoplasts, and no mutations were acquired in their 3′ tRNA-like structures. Complete sequencing of one genome identified two mutations that result in amino acid changes in the movement protein gene, suggesting that improved infectivity may be a function of improved viral dissemination in plants. Our results show that the wild-type TYMV replication proteins are able to amplify genomes with 3′ termini of variable sequence and tRNA mimicry. These and previous results have led to a model in which the binding of eEF1A to the 3′ end to antagonize minus-strand initiation is a major role of the tRNA-like structure. PMID:10954536
Complete Genome Sequence of Genotype VI Newcastle Disease Viruses Isolated from Pigeons in Pakistan
Wajid, Abdul; Rehmani, Shafqat Fatima; Sharma, Poonam; Goraichuk, Iryna V.; Dimitrov, Kiril M.
2016-01-01
Two complete genome sequences of Newcastle disease virus (NDV) are described here. Virulent isolates pigeon/Pakistan/Lahore/21A/2015 and pigeon/Pakistan/Lahore/25A/2015 were obtained from racing pigeons sampled in the Pakistani province of Punjab during 2015. Phylogenetic analysis of the fusion protein genes and complete genomes classified the isolates as members of NDV class II, genotype VI. PMID:27540069
Li, Jia; Gao, Lei; Chen, Shanshan; Tao, Ke; Su, Yingjuan; Wang, Ting
2016-02-11
Sciadopitys verticillata is an evergreen conifer and an economically valuable tree used in construction, and is the only member of the family Sciadopityaceae. Acquisition of the S. verticillata chloroplast (cp) genome will be useful for understanding the evolutionary mechanism of conifers and phylogenetic relationships among gymnosperms. In this study, we report the complete chloroplast genome of S. verticillata for the first time. The total genome is 138,284 bp in length, consisting of 118 unique genes. The S. verticillata cp genome has lost one copy of the canonical inverted repeats and shows a distinctive genomic structure compared with other cupressophytes. Fifty-three simple sequence repeat loci and 18 forward tandem repeats were identified in the S. verticillata cp genome. Based on the rearrangement of cupressophyte cp genomes, we propose one mechanism for the formation of inverted repeats: a tandem repeat occurred first, then rearrangement divided the tandem repeat into inverted repeats located at different regions. Phylogenetic estimates inferred from 59-gene sequences and cpDNA organizations both showed that S. verticillata was sister to the clade consisting of Cupressaceae, Taxaceae, and Cephalotaxaceae. Moreover, the accD gene was found to be lost in the S. verticillata cp genome, and a nuclear copy was identified from two transcriptome datasets.
2010-09-25
dermatitis associated with Rothia mucilaginosa bacteremia: a case report," American Journal of Dermatopathology, vol. 32, no. 2, pp. 175–179, 2010. [5] P...root-filled teeth with chronic apical periodontitis," International Endodontic Journal, vol. 34, no. 6, pp. 429–434, 2001. [12] L. C. de Paz...of Rothia mucilaginosa DY-18: A Clinical Isolate with Dense Meshwork-Like Structures from a Persistent Apical Periodontitis Lesion Kazuyoshi Yamane
Dermauw, Wannes; Vanholme, Bartel; Tirry, Luc; Van Leeuwen, Thomas
2010-04-01
In this study we sequenced and analysed the complete mitochondrial (mt) genome of the Chilean predatory mite Phytoseiulus persimilis Athias-Henriot (Chelicerata: Acari: Mesostigmata: Phytoseiidae: Amblyseiinae). The 16 199 bp genome (79.8% AT) contains the standard set of 13 protein-coding and 24 RNA genes. Compared with the ancestral arthropod mtDNA pattern, the gene order is extremely reshuffled (35 genes changed position) and represents a novel arrangement within the arthropods. This is probably related to the presence of several large noncoding regions in the genome. In contrast with the mt genome of the closely related species Metaseiulus occidentalis (Phytoseiidae: Typhlodrominae) - which was reported to be unusually large (24 961 bp), to lack nad6 and nad3 protein-coding genes, and to contain 22 tRNAs without T-arms - the genome of P. persimilis has all the features of a standard metazoan mt genome. Consequently, we performed additional experiments on the M. occidentalis mt genome. Our preliminary restriction digests and Southern hybridization data revealed that this genome is smaller than previously reported. In addition, we cloned nad3 in M. occidentalis and positioned this gene between nad4L and 12S-rRNA on the mt genome. Finally, we report that at least 15 of the 22 tRNAs in the M. occidentalis mt genome can be folded into canonical cloverleaf structures similar to their counterparts in P. persimilis.
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.
Tully, Benjamin J; Graham, Elaina D; Heidelberg, John F
2018-01-16
Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.
2018-01-01
ABSTRACT The mitochondrial genomes of Saccharomyces cerevisiae strains contain up to 13 introns. An intronless recombinant genome introduced into the nuclear background of S. cerevisiae strain W303 gave the S. cerevisiae CW252 strain, which is used to model mitochondrial respiratory pathologies. The complete sequence of this mitochondrial genome was obtained using a hybrid assembling methodology. PMID:29700138
RNA-dependent RNA polymerases from flaviviruses and Picornaviridae.
Lescar, Julien; Canard, Bruno
2009-12-01
Flaviviruses and picornaviruses are positive-strand RNA viruses that encode the RNA-dependent RNA polymerase (RdRp) required for replicating the viral genome in infected cells. Because of their specific and essential role in the virus life cycle, RdRps are prime targets for antiviral drugs. Recent structural data have shed light on the different strategies used by RdRps from flaviviruses and Picornaviridae to initiate RNA polymerization. New details have also been published about the catalytic mechanism, the role of metal ions, and how these RdRps interact with other nonstructural (NS) viral proteins, host-cell proteins and the viral RNA genome. These advances contribute to a more complete picture of the 3D structure and mechanism of a membrane-bound viral replication complex for these two classes of medically important human pathogens.
Yao, Jie; Yang, Hong; Dai, Renhuai
2017-10-01
Acanthoscelides obtectus is a common species of the subfamily Bruchinae and a seed-feeding beetle with a worldwide distribution. The complete mitochondrial genome of A. obtectus is 16,130 bp in length with an A + T content of 76.4%. It shows a positive AT skew and a negative GC skew. The mitogenome of A. obtectus contains 13 protein-coding genes (PCGs), 22 tRNA genes, two rRNA genes and a non-coding region (D-loop). All PCGs start with an ATN codon. Seven of them (ND2, ATP6, COIII, ND3, ND4L, ND6, and Cytb) terminate with TAA, five (COI, COII, ND1, ND4, and ND5) terminate with a single T, and ATP8 terminates with TGA. Except for tRNASer, the remaining 21 tRNAs can be folded into a typical clover-leaf secondary structure. The secondary structures of lrRNA and srRNA were also predicted in this study. There are six domains with 48 helices in lrRNA and three domains with 32 helices in srRNA. The control region of A. obtectus is 1,354 bp long and has the highest A + T content (83.5%) of any region in the mitogenome. Thirteen PCGs from 19 species were used to infer phylogenetic relationships. Our results show that A. obtectus belongs to the family Chrysomelidae (subfamily Bruchinae). This is the first phylogenetic analysis involving the mitochondrial genes of A. obtectus, and it could provide basic data for future studies of mitochondrial genome diversity and the evolution of related insect lineages.
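The A + T content and the AT and GC skews quoted in this abstract are simple base-composition statistics. The sketch below assumes the commonly used definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C); the abstract itself does not state which formulas the authors applied, and the function name `composition_stats` is purely illustrative.

```python
from collections import Counter


def composition_stats(seq: str) -> dict:
    """Return A+T content, AT skew and GC skew for a DNA sequence.

    Assumed definitions (not taken from the abstract):
      AT skew = (A - T) / (A + T)
      GC skew = (G - C) / (G + C)
    Ambiguous bases such as N are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


if __name__ == "__main__":
    # Toy sequence, not real mitogenome data.
    print(composition_stats("AAATGCC"))
```

With these definitions, a positive AT skew means more A than T on the strand analysed, and a negative GC skew means more C than G.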
Yang, Jun-Bo; Li, De-Zhu; Li, Hong-Tao
2014-09-01
Chloroplast genomes supply indispensable information that helps improve phylogenetic resolution and can even serve as organelle-scale barcodes. Next-generation sequencing technologies have helped promote sequencing of complete chloroplast genomes, but compared with the number of angiosperms, relatively few chloroplast genomes have been sequenced. There are two major reasons for the paucity of completely sequenced chloroplast genomes: (i) massive amounts of fresh leaves are needed for chloroplast sequencing and (ii) there are considerable gaps in the sequenced chloroplast genomes of many plants because of the difficulty of isolating high-quality chloroplast DNA, preventing complete chloroplast genomes from being assembled. To overcome these obstacles, we analysed all angiosperm chloroplast genomes available to date and designed nine universal primer pairs corresponding to the highly conserved regions. Using these primers, angiosperm whole chloroplast genomes can be amplified using long-range PCR and sequenced using next-generation sequencing methods. The primers showed high universality, which was tested using 24 species representing major clades of angiosperms. To validate the functionality of the primers, the complete chloroplast genomes of eight species representing major groups of angiosperms (early-diverging angiosperms, magnoliids, monocots, Saxifragales, fabids, malvids and asterids) were sequenced and assembled. In our trials, only 100 mg of fresh leaves was used. The results show that the universal primer set provides an easy, effective and feasible approach for sequencing whole chloroplast genomes in angiosperms. The designed universal primer pairs can accelerate genome-scale data acquisition and will therefore improve phylogenetic resolution and species identification in angiosperms. © 2014 John Wiley & Sons Ltd.
Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki
2002-01-01
Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genomes and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. To handle such a large volume of data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256
Toward genome-enabled mycology.
Hibbett, David S; Stajich, Jason E; Spatafora, Joseph W
2013-01-01
Genome-enabled mycology is a rapidly expanding field that is characterized by the pervasive use of genome-scale data and associated computational tools in all aspects of fungal biology. Genome-enabled mycology is integrative and often requires teams of researchers with diverse skills in organismal mycology, bioinformatics and molecular biology. This issue of Mycologia presents the first complete fungal genomes in the history of the journal, reflecting the ongoing transformation of mycology into a genome-enabled science. Here, we consider the prospects for genome-enabled mycology and the technical and social challenges that will need to be overcome to grow the database of complete fungal genomes and enable all fungal biologists to make use of the new data.
Ji, Feng; Zhao, Jing-Zhuang; Liu, Miao; Lu, Tong-Yan; Liu, Hong-Bai; Yin, Jiasheng; Xu, Li-Ming
2017-04-01
Infectious pancreatic necrosis (IPN) is a significant disease of farmed salmonids that causes direct economic losses in China due to high mortality. However, no gene sequence of any Chinese infectious pancreatic necrosis virus (IPNV) isolate was previously available. In this study, moribund rainbow trout fry samples were collected during an outbreak of IPN in Yunnan province, southwest China, in 2013. An IPNV was isolated and tentatively named ChRtm213. We determined the full genome sequence of IPNV ChRtm213 and compared it with previously identified IPNV sequences worldwide. The sequences of different structural and non-structural protein genes were compared to those of other aquatic birnaviruses sequenced to date. The results indicated that the complete genome sequence of the ChRtm213 strain contains a segment A (3099 nucleotides) encoding a polyprotein VP2-VP4-VP3, and a segment B (2789 nucleotides) encoding an RNA-dependent RNA polymerase, VP1. The phylogenetic analyses showed that the ChRtm213 strain fell within genogroup 1, serotype A9 (Jasper), having similarities of 96.3% (segment A) and 97.3% (segment B) with the IPNV strain AM98 from Japan. The results suggest that the Chinese IPNV isolate is relatively closely related to Japanese IPNV strains. The sequence of ChRtm213 is the first gene sequence of an IPNV isolate from China. This study provides a robust reference for the diagnosis and/or control of IPNV prevalent in China.
CBS Genome Atlas Database: a dynamic storage for bioinformatic results and sequence data.
Hallin, Peter F; Ussery, David W
2004-12-12
Currently, new bacterial genomes are being published on a monthly basis. With the growing amount of genome sequence data, there is a demand for a flexible and easy-to-maintain structure for storing sequence data and results from bioinformatic analysis. More than 150 sequenced bacterial genomes are now available, and comparisons of properties for taxonomically similar organisms are not readily available to many biologists. In addition to the most basic information, such as AT content, chromosome length, tRNA count and rRNA count, a large number of more complex calculations are needed to perform detailed comparative genomics. DNA structural calculations like curvature and stacking energy, and DNA compositions like base skews, oligo skews and repeats at the local and global level, are just a few of the analyses that are presented on the CBS Genome Atlas Web page. Complex analysis, changing methods and frequent addition of new models are factors that require a dynamic database layout. Using basic tools like the GNU Make system, csh, Perl and MySQL, we have created a flexible database environment for storing and maintaining such results for a collection of complete microbial genomes. Currently, these results amount to more than 220 pieces of information. The backbone of this solution consists of a program package written in Perl, which enables administrators to synchronize and update the database content. The MySQL database has been connected to the CBS web-server via PHP4 to present dynamic web content for users outside the center. This solution is tightly fitted to the existing server infrastructure, and the solutions proposed here can perhaps serve as a template for other research groups to solve database issues. A web-based user interface which is dynamically linked to the Genome Atlas Database can be accessed via www.cbs.dtu.dk/services/GenomeAtlas/.
This paper has a supplemental information page which links to the examples presented: www.cbs.dtu.dk/services/GenomeAtlas/suppl/bioinfdatabase.
Complete genomes of Hairstreak butterflies, their speciation, and nucleo-mitochondrial incongruence
Cong, Qian; Shen, Jinhui; Borek, Dominika; Robbins, Robert K.; Otwinowski, Zbyszek; Grishin, Nick V.
2016-01-01
Comparison of complete genomes of closely related species enables research on speciation and how phenotype is determined by genotype. Lepidoptera, an insect order of 150,000 species with diverse phenotypes, is well suited for such comparative genomics studies if new genomes that cover additional Lepidoptera families are acquired. We report a 729 Mbp genome assembly of Calycopis cecrops, the first genome from the family Lycaenidae and the largest available Lepidoptera genome. As a detritivore, Calycopis shows expansion in detoxification and digestion enzymes. We further obtained complete genomes of 8 Calycopis specimens: 3 C. cecrops and 5 C. isobeon, including a dry specimen stored in a museum for 30 years. The two species differ subtly in phenotype and cannot be differentiated by mitochondrial DNA. However, nuclear genomes revealed a deep split between them. Genes that can clearly separate the two species (speciation hotspots) mostly pertain to circadian clock, mating behavior, transcription regulation, development and cytoskeleton. The speciation hotspots and their function significantly overlap with those we previously found in Pterourus, suggesting common speciation mechanisms in these butterflies. PMID:27120974
Matsuyama, Tomoki; Kimura, Makoto T.; Koike, Kuniaki; Abe, Tomoko; Nakano, Takeshi; Asami, Tadao; Ebisuzaki, Toshikazu; Held, William A.; Yoshida, Shigeo; Nagase, Hiroki
2003-01-01
Understanding the role of ‘epigenetic’ changes such as DNA methylation and chromatin remodeling has now become critical in understanding many biological processes. In order to delineate the global methylation pattern in a given genomic DNA, computer software has been developed to create a virtual image of restriction landmark genomic scanning (Vi-RLGS). When using a methylation-sensitive enzyme such as NotI as the restriction landmark, the comparison between real and in silico RLGS profiles of the genome provides a methylation map of genomic NotI sites. A methylation map of the Arabidopsis genome was created that could be confirmed by a methylation-sensitive PCR assay. The method has also been applied to the mouse genome. Although a complete methylation map has not yet been produced, a region of methylation difference between two tissues has been tested and confirmed by bisulfite sequencing. Vi-RLGS in conjunction with real RLGS will make it possible to develop a more complete map of genomic sites that are methylated or demethylated as a consequence of normal or abnormal development. PMID:12888509
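The comparison of in silico and real landmark profiles described above can be illustrated in a few lines. This is only a rough sketch and not the Vi-RLGS software: it locates NotI recognition sites (GCGGCCGC) in a sequence and treats any predicted site missing from a list of observed sites as a candidate methylated site. The function names and the observed-site input are hypothetical, and the second and third digestion steps of real RLGS are not modelled.

```python
NOTI_SITE = "GCGGCCGC"  # NotI recognition sequence


def find_sites(seq: str, site: str = NOTI_SITE) -> list:
    """Return 0-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        # Step by one so overlapping occurrences are not missed.
        start = seq.find(site, start + 1)
    return positions


def candidate_methylated(in_silico_sites, observed_sites) -> list:
    """Sites predicted in silico but absent from the observed profile.

    Assumption for illustration: a methylation-sensitive enzyme does not cut
    a methylated site, so a predicted site with no observed counterpart is a
    candidate methylated site.
    """
    return sorted(set(in_silico_sites) - set(observed_sites))


if __name__ == "__main__":
    toy_genome = "AAGCGGCCGCTTTTGCGGCCGCAA"  # toy sequence, not real data
    predicted = find_sites(toy_genome)
    observed = [2]  # hypothetical: only the first site was cut
    print(predicted, candidate_methylated(predicted, observed))
```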
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-06-01
meraculous2 is a whole genome shotgun assembler for short reads that is capable of assembling large, polymorphic genomes with modest computational requirements. Meraculous relies on an efficient and conservative traversal of the subgraph of the k-mer (de Bruijn) graph of oligonucleotides with unique high-quality extensions in the dataset, avoiding an explicit error correction step as used in other short-read assemblers. Additional features include (1) handling of allelic variation using "bubble" structures within the de Bruijn graph, (2) gap closing of repetitive and low-quality regions using localized assemblies, and (3) an improved scaffolding algorithm that produces more complete assemblies without compromising on scaffolding accuracy.
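The idea of traversing only k-mers with unique extensions can be sketched with a toy example. The code below is not the Meraculous implementation: it ignores base qualities, reverse complements and backward extensions, and uses a simple count threshold (`min_count`, an assumed parameter) as a stand-in for "high quality". It extends a seed k-mer to the right only while exactly one next k-mer is supported, and stops at any branch or dead end.

```python
from collections import Counter


def solid_kmers(reads, k, min_count=2):
    """Return the set of k-mers seen at least `min_count` times in the reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return {kmer for kmer, n in counts.items() if n >= min_count}


def extend_right(seed, kmers):
    """Extend `seed` rightwards while the next k-mer is unique and unvisited."""
    contig = seed
    current = seed
    seen = {seed}
    while True:
        # Candidate successors share the (k-1)-base suffix of the current k-mer.
        successors = [current[1:] + b for b in "ACGT" if current[1:] + b in kmers]
        if len(successors) != 1 or successors[0] in seen:
            break  # branch, dead end, or cycle: stop conservatively
        current = successors[0]
        seen.add(current)
        contig += current[-1]
    return contig


if __name__ == "__main__":
    reads = ["ACGTTGCA", "ACGTTGCA"]  # toy reads, not real data
    print(extend_right("ACGT", solid_kmers(reads, k=4)))
```

Stopping at every ambiguous extension is what makes the traversal conservative: contigs are shorter, but no guess is made at a fork.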
Sesavirus: prototype of a new parvovirus genus in feces of a sea lion.
Phan, Tung Gia; Gulland, Frances; Simeone, Claire; Deng, Xutao; Delwart, Eric
2015-02-01
We describe the nearly complete genome of a highly divergent parvovirus, which we tentatively name Sesavirus, from the feces of a California sea lion pup (Zalophus californianus) suffering from malnutrition and pneumonia. The 5,049-base-long genome contained two major ORFs encoding a 553-aa nonstructural protein and a 965-aa structural protein, which shared closest amino acid identities of 25% and 28%, respectively, with members of the genus Copiparvovirus, known to infect pigs and cows. Given the low degree of similarity, Sesavirus might be considered the prototype of a new genus, with the proposed name Marinoparvovirus, in the subfamily Parvovirinae.
Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946
USDA-ARS's Scientific Manuscript database
Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related...
USDA-ARS's Scientific Manuscript database
The complete 16,345-bp mitochondrial genome of the agriculturally-destructive pod sucking pest, the giant coreid bug, Anoplocnemis curvipes (Hemiptera: Coreidae), was assembled from paired end next generation sequencing reads. The A. curvipes mitochondrial genome consists of 13 protein coding genes...
Ross, Daniel E.; Gulliver, Djuna
2016-10-06
The draft genome sequence of Pseudomonas stutzeri strain K35 was separated from a metagenome derived from a produced water microbial community of a coalbed methane well. The genome encodes a complete nitrogen fixation pathway and the upper and lower naphthalene degradation pathways.
Palau, Montserrat; Boujida, Nadia; Manresa, Àngels; Miñana-Galbis, David
2018-04-19
The complete genome sequence of the halophilic strain Marinobacter flavimaris LMG 23834T is presented here. The genomic information of this type strain will be useful for taxonomic purposes and for its potential use in bioremediation studies. Copyright © 2018 Palau et al.
Complete genome of the cellulolytic ruminal bacterium Ruminococcus albus 7
USDA-ARS's Scientific Manuscript database
Ruminococcus albus 7 is a highly cellulolytic rumen bacterium that is a member of the phylum Firmicutes. Here, we describe the complete genome for this microbe. This genome will be useful for rumen microbiology, cellulosome biology, and in biofuel production, as one of its major fermentation product...
Complete genome sequence of Leptotrichia buccalis type strain (C-1013-bT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ivanova, Natalia; Gronow, Sabine; Lapidus, Alla
2009-05-20
Leptotrichia buccalis (Robin 1853) Trevisan 1879 is the type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically adequately accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. Species of Leptotrichia are large, fusiform, non-motile, non-sporulating rods, which often populate the human oral flora. L. buccalis is anaerobic to aerotolerant, and saccharolytic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order 'Fusobacteriales' and no more than the second sequence from the phylum 'Fusobacteria'. The 2,465,610 bp long single replicon genome with its 2306 protein-coding and 61 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
The complete chloroplast genome of North American ginseng, Panax quinquefolius.
Han, Zeng-Jie; Li, Wei; Liu, Yuan; Gao, Li-Zhi
2016-09-01
We report the complete nucleotide sequence of the Panax quinquefolius chloroplast genome, obtained using next-generation sequencing technology. The genome size is 156 359 bp, including two inverted repeats (IRs) of 52 153 bp, separated by the large single-copy (LSC 86 184 bp) and small single-copy (SSC 18 081 bp) regions. This cp genome encodes 114 unigenes (80 protein-coding genes, four rRNA genes, and 30 tRNA genes), of which 18 are duplicated in the IR regions. The overall GC content of the genome is 38.08%. A phylogenomic analysis of the 10 complete chloroplast genomes from Araliaceae, using Daucus carota from Apiaceae as the outgroup, showed that P. quinquefolius is closely related to the other two members of the genus Panax, P. ginseng and P. notoginseng.
Complete Genome Sequence of the Electricity-Producing “Thermincola potens” Strain JR
Byrne-Bailey, Kathryne G.; Wrighton, Kelly C.; Melnyk, Ryan A.; Agbo, Peter; Hazen, Terry C.; Coates, John D.
2010-01-01
“Thermincola potens” strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR. PMID:20525829
Booher, Nicholas J.; Carpenter, Sara C. D.; Sebra, Robert P.; Wang, Li; Salzberg, Steven L.; Leach, Jan E.
2015-01-01
Pathogen-injected, direct transcriptional activators of host genes, TAL (transcription activator-like) effectors play determinative roles in plant diseases caused by Xanthomonas spp. A large domain of nearly identical, 33–35 aa repeats in each protein mediates DNA recognition. This modularity makes TAL effectors customizable and thus important also in biotechnology. However, the repeats render TAL effector (tal) genes nearly impossible to assemble using next-generation, short reads. Here, we demonstrate that long-read, single molecule real-time (SMRT) sequencing solves this problem. Taking an ensemble approach to first generate local, tal gene contigs, we correctly assembled de novo the genomes of two strains of the rice pathogen X. oryzae completed previously using the Sanger method and even identified errors in those references. Sequencing two more strains revealed a dynamic genome structure and a striking plasticity in tal gene content. Our results pave the way for population-level studies to inform resistance breeding, improve biotechnology and probe TAL effector evolution. PMID:27148456
Choudhary, Kumari S.; Mih, Nathan; Monk, Jonathan; Kavvas, Erol; Yurkovich, James T.; Sakoulas, George; Palsson, Bernhard O.
2018-01-01
Two-component systems (TCSs) consist of a histidine kinase and a response regulator. Here, we evaluated the conservation of the AgrAC TCS among 149 completely sequenced Staphylococcus aureus strains. The agr locus is composed of four genes: agrBDCA. We found that: (i) the AgrAC system (agr) was found in all but one of the 149 strains, (ii) the agr-positive strains were further classified into four agr types based on AgrD protein sequences, (iii) the four agr types not only specified the chromosomal arrangement of the agr genes but also the sequence divergence of the AgrC histidine kinase protein, which confers signal specificity, (iv) the sequence divergence was reflected in distinct structural properties, especially in the transmembrane region and second extracellular binding domain, and (v) there was a strong correlation between the agr type and the virulence genomic profile of the organism. Taken together, these results demonstrate that bioinformatic analysis of the agr locus leads to a classification system that correlates with the presence of virulence factors and protein structural properties. PMID:29887846
The Complete Chloroplast Genome Sequence of Date Palm (Phoenix dactylifera L.)
Yang, Meng; Zhang, Xiaowei; Liu, Guiming; Yin, Yuxin; Chen, Kaifu; Yun, Quanzheng; Zhao, Duojun; Al-Mssallem, Ibrahim S.; Yu, Jun
2010-01-01
Background Date palm (Phoenix dactylifera L.), a member of the Arecaceae family, is one of the three major economically important woody palms (the other two being oil palm and coconut tree), and its fruit is a staple food among Middle East and North African nations, as well as many other tropical and subtropical regions. Here we report a complete sequence of the date palm chloroplast (cp) genome based on pyrosequencing. Methodology/Principal Findings After extracting 369,022 cp sequencing reads from our whole-genome-shotgun data, we put together an assembly and validated it with intensive PCR-based verification, coupled with PCR product sequencing. The date palm cp genome is 158,462 bp in length and has a typical quadripartite structure of the large (LSC, 86,198 bp) and small single-copy (SSC, 17,712 bp) regions separated by a pair of inverted repeats (IRs, 27,276 bp). Similar to what has been found among most angiosperms, the date palm cp genome harbors 112 unique genes and 19 duplicated fragments in the IR regions. The junctions between LSC/IRs and SSC/IRs show different features of sequence expansion in evolution. We identified 78 SNPs as major intravarietal polymorphisms within the population of a specific cp genome, most of which were located in genes with vital functions. Based on RNA-sequencing data, we also found 18 polycistronic transcription units and three highly expression-biased genes: atpF, trnA-UGC, and rrn23. Conclusions Unlike most monocots, date palm has a typical cp genome similar to that of tobacco, with little rearrangement and gene loss or gain. High-throughput sequencing technology facilitates the identification of intravarietal variations in cp genomes among different cultivars. Moreover, transcriptomic analysis of cp genes provides clues for uncovering regulatory mechanisms of transcription and translation in chloroplasts. PMID:20856810
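The quadripartite structure reported for the date palm cp genome can be checked with simple arithmetic: one LSC, one SSC and two IR copies should add up to the total length. The short sketch below does this with the figures from the abstract, on the assumption that the quoted 27,276 bp refers to a single IR copy; the function name is illustrative only.

```python
def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite chloroplast genome: LSC + SSC + 2 x IR."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    # Region sizes (bp) as quoted in the date palm abstract.
    total = quadripartite_length(lsc=86_198, ssc=17_712, ir=27_276)
    print(total)  # compare with the reported genome length of 158,462 bp
```

Under that assumption the region sizes sum exactly to the reported genome length.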
Miller, Eric S.; Heidelberg, John F.; Eisen, Jonathan A.; Nelson, William C.; Durkin, A. Scott; Ciecko, Ann; Feldblyum, Tamara V.; White, Owen; Paulsen, Ian T.; Nierman, William C.; Lee, Jong; Szczypinski, Bridget; Fraser, Claire M.
2003-01-01
The complete genome sequence of the T4-like, broad-host-range vibriophage KVP40 has been determined. The genome sequence is 244,835 bp, with an overall G+C content of 42.6%. It encodes 386 putative protein-encoding open reading frames (CDSs), 30 tRNAs, 33 T4-like late promoters, and 57 potential rho-independent terminators. Overall, 92.1% of the KVP40 genome is coding, with an average CDS size of 587 bp. While 65% of the CDSs were unique to KVP40 and had no known function, the genome sequence and organization show specific regions of extensive conservation with phage T4. At least 99 KVP40 CDSs have homologs in the T4 genome (Blast alignments of 45 to 68% amino acid similarity). The shared CDSs represent 36% of all T4 CDSs but only 26% of those from KVP40. There is extensive representation of the DNA replication, recombination, and repair enzymes as well as the viral capsid and tail structural genes. KVP40 lacks several T4 enzymes involved in host DNA degradation, appears not to synthesize the modified cytosine (hydroxymethyl glucose) present in T-even phages, and lacks group I introns. KVP40 likely utilizes the T4-type sigma-55 late transcription apparatus, but features of early- or middle-mode transcription were not identified. There are 26 CDSs that have no viral homolog, and many did not necessarily originate from Vibrio spp., suggesting an even broader host range for KVP40. From these latter CDSs, an NAD salvage pathway was inferred that appears to be unique among bacteriophages. Features of the KVP40 genome that distinguish it from T4 are presented, as well as those, such as the replication and virion gene clusters, that are substantially conserved. PMID:12923095
Wang, Jiajia; Li, Hu; Dai, Renhuai
2017-12-01
Here, we describe the first complete mitochondrial genome (mitogenome) sequence of the leafhopper Taharana fasciana (Coelidiinae). The mitogenome sequence contains 15,161 bp with an A + T content of 77.9%. It includes 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes, and one non-coding (A + T-rich) region; in addition, a repeat region is also present (GenBank accession no. KY886913). These genes/regions are in the same order as in the inferred insect ancestral mitogenome. All protein-coding genes have ATN as the start codon, and TAA or single T as the stop codons, except the gene ND3, which ends with TAG. Furthermore, we predicted the secondary structures of the rRNAs in T. fasciana. Six domains (domain III is absent in arthropods) and 41 helices were predicted for 16S rRNA, and 12S rRNA comprised three structural domains and 24 helices. Phylogenetic tree analysis confirmed that T. fasciana and other members of the Cicadellidae are clustered into a clade, and it identified the relationships among the subfamilies Deltocephalinae, Coelidiinae, Idiocerinae, Cicadellinae, and Typhlocybinae.
Ali, M. Rahmat; Alam, A. S. M. Rubayet Ul; Amin, M. Al; Ullah, Huzzat; Siddique, Mohammad Anwar; Momtaz, Samina; Sultana, Munawar
2017-01-01
ABSTRACT The complete genome sequence of foot-and-mouth disease virus (FMDV) serotype Asia1 isolated from Bangladesh is reported here. Genome analysis revealed amino acid substitutions in the VP1 antigenic region and deletions in both the 5′ and 3′ untranslated regions (UTRs) compared to the genome of the existing vaccine strain (GenBank accession no. AY304994). PMID:29074654
Complete genome sequence of Corynebacterium glutamicum CP, a Chinese l-leucine producing strain.
Gui, Yongli; Ma, Yuechao; Xu, Qingyang; Zhang, Chenglin; Xie, Xixian; Chen, Ning
2016-02-20
Here, we report the complete genome sequence of Corynebacterium glutamicum CP, an industrial l-leucine producing strain in China. The whole genome consists of a circular chromosome and a plasmid. The comparative genomics analysis shows that there are many mutations in the key enzyme coding genes relevant to l-leucine biosynthesis compared to C. glutamicum ATCC 13032. Copyright © 2016 Elsevier B.V. All rights reserved.
Ikegami, Kentaro; Aita, Yuto; Shiroma, Akino; Shimoji, Makiko; Tamotsu, Hinako; Ashimine, Noriko; Shinzato, Misuzu; Ohki, Shun; Nakano, Kazuma; Teruya, Kuniko; Satou, Kazuhito; Hirano, Takashi; Yohda, Masafumi
2018-05-03
The complete genome sequence of Petrimonas sp. strain IBARAKI in a Dehalococcoides-containing culture was determined using the PacBio RS II platform. The genome is a single circular chromosome of 3,693,233 nucleotides (nt), with a GC content of 44%. This is the first genome sequence of a Petrimonas species. Copyright © 2018 Ikegami et al.
De León, Kara B.; Utturkar, Sagar M.; Camilleri, Laura B.; ...
2015-09-24
The genome of Pelosinus fermentans JBW45, isolated from a chromium-contaminated site in Hanford, Washington, USA, has been completed with PacBio sequencing. Nine copies of the rRNA gene operon and multiple transposase genes with identical sequences resulted in breaks in the original draft genome and may suggest genomic instability of JBW45.
Tellgren-Roth, Christian; Baudo, Charles D.; Kennell, John C.; Sun, Sheng; Billmyre, R. Blake; Schröder, Markus S.; Andersson, Anna; Holm, Tina; Sigurgeirsson, Benjamin; Wu, Guangxi; Sankaranarayanan, Sundar Ram; Siddharthan, Rahul; Sanyal, Kaustuv; Lundeberg, Joakim; Nystedt, Björn; Boekhout, Teun; Dawson, Thomas L.; Heitman, Joseph
2017-01-01
Abstract Complete and accurate genome assembly and annotation is a crucial foundation for comparative and functional genomics. Despite this, few complete eukaryotic genomes are available, and genome annotation remains a major challenge. Here, we present a complete genome assembly of the skin commensal yeast Malassezia sympodialis and demonstrate how proteogenomics can substantially improve gene annotation. Through long-read DNA sequencing, we obtained a gap-free genome assembly for M. sympodialis (ATCC 42132), comprising eight nuclear and one mitochondrial chromosome. We also sequenced and assembled four M. sympodialis clinical isolates, and showed their value for understanding Malassezia reproduction by confirming four alternative allele combinations at the two mating-type loci. Importantly, we demonstrated how proteomics data could be readily integrated with transcriptomics data in standard annotation tools. This increased the number of annotated protein-coding genes by 14% (from 3612 to 4113), compared to using transcriptomics evidence alone. Manual curation further increased the number of protein-coding genes by 9% (to 4493). All of these genes have RNA-seq evidence and 87% were confirmed by proteomics. The M. sympodialis genome assembly and annotation presented here is at a quality yet achieved only for a few eukaryotic organisms, and constitutes an important reference for future host-microbe interaction studies. PMID:28100699
Yu, Danna; Fang, Xindong; Storey, Kenneth B; Zhang, Yongpu; Zhang, Jiayong
2016-05-01
The complete mitochondrial genomes of the yellow-bellied slider (Trachemys scripta scripta) and anoxia tolerant red-eared slider (Trachemys scripta elegans) turtles were sequenced to analyze gene arrangement. The complete mt genomes of T. s. scripta and elegans were circular molecules of 16,791 bp and 16,810 bp in length, respectively, and included an A + 1 frameshift insertion in ND3 and ND4L genes. The AT content of the overall base composition of scripta and elegans was 61.2%. Nucleotide sequence divergence of the mt-genome (p distance) between scripta and elegans was 0.4%. A detailed comparison between the mitochondrial genomes of the two subspecies is shown.
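The two summary statistics quoted in this abstract, AT content and p distance, can be illustrated with a minimal Python sketch. The function names and the toy sequences are hypothetical and are not taken from the paper. Because the two mitogenomes differ in length (16,791 bp and 16,810 bp), a real comparison would first need an alignment; the sketch assumes the sequences are already aligned and ignores any column that contains a gap or an ambiguous base.

```python
def at_content(seq):
    """Fraction of A and T among the unambiguous bases of a sequence."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "AT" for b in counted) / len(counted)


def p_distance(a, b):
    """Proportion of differing sites between two pre-aligned sequences.

    Columns with a gap or an ambiguous symbol in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


# Toy, made-up sequences for illustration only.
print(at_content("AATTGC"))
print(p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

On this reading, a p distance of 0.4% corresponds to roughly four differing sites per thousand compared positions.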
Zhang, Chenhua; Zheng, Hongying; Yan, Dankan; Han, Kelei; Song, Xijiao; Liu, Yong; Zhang, Dongfang; Chen, Jianping; Yan, Fei
2017-08-01
Cowpea and broad bean plants showing severe stunting and leaf rolling symptoms were observed in Hefei city, Anhui province, China, in 2014. Symptomatic plants from both species were shown to be infected with milk vetch dwarf virus (MDV) by PCR. The complete genomes of MDV isolates from cowpea and broad bean were sequenced. Each of them had eight genomic DNAs that differed between the two isolates by 10.7% in their overall nucleotide sequences. In addition, the MDV genomes from cowpea and broad bean were associated with two and three alphasatellite DNAs, respectively. This is the first report of MDV on cowpea in China and the first complete genome sequences of Chinese MDV isolates.
Mavromatis, Konstantinos; Stackebrandt, Erko; Munk, Christine; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Huntemann, Marcel; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Rohde, Manfred; Gronow, Sabine; Göker, Markus; Detter, John C.; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja
2013-01-01
Alistipes finegoldii Rautio et al. 2003 is one of five species of Alistipes with a validly published name: family Rikenellaceae, order Bacteroidetes, class Bacteroidia, phylum Bacteroidetes. This rod-shaped and strictly anaerobic organism has been isolated mostly from human tissues. Here we describe the features of the type strain of this species, together with the complete genome sequence, and annotation. A. finegoldii is the first member of the genus Alistipes for which the complete genome sequence of its type strain is now available. The 3,734,239 bp long single replicon genome with its 3,302 protein-coding and 68 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:23961309
Using Arabidopsis to understand centromere function: progress and prospects.
Copenhaver, Gregory P
2003-01-01
Arabidopsis thaliana has emerged in recent years as a leading model for understanding the structure and function of higher eukaryotic centromeres. Arabidopsis centromeres, like those of virtually all higher eukaryotes, encompass large DNA domains consisting of a complex combination of unique, dispersed middle repetitive and highly repetitive DNA. For this reason, they have required creative analysis using molecular, genetic, cytological and genomic techniques. This synergy of approaches, reinforced by rapid progress in understanding how proteins interact with the centromere DNA to form a complete functional unit, has made Arabidopsis one of the best understood centromere systems. Yet major problems remain to be solved: gaining a complete structural definition of the centromere has been surprisingly difficult, and developing synthetic mini-chromosomes in plants has been even more challenging.
Šmajs, David; Zobaníková, Marie; Strouhal, Michal; Čejková, Darina; Dugan-Rocha, Shannon; Pospíšilová, Petra; Norris, Steven J.; Albert, Tom; Qin, Xiang; Hallsworth-Pepin, Kym; Buhay, Christian; Muzny, Donna M.; Chen, Lei; Gibbs, Richard A.; Weinstock, George M.
2011-01-01
Treponema paraluiscuniculi is the causative agent of rabbit venereal spirochetosis. It is not infectious to humans, although its genome structure is very closely related to other pathogenic Treponema species including Treponema pallidum subspecies pallidum, the etiological agent of syphilis. In this study, the genome sequence of Treponema paraluiscuniculi, strain Cuniculi A, was determined by a combination of several high-throughput sequencing strategies. Whereas the overall size (1,133,390 bp), arrangement, and gene content of the Cuniculi A genome closely resembled those of the T. pallidum genome, the T. paraluiscuniculi genome contained a markedly higher number of pseudogenes and gene fragments (51). In addition to pseudogenes, 33 divergent genes were also found in the T. paraluiscuniculi genome. A set of 32 (out of 84) affected genes encoded proteins of known or predicted function in the Nichols genome. These proteins included virulence factors, gene regulators and components of DNA repair and recombination. The majority (52 or 61.9%) of the Cuniculi A pseudogenes and divergent genes were of unknown function. Our results indicate that T. paraluiscuniculi has evolved from a T. pallidum-like ancestor and adapted to a specialized host-associated niche (rabbits) during loss of infectivity to humans. The genes that are inactivated or altered in T. paraluiscuniculi are candidates for virulence factors important in the infectivity and pathogenesis of T. pallidum subspecies. PMID:21655244
Genome sequence of the ultrasmall unicellular red alga Cyanidioschyzon merolae 10D.
Matsuzaki, Motomichi; Misumi, Osami; Shin-I, Tadasu; Maruyama, Shinichiro; Takahara, Manabu; Miyagishima, Shin-Ya; Mori, Toshiyuki; Nishida, Keiji; Yagisawa, Fumi; Nishida, Keishin; Yoshida, Yamato; Nishimura, Yoshiki; Nakao, Shunsuke; Kobayashi, Tamaki; Momoyama, Yu; Higashiyama, Tetsuya; Minoda, Ayumi; Sano, Masako; Nomoto, Hisayo; Oishi, Kazuko; Hayashi, Hiroko; Ohta, Fumiko; Nishizaka, Satoko; Haga, Shinobu; Miura, Sachiko; Morishita, Tomomi; Kabeya, Yukihiro; Terasawa, Kimihiro; Suzuki, Yutaka; Ishii, Yasuyuki; Asakawa, Shuichi; Takano, Hiroyoshi; Ohta, Niji; Kuroiwa, Haruko; Tanaka, Kan; Shimizu, Nobuyoshi; Sugano, Sumio; Sato, Naoki; Nozaki, Hisayoshi; Ogasawara, Naotake; Kohara, Yuji; Kuroiwa, Tsuneyoshi
2004-04-08
Small, compact genomes of ultrasmall unicellular algae provide information on the basic and essential genes that support the lives of photosynthetic eukaryotes, including higher plants. Here we report the 16,520,305-base-pair sequence of the 20 chromosomes of the unicellular red alga Cyanidioschyzon merolae 10D as the first complete algal genome. We identified 5,331 genes in total, of which at least 86.3% were expressed. Unique characteristics of this genomic structure include: a lack of introns in all but 26 genes; only three copies of ribosomal DNA units that maintain the nucleolus; and two dynamin genes that are involved only in the division of mitochondria and plastids. The conserved mosaic origin of Calvin cycle enzymes in this red alga and in green plants supports the hypothesis of the existence of single primary plastid endosymbiosis. The lack of a myosin gene, in addition to the unexpressed actin gene, suggests a simpler system of cytokinesis. These results indicate that the C. merolae genome provides a model system with a simple gene composition for studying the origin, evolution and fundamental mechanisms of eukaryotic cells.
Gan, Han M; Lee, Yin P; Austin, Christopher M
2017-01-01
We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. In the complete genome, the major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are only found in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy, but, surprisingly, comparative genomics analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as "hydrogen eating bacteria."
SeqHound: biological sequence and structure database as a platform for bioinformatics research
2002-01-01
Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny
Wang, Yuan; Chen, Jing; Jiang, Li-Yun; Qiao, Ge-Xia
2015-01-01
The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenome research and summarizes their main features, including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements are an additional data type for a few families, and most mitogenomes are arranged in the same order as that of the proposed ancestral insect. We also discuss and provide insights on the phylogenetic analyses of a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource. PMID:26039239
Linear and Nonlinear Statistical Characterization of DNA
NASA Astrophysics Data System (ADS)
Norio Oiwa, Nestor; Goldman, Carla; Glazier, James
2002-03-01
We find spatial order in the distribution of protein-coding (including RNAs) and control segments of GenBank genomic sequences, irrespective of ATCG content. This is achieved by correlations, histograms, fractal dimensions and singularity spectra. Estimates of these quantities in complete nuclear genomes indicate that coding sequences are long-range correlated and that their disposition is self-similar (multifractal) for eukaryotes. These characteristics are absent in prokaryotes, where there are few noncoding sequences, suggesting that 'junk' DNA plays a relevant role in genome structure and function. Concerning the genetic message of ATCG sequences, we build a random walk (Levy flight), using DNA symmetry arguments, where we associate A, T, C and G with left, right, down and up steps, respectively. Nonlinear analysis of mitochondrial DNA walks reveals a multifractal pattern based on palindromic sequences, which fold in hairpins and loops.
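The base-to-step mapping described in this abstract (A, T, C and G as left, right, down and up) can be sketched as a simple two-dimensional walk. The Python below is only an illustration under stated assumptions: the names `STEPS` and `dna_walk` are hypothetical, every step has unit length, and ambiguous symbols are skipped. It does not reproduce the Levy-flight step statistics or the multifractal analysis the authors report.

```python
# Step mapping taken from the abstract: A=left, T=right, C=down, G=up.
# Unit step length is an assumption made for this sketch.
STEPS = {"A": (-1, 0), "T": (1, 0), "C": (0, -1), "G": (0, 1)}


def dna_walk(sequence):
    """Return the list of (x, y) positions visited, starting at the origin."""
    x, y = 0, 0
    path = [(x, y)]
    for base in sequence.upper():
        if base not in STEPS:
            continue  # skip ambiguous symbols such as N
        dx, dy = STEPS[base]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


# Toy sequence for illustration only.
print(dna_walk("ATGCGG"))
```

The resulting path could then be passed to whatever correlation or fractal-dimension estimate is of interest; that analysis is outside the scope of this sketch.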
Miller, Webb; Hayes, Vanessa M.; Ratan, Aakrosh; Petersen, Desiree C.; Wittekindt, Nicola E.; Miller, Jason; Walenz, Brian; Knight, James; Qi, Ji; Zhao, Fangqing; Wang, Qingyu; Bedoya-Reina, Oscar C.; Katiyar, Neerja; Tomsho, Lynn P.; Kasson, Lindsay McClellan; Hardie, Rae-Anne; Woodbridge, Paula; Tindall, Elizabeth A.; Bertelsen, Mads Frost; Dixon, Dale; Pyecroft, Stephen; Helgen, Kristofer M.; Lesk, Arthur M.; Pringle, Thomas H.; Patterson, Nick; Zhang, Yu; Kreiss, Alexandre; Woods, Gregory M.; Jones, Menna E.; Schuster, Stephan C.
2011-01-01
The Tasmanian devil (Sarcophilus harrisii) is threatened with extinction because of a contagious cancer known as Devil Facial Tumor Disease. The inability to mount an immune response and to reject these tumors might be caused by a lack of genetic diversity within a dwindling population. Here we report a whole-genome analysis of two animals originating from extreme northwest and southeast Tasmania, the maximal geographic spread, together with the genome from a tumor taken from one of them. A 3.3-Gb de novo assembly of the sequence data from two complementary next-generation sequencing platforms was used to identify 1 million polymorphic genomic positions, roughly one-quarter of the number observed between two genetically distant human genomes. Analysis of 14 complete mitochondrial genomes from current and museum specimens, as well as mitochondrial and nuclear SNP markers in 175 animals, suggests that the observed low genetic diversity in today's population preceded the Devil Facial Tumor Disease outbreak by at least 100 y. Using a genetically characterized breeding stock based on the genome sequence will enable preservation of the extant genetic diversity in future Tasmanian devil populations. PMID:21709235
Li, Yueyue; Wang, Yang; Hu, John; Xiao, Long; Tan, Guanlin; Lan, Pingxiu; Liu, Yong; Li, Fan
2017-01-31
Tomato mottle mosaic virus (ToMMV) is a recently identified species in the genus Tobamovirus and was first reported from a greenhouse tomato sample collected in Mexico in 2013. In August 2013, ToMMV was detected on peppers (Capsicum spp.) in China. However, little is known about the molecular and biological characteristics of ToMMV. Reverse transcription-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) were carried out to obtain the complete genomic sequences of ToMMV. Sap transmission was used to test the host range and pathogenicity of ToMMV. The full-length genomes of two ToMMV isolates infecting peppers in Yunnan Province and Tibet Autonomous Region of China were determined and analyzed. The complete genomic sequences of both ToMMV isolates consisted of 6399 nucleotides and contained four open reading frames (ORFs) encoding 126, 183, 30 and 18 kDa proteins from the 5' to 3' end, respectively. Overall similarities of the ToMMV genome sequence to those of the other tobamoviruses available in GenBank ranged from 49.6% to 84.3%. Phylogenetic analyses of the full-genome nucleotide sequences and of the amino acid sequences of the four proteins confirmed that ToMMV was most closely related to Tomato mosaic virus (ToMV). According to the genetic structure, host of origin and phylogenetic relationships, the available 32 tobamoviruses could be divided into at least eight subgroups based on the host plant family they infect: Solanaceae-, Brassicaceae-, Cactaceae-, Apocynaceae-, Cucurbitaceae-, Malvaceae-, Leguminosae-, and Passifloraceae-infecting subgroups. Testing for ToMMV on some solanaceous, cucurbitaceous, brassicaceous and leguminous plants in Yunnan Province and a few other parts of China revealed that ToMMV has so far occurred only on peppers. However, the host range test results showed that ToMMV could infect most of the tested solanaceous and cruciferous plants, and had a high affinity for the solanaceous plants.
The complete nucleotide sequences of two Chinese ToMMV isolates from naturally infected peppers were verified. The tobamoviruses were divided into at least eight subgroups, with ToMMV belonging to the subgroup that infects plants in the Solanaceae. In China, ToMMV has so far been found only on peppers in the field. ToMMV could infect plants in the families Solanaceae and Cucurbitaceae by sap transmission.
Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar
2014-03-04
Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5%, and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
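To make the idea of detecting variants "with frequencies as low as 1%" concrete, the following Python sketch reports non-consensus bases at one genome position whose frequency in the read pile-up reaches a cutoff. It is not the authors' pipeline: the function name `minor_variants`, the 1% default and the read counts are assumptions for illustration, and a real workflow would also need to model sequencing error, base quality and strand bias.

```python
def minor_variants(counts, min_freq=0.01):
    """Return {base: frequency} for non-consensus bases at or above min_freq.

    `counts` maps each base to the number of reads supporting it at one
    position. The most frequent base is treated as the consensus.
    """
    depth = sum(counts.values())
    if depth == 0:
        return {}
    consensus = max(counts, key=counts.get)
    return {
        base: n / depth
        for base, n in counts.items()
        if base != consensus and n / depth >= min_freq
    }


# Made-up pile-up at a single position with a read depth of 10,000.
pileup = {"A": 9800, "G": 150, "C": 50, "T": 0}
print(minor_variants(pileup))
```

With these made-up counts, G (1.5% of reads) passes the cutoff while C (0.5%) does not, which shows why very deep coverage is needed before a 1% variant can be separated from noise.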
Genomes of all known members of a Plasmodium subgenus reveal paths to virulent human malaria.
Otto, Thomas D; Gilabert, Aude; Crellen, Thomas; Böhme, Ulrike; Arnathau, Céline; Sanders, Mandy; Oyola, Samuel O; Okouga, Alain Prince; Boundenga, Larson; Willaume, Eric; Ngoubangoye, Barthélémy; Moukodoum, Nancy Diamella; Paupy, Christophe; Durand, Patrick; Rougeron, Virginie; Ollomo, Benjamin; Renaud, François; Newbold, Chris; Berriman, Matthew; Prugnolle, Franck
2018-06-01
Plasmodium falciparum, the most virulent agent of human malaria, shares a recent common ancestor with the gorilla parasite Plasmodium praefalciparum. Little is known about the other gorilla- and chimpanzee-infecting species in the same (Laverania) subgenus as P. falciparum, but none of them are capable of establishing repeated infection and transmission in humans. To elucidate underlying mechanisms and the evolutionary history of this subgenus, we have generated multiple genomes from all known Laverania species. The completeness of our dataset allows us to conclude that interspecific gene transfers, as well as convergent evolution, were important in the evolution of these species. Striking copy number and structural variations were observed within gene families and one, stevor, shows a host-specific sequence pattern. The complete genome sequence of the closest ancestor of P. falciparum enables us to estimate the timing of the beginning of speciation to be 40,000-60,000 years ago followed by a population bottleneck around 4,000-6,000 years ago. Our data allow us also to search in detail for the features of P. falciparum that made it the only member of the Laverania able to infect and spread in humans.
SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.
Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael
2017-01-01
The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open-source, code and installation instructions can be found at: http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ to store genomic data in the manner of ENCODE. The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data) has been released as a separate Python package. PMID:28403240
From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.
Garza, Daniel R; Dutilh, Bas E
2015-11-01
Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.
Lisenkova, A A; Grigorenko, A P; Tyazhelova, T V; Andreeva, T V; Gusev, F E; Manakhov, A D; Goltsov, A Yu; Piraino, S; Miglietta, M P; Rogaev, E I
2017-02-01
Turritopsis dohrnii (Cnidaria, Hydrozoa, Hydroidolina, Anthoathecata) is the only known metazoan that is capable of reversing its life cycle via morph rejuvenation from the adult medusa stage to the juvenile polyp stage. Here, we present a complete mitochondrial (mt) genome sequence of T. dohrnii, which harbors genes for 13 proteins, two transfer RNAs, and two ribosomal RNAs. The T. dohrnii mt genome is characterized by typical features of species in the Hydroidolina subclass, such as a high A+T content (71.5%), reversed transcriptional orientation for the large rRNA subunit gene, and paucity of CGN codons. An incomplete complementary duplicate of the cox1 gene was found at the 5' end of the T. dohrnii mt chromosome, as were variable repeat regions flanking the chromosome. We identified species-specific variations (nad5, nad6, cob, and cox1 genes) and putative selective constraints (atp8, nad1, nad2, and nad5 genes) in the mt genes of T. dohrnii, and predicted alterations in tertiary structures of respiratory chain proteins (NADH4, NADH5, and COX1 proteins) of T. dohrnii. Based on comparative analyses of available hydrozoan mt genomes, we also determined the taxonomic relationships of T. dohrnii, recovering Filifera IV as a paraphyletic taxon, and assessed intraspecific diversity of various Hydrozoa species. Copyright © 2016 Elsevier Inc. All rights reserved.
Extremely Low Genomic Diversity of Rickettsia japonica Distributed in Japan.
Akter, Arzuba; Ooka, Tadasuke; Gotoh, Yasuhiro; Yamamoto, Seigo; Fujita, Hiromi; Terasoma, Fumio; Kida, Kouji; Taira, Masakatsu; Nakadouzono, Fumiko; Gokuden, Mutsuyo; Hirano, Manabu; Miyashiro, Mamoru; Inari, Kouichi; Shimazu, Yukie; Tabara, Kenji; Toyoda, Atsushi; Yoshimura, Dai; Itoh, Takehiko; Kitano, Tomokazu; Sato, Mitsuhiko P; Katsura, Keisuke; Mondal, Shakhinur Islam; Ogura, Yoshitoshi; Ando, Shuji; Hayashi, Tetsuya
2017-01-01
Rickettsiae are obligate intracellular bacteria that have small genomes as a result of reductive evolution. Many Rickettsia species of the spotted fever group (SFG) cause tick-borne diseases known as "spotted fevers". The life cycle of SFG rickettsiae is closely associated with that of the tick, which is generally thought to act as a bacterial vector and reservoir that maintains the bacterium through transstadial and transovarial transmission. Each SFG member is thought to have adapted to a specific tick species, thus restricting the bacterial distribution to a relatively limited geographic region. These unique features of SFG rickettsiae allow investigation of how the genomes of such biologically and ecologically specialized bacteria evolve after genome reduction and the types of population structures that are generated. Here, we performed a nationwide, high-resolution phylogenetic analysis of Rickettsia japonica, an etiological agent of Japanese spotted fever that is distributed in Japan and Korea. The comparison of complete or nearly complete sequences obtained from 31 R. japonica strains isolated from various sources in Japan over the past 30 years demonstrated an extremely low level of genomic diversity. In particular, only 34 single nucleotide polymorphisms were identified among the 27 strains of the major lineage containing all clinical isolates and tick isolates from the three tick species. Our data provide novel insights into the biology and genome evolution of R. japonica, including the possibilities of recent clonal expansion and a long generation time in nature due to the long dormant phase associated with tick life cycles. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Synaptogenesis and heritable aspects of executive attention.
Fossella, John A; Sommer, Tobias; Fan, Jin; Pfaff, Don; Posner, Michael I
2003-01-01
In humans, changes in brain structure and function can be measured non-invasively during postnatal development. In animals, advanced optical imaging measures can track the formation of synapses during learning and behavior. With the recent progress in these technologies, it is appropriate to begin to assess how the physiological processes of synapse, circuit, and neural network formation relate to the process of cognitive development. Of particular interest is the development of executive function, which develops more gradually in humans. One approach that has shown promise is molecular genetics. The completion of the human genome project and the human genome diversity project make it straightforward to ask whether variation in a particular gene correlates with variation in behavior, brain structure, brain activity, or all of the above. Strategies that unify the wealth of biochemical knowledge pertaining to synapse formation with the functional measures of brain structure and activity may lead to new insights in developmental cognitive psychology. Copyright 2003 Wiley-Liss, Inc.
Evolutionary characterization of the West Nile Virus complete genome.
Gray, R R; Veras, N M C; Santos, L A; Salemi, M
2010-07-01
The spatial dynamics of the West Nile Virus epidemic in North America are largely unknown. Previous studies that investigated the evolutionary history of the virus used sequence data from the structural genes (prM and E); however, these regions may lack phylogenetic information and obscure true evolutionary relationships. This study systematically evaluated the evolutionary patterns in the eleven genes of the WNV genome in order to determine which region(s) were most phylogenetically informative. We found that while the E region lacks resolution and can potentially result in misleading conclusions, the full NS3 or NS5 regions have strong phylogenetic signal. Furthermore, we show that geographic structure of WNV infection within the US is more pronounced than previously reported in studies that used the structural genes. We conclude that future evolutionary studies should focus on NS3 and NS5 in order to maximize the available sequences while retaining maximal interpretative power to infer temporal and geographic trends among WNV strains. Copyright 2010 Elsevier Inc. All rights reserved.
Kim, Jusik; Choi, Inseo; Lee, Youngsoo
2017-11-01
Maintenance of genomic integrity is one of the critical features for proper neurodevelopment and inhibition of neurological diseases. The signals from both ATM and ATR to TP53 are well-known mechanisms to remove neural cells with DNA damage during neurogenesis. Here we examined the involvement of Atm and Atr in genomic instability due to Terf2 inactivation during mouse brain development. Selective inactivation of Terf2 in neural progenitors induced apoptosis, resulting in a complete loss of the brain structure. This neural loss was rescued partially in both Atm and Trp53 deficiency, but not in an Atr-deficient background in the mouse. Atm inactivation resulted in incomplete brain structures, whereas p53 deficiency led to the formation of multinucleated giant neural cells and the disruption of the brain structure. These giant neural cells disappeared in Lig4 deficiency. These data demonstrate ATM and TP53 are important for the maintenance of telomere homeostasis and the surveillance of telomere dysfunction during neurogenesis.
Shittu, Ismaila; Sharma, Poonam; Joannis, Tony M.; Volkening, Jeremy D.; Odaibo, Georgina N.; Olaleye, David O.; Williams-Coplin, Dawn; Solomon, Ponman; Abolnik, Celia; Miller, Patti J.; Dimitrov, Kiril M.
2016-01-01
The first complete genome sequence of a strain of Newcastle disease virus (NDV) of genotype XVII is described here. A velogenic strain (duck/Nigeria/903/KUDU-113/1992) was isolated from an apparently healthy free-roaming domestic duck sampled in Kuru, Nigeria, in 1992. Phylogenetic analysis of the fusion protein gene and complete genome classified the isolate as a member of NDV class II, genotype XVII. PMID:26847901
Misra, Namrata; Panda, Prasanna Kumar; Parida, Bikram Kumar
2014-12-01
Lysophosphatidyl acyltransferase (LPAT) is one of the major triacylglycerol synthesis enzymes, controlling the metabolic flow of lysophosphatidic acid to phosphatidic acid. Experimental studies in Arabidopsis have shown that LPAT activity is exhibited primarily by three distinct isoforms, namely the plastid-located LPAT1, the endoplasmic reticulum-located LPAT2, and the soluble isoform of LPAT (solLPAT). In this study, 24 putative genes representing all LPAT isoforms were identified from the analysis of 11 complete genomes including green algae, red algae, diatoms and higher plants. We observed LPAT1 and solLPAT genes to be ubiquitously present in nearly all genomes examined, whereas LPAT2 genes to have evolved more recently in the plant lineage. Phylogenetic analysis indicated that LPAT1, LPAT2 and solLPAT have convergently evolved through separate evolutionary paths and belong to three different gene families, which was further evidenced by their wide divergence at gene structure and sequence level. The genome distribution supports the hypothesis that each gene encoding a LPAT is not duplicated. Mapping of exon-intron structure of LPAT genes to the domain structure of proteins across different algal and plant species indicates that exon shuffling plays no role in the evolution of LPAT genes. Besides the previously defined motifs, several conserved consensus sequences were discovered which could be useful to distinguish different LPAT isoforms. Taken together, this study will enable the generation of experimental approximations to better understand the functional role of algal LPAT in lipid accumulation.
Covering complete proteomes with X-ray structures: A current snapshot
Mizianty, Marcin J.; Fan, Xiao; Yan, Jing; ...
2014-10-23
Structural genomics programs have developed and applied structure-determination pipelines to a wide range of protein targets, facilitating the visualization of macromolecular interactions and the understanding of their molecular and biochemical functions. The fundamental question of whether three-dimensional structures of all proteins and all functional annotations can be determined using X-ray crystallography is investigated. A first-of-its-kind large-scale analysis of crystallization propensity for all proteins encoded in 1953 fully sequenced genomes was performed. It is shown that current X-ray crystallographic know-how combined with homology modeling can provide structures for 25% of modeling families (protein clusters for which structural models can be obtained through homology modeling), with at least one structural model produced for each Gene Ontology functional annotation. The coverage varies between superkingdoms, with 19% for eukaryotes, 35% for bacteria and 49% for archaea, and with those of viruses following the coverage values of their hosts. It is shown that the crystallization propensities of proteomes from the taxonomic superkingdoms are distinct. The use of knowledge-based target selection is shown to substantially increase the ability to produce X-ray structures. It is demonstrated that the human proteome has one of the highest attainable coverage values among eukaryotes, and GPCR membrane proteins suitable for X-ray structure determination were determined.
Holland, M J; Holland, J P; Thill, G P; Jackson, K A
1981-02-10
Segments of yeast genomic DNA containing two enolase structural genes have been isolated by subculture cloning procedures using a cDNA hybridization probe synthesized from purified yeast enolase mRNA. Based on restriction endonuclease and transcriptional maps of these two segments of yeast DNA, each hybrid plasmid contains a region of extensive nucleotide sequence homology which forms hybrids with the cDNA probe. The DNA sequences which flank this homologous region in the two hybrid plasmids are nonhomologous, indicating that these sequences are nontandemly repeated in the yeast genome. The complete nucleotide sequence of the coding as well as the flanking noncoding regions of these genes has been determined. The amino acid sequence predicted from one reading frame of both structural genes is extremely similar to that determined for yeast enolase (Chin, C. C. Q., Brewer, J. M., Eckard, E., and Wold, F. (1981) J. Biol. Chem. 256, 1370-1376), confirming that these isolated structural genes encode yeast enolase. The nucleotide sequences of the coding regions of the genes are approximately 95% homologous, and neither gene contains an intervening sequence. Codon utilization in the enolase genes follows the same biased pattern previously described for two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes (Holland, J. P., and Holland, M. J. (1980) J. Biol. Chem. 255, 2596-2605). DNA blotting analysis confirmed that the isolated segments of yeast DNA are colinear with yeast genomic DNA and that there are two nontandemly repeated enolase genes per haploid yeast genome. The noncoding portions of the two enolase genes adjacent to the initiation and termination codons are approximately 70% homologous and contain sequences thought to be involved in the synthesis and processing of messenger RNA. Finally, there are regions of extensive homology between the two enolase structural genes and two yeast glyceraldehyde-3-phosphate dehydrogenase structural genes within the 5' noncoding portions of these glycolytic genes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Haberle, Rosemarie C.; Fourcade, Matthew L.; Boore, Jeffrey L.
2006-01-09
Chloroplast genome structure, gene order and content are highly conserved in land plants. We sequenced the complete chloroplast genome sequence of Trachelium caeruleum (Campanulaceae), a member of an angiosperm family known for highly rearranged chloroplast genomes. The total genome size is 162,321 bp with an IR of 27,273 bp, LSC of 100,113 bp and SSC of 7,661 bp. The genome encodes 115 unique genes, with 19 duplicated in the IR, a tRNA (trnI-CAU) duplicated once in the LSC and a protein coding gene (psbJ) duplicated twice, for a total of 137 genes. Four genes (ycf15, rpl23, infA and accD) are truncated and likely nonfunctional; three others (clpP, ycf1 and ycf2) are so highly diverged that they may now be pseudogenes. The most conspicuous feature of the Trachelium genome is the presence of eighteen internally unrearranged blocks of genes that have been inverted or relocated within the genome, relative to the typical gene order of most angiosperm chloroplast genomes. Recombination between repeats or tRNAs has been suggested as two means of chloroplast genome rearrangements. We compared the relative number of repeats in Trachelium to eight other angiosperm chloroplast genomes, and evaluated the location of repeats and tRNAs in relation to rearrangements. Trachelium has the highest number and largest repeats, which are concentrated near inversion endpoints or other rearrangements. tRNAs occur at many but not all inversion endpoints. There is likely no single mechanism responsible for the remarkable number of alterations in this genome, but both repeats and tRNAs are clearly associated with these rearrangements. Land plant chloroplast genomes are highly conserved in structure, gene order and content. The chloroplast genomes of ferns, the gymnosperm Ginkgo, and most angiosperms are nearly collinear, reflecting the gene order in lineages that diverged from lycopsids and the ancestral chloroplast gene order over 350 million years ago (Raubeson and Jansen, 1992).
Although earlier mapping studies identified a number of taxa in which several rearrangements have occurred (reviewed in Raubeson and Jansen, 2005), an extraordinary number of chloroplast genome alterations are concentrated in several families in the angiosperm order Asterales (sensu APGII, Bremer et al., 2003). Gene mapping studies of representatives of the Campanulaceae (Cosner, 1993; Cosner et al., 1997, 2004) and Lobeliaceae (Knox et al., 1993; Knox and Palmer, 1999) identified large inversions, contraction and expansion of the inverted repeat regions, and several insertions and deletions in the cpDNAs of these closely related taxa. Detailed restriction site and gene mapping of the chloroplast genome of Trachelium caeruleum (Campanulaceae) identified seven to ten large inversions, families of repeats associated with rearrangements, possible transpositions, and even the disruption of operons (Cosner et al., 1997). Seventeen other members of the Campanulaceae were mapped and exhibit many additional rearrangements (Cosner et al., 2004). What happened in this lineage that made it susceptible to so many chloroplast genome rearrangements? How do normally very conserved chloroplast genomes change? The cause of rearrangements in this group is unclear based on the limited resolution available with mapping techniques. Several mechanisms have been proposed to explain how rearrangements occur: recombination between repeats, transposition, or temporary instability due to loss of the inverted repeat (Raubeson and Jansen, 2005). Sequencing whole chloroplast genomes within the Campanulaceae offers a unique opportunity to examine both the extent and mechanisms of rearrangements within a phylogenetic framework. We report here the first complete chloroplast genome sequence of a member of the Campanulaceae, Trachelium caeruleum.
This work will serve as a benchmark for subsequent, comparative sequencing and analysis of other members of this family and close relatives, with the goal of further understanding chloroplast genome evolution. We confirmed features previously identified through mapping, and discovered many additional structural changes, including several partial to entire gene duplications, deterioration of at least four normally conserved chloroplast genes into gene fragments, and the nature and position of numerous repeat elements at or near inversion endpoints. The focus of this paper is on analyses of sequences at or near these rearrangements in Trachelium caeruleum. Inversions are believed to occur due to the presence of repeat elements subject to homologous recombination (Palmer, 1991; Knox et al., 1993). Repeats may facilitate inversions or other genome rearrangements (Achaz et al., 2003), and higher incidences of repeats have been correlated with greater numbers of rearrangements (Rocha, 2003). Alternatively, repeats may proliferate within a genome as a result of DNA strand repair mechanisms following a rearrangement event such as an inversion. Gene...
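The size and gene-count figures quoted in the Trachelium abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration and rests on two assumptions that the abstract does not state explicitly: that the genome follows the usual quadripartite layout (LSC + SSC + two IR copies), and that "duplicated once" adds one extra gene copy while "duplicated twice" adds two. The function and variable names are hypothetical.

```python
from typing import Sequence

# Figures quoted in the abstract (Trachelium caeruleum chloroplast genome).
REPORTED_TOTAL_BP = 162_321
LSC_BP = 100_113
SSC_BP = 7_661
IR_BP = 27_273


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Genome length assuming one LSC, one SSC and two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


def total_gene_count(unique: int, extra_copies: Sequence[int]) -> int:
    """Unique genes plus every additional copy contributed by duplications."""
    return unique + sum(extra_copies)


region_sum = quadripartite_length(LSC_BP, SSC_BP, IR_BP)
bp_difference = REPORTED_TOTAL_BP - region_sum

# Assumed reading: 19 IR genes and trnI-CAU each add one extra copy,
# and psbJ ("duplicated twice") adds two extra copies.
gene_total = total_gene_count(115, [19, 1, 2])

print(f"sum of regions: {region_sum} bp (reported total minus sum: {bp_difference} bp)")
print(f"total genes under the assumed reading: {gene_total}")
```

Under these assumptions the gene tally reproduces the reported total of 137, and the region sizes add up to within 1 bp of the reported genome length; the abstract gives no information about the source of that 1 bp difference.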
Pharmacogenomics and its potential impact on drug and formulation development.
Regnstrom, Karin; Burgess, Diane J
2005-01-01
Recent advances in genomic research have provided the basis for new insights into the importance of genetic and genomic markers during the different stages of drug development. A new field of research, pharmacogenomics, which studies the relationship between drug effects and the genome, has emerged. Structural pharmacogenomics maps the complete DNA sequences of whole genomes (genotypes) including individual variations, and functional pharmacogenomics assesses the expression levels of thousands of genes in one single experiment. Together, these two areas of pharmacogenomics have generated massive databases, which have become a challenge for the research field of informatics and have fostered a new branch of research, bioinformatics. If skillfully used, the databases generated by pharmacogenomics together with data mining on the Web promise to improve the drug development process in a variety of areas: identification of drug targets, evaluation of toxicity, classification of diseases, evaluation of formulations, assessment of drug response and treatment, post-marketing applications, and development of personalized medicines.
The complete genome sequence and proteomics of Yersinia pestis phage Yep-phi.
Zhao, Xiangna; Wu, Weili; Qi, Zhizhen; Cui, Yujun; Yan, Yanfeng; Guo, Zhaobiao; Wang, Zuyun; Wang, Hu; Deng, Haijun; Xue, Yan; Chen, Weijun; Wang, Xiaoyi; Yang, Ruifu
2011-01-01
Yep-phi, a lytic phage of Yersinia pestis, was isolated in China and is routinely used as a diagnostic phage for the identification of the plague pathogen. Yep-phi has an isometric hexagonal head containing dsDNA and a short non-contractile conical tail. In this study, we sequenced the Yep-phi genome (GenBank accession no. HQ333270) and performed proteomics analysis. The genome consists of 38,616 bp of DNA, including direct terminal repeats of 222 bp, and is predicted to contain 45 ORFs. Most structural proteins were identified by proteomics analysis. Compared with the three available genome sequences of lytic phages for Y. pestis, the phages could be divided into two subgroups. Yep-phi displays marked homology to the bacteriophages Berlin (GenBank accession no. AM183667) and Yepe2 (GenBank accession no. EU734170), and these comprise one subgroup. The other subgroup is represented by bacteriophage ΦA1122 (GenBank accession no. AY247822). Potential recombination was detected among the Yep-phi subgroup.
Dong, Chen; Hu, Huigang; Xie, Jianghui
2016-12-01
DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.
Li, Erna; Wei, Xiao; Ma, Yanyan; Yin, Zhe; Li, Huan; Lin, Weishi; Wang, Xuesong; Li, Chao; Shen, Zhiqiang; Zhao, Ruixiang; Yang, Huiying; Jiang, Aimin; Yang, Wenhui; Yuan, Jing; Zhao, Xiangna
2016-01-01
Enterobacter aerogenes (Enterobacteriaceae) is an important opportunistic pathogen that causes hospital-acquired pneumonia, bacteremia, and urinary tract infections. Recently, multidrug-resistant E. aerogenes have been a public health problem. To develop an effective antimicrobial agent, bacteriophage phiEap-2 was isolated from sewage and its genome was sequenced because of its ability to lyse the multidrug-resistant clinical E. aerogenes strain 3-SP. Morphological observations suggested that the phage belongs to the Siphoviridae family. Comparative genome analysis revealed that phage phiEap-2 is related to the Salmonella phage FSL SP-031 (KC139518). All of the structural gene products (except capsid protein) encoded by phiEap-2 had orthologous gene products in FSL SP-031 and Serratia phage Eta (KC460990). Here, we report the complete genome sequence of phiEap-2 and major findings from the genomic analysis. Knowledge of this phage might be helpful for developing therapeutic strategies against E. aerogenes. PMID:27320081
Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.
Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario
2017-01-01
Next-generation sequencing technology has greatly eased bacterial genome sequencing, and several tens of thousands of genomes have been sequenced during the last 10 years. Most genome projects are published as draft versions; however, for certain applications the complete genome sequence is required. In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a closely related strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
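As a rough illustration of the last point, listing point mutations between a parental strain and a derived mutant reduces, in its simplest form, to comparing two aligned sequences position by position. The sketch below is not the chapter's protocol (which works from Ion Torrent reads mapped to a reference genome); it assumes the two sequences are already aligned, equal in length and free of indels, and the toy sequences and function name are hypothetical.

```python
from typing import List, Tuple


def point_mutations(parent: str, mutant: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, parent base, mutant base) for each mismatch.

    Assumes pre-aligned sequences of equal length without indels.
    Positions where either sequence has an ambiguous 'N' are skipped.
    """
    if len(parent) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for position, (p, m) in enumerate(zip(parent.upper(), mutant.upper()), start=1):
        if p != m and "N" not in (p, m):
            mutations.append((position, p, m))
    return mutations


# Hypothetical toy sequences, not real strain data.
parent_seq = "ATGGCGTACGTTAGC"
mutant_seq = "ATGGCATACGTTGGC"

for position, ref_base, alt_base in point_mutations(parent_seq, mutant_seq):
    print(f"{ref_base}{position}{alt_base}")
```

With real sequencing data the comparison would normally be done by a read mapper and variant caller rather than a direct string comparison, since reads carry sequencing errors and indels that this sketch ignores.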
Complete genome sequence of a recent panzootic virulent Newcastle disease virus from Pakistan
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a new strain of Newcastle disease virus (NDV) (chicken/Pak/Lahore-611/2013) is reported. The strain was isolated from a vaccinated chicken flock in Pakistan in 2013 and has panzootic features. The genome is 15192 nucleotides in length and is classified as sub-genotype V...
Complete genome sequence of yam chlorotic necrosis virus, a novel macluravirus infecting yam
USDA-ARS's Scientific Manuscript database
Complete genomic sequence of a novel member of the genus Macluravirus was determined from yam plants with chlorotic and necrotic symptoms in China. The genomic RNA consists of 8,261 nucleotides (nt) excluding the 3’-terminal poly (A) tail, containing one long open reading frame (ORF) encoding a larg...
Complete Genome Sequence of Magnetospirillum gryphiswaldense MSR-1
Wang, Xu; Wang, Qing; Zhang, Weijia; Wang, Yinjia; Li, Li; Wen, Tong; Zhang, Tongwei; Zhang, Yang; Xu, Jun; Hu, Junying; Li, Shuqi; Liu, Lingzi; Liu, Jinxin; Jiang, Wei; Tian, Jiesheng; Wang, Lei; Li, Jilun
2014-01-01
We report the complete genomic sequence of Magnetospirillum gryphiswaldense MSR-1 (DSM 6361), a type strain of the genus Magnetospirillum belonging to the Alphaproteobacteria. Compared to the reported draft sequence, extensive rearrangements and differences were found, indicating high genomic flexibility and “domestication” by accelerated evolution of the strain upon repeated passaging. PMID:24625872
Full Genome Sequence of Giant Panda Rotavirus Strain CH-1
Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang
2013-01-01
We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists. PMID:23469354
USDA-ARS's Scientific Manuscript database
Tung tree (Vernicia fordii) is an economically important plant widely cultivated for industrial oil production in China. To better understand the molecular basis of tung tree chloroplasts, we sequenced and characterized the complete chloroplast genome. The chloroplast genome was 161,524 bp in length...
USDA-ARS's Scientific Manuscript database
The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 LTR-retrotransposon (LTR-RT) families that are comprised of 32,...
Yiheng Hu; Jing Yan; Xiaojia Feng; Meng Dang; Keith E. Woeste; Peng Zhao
2017-01-01
The wheel wingnut (Cyclocarya paliurus) is an endemic species distributed in eastern and central China. Cyclocarya is a woody genus in the Juglandaceae used in medicine and horticulture. The complete chloroplast genome of C. paliurus was sequenced using the Illumina Hiseq 2500 platform. The total genome...
Complete genome sequence of a divergent strain of Japanese yam mosaic virus from China
USDA-ARS's Scientific Manuscript database
A novel strain of Japanese yam mosaic virus (JYMV-CN) was identified in a yam plant with foliar mottle symptoms in China. The complete genomic sequence of JYMV-CN was determined. Its genomic sequence of 9701 nucleotides encodes a polyprotein of 3247 amino acids. Its organization was virtually identi...
USDA-ARS's Scientific Manuscript database
Campylobacter jejuni is an important foodborne pathogen that causes gastroenteritis in humans and is commonly found in poultry and meat products. Here, we report the complete genome sequence of a Campylobacter jejuni strain recently isolated from retail beef liver. The genome size was 1,712,361 bp, ...
USDA-ARS's Scientific Manuscript database
Salmonella enterica are a versatile group of bacteria with a wide range in virulence potential. To facilitate genome comparisons across this virulence spectrum, we present eight complete closed genome sequences of four S. enterica serotypes (Anatum, Montevideo, Typhimurium, and Newport) isolated fro...